mimetype

MIME type lookup and magic-number detection for Gleam on Erlang and JavaScript targets.

Features

Install

gleam add mimetype

When to use this

Use mimetype when you need a small, cross-target MIME utility in Gleam.

The extension database is generated from jshttp/mime-db, which tracks the IANA media type registry and common ecosystem aliases. Refreshing the generated table keeps lookups aligned with that upstream source.

Serving a file: pick a Content-Type from a filename

The most common use is taking the filename your handler already has and turning it into a wire-ready Content-Type value. filename_to_mime_type is case-insensitive and falls back to application/octet-stream for unknown extensions, so the helper is safe to drop into a response path without extra branching.

import mimetype

/// Pick the Content-Type header value to send back when serving
/// `filename` from disk or object storage.
pub fn content_type_for(filename: String) -> String {
  mimetype.filename_to_mime_type(filename)
  |> mimetype.to_string
}

// content_type_for("report.PDF")     -> "application/pdf"
// content_type_for("avatar.jpg")     -> "image/jpeg"
// content_type_for("archive.tar.gz") -> "application/gzip"
// content_type_for("notes")          -> "application/octet-stream"

For HTML / CSS / JS responses where browsers expect a charset, parse a wire string that already carries the charset parameter you serve, then serialise it back with to_string:

import mimetype

pub fn html_content_type() -> String {
  let assert Ok(html) = mimetype.parse("text/html; charset=utf-8")
  mimetype.to_string(html)
  // -> "text/html; charset=utf-8"
}

Validating an upload: detect from bytes, not the user's extension

Browser-uploaded filenames are user input and can lie. Match the leading bytes of the upload against mimetype.detect to get the actual format, then enforce an allowlist of MIME types your endpoint will accept.

import mimetype

pub type UploadError {
  EmptyUpload
  Unsupported(detected: String)
}

/// Allow only PNG, JPEG, and WebP uploads. The detected MIME type is
/// derived from magic bytes — the caller's filename is ignored.
pub fn validate_image_upload(
  bytes: BitArray,
) -> Result(mimetype.MimeType, UploadError) {
  case mimetype.detect_strict(bytes) {
    Ok(mime) ->
      case mimetype.is_image(mime) && image_is_allowed(mime) {
        True -> Ok(mime)
        False -> Error(Unsupported(detected: mimetype.to_string(mime)))
      }
    Error(mimetype.EmptyInput) -> Error(EmptyUpload)
    Error(_) -> Error(Unsupported(detected: "application/octet-stream"))
  }
}

fn image_is_allowed(mime: mimetype.MimeType) -> Bool {
  case mimetype.essence_of(mime) {
    "image/png" | "image/jpeg" | "image/webp" -> True
    _ -> False
  }
}

The strict variant separates EmptyInput (zero-byte upload) from NoMatch (bytes that did not match any signature) so the caller can return the right HTTP status. For a path with no error branch, mimetype.detect returns application/octet-stream in both cases instead.
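
As a rough sketch of "returning the right HTTP status", the UploadError from the example above could be mapped to status codes as follows. The status_for name and the choice of 400 and 415 are assumptions made for illustration; they are not part of mimetype.

```gleam
// Repeated from the upload example so this snippet stands alone.
pub type UploadError {
  EmptyUpload
  Unsupported(detected: String)
}

/// Hypothetical helper: choose an HTTP status for a rejected upload.
pub fn status_for(error: UploadError) -> Int {
  case error {
    // A zero-byte body is treated here as a malformed request.
    EmptyUpload -> 400
    // Content outside the allowlist maps to Unsupported Media Type.
    Unsupported(_) -> 415
  }
}
```

Keeping the mapping in one function means the handler only ever matches on UploadError, not on the detector's own error type.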

Other API entry points

The full surface returns an opaque MimeType. Use mimetype.to_string to serialise for an HTTP header; use mimetype.parse to construct one from a wire-format string. Inspect with essence_of, parameter_of, charset_of_type, is_image, is_a, and the rest of the predicate / accessor family.

import gleam/option.{Some}
import mimetype

pub fn main() {
  mimetype.extension_to_mime_type(".json")
  |> mimetype.to_string
  // -> "application/json"

  let assert Ok(jpeg) = mimetype.parse("image/jpeg")
  mimetype.mime_type_to_extensions(jpeg)
  // -> ["jpg", "jpeg", "jpe"]

  mimetype.detect_with_filename(<<0, 1, 2, 3>>, "report.csv")
  |> mimetype.essence_of
  // -> "text/csv"

  let assert Ok(html) = mimetype.parse("text/html; charset=utf-8")
  mimetype.charset_of_type(html)
  // -> Some("utf-8")
}

Capabilities and limitations

This library intentionally stays focused. Knowing where the detector stops is more useful than discovering it from a surprising result.

Reader-based detection

detect_reader and detect_reader_strict let callers detect a MIME type without buffering the whole input. They take a synchronous reader plus a byte budget, and the reader is invoked at most once to fetch up to that many bytes from the start of the source.

Reader contract

pub type Reader(read_error) = fn(Int) -> Result(BitArray, read_error)

The reader is called at most once per detection call. There is no streaming or back-and-forth, so return enough bytes for the largest signature you care about (the detector inspects up to a few KB by default), or pass a custom limit argument tuned for your workload.

In-memory adapter

The simplest case: when the bytes are already in hand, wrap them in a function that ignores its argument.

import mimetype

pub fn main() {
  let png = <<0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A>>
  let reader = fn(_limit) { Ok(png) }

  mimetype.detect_reader(reader, 3072)
  |> mimetype.to_string
  // -> "image/png"
}
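
If the in-memory buffer may be large, the adapter can honour the byte budget instead of ignoring it. The sketch below is one possible shape, assuming gleam_stdlib's gleam/bit_array and gleam/int modules. prefix_reader is a hypothetical name, and the Reader alias is repeated from the contract above so the snippet stands alone.

```gleam
import gleam/bit_array
import gleam/int

// Same shape as the Reader contract described above.
pub type Reader(read_error) =
  fn(Int) -> Result(BitArray, read_error)

/// Hypothetical helper: build a reader that returns at most `limit`
/// bytes from the start of an in-memory buffer.
pub fn prefix_reader(bytes: BitArray) -> Reader(Nil) {
  fn(limit) {
    // Clamp so the slice never runs past the end of the buffer.
    let take = int.clamp(limit, min: 0, max: bit_array.byte_size(bytes))
    bit_array.slice(from: bytes, at: 0, take: take)
  }
}
```

The returned function can be passed wherever the examples above pass `reader`.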

BEAM file prefix reader

On the Erlang target, wrap a file-IO library so that one call returns up to limit bytes from the start of the file. Any IO library that can open a file and read a fixed-size prefix works — the snippet below sketches the shape using a read_prefix(path, limit) helper that returns Result(BitArray, your_error):

import mimetype

pub fn detect_file(
  path: String,
) -> Result(mimetype.MimeType, mimetype.DetectionError(your_error)) {
  // `read_prefix` is your own helper; it is not part of mimetype.
  let reader = fn(limit) { read_prefix(path, limit) }
  mimetype.detect_reader_strict(reader, 3072)
}

If read_prefix returns Ok(<<>>) for an empty file, the strict variant surfaces Error(EmptyInput). If read_prefix itself returns Error(some_io_error), the strict variant surfaces Error(ReaderError(some_io_error)) so the caller can distinguish IO failure from a genuine no-match.

JavaScript browser adapter

In the browser, File / Blob / ReadableStream reads are asynchronous, so they cannot satisfy the synchronous Reader contract directly. The intended pattern is:

  1. Read the prefix asynchronously (await blob.slice(0, limit).arrayBuffer() or the equivalent on a ReadableStream).
  2. Pass the resulting bytes to detect / detect_strict, not to detect_reader.

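In Gleam, assuming the gleam_javascript package for the Promise type and a hypothetical FFI helper read_blob_prefix that awaits the slice and resolves to a BitArray:

import gleam/javascript/promise.{type Promise}
import mimetype

// Hypothetical FFI pieces: `Blob` is an opaque browser handle, and
// `readBlobPrefix` awaits blob.slice(0, limit).arrayBuffer().
pub type Blob

@external(javascript, "./blob_ffi.mjs", "readBlobPrefix")
fn read_blob_prefix(blob: Blob, limit: Int) -> Promise(BitArray)

pub fn detect_blob(blob: Blob) -> Promise(mimetype.MimeType) {
  read_blob_prefix(blob, 3072)
  |> promise.map(mimetype.detect)
}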

The reader-based API is most useful when the source is itself synchronous (BEAM file IO, in-memory buffers, deterministic stream adapters). For Promise-based sources, awaiting the prefix once and calling detect is the recommended shape.

Strict variants and error handling

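The strict variants return Result(MimeType, DetectionError(read_error)), where DetectionError distinguishes an empty source (EmptyInput), bytes that match no signature (NoMatch), and a failure reported by the reader itself (ReaderError(read_error)):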

import gleam/io
import gleam/string
import mimetype

pub fn classify(reader) {
  case mimetype.detect_reader_strict(reader, 3072) {
    Ok(mime) -> io.println(mimetype.to_string(mime))
    Error(mimetype.EmptyInput) -> io.println("empty source")
    Error(mimetype.NoMatch) -> io.println("unrecognised content")
    // string.inspect keeps every branch returning Nil for any reader error type.
    Error(mimetype.ReaderError(reason)) ->
      io.println("reader failed: " <> string.inspect(reason))
    Error(mimetype.UnknownExtension(_)) -> Nil
  }
}

Supported magic-number formats

detect/1 recognises the following MIME types from byte-level signatures or structural sniffs near the start of the input. This list is generated from src/mimetype/internal/magic.gleam by scripts/generate_supported_formats.sh — do not edit it by hand; re-run just generate-readme after adding or removing a signature.

Application formats

Audio formats

Font formats

Image formats

Text formats

Video formats

The detector is intentionally shallow: it looks only at fixed signatures near the start of the byte stream, plus a small amount of targeted ZIP local-header inspection for the container formats listed above. It does not recurse arbitrarily into nested containers.
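
To illustrate what "fixed signatures near the start of the byte stream" means in practice, here is a minimal sketch using Gleam bit array patterns. It is not the library's implementation: sniff is a hypothetical name, and the four signatures are the well-known PNG, JPEG, PDF, and WebP prefixes, chosen only as examples.

```gleam
/// Hypothetical illustration of shallow magic-number matching.
pub fn sniff(bytes: BitArray) -> Result(String, Nil) {
  case bytes {
    // PNG: fixed 8-byte signature.
    <<0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, _:bits>> ->
      Ok("image/png")
    // JPEG: start-of-image marker followed by another marker byte.
    <<0xFF, 0xD8, 0xFF, _:bits>> -> Ok("image/jpeg")
    // PDF: ASCII header.
    <<"%PDF-":utf8, _:bits>> -> Ok("application/pdf")
    // WebP: RIFF container, 4-byte size field, then the WEBP tag.
    <<"RIFF":utf8, _:bytes-size(4), "WEBP":utf8, _:bits>> -> Ok("image/webp")
    _ -> Error(Nil)
  }
}
```

Each pattern inspects only a fixed prefix, which is why a short read from the start of the source is enough for this style of detection.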

Development

mise install
just ci

The generated MIME-DB lookup tables live in src/mimetype/internal/mimetype_db_ffi.erl and src/mimetype/internal/db_ffi.mjs, with a thin Gleam wrapper at src/mimetype/internal/db.gleam. All three files are derived from doc/reference/upstream/mime-db/db.json. Refresh them with:

just generate-db

CI runs the same generator against the pinned upstream commit and fails the build if the regenerated output drifts from the committed copies.

Benchmarks

The hot lookup and detection paths have a small reproducible bench harness under test/mimetype_bench.gleam. Run it on either target:

just bench-erlang
just bench-javascript
just bench            # both, in sequence

Each run prints a Markdown table of ns/op figures. Capture a baseline from main before a refactor (just bench-erlang > before.md), then re-run on the working branch and diff the two tables to check for material regressions. The harness is intentionally not wired into PR-time CI gates — it is for local A/B comparison and ad-hoc investigation, not for blocking merges on micro-fluctuations.

Licensing

The data tables under src/mimetype/internal/ are generated from jshttp/mime-db. The generated FFI source files (mimetype_db_ffi.erl and db_ffi.mjs) carry the MIT notice inline; the same packaged notice is also included in THIRD_PARTY_NOTICES.md.