ExPdfium

Elixir bindings for pdfium — Google's Chromium PDF engine — via the Rust pdfium-render crate, shipped as a precompiled NIF with rustler_precompiled.

No Rust toolchain. No separately-installed pdfium. Add the dep and go.

A read & extract toolkit. Open documents and count pages, render pages to bitmaps, extract and search text, read metadata, page geometry and permissions, walk structure (bookmarks, links, attachments), and read forms and annotations. It does not create, edit, or save PDFs.

Why

Elixir has a native PDF-rendering gap: Vix/Image (libvips) ships without PDF support, so rasterizing a PDF normally means building libvips from source with poppler or pdfium. ExPdfium fills that gap with a precompiled pdfium binding that covers rendering, plus the text extraction and metadata that libvips alone can't give you.

This is a ground-up Rust rewrite of the older gmile/pdfium C++ NIF, adopting the rustler_precompiled release model so every supported OTP (27/28/29+) gets a precompiled binary from one build matrix.

Installation

def deps do
  [{:ex_pdfium, "~> 0.1"}]
end

Usage

{:ok, doc} = ExPdfium.open("file.pdf") # or open(<<"%PDF...">> = bytes)
{:ok, n} = ExPdfium.page_count(doc)
:ok = ExPdfium.close(doc)
# Encrypted documents:
{:ok, doc} = ExPdfium.open("secret.pdf", password: "hunter2")

Documents are closed automatically on garbage collection; call ExPdfium.close/1 to release pdfium memory early. open/2 returns {:error, reason} for problems like :enoent, :invalid_pdf, or :password_error.
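
The open/close lifecycle described above can be wrapped so a document is released even when the work in between raises. This is a minimal sketch, not part of ExPdfium: the `PdfHelpers` module and `with_document/3` are hypothetical names, and it assumes `ExPdfium.open/2` accepts an empty option list.

```elixir
defmodule PdfHelpers do
  # Hypothetical helper (not shipped with ExPdfium).
  # Opens a document, runs `fun` with it, and always closes it afterwards.
  # Open errors such as {:error, :enoent} are returned unchanged.
  def with_document(path, opts \\ [], fun) when is_function(fun, 1) do
    case ExPdfium.open(path, opts) do
      {:ok, doc} ->
        try do
          fun.(doc)
        after
          # Release pdfium memory now instead of waiting for garbage collection.
          ExPdfium.close(doc)
        end

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Usage:
#   PdfHelpers.with_document("file.pdf", fn doc -> ExPdfium.page_count(doc) end)
#   PdfHelpers.with_document("secret.pdf", [password: "hunter2"], &ExPdfium.page_count/1)
```

The caller gets back whatever `fun` returns, so the usual `{:ok, _}` / `{:error, _}` matching still applies.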

Rendering

{:ok, %ExPdfium.Bitmap{data: data, width: w, height: h}} =
  ExPdfium.render_page(doc, 0, dpi: 300) # or scale:, or width:/height:
# Hand the raw RGBA buffer straight to Vix/Image:
{:ok, image} = Vix.Vips.Image.new_from_binary(data, w, h, 4, :VIPS_FORMAT_UCHAR)
Image.write(image, "page.png")

render_page/3 takes :dpi / :scale / :width / :height for sizing, format: :rgba | :bgra, and background: :white | :transparent. The bitmap is an uncompressed 4-channel buffer (width * height * 4 bytes).
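
Because the buffer is uncompressed, its size grows quickly with DPI. The sketch below works through the arithmetic for a US Letter page. `RenderMath` is a hypothetical module, and the 72-points-per-inch conversion with rounding is an assumption about how `:dpi` maps to pixels; the authoritative dimensions are the `width` and `height` fields of the returned bitmap.

```elixir
defmodule RenderMath do
  # Hypothetical helper for estimating render sizes (not part of ExPdfium).
  @points_per_inch 72

  # Approximate pixel dimensions of a page given its size in PDF points.
  def pixel_size(width_pt, height_pt, dpi) do
    {round(width_pt * dpi / @points_per_inch), round(height_pt * dpi / @points_per_inch)}
  end

  # Size of an uncompressed 4-channel buffer: width * height * 4 bytes.
  def buffer_bytes(width_px, height_px), do: width_px * height_px * 4
end

{w, h} = RenderMath.pixel_size(612.0, 792.0, 300)
# => {2550, 3300}
RenderMath.buffer_bytes(w, h)
# => 33660000 (about 32 MiB for one US Letter page at 300 DPI)
```

For a real bitmap, `byte_size(data) == w * h * 4` is a cheap sanity check before handing the buffer to Vix.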

Text extraction & search

{:ok, text} = ExPdfium.extract_text(doc, 0) # one page
{:ok, text} = ExPdfium.extract_text(doc) # whole document
# Text runs with bounding boxes (PDF points, origin bottom-left):
{:ok, segments} = ExPdfium.text_segments(doc, 0)
# => [%{text: "Hello", bounds: %{left: 41.9, bottom: 115.2, right: 89.0, top: 137.5}}, ...]
# Search a page (case-insensitive by default):
{:ok, matches} = ExPdfium.search_text(doc, 0, "invoice", match_case: false)
# => [%{text: "Invoice", rects: [%{left: ..., bottom: ..., right: ..., top: ...}]}, ...]
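
Bounds come back in PDF points with a bottom-left origin, while a rendered bitmap is addressed in pixels from the top-left. The following sketch converts one to the other, for example to draw a highlight over a search match. `BoundsToPixels` is a hypothetical module, and it assumes an unrotated page rendered in full at the given DPI with the page origin at (0, 0).

```elixir
defmodule BoundsToPixels do
  # Hypothetical helper (not part of ExPdfium).
  # Maps a bounds map in PDF points (origin bottom-left) to a pixel
  # rectangle (origin top-left) on a bitmap rendered at `dpi`.
  def to_pixel_rect(%{left: l, bottom: b, right: r, top: t}, page_height_pt, dpi) do
    scale = dpi / 72

    %{
      x: round(l * scale),
      # Flip the vertical axis: the top edge in PDF space becomes y in image space.
      y: round((page_height_pt - t) * scale),
      width: round((r - l) * scale),
      height: round((t - b) * scale)
    }
  end
end

bounds = %{left: 72.0, bottom: 648.0, right: 144.0, top: 720.0}
BoundsToPixels.to_pixel_rect(bounds, 792.0, 144)
# => %{x: 144, y: 144, width: 144, height: 144}
```

The page height in points is available as `height` from `ExPdfium.page_info/2`.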

Metadata, geometry & permissions

{:ok, meta} = ExPdfium.metadata(doc)
# => %{title: "…", author: "…", creation_date: "D:…", producer: "…", ...}
{:ok, info} = ExPdfium.page_info(doc, 0)
# => %{width: 612.0, height: 792.0, rotation: 0, label: nil,
#      boxes: %{media: %{left: 0.0, bottom: 0.0, right: 612.0, top: 792.0},
#               crop: nil, bleed: nil, trim: nil, art: nil}} # non-media boxes often nil
{:ok, perms} = ExPdfium.permissions(doc)
# => %{print_high_quality: true, extract_text_and_graphics: true, modify_content: true, ...}
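
The `creation_date` value above is shown as a raw PDF date string (`D:YYYYMMDDHHmmSS` followed by an optional timezone offset). A minimal sketch of turning the full form into a `NaiveDateTime` follows. `PdfDate` is a hypothetical module, it ignores the timezone suffix, and it rejects the truncated forms the PDF format also allows.

```elixir
defmodule PdfDate do
  # Hypothetical helper (not part of ExPdfium).
  # Parses "D:YYYYMMDDHHmmSS..." and ignores any trailing timezone offset.
  def parse(<<"D:", y::binary-size(4), mo::binary-size(2), d::binary-size(2),
              h::binary-size(2), mi::binary-size(2), s::binary-size(2), _tz::binary>>) do
    with [y, mo, d, h, mi, s] <- to_ints([y, mo, d, h, mi, s]) do
      # Returns {:ok, naive_datetime} or {:error, reason} for out-of-range values.
      NaiveDateTime.new(y, mo, d, h, mi, s)
    end
  end

  def parse(_other), do: {:error, :unsupported_format}

  # Converts every part to an integer, or fails if any part is not purely numeric.
  defp to_ints(parts) do
    ints = for p <- parts, {n, ""} <- [Integer.parse(p)], do: n
    if length(ints) == length(parts), do: ints, else: {:error, :unsupported_format}
  end
end

PdfDate.parse("D:20240131120000+01'00'")
# => {:ok, ~N[2024-01-31 12:00:00]}
```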

Structure & navigation

{:ok, tree} = ExPdfium.outline(doc) # bookmark tree
# => [%{title: "Chapter 1", page: 0, children: [%{title: "1.1", page: 0, children: []}]}, ...]
{:ok, links} = ExPdfium.links(doc, 0) # links on a page
# => [%{bounds: %{...}, uri: "https://example.com", page: nil},
#     %{bounds: %{...}, uri: nil, page: 1}] # internal link to page 1
{:ok, files} = ExPdfium.attachments(doc) # => [%{index: 0, name: "note.txt", size: 25}]
{:ok, bytes} = ExPdfium.attachment_data(doc, 0)

Forms & annotations (read)

{:ok, :acrobat} = ExPdfium.form_type(doc) # :none | :acrobat | :xfa_full | :xfa_foreground
{:ok, fields} = ExPdfium.form_fields(doc) # AcroForm fields, one per widget
# => [%{name: "full_name", type: :text, value: "Ada Lovelace", checked: nil,
#       read_only: false, required: false, page: 0, bounds: %{...}},
#     %{name: "subscribe", type: :checkbox, value: "Yes", checked: true, ...}]
{:ok, anns} = ExPdfium.annotations(doc, 0) # annotations on a page (markup + widgets)
# => [%{type: :highlight, contents: "Important", bounds: %{...}, name: nil,
#       hidden: false, printed: false}, ...]

Reading is the whole scope — ExPdfium does not create, fill, or save PDFs. XFA form data needs a V8-enabled pdfium build, which is not shipped; :xfa_full documents may expose an empty or partial AcroForm view.
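
Since `form_fields/1` returns one entry per widget, a common follow-up is collapsing the list into a name-to-value map. The sketch below does that for the field shape shown above. `FormHelpers` is a hypothetical module, and when several widgets share a name (radio groups, for instance) the last one simply wins, which may not be what you want.

```elixir
defmodule FormHelpers do
  # Hypothetical helper (not part of ExPdfium).
  # Checkboxes map to their boolean `checked` state; other fields to `value`.
  def values_by_name(fields) do
    Map.new(fields, fn
      %{type: :checkbox, name: name, checked: checked} -> {name, checked}
      %{name: name, value: value} -> {name, value}
    end)
  end
end

fields = [
  %{name: "full_name", type: :text, value: "Ada Lovelace", checked: nil},
  %{name: "subscribe", type: :checkbox, value: "Yes", checked: true}
]

FormHelpers.values_by_name(fields)
# => %{"full_name" => "Ada Lovelace", "subscribe" => true}
```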

Development

The shipped NIF binds pdfium dynamically and loads a libpdfium bundled inside the precompiled tarball, right beside the NIF. Static linking is not an option because bblanchon/pdfium-binaries publishes no static libpdfium.a. For local work, download a libpdfium once and point the tests at it:

just fetch-pdfium # downloads libpdfium for this host into priv/pdfium
just test # EXPDFIUM_BUILD=1 mix test (forces a from-source build)
just fmt # mix format + cargo fmt

EXPDFIUM_BUILD=1 forces a from-source NIF build instead of downloading a precompiled one. CI runs the full gate: mix format --check-formatted, cargo fmt --check, cargo clippy -- -D warnings, mix compile --warnings-as-errors, and mix test.

Releasing

See UPDATE_PROCEDURE.md. In short, just release bumps the version, rolls the CHANGELOG, tags, and pushes. The tag triggers a build matrix that attaches one NIF per target to a GitHub release, checksums are regenerated from those artifacts, and the Hex publish is gated behind a manual approval.

License

MIT — see LICENSE. pdfium itself is BSD-3-Clause (Google/Chromium); the precompiled pdfium binaries come from bblanchon/pdfium-binaries.