
PDFRedlines

Fast PDF redline extraction via a Rust NIF (MuPDF).

Usage

{:ok, result} = PDFRedlines.extract_redlines("/path/to/document.pdf")
# %PDFRedlines.Result{redlines: [%PDFRedlines.Redline{...}, ...]}
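The call above can be wrapped so that failures are handled explicitly. The following is a minimal sketch, not part of the library: the `RedlineReport` module and its `summarize/2` function are hypothetical names, and the `{:error, reason}` return shape is an assumption, since only the `{:ok, result}` case is shown here. The extractor is passed in as a function so the sketch can be exercised without the NIF.

```elixir
defmodule RedlineReport do
  # Hypothetical helper: counts the redlines found in one PDF.
  # `extract` defaults to the library call from the usage example and can be
  # swapped for a stub in tests.
  def summarize(path, extract \\ &PDFRedlines.extract_redlines/1) do
    case extract.(path) do
      # The result exposes a `redlines` list, as in the usage example.
      {:ok, %{redlines: redlines}} ->
        {:ok, length(redlines)}

      # Assumption: failures are returned as an {:error, reason} tuple.
      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Calling `RedlineReport.summarize("/path/to/document.pdf")` would then return the redline count in an `{:ok, count}` tuple, or pass the error through unchanged.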

What Are Redlines?

Redlines are tracked changes embedded in PDFs, typically represented by visual markings on the page. This library detects those visual signals and converts them into structured entries (deletion, insertion, or paired change).

Notes

Configuration

You can pass a keyword list or map to tune detection thresholds:

Tuning Guide (Quick)

Parity Test (Optional)

There is an optional parity test that compares Rust/MuPDF results against the Python/PyMuPDF implementation. It is skipped by default.

Run it with:

TEST_PDF_REDLINES_PARITY=true mix test test/redlines_parity_test.exs

Inputs are read from PDF_REDLINES_TEST_DIR (defaults to test/fixtures/pdfs).

Performance

Extraction runs entirely in a Rust NIF on a dirty scheduler. Benchmarked on real-world documents (M1 Mac):

Document                      Size    Time
35 MB scanned PDF             35 MB   ~350 ms
33 MB scanned/OCR’d PDF       33 MB   ~640 ms
12 MB PDF with 362 redlines   12 MB   ~130 ms

Even the worst case in these runs (a large scanned, OCR’d document) finishes in under 700 ms.
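Because extraction runs on a dirty scheduler, it should be reasonable to process several documents concurrently from ordinary Elixir processes. The sketch below illustrates one way to do that with `Task.async_stream/3` from the standard library. The `RedlineBatch` module, its `extract_all/2` function, and the concurrency and timeout values are hypothetical choices for illustration, not recommendations from the library.

```elixir
defmodule RedlineBatch do
  # Hypothetical helper: runs the extractor over many paths concurrently and
  # returns a list of {path, result} pairs in input order.
  # `extract` defaults to the library call and can be replaced with a stub.
  def extract_all(paths, extract \\ &PDFRedlines.extract_redlines/1) do
    paths
    |> Task.async_stream(
      fn path -> {path, extract.(path)} end,
      # Assumption: one task per online scheduler is a sensible starting point.
      max_concurrency: System.schedulers_online(),
      # Assumption: 30 s is generous given the timings reported above.
      timeout: 30_000
    )
    # With the default options, every element arrives as {:ok, value};
    # a task that times out exits the caller instead.
    |> Enum.map(fn {:ok, pair} -> pair end)
  end
end
```

Each pair keeps the path next to its result, so a failed extraction can be traced back to its file.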

Benchmarks

Run a basic benchmark across a folder of PDFs:

PDF_REDLINES_BUILD=1 mix pdf_redlines.bench

You can customize:

Development

To compile the Rust sources along with the library, set the RUSTLER_PRECOMPILED_FORCE_BUILD_ALL environment variable to 1, e.g.:

RUSTLER_PRECOMPILED_FORCE_BUILD_ALL=1 mix compile

License

MIT