LiteParse

Elixir wrapper for LiteParse, a fast and lightweight PDF parser written in Rust. Parsing runs locally with no cloud dependencies.

Note: this Elixir binding exposes a subset of the upstream LiteParse features. Check the upstream project for the complete capability set.

Installation

Add liteparse to the dependencies in your mix.exs:

def deps do
  [
    {:liteparse, "~> 0.1.0"}
  ]
end

Usage

Parse a PDF from disk:

{:ok, %{text: text, page_count: n}} = LiteParse.parse("document.pdf")

Parse a PDF from binary data:

{:ok, %{text: text, page_count: n}} = LiteParse.parse_input(pdf_binary)
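Matching directly on {:ok, ...} raises a MatchError when parsing fails. The sketch below shows one way to read a file and handle both outcomes. It assumes failures come back as {:error, reason}, which is the common Elixir convention but is not confirmed by this README, and that text is a string. ParseExample and summarize/1 are hypothetical names used only for illustration.

```elixir
defmodule ParseExample do
  # Hypothetical helper, not part of LiteParse.
  # Assumption: LiteParse.parse_input/1 returns {:error, reason} on failure.
  def summarize(path) do
    # `with` stops at the first clause that does not match and returns
    # that value unchanged, so File.read/1 errors pass straight through.
    with {:ok, pdf_binary} <- File.read(path),
         {:ok, %{text: text, page_count: n}} <- LiteParse.parse_input(pdf_binary) do
      {:ok, "#{n} page(s), #{String.length(text)} characters"}
    end
  end
end
```

A caller can then branch on the result with case ParseExample.summarize("document.pdf") and log or return the error tuple instead of crashing.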

Options can be passed as a keyword list:

LiteParse.parse("doc.pdf", max_pages: 100, ocr_enabled: false)

Or as a reusable struct:

config = LiteParse.Config.new(ocr_language: "spa", max_pages: 50)
LiteParse.parse("doc.pdf", config)

See LiteParse.Config for the full list of available options.
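Building the config once is useful when the same options apply to several files. The following is a minimal sketch that uses only the calls shown above. BatchExample and parse_all/1 are hypothetical names, and the result for each path is kept as returned, without assuming its shape on failure.

```elixir
defmodule BatchExample do
  # Hypothetical helper, not part of LiteParse.
  def parse_all(paths) do
    # Build the options once and reuse them for every file.
    config = LiteParse.Config.new(ocr_language: "spa", max_pages: 50)

    # Map each path to whatever LiteParse.parse/2 returns for it.
    Map.new(paths, fn path -> {path, LiteParse.parse(path, config)} end)
  end
end
```

Calling BatchExample.parse_all(["a.pdf", "b.pdf"]) returns a map keyed by path, so each result can be inspected individually.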

Supported Formats

License

MIT. See LICENSE.