Popplex

CI

An Elixir NIF (Native Implemented Function) wrapper for the Poppler PDF library. It lets Elixir code count pages, extract text, and combine PDFs by calling Poppler's native code directly.

View Changelog | View Contributing Guidelines

Features

Prerequisites

Popplex compiles against Poppler's C++ library (poppler-cpp), so Poppler and pkg-config must be installed on your system before you build it:

macOS

brew install poppler pkg-config

Ubuntu/Debian

sudo apt-get install libpoppler-cpp-dev pkg-config

Fedora/RHEL

sudo dnf install poppler-cpp-devel pkgconfig

Arch Linux

sudo pacman -S poppler pkgconf

Installation

Add popplex to your list of dependencies in mix.exs:

def deps do
  [
    {:popplex, "~> 0.1.0"}
  ]
end

Then run:

mix deps.get
mix compile

The NIF is compiled automatically by mix compile, which is why Poppler must already be installed at that point.

Usage

Get Page Count

# Get the number of pages in a PDF
{:ok, count} = Popplex.get_page_count("document.pdf")
IO.puts("The PDF has #{count} pages")

Extract Text

# Extract text from all pages
{:ok, text} = Popplex.get_text("document.pdf")

# Extract text from a specific page (0-indexed)
{:ok, first_page} = Popplex.get_text("document.pdf", page: 0)
{:ok, second_page} = Popplex.get_text("document.pdf", page: 1)

# Explicitly extract all pages
{:ok, all_text} = Popplex.get_text("document.pdf", all: true)
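To get each page's text as a separate string, you can combine get_page_count/1 with the page: option. The sketch below assumes pages are 0-indexed as described above and that get_text/2 returns {:ok, binary}. The PdfPages module and its api argument (which lets you pass a stub module in tests instead of Popplex) are hypothetical and not part of Popplex.

```elixir
defmodule PdfPages do
  @moduledoc """
  Hypothetical helper (not part of Popplex): per-page text extraction.
  `api` defaults to Popplex but can be any module exposing
  get_page_count/1 and get_text/2, e.g. a stub in tests.
  """

  # Returns {:ok, [page0_text, page1_text, ...]} or the first {:error, reason}.
  def text_by_page(path, api \\ Popplex) do
    with {:ok, count} <- api.get_page_count(path) do
      # Range.new/3 with step 1 is empty when count is 0 (Elixir 1.12+).
      Range.new(0, count - 1, 1)
      |> Enum.reduce_while({:ok, []}, fn page, {:ok, acc} ->
        case api.get_text(path, page: page) do
          {:ok, text} -> {:cont, {:ok, [text | acc]}}
          {:error, _reason} = error -> {:halt, error}
        end
      end)
      |> case do
        # Pages were prepended, so restore document order.
        {:ok, pages} -> {:ok, Enum.reverse(pages)}
        error -> error
      end
    end
  end
end
```

In application code you would call PdfPages.text_by_page("document.pdf") and let api default to Popplex. Stopping at the first failing page keeps the same {:ok, _} / {:error, _} contract as the rest of the library.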

Combine PDFs

# Merge multiple PDFs into one
{:ok, output} = Popplex.combine_pdfs(
  ["file1.pdf", "file2.pdf", "file3.pdf"],
  "combined.pdf"
)

# Verify the combined PDF
{:ok, count} = Popplex.get_page_count("combined.pdf")
IO.puts("Combined PDF has #{count} pages")
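The page-count check above can be made stricter by comparing the combined count with the sum of the inputs' counts. This is a minimal sketch that assumes combine_pdfs/2 keeps every page of every input. PdfMerge, its api argument, and the {:page_mismatch, expected, actual} error shape are hypothetical.

```elixir
defmodule PdfMerge do
  @moduledoc """
  Hypothetical helper (not part of Popplex): combine PDFs and check
  that no pages were lost. `api` defaults to Popplex.
  """

  def combine_and_verify(inputs, output, api \\ Popplex) do
    with {:ok, expected} <- total_pages(inputs, api),
         {:ok, _} <- api.combine_pdfs(inputs, output),
         {:ok, actual} <- api.get_page_count(output) do
      if actual == expected do
        {:ok, output}
      else
        {:error, {:page_mismatch, expected, actual}}
      end
    end
  end

  # Sums the page counts of all inputs, stopping at the first error.
  defp total_pages(paths, api) do
    Enum.reduce_while(paths, {:ok, 0}, fn path, {:ok, sum} ->
      case api.get_page_count(path) do
        {:ok, n} -> {:cont, {:ok, sum + n}}
        {:error, _reason} = error -> {:halt, error}
      end
    end)
  end
end
```

Counting the inputs first means an unreadable input is reported before any output file is written.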

Error Handling

All functions return {:ok, result} on success or {:error, reason} on failure:

case Popplex.get_page_count("document.pdf") do
  {:ok, count} ->
    IO.puts("Success! Page count: #{count}")

  {:error, reason} ->
    # inspect/1 formats any term, so this works even if reason is not a string
    IO.puts("Error: #{inspect(reason)}")
end
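When one call depends on another, Elixir's with/1 keeps this {:ok, _} / {:error, _} convention without nesting case expressions. A sketch: PdfSummary, its api argument, and the message format are hypothetical, and reason is passed through inspect/1 because its type is not specified here.

```elixir
defmodule PdfSummary do
  @moduledoc "Hypothetical helper (not part of Popplex). `api` defaults to Popplex."

  # Chains two calls; the first {:error, reason} short-circuits into `else`.
  def summarize(path, api \\ Popplex) do
    with {:ok, count} <- api.get_page_count(path),
         {:ok, text} <- api.get_text(path, page: 0) do
      {:ok, "#{path}: #{count} pages, #{String.length(text)} characters on page 1"}
    else
      {:error, reason} -> {:error, "could not read #{path}: #{inspect(reason)}"}
    end
  end
end
```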

Common error scenarios:

Development

Building from Source

# Clone the repository
git clone https://github.com/yourusername/popplex.git
cd popplex

# Get dependencies
mix deps.get

# Compile (including the NIF)
mix compile

# Run tests
mix test

# Run integration tests (requires sample PDF files)
mix test --include integration

Testing

Unit tests can be run without any PDF files:

mix test --exclude integration

For integration tests, place sample PDF files in test/fixtures/ and run:

mix test --include integration
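The --include and --exclude flags filter on ExUnit tags. One common way to get this behaviour, assuming the integration tests are tagged :integration (this repository's actual setup may differ), is to exclude the tag by default in test/test_helper.exs and tag each fixture-dependent module. The module name Popplex.IntegrationTest below is hypothetical.

```elixir
# test/test_helper.exs
# Skip anything tagged :integration unless `mix test --include integration` is used.
ExUnit.start(exclude: [:integration])

# In a fixture-dependent test module, tag every test in the module:
#
#   defmodule Popplex.IntegrationTest do
#     use ExUnit.Case
#     @moduletag :integration
#     ...
#   end
```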

Continuous Integration

The project uses GitHub Actions for CI, which:

The CI workflow runs on:

You can view the CI status in the badge at the top of this README.

How It Works

Popplex uses Erlang's NIF (Native Implemented Function) interface to call C++ code that wraps the Poppler library. This provides:

The architecture consists of:

  1. C++ NIF layer (c_src/popplex_nif.cpp) - Interfaces with Poppler
  2. NIF loader (lib/popplex/nif.ex) - Loads the compiled NIF
  3. Public API (lib/popplex.ex) - User-friendly Elixir interface
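The NIF loader layer usually follows the standard Erlang pattern: Elixir stub functions that raise until :erlang.load_nif/2 replaces them with the native implementations. The sketch below illustrates that pattern only. The module name, the :popplex priv directory layout, the popplex_nif library name, and the stub are assumptions, not the contents of lib/popplex/nif.ex. Real loaders normally run the load function through @on_load; here it is called explicitly so the sketch still loads when the native library is absent.

```elixir
defmodule PopplexNifSketch do
  # Hypothetical loader: module, app, and library names are assumptions.

  # Loads priv/popplex_nif into this module and replaces the stubs below.
  def load do
    case :code.priv_dir(:popplex) do
      {:error, :bad_name} ->
        {:error, :app_not_found}

      priv_dir ->
        # load_nif/2 expects the path without the platform-specific extension.
        :erlang.load_nif(:filename.join(priv_dir, ~c"popplex_nif"), 0)
    end
  end

  # Stub: raises until the native get_page_count/1 has been loaded.
  def get_page_count(_path), do: :erlang.nif_error(:nif_not_loaded)
end
```

Keeping a stub with the same name and arity as each C++ function is what allows load_nif/2 to swap the native code in, and it gives a clear error if the library failed to load.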

Limitations

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

License

This project is available under the MIT License.

Acknowledgments