PDFInfo

Extracts all /Info and /Metadata objects from a PDF binary using regular expressions, with zero dependencies.
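To illustrate what regex-based extraction can look like, here is a minimal sketch that scans a PDF binary for `/Info` indirect references such as `/Info 6 0 R`. The module `InfoRefSketch`, the function `info_refs/1`, and the pattern itself are assumptions made for this example; they are not taken from PDFInfo's source and may differ from what the library actually does.

```elixir
defmodule InfoRefSketch do
  # Hypothetical helper for illustration only, not part of PDFInfo.
  # Returns the unique {object_number, generation} pairs of every
  # "/Info <obj> <gen> R" reference found in the raw bytes.
  def info_refs(binary) when is_binary(binary) do
    # No `u` modifier: the pattern is matched byte-wise, so binaries that
    # are not valid UTF-8 (typical for PDFs) can still be scanned.
    ~r{/Info\s+(\d+)\s+(\d+)\s+R}
    |> Regex.scan(binary)
    |> Enum.map(fn [_full, obj, gen] ->
      {String.to_integer(obj), String.to_integer(gen)}
    end)
    |> Enum.uniq()
  end
end
```

Calling `InfoRefSketch.info_refs(pdf)` on a binary read with `File.read!/1` gives a list of tuples that could then be used to locate the referenced objects.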

Limitations: if the PDF is encrypted or its metadata is compressed, you must first decrypt and uncompress it, for example with qpdf:

qpdf --stream-data=uncompress --compress-streams=n --decrypt --password='' myfile.pdf myfile_out.pdf
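The same preparation step can be driven from Elixir. The sketch below shells out to qpdf with the flags shown above. `QpdfSketch`, `args/3`, and `prepare/3` are hypothetical names introduced here, not part of PDFInfo, and the code assumes the `qpdf` executable is installed and on the `PATH`.

```elixir
defmodule QpdfSketch do
  # Hypothetical wrapper for illustration only, not part of PDFInfo.

  # Builds the argument list for the qpdf invocation shown above.
  def args(input, output, password \\ "") do
    [
      "--stream-data=uncompress",
      "--compress-streams=n",
      "--decrypt",
      "--password=" <> password,
      input,
      output
    ]
  end

  # Runs qpdf and returns {:ok, output_path} only when it exits with status 0.
  # Any other exit status is handed back to the caller together with the output.
  def prepare(input, output, password \\ "") do
    case System.find_executable("qpdf") do
      nil ->
        {:error, :qpdf_not_found}

      qpdf ->
        case System.cmd(qpdf, args(input, output, password), stderr_to_stdout: true) do
          {_out, 0} -> {:ok, output}
          {out, status} -> {:error, {status, out}}
        end
    end
  end
end
```

After `QpdfSketch.prepare("myfile.pdf", "myfile_out.pdf")` succeeds, the output file can be read and passed to PDFInfo as usual.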

Installation

Add pdf_info to your list of dependencies in mix.exs:

def deps do
  [
    {:pdf_info, "~> 0.1.0"}
  ]
end

Usage

iex(1)> pdf = File.read!("/Downloads/sample.pdf")
<<37, 80, 68, 70, 45, ...>>
iex(2)> PDFInfo.is_pdf?(pdf)
true # looks like it's a PDF!
iex(3)> PDFInfo.is_encrypted?(pdf)
false # it's not encrypted (this lib can't decrypt; if the PDF is encrypted, decrypt it first)
iex(4)> PDFInfo.info_objects(pdf)
# a map with info objects
%{"/Info 6 0 R" => [
  %{
  "Author" => "Barna Kovacs",
  "CreationDate" => "D:20200212212756Z",
  "Title" => "Can't come up with a title"
  }
]}
iex(5)> PDFInfo.metadata_objects(pdf)
# list of maps with metadata
[
  %{
    {"dc", "creator"} => "Barna Kovacs",
    {"dc", "format"} => "application/pdf",
    {"dc", "title"} => "Can't come up with a title",
    ...
  }
]
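The calls in the session above can be combined into a single helper that checks the binary before extracting anything. This is only a sketch: `PDFInfoSketch` and `extract/2` are hypothetical names, and the `mod` parameter (defaulting to `PDFInfo`) exists purely so the helper can be exercised with a stand-in module. It relies on the four functions used in the session above and on the return shapes shown there.

```elixir
defmodule PDFInfoSketch do
  # Hypothetical convenience wrapper for illustration only, not part of PDFInfo.
  # `mod` must export is_pdf?/1, is_encrypted?/1, info_objects/1 and
  # metadata_objects/1, as used in the usage session above.
  def extract(pdf, mod \\ PDFInfo) when is_binary(pdf) do
    cond do
      # Reject input that does not look like a PDF at all.
      not mod.is_pdf?(pdf) ->
        {:error, :not_a_pdf}

      # Encrypted files have to be decrypted first (see Limitations).
      mod.is_encrypted?(pdf) ->
        {:error, :encrypted}

      true ->
        {:ok, %{info: mod.info_objects(pdf), metadata: mod.metadata_objects(pdf)}}
    end
  end
end
```

With the dependency installed, `PDFInfoSketch.extract(File.read!(path))` returns either an `{:ok, map}` tuple or a tagged error the caller can match on.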

Documentation

Documentation can be found at https://hexdocs.pm/pdf_info.

License

PDFInfo is MIT licensed.

Credit

Inspired by https://gitlab.com/nxl4/pdf-metadata