ExtrText

ExtrText is an Elixir library for extracting text and metadata from .docx, .xlsx, and .pptx files.

Usage

iex> docx = File.read!("example.docx")
iex> {:ok, texts} = ExtrText.get_texts(docx)
iex> texts
[
  ["Paragraph 1", "Paragraph 2", "Paragraph 3"]
]
iex> {:ok, metadata} = ExtrText.get_metadata(docx)
iex> metadata
%ExtrText.Metadata{
  created: ~U[2021-11-19 22:25:20Z],
  creator: "John Doe",
  description: "",
  keywords: "",
  language: "ja-JP",
  last_modified_by: "John Doe",
  modified: ~U[2021-11-22 21:24:43Z],
  revision: 2,
  subject: "",
  title: "Example"
}
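
The texts value shown above is a list of lists of strings. When a single string is more convenient (for example, to feed a search index), the nested list can be flattened with the standard library alone. The following is a minimal sketch; TextJoiner and its join/1 function are hypothetical names used for illustration and are not part of ExtrText.

```elixir
defmodule TextJoiner do
  # Hypothetical helper, not part of ExtrText.
  # Expects a list of lists of strings (the shape of `texts` in the usage example).
  # Strings within one inner list are joined with a newline;
  # inner lists are separated from each other by a blank line.
  def join(texts) when is_list(texts) do
    Enum.map_join(texts, "\n\n", fn group -> Enum.join(group, "\n") end)
  end
end

# Sample data copied from the usage example above.
texts = [["Paragraph 1", "Paragraph 2", "Paragraph 3"]]
IO.puts(TextJoiner.join(texts))
```

In a real application, texts would come from the {:ok, texts} tuple returned by ExtrText.get_texts/1 rather than from a literal.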

Installation

Add :extr_text to the list of dependencies in your mix.exs:

  defp deps do
    [
      {:extr_text, "~> 0.2.0"}
    ]
  end

Then, run mix deps.get.

Limitations

Acknowledgments

This project is inspired by ranguba/chupa-text, a Ruby gem package.

Author

Tsutomu Kuroda

License

MIT license