DocxParse

Judith is a Word docx transpiler: it converts the XML inside a .docx file to HTML.
The docx format is just a zip archive of XML files. Judith takes those XML files and outputs HTML with inline CSS that comes close to the look of the original Word document. Judith is not feature complete, but the output should be close enough for most documents.
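
To see the "zip archive of XML files" structure for yourself, the following is a minimal sketch that uses Erlang's built-in :zip module. The DocxPeek module and its function names are hypothetical and are not part of this package's API; they only illustrate what is inside a .docx file.

```elixir
defmodule DocxPeek do
  # Hypothetical helper for illustration only; not part of DocxParse.

  # Lists the names of all parts (files) stored inside a .docx archive.
  def list_parts(path) do
    # :zip treats an Elixir binary as archive *contents*, so the path
    # has to be passed as a charlist to be read as a file name.
    case :zip.unzip(String.to_charlist(path), [:memory]) do
      {:ok, entries} ->
        names = for {name, _content} <- entries, do: List.to_string(name)
        {:ok, Enum.sort(names)}

      {:error, reason} ->
        {:error, reason}
    end
  end

  # Reads a single part, e.g. "word/document.xml", as a binary.
  def read_part(path, part) do
    opts = [:memory, {:file_list, [String.to_charlist(part)]}]

    case :zip.unzip(String.to_charlist(path), opts) do
      {:ok, [{_name, xml}]} -> {:ok, xml}
      {:ok, []} -> {:error, :not_found}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

In a typical Word file the main body lives in word/document.xml, so DocxPeek.read_part("./test/sample_doc.docx", "word/document.xml") would return the raw XML that a transpiler like this one has to walk.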

Installation

If available in Hex, the package can be installed by adding docx_parse to your list of dependencies in mix.exs:

def deps do
  [
    {:docx_parse, "~> 0.1.0"}
  ]
end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/docx_parse.

Usage

alias Remote.DocxParse
alias Remote.Docx.Document

# Returns {:ok, html}, where html is the generated HTML as a string.
{:ok, html} = DocxParse.document_to_html("./test/sample_doc.docx")

Reasoning

The purpose of this package is to support the basic features of the Word docx format and convert them to HTML. It is important to remember that Word and HTML do not necessarily have equivalent features. For example, HTML supports bold and italics directly, but it does not easily support Word's tiered lists out of the box.

The resulting HTML will not be perfectly semantic. When combined with inline styles and a stylesheet, the result will look very close to the original Word document.
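
As an illustration of the inline-style approach, here is a minimal sketch of how a run of text with bold or italic formatting could be turned into a styled span. The InlineStyleSketch module, its run_to_html function, and the props map with :bold and :italic keys are all assumptions made for this example; the package's actual internals may look quite different.

```elixir
defmodule InlineStyleSketch do
  # Hypothetical sketch; not the package's actual implementation.

  # Wraps text in a <span> carrying inline CSS for the given run properties.
  # `props` is an assumed shape: a map with optional :bold and :italic flags.
  def run_to_html(text, props \\ %{}) do
    css =
      [
        props[:bold] && "font-weight: bold",
        props[:italic] && "font-style: italic"
      ]
      # Drop the nil/false entries left by unset properties.
      |> Enum.filter(& &1)
      |> Enum.join("; ")

    escaped = escape(text)

    if css == "" do
      escaped
    else
      ~s(<span style="#{css}">#{escaped}</span>)
    end
  end

  # Escapes the characters that would otherwise be parsed as markup.
  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
  end
end

InlineStyleSketch.run_to_html("Hi", %{bold: true, italic: true})
# => "<span style=\"font-weight: bold; font-style: italic\">Hi</span>"
```

A span with inline CSS is not semantic markup (a strong or em element would be), which is the trade-off described above: the output favors visual fidelity over meaning.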