DocxParse
Judith is a Word docx transpiler: it converts the XML inside a .docx file to HTML.
The docx format is a zip archive of XML files. Judith reads those XML files and outputs HTML with inline CSS that approximates the look of the original Word document. Judith is not feature complete, but the output should get close enough.
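To make the "zip archive of XML files" point concrete, here is a minimal sketch that pulls one XML entry out of a docx archive using Erlang's built-in :zip module. The module name DocxZipSketch and the function read_entry/2 are hypothetical and not part of this package; word/document.xml is the standard location of the main document body in a docx file.

```elixir
defmodule DocxZipSketch do
  # Hypothetical helper, not part of DocxParse.
  # Takes the raw bytes of a .docx file and the name of one archive entry,
  # and returns that entry's contents (XML) as a binary.
  def read_entry(docx_binary, entry_name) when is_binary(docx_binary) do
    # :zip.unzip/2 with [:memory] returns {:ok, [{charlist_name, binary}]}
    # instead of writing the extracted files to disk.
    with {:ok, files} <- :zip.unzip(docx_binary, [:memory]),
         {_name, xml} <- List.keyfind(files, String.to_charlist(entry_name), 0) do
      {:ok, xml}
    else
      nil -> {:error, :entry_not_found}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Assuming the sample file from the Usage section exists, it could be called as DocxZipSketch.read_entry(File.read!("./test/sample_doc.docx"), "word/document.xml").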
Installation
If available in Hex, the package can be installed
by adding docx_parse to your list of dependencies in mix.exs:
def deps do
  [
    {:docx_parse, "~> 0.1.0"}
  ]
end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/docx_parse.
Usage
alias Remote.DocxParse
alias Remote.Docx.Document

# Returns {:ok, html}, where html is the generated HTML string.
{:ok, html} = DocxParse.document_to_html("./test/sample_doc.docx")
Reasoning
The purpose of this package is to support the basic features of the Word docx format and convert them to HTML. It is important to remember that Word and HTML do not necessarily have equivalent features. For example, HTML doesn't easily support tiered lists out of the box, but it does support bold and italics.
The resulting HTML will not be perfectly semantic. When combined with inline styles and a stylesheet, the result should look close to the original Word document.
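As an illustration of the non-semantic, inline-styled output described above, the sketch below maps a run of text with bold and italic flags to a span with inline CSS. It is an assumption about the general approach, not this package's actual implementation; the module RunStyleSketch, the function run_to_html/2, and the :bold and :italic keys are all hypothetical.

```elixir
defmodule RunStyleSketch do
  # Hypothetical illustration only; not DocxParse's actual implementation.
  # `props` stands in for a few Word run properties (bold, italic) as booleans.
  def run_to_html(text, props \\ %{}) do
    style =
      [
        props[:bold] && "font-weight: bold",
        props[:italic] && "font-style: italic"
      ]
      # Drop the nil/false entries left by properties that are not set.
      |> Enum.filter(& &1)
      |> Enum.join("; ")

    escaped = escape(text)

    if style == "" do
      "<span>#{escaped}</span>"
    else
      ~s(<span style="#{style}">#{escaped}</span>)
    end
  end

  # Minimal HTML escaping so the text cannot break the markup.
  defp escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
  end
end
```

A span with a style attribute carries the look of the text but says nothing about its meaning, which is why output of this kind is not perfectly semantic.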