Htmd


A fast HTML to Markdown converter for Elixir, powered by Rust.

Htmd provides high-performance HTML to Markdown conversion using the Rust htmd crate as a Native Implemented Function (NIF). It offers extensive customization options for controlling the output format and is designed for applications that need to process large amounts of HTML content efficiently.

Features

Installation

Add htmd to your list of dependencies in mix.exs:

def deps do
  [
    {:htmd, "~> 0.1.1"}
  ]
end

Basic Usage

# Simple conversion
{:ok, markdown} = Htmd.convert("<h1>Hello World</h1>")
# => {:ok, "# Hello World"}

# Convert a paragraph
{:ok, markdown} = Htmd.convert("<p>This is a paragraph with <strong>bold</strong> text.</p>")
# => {:ok, "This is a paragraph with **bold** text."}

# Convert links
{:ok, markdown} = Htmd.convert("<a href='https://example.com'>Example</a>")
# => {:ok, "[Example](https://example.com)"}

Advanced Usage with Options

html = """
<h1>My Document</h1>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<img src="image.jpg" alt="Skip this">
<p>Final paragraph</p>
"""

{:ok, markdown} = Htmd.convert(html, [
  heading_style: :setex,       # Use underline-style headers
  bullet_list_marker: :dash,   # Use dashes for bullet points
  skip_tags: ["img"],          # Skip image tags
  link_style: :referenced      # Use reference-style links
])
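
When the same options recur across an application, one way to avoid repeating them is a thin wrapper that merges per-call overrides into a shared default list. The following is a minimal sketch, not part of Htmd: the module name `MyApp.Markdown`, its `@defaults`, and the injectable `converter` argument are hypothetical. Only the call shape `Htmd.convert(html, options)` with a keyword list is taken from the example above.

```elixir
defmodule MyApp.Markdown do
  @moduledoc """
  Hypothetical wrapper (not part of Htmd) that applies project-wide
  defaults before delegating to a converter function.
  """

  # Option names and values come from the Configuration Options table;
  # which ones to treat as defaults is an arbitrary choice for this sketch.
  @defaults [heading_style: :atx, bullet_list_marker: :dash, skip_tags: ["img"]]

  @doc """
  Merges `overrides` into the defaults and calls `converter` with the result.

  `converter` defaults to `Htmd.convert/2`; passing another 2-arity function
  makes the wrapper testable without loading the NIF.
  """
  def to_markdown(html, overrides \\ [], converter \\ &Htmd.convert/2) do
    # Keyword.merge/2 lets keys in `overrides` win over the defaults.
    converter.(html, Keyword.merge(@defaults, overrides))
  end
end
```

With this in place, `MyApp.Markdown.to_markdown(html, heading_style: :setex)` would keep the dash bullets and skipped `img` tags while switching only the heading style.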

Configuration Options

Option                  Type                                                Default      Description
:heading_style          :atx | :setex                                       :atx         Header format (# vs underline)
:hr_style               :dashes | :underscores | :stars                     :dashes      Horizontal rule style
:br_style               :two_spaces | :backslash                            :two_spaces  Line break format
:link_style             :inlined | :inlined_prefer_autolinks | :referenced  :inlined     Link format style
:link_reference_style   :full | :collapsed | :shortcut                      :full        Reference link format
:code_block_style       :indented | :fenced                                 :indented    Code block format
:code_block_fence       :backticks | :tildes                                :backticks   Fence character for code blocks
:bullet_list_marker     :asterisk | :dash                                   :asterisk    Bullet point character
:ul_bullet_spacing      non_neg_integer()                                   3            Spaces between bullet and content
:ol_number_spacing      non_neg_integer()                                   3            Spaces between number and content
:preformatted_code      boolean()                                           false        Preserve whitespace in inline code
:skip_tags              [String.t()]                                        []           HTML tags to skip during conversion
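
If options arrive from user input or configuration files, it can help to check them against the table above before calling the converter. The sketch below encodes the table as data. `OptionCheck` and `validate/1` are hypothetical names, not part of Htmd, and the library may well perform its own validation; treating unknown keys as invalid is likewise an assumption of this sketch.

```elixir
defmodule OptionCheck do
  @moduledoc "Hypothetical helper, not part of Htmd: checks options against the table above."

  # Allowed atoms for each enumerated option, copied from the table.
  @enums %{
    heading_style: [:atx, :setex],
    hr_style: [:dashes, :underscores, :stars],
    br_style: [:two_spaces, :backslash],
    link_style: [:inlined, :inlined_prefer_autolinks, :referenced],
    link_reference_style: [:full, :collapsed, :shortcut],
    code_block_style: [:indented, :fenced],
    code_block_fence: [:backticks, :tildes],
    bullet_list_marker: [:asterisk, :dash]
  }

  @doc "Returns :ok, or {:error, {key, value}} for the first invalid option."
  def validate(opts) when is_list(opts) do
    Enum.find_value(opts, :ok, fn {key, value} ->
      if valid?(key, value), do: nil, else: {:error, {key, value}}
    end)
  end

  # Spacing options must be non-negative integers.
  defp valid?(key, value) when key in [:ul_bullet_spacing, :ol_number_spacing],
    do: is_integer(value) and value >= 0

  defp valid?(:preformatted_code, value), do: is_boolean(value)
  defp valid?(:skip_tags, value), do: is_list(value) and Enum.all?(value, &is_binary/1)

  # Enumerated options must use one of the listed atoms; unknown keys are rejected.
  defp valid?(key, value) do
    case Map.fetch(@enums, key) do
      {:ok, allowed} -> value in allowed
      :error -> false
    end
  end
end
```

A caller could run `OptionCheck.validate(opts)` first and only pass `opts` on to `Htmd.convert/2` when it returns `:ok`.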

Examples with Different Styles

Heading Styles

# ATX style (default)
Htmd.convert("<h1>Title</h1>", heading_style: :atx)
# => {:ok, "# Title"}

# Setext style (the option atom is spelled :setex)
Htmd.convert("<h1>Title</h1>", heading_style: :setex)
# => {:ok, "Title\n====="}

List Styles

# Asterisk bullets (default)
Htmd.convert("<ul><li>Item</li></ul>", bullet_list_marker: :asterisk)
# => {:ok, "*   Item"}

# Dash bullets
Htmd.convert("<ul><li>Item</li></ul>", bullet_list_marker: :dash)  
# => {:ok, "-   Item"}

Link Styles

# Inline links (default)
Htmd.convert("<a href='https://example.com'>Link</a>", link_style: :inlined)
# => {:ok, "[Link](https://example.com)"}

# Reference links
Htmd.convert("<a href='https://example.com'>Link</a>", link_style: :referenced)
# => {:ok, "[Link][1]\n\n[1]: https://example.com"}

Performance

Htmd is designed for high-throughput applications: conversion runs in native Rust code (the htmd crate) called through a NIF.
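
For batch workloads, a common Elixir pattern is to spread calls over a bounded pool of tasks with `Task.async_stream/3`. The sketch below is an illustration only: `BatchConvert` and `convert_all/3` are hypothetical names, not part of Htmd, and whether concurrency helps in practice depends on how the NIF is scheduled, which this README does not state.

```elixir
defmodule BatchConvert do
  @moduledoc "Hypothetical helper, not part of Htmd: converts many documents concurrently."

  @doc """
  Converts each HTML string in `htmls`, returning results in input order.

  `converter` defaults to `Htmd.convert/2`; it is injectable so the
  helper can be exercised without the NIF.
  """
  def convert_all(htmls, opts \\ [], converter \\ &Htmd.convert/2) do
    htmls
    |> Task.async_stream(fn html -> converter.(html, opts) end,
      # Cap concurrency at the number of online schedulers.
      max_concurrency: System.schedulers_online(),
      ordered: true
    )
    # Each stream element is {:ok, task_result}; unwrap it to the
    # converter's own return value.
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Calling `BatchConvert.convert_all(list_of_html, heading_style: :atx)` would return one converter result per input, in the same order as the inputs.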

Requirements

Documentation

Full documentation is available on HexDocs.

License

This project is licensed under the MIT License.