MDEx


An extensible Markdown parser and formatter for Elixir, compliant with CommonMark and supporting GitHub, GitLab, and Discord features.

Hex Version | Hex Docs | MIT License

Features

Compliant with the CommonMark and GitHub Flavored Markdown specifications, with extra extensions such as Wiki Links, Discord Markdown tags, and emoji. Syntax highlighting is also supported out of the box through the Autumn library.

Under the hood, MDEx calls the comrak APIs to process Markdown. comrak is a fast Rust crate that ports the cmark fork maintained by GitHub, a widely adopted Markdown implementation.

The AST structure is based on Floki, so an API similar to the one used to manipulate HTML can be used to manipulate Markdown documents. Check out some examples at mdex/examples/

Installation

Add the :mdex dependency:

def deps do
  [
    {:mdex, "~> 0.2"}
  ]
end

Usage

Mix.install([{:mdex, "~> 0.2"}])
iex> MDEx.to_html!("# Hello")
"<h1>Hello</h1>"
iex> MDEx.to_html!("# Hello :smile:", extension: [shortcodes: true])
"<h1>Hello 😄</h1>"

Sigils

Convert and generate AST, Markdown (CommonMark), HTML, and XML formats.

First, import the sigils:

iex> import MDEx.Sigil
iex> ~M|# Hello from `~M` sigil|
%MDEx.Document{
  nodes: [
    %MDEx.Heading{
      nodes: [
        %MDEx.Text{literal: "Hello from "},
        %MDEx.Code{num_backticks: 1, literal: "~M"},
        %MDEx.Text{literal: " sigil"}
      ],
      level: 1,
      setext: false
    }
  ]
}
iex> import MDEx.Sigil
iex> ~M|`~M` also converts to HTML format|HTML
"<p><code>~M</code> also converts to HTML format</p>"
iex> import MDEx.Sigil
iex> ~M|and to XML as well|XML
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!DOCTYPE document SYSTEM \"CommonMark.dtd\">\n<document xmlns=\"http://commonmark.org/xml/1.0\">\n  <paragraph>\n    <text xml:space=\"preserve\">and to XML as well</text>\n  </paragraph>\n</document>\n"

Use ~m to interpolate variables:

iex> import MDEx.Sigil
iex> lang = :elixir
iex> ~m|`lang = #{inspect(lang)}`|
%MDEx.Document{nodes: [%MDEx.Paragraph{nodes: [%MDEx.Code{num_backticks: 1, literal: "lang = :elixir"}]}]}

See more info at https://hexdocs.pm/mdex/MDEx.Sigil.html

Safety

For security reasons, every piece of raw HTML is omitted from the output by default:

iex> MDEx.to_html!("<h1>Hello</h1>")
"<!-- raw HTML omitted -->"

That's not very useful for most cases, but you can render raw HTML and escape it instead:

iex> MDEx.to_html!("<h1>Hello</h1>", render: [escape: true])
"&lt;h1&gt;Hello&lt;/h1&gt;"

If the input is provided by external sources, it might be a good idea to sanitize it instead for extra security:

iex> MDEx.to_html!("<a href=https://elixir-lang.org/>Elixir</a>", render: [unsafe_: true], features: [sanitize: true])
"<p><a href=\"https://elixir-lang.org/\" rel=\"noopener noreferrer\">Elixir</a></p>"

Note that you must pass the unsafe_: true option to first generate the raw HTML in order to sanitize it.

All sanitization rules are defined in the ammonia docs. For example, the link in the example above was marked as noopener noreferrer to prevent attacks.

If those rules are too strict and you really trust the input, or you really need to render raw HTML, then you can render it directly without escaping or sanitizing:

iex> MDEx.to_html!("<script>alert('hello')</script>", render: [unsafe_: true])
"<script>alert('hello')</script>"
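The four behaviors described in this section can be collected in one place. The following is a minimal sketch, assuming a hypothetical `RenderPolicy` module that is not part of MDEx; it only returns the option combinations already shown in the examples above.

```elixir
defmodule RenderPolicy do
  # Hypothetical helper (not part of MDEx): maps a policy name to the
  # option combinations used in the Safety examples.

  # Raw HTML is replaced by a placeholder comment (the default behavior).
  def options(:omit), do: []

  # Raw HTML is kept in the output, but escaped.
  def options(:escape), do: [render: [escape: true]]

  # Raw HTML is rendered first, then cleaned by the sanitizer.
  def options(:sanitize), do: [render: [unsafe_: true], features: [sanitize: true]]

  # Raw HTML is rendered untouched; only for fully trusted input.
  def options(:trusted), do: [render: [unsafe_: true]]
end

# Usage, assuming MDEx is installed and `user_input` holds Markdown:
#   MDEx.to_html!(user_input, RenderPolicy.options(:sanitize))
```

Naming the policies keeps the `unsafe_: true` and `sanitize: true` pairing together, so it is harder to enable raw HTML for external input without also sanitizing it.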

Parsing

Converts Markdown to an AST data structure that can be inspected and manipulated to change the content of the document programmatically.

The data structure format is inspired by Floki (with :attributes_as_maps = true), so the APIs and the mental model stay similar whether you work with Markdown or HTML documents. Each node is represented as a struct: the struct name is the node name, and its fields hold the node's attributes and children. For example:

%MDEx.Heading{
  level: 1,
  nodes: [...]
}

The parent node that represents the root of the document is the MDEx.Document struct, where you can find more information about the AST and the available operations.

The complete list of nodes is available in the documentation, in the Document Nodes section.
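Because nodes are plain structs, they can be changed with ordinary pattern matching. The sketch below assumes that `MDEx.parse_document/2` returns `{:ok, document}` and that `MDEx.to_html!/2` accepts a document as well as a Markdown string; it demotes every top-level heading by one level without using any MDEx-specific traversal helper.

```elixir
Mix.install([{:mdex, "~> 0.2"}])

# Assumption: parse_document/2 returns {:ok, %MDEx.Document{}} on success.
{:ok, doc} = MDEx.parse_document("# Hello")

# Demote each top-level heading by one level, capped at level 6.
# Only direct children of the document are visited in this sketch.
demoted =
  Map.update!(doc, :nodes, fn nodes ->
    Enum.map(nodes, fn
      %MDEx.Heading{level: level} = heading -> %{heading | level: min(level + 1, 6)}
      other -> other
    end)
  end)

# Assumption: to_html!/2 also accepts a document, not only a Markdown string.
html = MDEx.to_html!(demoted)
IO.puts(html)
```

Nested headings (for example inside block quotes) would need a recursive walk over each node's `nodes` field; that is left out here to keep the example short.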

Formatting

Formatting is the process of converting from one format to another, for example from AST or Markdown to HTML. Formatting to XML and to Markdown is also supported.

You can use MDEx.parse_document/2 to generate an AST or any of the to_* functions to convert to Markdown (CommonMark), HTML, or XML.
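As a rough illustration of the conversions described above, the following sketch parses Markdown once and formats the resulting document three ways. The function names `to_html/1`, `to_xml/1`, and `to_commonmark/1`, and their `{:ok, binary}` return shape, are assumptions based on the `to_*` naming mentioned here; check the module documentation for the exact signatures.

```elixir
Mix.install([{:mdex, "~> 0.2"}])

markdown = "# Hello\n\nSome *emphasized* text."

# Markdown -> AST
{:ok, doc} = MDEx.parse_document(markdown)

# AST -> HTML, XML, and back to Markdown (CommonMark).
# Assumption: each function returns {:ok, binary} on success.
{:ok, html} = MDEx.to_html(doc)
{:ok, xml} = MDEx.to_xml(doc)
{:ok, commonmark} = MDEx.to_commonmark(doc)

IO.puts(html)
IO.puts(xml)
IO.puts(commonmark)
```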

Options

Use options to change the behavior and the generated output.

All comrak options are available as keyword lists, along with an additional :features option that extends them further.
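Since options are nested keyword lists, a project can keep one base configuration and override individual flags per call. The sketch below is plain Elixir and uses a hypothetical `DocOptions` module that is not part of MDEx; the option names are taken from the examples in this document.

```elixir
defmodule DocOptions do
  # Hypothetical module (not part of MDEx) holding project-wide defaults.
  @base [
    extension: [table: true, tasklist: true, shortcodes: true],
    render: [unsafe_: true],
    features: [sanitize: true]
  ]

  # Merges per-call overrides into the base options. Groups present on both
  # sides (e.g. :extension) are merged key by key instead of being replaced.
  def build(overrides \\ []) do
    Keyword.merge(@base, overrides, fn _group, base, override ->
      Keyword.merge(base, override)
    end)
  end
end

opts = DocOptions.build(extension: [strikethrough: true])
IO.inspect(opts)

# Usage, assuming MDEx is installed:
#   MDEx.to_html!("~~done~~", opts)
```

A plain `Keyword.merge/2` would replace the whole `:extension` group with the override; the three-argument form is used so the base flags survive.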

The full documentation and list of all options with description and examples can be found on the links below:

Features Options

See some examples below on how to use the provided options:

GitHub Flavored Markdown with emojis

MDEx.to_html!(~S"""
# GitHub Flavored Markdown :rocket:

- [x] Task A
- [x] Task B
- [ ] Task C

| Feature | Status |
| ------- | ------ |
| Fast | :white_check_mark: |
| GFM  | :white_check_mark: |

Check out the spec at https://github.github.com/gfm/
""",
extension: [
  strikethrough: true,
  tagfilter: true,
  table: true,
  autolink: true,
  tasklist: true,
  footnotes: true,
  shortcodes: true,
],
parse: [
  smart: true,
  relaxed_tasklist_matching: true,
  relaxed_autolinks: true
],
render: [
  github_pre_lang: true,
  unsafe_: true,
],
features: [
  sanitize: true
]) |> IO.puts()
"""
<h1>GitHub Flavored Markdown 🚀</h1>
<ul>
  <li><input type="checkbox" checked="" disabled="" /> Task A</li>
  <li><input type="checkbox" checked="" disabled="" /> Task B</li>
  <li><input type="checkbox" disabled="" /> Task C</li>
</ul>
<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Fast</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>GFM</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>
<p>Check out the spec at <a href="https://github.github.com/gfm/">https://github.github.com/gfm/</a></p>
"""

Code Syntax Highlighting

MDEx.to_html!(~S"""
```elixir
String.upcase("elixir")
```
""",
features: [syntax_highlight_theme: "catppuccin_latte"]
) |> IO.puts()
"""
<pre class=\"autumn highlight\" style=\"background-color: #282C34; color: #ABB2BF;\">
  <code class=\"language-elixir\" translate=\"no\">
    <span class=\"namespace\" style=\"color: #61AFEF;\">String</span><span class=\"operator\" style=\"color: #C678DD;\">.</span><span class=\"function\" style=\"color: #61AFEF;\">upcase</span><span class=\"\" style=\"color: #ABB2BF;\">(</span><span class=\"string\" style=\"color: #98C379;\">"elixir"</span><span class=\"\" style=\"color: #ABB2BF;\">)</span>
  </code>
</pre>
"""

Pre-compilation

Pre-compiled binaries are available for the following targets, so you don't need to have Rust installed to compile and use this library:

If you need or want to compile it yourself, set the MDEX_BUILD environment variable before fetching and compiling the dependencies:

export MDEX_BUILD=1
mix deps.get
mix compile

Legacy CPUs

Modern CPU features are enabled by default, but if your environment has an older CPU, you can use legacy artifacts by adding the following configuration to your config.exs:

config :mdex, use_legacy_artifacts: true

Demo and Samples

A Livebook and a script are available for experimenting with this library.

Used By

Are you using MDEx and want to list your project here? Please send a PR!

Benchmark

A simple script is available to compare existing libs:

Name              ips        average  deviation         median         99th %
cmark         22.82 K      0.0438 ms    ±16.24%      0.0429 ms      0.0598 ms
mdex           3.57 K        0.28 ms     ±9.79%        0.28 ms        0.33 ms
md             0.34 K        2.95 ms    ±10.56%        2.90 ms        3.62 ms
earmark        0.25 K        4.04 ms     ±4.50%        4.00 ms        4.44 ms

Comparison:
cmark         22.82 K
mdex           3.57 K - 6.39x slower +0.24 ms
md             0.34 K - 67.25x slower +2.90 ms
earmark        0.25 K - 92.19x slower +4.00 ms

Motivation

MDEx was born out of the need to parse CommonMark files, to parse hundreds of files quickly, and to be easily extensible by consumers of the library.

Note that MDEx is the only one that highlights syntax out of the box, which contributes to making it slower than cmark.

Finally, a friendly reminder that all libs have their own strengths and trade-offs, so use the one that best suits your needs.

Looking for help with your Elixir project?


At DockYard we are ready to help you build your next Elixir project. We have a unique expertise in Elixir and Phoenix development that is unmatched and we love to write about Elixir.

Have a project in mind? Get in touch!

Acknowledgements