Earmark—A Pure Elixir Markdown Processor

N.B.

This README contains the docstrings and doctests from the code by means of extractly and the following code examples are therefore verified with ExUnit doctests.

Options

Earmark.Cli.Implementation

Functional (with the exception of reading input files with Earmark.File) interface to the CLI returning the device and the string to be output.

Earmark.Options

This is a superset of the options that need to be passed into EarmarkParser.as_ast/2.

The following options are specific to Earmark and are therefore explained in detail.

All other options are passed on to EarmarkParser.as_ast/2.

Earmark.Options.make_options/1

Makes a legal and normalized Options struct from maps or keyword lists

Without a parameter, or with an empty input, we just get a new Options struct

    iex(1)> { make_options(), make_options(%{}) }
    { {:ok, %Earmark.Options{}}, {:ok, %Earmark.Options{}} }

The same holds for the bang version of course

    iex(2)> { make_options!(), make_options!(%{}) }
    { %Earmark.Options{}, %Earmark.Options{} }

We check for unallowed keys

    iex(3)> make_options(no_such_option: true)
    {:error, [{:warning, 0, "Unrecognized option no_such_option: true"}]}

Of course we do not let our users discover one error after another

    iex(4)> make_options(no_such_option: true, gfm: false, still_not_an_option: 42)
    {:error, [{:warning, 0, "Unrecognized option no_such_option: true"}, {:warning, 0, "Unrecognized option still_not_an_option: 42"}]}

And the bang version will raise an Earmark.Error as excepted (sic)

    iex(5)> make_options!(no_such_option: true, gfm: false, still_not_an_option: 42)
    ** (Earmark.Error) [{:warning, 0, "Unrecognized option no_such_option: true"}, {:warning, 0, "Unrecognized option still_not_an_option: 42"}]
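
When options come from user input, the tuple-returning variant can be matched on directly. The following is a minimal sketch, assuming Earmark is pulled in through Mix.install with a 1.4 release; the helper name checked_options is hypothetical, and only the message shape shown in the doctests above is relied upon.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

# Hypothetical helper: returns the options struct or a list of readable messages.
checked_options = fn input ->
  case Earmark.Options.make_options(input) do
    {:ok, options} ->
      {:ok, options}

    {:error, messages} ->
      # Each message is a {severity, line_number, text} triple, as in the doctests above.
      {:error, Enum.map(messages, fn {severity, _line, text} -> "#{severity}: #{text}" end)}
  end
end

{:ok, options} = checked_options.(gfm: false)
{:error, problems} = checked_options.(no_such_option: true)

IO.inspect(options.gfm, label: "gfm")
IO.inspect(problems, label: "problems")
```

Collecting all messages before reporting mirrors the behaviour described above, where every unrecognized key is listed at once.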

Earmark.Options.with_postprocessor/2

A convenience constructor

Earmark

Abstract Syntax Tree and Rendering

The AST generation has now been moved out to EarmarkParser which is installed as a dependency.

This brings some changes to this documentation and also deprecates the usage of Earmark.as_ast

Earmark takes care of rendering the AST to HTML, exposing some AST Transformation Tools and providing a CLI as escript.

Therefore you will no longer find a detailed description of the supported Markdown here, as that is now covered in the documentation of EarmarkParser.

Earmark.as_ast

WARNING: This is just a proxy towards EarmarkParser.as_ast and is deprecated, it will be removed in version 1.5!

Replace your calls to Earmark.as_ast with EarmarkParser.as_ast as soon as possible.

N.B. If all you use is Earmark.as_ast consider only using EarmarkParser.

Please also refer to the documentation of EarmarkParser.

The function is described below and the other two API functions as_html and as_html! are now based upon the structure of the result of as_ast.

    {:ok, ast, []}                   = EarmarkParser.as_ast(markdown)
    {:ok, ast, deprecation_messages} = EarmarkParser.as_ast(markdown)
    {:error, ast, error_messages}    = EarmarkParser.as_ast(markdown)

Earmark.as_html

    {:ok, html_doc, []}                   = Earmark.as_html(markdown)
    {:ok, html_doc, deprecation_messages} = Earmark.as_html(markdown)
    {:error, html_doc, error_messages}    = Earmark.as_html(markdown)
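
The three result shapes can be handled in a single case expression. This is a sketch rather than a recommended policy: it assumes Earmark is installed through Mix.install with a 1.4 release, and it simply logs any messages and keeps the HTML in every branch.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

markdown = "# Hello\nWorld"

html =
  case Earmark.as_html(markdown) do
    {:ok, html_doc, []} ->
      html_doc

    {:ok, html_doc, deprecation_messages} ->
      IO.inspect(deprecation_messages, label: "deprecations")
      html_doc

    {:error, html_doc, error_messages} ->
      # HTML is still returned alongside the errors; the caller decides whether to use it.
      IO.inspect(error_messages, label: "errors")
      html_doc
  end

IO.puts(html)
```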

Earmark.as_html!

    html_doc = Earmark.as_html!(markdown, options)

Formats the error_messages returned by as_html and adds the filename to each. Then prints them to stderr and just returns the html_doc

Options

Options can be passed into as_html/2 or as_html!/2 according to the documentation. A keyword list with legal options (cf. Earmark.Options) or an Earmark.Options struct is accepted.

    {status, html_doc, errors} = Earmark.as_html(markdown, options)
    html_doc = Earmark.as_html!(markdown, options)
    {status, ast, errors} = EarmarkParser.as_ast(markdown, options)

Rendering

All options passed through to EarmarkParser.as_ast are defined therein; however, some options concern only the rendering of the returned AST.

These are:

compact_output: defaulting to false

Normally Earmark aims to produce Human Readable output.

This will give results like these:

    iex(1)> markdown = "# Hello\nWorld"
    ...(1)> Earmark.as_html!(markdown, compact_output: false)
    "<h1>\nHello</h1>\n<p>\nWorld</p>\n"

But sometimes whitespace is not desired:

    iex(2)> markdown = "# Hello\nWorld"
    ...(2)> Earmark.as_html!(markdown, compact_output: true)
    "<h1>Hello</h1><p>World</p>"

Be cautious though when using this option; lines will become loooooong.

escape: defaulting to true

If set, HTML will be properly escaped

      iex(3)> markdown = "Hello<br />World"
      ...(3)> Earmark.as_html!(markdown)
      "<p>\nHello&lt;br /&gt;World</p>\n"

However, disabling escape: gives you maximum control of the created document, which in some cases (e.g. inside tables) might even be necessary

      iex(4)> markdown = "Hello<br />World"
      ...(4)> Earmark.as_html!(markdown, escape: false)
      "<p>\nHello<br />World</p>\n"

inner_html: defaulting to false

This is especially useful inside templates, when a block element will disturb the layout as in this case

<span><%= Earmark.as_html!(....)%></span>
<span><%= Earmark.as_html!(....)%></span>

By means of the inner_html option the disturbing paragraph can be removed from as_html!'s output

      iex(5)> markdown = "Hello<br />World"
      ...(5)> Earmark.as_html!(markdown, escape: false, inner_html: true)
      "Hello<br />World\n"

N.B. that this applies only to top level paragraphs, as can be seen here

      iex(6)> markdown = "- Item\n\nPara"
      ...(6)> Earmark.as_html!(markdown, inner_html: true)
      "<ul>\n  <li>\nItem  </li>\n</ul>\nPara\n"

Before rendering, the AST is transformed by a postprocessor. For details see the description of Earmark.Transform.map_ast below, which accepts the same postprocessor. As a matter of fact, specifying postprocessor: fun is conceptually the same as

          markdown
          |> EarmarkParser.as_ast
          |> Earmark.Transform.map_ast(fun)
          |> Earmark.Transform.transform

with all the necessary bookkeeping for options and messages
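
As a small illustration of the postprocessor: option, the sketch below adds a class to every non-text node before rendering. It assumes Earmark is installed through Mix.install with a 1.4 release; the class name "note" is an arbitrary choice, and Earmark.AstTools.merge_atts_in_node is used in the same way as in the Earmark.Transform doctests further down.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

# Adds a class attribute to each node it is called with; "note" is an arbitrary example value.
add_note_class = &Earmark.AstTools.merge_atts_in_node(&1, class: "note")

html = Earmark.as_html!("Hello *World*", postprocessor: add_note_class)
IO.puts(html)
```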

smartypants: defaulting to true

If set, the following replacements will be made during rendering of inline text

"---" → "—"
"--" → "–"
"'" → "’"
?" → "”"
"..." → "…"

Command line

    $ mix escript.build
    $ ./earmark file.md

Some options defined in the Earmark.Options struct can be specified as command line switches.

Use

    $ ./earmark --help

to find out more, but here is a short example

    $ ./earmark --smartypants false --code-class-prefix "a- b-" file.md

will call

    Earmark.as_html!( ..., %Earmark.Options{smartypants: false, code_class_prefix: "a- b-"})

Timeouts

By default, that is, if the timeout option is not set, Earmark uses parallel mapping as implemented in Earmark.pmap/2, which uses Task.await with its default timeout of 5000ms.

In rare cases that might not be enough.

By indicating a longer timeout option in milliseconds Earmark will use parallel mapping as implemented in Earmark.pmap/3, which will pass timeout to Task.await.

In both cases one can override the mapper function with either the mapper option (used if and only if timeout is nil) or the mapper_with_timeout function (used otherwise).

For the escript only the timeout command line argument can be used.
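
From Elixir code, a longer timeout can be passed like any other option. A minimal sketch, assuming Earmark is installed through Mix.install with a 1.4 release; the 30 second value and the generated document are arbitrary examples.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

# A synthetic, fairly large document, only there to have something to render.
markdown = "# Big document\n\n" <> String.duplicate("Some paragraph text.\n\n", 1_000)

# Allow up to 30 seconds instead of the Task.await default of 5000 ms.
html = Earmark.as_html!(markdown, timeout: 30_000)
IO.puts(String.length(html))
```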

Security

Please be aware that Markdown is not a secure format. It produces HTML from Markdown and HTML. It is your job to sanitize and/or filter the output of Earmark.as_html if you cannot trust the input and are to serve the produced HTML on the Web.
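
One possible way to do this is to run the rendered HTML through a separate sanitizer. The sketch below is an assumption, not part of Earmark: it uses the third-party html_sanitize_ex package and its HtmlSanitizeEx.basic_html/1 function, which this README does not mention, and the version requirements are illustrative. Whether that whitelist is appropriate depends on the application.

```elixir
Mix.install([{:earmark, "~> 1.4"}, {:html_sanitize_ex, "~> 1.4"}])

untrusted_markdown = """
Hello

<script>alert("boo")</script>
"""

# Render first, then sanitize the resulting HTML before serving it.
safe_html =
  untrusted_markdown
  |> Earmark.as_html!()
  |> HtmlSanitizeEx.basic_html()

IO.puts(safe_html)
```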

Earmark.Transform

Structure Conserving Transformers

For the convenience of processing the output of EarmarkParser.as_ast we expose two structure conserving mappers.

map_ast

takes a function that will be called for each node of the AST, where a leaf node is either a quadruple like {"code", [{"class", "inline"}], ["some code"], %{}} or a text leaf like "some code"

The result of the function call must be

A third parameter ignore_strings which defaults to false can be used to avoid invocation of the mapper function for text nodes

As an example, let us transform an AST so that its tags become atoms

      iex(1)> input = [
      ...(1)> {"h1", [], ["Hello"], %{title: true}},
      ...(1)> {"ul", [], [{"li", [], ["alpha"], %{}}, {"li", [], ["beta"], %{}}], %{}}]
      ...(1)> map_ast(input, fn {t, a, _, m} -> {String.to_atom(t), a, nil, m} end, true)
      [ {:h1, [], ["Hello"], %{title: true}},
        {:ul, [], [{:li, [], ["alpha"], %{}}, {:li, [], ["beta"], %{}}], %{}} ]

N.B. If this return convention is not respected, map_ast might not complain, but the resulting transformation might no longer be suitable for Earmark.Transform.transform. It follows that any function passed in as the value of the postprocessor: option must obey these conventions.
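
When ignore_strings is left at its default, the mapper is also called for text nodes. The following sketch upcases all text while leaving the structure untouched. It assumes Earmark is installed through Mix.install with a 1.4 release and that returning a string for a text node is acceptable; the AST literal and the name shout are made up for illustration.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

ast = [
  {"p", [], ["hello ", {"em", [], ["world"], %{}}], %{}}
]

shout = fn
  # Text leaves are plain binaries.
  text when is_binary(text) -> String.upcase(text)
  # For quadruples the content is returned as nil here, as in the doctest above.
  {tag, atts, _content, meta} -> {tag, atts, nil, meta}
end

result = Earmark.Transform.map_ast(ast, shout)
IO.inspect(result)
```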

map_ast_with

This is like map_ast, but, as with a reducer, an accumulator can also be passed through.

For that reason the function is called with two arguments, the first being the same value as in map_ast and the second the accumulator. The return values need to be equally augmented tuples.

A simple example annotates the traversal order in the meta map's :count key. As we are not interested in text nodes, we use the fourth parameter ignore_strings, which defaults to false.

       iex(2)>  input = [
       ...(2)>  {"ul", [], [{"li", [], ["one"], %{}}, {"li", [], ["two"], %{}}], %{}},
       ...(2)>  {"p", [], ["hello"], %{}}]
       ...(2)>  counter = fn {t, a, _, m}, c -> {{t, a, nil, Map.put(m, :count, c)}, c+1} end
       ...(2)>  map_ast_with(input, 0, counter, true)
       {[ {"ul", [], [{"li", [], ["one"], %{count: 1}}, {"li", [], ["two"], %{count: 2}}], %{count: 0}},
         {"p", [], ["hello"], %{count: 3}}], 4}
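
The accumulator can equally be used to gather information while leaving the nodes unchanged. A minimal sketch, assuming Earmark is installed through Mix.install with a 1.4 release; the name collect_tag is hypothetical, and the input is the same hand-written AST as in the doctest above.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

ast = [
  {"ul", [], [{"li", [], ["one"], %{}}, {"li", [], ["two"], %{}}], %{}},
  {"p", [], ["hello"], %{}}
]

# Returns the node as is and pushes its tag onto the accumulator.
collect_tag = fn {tag, atts, _content, meta}, seen ->
  {{tag, atts, nil, meta}, [tag | seen]}
end

{unchanged_ast, seen} = Earmark.Transform.map_ast_with(ast, [], collect_tag, true)

# Tags were pushed onto the front, so reverse to get traversal order.
tags_in_order = Enum.reverse(seen)
IO.inspect(tags_in_order)
```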

Postprocessors and Convenience Functions

These can be declared in the fields postprocessor and registered_processors of the Options struct. postprocessor is prepended to registered_processors, and they are all applied to non-string nodes (that is, the quadruples of the AST, which are of the form {tag, atts, content, meta}).

All postprocessors can either be plain functions on nodes or a TagSpecificProcessors struct, which groups function applications by tag. As a convenience, tuples of the form {tag, function} are transformed into a TagSpecificProcessors struct.

    iex(3)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")
    ...(3)> m1 = Earmark.Options.make_options!(postprocessor: add_class1) |> make_postprocessor()
    ...(3)> m1.({"a", [], nil, nil})
    {"a", [{"class", "class1"}], nil, nil}

We can also use the registered_processors field:

    iex(4)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")
    ...(4)> m2 = Earmark.Options.make_options!(registered_processors: add_class1) |> make_postprocessor()
    ...(4)> m2.({"a", [], nil, nil})
    {"a", [{"class", "class1"}], nil, nil}

Knowing that values on the same attribute are added onto the front, the following doctest demonstrates the order in which the processors are executed

    iex(5)> add_class1 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class1")
    ...(5)> add_class2 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class2")
    ...(5)> add_class3 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class3")
    ...(5)> m = Earmark.Options.make_options!(postprocessor: add_class1, registered_processors: [add_class2, {"a", add_class3}])
    ...(5)> |> make_postprocessor()
    ...(5)> [{"a", [{"class", "link"}], nil, nil}, {"b", [], nil, nil}]
    ...(5)> |> Enum.map(m)
    [{"a", [{"class", "class3 class2 class1 link"}], nil, nil}, {"b", [{"class", "class2 class1"}], nil, nil}]

We can see that the tuple form has been transformed into a tag-specific transformation only. As a matter of fact, the explicit definition would be:

    iex(6)> m = make_postprocessor(
    ...(6)>   %Earmark.Options{
    ...(6)>     registered_processors:
    ...(6)>       [Earmark.TagSpecificProcessors.new({"a", &Earmark.AstTools.merge_atts_in_node(&1, target: "_blank")})]})
    ...(6)> [{"a", [{"href", "url"}], nil, nil}, {"b", [], nil, nil}]
    ...(6)> |> Enum.map(m)
    [{"a", [{"href", "url"}, {"target", "_blank"}], nil, nil}, {"b", [], nil, nil}]

We can also define a tag specific transformer in one step, which might (or might not) solve potential performance issues when running too many processors

    iex(7)> add_class4 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class4")
    ...(7)> add_class5 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class5")
    ...(7)> add_class6 = &Earmark.AstTools.merge_atts_in_node(&1, class: "class6")
    ...(7)> tsp = Earmark.TagSpecificProcessors.new([{"a", add_class5}, {"b", add_class5}])
    ...(7)> m = Earmark.Options.make_options!(
    ...(7)>       postprocessor: add_class4,
    ...(7)>       registered_processors: [tsp, add_class6])
    ...(7)> |> make_postprocessor()
    ...(7)> [{"a", [], nil, nil}, {"c", [], nil, nil}, {"b", [], nil, nil}]
    ...(7)> |> Enum.map(m)
    [{"a", [{"class", "class6 class5 class4"}], nil, nil}, {"c", [{"class", "class6 class4"}], nil, nil}, {"b", [{"class", "class6 class5 class4"}], nil, nil}]

Of course, the mechanics shown above are hidden if all we want is to trigger the postprocessor chain in Earmark.as_html. Here is a typical example

    iex(8)> add_target = fn node -> # This will only be applied to nodes as it will become a TagSpecificProcessors
    ...(8)>   if Regex.match?(~r{\.x\.com\z}, Earmark.AstTools.find_att_in_node(node, "href", "")), do:
    ...(8)>     Earmark.AstTools.merge_atts_in_node(node, target: "_blank"), else: node end
    ...(8)> options = [
    ...(8)> registered_processors: [{"a", add_target}, {"p", &Earmark.AstTools.merge_atts_in_node(&1, class: "example")}]]
    ...(8)> markdown =
    ...(8)> """
    ...(8)>   http://hello.x.com
    ...(8)>
    ...(8)>   [some](url)
    ...(8)> """
    ...(8)> Earmark.as_html!(markdown, options)
    "<p class=\"example\">\n  <a href=\"http://hello.x.com\" target=\"_blank\">http://hello.x.com</a></p>\n<p class=\"example\">\n  <a href=\"url\">some</a></p>\n"

Use case: Modification of Link Attributes depending on the URL

This would be done as follows

        Earmark.as_html!(markdown, registered_processors: {"a", my_function_that_is_invoked_only_with_a_nodes})
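
A concrete version of such a function might look as follows. This is a sketch under assumptions: Earmark is installed through Mix.install with a 1.4 release, the policy of adding rel="nofollow" to absolute http(s) links is invented for illustration, and mark_external is a hypothetical name. The processor is registered in a list, as in the doctest above.

```elixir
Mix.install([{:earmark, "~> 1.4"}])

# Hypothetical policy: links to absolute http(s) URLs get rel="nofollow".
mark_external = fn node ->
  href = Earmark.AstTools.find_att_in_node(node, "href", "")

  if String.starts_with?(href, ["http://", "https://"]) do
    Earmark.AstTools.merge_atts_in_node(node, rel: "nofollow")
  else
    node
  end
end

markdown = "[docs](https://hexdocs.pm/earmark) and [local](/about)"
html = Earmark.as_html!(markdown, registered_processors: [{"a", mark_external}])
IO.puts(html)
```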

Use case: Modification of the AST according to Annotations

N.B. Annotations are an experimental feature in 1.4.16-pre and are documented here

By annotating our markdown source we can then influence the rendering. In this example we will just add some decoration

    iex(9)> markdown = [ "A joke %% smile", "", "Charming %% in_love" ]
    ...(9)> add_smiley = fn {_, _, _, meta} = quad, _acc ->
    ...(9)>                case Map.get(meta, :annotation) do
    ...(9)>                  "%% smile"   -> {quad, "\u{1F601}"}
    ...(9)>                  "%% in_love" -> {quad, "\u{1F60D}"}
    ...(9)>                  _            -> {quad, nil}
    ...(9)>                end
    ...(9)>                text, nil -> {text, nil}
    ...(9)>                text, ann -> {"#{text} #{ann}", nil}
    ...(9)>              end
    ...(9)> Earmark.as_ast!(markdown, annotations: "%%") |> Earmark.Transform.map_ast_with(nil, add_smiley) |> Earmark.transform
    "<p>\nA joke  😁</p>\n<p>\nCharming  😍</p>\n"

Structure Modifying Transformers

For structure modifications a tree traversal is needed and no clear pattern of how to assist this task with tools has emerged yet.

Contributing

Pull Requests are happily accepted.

Please be aware of one caveat when correcting/improving README.md.

The README.md is generated by Extractly as mentioned above, and therefore contributors should not modify it directly but edit README.md.eex and the imported docs instead.

Thank you to all who have already helped with Earmark; your names are duly noted in RELEASE.md.

Author

Copyright © 2014,5,6,7,8,9, 2020,1 Dave Thomas, The Pragmatic Programmers & Robert Dober @/+pragdave, dave@pragprog.com & robert.dober@gmail.com

LICENSE

Same as Elixir, which is Apache License v2.0. Please refer to LICENSE for details.