SemanticMarkdown

Description

SemanticMarkdown is a library that allows marking parts of a Markdown document with XML tags. The tagged parts are extracted and returned as a keyword list, along with the untagged content.

E.g.:

<author>Alexander "exlee" Kaminski</author>
<date>2022-02-02</date>
<language>en_US</language>

# Hello World!

Every document has to start somewhere?!

<hint>It's possible to extend from World to Universe at some point</hint>

<mobile_content>
As _content_ on this *page* is very intensive you will not be able to see the images!
</mobile_content>

<update>2022-02-03 : Added hint</update>
<update>2022-02-04 : Set tags</update>

... is transformed into a friendly keyword list:

[
  author: ...,
  date: ...,
  language: ...,
  content: ...,
  mobile_content: ...,
  update: ...,
  update: ...
]

Such a list can then be used for conditional rendering or for other rendering transformations based on the marked parts/attributes.
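As a rough sketch of such conditional rendering, the snippet below works on a hand-written keyword list shaped like the one above. The values, the `Render` module and the `mobile?` flag are illustrative assumptions and not part of SemanticMarkdown's API; only standard `Keyword` functions are used.

```elixir
defmodule Render do
  # Hypothetical helper: appends the mobile-only part when `mobile?` is true.
  def body(parsed, mobile?) do
    content = Keyword.get(parsed, :content, "")

    if mobile? do
      content <> Keyword.get(parsed, :mobile_content, "")
    else
      content
    end
  end

  # Keyword lists keep duplicate keys, so every <update> entry is available.
  def updates(parsed), do: Keyword.get_values(parsed, :update)
end

# Hand-written stand-in for a parsed document; the values are assumptions.
parsed = [
  author: "Alexander \"exlee\" Kaminski",
  language: "en_US",
  content: "<h1>Hello World!</h1>",
  mobile_content: "<p>Images are hidden on mobile.</p>",
  update: "2022-02-03 : Added hint",
  update: "2022-02-04 : Set tags"
]

Render.body(parsed, true)
Render.updates(parsed)
```

Because repeated tags become repeated keys, `Keyword.get_values/2` is the natural way to collect them, while `Keyword.get/3` returns only the first match.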

Solution space:

A local Markdown-based CMS
Conditional content loading based on conditions such as locale or browser configuration (without a DB)
Embedding data points for interactive components
Asymmetrical documents (e.g. flashcards)

Installation

SemanticMarkdown can be installed by adding semantic_markdown to your list of dependencies in mix.exs:

def deps do
  [
    {:semantic_markdown, "~> 0.1.0"}
  ]
end

Documentation can be found at https://hexdocs.pm/semantic_markdown.

Rationale

Markdown is a great format for both short and long forms. However, it's somewhat limited when it comes to creating structured content. The usual solution is to use either a CMS or a database directly to feed content. Database modelling takes time and might be overkill for small solutions.

Also, Markdown is VERY good for writing content, so if the solution is small and text-driven, one can use Markdown instead of trying to hammer in a back-office system just so that the content can be provided.

Database

The same semantic information can be obtained by using a database. For small solutions, modelling a database or even setting one up can be overkill compared to flat local files. Semantic marking allows, for example, loading Markdown into a local database (like SQLite3) for faster reads and incrementally extending the model as needed.
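The "parse once, read fast" idea can be sketched without any database at all. The example below swaps SQLite3 for an in-memory ETS table as a stand-in; the `ParsedCache` module, the table name and the file path are all hypothetical, and a real setup would store the output of the parser rather than a hand-written list.

```elixir
defmodule ParsedCache do
  @table :semantic_markdown_cache

  # Creates the named ETS table; call once, e.g. at application start.
  def init do
    :ets.new(@table, [:named_table, :set, :public, read_concurrency: true])
    :ok
  end

  # Stores an already parsed keyword list under the file path.
  def put(path, parsed) do
    true = :ets.insert(@table, {path, parsed})
    :ok
  end

  # Returns {:ok, parsed} for a cached path, :error otherwise.
  def fetch(path) do
    case :ets.lookup(@table, path) do
      [{^path, parsed}] -> {:ok, parsed}
      [] -> :error
    end
  end
end
```

Unlike SQLite3, ETS does not survive a restart, so this only covers the faster-reads part; persisting and querying by tag would still call for a real database.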

Footnotes

At the time of writing this library, Earmark hard-codes the "see footnotes" and "return to article" titles when parsing footnotes. SemanticMarkdown provides options to replace them during parsing, which allows non-English titles (e.g. with gettext). This was another motivation.

TL;DR

I wanted a simple CMS for content generation and couldn't find one, so I made my own ;)

Alternative solutions

Front-matter

Some Markdown parsing solutions use "header" parts to provide data with semantic value, e.g.:

date: 2022-02-02
language: en_US
author: Anonymous Writer

-----

# Title of the document
(...)

The front-matter can be in any format (XML, TOML, YAML, etc.).

Such an approach works well when the data can be embedded in such a header. It doesn't allow marking parts of the document, and it's usually the developer's responsibility to make sure that the document is split properly.
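To illustrate the splitting work that front-matter leaves to the developer, here is a minimal sketch for the `key: value` header shown above. The `FrontMatter` module is hypothetical, it assumes the header ends at the first line made only of three or more dashes, and it handles flat `key: value` lines only (no real YAML or TOML parsing).

```elixir
defmodule FrontMatter do
  # Splits a document at the first line consisting only of dashes.
  # Returns {metadata, body}, where metadata is a list of {key, value} strings.
  def split(document) do
    case String.split(document, ~r/^-{3,}[ \t]*$/m, parts: 2) do
      [header, body] -> {parse_header(header), String.trim_leading(body)}
      [body] -> {[], body}
    end
  end

  # Keys stay strings so that untrusted input never creates new atoms.
  defp parse_header(header) do
    header
    |> String.split("\n", trim: true)
    |> Enum.flat_map(fn line ->
      case String.split(line, ":", parts: 2) do
        [key, value] -> [{String.trim(key), String.trim(value)}]
        _ -> []
      end
    end)
  end
end

doc = """
date: 2022-02-02
language: en_US
author: Anonymous Writer

-----

# Title of the document
"""

{meta, body} = FrontMatter.split(doc)
```

Everything after the separator comes back as one undivided body, which is exactly the limitation described above: there is no way to mark individual parts of the document.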

XML Parsing

XML parsing (with a library such as SweetXml) would probably be preferable.

<xml>
  <title> ... </title>
  <content>
    ...
  </content>
  <sources>
    ...
  </sources>
</xml>

Not only would that provide semantic tagging and formatting, it would also allow hardening the data with namespaces. However, anyone who decides to use it is on their own when it comes to implementing Markdown parsing for specific nodes.

Markdown classes

Instead of toying with semantic tagging one could use IAL extensions (see Earmark's) and then use other methods of hiding content (like CSS/JS).

Document splitting

It's also possible to split the .md files into multiple ones using a schema, but if there is a lot of information with semantic meaning, such a split would be very cumbersome to maintain.

Missing features / known issues

it should be possible to have inner transforms on tag-by-tag or even node-by-node basis
footnotes need to be in the same semantic node making them somewhat useless
since parsing is done using Earmark it shares some caveats (like HTML Limitation)
no performance tests were done, but most likely it's not very fast so the input files should be pre-processed and cached
it'd be nice to have tag transformers provided in form of (text) -> any so that output can be "smarter"
nested semantic tags are not supported (this probably would require switching parser entirely)

More tests, especially with more complex documents
Configurable transformers for tags
Per-tag inner-parsing
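
The `(text) -> any` transformer idea from the list above is not implemented in the library, but it could be approximated on top of the returned keyword list. The sketch below is an assumption about how that might look; `TagTransform`, the `transformers` map and the `" : "` separator convention for updates are illustrative only.

```elixir
defmodule TagTransform do
  # Applies a per-tag function to each value; tags without a transformer
  # are passed through unchanged.
  def apply_all(parsed, transformers) do
    Enum.map(parsed, fn {tag, text} ->
      transform = Map.get(transformers, tag, &Function.identity/1)
      {tag, transform.(text)}
    end)
  end
end

# Each transformer has the shape (text) -> any.
transformers = %{
  date: &Date.from_iso8601!/1,
  update: fn text ->
    [date, note] = String.split(text, " : ", parts: 2)
    {Date.from_iso8601!(date), note}
  end
}

TagTransform.apply_all(
  [date: "2022-02-02", update: "2022-02-03 : Added hint", language: "en_US"],
  transformers
)
```

Keeping the transformers in a plain map means they run after parsing, so this does not address per-tag inner parsing of the Markdown itself.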