PureHTML

A pure Elixir HTML5 parser. No NIFs. No native dependencies. Just Elixir.

Why PureHTML?

Pure Elixir

PureHTML has zero dependencies. It's pure Elixir code all the way down.

Correct

PureHTML implements the WHATWG HTML5 specification. It handles all the complex error-recovery rules that browsers use.

Fast Enough

For raw speed, use a NIF-based parser. But for most use cases, PureHTML is fast enough while giving you the benefits of pure Elixir.

Installation

Add pure_html to your list of dependencies in mix.exs:

def deps do
  [
    {:pure_html, "~> 0.2.0"}
  ]
end

Quick Example

# Parse HTML into a document tree
PureHTML.parse("<p class=&#39;intro&#39;>Hello!</p>")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}]}]

# Works with malformed HTML just like browsers do
PureHTML.parse("<p>One<p>Two")
# => [{"html", [], [{"head", [], []}, {"body", [], [{"p", [], ["One"]}, {"p", [], ["Two"]}]}]}]

# Convert back to HTML
PureHTML.parse("<p>Hello</p>") |> PureHTML.to_html()
# => "<html><head></head><body><p>Hello</p></body></html>"
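
Because the parsed document is plain Elixir data (`{tag, attributes, children}` tuples with strings for text, as in the outputs above), it can be traversed with ordinary pattern matching. The sketch below is illustrative only: `TreeWalk` is a hypothetical helper, not part of PureHTML, and the `tree` literal is copied from the first example's output so the snippet runs without the library. It assumes any other node shapes can simply be skipped.

```elixir
defmodule TreeWalk do
  # Hypothetical helper (not part of PureHTML): collects element tag
  # names in document order from a tree shaped like the output above.

  # A list of nodes: walk each one and concatenate the results.
  def tags(nodes) when is_list(nodes), do: Enum.flat_map(nodes, &tags/1)

  # An element node: keep its tag, then descend into its children.
  def tags({tag, _attrs, children}) when is_binary(tag), do: [tag | tags(children)]

  # Text nodes (strings) and any other node shapes contribute no tags.
  def tags(_other), do: []
end

# Tree literal copied from the first example's output.
tree = [
  {"html", [],
   [
     {"head", [], []},
     {"body", [], [{"p", [{"class", "intro"}], ["Hello!"]}]}
   ]}
]

TreeWalk.tags(tree)
# => ["html", "head", "body", "p"]
```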

Querying

Find elements using CSS selectors.

html = PureHTML.parse("<div><p class='intro'>Hello</p><p>World</p></div>")

# Find by tag
PureHTML.query(html, "p")
# => [{"p", [{"class", "intro"}], ["Hello"]}, {"p", [], ["World"]}]

# Find by class
PureHTML.query(html, ".intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Compound selectors
PureHTML.query(html, "p.intro")
# => [{"p", [{"class", "intro"}], ["Hello"]}]

# Combinators
PureHTML.query(html, "div > p")      # Direct children
PureHTML.query(html, "div p")        # All descendants

# Extract text content
PureHTML.text(html)
# => "HelloWorld"

# Extract attributes
PureHTML.attribute(html, "p", "class")
# => ["intro"]
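
Query results are ordinary element tuples, so attributes can also be read from individual nodes with the standard library. The following is a minimal sketch under that assumption: the `results` literal is copied from the `PureHTML.query(html, "p")` output above, and only `List.keyfind/3` from Elixir itself is used. Unlike the `PureHTML.attribute/3` output shown above, it yields `nil` for elements without the attribute.

```elixir
# Literal copied from the `PureHTML.query(html, "p")` output above.
results = [{"p", [{"class", "intro"}], ["Hello"]}, {"p", [], ["World"]}]

# Attributes are a list of {name, value} pairs, so List.keyfind/3
# finds the pair whose first element (position 0) is "class".
classes =
  for {_tag, attrs, _children} <- results do
    case List.keyfind(attrs, "class", 0) do
      {"class", value} -> value
      nil -> nil
    end
  end

# classes => ["intro", nil]
```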

Supported selectors: tag, *, .class, #id, [attr], [attr=val], [attr^=prefix], [attr$=suffix], [attr*=substring], selector lists (.a, .b), combinators (div p, div > p, h1 + p, h1 ~ p).

See the Querying Guide for complete documentation.

License

Copyright 2026 (c) Marcelo De Polli.

PureHTML source code is released under the MIT License.

See the LICENSE file for more information.