SimdXml
SIMD-accelerated XML parsing with full XPath 1.0 support for Elixir.
SimdXml parses XML into a flat structural index (~16 bytes per tag) using SIMD instructions, then evaluates XPath expressions against it using array operations. No DOM tree, no atom creation from untrusted input, no XXE vulnerabilities.
Wraps the simdxml Rust crate via Rustler NIFs with precompiled binaries for all major platforms.
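To picture what evaluating XPath against a flat index with array operations can look like, here is a minimal pure-Elixir sketch. It is an illustration only: the `FlatIndexSketch` module, its `{tag, parent_index}` entry layout, and the sample data are assumptions made for this example, not the layout the Rust crate actually uses.

```elixir
defmodule FlatIndexSketch do
  # Hypothetical layout: one entry per element, in document order.
  # Each entry is {tag_name, parent_index}; the root has parent -1.
  # This is NOT SimdXml's real index, only a way to picture the idea.
  @index [
    {"library", -1},
    {"book", 0},
    {"title", 1},
    {"book", 0},
    {"title", 3}
  ]

  # //name: a single linear scan over the flat list, no tree walk.
  def descendants(name) do
    for {{tag, _parent}, i} <- Enum.with_index(@index), tag == name, do: i
  end

  # parents/name: keep entries whose parent index is in the given set.
  def children(parent_indices, name) do
    parents = MapSet.new(parent_indices)

    for {{tag, parent}, i} <- Enum.with_index(@index),
        tag == name and MapSet.member?(parents, parent),
        do: i
  end
end
```

Under these assumptions, a path such as //book/title becomes two scans: `FlatIndexSketch.children(FlatIndexSketch.descendants("book"), "title")` returns the positions of the matching entries rather than allocated nodes.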
Installation
def deps do
[{:simdxml, "~> 0.1.0"}]
end
Precompiled NIF binaries are provided for macOS (Apple Silicon, Intel), Linux
(x86_64, aarch64, musl), and Windows. Set SIMDXML_BUILD=1 to compile from
source if needed.
Quick start
# Parse
doc = SimdXml.parse!("<library><book lang='en'><title>Elixir</title></book></library>")
# Query with XPath
SimdXml.xpath_text!(doc, "//title")
#=> ["Elixir"]
# Navigate elements (Enumerable)
root = SimdXml.Document.root(doc)
Enum.map(root, & &1.tag)
#=> ["book"]
# Attributes
[book] = SimdXml.Element.children(root)
SimdXml.Element.get(book, "lang")
#=> "en"
Query combinators
Build XPath queries with Elixir pipes instead of strings:
import SimdXml.Query
query = descendant("book") |> where_attr("lang", "en") |> child("title") |> text()
SimdXml.query!(doc, query)
#=> ["Elixir"]
# Inspect the generated XPath
SimdXml.Query.to_xpath(query)
#=> "//book[@lang='en']/title/text()"
Queries are composable data structures, so common fragments can be extracted and reused:
books = descendant("book")
english = books |> where_attr("lang", "en")
titles = english |> child("title") |> text()
authors = english |> child("author") |> text()
Compiled queries
Compile once, evaluate against many documents:
query = SimdXml.compile!("//title")
SimdXml.eval_text!(doc1, query)
SimdXml.eval_text!(doc2, query)
# Optimized short-circuit operations
SimdXml.eval_count!(doc, query) #=> 1
SimdXml.eval_exists?(doc, query) #=> {:ok, true}
Compiled queries are NIF resources, safe to share across processes, store in ETS, or hold in module attributes.
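One way to share compiled queries is a small ETS-backed cache. The sketch below is an assumption-laden example, not part of SimdXml: the `QueryCache` module, its table name, and its function names are hypothetical. Only `SimdXml.compile!/1` comes from this README, and it is used solely as the default compile function.

```elixir
defmodule QueryCache do
  # Hypothetical helper (not part of SimdXml): caches compiled queries
  # in a public, named ETS table keyed by the XPath string.
  @table :simdxml_query_cache

  # Create the table once, e.g. from the application's start callback.
  # The calling process owns the table, so it should be long-lived.
  def init do
    :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    :ok
  end

  # Return the cached query for `xpath`, compiling it on first use.
  # `compile_fun` defaults to SimdXml.compile!/1 as shown in this README.
  def fetch(xpath, compile_fun \\ &SimdXml.compile!/1) do
    case :ets.lookup(@table, xpath) do
      [{^xpath, query}] ->
        query

      [] ->
        query = compile_fun.(xpath)
        :ets.insert(@table, {xpath, query})
        query
    end
  end
end
```

Two processes that miss at the same moment may both compile the same expression; the later insert simply overwrites the earlier one, which is harmless if compilation is deterministic.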
Batch processing
Process thousands of documents with bloom filter prescanning:
query = SimdXml.compile!("//claim")
{:ok, results} = SimdXml.Batch.eval_text_bloom(xml_binaries, query)
Documents that cannot contain the target tags are skipped without parsing.
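The skip-before-parse idea can be sketched in plain Elixir. Note that this example swaps the bloom filter for a plain substring search, which has the same one-sided guarantee: it may let through documents that do not match, but it never rejects a document that contains the tag name. The `PrescanSketch` module and its function names are hypothetical, and the sketch assumes UTF-8 input.

```elixir
defmodule PrescanSketch do
  # Hypothetical illustration; not SimdXml's implementation.
  # A substring search stands in for the bloom filter here.

  # Cheap check: if the tag name never appears in the bytes, the
  # document cannot contain that element. False positives are possible
  # (e.g. the name appearing in text content), false negatives are not.
  def may_contain_tag?(xml, tag) when is_binary(xml) and is_binary(tag) do
    :binary.match(xml, tag) != :nomatch
  end

  # Run the expensive `eval_fun` (parse + query) only on candidates;
  # documents that fail the prescan yield an empty result.
  def eval_with_prescan(xml_binaries, tag, eval_fun) do
    Enum.map(xml_binaries, fn xml ->
      if may_contain_tag?(xml, tag), do: eval_fun.(xml), else: []
    end)
  end
end
```

In real use, `eval_fun` could be something like `fn xml -> SimdXml.xpath_text!(SimdXml.parse!(xml), "//claim") end`, built from the calls shown in the quick start.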
Quick grep mode
For simple //tagname extraction at memory-bandwidth speed, with no structural index built:
scanner = SimdXml.Quick.new("claim")
SimdXml.Quick.extract_first(scanner, xml) #=> "First claim text"
SimdXml.Quick.exists?(scanner, xml) #=> true
SimdXml.Quick.count(scanner, xml) #=> 42
Result helpers
SimdXml.Result.one(doc, "//title") #=> "Elixir"
SimdXml.Result.fetch(doc, "//title") #=> {:ok, "Elixir"}
SimdXml.Result.all(doc, "//title") #=> ["Elixir"]
Why SimdXml?
| | SimdXml | SweetXml | Saxy |
|---|---|---|---|
| Parser | SIMD Rust NIF | xmerl (Erlang) | Pure Elixir SAX |
| XPath | Full 1.0 | Full 1.0 (via xmerl) | None |
| Memory | ~16 bytes/tag | ~350 bytes/node | Streaming |
| Atom safety | Strings only | Creates atoms | Strings only |
| XXE safe | No DTD processing | Vulnerable by default | No DTD processing |
| API | Combinators + XPath | ~x sigil | SAX handlers |
| Batch | Bloom-filtered | No | No |
Documentation
Full API docs and interactive Livebook guides:
License
MIT