localize_mf2_treesitter

Elixir bindings to the MF2 tree-sitter grammar. Server-side MF2 parsing in the BEAM via a C NIF — incremental, error-recovering, position-aware.

Complements Localize.Message.Parser

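The localize hex package provides a NimbleParsec parser for MF2. That parser is strict, fast, and fails on the first error, which is the right behaviour for runtime formatting via Localize.Message.format/3. This package is resilient and position-aware, which is the right behaviour for tooling that has to cope with incomplete input: editor integrations, diagnostics, syntax highlighting, and incremental reparsing.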

The two coexist in the same app with no conflict. Use whichever fits the job.

If you're after MF2 parsing in a browser

If you need to edit MF2 messages in a browser, see the tree-sitter-mf2 npm package, which works from the same MF2 tree-sitter grammar and is built by the same author.

Installation

def deps do
  [
    {:localize_mf2_treesitter, "~> 0.1"}
  ]
end

No runtime dependencies beyond the ERTS NIF interface. A C11-capable compiler is required at build time.

Usage

iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$name}")
iex> root = Localize.Mf2.TreeSitter.root(tree)
iex> Localize.Mf2.TreeSitter.Node.type(root)
"source_file"
iex> Localize.Mf2.TreeSitter.Node.has_error?(root)
false
iex> Localize.Mf2.TreeSitter.Node.text(root)
"hello {$name}"

Walk the tree with the accessors in Localize.Mf2.TreeSitter.Node; see the moduledoc for the full list. Offsets and ranges are measured in bytes, not characters or graphemes; point coordinates are {row, column} with row zero-indexed.

Error recovery

Invalid MF2 input still produces a tree. Tree-sitter's GLR engine places ERROR or MISSING nodes at the failure points rather than aborting.

iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$")
iex> Localize.Mf2.TreeSitter.Node.has_error?(Localize.Mf2.TreeSitter.root(tree))
true
iex> [{:error, _} | _] = Localize.Mf2.TreeSitter.diagnostics(tree)

This is the property that makes the grammar LSP-friendly — an editor never sees a dead parse.

Queries

The highlight query shipped in mf2_treesitter is available via Query.load/1:

iex> {:ok, tree} = Localize.Mf2.TreeSitter.parse("hello {$name}")
iex> {:ok, query} = Localize.Mf2.TreeSitter.Query.load(:highlights)
iex> [{"variable", node} | _] =
...>   Localize.Mf2.TreeSitter.Query.captures(query, Localize.Mf2.TreeSitter.root(tree))
...>   |> Enum.filter(fn {name, _} -> name == "variable" end)
iex> Localize.Mf2.TreeSitter.Node.text(node)
"name"

Use Query.matches/2 when pattern provenance matters (e.g. distinguishing which rule matched), and Query.captures/2 for the flat highlight-pass case.

Incremental reparse

After a text edit, produce a new tree without redoing the full parse:

iex> old_src = "hello {$name}"
iex> new_src = "hello {$name}!"
iex> {:ok, old} = Localize.Mf2.TreeSitter.parse(old_src)
iex> edit = %Localize.Mf2.TreeSitter.Edit{
...>   start_byte: byte_size(old_src),
...>   old_end_byte: byte_size(old_src),
...>   new_end_byte: byte_size(new_src),
...>   start_point: {0, byte_size(old_src)},
...>   old_end_point: {0, byte_size(old_src)},
...>   new_end_point: {0, byte_size(new_src)}
...> }
iex> {:ok, new} = Localize.Mf2.TreeSitter.parse_incremental(old, [edit], new_src)
iex> Localize.Mf2.TreeSitter.changed_ranges(old, new)
[{13, 14, {0, 13}, {0, 14}}]

The old tree is not mutated; the NIF clones it, applies the edits to the clone, and feeds the clone to the parser. changed_ranges/2 returns the byte/point ranges that differ.
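
Writing the six edit fields by hand is error-prone once an edit is not a simple append. The following is a minimal sketch, not part of this package, of one way to derive them from the old and new source by trimming the common prefix and suffix. The module name EditSketch and its diff/2 function are hypothetical; it returns a plain map whose keys mirror the Edit fields shown above, and it assumes both inputs are valid UTF-8 and that columns are counted in bytes, as in the example above.

```elixir
defmodule EditSketch do
  # Hypothetical helper, not part of localize_mf2_treesitter.
  # Describes the difference between two sources as a single edit.

  def diff(old_src, new_src) when is_binary(old_src) and is_binary(new_src) do
    # Bytes shared at the front, pulled back so we never split a
    # multi-byte UTF-8 character.
    prefix = :binary.longest_common_prefix([old_src, new_src])
    start_byte = back_to_boundary(old_src, new_src, prefix)

    # Bytes shared at the end. The suffix must not overlap the prefix
    # in either string, hence the cap.
    room = min(byte_size(old_src), byte_size(new_src)) - start_byte
    suffix = min(:binary.longest_common_suffix([old_src, new_src]), room)
    suffix = shrink_to_boundary(old_src, suffix)

    old_end_byte = byte_size(old_src) - suffix
    new_end_byte = byte_size(new_src) - suffix

    %{
      start_byte: start_byte,
      old_end_byte: old_end_byte,
      new_end_byte: new_end_byte,
      start_point: point_at(old_src, start_byte),
      old_end_point: point_at(old_src, old_end_byte),
      new_end_point: point_at(new_src, new_end_byte)
    }
  end

  # Move the prefix end left while it sits on a UTF-8 continuation byte.
  defp back_to_boundary(old_src, new_src, pos) do
    if pos > 0 and (continuation?(old_src, pos) or continuation?(new_src, pos)) do
      back_to_boundary(old_src, new_src, pos - 1)
    else
      pos
    end
  end

  # Shrink the suffix while its first byte is a UTF-8 continuation byte.
  defp shrink_to_boundary(src, suffix) do
    if suffix > 0 and continuation?(src, byte_size(src) - suffix) do
      shrink_to_boundary(src, suffix - 1)
    else
      suffix
    end
  end

  # True when the byte at pos has the bit pattern 10xxxxxx.
  defp continuation?(src, pos) do
    pos < byte_size(src) and Bitwise.band(:binary.at(src, pos), 0xC0) == 0x80
  end

  # {row, column}: row counts preceding newlines, column counts bytes
  # since the last newline. Both start at zero.
  defp point_at(src, offset) do
    lines = :binary.split(binary_part(src, 0, offset), "\n", [:global])
    {length(lines) - 1, byte_size(List.last(lines))}
  end
end

# EditSketch.diff("hello {$name}", "hello {$name}!")
# returns start_byte: 13, old_end_byte: 13, new_end_byte: 14,
# start_point: {0, 13}, old_end_point: {0, 13}, new_end_point: {0, 14}
```

Assuming the Edit struct has exactly the fields used above, the map could be turned into one with struct(Localize.Mf2.TreeSitter.Edit, EditSketch.diff(old_src, new_src)). A single prefix/suffix diff always yields one edit, which is coarser than what an editor that tracks individual keystrokes can report.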

Keeping the grammar current

The grammar files under c_src/grammar/ and priv/queries/ are vendored from the tree-sitter-mf2 npm package (published from mf2_treesitter). A mix task pins a specific version and fetches files from the published tarball via the unpkg CDN — no sibling repo checkout required, fully reproducible.

# Fetch from npm at the pinned version and update vendored files.
mix localize_mf2_treesitter.sync

# CI check — exit non-zero if vendored files drift from the pinned
# version. Does not modify files.
mix localize_mf2_treesitter.sync --check

The pinned version lives at the top of the task module as @tree_sitter_mf2_version. To move to a newer grammar release, bump that string and re-run the task. Keep the pin in step with mf2_wasm_editor's own sync task — grammar tree shape is the API boundary between this NIF (server-side parse) and the WASM editor (browser-side parse); a version skew can produce different trees for the same input, breaking the canonicalisation round-trip.

Offline / local-iteration override

If you're iterating on the grammar locally and want the sync to read from a sibling checkout rather than hit the network, set MF2_TREESITTER_DIR:

MF2_TREESITTER_DIR=/path/to/mf2_treesitter mix localize_mf2_treesitter.sync

No --build-wasm flag

This package doesn't produce a WASM bundle. The WASM consumer is mf2_wasm_editor, which has its own sync task. The grammar package (tree-sitter-mf2) is the common upstream both packages sync from.

Keeping the tree-sitter runtime current

Separate from the grammar, the package also embeds the tree-sitter C runtime under c_src/runtime/. It compiles alongside parser.c into the NIF .so. The runtime's supported ABI version must be ≥ the version parser.c was generated against — otherwise ts_parser_set_language() refuses the language at load time and every parse returns {:error, :parse_failed}.
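
The compatibility rule above can be checked before compiling. The sketch below is an illustration, not something this package ships: the module name AbiSketch is hypothetical, and it takes the text of parser.c and of the runtime's api.h header as strings. It relies on the generated parser.c defining LANGUAGE_VERSION and on api.h defining TREE_SITTER_LANGUAGE_VERSION and TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION. Note that upstream tree-sitter also rejects grammars older than the minimum compatible version, so the sketch checks both bounds.

```elixir
defmodule AbiSketch do
  # Hypothetical helper, not part of localize_mf2_treesitter.
  # Reports whether the runtime described by api_h accepts the grammar
  # described by parser_c. Both arguments are file contents as strings.

  def compatible?(parser_c, api_h) do
    with {:ok, lang} <- define(parser_c, "LANGUAGE_VERSION"),
         {:ok, max} <- define(api_h, "TREE_SITTER_LANGUAGE_VERSION"),
         {:ok, min} <- define(api_h, "TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION") do
      {:ok, lang >= min and lang <= max}
    end
  end

  # Find the integer value of a `#define NAME <digits>` line.
  defp define(source, name) do
    pattern = "^#define\\s+" <> Regex.escape(name) <> "\\s+(\\d+)"
    regex = Regex.compile!(pattern, [:multiline])

    case Regex.run(regex, source, capture: :all_but_first) do
      [digits] -> {:ok, String.to_integer(digits)}
      nil -> {:error, {:missing_define, name}}
    end
  end
end
```

To use it, read c_src/grammar/parser.c and the vendored api.h with File.read!/1 and pass both strings in. Where api.h sits under c_src/runtime/ is an assumption to verify against the actual checkout.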

A dedicated mix task refreshes the runtime from upstream tree-sitter/tree-sitter:

# Fetch and overlay the pinned runtime version.
mix localize_mf2_treesitter.update_runtime

# CI check — exits non-zero if any vendored runtime file drifts
# from the pinned version. Doesn't modify files.
mix localize_mf2_treesitter.update_runtime --check

The pinned version lives at the top of the task module as @runtime_version. Bump it whenever you bump the grammar pin in localize_mf2_treesitter.sync, so the runtime's ABI support stays ahead of (or equal to) whatever parser.c needs. As a rule of thumb, match the @runtime_version here to the tree-sitter CLI version that generated c_src/grammar/parser.c (check its first-line comment).

The task preserves c_src/runtime/src/lib.c (our hand-written amalgamation wrapper — it adds #define _POSIX_C_SOURCE 200112L and #includes every runtime .c file). If upstream adds a new .c file under lib/src/, the task warns that lib.c needs an extra #include line.

Roadmap

Licence

Apache-2.0 for this package. The vendored tree-sitter runtime under c_src/runtime/ is MIT — see c_src/runtime/LICENSE.