IREE.Tokenizers

Fast Hugging Face tokenizer.json bindings for Elixir, backed by the IREE tokenizer runtime.

Features

Scope

V1 is intentionally inference-only.

Repository Usage

Install dependencies and run the full local validation flow from the repo root:

mix deps.get
mix test
cargo test --manifest-path native/iree_tokenizers_native/Cargo.toml

In :dev and :test, the project forces a local source build of the Rust NIF, so you do not need precompiled release assets for normal development.
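If the NIF is packaged with rustler_precompiled (an assumption here; the project may wire this differently), forcing a source build in development is conventionally done via config. The app key below is illustrative:

```elixir
# config/dev.exs (illustrative; assumes the NIF uses rustler_precompiled)
import Config

# Force compiling the Rust crate locally instead of fetching a
# precompiled release asset for this app's NIF.
config :rustler_precompiled, :force_build, iree_tokenizers: true
```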

Example

{:ok, tokenizer} = IREE.Tokenizers.Tokenizer.from_file("tokenizer.json")

{:ok, encoding} =
  IREE.Tokenizers.Tokenizer.encode(tokenizer, "Hello world", add_special_tokens: false)

encoding.ids

{:ok, text} =
  IREE.Tokenizers.Tokenizer.decode(tokenizer, encoding.ids, skip_special_tokens: false)
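Like most Elixir APIs that return tagged tuples, load failures can be matched explicitly. The exact error shape is an assumption (a standard `{:error, reason}` tuple), not something documented above:

```elixir
# Hypothetical error handling; assumes the conventional {:error, reason} shape.
case IREE.Tokenizers.Tokenizer.from_file("missing.json") do
  {:ok, tokenizer} -> tokenizer
  {:error, reason} -> raise "failed to load tokenizer: #{inspect(reason)}"
end
```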

You can also load directly from the Hugging Face Hub:

{:ok, tokenizer} = IREE.Tokenizers.Tokenizer.from_pretrained("gpt2")

If you need authentication for gated/private repos:

{:ok, tokenizer} =
  IREE.Tokenizers.Tokenizer.from_pretrained("some/private-model",
    token: System.fetch_env!("HF_TOKEN")
  )

Benchmarks

Current Local Results

The benchmark harness compares this package against the published tokenizers package.

On a recent local GPT-2 batch-of-100 encode run, this package measured 9.4M tokens/sec, versus the 10.1M tokens/sec the IREE tokenizer author reports in the upstream post. That gap of roughly 7% is small enough to be unsurprising and does not, by itself, indicate a correctness problem.

The important result is that the implementation remains in the same performance class and preserves the expected large speedup over the Elixir tokenizers package.

Model latency comparison

The current checked-in local snapshot from bench/results/model_matrix.md contains:

| Model | Repo used | Tokenizers package (ms) | IREE oneshot / stream (ms) | Speedup |
| --- | --- | --- | --- | --- |
| LiquidAI/LFM2.5-1.2B-Instruct | LiquidAI/LFM2.5-1.2B-Instruct | 64.0 | 4.68 / 4.77 | 13.7x / 13.4x |
| Qwen/Qwen3.5-9B | Qwen/Qwen3.5-9B | 70.2 | 4.93 / 11.3 | 14.2x / 6.2x |
| zai-org/GLM-5.1 | zai-org/GLM-5.1 | 63.1 | 4.74 / 5.59 | 13.3x / 11.3x |
| mistralai/Ministral-3-3B-Reasoning-2512 | mistralai/Ministral-3-3B-Reasoning-2512 | 63.0 | 4.69 / 5.66 | 13.4x / 11.1x |
| google/gemma-4-31B-it | google/gemma-4-31B-it | 20.1 | 3.39 / 3.81 | 5.9x / 5.3x |
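The speedup column is simply the tokenizers-package latency divided by the corresponding IREE latency. Checking the first row's numbers:

```elixir
# Speedup = baseline latency / IREE latency, rounded to one decimal place.
# Values taken from the first table row above (LiquidAI/LFM2.5-1.2B-Instruct).
oneshot = Float.round(64.0 / 4.68, 1)
stream = Float.round(64.0 / 4.77, 1)
IO.inspect({oneshot, stream})
```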

The benchmark harness intentionally keeps only one representative repo per tokenizer family when multiple model variants share the same tokenizer. The current family-level matrix targets the five repos listed in the table above.

Latency chart:

Model matrix latency

Speedup chart:

Model matrix speedup

Benchmark Harness

The benchmark harness lives under bench/.

Set it up once:

cd bench
mix deps.get

Run the generic encode/decode comparison:

mix run compare.exs

Generate the multi-model latency/speedup graphs:

mix run model_matrix_graphs.exs

Limit the multi-model run to a single model while iterating:

MODEL_FILTER="Qwen/Qwen3.5-9B" mix run model_matrix_graphs.exs

You can also target the latest GLM run specifically:

MODEL_FILTER="zai-org/GLM-5.1" mix run model_matrix_graphs.exs

All benchmark outputs are written to bench/results/.

If any benchmark target requires authentication, set HF_TOKEN before running the script:

HF_TOKEN=... mix run model_matrix_graphs.exs

Vendored IREE Bundle

The native crate builds against a curated vendored source bundle under native/iree_tokenizers_native/vendor/iree_tokenizer_src.

The vendored bundle is pinned to the IREE commit recorded in native/iree_tokenizers_native/vendor/IREE_COMMIT.

To refresh that bundle from the pinned upstream IREE checkout:

scripts/update_iree_bundle.sh /path/to/iree
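Conceptually, a refresh does two things: record the upstream commit for the pin, then snapshot the tracked tokenizer sources into the vendor directory. A minimal sketch of that flow, with a throwaway repo standing in for the IREE checkout (function name, layout, and commands are illustrative, not the actual contents of scripts/update_iree_bundle.sh):

```shell
#!/usr/bin/env sh
set -eu

# Illustrative sketch only; the real scripts/update_iree_bundle.sh may differ.
refresh_bundle() {
  checkout="$1"     # path to the upstream IREE checkout
  vendor_root="$2"  # e.g. native/iree_tokenizers_native/vendor

  # Pin: record the exact upstream commit the bundle was taken from.
  git -C "$checkout" rev-parse HEAD > "$vendor_root/IREE_COMMIT"

  # Snapshot only tracked sources (no .git) into the bundle directory.
  rm -rf "$vendor_root/iree_tokenizer_src"
  mkdir -p "$vendor_root/iree_tokenizer_src"
  git -C "$checkout" archive HEAD | tar -x -C "$vendor_root/iree_tokenizer_src"
}

# Demo against a throwaway repo standing in for the IREE checkout.
tmp=$(mktemp -d)
git init -q "$tmp/iree"
echo "// tokenizer source" > "$tmp/iree/tokenizer.c"
git -C "$tmp/iree" add -A
git -C "$tmp/iree" -c user.name=demo -c user.email=demo@example.com commit -qm "pin"
mkdir -p "$tmp/vendor"
refresh_bundle "$tmp/iree" "$tmp/vendor"
```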