FastestTiktoken
Fast OpenAI-compatible tokenization for Elixir.
FastestTiktoken is a Rustler-backed Elixir library built on the
high-performance pure-Rust tiktoken crate.
It is designed for projects that need exact OpenAI tokenizer behavior without
depending on older wrappers around tiktoken-rs.
Full public-behavior parity is tested against official OpenAI
tiktoken 0.13.0 for the OpenAI
encodings and API surfaces exposed here: model mapping, GPT-2/r50k fixtures,
regex edge cases, roundtrips, special-token handling, o200k_harmony, large
inputs, and batch helpers.
Installation
Add fastest_tiktoken to your dependencies:
def deps do
  [
    {:fastest_tiktoken, "~> 0.1.1"}
  ]
end

Then fetch and compile:
mix deps.get
mix compile

Published releases use precompiled NIFs from GitHub Releases. Local source builds require Rust 1.94 or newer.
Quick Start
Count tokens by OpenAI model:
iex> FastestTiktoken.count_tokens("hello world", model: "gpt-4o")
{:ok, 2}

Encode and decode text:
{:ok, tokens} = FastestTiktoken.encode("hello world", model: "gpt-4o")
#=> {:ok, [24912, 2375]}
FastestTiktoken.decode(tokens, model: "gpt-4o")
#=> {:ok, "hello world"}

Use an explicit encoding:
FastestTiktoken.encode("hello world", encoding: :cl100k_base)
#=> {:ok, [15339, 1917]}
Resolve GPT OSS models through the official o200k_harmony mapping:
FastestTiktoken.encoding_for_model("gpt-oss-120b")
#=> {:ok, "o200k_harmony"}
FastestTiktoken.encode("<|start|>hello<|end|>",
  model: "gpt-oss-120b",
  allowed_special: :all
)
#=> {:ok, [200006, 24912, 200007]}

Batch encode and decode:
{:ok, batch} =
  FastestTiktoken.encode_batch(["hello world", "goodbye world"],
    encoding: :cl100k_base
  )

FastestTiktoken.decode_batch(batch, encoding: :cl100k_base)
#=> {:ok, ["hello world", "goodbye world"]}

Handle special tokens explicitly:
FastestTiktoken.encode("hello <|endoftext|>",
  encoding: :cl100k_base,
  allowed_special: :all
)
#=> {:ok, [15339, 220, 100257]}

FastestTiktoken.encode("hello <|endoftext|>",
  encoding: :cl100k_base,
  allowed_special: ["<|endoftext|>"]
)
#=> {:ok, [15339, 220, 100257]}
By default, special token strings are treated as ordinary text. That matches
encode_ordinary semantics and keeps count_tokens/2 on the Rust crate's
zero-allocation count path.
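A minimal sketch of that default, assuming fastest_tiktoken 0.1.x is available. Mix.install/2 (Elixir 1.12+) is used only so the snippet runs as a standalone script; in a Mix project the deps/0 entry is enough. It encodes the literal <|endoftext|> string with and without allowed_special: :all using the gpt-4o mapping. The exact ordinary-text token IDs are not assumed, and whether allowed_special works with model: "gpt-4o" in the same way as with the models shown above is also an assumption.

```elixir
# Standalone script only; inside a Mix project, rely on deps/0 instead.
Mix.install([{:fastest_tiktoken, "~> 0.1.1"}])

text = "<|endoftext|>"

# Default: the special-token string is encoded as ordinary text
# (encode_ordinary semantics), so it splits into several tokens.
{:ok, ordinary} = FastestTiktoken.encode(text, model: "gpt-4o")

# Opt in: the same string becomes a single special token.
{:ok, special} = FastestTiktoken.encode(text, model: "gpt-4o", allowed_special: :all)

# count_tokens/2 stays on the default, ordinary-text path.
{:ok, count} = FastestTiktoken.count_tokens(text, model: "gpt-4o")

IO.inspect(ordinary, label: "ordinary")
IO.inspect(special, label: "special")
IO.inspect(count, label: "count")
```

Keeping the default conservative means user-supplied text cannot inject control tokens unless the caller opts in explicitly.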
Why FastestTiktoken
- Uses the faster pure-Rust tiktoken crate instead of the older tiktoken-rs
  bindings that other Elixir tokenizer wrappers depend on.
- Keeps a small Elixir API with explicit {:ok, value}/{:error, reason} return values.
- Supports RustlerPrecompiled artifacts, so production installs do not need a Rust toolchain.
- Parity-tested against official OpenAI tiktoken 0.13.0, including the
  o200k_harmony special-token table used by GPT OSS models.
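Because every call returns {:ok, value} or {:error, reason}, results compose naturally with Elixir's with. The sketch below defines a hypothetical PromptBudget module (not part of FastestTiktoken) that checks a prompt against a token limit. It passes any library error through unchanged and makes no assumption about the shape of the library's error reasons. Mix.install/2 is used only so the snippet runs as a standalone script.

```elixir
# Standalone script only; inside a Mix project, rely on deps/0 instead.
Mix.install([{:fastest_tiktoken, "~> 0.1.1"}])

defmodule PromptBudget do
  # Hypothetical helper, not part of FastestTiktoken.
  # Returns {:ok, count} when the prompt fits within max_tokens,
  # {:error, {:over_budget, count, max_tokens}} when it does not,
  # and any {:error, reason} from count_tokens/2 unchanged.
  def check(prompt, model, max_tokens) do
    with {:ok, count} <- FastestTiktoken.count_tokens(prompt, model: model) do
      if count <= max_tokens do
        {:ok, count}
      else
        {:error, {:over_budget, count, max_tokens}}
      end
    end
  end
end

PromptBudget.check("hello world", "gpt-4o", 10)
#=> {:ok, 2}

PromptBudget.check("hello world", "gpt-4o", 1)
#=> {:error, {:over_budget, 2, 1}}
```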
More Documentation
Source Builds
To force a local Rust build instead of using a precompiled NIF:
FASTEST_TIKTOKEN_BUILD=1 mix test

Source builds require Rust 1.94 or newer, as declared by the native crate.