FastestTiktoken

Fast OpenAI-compatible tokenization for Elixir.

FastestTiktoken is a Rustler-backed Elixir library built on the high-performance pure-Rust tiktoken crate. It is designed for projects that need exact OpenAI tokenizer behavior without depending on older wrappers around tiktoken-rs.

Full public-behavior parity is tested against the official OpenAI tiktoken 0.13.0 for the encodings and API surfaces exposed here: model mapping, GPT-2/r50k fixtures, regex edge cases, roundtrips, special-token handling, o200k_harmony, large inputs, and batch helpers.

Installation

Add fastest_tiktoken to your dependencies:

def deps do
  [
    {:fastest_tiktoken, "~> 0.1.1"}
  ]
end

Then fetch and compile:

mix deps.get
mix compile

Published releases use precompiled NIFs from GitHub Releases. Local source builds require Rust 1.94 or newer.

Quick Start

Count tokens by OpenAI model:

iex> FastestTiktoken.count_tokens("hello world", model: "gpt-4o")
{:ok, 2}
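
A common use of the count is checking a prompt against a token budget. The following is a minimal sketch, not part of the library: the TokenBudget module and its within_budget function are hypothetical names, and it assumes that failures from count_tokens/2 come back as some non-{:ok, _} value (for example {:error, reason}), which is passed through unchanged. The counter is injectable so the helper can be exercised with a stub.

```elixir
defmodule TokenBudget do
  # Hypothetical helper for illustration; not part of FastestTiktoken.

  # Returns {:ok, true} when `text` fits within `max_tokens`, {:ok, false}
  # when it does not. `counter` defaults to FastestTiktoken.count_tokens/2
  # and may be replaced with any 2-arity function returning {:ok, integer}.
  def within_budget(
        text,
        max_tokens,
        opts \\ [model: "gpt-4o"],
        counter \\ &FastestTiktoken.count_tokens/2
      ) do
    case counter.(text, opts) do
      {:ok, count} when is_integer(count) ->
        {:ok, count <= max_tokens}

      # Assumption: anything else is an error value; return it as-is.
      other ->
        other
    end
  end
end
```

With the library loaded, TokenBudget.within_budget("hello world", 8) would count with the gpt-4o mapping shown above and return {:ok, true}, since that text is 2 tokens.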

Encode and decode text:

{:ok, tokens} = FastestTiktoken.encode("hello world", model: "gpt-4o")
#=> {:ok, [24912, 2375]}

FastestTiktoken.decode(tokens, model: "gpt-4o")
#=> {:ok, "hello world"}

Use an explicit encoding:

FastestTiktoken.encode("hello world", encoding: :cl100k_base)
#=> {:ok, [15339, 1917]}

Resolve GPT OSS models through the official o200k_harmony mapping:

FastestTiktoken.encoding_for_model("gpt-oss-120b")
#=> {:ok, "o200k_harmony"}

FastestTiktoken.encode("<|start|>hello<|end|>",
  model: "gpt-oss-120b",
  allowed_special: :all
)
#=> {:ok, [200006, 24912, 200007]}

Batch encode and decode:

{:ok, batch} =
  FastestTiktoken.encode_batch(["hello world", "goodbye world"],
    encoding: :cl100k_base
  )

FastestTiktoken.decode_batch(batch, encoding: :cl100k_base)
#=> {:ok, ["hello world", "goodbye world"]}

Handle special tokens explicitly:

FastestTiktoken.encode("hello <|endoftext|>",
  encoding: :cl100k_base,
  allowed_special: :all
)
#=> {:ok, [15339, 220, 100257]}

FastestTiktoken.encode("hello <|endoftext|>",
  encoding: :cl100k_base,
  allowed_special: ["<|endoftext|>"]
)
#=> {:ok, [15339, 220, 100257]}

By default, special token strings are treated as ordinary text. That matches encode_ordinary semantics and keeps count_tokens/2 on the Rust crate's zero-allocation count path.

Why FastestTiktoken

More Documentation

Source Builds

To force a local Rust build instead of using a precompiled NIF:

FASTEST_TIKTOKEN_BUILD=1 mix test

Source builds require Rust 1.94 or newer, as declared by the native crate.