# onyx
Erlang NIF library for ONNX model inference, powered by tract — a pure Rust ONNX runtime with no external dependencies.
- Zero external dependencies — tract is statically linked; a single `priv/onyx.dll` ships in the hex package
- No Rust toolchain required — pre-compiled NIF bundled, `rebar3 compile` just works
- BEAM-safe — inference runs on dirty CPU schedulers; all Rust panics are caught and returned as Erlang error tuples
- Session-based API — `load/1` compiles and optimises the model once, `run/2` executes it repeatedly with zero re-compilation overhead
- Explicit lifecycle control — `unload/1` immediately invalidates a session; GC also reclaims sessions automatically
- sied-compatible binary format — tensors are little-endian packed binaries, the same convention used by sied and kvex
## Ecosystem
onyx is part of a pure-Erlang ML stack:
- sied 0.2.4 — SIMD kernels: POPCNT, dot-product, L2-norm, 1-bit quantization
- onyx 0.1.0 — ONNX inference: load any ONNX model, run it on the BEAM
- kvex 0.2.1 — approximate nearest-neighbour index with persistence

Typical pipeline: tokenize text externally → onyx generates embeddings → kvex performs ANN search.
## Installation
```erlang
%% rebar.config
{deps, [{onyx, "0.1.0"}]}.
```

No Rust toolchain is required at compile time.
## Quick start
```erlang
%% Load and compile the model (runs on a DirtyIO scheduler, ~100ms–1s)
{ok, Model} = onyx:load("sentence-transformer.onnx"),

%% Inspect what inputs the model expects
#{inputs := Inputs, outputs := Outputs} = Model,
%% Inputs  = [{<<"input_ids">>, [1, 32], i32}, {<<"attention_mask">>, [1, 32], i32}]
%% Outputs = [{<<"sentence_embedding">>, [1, 384], f32}]

%% Build input tensors — little-endian packed binaries
IdsBin = << <<Id:32/signed-little>> || Id <- TokenIds >>,
MaskBin = << <<M:32/signed-little>> || M <- AttentionMask >>,

%% Run inference (runs on a DirtyCPU scheduler, ~1ms–100ms)
{ok, #{<<"sentence_embedding">> := {EmbBin, [1, 384], f32}}} =
    onyx:run(Model, #{
        <<"input_ids">> => {IdsBin, [1, 32], i32},
        <<"attention_mask">> => {MaskBin, [1, 32], i32}
    }),

%% EmbBin is a 384×4 = 1536-byte little-endian float32 binary
%% Feed directly into kvex — no conversion needed
ok = kvex:add(Index, DocumentId, EmbBin).
```

## API
### load/1
```erlang
-spec load(file:filename()) -> {ok, session()} | {error, term()}.
```

Loads an ONNX model from disk, runs tract's optimiser (constant folding, op fusion), and compiles an execution plan. Accepts both binary and charlist paths.
Runs on a DirtyIO scheduler — will not block normal BEAM schedulers.
### run/2
```erlang
-spec run(session(), #{binary() => tensor()}) ->
    {ok, #{binary() => tensor()}} | {error, term()}.
```
Executes one forward pass. The input argument maps input names to tensors and must contain exactly the inputs the model expects (as reported in the session's `inputs` field). The result is a map of output name to tensor.
Runs on a DirtyCPU scheduler — will not block normal BEAM schedulers.
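Because the session map reports its declared inputs, a correctly shaped input map can be built programmatically — for instance, to smoke-test a freshly loaded model with zero-filled tensors. A minimal sketch in plain Erlang (the helper names are illustrative, not part of the onyx API):

```erlang
%% Bytes per element for each supported dtype (illustrative helper,
%% not part of the onyx API).
ElemSize = fun(f32) -> 4; (f64) -> 8; (i32) -> 4; (i64) -> 8; (u8) -> 1 end,

%% Zero-filled tensor for one {Shape, DType} pair.
ZeroTensor = fun(Shape, DType) ->
    N = lists:foldl(fun erlang:'*'/2, 1, Shape),
    {binary:copy(<<0>>, N * ElemSize(DType)), Shape, DType}
end,

%% One zero tensor per declared input spec, keyed by input name.
ZeroInputs = fun(Specs) ->
    maps:from_list([{Name, ZeroTensor(Shape, DType)}
                    || {Name, Shape, DType} <- Specs])
end,

%% Smoke-test a loaded model with all-zero inputs:
%% {ok, _} = onyx:run(Model, ZeroInputs(maps:get(inputs, Model))).
```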
### unload/1
```erlang
-spec unload(session()) -> ok.
```
Immediately marks the session invalid. Any subsequent `run/2` on this session returns `{error, session_unloaded}`. The underlying model memory is freed when the GC collects the session reference (which may happen slightly later). Calling `unload/1` multiple times is safe.
### tensor/3
```erlang
-spec tensor(binary(), [integer()], dtype()) -> tensor().
```
Constructs a tensor from raw parts. Pure Erlang — no NIF call. Validates that `Data` is a binary and `Shape` is a list.
### to_list/1
```erlang
-spec to_list(tensor()) -> [number()].
```

Decodes a packed binary tensor to a list of Erlang numbers. Intended for debugging and lightweight post-processing. For hot paths, work with the raw binary directly.
Raises `error({dynamic_shape, Shape})` if the shape contains `-1` (dynamic dimension).
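The decoding that `to_list/1` performs can be reproduced with a plain binary comprehension; a sketch covering just the f32 and i32 cases (the real function handles all five dtypes and rejects dynamic shapes):

```erlang
%% Plain-Erlang equivalent of to_list/1 for two dtypes (sketch only).
Decode = fun({Bin, _Shape, f32}) -> [V || <<V:32/float-little>> <= Bin];
            ({Bin, _Shape, i32}) -> [V || <<V:32/signed-little>> <= Bin]
         end,
```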
## Types
```erlang
-type dtype() :: f32 | f64 | i32 | i64 | u8.

%% A tensor is a packed little-endian binary with its shape and element type.
%% This is the same binary convention used by sied and kvex.
-type tensor() :: {Data :: binary(), Shape :: [integer()], DType :: dtype()}.

%% input_spec and output_spec describe the model's declared I/O contract.
-type input_spec()  :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.
-type output_spec() :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.

%% A loaded, compiled model session.
-type session() :: #{
    ref     := reference(),      %% NIF resource handle (ResourceArc)
    inputs  := [input_spec()],   %% model's declared inputs, in order
    outputs := [output_spec()]   %% model's declared outputs, in order
}.
```

## Tensor binary format
Tensors use the same packed little-endian format as sied:
| dtype | bytes per element | Erlang binary pattern |
|---|---|---|
| f32 | 4 | `<<V:32/float-little>>` |
| f64 | 8 | `<<V:64/float-little>>` |
| i32 | 4 | `<<V:32/signed-little>>` |
| i64 | 8 | `<<V:64/signed-little>>` |
| u8 | 1 | `<<V:8>>` |
Shape dimensions may be `-1` in output specs to indicate dynamic (batch) dimensions that vary per inference call.
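These patterns compose with binary comprehensions for bulk encoding and decoding; for example, round-tripping an f32 vector in plain Erlang (no NIF involved):

```erlang
%% Encode: list of floats -> packed little-endian f32 binary
Values = [0.5, -1.25, 3.0],
F32Bin = << <<V:32/float-little>> || V <- Values >>,
12 = byte_size(F32Bin),               %% 3 elements × 4 bytes
Tensor = {F32Bin, [3], f32},          %% the tensor() triple used by onyx

%% Decode: packed binary -> the same list of floats
Values = [V || <<V:32/float-little>> <= F32Bin],
```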
## Error reference
| Error | Cause |
|---|---|
| `{error, bad_file}` | File not found or inaccessible |
| `{error, {load_failed, Reason}}` | Invalid ONNX file, unsupported operators, or model compilation failure |
| `{error, {run_failed, Reason}}` | Shape mismatch, byte count mismatch, dtype error, or runtime failure |
| `{error, {input_not_found, Name}}` | A required input name is missing from the inputs map |
| `{error, session_unloaded}` | Session was explicitly unloaded via `unload/1` |
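All of these shapes pattern-match cleanly, so callers can centralise error handling; a hypothetical helper (not part of onyx) that turns them into log-friendly strings:

```erlang
%% Hypothetical helper: map onyx error tuples to log-friendly strings.
DescribeError =
    fun({error, bad_file}) ->
            "model file not found or inaccessible";
       ({error, {load_failed, Reason}}) ->
            lists:flatten(io_lib:format("load/compile failed: ~p", [Reason]));
       ({error, {run_failed, Reason}}) ->
            lists:flatten(io_lib:format("inference failed: ~p", [Reason]));
       ({error, {input_not_found, Name}}) ->
            lists:flatten(io_lib:format("missing required input: ~s", [Name]));
       ({error, session_unloaded}) ->
            "session already unloaded"
    end,
```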
## How it works
### Session compilation
```
onyx:load("model.onnx")
  → NIF [DirtyIO]
  → tract_onnx::onnx().model_for_path(path)   % parse ONNX protobuf
  → into_optimized()                          % constant folding, op fusion
  → into_runnable()                           % compile execution plan
  → ResourceArc<OnyxSession>                  % BEAM-managed lifetime
  → {ok, #{ref, inputs, outputs}}
```

`into_optimized()` and `into_runnable()` are the expensive steps (10ms–1s depending on model size). The result is a compiled execution plan held in a `ResourceArc` — a reference-counted Rust object managed by the BEAM GC. When no Erlang terms reference the session, the GC automatically frees the compiled model.
### Inference
```
onyx:run(Session, Inputs)
  → NIF [DirtyCPU]
  → check valid flag (AtomicBool)
  → for each model input (in declaration order):
      decode {Binary, Shape, DType} → tract Tensor (validate byte count first)
  → SimplePlan::run(inputs)                   % execute compiled plan
  → for each output tensor:
      encode tract Tensor → {Binary, Shape, DType}
  → {ok, #{name => tensor()}}
```
Input tensors are decoded directly from the Erlang binary payload — no extra heap allocation for the data bytes. The `ResourceArc` keeps the session alive for the duration of `run/2`, even if a concurrent `unload/1` fires mid-inference.
### Scheduler assignment
| Function | Scheduler | Why |
|---|---|---|
| `load/1` | DirtyIO | Disk read + model compilation: 10ms–1s |
| `run/2` | DirtyCPU | Matrix computation: 1ms–100ms |
| `unload/1` | Normal | Atomic flag flip: nanoseconds |
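Because `run/2` lands on dirty CPU schedulers, a single session can be shared by many Erlang processes to run inferences in parallel without starving the normal schedulers. A minimal fan-out sketch — here `RunFun` stands in for something like `fun(In) -> onyx:run(Model, In) end`:

```erlang
%% Fan work across processes; results come back in input order.
Pmap = fun(RunFun, InputMaps) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, RunFun(In)} end),
                Ref
            end || In <- InputMaps],
    %% Collect by reference, preserving the original order.
    [receive {Ref, Result} -> Result end || Ref <- Refs]
end,
```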
## Usage with kvex — semantic search pipeline
```erlang
%% Index a corpus of documents
index_documents(Docs) ->
    {ok, Model} = onyx:load("all-MiniLM-L6-v2.onnx"),
    {ok, Index} = kvex:new(384),
    lists:foreach(fun({DocId, Text}) ->
        {IdsBin, MaskBin} = tokenize(Text, 32),
        {ok, #{<<"sentence_embedding">> := {Emb, _, _}}} =
            onyx:run(Model, #{
                <<"input_ids">> => {IdsBin, [1, 32], i32},
                <<"attention_mask">> => {MaskBin, [1, 32], i32}
            }),
        ok = kvex:add(Index, DocId, Emb)
    end, Docs),
    {ok, Model, Index}.

%% Query the index
search(Model, Index, QueryText, K) ->
    {IdsBin, MaskBin} = tokenize(QueryText, 32),
    {ok, #{<<"sentence_embedding">> := {QueryEmb, _, _}}} =
        onyx:run(Model, #{
            <<"input_ids">> => {IdsBin, [1, 32], i32},
            <<"attention_mask">> => {MaskBin, [1, 32], i32}
        }),
    kvex:search(Index, QueryEmb, K).
```

## Building from source
Requires a stable Rust toolchain (1.70+).
```shell
git clone https://github.com/roquess/onyx
cd onyx
make build    # compiles native/onyx/ and writes priv/onyx.dll
rebar3 ct     # run the test suite
```
The Makefile uses `--manifest-path`, so it runs correctly from any working directory.
## Supported ONNX operators
onyx relies on tract's supported operator set. tract 0.21 covers the operators needed by most embedding and classification models, including:
- All arithmetic and activation ops (Add, Mul, Relu, Sigmoid, Tanh, GELU, Softmax, ...)
- Matrix multiplication (MatMul, Gemm)
- Normalisation (LayerNorm, BatchNorm)
- Attention mechanisms (used in transformer models)
- Convolution, pooling
- Reshape, Transpose, Concat, Slice
Models from Hugging Face (exported with optimum or transformers) and the ONNX Model Zoo generally work out of the box. Exotic custom operators, some recurrent layers, and dynamic control flow may not be supported — `load/1` returns `{error, {load_failed, Reason}}` in those cases.
## Links
- Hex.pm: https://hex.pm/packages/onyx
- GitHub: https://github.com/roquess/onyx
- tract (Rust ONNX runtime): https://github.com/sonos/tract
- sied (SIMD NIFs): https://hex.pm/packages/sied
- kvex (ANN index): https://hex.pm/packages/kvex
## License
Apache License 2.0 — see LICENSE.