onyx


Erlang NIF library for ONNX model inference, powered by tract — a pure Rust ONNX runtime with no external dependencies.

Ecosystem

onyx is part of a pure-Erlang ML stack:

sied  0.2.4  — SIMD kernels: POPCNT, dot-product, L2-norm, 1-bit quantization
onyx  0.1.0  — ONNX inference: load any ONNX model, run it on the BEAM
kvex  0.2.1  — Approximate nearest-neighbour index with persistence

Typical pipeline: tokenize text externally → onyx generates embeddings → kvex performs ANN search.

Installation

%% rebar.config
{deps, [{onyx, "0.1.0"}]}.

No Rust toolchain required at compile time.

Quick start

%% Load and compile the model (runs on a DirtyIO scheduler, ~100ms–1s)
{ok, Model} = onyx:load("sentence-transformer.onnx"),

%% Inspect what inputs the model expects
#{inputs := Inputs, outputs := Outputs} = Model,
%% Inputs  = [{<<"input_ids">>, [1, 32], i32}, {<<"attention_mask">>, [1, 32], i32}]
%% Outputs = [{<<"sentence_embedding">>, [1, 384], f32}]

%% Build input tensors — little-endian packed binaries
IdsBin  = << <<Id:32/signed-little>>  || Id  <- TokenIds >>,
MaskBin = << <<M:32/signed-little>>   || M   <- AttentionMask >>,

%% Run inference (runs on a DirtyCPU scheduler, ~1ms–100ms)
{ok, #{<<"sentence_embedding">> := {EmbBin, [1, 384], f32}}} =
    onyx:run(Model, #{
        <<"input_ids">>      => {IdsBin,  [1, 32], i32},
        <<"attention_mask">> => {MaskBin, [1, 32], i32}
    }),

%% EmbBin is a 384×4 = 1536-byte little-endian float32 binary
%% Feed directly into kvex — no conversion needed
ok = kvex:add(Index, DocumentId, EmbBin).

API

load/1

-spec load(file:filename()) -> {ok, session()} | {error, term()}.

Loads an ONNX model from disk, runs tract's optimiser (constant folding, op fusion), and compiles an execution plan. Accepts both binary and charlist paths.

Runs on a DirtyIO scheduler — will not block normal BEAM schedulers.
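Both path flavours behave the same; a quick sketch (the file names are placeholders):

```erlang
%% Binary and charlist paths are both accepted:
{ok, ModelA} = onyx:load(<<"model.onnx">>),   %% binary path
{ok, ModelB} = onyx:load("model.onnx"),       %% charlist path

%% A nonexistent path fails fast:
{error, bad_file} = onyx:load("no-such-file.onnx").
```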

run/2

-spec run(session(), #{binary() => tensor()}) ->
        {ok, #{binary() => tensor()}} | {error, term()}.

Executes one forward pass. Inputs is a map from input name to tensor and must contain exactly the inputs the model expects, as listed in the inputs field of the session map. On success, returns a map from output name to tensor.

Runs on a DirtyCPU scheduler — will not block normal BEAM schedulers.
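Because the session map carries the declared input specs in order, the inputs map can be built generically. A small helper sketch (build_inputs/2 is ours, not part of the API; Bins must be in declaration order):

```erlang
%% Zip the session's declared input specs with pre-packed binaries
%% and produce the map that onyx:run/2 expects.
build_inputs(#{inputs := Specs}, Bins) ->
    maps:from_list(
        [{Name, {Bin, Shape, DType}}
         || {{Name, Shape, DType}, Bin} <- lists:zip(Specs, Bins)]).
```

Usage: onyx:run(Model, build_inputs(Model, [IdsBin, MaskBin])).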

unload/1

-spec unload(session()) -> ok.

Immediately marks the session invalid. Any subsequent run/2 on this session returns {error, session_unloaded}. The underlying model memory is freed when the GC collects the session reference (which may happen slightly later). Calling unload/1 multiple times is safe.
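A sketch of the lifecycle (assumes a valid model file):

```erlang
{ok, Model} = onyx:load("model.onnx"),
ok = onyx:unload(Model),
ok = onyx:unload(Model),                      %% repeated unload is safe
%% The valid flag is checked before the inputs, so any run/2 now fails:
{error, session_unloaded} = onyx:run(Model, #{}).
```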

tensor/3

-spec tensor(binary(), [integer()], dtype()) -> tensor().

Constructs a tensor from raw parts. Pure Erlang — no NIF call. Validates that Data is a binary and Shape is a list.
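For example (values chosen arbitrarily):

```erlang
%% Pack three f32 values and wrap them as a [3]-shaped tensor:
Bin = << <<V:32/float-little>> || V <- [1.0, 2.0, 3.0] >>,
{Bin, [3], f32} = onyx:tensor(Bin, [3], f32).
```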

to_list/1

-spec to_list(tensor()) -> [number()].

Decodes a packed binary tensor to a list of Erlang numbers. Intended for debugging and lightweight post-processing. For hot paths, work with the raw binary directly.

Raises error({dynamic_shape, Shape}) if the shape contains -1 (dynamic dimension).
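A round-trip sketch (the values are exactly representable in f32, so the match is exact):

```erlang
Bin = << <<V:32/float-little>> || V <- [0.5, -1.0, 2.0] >>,
[0.5, -1.0, 2.0] = onyx:to_list({Bin, [3], f32}).

%% A dynamic dimension raises:
%% onyx:to_list({Bin, [-1, 3], f32}) => error({dynamic_shape, [-1, 3]})
```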

Types

-type dtype() :: f32 | f64 | i32 | i64 | u8.

%% A tensor is a packed little-endian binary with its shape and element type.
%% This is the same binary convention used by sied and kvex.
-type tensor() :: {Data :: binary(), Shape :: [integer()], DType :: dtype()}.

%% input_spec and output_spec describe the model's declared I/O contract.
-type input_spec()  :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.
-type output_spec() :: {Name :: binary(), Shape :: [integer()], DType :: dtype()}.

%% A loaded, compiled model session.
-type session() :: #{
    ref     := reference(),        %% NIF resource handle (ResourceArc)
    inputs  := [input_spec()],     %% model's declared inputs, in order
    outputs := [output_spec()]     %% model&#39;s declared outputs, in order
}.

Tensor binary format

Tensors use the same packed little-endian format as sied:

dtype   bytes per element   Erlang binary pattern
f32     4                   <<V:32/float-little>>
f64     8                   <<V:64/float-little>>
i32     4                   <<V:32/signed-little>>
i64     8                   <<V:64/signed-little>>
u8      1                   <<V:8>>

Shape dimensions may be -1 in output specs to indicate dynamic (batch) dimensions that vary per inference call.
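The format round-trips with plain binary comprehensions, so hot paths can decode or re-encode without onyx helpers. For f32 (EmbBin as in the quick start; for ordinary non-NaN values the re-encoded binary is byte-identical):

```erlang
%% Decode a packed f32 binary into a list of Erlang floats:
Floats = [F || <<F:32/float-little>> <= EmbBin],
%% ...and pack it back:
EmbBin = << <<F:32/float-little>> || F <- Floats >>.
```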

Error reference

Error                              Cause
{error, bad_file}                  File not found or inaccessible
{error, {load_failed, Reason}}     Invalid ONNX file, unsupported operators, or model compilation failure
{error, {run_failed, Reason}}      Shape mismatch, byte count mismatch, dtype error, or runtime failure
{error, {input_not_found, Name}}   A required input name is missing from the inputs map
{error, session_unloaded}          Session was explicitly unloaded via unload/1
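A typical call site matches the cases it can act on. A sketch (safe_run/2 and the logging are ours, not part of the API):

```erlang
safe_run(Model, Inputs) ->
    case onyx:run(Model, Inputs) of
        {ok, Outputs} ->
            {ok, Outputs};
        {error, session_unloaded} = E ->
            %% Session gone: the caller should reload and retry.
            E;
        {error, {input_not_found, Name}} = E ->
            logger:error("missing model input: ~s", [Name]),
            E;
        {error, Reason} = E ->
            logger:error("inference failed: ~p", [Reason]),
            E
    end.
```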

How it works

Session compilation

onyx:load("model.onnx")
  → NIF [DirtyIO]
  → tract_onnx::onnx().model_for_path(path)   % parse ONNX protobuf
  → into_optimized()                            % constant folding, op fusion
  → into_runnable()                             % compile execution plan
  → ResourceArc<OnyxSession>                   % BEAM-managed lifetime
  → {ok, #{ref, inputs, outputs}}

into_optimized() and into_runnable() are the expensive steps (10ms–1s depending on model size). The result is a compiled execution plan held in a ResourceArc — a reference-counted Rust object managed by the BEAM GC. When no Erlang terms reference the session, the GC automatically frees the compiled model.

Inference

onyx:run(Session, Inputs)
  → NIF [DirtyCPU]
  → check valid flag (AtomicBool)
  → for each model input (in declaration order):
      decode {Binary, Shape, DType} → tract Tensor (validate byte count first)
  → SimplePlan::run(inputs)                     % execute compiled plan
  → for each output tensor:
      encode tract Tensor → {Binary, Shape, DType}
  → {ok, #{name => tensor()}}

Input tensors are decoded directly from the Erlang binary payload — no extra heap allocation for the data bytes. The ResourceArc keeps the session alive for the duration of run/2, even if a concurrent unload/1 fires mid-inference.
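Since run/2 is safe to call concurrently while the ResourceArc pins the session, one process can own the session and hand it out to many callers. A minimal sketch (the module is hypothetical, not part of onyx):

```erlang
-module(onyx_server).
-behaviour(gen_server).
-export([start_link/1, session/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link(Path) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Path, []).

%% Fetch the shared session; callers then invoke onyx:run/2 directly,
%% so inference never serialises through this process.
session() ->
    gen_server:call(?MODULE, session).

init(Path) ->
    {ok, Model} = onyx:load(Path),
    {ok, Model}.

handle_call(session, _From, Model) ->
    {reply, Model, Model}.

handle_cast(_Msg, Model) ->
    {noreply, Model}.

terminate(_Reason, Model) ->
    onyx:unload(Model).
```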

Scheduler assignment

Function   Scheduler   Why
load/1     DirtyIO     Disk read + model compilation: 10ms–1s
run/2      DirtyCPU    Matrix computation: 1ms–100ms
unload/1   Normal      Atomic flag flip: nanoseconds

Usage with kvex — semantic search pipeline

%% Index a corpus of documents
index_documents(Docs) ->
    {ok, Model} = onyx:load("all-MiniLM-L6-v2.onnx"),
    {ok, Index} = kvex:new(384),
    lists:foreach(fun({DocId, Text}) ->
        {IdsBin, MaskBin} = tokenize(Text, 32),
        {ok, #{<<"sentence_embedding">> := {Emb, _, _}}} =
            onyx:run(Model, #{
                <<"input_ids">>      => {IdsBin,  [1, 32], i32},
                <<"attention_mask">> => {MaskBin, [1, 32], i32}
            }),
        ok = kvex:add(Index, DocId, Emb)
    end, Docs),
    {ok, Model, Index}.

%% Query the index
search(Model, Index, QueryText, K) ->
    {IdsBin, MaskBin} = tokenize(QueryText, 32),
    {ok, #{<<"sentence_embedding">> := {QueryEmb, _, _}}} =
        onyx:run(Model, #{
            <<"input_ids">>      => {IdsBin,  [1, 32], i32},
            <<"attention_mask">> => {MaskBin, [1, 32], i32}
        }),
    kvex:search(Index, QueryEmb, K).

Building from source

Requires a Rust stable toolchain (1.70+).

git clone https://github.com/roquess/onyx
cd onyx
make build      # compiles native/onyx/ and writes priv/onyx.dll
rebar3 ct       # run test suite

The Makefile uses --manifest-path so it runs correctly from any working directory.

Supported ONNX operators

onyx relies on tract's supported operator set. tract 0.21 covers the operators needed by most embedding and classification models.

Models from Hugging Face (exported with optimum or transformers) and ONNX Model Zoo generally work out of the box. Exotic custom operators, some recurrent layers, and dynamic control flow may not be supported — load/1 will return {error, {load_failed, Reason}} in those cases.

License

Apache License 2.0 — see LICENSE.