# Single Instruction Erlang Data (sied)
High-performance SIMD operations for Erlang through Rust NIFs.
## Motivation
Erlang excels at building concurrent, fault-tolerant systems, but numerical computations on large datasets can be a bottleneck. Modern CPUs offer SIMD (Single Instruction, Multiple Data) instructions that can process multiple data elements simultaneously, providing significant performance improvements for vectorized operations.
sied bridges this gap by exposing SIMD-accelerated mathematical operations to Erlang through a Rust NIF. The name combines SIMD with Erlang and Data, representing the library's core purpose: bringing efficient data-parallel processing to the Erlang ecosystem.
## Features
- Automatic SIMD optimization: Compiler leverages AVX2, AVX-512, NEON, and other instruction sets via simdeez
- Flat-binary search primitives: `hamming_topk_flat/4` and `dot_product_topk_flat/4` operate on concatenated binary buffers — zero per-element Erlang overhead, designed for ANN search
- Simple API: Consistent `{ok, Result} | {error, Reason}` interface throughout
- Cross-platform: Works across different CPU architectures with graceful scalar fallback
## Installation
Add to your rebar.config:
```erlang
{deps, [{sied, "0.2.0"}]}.
```

Or from GitHub:
```erlang
{deps, [
    {sied, {git, "https://github.com/roquess/sied.git", {branch, "main"}}}
]}.
```

## Building
```sh
rebar3 compile
```

Requirements:
- Erlang/OTP 24 or later
- Rust 1.70 or later (Cargo included)
The Rust toolchain must be available in your PATH. Visit rustup.rs to install Rust.
For maximum performance (full AVX2/AVX-512), build with:
```sh
RUSTFLAGS="-C target-cpu=native" rebar3 as prod compile
```

## API Reference
All functions return {ok, Result} on success or {error, Reason} on failure.
### Basic Arithmetic
Element-wise operations on f32 or f64 vectors.
```erlang
{ok, Result} = sied:add_f32([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]).
%% Result = [5.0, 7.0, 9.0]

{ok, Result} = sied:multiply_f64([2.0, 3.0], [4.0, 5.0]).
%% Result = [8.0, 15.0]
```
Available: add_f32/2, add_f64/2, subtract_f32/2, subtract_f64/2,
multiply_f32/2, multiply_f64/2, divide_f32/2, divide_f64/2.
### Dot Product & Sum
```erlang
{ok, Dot} = sied:dot_product_f32([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]).
%% Dot = 32.0

{ok, Sum} = sied:sum_f32([1.0, 2.0, 3.0, 4.0, 5.0]).
%% Sum = 15.0
```

### Statistics
```erlang
{ok, Mean} = sied:mean_f32([1.0, 2.0, 3.0]).
{ok, Var} = sied:variance_f32([1.0, 2.0, 3.0]).
{ok, Std} = sied:std_dev_f32([1.0, 2.0, 3.0]).
```

Also available in f64 variants.
### Min / Max
```erlang
{ok, Min} = sied:min_f32([3.0, 1.0, 2.0]).                   %% 1.0
{ok, Max} = sied:max_f32([3.0, 1.0, 2.0]).                   %% 3.0
{ok, R} = sied:min_elementwise_f32([1.0, 4.0], [2.0, 3.0]).  %% [1.0, 3.0]
```

### Unary Operations
```erlang
{ok, R} = sied:abs_f32([-1.0, 2.0, -3.0]).  %% [1.0, 2.0, 3.0]
{ok, R} = sied:sqrt_f32([4.0, 9.0, 16.0]).  %% [2.0, 3.0, 4.0]
{ok, R} = sied:negate_f32([1.0, -2.0]).     %% [-1.0, 2.0]
```

### L2 Norm and Normalization
```erlang
{ok, Norm} = sied:l2_norm_f32([3.0, 4.0]).       %% 5.0
{ok, Unit} = sied:l2_normalize_f32([3.0, 4.0]).  %% [0.6, 0.8]
{ok, Vecs} = sied:l2_normalize_batch_f32([[3.0, 4.0], [0.0, 2.0]]).
```

### Cosine Similarity
```erlang
{ok, Sim} = sied:cosine_similarity_f32([1.0, 0.0], [0.0, 1.0]).  %% 0.0
{ok, Sims} = sied:cosine_similarity_batch_f32(Query, [Vec1, Vec2, Vec3]).
```

### Batch Dot Product
One query against many vectors in a single NIF call.
```erlang
{ok, Scores} = sied:dot_product_batch_f32(Query, [Vec1, Vec2, Vec3]).

%% Binary variant — avoids float-list marshalling when vectors are stored
%% as little-endian f32 binaries (e.g. in ETS):
{ok, Scores} = sied:dot_product_batch_f32_bin(QBin, [Bin1, Bin2, Bin3]).
```
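When vectors start out as Erlang float lists, the little-endian f32 layout expected by the `_bin` variant can be built with a bit-syntax comprehension. A minimal sketch (variable names are illustrative, not part of sied's API):

```erlang
%% Sketch: pack a float list into a little-endian f32 binary.
Query1 = [1.0, 2.0, 3.0].
QBin1 = << <<X:32/little-float>> || X <- Query1 >>.
%% byte_size(QBin1) =:= length(Query1) * 4
{ok, Scores1} = sied:dot_product_batch_f32_bin(QBin1, [QBin1]).
%% Scores1 = [14.0]  (one score per candidate binary)
```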
### Binary Quantization & Flat-Buffer ANN Search (v0.2.0)

These primitives are designed for two-phase approximate nearest-neighbour search on large indexes where the entire vector set is stored as a single concatenated binary (flat buffer) in ETS.
#### to_binary_f32/1 and to_binary_f32_bin/1
1-bit quantize a vector: each dimension becomes 1 if above the mean, else 0. 128 dims → 16 bytes.
```erlang
{ok, BinVec} = sied:to_binary_f32([0.1, 0.9, 0.4, 0.8]).
%% BinVec = <<2#0101:4, ...>> (packed bits)

%% Zero-copy variant when the vector is already a little-endian f32 binary:
{ok, BinVec} = sied:to_binary_f32_bin(F32Binary).
```
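The thresholding rule is easy to restate in plain Erlang; the following reference sketch (not part of sied) reproduces the 0/1 pattern of the example above as an unpacked list:

```erlang
%% Reference sketch of the quantization rule: a dimension maps to 1
%% when it is above the vector's mean, otherwise 0.
Vec = [0.1, 0.9, 0.4, 0.8].
Mean = lists:sum(Vec) / length(Vec).   %% 0.55
Bits = [case X > Mean of true -> 1; false -> 0 end || X <- Vec].
%% Bits = [0, 1, 0, 1]
```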
#### hamming_topk_flat/4

SIMD POPCNT over a flat binary buffer. Returns the indices of the TopK closest vectors, sorted ascending by Hamming distance. O(N) + O(K log K).
```erlang
%% BvecFlat = all binary-quantized vectors concatenated
%% VecLen   = byte size of one quantized vector
{ok, Indices} = sied:hamming_topk_flat(QBinVec, BvecFlat, VecLen, TopK).
%% Indices = [3, 17, 42, ...] (0-based, sorted by distance)
```

Uses a max-heap of size K — O(K) memory regardless of corpus size.
#### dot_product_topk_flat/4
SIMD dot-product scoring of a candidate set selected from a flat f32 buffer. Designed to follow hamming_topk_flat/4 in the two-phase search pipeline.
```erlang
%% F32Flat    = all f32 vectors concatenated
%% VecByteLen = dim * 4
%% Indices    = output of hamming_topk_flat/4
{ok, Scored} = sied:dot_product_topk_flat(QF32Bin, F32Flat, VecByteLen, Indices).
%% Scored = [{Score, Idx}, ...] sorted by descending score
```

Uses zero-copy f32 reinterpretation — no heap allocation per candidate.
#### Two-phase ANN example
```erlang
%% VecByteLen    = bytes per binary-quantized vector
%% VecF32ByteLen = bytes per f32 vector (dim * 4)

%% Phase 1 — fast Hamming filter
CandCount = K * 10,
{ok, Cands} = sied:hamming_topk_flat(QBin, BvecFlat, VecByteLen, CandCount),

%% Phase 2 — precise dot-product rerank
{ok, Scored} = sied:dot_product_topk_flat(QF32Bin, F32Flat, VecF32ByteLen, Cands),
TopK = lists:sublist(Scored, K).
```

See kvex for a complete k-NN index built on these primitives.
## Error Handling
Binary operations require vectors of equal length:
```erlang
case sied:add_f32([1.0, 2.0], [3.0]) of
    {ok, Result} -> io:format("Success: ~p~n", [Result]);
    {error, length_mismatch} -> io:format("vectors must have equal length~n")
end.
```

## Testing
```sh
rebar3 eunit
```

## Performance
The flat-buffer primitives (hamming_topk_flat, dot_product_topk_flat) are designed for high-throughput ANN search:
- `hamming_topk_flat` on 10 000 × 128-dim vectors: ~10 μs (release, AVX2)
- `dot_product_topk_flat` on 100 candidates × 128-dim: ~5 μs (release, AVX2)
- Combined two-phase search with kvex: 21 000+ queries/s at 10 000 vectors, dim=128, K=10
General characteristics by vector size:
- Small (< 100 elements): NIF call overhead dominates — use pure Erlang for tiny vectors (see the sketch after this list)
- Medium (100–10 000 elements): Good speedup over pure Erlang
- Large (> 10 000 elements): Maximum SIMD benefit
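For vectors in the small range, a plain `lists:zipwith/3` avoids crossing the NIF boundary entirely; a minimal pure-Erlang fallback sketch (not part of sied):

```erlang
%% Pure-Erlang fallback for tiny vectors: element-wise addition without a NIF call.
add_small(A, B) when length(A) =:= length(B) ->
    lists:zipwith(fun(X, Y) -> X + Y end, A, B).

%% add_small([1.0, 2.0], [3.0, 4.0]) =:= [4.0, 6.0]
```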
## Project Structure
```
sied/
├── src/
│   ├── sied.app.src        # OTP application metadata
│   └── sied.erl            # Erlang API module
├── native/
│   └── sied/
│       ├── Cargo.toml      # Rust dependencies
│       └── src/
│           └── lib.rs      # Rust NIF implementation
├── test/
│   └── sied_tests.erl      # EUnit test suite
├── rebar.config
└── README.md
```

## Links
- GitHub: https://github.com/roquess/sied
- Hex.pm: https://hex.pm/packages/sied
- kvex (ANN index using sied): https://hex.pm/packages/kvex
- Rustler: https://crates.io/crates/rustler
- Simdeez: https://crates.io/crates/simdeez
## License
Apache 2.0 — see LICENSE.