hoopdb

A lightweight, bring-your-own-boards semantic search substrate for Erlang RAG systems

⚠️ Status: early research preview. hoopdb is at the inception stage. Core architecture decisions are still being settled by measurement (see Status & roadmap), and the API will change without notice before 1.0. This is a placeholder release — not yet production-ready.

The pitch

If you want the full barrel, use Benoit Chesneau's barrel-db — a complete, clustered vector database with RocksDB persistence, embedders, a gateway, and a Raft cluster story.

If bring-your-own-boards fits your needs, we've got the hoops.

The metaphor is load-bearing. A barrel is staves (boards) bound by hoops around a chosen volume. A full vector database ships the assembled cask: its own storage, its own embedding boundary, its own clustering. hoopdb ships the hoops — the retrieval algorithms and the seams between them — and lets you supply the boards: your persistence (DETS/Mnesia), your embeddings (offline or sidecar), your process architecture.

What it aims to be

A small, honest retrieval core for building graph/RAG systems in Erlang, with three retrievers that share one result shape so they fuse cleanly:

BM25 — pure-Erlang lexical search. No ML dependency, no embedding runtime.
Vector search — semantic recall over embeddings (HNSW and/or exact brute-force k-NN).
Hybrid — reciprocal-rank or linear fusion of the two.
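
As a rough illustration of why a shared result shape makes fusion simple, here is a minimal sketch of reciprocal-rank fusion. Everything named here is an assumption for illustration, not hoopdb's API: the module `rrf_sketch`, the `fuse/1,2` functions, and the `[{Id, Score}]` best-first result shape. Each hit contributes `1 / (K + Rank)` to its document's fused score, and `K = 60` is only a commonly used default.

```erlang
-module(rrf_sketch).
-export([fuse/1, fuse/2]).

%% Hypothetical shared result shape: [{Id, Score}], best hit first.
%% Only each hit's rank is used, so BM25 scores and vector
%% similarities never need to be on the same scale.

fuse(RankedLists) ->
    fuse(RankedLists, 60).

fuse(RankedLists, K) ->
    Fused = lists:foldl(fun(Hits, Acc) -> add_list(Hits, K, Acc) end,
                        #{}, RankedLists),
    %% Highest fused score first; ties break on ascending Id.
    lists:sort(fun({IdA, SA}, {IdB, SB}) -> {SB, IdA} =< {SA, IdB} end,
               maps:to_list(Fused)).

%% Add one retriever's contribution, 1 / (K + Rank), for every hit.
add_list(Hits, K, Acc0) ->
    Ranked = lists:zip(lists:seq(1, length(Hits)), Hits),
    lists:foldl(
      fun({Rank, {Id, _Score}}, Acc) ->
              Inc = 1 / (K + Rank),
              maps:update_with(Id, fun(S) -> S + Inc end, Inc, Acc)
      end, Acc0, Ranked).
```

Given a lexical list `[{a, 2.0}, {b, 1.0}]` and a vector list `[{b, 0.9}, {c, 0.5}]`, `b` comes out on top because it is the only document both retrievers returned.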

Around that core, hoopdb owns the parts a vector database usually hides: structure-aware Markdown chunking, a fusion layer, and a thin persistence seam that treats the index as one opaque, rebuildable blob.
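
To make the "one opaque, rebuildable blob" idea concrete, here is a minimal sketch of what such a seam could look like over DETS. The module `blob_seam_sketch`, the key `index_blob`, and the `save/2` and `load/2` functions are hypothetical names chosen for this example. The sketch assumes the index is an ordinary Erlang term and that the caller can always recreate it from the corpus.

```erlang
-module(blob_seam_sketch).
-export([save/2, load/2]).

%% Hypothetical seam: the whole index lives as one binary under one key.
-define(KEY, index_blob).

%% Serialize the index term and store it in the DETS file at File.
save(File, Index) ->
    {ok, Tab} = dets:open_file(make_ref(), [{file, File}]),
    ok = dets:insert(Tab, {?KEY, term_to_binary(Index, [compressed])}),
    ok = dets:close(Tab).

%% Return the stored index, or call RebuildFun (a zero-arity fun that
%% recreates the index from the corpus) when no blob has been saved.
load(File, RebuildFun) ->
    {ok, Tab} = dets:open_file(make_ref(), [{file, File}]),
    Result = case dets:lookup(Tab, ?KEY) of
                 [{?KEY, Blob}] -> binary_to_term(Blob);
                 [] -> RebuildFun()
             end,
    ok = dets:close(Tab),
    Result.
```

Because the store never looks inside the blob, swapping DETS for Mnesia or a flat file would only change these two functions.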

Hard constraints (the spine of the project)

Pure Erlang at runtime, optionally accelerated by a prebuilt plain-C SIMD NIF. The accelerator is optional; the pure-Erlang path always works.
No Rust, Elixir, or Python at the user's build time or runtime. Embedding a corpus is treated as an offline, build-time step (like running a compiler): the embedding tool is never shipped with the application. For live text queries, BM25 needs no embedding runtime at all.
A C compiler at build time is fine; a prebuilt .so/.dll is preferred.
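
One common way to get "optional accelerator, pure-Erlang path always works" is the standard NIF fallback idiom, sketched below. The module name `dot_sketch` and the library path `priv/dot_sketch_nif` are hypothetical, and this is not a description of how hoopdb actually wires its accelerator. `erlang:load_nif/2` appends the platform extension (`.so` or `.dll`) itself; if loading succeeds, the native `dot/2` replaces the Erlang body, and if it fails the Erlang body simply stays in place.

```erlang
-module(dot_sketch).
-export([dot/2]).
-on_load(init/0).

%% Try to load a hypothetical prebuilt accelerator at module load time.
%% Returning ok in both branches means a missing library is not an error.
init() ->
    case erlang:load_nif("priv/dot_sketch_nif", 0) of
        ok -> ok;                  % native dot/2 now replaces the body below
        {error, _Reason} -> ok     % no accelerator: keep the pure-Erlang path
    end.

%% Pure-Erlang dot product of two equal-length lists of numbers.
%% This body is what runs whenever the NIF is not loaded.
dot(A, B) ->
    lists:sum(lists:zipwith(fun(X, Y) -> X * Y end, A, B)).
```

Callers use `dot_sketch:dot/2` the same way in both cases, so the accelerator stays a deployment detail rather than an API difference.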

Who it's for

Erlang/OTP developers building retrieval or RAG over small, curated corpora — think a handful of books, manuals, internal docs, or a knowledge base (thousands of chunks, not millions). If your corpus is small enough that you'd rather own a few hundred lines of transparent Erlang than operate a separate vector-database service, hoopdb is aimed at you.

It is not trying to be a clustered, planet-scale vector store. For that, use the full barrel.

Status & roadmap

hoopdb is being built measurement-first. Two questions are open and under active investigation, and the answers will shape the defaults:

Vector path: at this scale, does an approximate index (HNSW) earn its complexity over exact brute-force k-NN, or is brute-force the better default? HNSW is treated as a hypothesis to validate, not a settled choice.
Retrieval quality: how should textbook-style Markdown be chunked, and is the vector path even necessary as the primary retriever — or is BM25 (plus hybrid) enough for technical corpora?
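
For a sense of how small the brute-force baseline is, here is a minimal sketch of exact k-NN by cosine similarity. The module `knn_sketch`, the function `top_k/3`, and the `[{Id, Vector}]` corpus shape are assumptions made for this example. It scores every vector on every query, which is the cost an approximate index such as HNSW would have to beat at this corpus size.

```erlang
-module(knn_sketch).
-export([top_k/3]).

%% Hypothetical corpus shape: [{Id, Vector}], where every vector is a
%% list of floats with the same length as Query and a non-zero norm.
top_k(Query, Corpus, K) ->
    Scored = [{cosine(Query, Vec), Id} || {Id, Vec} <- Corpus],
    %% Sort ascending on {Score, Id}, then reverse for best first.
    Sorted = lists:reverse(lists:sort(Scored)),
    [{Id, Score} || {Score, Id} <- lists:sublist(Sorted, K)].

%% Cosine similarity: dot product divided by the product of the norms.
cosine(A, B) ->
    dot(A, B) / (math:sqrt(dot(A, A)) * math:sqrt(dot(B, B))).

dot(A, B) ->
    lists:sum(lists:zipwith(fun(X, Y) -> X * Y end, A, B)).
```

The result uses the same `{Id, Score}` best-first shape assumed for the other retrievers, so it could feed a fusion step without conversion.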

Deferred to later milestones: a knowledge-graph layer over chunks, quantization tuning, and any clustering/serving concerns (use the full barrel if you need those).

Built on `barrel_vectordb`

hoopdb will likely build on portions of `barrel_vectordb` (Apache-2.0), reusing its well-engineered, storage-agnostic pure-Erlang HNSW, distance, and BM25 modules. Attribution is a feature, not a footnote. hoopdb is a deliberately narrowed assembly of those modules, re-bound around different boards, not a fork that hides its origins.

License

Apache-2.0. See LICENSE.md.