hoopdb
A lightweight, bring-your-own-boards semantic search substrate for Erlang RAG systems
⚠️ Status: early research preview. hoopdb is at the inception stage. Core architecture decisions are still being settled by measurement (see Status & roadmap), and the API will change without notice before 1.0. This is a placeholder release — not yet production-ready.
The pitch
If you want the full barrel, use Benoit Chesneau's barrel-db — a complete, clustered vector database with RocksDB persistence, embedders, a gateway, and a Raft cluster story.
If bring-your-own-boards fits your needs, we've got the hoops.
The metaphor is load-bearing. A barrel is staves (boards) bound by hoops around a chosen volume. A full vector database ships the assembled cask: its own storage, its own embedding boundary, its own clustering. hoopdb ships the hoops — the retrieval algorithms and the seams between them — and lets you supply the boards: your persistence (DETS/Mnesia), your embeddings (offline or sidecar), your process architecture.
What it aims to be
A small, honest retrieval core for building graph/RAG systems in Erlang, with three retrievers that share one result shape so they fuse cleanly:
- BM25 — pure-Erlang lexical search. No ML dependency, no embedding runtime.
- Vector search — semantic recall over embeddings (HNSW and/or exact brute-force k-NN).
- Hybrid — reciprocal-rank or linear fusion of the two.
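As a rough illustration of why a shared result shape matters, here is a minimal sketch of reciprocal-rank fusion. It assumes a hypothetical result shape of `[{Id, Score}]` sorted best first; the module name `rrf_sketch`, the function `fuse/1,2`, and that shape are illustrative assumptions, not hoopdb's actual API. Each document receives `1 / (K + Rank)` from every list it appears in, and `K = 60` is a commonly used default rather than a tuned value.

```erlang
-module(rrf_sketch).
-export([fuse/1, fuse/2]).

%% Hypothetical shared result shape: [{Id, Score}], best hit first.
%% Reciprocal-rank fusion ignores raw scores and uses only rank positions,
%% so BM25 scores and vector similarities never need to be normalised.

%% fuse/1 uses K = 60, a commonly used smoothing constant for RRF.
fuse(ResultLists) ->
    fuse(ResultLists, 60).

fuse(ResultLists, K) when is_list(ResultLists), is_integer(K), K >= 0 ->
    Acc = lists:foldl(fun(Results, Acc0) -> add_list(Results, K, Acc0) end,
                      #{}, ResultLists),
    %% Sort by fused score descending; break ties by Id for determinism.
    lists:sort(fun({IdA, SA}, {IdB, SB}) -> {SB, IdA} =< {SA, IdB} end,
               maps:to_list(Acc)).

%% Add 1 / (K + Rank) to the running total of every Id in one result list.
add_list(Results, K, Acc0) ->
    Ranked = lists:zip(lists:seq(1, length(Results)), Results),
    lists:foldl(fun({Rank, {Id, _Score}}, Acc) ->
                        Inc = 1 / (K + Rank),
                        maps:update_with(Id, fun(S) -> S + Inc end, Inc, Acc)
                end, Acc0, Ranked).
```

Under these assumptions, `rrf_sketch:fuse([Bm25Hits, VectorHits])` returns one fused list in the same `[{Id, Score}]` shape, so the output of fusion can be consumed exactly like the output of a single retriever.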
Around that core, hoopdb owns the parts a vector database usually hides: structure-aware Markdown chunking, a fusion layer, and a thin persistence seam that treats the index as one opaque, rebuildable blob.
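The "opaque, rebuildable blob" idea can be sketched as follows. This is an assumption-laden illustration, not the project's actual persistence API: the module name `blob_seam_sketch`, the `Put`/`Get`/`Rebuild` funs, and the version tag are all hypothetical. The point is that the core only serialises the index with `term_to_binary/2` and hands the bytes to storage the caller supplies; if the blob is missing, corrupt, or from another format version, the index is rebuilt from the source corpus.

```erlang
-module(blob_seam_sketch).
-export([save/3, load/3]).

%% Hypothetical format tag so stale blobs trigger a rebuild.
-define(FORMAT_VERSION, 1).

%% Put is a caller-supplied fun(Key, Binary) -> ok (the "boards").
save(Key, Index, Put) when is_function(Put, 2) ->
    Blob = term_to_binary({?FORMAT_VERSION, Index}, [compressed]),
    Put(Key, Blob).

%% Get is a caller-supplied fun(Key) -> {ok, Binary} | not_found.
%% Rebuild is a fun() -> Index, called when no usable blob exists.
%% Note: binary_to_term/1 should only be used on blobs you wrote yourself.
load(Key, Get, Rebuild) when is_function(Get, 1), is_function(Rebuild, 0) ->
    case Get(Key) of
        {ok, Blob} when is_binary(Blob) ->
            try binary_to_term(Blob) of
                {?FORMAT_VERSION, Index} -> Index;
                _OtherVersion -> Rebuild()
            catch
                error:badarg -> Rebuild()
            end;
        not_found ->
            Rebuild()
    end.
```

With DETS as the board, `Put` could wrap `dets:insert/2` and `Get` could wrap `dets:lookup/2`; a Mnesia-backed pair would plug into the same two funs without the core knowing the difference.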
Hard constraints (the spine of the project)
- Pure Erlang at runtime, optionally accelerated by a prebuilt plain-C SIMD NIF. The accelerator is optional; the pure-Erlang path always works.
- No Rust, Elixir, or Python at the user's build time or runtime. Embedding a corpus is treated as an offline, build-time step (like running a compiler): the embedding tool itself never ships with the application. For live text queries, BM25 needs no embedding runtime at all.
- A C compiler at build time is fine; a prebuilt .so/.dll is preferred.
Who it's for
Erlang/OTP developers building retrieval or RAG over small, curated corpora — think a handful of books, manuals, internal docs, or a knowledge base (thousands of chunks, not millions). If your corpus is small enough that you'd rather own a few hundred lines of transparent Erlang than operate a separate vector-database service, hoopdb is aimed at you.
It is not trying to be a clustered, planet-scale vector store. For that, use the full barrel.
Status & roadmap
hoopdb is being built measurement-first. Two questions are open and under active investigation, and the answers will shape the defaults:
- Vector path: at this scale, does an approximate index (HNSW) earn its complexity over exact brute-force k-NN, or is brute-force the better default? HNSW is treated as a hypothesis to validate, not a settled choice.
- Retrieval quality: how should textbook-style Markdown be chunked, and is the vector path even necessary as the primary retriever — or is BM25 (plus hybrid) enough for technical corpora?
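To make the brute-force baseline in the first question concrete, here is a minimal pure-Erlang sketch of exact k-NN by cosine similarity. The module name `knn_sketch`, the `[{Id, Vector}]` corpus shape, and the list-of-floats vector representation are assumptions for illustration only; it is not the project's implementation and makes no performance claim. It scans every vector, which is the linear cost an approximate index such as HNSW would have to beat at this corpus size.

```erlang
-module(knn_sketch).
-export([search/3]).

%% Corpus is a list of {Id, Vector}; each Vector is a list of numbers with
%% the same length as Query (lists:zip/2 crashes on a length mismatch).
%% Returns up to K results as [{Id, CosineSimilarity}], best first.
search(Query, Corpus, K) when is_integer(K), K >= 0 ->
    QNorm = norm(Query),
    Scored = [{Id, cosine(Query, QNorm, Vec)} || {Id, Vec} <- Corpus],
    Sorted = lists:sort(fun({_, A}, {_, B}) -> A >= B end, Scored),
    lists:sublist(Sorted, K).

%% Cosine similarity; a zero-length vector scores 0.0 instead of dividing by 0.
cosine(Q, QNorm, V) ->
    case QNorm * norm(V) of
        Denom when Denom == 0 -> 0.0;
        Denom -> dot(Q, V) / Denom
    end.

dot(A, B) -> lists:sum([X * Y || {X, Y} <- lists:zip(A, B)]).

norm(V) -> math:sqrt(dot(V, V)).
```

For example, `knn_sketch:search([1.0, 0.0], Corpus, 5)` scores every entry of `Corpus` against the query and keeps the five most similar. Results are exact by construction, which also makes this kind of scan a natural ground truth when measuring the recall of an approximate index.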
Deferred for later milestones: a knowledge-graph layer over chunks, quantization tuning, and any clustering/serving concerns (use the full barrel if you need those).
Built on barrel_vectordb
hoopdb will likely build upon portions of
barrel_vectordb (Apache-2.0),
reusing its well-engineered, storage-agnostic pure-Erlang HNSW, distance, and BM25
modules. Attribution is a feature, not a footnote. hoopdb is a deliberately
narrowed assembly of those modules, re-bound around different boards — not a fork
that hides its origins.
License
Apache-2.0. See LICENSE.md.