hoopdb

A lightweight, bring-your-own-boards semantic search substrate for Erlang RAG systems

⚠️ Status: early research preview. hoopdb is at the inception stage. Core architecture decisions are still being settled by measurement (see Status & roadmap), and the API will change without notice before 1.0. This is a placeholder release — not yet production-ready.


The pitch

If you want the full barrel, use Benoit Chesneau's barrel-db — a complete, clustered vector database with RocksDB persistence, embedders, a gateway, and a Raft cluster story.

If bring-your-own-boards fits your needs, we've got the hoops.

The metaphor is load-bearing. A barrel is staves (boards) bound by hoops around a chosen volume. A full vector database ships the assembled cask: its own storage, its own embedding boundary, its own clustering. hoopdb ships the hoops — the retrieval algorithms and the seams between them — and lets you supply the boards: your persistence (DETS/Mnesia), your embeddings (offline or sidecar), your process architecture.

What it aims to be

A small, honest retrieval core for building graph/RAG systems in Erlang, with three retrievers that share one result shape so they fuse cleanly:

Around that core, hoopdb owns the parts a vector database usually hides: structure-aware Markdown chunking, a fusion layer, and a thin persistence seam that treats the index as one opaque, rebuildable blob.
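A shared result shape is what keeps the fusion layer small. The sketch below is illustrative only: it assumes each retriever returns a best-first list of {ChunkId, Score} tuples, and it uses reciprocal rank fusion (RRF) as a stand-in, since this README does not say which fusion method hoopdb will use. The module name fusion_sketch and both function names are hypothetical, not part of any hoopdb API.

```erlang
-module(fusion_sketch).
-export([rrf/1, rrf/2]).

%% ASSUMPTION: every retriever returns [{ChunkId, Score}], best hit first.
%% RRF uses only the rank order, so raw scores from different retrievers
%% (BM25 scores, cosine similarities) never need to be put on one scale.

%% Fuse ranked lists using K = 60, the constant commonly used with RRF.
rrf(Lists) ->
    rrf(Lists, 60).

rrf(Lists, K) ->
    Fused = lists:foldl(fun(List, Acc) -> add_list(List, K, Acc) end,
                        #{}, Lists),
    %% Highest fused score first; ties are broken by id for determinism.
    lists:sort(fun({IdA, SA}, {IdB, SB}) -> {SB, IdA} =< {SA, IdB} end,
               maps:to_list(Fused)).

%% Add 1 / (K + Rank) to the running score of every id in one ranked list.
add_list(List, K, Acc0) ->
    Ranked = lists:zip(lists:seq(1, length(List)), List),
    lists:foldl(
      fun({Rank, {Id, _Score}}, Acc) ->
              Inc = 1 / (K + Rank),
              maps:update_with(Id, fun(S) -> S + Inc end, Inc, Acc)
      end, Acc0, Ranked).
```

Calling fusion_sketch:rrf([Bm25Hits, VectorHits]) returns one list in the same {ChunkId, Score} shape, so a fused result could itself be fed to another fusion step. A chunk that sits near the top of both input lists outranks a chunk that is first in only one of them.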

Hard constraints (the spine of the project)

Who it's for

Erlang/OTP developers building retrieval or RAG over small, curated corpora — think a handful of books, manuals, internal docs, or a knowledge base (thousands of chunks, not millions). If your corpus is small enough that you'd rather own a few hundred lines of transparent Erlang than operate a separate vector-database service, hoopdb is aimed at you.

It is not trying to be a clustered, planet-scale vector store. For that, use the full barrel.

Status & roadmap

hoopdb is being built measurement-first. Two questions are open and under active investigation, and the answers will shape the defaults:

  1. Vector path: at this scale, does an approximate index (HNSW) earn its complexity over exact brute-force k-NN, or is brute-force the better default? HNSW is treated as a hypothesis to validate, not a settled choice.
  2. Retrieval quality: how should textbook-style Markdown be chunked, and is the vector path even necessary as the primary retriever — or is BM25 (plus hybrid) enough for technical corpora?
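For a sense of what the brute-force baseline in question 1 involves, here is a minimal sketch of exact k-NN over an in-memory list. It assumes cosine similarity as the metric and a corpus held as [{ChunkId, Vector}] with non-zero vectors of equal length. The module knn_sketch and its functions are hypothetical illustrations, not hoopdb code and not barrel_vectordb's API.

```erlang
-module(knn_sketch).
-export([top_k/3, cosine/2]).

%% Cosine similarity of two equal-length, non-zero vectors (lists of numbers).
cosine(A, B) ->
    Dot = lists:sum(lists:zipwith(fun(X, Y) -> X * Y end, A, B)),
    Dot / (norm(A) * norm(B)).

%% Euclidean length of a vector.
norm(V) ->
    math:sqrt(lists:sum([X * X || X <- V])).

%% Exact k-NN: score every stored vector against the query, sort by
%% similarity descending, and keep the first K results.
%% ASSUMPTION: Corpus is [{ChunkId, Vector}]; the result uses the same
%% [{ChunkId, Score}] shape as the other retrievers.
top_k(Query, Corpus, K) ->
    Scored = [{Id, cosine(Query, Vec)} || {Id, Vec} <- Corpus],
    Sorted = lists:sort(fun({_, S1}, {_, S2}) -> S1 >= S2 end, Scored),
    lists:sublist(Sorted, K).
```

The work is linear in corpus size and has no index to build, persist, or tune. Whether it is fast enough at thousands of chunks is the open question an approximate index would have to answer.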

Deferred for later milestones: a knowledge-graph layer over chunks, quantization tuning, and any clustering/serving concerns (use the full barrel if you need those).

Built on barrel_vectordb

hoopdb will likely build on portions of barrel_vectordb (Apache-2.0), reusing its well-engineered, storage-agnostic pure-Erlang HNSW, distance, and BM25 modules. Attribution is a feature, not a footnote. hoopdb is a deliberately narrowed assembly of those modules, re-bound around different boards, not a fork that hides its origins.

License

Apache-2.0. See LICENSE.md.