# Arcana 🔮📚


An embeddable RAG library for Elixir and Phoenix. Arcana lets you add vector search, knowledge graphs, and LLM-driven retrieval to any app that already has an Ecto repo, without standing up a separate vector database, indexing service, or orchestration layer.

> [!TIP]
> See arcana-adept for a complete Phoenix app with the Doctor Who corpus pre-ingested and ready to query.

## Why this exists

Most RAG libraries are written in Python and assume you'll bolt them onto your stack via HTTP. That works, but it leaves you running a vector DB you don't otherwise need, juggling two languages, and gluing telemetry together across processes. The BEAM is particularly well-suited to RAG: pgvector is excellent, supervision trees are the right shape for long-running embedders and rerankers, telemetry is built into the platform, and your Phoenix app already has the Repo, the LiveView for the dashboard, and the user session for chat.

Arcana takes that observation seriously: everything lives inside your app, on the Repo you already run.

## Three modes of operation

Singh et al.'s 2025 Agentic RAG survey splits RAG systems into four progressively more flexible patterns. The key axis is who decides the control flow:

| Pattern | Flow decided by | What it looks like |
|---|---|---|
| Naive RAG | nobody; there is none | embed → retrieve → generate, one shot |
| Advanced RAG | author, at code time | naive + query rewriting, reranking, fusion |
| Modular RAG | author, at code time | composable, pluggable steps you wire together |
| Agentic RAG | the LLM, at runtime | LLM picks tools each turn until it can answer |

Arcana ships three usage shapes that map onto the last three slots:

| Arcana surface | Singh slot | When to reach for it |
|---|---|---|
| `Arcana.search/2`, `Arcana.ask/2` | Advanced RAG | The default door. One call, sensible defaults: query rewriting, hybrid search, optional graph fusion, cross-encoder reranking. Use this unless you need more control. |
| `Arcana.Pipeline.*` | Modular RAG | Compose your own steps when you need control over order or behavior: gate → rewrite → expand → decompose → search → reason → rerank → answer → ground. Each step is a behaviour you can replace. |
| `Arcana.Loop.*` | Agentic RAG | Hand the wheel to the LLM. It picks tools (`search`, `answer`, `give_up`) each turn. Best for open-ended or multi-hop questions where the right sequence of searches isn't obvious upfront. |

Arcana intentionally does not ship a "Naive RAG" mode. Even the simplest entry point already does query rewriting, reranking, and graph fusion when available.

## How it feels

The shortest useful program:

```elixir
{:ok, _doc} = Arcana.ingest("Phoenix LiveView is a server-rendered UI library...", repo: MyApp.Repo)

{:ok, answer} = Arcana.ask("What is Phoenix LiveView?", repo: MyApp.Repo, llm: "openai:gpt-4o-mini")
```

When that's not enough, drop down to the Pipeline:

```elixir
alias Arcana.Pipeline

ctx =
  Pipeline.new("Compare Elixir and Erlang for building web services",
    repo: MyApp.Repo,
    llm: "openai:gpt-4o-mini"
  )
  |> Pipeline.rewrite()                                  # clean up conversational input
  |> Pipeline.select(collections: ["elixir", "erlang"])  # let the LLM pick collections
  |> Pipeline.decompose()                                # split into sub-questions
  |> Pipeline.search()                                   # search each one
  |> Pipeline.rerank()                                   # cross-encoder rerank
  |> Pipeline.answer()
  |> Pipeline.ground()                                   # NLI hallucination check

ctx.answer
ctx.grounding.score
```

When the right sequence of searches isn't knowable upfront, hand control to the LLM:

```elixir
{:ok, ctx} =
  Arcana.Loop.new("Find episodes where a Time Lord betrayed the Doctor",
    repo: MyApp.Repo,
    collection: "doctor-who"
  )
  |> Arcana.Loop.run(controller_llm: "openai:gpt-4o-mini")

ctx = Arcana.Loop.ground(ctx)  # optional: faithfulness scoring

ctx.answer
ctx.tool_history         # which tools the LLM picked, in order
ctx.terminated_by        # :answered, :gave_up, :max_iterations, or :error
ctx.grounding.score      # 0.0-1.0 if you called ground/2
```

When the corpus is split into multiple collections, pass them all to `new/2` and the controller picks which one to search per call. Pass a single collection to lock the loop to it; the controller can't express anything else, since the tool schema won't even include the parameter. See the Loop guide for the lock-vs-pick semantics.
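
As a sketch of the multi-collection shape (the plural `collections:` option name below is an assumption; check the Loop guide for the exact signature):

```elixir
# Sketch only: assumes Loop.new/2 accepts a plural `collections:` option
# when the controller should pick a collection per search call.
{:ok, ctx} =
  Arcana.Loop.new("How did regeneration rules change across eras?",
    repo: MyApp.Repo,
    collections: ["classic-who", "new-who"]  # controller picks per call (assumed option name)
  )
  |> Arcana.Loop.run(controller_llm: "openai:gpt-4o-mini")
```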

Loop also supports the standard router/answerer split: a cheap, fast model picks tools each turn, and a stronger model writes the user-facing answer.

```elixir
Arcana.Loop.run(ctx,
  controller_llm: "zai:glm-4.5-flash",  # cheap, fast: picks tools
  answer_llm:     "zai:glm-4.6"         # stronger: writes the final answer
)
```

The controller drives the loop iterations, and when it commits via the answer tool, the answerer takes over and produces the final user-visible text from the same accumulated context. The same answerer is also used by the synthesis fallback when the loop runs out of budget without committing.

Each surface is a thin layer over the same primitives: chunkers, embedders, vector stores, graph stores, rerankers. You can mix and match: `Arcana.search/2` and `Arcana.Loop` both call into `Arcana.Searcher`, so a custom searcher you write for one is a custom searcher for all of them.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                Your Phoenix App                                 │
├──────────────────┬──────────────────┬──────────────────┬────────────────────────┤
│  Arcana.search/2 │   Arcana.ask/2   │  Arcana.Loop.*   │    Arcana.Pipeline.*   │
├──────────────────┴──────────────────┴──────────────────┴────────────────────────┤
│                                                                                 │
│  ┌──────────┐  ┌────────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐   │
│  │ Chunker  │  │  Embedder  │  │  Search  │  │ Reranker │  │   Grounding    │   │
│  └──────────┘  └────────────┘  └──────────┘  └──────────┘  └────────────────┘   │
│                                                                                 │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                              Knowledge Graph                              │  │
│  │  entity extraction → relationship linking → community detection (Leiden)  │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                                                                 │
├─────────────────────────────────────────────────────────────────────────────────┤
│                              Your Existing Ecto Repo                            │
│                          PostgreSQL + pgvector extension                        │
└─────────────────────────────────────────────────────────────────────────────────┘
```

Everything between the top row (the four user-facing surfaces) and the Repo is pluggable. Default implementations cover the common case; behaviours let you swap any single piece without touching the rest.
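
As a sketch of what swapping one piece could look like, here is a hypothetical custom searcher. The `Arcana.Searcher` callback name, arity, return shape, and the default module it delegates to are all assumptions for illustration, not documented API:

```elixir
defmodule MyApp.RecencyBoostedSearcher do
  # Hypothetical sketch: assumes Arcana.Searcher is a behaviour exposing a
  # search/2 callback that takes the query and opts and returns {:ok, results}.
  @behaviour Arcana.Searcher

  @impl true
  def search(query, opts) do
    # Delegate to a default searcher (module name assumed), then re-sort so
    # newer chunks rank higher.
    with {:ok, results} <- Arcana.Searcher.Default.search(query, opts) do
      {:ok, Enum.sort_by(results, &DateTime.to_unix(&1.inserted_at), :desc)}
    end
  end
end
```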

## Installation

With Igniter:

```sh
mix igniter.install arcana
mix ecto.migrate
```

This adds the dependency, creates migrations, configures your Repo, and mounts the dashboard route.
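
If you'd rather wire things up by hand, the first step is the usual dependency entry (the version below is a placeholder; check Hex for the current release):

```elixir
# mix.exs
defp deps do
  [
    {:arcana, "~> 0.1"}  # placeholder version, not pinned to a real release
  ]
end
```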

For manual installation, supervision setup, embedder configuration, and the rest of the moving parts, see the Getting Started guide.

## Documentation

The README is the brochure. The guides are the manual.

## References

Arcana's design borrows heavily from published work. The implementation choices map back to specific papers wherever possible:

### RAG architecture and the agentic taxonomy

### Retrieval

### GraphRAG

### Reranking

### Agent prompting (informs `Arcana.Loop`)

### Evaluation and grounding

## Roadmap

## Development

```sh
docker compose up -d        # Postgres + pgvector
mix deps.get
MIX_ENV=test mix ecto.create && MIX_ENV=test mix ecto.migrate
mix test
```

## License

Apache-2.0