# Arcana 🔮📚


An embeddable RAG library for Elixir and Phoenix. Arcana lets you add vector search, knowledge graphs, and LLM-driven retrieval to any app that already has an Ecto repo, without standing up a separate vector database, indexing service, or orchestration layer.

> [!TIP]
> See arcana-adept for a complete Phoenix app with the Doctor Who corpus pre-ingested and ready to query.

## Why this exists

Most RAG libraries are written in Python and assume you'll bolt them onto your stack via HTTP. That works, but it leaves you running a vector DB you don't otherwise need, juggling two languages, and gluing telemetry together across processes. The BEAM is particularly well-suited to RAG: pgvector is excellent, supervision trees are the right shape for long-running embedders and rerankers, telemetry is built into the platform, and your Phoenix app already has the Repo, the LiveView for the dashboard, and the user session for chat.

Arcana takes that observation seriously: everything lives inside your app, on the Repo you already run.

## Three modes of operation

Singh et al.'s 2025 Agentic RAG survey splits RAG systems into four progressively more flexible patterns. The key axis is who decides the control flow:

| Pattern | Flow decided by | What it looks like |
|---|---|---|
| Naive RAG | nobody; there is none | embed → retrieve → generate, one shot |
| Advanced RAG | author, at code time | naive + query rewriting, reranking, fusion |
| Modular RAG | author, at code time | composable, pluggable steps you wire together |
| Agentic RAG | the LLM, at runtime | LLM picks tools each turn until it can answer |

Arcana ships three usage shapes that map onto the last three slots:

| Arcana surface | Singh slot | When to reach for it |
|---|---|---|
| `Arcana.search/2`, `Arcana.ask/2` | Advanced RAG | The default door. One call, sensible defaults: query rewriting, hybrid search, optional graph fusion, cross-encoder reranking. Use this unless you need more control. |
| `Arcana.Pipeline.*` | Modular RAG | Compose your own steps when you need control over order or behavior: gate → rewrite → expand → decompose → search → reason → rerank → answer → ground. Each step is a behaviour you can replace. |
| `Arcana.Loop.*` | Agentic RAG | Hand the wheel to the LLM. It picks tools (`search`, `answer`, `give_up`) each turn. Best for open-ended or multi-hop questions where the right sequence of searches isn't obvious upfront. |

Arcana intentionally does not ship a "Naive RAG" mode. Even the simplest entry point already does query rewriting, reranking, and graph fusion when available.

## How it feels

The shortest useful program:

```elixir
{:ok, _doc} = Arcana.ingest("Phoenix LiveView is a server-rendered UI library...", repo: MyApp.Repo)

{:ok, answer} = Arcana.ask("What is Phoenix LiveView?", repo: MyApp.Repo, llm: "openai:gpt-4o-mini")
```

When that's not enough, drop down to the Pipeline:

```elixir
alias Arcana.Pipeline

ctx =
  Pipeline.new("Compare Elixir and Erlang for building web services",
    repo: MyApp.Repo,
    llm: "openai:gpt-4o-mini"
  )
  |> Pipeline.rewrite()                                  # clean up conversational input
  |> Pipeline.select(collections: ["elixir", "erlang"])  # let the LLM pick collections
  |> Pipeline.decompose()                                # split into sub-questions
  |> Pipeline.search()                                   # search each one
  |> Pipeline.rerank()                                   # cross-encoder rerank
  |> Pipeline.answer()
  |> Pipeline.ground()                                   # NLI hallucination check

ctx.answer
ctx.grounding.score
```

When the right sequence of searches isn't knowable upfront, hand control to the LLM:

```elixir
{:ok, ctx} =
  Arcana.Loop.new("Find episodes where a Time Lord betrayed the Doctor",
    repo: MyApp.Repo,
    collection: "doctor-who"
  )
  |> Arcana.Loop.run(controller_llm: "openai:gpt-4o-mini")

ctx = Arcana.Loop.ground(ctx)  # optional: faithfulness scoring

ctx.answer
ctx.tool_history         # which tools the LLM picked, in order
ctx.terminated_by        # :answered, :gave_up, :max_iterations, or :error
ctx.grounding.score      # 0.0-1.0 if you called ground/2
```

When the corpus is split into multiple collections, pass them all to `new/2` and the controller picks which one to search per call. Pass a single collection to lock the loop to it; the controller can't express anything else, since the tool schema won't even include the parameter. See the Loop guide for the lock-vs-pick semantics.
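
As a sketch of the multi-collection shape (the plural `collections:` option name below is an assumption; check the Loop guide for the exact signature):

```elixir
# Sketch only: assumes Loop.new/2 accepts a plural `collections:` option
# when the controller should pick a collection per search call.
{:ok, ctx} =
  Arcana.Loop.new("How did regeneration rules change across eras?",
    repo: MyApp.Repo,
    collections: ["classic-who", "new-who"]  # controller picks per call (assumed option name)
  )
  |> Arcana.Loop.run(controller_llm: "openai:gpt-4o-mini")
```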

Loop also supports the standard router/answerer split: a cheap, fast model picks tools each turn, and a stronger model writes the user-facing answer.

```elixir
Arcana.Loop.run(ctx,
  controller_llm: "zai:glm-4.5-flash",  # cheap, fast: picks tools
  answer_llm:     "zai:glm-4.6"         # stronger: writes the final answer
)
```

The controller drives the loop iterations, and when it commits via the answer tool, the answerer takes over and produces the final user-visible text from the same accumulated context. The same answerer is also used by the synthesis fallback when the loop runs out of budget without committing.

Each surface is a thin layer over the same primitives: chunkers, embedders, vector stores, graph stores, rerankers. You can mix and match: `Arcana.search/2` and `Arcana.Loop` both call into `Arcana.Searcher`, so a custom searcher you write for one is a custom searcher for all of them.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                Your Phoenix App                                 │
├──────────────────┬──────────────────┬──────────────────┬────────────────────────┤
│  Arcana.search/2 │   Arcana.ask/2   │  Arcana.Loop.*   │    Arcana.Pipeline.*   │
├──────────────────┴──────────────────┴──────────────────┴────────────────────────┤
│                                                                                 │
│  ┌──────────┐  ┌────────────┐  ┌──────────┐  ┌──────────┐  ┌────────────────┐   │
│  │ Chunker  │  │  Embedder  │  │  Search  │  │ Reranker │  │   Grounding    │   │
│  └──────────┘  └────────────┘  └──────────┘  └──────────┘  └────────────────┘   │
│                                                                                 │
│  ┌───────────────────────────────────────────────────────────────────────────┐  │
│  │                              Knowledge Graph                              │  │
│  │  entity extraction → relationship linking → community detection (Leiden)  │  │
│  └───────────────────────────────────────────────────────────────────────────┘  │
│                                                                                 │
├─────────────────────────────────────────────────────────────────────────────────┤
│                              Your Existing Ecto Repo                            │
│                          PostgreSQL + pgvector extension                        │
└─────────────────────────────────────────────────────────────────────────────────┘
```

Everything between the top row (the four user-facing surfaces) and the Repo is pluggable. Default implementations cover the common case; behaviours let you swap any single piece without touching the rest.
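
As a sketch of what swapping one piece could look like, here is a hypothetical custom searcher. The `Arcana.Searcher` callback name, arity, return shape, and the default module it delegates to are all assumptions for illustration, not documented API:

```elixir
defmodule MyApp.RecencyBoostedSearcher do
  # Hypothetical sketch: assumes Arcana.Searcher is a behaviour exposing a
  # search/2 callback that takes the query and opts and returns {:ok, results}.
  @behaviour Arcana.Searcher

  @impl true
  def search(query, opts) do
    # Delegate to a default searcher (module name assumed), then re-sort so
    # newer chunks rank higher.
    with {:ok, results} <- Arcana.Searcher.Default.search(query, opts) do
      {:ok, Enum.sort_by(results, &DateTime.to_unix(&1.inserted_at), :desc)}
    end
  end
end
```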

## Installation

With Igniter:

```sh
mix igniter.install arcana
mix ecto.migrate
```

This adds the dependency, creates migrations, configures your Repo, and mounts the dashboard route.
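
If you'd rather wire things up by hand, the first step is the usual dependency entry (the version below is a placeholder; check Hex for the current release):

```elixir
# mix.exs
defp deps do
  [
    {:arcana, "~> 0.1"}  # placeholder version, not pinned to a real release
  ]
end
```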

For manual installation, supervision setup, embedder configuration, and the rest of the moving parts, see the Getting Started guide.

## Documentation

The README is the brochure. The guides are the manual.

## References

Arcana's design borrows heavily from published work. The implementation choices map back to specific papers wherever possible:

### RAG architecture and the agentic taxonomy

### Retrieval

### GraphRAG

### Reranking

### Agent prompting (informs `Arcana.Loop`)

### Evaluation and grounding

## Roadmap

## Development

```sh
docker compose up -d        # Postgres + pgvector
mix deps.get
MIX_ENV=test mix ecto.create && MIX_ENV=test mix ecto.migrate
mix test
```

## License

Apache-2.0