LlmCore

Provider-agnostic LLM orchestration for Elixir. Route to any model, run agentic loops, extract structured output, and connect to Hindsight semantic memory — all through composable ALF pipelines with hot-reload TOML configuration.

LlmCore is the shared LLM substrate that powers the Fosferon ecosystem. It handles the messy parts of working with LLMs — provider routing, CLI wrapping, structured extraction, tool-calling loops, and Hindsight semantic memory integration — so your application code stays clean.

Why LlmCore?

Installation

Add llm_core to your dependencies in mix.exs:

def deps do
  [
    {:llm_core, "~> 0.3"}
  ]
end

Then fetch dependencies:

mix deps.get

Quick Start

Send a prompt through the router

# Routes automatically based on [routing.tasks] config
{:ok, response} = LlmCore.send("Explain pattern matching in Elixir", :reasoning)
IO.puts(response.content)
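
LlmCore.send/2 returns a tagged tuple, so failures (missing API key, provider timeout) can be handled with an ordinary case. The error term below is an assumption; check the provider docs for the real shapes.

case LlmCore.send("Explain pattern matching in Elixir", :reasoning) do
  {:ok, response} ->
    IO.puts(response.content)

  {:error, reason} ->
    # The error term is provider-specific; inspect/1 keeps the sketch generic
    IO.puts("request failed: #{inspect(reason)}")
end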

Stream a response

{:ok, stream} = LlmCore.stream("Write a GenServer example", :coding)
Enum.each(stream, fn chunk -> IO.write(chunk) end)
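
If you want the full text rather than incremental output, collect the chunks once the stream completes. This assumes each chunk is a plain binary, as the IO.write/1 call above implies.

{:ok, stream} = LlmCore.stream("Write a GenServer example", :coding)
full_text = Enum.join(stream)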

Extract structured output

schema = %{
  type: "object",
  properties: %{
    name: %{type: "string"},
    confidence: %{type: "number"}
  },
  required: ["name"]
}

{:ok, response} = LlmCore.send("Analyze this code", :reasoning,
  response_format: {:json_schema, schema}
)

response.structured
#=> %{"name" => "authenticate/2", "confidence" => 0.92}

Run an agentic tool-calling loop

alias LlmCore.Agent.Loop

tools = MyApp.Tools.available()
resolve = &MyApp.Tools.resolve/1

llm_send = fn messages, opts ->
  LlmCore.LLM.Provider.dispatch(LlmCore.LLM.Anthropic, messages, opts)
end

{:ok, response, messages} =
  Loop.run(
    [%{role: :user, content: "Research Elixir ALF"}],
    llm_send,
    tools: tools,
    resolve_tool: resolve,
    max_iterations: 10
  )
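
The example above assumes a MyApp.Tools module. A hypothetical sketch of its shape follows; the tool-spec format (Anthropic-style name/description/input_schema maps) and the resolve return value are assumptions, so check the Agent Loop docs for the exact contract.

defmodule MyApp.Tools do
  # Hypothetical tool definitions; the spec format expected by Loop.run/3
  # is assumed here to be Anthropic-style maps.
  def available do
    [
      %{
        name: "web_search",
        description: "Search the web and return the top results",
        input_schema: %{
          type: "object",
          properties: %{query: %{type: "string"}},
          required: ["query"]
        }
      }
    ]
  end

  # Resolves a tool call to its result. The argument and return shapes are
  # assumptions; consult the Loop documentation for the actual contract.
  def resolve(%{name: "web_search", input: %{"query" => query}}) do
    # Call your real search backend here; a stub keeps the sketch self-contained
    {:ok, "Top results for: " <> query}
  end
end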

Semantic memory (via Hindsight)

LlmCore ships a resilient client for Hindsight, a standalone semantic memory server. The client handles caching, circuit breaking, retry with backoff, and write buffering so your application code doesn't have to.

# Store a fact (async, buffered)
:ok = LlmCore.retain("Schema-per-tenant isolation pattern", %{context: "architecture"})

# Recall by meaning
{:ok, results} = LlmCore.recall("how does multi-tenancy work?", bank_id: "my-bank")

# Synthesize an insight
{:ok, insight} = LlmCore.reflect("What patterns are most effective?", bank_id: "my-bank")
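
Recall composes naturally with the router for retrieval-augmented prompts. The sketch below assumes each recalled result exposes a content field; adjust it to the actual Hindsight client structs.

{:ok, results} = LlmCore.recall("how does multi-tenancy work?", bank_id: "my-bank")

context =
  results
  |> Enum.map(& &1.content)   # field name is an assumption
  |> Enum.join("\n")

{:ok, response} =
  LlmCore.send("Using this context, explain our tenancy model:\n" <> context, :reasoning)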

Query available providers

# All configured providers
providers = LlmCore.Provider.Registry.all()

# Only available ones (API keys present, binaries in PATH)
available = LlmCore.Provider.Registry.available()

# Find by alias
{:ok, provider} = LlmCore.Provider.Registry.lookup_alias("claude")

# Fuzzy suggestions (Jaro distance)
LlmCore.Provider.Registry.suggest_alias("claud")
#=> ["claude"]

# Capable providers for requirements
LlmCore.Provider.Registry.suggest_capable(%{streaming: true, tool_use: true})
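
These calls compose into a simple startup check. The sketch only counts entries and falls back to fuzzy suggestions, since the shape of a registry entry isn't shown here.

all = LlmCore.Provider.Registry.all()
available = LlmCore.Provider.Registry.available()
IO.puts("#{length(available)}/#{length(all)} providers ready")

# Warn when a preferred alias is missing
case LlmCore.Provider.Registry.lookup_alias("claude") do
  {:ok, _provider} ->
    :ok

  _error ->
    suggestions = LlmCore.Provider.Registry.suggest_alias("claude")
    IO.puts("claude not configured, did you mean: #{inspect(suggestions)}")
end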

CLI provider discovery

# List all CLI providers (built-in + configured)
entries = LlmCore.CLIProvider.Registry.list()

# Only those with binary in PATH
available = LlmCore.CLIProvider.Registry.available()

# Resolve by id or alias
{:ok, provider} = LlmCore.CLIProvider.Registry.resolve(:droid)

# Check capabilities
{:ok, caps} = LlmCore.CLIProvider.Registry.capabilities(:codex_cli)
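
A quick way to surface missing binaries at startup, assuming entries compare by equality across list/0 and available/0:

entries = LlmCore.CLIProvider.Registry.list()
available = LlmCore.CLIProvider.Registry.available()

# Anything configured but not installed (binary not in PATH)
missing = entries -- available
IO.puts("#{length(missing)} CLI provider(s) not installed")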

Configuration

LlmCore uses layered TOML configuration. Later sources override earlier ones:

1. Compiled defaults    (priv/config/llm_core.toml)
2. Global override      (~/.llm_core/config/llm_core.toml)
3. Project override     (<project>/.llm_core/llm_core.toml)
4. Environment variable (LLM_CORE_CONFIG=path)
5. Custom path          (explicit :path option)
6. Runtime overrides    (ETS, via mix tasks or API)

Minimal configuration

[routing]
default = "claude"

[providers.anthropic]
module = "LlmCore.LLM.Anthropic"
aliases = ["claude"]

[providers.anthropic.auth]
api_key_env = "ANTHROPIC_API_KEY"

Task-based routing

[routing]
default = "claude"

[routing.tasks.coding]
alias = "openai"
mode = "passthrough"
capabilities = { structured_output = true, tool_use = true }

[routing.tasks.planning]
alias = "claude"
mode = "abstracted"
capabilities = { reasoning = true }
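
With this routing in place, the task atom passed to LlmCore.send selects the rule, and tasks without a rule fall back to the default alias (emitting the [:llm_core, :router, :fallback] telemetry event listed below). The :translation atom here is just an example of an unconfigured task.

# Matches [routing.tasks.coding] and routes to the "openai" alias
{:ok, response} = LlmCore.send("Refactor this GenServer", :coding)

# No matching rule, so the router falls back to routing.default ("claude")
{:ok, response} = LlmCore.send("Translate this changelog", :translation)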

Add a CLI provider (no code needed)

[providers.my_tool]
type = "cli"
enabled = true
aliases = ["my-tool", "mt"]

[providers.my_tool.cli]
binary = "my-tool"
default_model = "v2"
default_timeout = 60000
prompt_position = "last"
install_hint = "pip install my-tool"
auto_approve_args = ["--yes"]

[providers.my_tool.cli.flags]
model = "--model"
temperature = "--temp"

[providers.my_tool.cli.preflight]
help_args = ["--help"]
expect_in_help = ["--model"]
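
Once this TOML is in place, the provider resolves through the CLI registry like the built-ins. Whether string aliases also resolve is an assumption; the id form matches the examples above.

{:ok, provider} = LlmCore.CLIProvider.Registry.resolve(:my_tool)

# Appears in available/0 only once the my-tool binary is on PATH
provider in LlmCore.CLIProvider.Registry.available()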

Mix task helpers

# Inspect configuration
mix llm_core.config.show
mix llm_core.config.show --section providers --json

# Edit configuration
mix llm_core.config.set --path routing.default.alias --value claude
mix llm_core.config.set --path telemetry.sample_rate --value 0.25 --type float

# Validate configuration
mix llm_core.config.validate

See the Configuration Guide for the full TOML schema, environment variable interpolation, and agent registration rules.

Architecture

LlmCore is built on ALF (Antonmi's Flow-based Framework) for composable, observable data pipelines:

┌─────────────────────────────────────────────────────────────┐
│                       LlmCore                                │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐ │
│  │  Inference   │  │   Routing    │  │   Hindsight        │ │
│  │  Pipeline    │  │   Pipeline   │  │   Memory Client    │ │
│  └──────────────┘  └──────────────┘  └────────────────────┘ │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────┐ │
│  │  Agent Loop  │  │   Config     │  │   Telemetry        │ │
│  │  (Tool Use)  │  │   (Hot TOML) │  │   (Observable)     │ │
│  └──────────────┘  └──────────────┘  └────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Three ALF pipelines handle the core flows.

See the Architecture Guide for pipeline internals, provider behaviour contracts, and the agent loop design.

Telemetry Events

# Provider dispatch
[:llm_core, :provider, :send, :start | :stop | :exception]
[:llm_core, :provider, :stream, :start | :chunk | :stop]

# Router decisions
[:llm_core, :router, :resolve, :start | :stop]
[:llm_core, :router, :fallback]

# Agent loop
[:llm_core, :agent, :complete]

# Memory operations
[:llm_core, :hindsight, :retain | :recall | :reflect]
[:llm_core, :hindsight, :circuit_breaker, :state_change]

# Configuration
[:llm_core, :config, :reload]
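
Handlers attach with the standard :telemetry API. Below is a minimal sketch that logs provider latency and router fallbacks; the measurement and metadata keys are assumptions, so inspect real events to confirm them.

defmodule MyApp.LlmTelemetry do
  require Logger

  def attach do
    :telemetry.attach_many(
      "my-app-llm-core",
      [
        [:llm_core, :provider, :send, :stop],
        [:llm_core, :router, :fallback]
      ],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event([:llm_core, :provider, :send, :stop], measurements, metadata, _config) do
    # :duration is an assumed measurement key
    Logger.info("llm send finished in #{inspect(measurements[:duration])} #{inspect(metadata)}")
  end

  def handle_event([:llm_core, :router, :fallback], _measurements, metadata, _config) do
    Logger.warning("router fell back to the default provider: #{inspect(metadata)}")
  end
end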

Built-in Providers

Provider       Type    Module                   Key Capabilities
Anthropic      API     LlmCore.LLM.Anthropic    Streaming, tool use, vision, structured output
OpenAI         API     LlmCore.LLM.OpenAI       Streaming, tool use, vision, structured output
Ollama         Local   LlmCore.LLM.Ollama       Streaming, JSON mode, local models
Appliance      Local   LlmCore.LLM.Appliance    OpenAI-compatible local endpoints
Native         API     LlmCore.LLM.Native       In-process agentic loop with cascade fallback
Claude Code    CLI     Config-driven            --print, system prompt file, auto-approve
Droid          CLI     Config-driven            exec subcommand, --auto, --cwd
Pi CLI         CLI     Config-driven            --print, --provider, --thinking
Kimi CLI       CLI     Config-driven            Agent-file YAML transform, final-message capture
Codex CLI      CLI     Config-driven            --full-auto, file capture, sandbox bypass
Gemini CLI     CLI     Config-driven            Model selection

Documentation

License

MIT — see the LICENSE file.