LLM Eval Workbench
Aludel comes from medieval Latin, derived from the Arabic al-uthāl, a vessel for sublimation in which matter is refined through stages.
Like jinn answering invocation, LLMs respond to prompts; here, their nature is revealed, tested, and distilled.
Run prompts across OpenAI, Anthropic, and Ollama simultaneously. Compare output quality, latency, token usage, and cost in real time.
Features
- Multi-provider comparison — Run the same prompt across providers side-by-side. Track latency, token usage, and cost per run.
- Prompt management — Version-controlled templates with {{variable}} interpolation. Every edit creates an immutable new version. Supports tags and descriptions.
- Evolution tracking — Visualize prompt version performance over time. Track pass rates, cost, and latency trends across versions and providers.
- Evaluation suites — Visual test case editor with document attachments (PDF, images, CSV, JSON, TXT). Automated assertions including contains, regex, exact_match, and json_field. Track pass rates and catch regressions over time.
- Dashboard — Live metrics as runs execute: cost trends, latency, and per-provider performance.
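To make the assertion types concrete, here is a minimal sketch of how three of them could be evaluated against a model's output. It is not Aludel's implementation: the module name `AssertionSketch`, the `%{type: ..., value: ...}` map shape, and the `pass_rate/2` helper are assumptions made for illustration. `json_field` is left out because it would need a JSON decoder.

```elixir
defmodule AssertionSketch do
  @moduledoc "Hypothetical sketch of three assertion types; not Aludel's actual code."

  # Each clause returns true when the model output satisfies the assertion.
  def check(%{type: "contains", value: expected}, output),
    do: String.contains?(output, expected)

  def check(%{type: "exact_match", value: expected}, output),
    do: output == expected

  def check(%{type: "regex", value: pattern}, output),
    do: Regex.match?(Regex.compile!(pattern), output)

  # Fraction of assertions that pass, between 0.0 and 1.0.
  def pass_rate([], _output), do: 0.0

  def pass_rate(assertions, output) do
    passed = Enum.count(assertions, &check(&1, output))
    passed / length(assertions)
  end
end

assertions = [
  %{type: "contains", value: "Paris"},
  %{type: "exact_match", value: "Paris"}
]

IO.inspect(AssertionSketch.pass_rate(assertions, "The capital is Paris."))
```

In this sketch the `contains` assertion passes and the `exact_match` assertion fails, so the pass rate is 0.5. Tracking that number per prompt version is the kind of signal that exposes a regression.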
Installation
Aludel can be embedded into any Phoenix LiveView application as a self-contained dashboard.
1. Add dependency
Add Aludel to your mix.exs:
def deps do
[
{:aludel, "~> 0.1"}
]
end
Run mix deps.get
2. Configure the repo
Add to config/config.exs:
config :aludel, repo: YourApp.Repo
3. Install migrations
mix aludel.install
This copies Aludel’s migrations to your priv/repo/migrations/ directory.
4. Run migrations
mix ecto.migrate
5. Add router macro
In your lib/your_app_web/router.ex:
use YourAppWeb, :router
import Aludel.Web.Router # Add this line
# In development
if Mix.env() == :dev do
scope "/dev" do
pipe_through :browser
aludel_dashboard "/aludel" # Dashboard will be at /dev/aludel
end
end
# Or in production (with authentication)
# scope "/admin" do
# pipe_through [:browser, :require_admin]
# aludel_dashboard "/aludel"
# end
The dashboard can be mounted at any path you choose. It’s common to mount it under /dev in development or /admin in production (with proper authentication).
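The commented production example refers to a `:require_admin` pipeline that the host app has to define itself. One possible shape is sketched below, using `Plug.BasicAuth` (available in Plug 1.10 and later). The function name `admin_basic_auth` and the `ADMIN_USER` / `ADMIN_PASS` environment variables are assumptions for illustration. This is a fragment for an existing Phoenix router, and any authentication plug your app already uses would serve equally well.

```elixir
# lib/your_app_web/router.ex (fragment; assumes the usual :browser pipeline exists)

pipeline :require_admin do
  plug :admin_basic_auth
end

scope "/admin" do
  pipe_through [:browser, :require_admin]
  aludel_dashboard "/aludel"
end

# Hypothetical helper: credentials come from environment variables read at
# request time; fetch_env!/1 raises if either one is missing.
defp admin_basic_auth(conn, _opts) do
  Plug.BasicAuth.basic_auth(conn,
    username: System.fetch_env!("ADMIN_USER"),
    password: System.fetch_env!("ADMIN_PASS")
  )
end
```

Basic auth sends credentials with every request, so it is only reasonable behind HTTPS.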
6. Configure API keys (optional)
Aludel reads provider API keys from application config. Add to your host app’s config:
# config/dev.exs (or config/runtime.exs for production)
config :aludel, :llm,
openai_api_key: System.get_env("OPENAI_API_KEY"),
anthropic_api_key: System.get_env("ANTHROPIC_API_KEY")
Then set environment variables before starting the server:
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
mix phx.server
Ollama runs locally and requires no API keys.
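Because `System.get_env/1` returns `nil` for an unset variable, a missing key may not surface until a run against that provider fails. The sketch below is one way to check the config early, for example from IEx. The `LlmKeyCheck` module is hypothetical and not part of Aludel; it only reads the `:llm` keyword list shown above.

```elixir
defmodule LlmKeyCheck do
  @moduledoc "Hypothetical helper; not part of Aludel."

  @keys [:openai_api_key, :anthropic_api_key]

  # Returns the keys that are unset or blank in the given keyword list.
  def missing(llm_config) do
    Enum.filter(@keys, fn key -> Keyword.get(llm_config, key) in [nil, ""] end)
  end
end

# Reads the same config the host app sets with `config :aludel, :llm, ...`.
:aludel
|> Application.get_env(:llm, [])
|> LlmKeyCheck.missing()
|> IO.inspect(label: "missing LLM keys")
```

An empty list means both keys are present. A provider listed here can still be skipped safely if you only use Ollama.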
7. Install ImageMagick (for Ollama PDF support)
If you want to test evaluation suites with PDF documents using Ollama vision models, install ImageMagick v7+:
# macOS
brew install imagemagick
# Ubuntu/Debian
sudo apt-get install imagemagick
# Check installation
magick -version
Note: PDF-to-image conversion is only required for Ollama vision models. OpenAI and Anthropic Claude 4.5+ accept PDFs directly in their APIs without conversion. For Ollama, PDFs are converted to PNG (first page only, 150 DPI) before being sent to the model.
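The conversion described in the note (first page only, 150 DPI, PNG output) corresponds roughly to running `magick -density 150 input.pdf[0] output.png`. The sketch below wraps that command in Elixir. It illustrates the technique and is not Aludel's own conversion code; the module and function names are hypothetical.

```elixir
defmodule PdfToPngSketch do
  @moduledoc "Hypothetical wrapper around the `magick` CLI; not Aludel's actual code."

  # Builds the argument list: -density sets the render DPI and must come
  # before the input file; the "[0]" suffix selects the first page only.
  def args(pdf_path, png_path, dpi \\ 150) do
    ["-density", Integer.to_string(dpi), pdf_path <> "[0]", png_path]
  end

  # Runs ImageMagick and returns {:ok, png_path} or {:error, {status, output}}.
  # System.cmd/3 raises if the `magick` executable is not on the PATH.
  def convert(pdf_path, png_path) do
    case System.cmd("magick", args(pdf_path, png_path), stderr_to_stdout: true) do
      {_output, 0} -> {:ok, png_path}
      {output, status} -> {:error, {status, output}}
    end
  end
end
```

ImageMagick typically delegates PDF rendering to Ghostscript, so a conversion error on a fresh machine often means Ghostscript is not installed.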
8. Seed demo data (optional)
mix aludel.seed
This populates the database with sample providers, prompts, and evaluation suites.
Visit the dashboard at your configured path (e.g., http://localhost:4000/dev/aludel).
Standalone Mode
Aludel includes a standalone application in the standalone/ directory for running the dashboard without embedding it in a Phoenix app.
Setup
cd standalone
mix deps.get
mix ecto.create
mix ecto.migrate
mix aludel.seed # Optional: add demo data
mix phx.server
Visit http://localhost:4000
Configuration
Edit standalone/config/dev.exs to configure:
- Database — Default: postgres://postgres:postgres@localhost/aludel_dash_dev
- Port — Default: 4000
- API Keys — Set OPENAI_API_KEY and ANTHROPIC_API_KEY environment variables
Production Deployment
# Set required environment variables
export DATABASE_URL=postgres://...
export SECRET_KEY_BASE=$(mix phx.gen.secret)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
# Optional: Enable basic auth
export BASIC_AUTH_USER=admin
export BASIC_AUTH_PASS=secret
# Optional: Set read-only mode
export READ_ONLY=true
# Run the app
MIX_ENV=prod mix release
_build/prod/rel/aludel_dash/bin/aludel_dash start
Providers
| Provider | API Key Required | Configuration |
|---|---|---|
| Ollama ⭐ | No | Local models - works out of the box |
| OpenAI | Yes | Add OPENAI_API_KEY to .env (Get key) |
| Anthropic | Yes | Add ANTHROPIC_API_KEY to .env (Get key) |
Ollama quickstart
# Install from https://ollama.com, then:
ollama serve
ollama pull llama3 # or: mistral, codellama
mix run priv/repo/seeds.exs
Seeds create providers for Ollama, OpenAI, and Anthropic, along with 3 sample prompts and 3 evaluation suites with document attachments.
Usage
1. Create a prompt — Go to Prompts → New Prompt and use {{variable}} syntax:
Explain {{topic}} in exactly 3 sentences.
2. Run across providers — Click New Run, fill in variables, select providers, and watch results stream in.
3. Build evaluation suites — Go to Suites → New Suite, add test cases with assertions, and run regression tests.
4. Track evolution — View the Evolution tab on any prompt to see how versions improve over time. Metrics show pass rates, cost, and latency per version and provider.
5. Add a provider — Go to Providers → New Provider, select the type, choose a model, and configure parameters (temperature, max_tokens, etc.). API keys are set via environment variables in .env.
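The `{{variable}}` syntax from step 1 can be pictured as a simple placeholder substitution. The sketch below shows one way to do it with a regular expression. It is an illustration only: the module name is hypothetical, and leaving unknown placeholders untouched is an assumption rather than documented Aludel behavior.

```elixir
defmodule InterpolationSketch do
  @moduledoc "Hypothetical sketch of placeholder interpolation; not Aludel's actual code."

  # Matches {{name}}, where name is letters, digits, or underscores.
  defp placeholder, do: ~r/\{\{(\w+)\}\}/

  # Replaces each known placeholder with its value; unknown ones are left as-is.
  def render(template, variables) do
    Regex.replace(placeholder(), template, fn whole, name ->
      case Map.fetch(variables, name) do
        {:ok, value} -> to_string(value)
        :error -> whole
      end
    end)
  end

  # Lists the distinct variable names a template expects, in order of appearance.
  def variables(template) do
    placeholder()
    |> Regex.scan(template, capture: :all_but_first)
    |> List.flatten()
    |> Enum.uniq()
  end
end

template = "Explain {{topic}} in exactly 3 sentences."
IO.inspect(InterpolationSketch.variables(template))
IO.puts(InterpolationSketch.render(template, %{"topic" => "recursion"}))
```

The first call lists `topic` as the only variable to fill in, and the second prints the prompt with `recursion` substituted.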
Development
Working with assets (CSS/JS)
Aludel uses Tailwind CSS and esbuild for styling and JavaScript bundling.
After making changes to CSS or JS files:
# 1. Rebuild assets (from aludel directory)
mix assets.build
# 2. Force recompile to pick up new asset hashes
mix compile --force
# 3. If working on an embedded installation, recompile the dependency
cd ../your_host_app
mix deps.compile aludel --force
# 4. Restart the Phoenix server to pick up changes
Asset files:
- CSS: assets/css/app.css
- JavaScript: assets/js/app.js and assets/js/hooks/
- Built assets: priv/static/app.css and priv/static/app.js (committed to git)
Live development workflow:
For faster iteration during development, you can use Mix tasks with watchers:
# Watch and rebuild CSS on changes
mix tailwind aludel --watch
# Watch and rebuild JS on changes (in another terminal)
mix esbuild aludel --watch
Alternatively, run the standalone app for a full development server:
cd standalone
mix phx.server # Starts asset watchers automatically
Community
- 💬 Discussions — Ask questions, share ideas, or discuss use cases
- 🐛 Issues — Report bugs or request features
Contributing
See CONTRIBUTING.md for detailed guidelines.
Quick start:
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Commit using conventional commits
- Run mix precommit before submitting
- Open a Pull Request