BenchGecko for Elixir

The CoinGecko for AI. Elixir client for accessing AI model benchmarks, comparing language models, estimating inference costs, and discovering AI agents.

BenchGecko tracks 300+ AI models across 50+ providers with real benchmark scores, latency metrics, and transparent pricing. This package gives you structured access to that data in idiomatic Elixir with pattern matching, pipes, and typespecs throughout.

Installation

Add benchgecko to your list of dependencies in mix.exs:

def deps do
  [
    {:benchgecko, "~> 0.1.0"}
  ]
end

Then run mix deps.get.

Quick Start

# Look up any model
{:ok, model} = BenchGecko.get_model("claude-3.5-sonnet")
model.name       #=> "Claude 3.5 Sonnet"
model.provider   #=> "Anthropic"

BenchGecko.score(model, "MMLU")  #=> 88.7

# Bang variant raises on unknown models
model = BenchGecko.get_model!("gpt-4o")

Comparing Models

The comparison engine uses Elixir maps to surface benchmark differences and pricing ratios. Positive diff values mean the first model scores higher:

"gpt-4o"
|> BenchGecko.compare_models("claude-3.5-sonnet")
|> then(fn result ->
  IO.puts("Cheaper: #{result.cheaper}")
  IO.puts("Cost ratio: #{result.cost_ratio}")

  result.benchmark_diff
  |> Enum.filter(fn {_bench, diff} -> diff != nil end)
  |> Enum.each(fn {bench, diff} ->
    cond do
      diff > 0 -> IO.puts("#{bench}: GPT-4o by #{diff} pts")
      diff < 0 -> IO.puts("#{bench}: Claude 3.5 Sonnet by #{abs(diff)} pts")
      true -> IO.puts("#{bench}: tie")
    end
  end)
end)
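
A diff map like the one above can be built by merging two models' score maps. The snippet below is a minimal self-contained sketch with made-up scores, not the library's internals:

```elixir
# Hypothetical score maps for illustration; real data comes from the catalog.
gpt4o = %{"MMLU" => 88.7, "HumanEval" => 90.2, "GPQA" => nil}
sonnet = %{"MMLU" => 88.3, "HumanEval" => 92.0, "GPQA" => 59.4}

# Positive diff means the first model scores higher; nil means at
# least one of the two models lacks a score for that benchmark.
diff =
  Map.new(gpt4o, fn
    {bench, nil} ->
      {bench, nil}

    {bench, a} ->
      case Map.get(sonnet, bench) do
        nil -> {bench, nil}
        b -> {bench, Float.round(a - b, 2)}
      end
  end)

diff["MMLU"]       #=> 0.4
diff["HumanEval"]  #=> -1.8
diff["GPQA"]       #=> nil
```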

Cost Estimation

Estimate inference costs before committing to a provider. All prices are per million tokens:

case BenchGecko.estimate_cost("gpt-4o", 2_000_000, 500_000) do
  %{total: total, input_cost: input, output_cost: output} ->
    IO.puts("Input: $#{input}, Output: $#{output}, Total: $#{total}")

  :error ->
    IO.puts("Model not found or missing pricing data")
end

Finding the Right Model

Filter models by benchmark performance with pipes and pattern matching:

# All models scoring 87+ on MMLU, sorted by score
BenchGecko.top_models("MMLU", 87.0)
|> Enum.each(fn model ->
  IO.puts("#{model.name}: #{BenchGecko.score(model, "MMLU")}")
end)

# Cheapest model above a quality threshold
case BenchGecko.cheapest_above("MMLU", 85.0) do
  nil -> IO.puts("No model meets the threshold")
  model -> IO.puts("#{model.name} at $#{BenchGecko.cost_per_million(model)}/M tokens")
end
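
A threshold filter like this can be sketched with plain Enum calls over a list of model maps. The data below is hypothetical, and the pipeline is a sketch of the idea rather than the package's implementation:

```elixir
# Hypothetical models; real entries come from the built-in catalog.
models = [
  %{name: "Model A", scores: %{"MMLU" => 88.7}},
  %{name: "Model B", scores: %{"MMLU" => 85.1}},
  %{name: "Model C", scores: %{"MMLU" => 90.2}}
]

# Keep models at or above the threshold, highest score first.
top =
  models
  |> Enum.filter(fn m -> (m.scores["MMLU"] || 0) >= 87.0 end)
  |> Enum.sort_by(fn m -> m.scores["MMLU"] end, :desc)

Enum.map(top, & &1.name)  #=> ["Model C", "Model A"]
```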

Benchmark Categories

BenchGecko organizes 40+ benchmarks into categories covering reasoning, coding, math, instruction following, safety, multimodal, multilingual, and long context evaluation:

BenchGecko.benchmark_categories()
|> Enum.each(fn {_key, info} ->
  IO.puts("#{info.name}: #{Enum.join(info.benchmarks, ", ")}")
  IO.puts("  #{info.description}")
end)
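
Judging from the fields used above (info.name, info.benchmarks, info.description), each category entry is a map keyed by an atom. A hypothetical entry, shaped to match that usage:

```elixir
# Hypothetical category data inferred from the fields used above;
# the real keyword set and wording come from the package itself.
categories = %{
  reasoning: %{
    name: "Reasoning",
    benchmarks: ["MMLU", "GPQA"],
    description: "General knowledge and graduate-level reasoning"
  }
}

Enum.each(categories, fn {_key, info} ->
  IO.puts("#{info.name}: #{Enum.join(info.benchmarks, ", ")}")
end)
```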

Built-in Model Catalog

The package ships with a curated catalog of major models from OpenAI, Anthropic, Google, Meta, Mistral, and DeepSeek. Each entry includes benchmark scores, parameter counts, context window sizes, and per-token pricing.

All data is compiled into the module at build time for zero-overhead lookups -- no HTTP calls, no external dependencies.

model = BenchGecko.get_model!("deepseek-v3")
model.parameters       #=> 671
model.context_window   #=> 128_000
BenchGecko.cost_per_million(model)  #=> 0.685
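
Embedding a catalog at build time is a standard Elixir pattern built on module attributes: the data is evaluated once during compilation and baked into the module's bytecode. A minimal sketch of the pattern (not the package's actual source):

```elixir
defmodule Catalog do
  # Evaluated at compile time; lookups at runtime touch only this literal.
  @models %{
    "example-model" => %{name: "Example Model", context_window: 128_000}
  }

  # Map.fetch/2 returns {:ok, value} or :error, matching the API style above.
  def get_model(id), do: Map.fetch(@models, id)
end

Catalog.get_model("example-model")
#=> {:ok, %{name: "Example Model", context_window: 128_000}}
```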

Typespecs and Documentation

Every public function has @spec and @doc annotations. With ex_doc included as a dev dependency, generate local documentation with:

mix docs
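
An annotated function pairs a @doc string with a @spec declaring the argument and return types. The module below is a hypothetical sketch of what those annotations look like, not a copy of the package's source:

```elixir
defmodule Example do
  @doc """
  Fetches a model by its string id.
  Returns {:ok, model} on a hit or :error for an unknown id.
  """
  @spec get_model(String.t()) :: {:ok, map()} | :error
  def get_model("demo"), do: {:ok, %{name: "Demo"}}
  def get_model(_id), do: :error
end
```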

License

MIT License. See LICENSE for details.