CrucibleIR

Intermediate Representation for the Crucible ML reliability ecosystem. Full docs: https://hexdocs.pm/crucible_ir

Overview

CrucibleIR provides shared data structures for defining ML reliability experiments across the Crucible ecosystem. It serves as the common language for experiment configuration, enabling consistency across all Crucible tools and components.

Requirements

Features

Installation

Add crucible_ir to your list of dependencies in mix.exs:

def deps do
  [
    {:crucible_ir, "~> 0.1.0"}
  ]
end

Fetch dependencies:

mix deps.get

Quick Start

alias CrucibleIR.{Experiment, BackendRef, StageDef, DatasetRef}
alias CrucibleIR.Reliability.{Config, Ensemble, Stats}

# Define a simple experiment
experiment = CrucibleIR.new_experiment(
  id: :gpt4_benchmark,
  backend: %BackendRef{id: :openai_gpt4},
  pipeline: [
    %StageDef{name: :preprocessing},
    %StageDef{name: :inference},
    %StageDef{name: :evaluation}
  ],
  dataset: %DatasetRef{name: :mmlu, split: :test}
)

# Add reliability mechanisms
experiment = %{experiment |
  reliability: %Config{
    ensemble: %Ensemble{
      strategy: :majority,
      models: [:gpt4, :claude, :gemini],
      execution_mode: :parallel
    },
    stats: %Stats{
      tests: [:ttest, :bootstrap],
      alpha: 0.05
    }
  }
}

# Serialize to JSON
{:ok, json} = Jason.encode(experiment)

Usage Workflow

  1. Define an Experiment with id, backend, and pipeline stages.
  2. Add a DatasetRef if the experiment targets a dataset.
  3. Attach Reliability.Config options (ensemble, hedging, stats, fairness, guardrails).
  4. Add OutputSpec entries to describe where and how to emit results.
  5. Serialize with Jason.encode/1 to pass the IR into other Crucible services.

Core Components

Experiment Definition

Reliability Mechanisms

Struct Field Reference

Examples

Ensemble Voting Experiment

experiment = CrucibleIR.new_experiment(
  id: :ensemble_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    ensemble: %Ensemble{
      strategy: :weighted,
      models: [:gpt4, :claude, :gemini],
      weights: %{gpt4: 0.5, claude: 0.3, gemini: 0.2},
      execution_mode: :parallel
    }
  }
)
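
The IR above only describes the ensemble; how the votes are combined is up to whichever Crucible component consumes it. As a rough illustration of what a :weighted strategy could mean, the sketch below sums the configured weight behind each distinct answer and returns the heaviest one. WeightedVoteSketch and its vote/2 function are hypothetical names made up for this example and are not part of CrucibleIR.

```elixir
defmodule WeightedVoteSketch do
  @moduledoc "Hypothetical helper for illustration only; not part of CrucibleIR."

  # predictions: %{model_id => answer}
  # weights:     %{model_id => number}
  # Returns the answer backed by the largest total weight.
  # Raises Enum.EmptyError when predictions is empty.
  def vote(predictions, weights) do
    predictions
    |> Enum.reduce(%{}, fn {model, answer}, totals ->
      # Models without a configured weight contribute nothing.
      weight = Map.get(weights, model, 0.0)
      Map.update(totals, answer, weight, &(&1 + weight))
    end)
    |> Enum.max_by(fn {_answer, total} -> total end)
    |> elem(0)
  end
end

weights = %{gpt4: 0.5, claude: 0.3, gemini: 0.2}
predictions = %{gpt4: "A", claude: "A", gemini: "B"}

# "A" is backed by 0.5 + 0.3, "B" by 0.2.
IO.inspect(WeightedVoteSketch.vote(predictions, weights))
```

Ties are resolved by whatever order Enum.max_by/2 encounters the totals in, so a real consumer would want an explicit tie-breaking rule.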

Hedging for Low Latency

# Assumes Hedging lives under CrucibleIR.Reliability, alongside Config
alias CrucibleIR.Reliability.Hedging

experiment = CrucibleIR.new_experiment(
  id: :low_latency_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    hedging: %Hedging{
      strategy: :percentile,
      percentile: 0.95,
      max_hedges: 2,
      budget_percent: 15
    }
  }
)
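
One plausible reading of the :percentile strategy is "send a backup request once the primary has been outstanding longer than the 95th-percentile latency seen so far". The sketch below shows that idea with a nearest-rank percentile over recorded latencies. HedgeDelaySketch, percentile/2, and hedge?/3 are hypothetical names for this example, and the interpretation itself is an assumption; the README does not specify how consumers apply these fields, and max_hedges and budget_percent are not modeled here.

```elixir
defmodule HedgeDelaySketch do
  @moduledoc "Hypothetical helper for illustration only; not part of CrucibleIR."

  # Nearest-rank percentile: the smallest sample such that at least
  # a fraction p (0 < p <= 1) of all samples are less than or equal to it.
  def percentile(samples, p) when samples != [] and p > 0 and p <= 1 do
    sorted = Enum.sort(samples)
    rank = ceil(p * length(sorted))
    Enum.at(sorted, rank - 1)
  end

  # A hedge is warranted once the in-flight request has taken longer
  # than the chosen percentile of past latencies.
  def hedge?(elapsed_ms, samples, p), do: elapsed_ms > percentile(samples, p)
end

# Synthetic latency history: 10 ms, 20 ms, ..., 1000 ms.
latencies_ms = Enum.map(1..100, &(&1 * 10))

threshold_ms = HedgeDelaySketch.percentile(latencies_ms, 0.95)
IO.inspect(threshold_ms, label: "p95 threshold (ms)")
IO.inspect(HedgeDelaySketch.hedge?(980, latencies_ms, 0.95), label: "hedge at 980 ms?")
```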

Statistical Testing

experiment = CrucibleIR.new_experiment(
  id: :stats_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  dataset: %DatasetRef{name: :mmlu},
  reliability: %Config{
    stats: %Stats{
      tests: [:ttest, :mannwhitney, :bootstrap],
      alpha: 0.01,
      effect_size_type: :cohens_d,
      bootstrap_iterations: 10_000
    }
  }
)
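
For readers unfamiliar with the :cohens_d effect size named above, here is a minimal sketch of the standard formula: the difference between two sample means divided by their pooled standard deviation. EffectSizeSketch is a hypothetical module written for this example. CrucibleIR only carries the configuration, and the actual computation would live in whichever component runs the statistics.

```elixir
defmodule EffectSizeSketch do
  @moduledoc "Hypothetical helper for illustration only; not part of CrucibleIR."

  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Sample variance with Bessel's correction (n - 1 in the denominator).
  # Requires at least two samples.
  def variance(xs) do
    m = mean(xs)
    squared_deviations = Enum.map(xs, fn x -> (x - m) * (x - m) end)
    Enum.sum(squared_deviations) / (length(xs) - 1)
  end

  # Cohen's d: difference of means divided by the pooled standard deviation.
  def cohens_d(a, b) do
    na = length(a)
    nb = length(b)
    pooled_variance = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    (mean(a) - mean(b)) / :math.sqrt(pooled_variance)
  end
end

# Means are 4.0 and 3.0, both sample variances are 4.0, so d = 1.0 / 2.0.
IO.inspect(EffectSizeSketch.cohens_d([2, 4, 6], [1, 3, 5]))
```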

Fairness Checking

# Assumes Fairness lives under CrucibleIR.Reliability, alongside Config
alias CrucibleIR.Reliability.Fairness

experiment = CrucibleIR.new_experiment(
  id: :fairness_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    fairness: %Fairness{
      enabled: true,
      metrics: [:demographic_parity, :equalized_odds],
      group_by: :gender,
      threshold: 0.8,
      fail_on_violation: true
    }
  }
)
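
To make the :demographic_parity metric and the 0.8 threshold concrete, the sketch below computes the positive-outcome rate per group and compares the lowest rate to the highest. Treating the threshold as a minimum ratio (in the spirit of the "four-fifths rule") is an assumption made for this example; the README does not say how consumers interpret it. ParitySketch and its functions are hypothetical names, not part of CrucibleIR.

```elixir
defmodule ParitySketch do
  @moduledoc "Hypothetical helper for illustration only; not part of CrucibleIR."

  # records: non-empty list of %{group: term, positive: boolean}
  # Returns %{group => share of records with positive: true}.
  def positive_rates(records) do
    records
    |> Enum.group_by(& &1.group)
    |> Map.new(fn {group, rows} ->
      {group, Enum.count(rows, & &1.positive) / length(rows)}
    end)
  end

  # Ratio of the lowest to the highest positive rate; 1.0 means perfect parity.
  def parity_ratio(records) do
    rates = records |> positive_rates() |> Map.values()
    {lowest, highest} = Enum.min_max(rates)
    # If no group has any positive outcome, all rates are equal.
    if highest == 0.0, do: 1.0, else: lowest / highest
  end

  def violation?(records, threshold), do: parity_ratio(records) < threshold
end

records = [
  %{group: :a, positive: true},
  %{group: :a, positive: true},
  %{group: :a, positive: true},
  %{group: :a, positive: false},
  %{group: :b, positive: true},
  %{group: :b, positive: true},
  %{group: :b, positive: false},
  %{group: :b, positive: false}
]

# Group :a has rate 0.75 and group :b has rate 0.5.
IO.inspect(ParitySketch.positive_rates(records))
IO.inspect(ParitySketch.violation?(records, 0.8), label: "violation at 0.8?")
```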

Security Guardrails

# Assumes Guardrail lives under CrucibleIR.Reliability, alongside Config
alias CrucibleIR.Reliability.Guardrail

experiment = CrucibleIR.new_experiment(
  id: :secure_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    guardrails: %Guardrail{
      profiles: [:strict],
      prompt_injection_detection: true,
      jailbreak_detection: true,
      pii_detection: true,
      pii_redaction: true,
      fail_on_detection: true
    }
  }
)
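
The pii_detection and pii_redaction flags above only request the behavior; the detection logic belongs to whatever component consumes the IR. Purely as an illustration of the difference between the two flags, the sketch below detects and redacts email addresses with a deliberately simple regular expression. RedactionSketch is a hypothetical module for this example, and real PII handling would need to cover far more than emails.

```elixir
defmodule RedactionSketch do
  @moduledoc "Hypothetical helper for illustration only; not part of CrucibleIR."

  # Intentionally simple email pattern; it will miss unusual but valid addresses.
  defp email_pattern, do: ~r/[\w.+-]+@[\w-]+(\.[\w-]+)+/

  # Detection: report whether the text contains something that looks like an email.
  def contains_email?(text), do: Regex.match?(email_pattern(), text)

  # Redaction: replace every match with a fixed placeholder.
  def redact_emails(text), do: Regex.replace(email_pattern(), text, "[REDACTED_EMAIL]")
end

prompt = "Contact jane.doe@example.com today."

IO.inspect(RedactionSketch.contains_email?(prompt))
IO.puts(RedactionSketch.redact_emails(prompt))
```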

Architecture

CrucibleIR follows a hierarchical structure:

Experiment (top-level)
├── BackendRef (which LLM to use)
├── Pipeline (list of StageDef)
├── DatasetRef (what data to evaluate)
├── Reliability.Config
│   ├── Ensemble (multi-model voting)
│   ├── Hedging (latency optimization)
│   ├── Stats (statistical testing)
│   ├── Fairness (bias detection)
│   └── Guardrails (security)
└── Outputs (list of OutputSpec)

Testing

All modules have comprehensive test coverage. Run the suite with:

mix test

Current test stats: 78 tests, 0 failures (3 doctests, 75 unit tests)

Documentation

Generate HTML documentation:

mix docs

Integration with Crucible Ecosystem

CrucibleIR is used by:

Design Principles

  1. Immutable Data Structures: All IR types are plain Elixir structs, so every update returns a new value
  2. Type Safety: Full type specifications with @type and @spec
  3. JSON-First: All structs support JSON serialization
  4. Documentation: Every module and public function is documented
  5. Test Coverage: High test coverage with property-based testing

Contributing

This library is part of the North-Shore-AI organization. Contributions are welcome.

License

MIT License. See the LICENSE file for details.