CrucibleIR

Intermediate Representation for the Crucible ML reliability ecosystem. Full docs: https://hexdocs.pm/crucible_ir

Overview

CrucibleIR provides shared data structures for defining ML reliability experiments across the Crucible ecosystem. It serves as the common language for experiment configuration, enabling consistency across all Crucible tools and components.

Requirements

Features

Installation

Add crucible_ir to your list of dependencies in mix.exs:

def deps do
  [
    {:crucible_ir, "~> 0.3.0"}
  ]
end

Fetch dependencies:

mix deps.get

Quick Start

alias CrucibleIR.{Experiment, BackendRef, StageDef, DatasetRef}
alias CrucibleIR.Reliability.{Config, Ensemble, Stats}

# Define a simple experiment
experiment = CrucibleIR.new_experiment(
  id: :gpt4_benchmark,
  backend: %BackendRef{id: :openai_gpt4},
  pipeline: [
    %StageDef{name: :preprocessing},
    %StageDef{name: :inference},
    %StageDef{name: :evaluation}
  ],
  dataset: %DatasetRef{name: :mmlu, split: :test}
)

# Add reliability mechanisms
experiment = %{experiment |
  reliability: %Config{
    ensemble: %Ensemble{
      strategy: :majority,
      models: [:gpt4, :claude, :gemini],
      execution_mode: :parallel
    },
    stats: %Stats{
      tests: [:ttest, :bootstrap],
      alpha: 0.05
    }
  }
}

# Serialize to JSON
{:ok, json} = Jason.encode(experiment)

Backend IR Quick Start

alias CrucibleIR.Backend.{Prompt, Options, Completion, Capabilities}

prompt = %Prompt{
  messages: [%{role: :user, content: "Summarize this text."}],
  options: %Options{model: "gpt-4o", temperature: 0.2, response_format: :text}
}

completion = %Completion{
  model: "gpt-4o",
  choices: [
    %{index: 0, message: %{role: :assistant, content: "Summary..."}, finish_reason: :stop}
  ]
}

caps = %Capabilities{backend_id: :openai, provider: "openai", models: ["gpt-4o"]}

{:ok, json} = Jason.encode(prompt)

Examples Directory

See examples/README.md for a full set of API integration examples, along with setup notes for provider accounts and API keys.

Usage Workflow

  1. Define an Experiment with id, backend, and pipeline stages.
  2. Add a DatasetRef if the experiment targets a dataset.
  3. Attach Reliability.Config options (ensemble, hedging, stats, fairness, guardrails).
  4. Add OutputSpec entries to describe where and how to emit results.
  5. Serialize with Jason.encode/1 to pass the IR into other Crucible services.
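Taken together, the steps above can be sketched as follows. Note the `%OutputSpec{}` field names here are inferred from the Builder API's `add_output(:results, formats: [...])` and may differ from the actual struct:

```elixir
alias CrucibleIR.{BackendRef, StageDef, DatasetRef, OutputSpec}
alias CrucibleIR.Reliability.{Config, Ensemble}

# Steps 1-4: experiment, dataset, reliability config, and output specs
experiment = CrucibleIR.new_experiment(
  id: :workflow_demo,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  dataset: %DatasetRef{name: :mmlu, split: :test},
  reliability: %Config{
    ensemble: %Ensemble{strategy: :majority, models: [:gpt4, :claude]}
  },
  # OutputSpec fields are assumed for illustration
  outputs: [%OutputSpec{name: :results, formats: [:json]}]
)

# Step 5: serialize the IR for downstream Crucible services
{:ok, json} = Jason.encode(experiment)
```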

Core Components

Experiment Definition

Backend IR

Reliability Mechanisms

Struct Field Reference

New in v0.1.1

Validation

Validate experiments before execution:

alias CrucibleIR.{Experiment, BackendRef, StageDef}

# Valid experiment
exp = %Experiment{
  id: :test,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :run}]
}

{:ok, ^exp} = CrucibleIR.validate(exp)
true = CrucibleIR.valid?(exp)

# Invalid experiment
invalid = %Experiment{id: :test, backend: nil, pipeline: nil}
{:error, errors} = CrucibleIR.validate(invalid)
# errors: ["backend is required", "pipeline must be a list"]

JSON Serialization

Serialize to/from JSON with automatic type conversion:

alias CrucibleIR.{Experiment, BackendRef, StageDef}

# Create experiment
exp = %Experiment{
  id: :test,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}]
}

# Serialize to JSON
json = CrucibleIR.to_json(exp)

# Deserialize from JSON
{:ok, decoded} = CrucibleIR.from_json(json, Experiment)
decoded.id == :test  # true
decoded.backend.id == :gpt4  # true

# Works with nested structs and reliability configs

Fluent Builder API

Build experiments with a chainable, ergonomic API:

alias CrucibleIR.Builder

{:ok, exp} =
  Builder.experiment(:comprehensive_test)
  |> Builder.with_description("Production reliability test")
  |> Builder.with_backend(:gpt4, profile: :fast)
  |> Builder.add_stage(:preprocessing, options: %{normalize: true})
  |> Builder.add_stage(:inference)
  |> Builder.add_stage(:postprocessing)
  |> Builder.with_dataset(:mmlu, split: :test)
  |> Builder.with_ensemble(:majority, models: [:gpt4, :claude])
  |> Builder.with_hedging(:fixed, delay_ms: 100)
  |> Builder.with_stats([:ttest, :bootstrap], alpha: 0.01)
  |> Builder.with_fairness(metrics: [:demographic_parity], threshold: 0.8)
  |> Builder.with_guardrails(profiles: [:strict], pii_detection: true)
  |> Builder.add_output(:results, formats: [:json, :html])
  |> Builder.build()  # Validates and returns {:ok, exp} or {:error, errors}

# Builder automatically validates - build() returns errors if invalid
{:error, errors} =
  Builder.experiment(:invalid)
  |> Builder.build()  # Missing backend and pipeline

Or use the convenience function from the main module:

{:ok, exp} =
  CrucibleIR.experiment(:my_test)
  |> Builder.with_backend(:gpt4)
  |> Builder.add_stage(:inference)
  |> Builder.build()

Examples

Ensemble Voting Experiment

alias CrucibleIR.{BackendRef, StageDef, DatasetRef}
# Hedging, Fairness, and Guardrail are assumed to live under
# CrucibleIR.Reliability alongside Config, Ensemble, and Stats
alias CrucibleIR.Reliability.{Config, Ensemble, Hedging, Stats, Fairness, Guardrail}

experiment = CrucibleIR.new_experiment(
  id: :ensemble_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    ensemble: %Ensemble{
      strategy: :weighted,
      models: [:gpt4, :claude, :gemini],
      weights: %{gpt4: 0.5, claude: 0.3, gemini: 0.2},
      execution_mode: :parallel
    }
  }
)

Hedging for Low Latency

experiment = CrucibleIR.new_experiment(
  id: :low_latency_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    hedging: %Hedging{
      strategy: :percentile,
      percentile: 0.95,
      max_hedges: 2,
      budget_percent: 15
    }
  }
)

Statistical Testing

experiment = CrucibleIR.new_experiment(
  id: :stats_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  dataset: %DatasetRef{name: :mmlu},
  reliability: %Config{
    stats: %Stats{
      tests: [:ttest, :mannwhitney, :bootstrap],
      alpha: 0.01,
      effect_size_type: :cohens_d,
      bootstrap_iterations: 10000
    }
  }
)

Fairness Checking

experiment = CrucibleIR.new_experiment(
  id: :fairness_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    fairness: %Fairness{
      enabled: true,
      metrics: [:demographic_parity, :equalized_odds],
      group_by: :gender,
      threshold: 0.8,
      fail_on_violation: true
    }
  }
)

Security Guardrails

experiment = CrucibleIR.new_experiment(
  id: :secure_exp,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :inference}],
  reliability: %Config{
    guardrails: %Guardrail{
      profiles: [:strict],
      prompt_injection_detection: true,
      jailbreak_detection: true,
      pii_detection: true,
      pii_redaction: true,
      fail_on_detection: true
    }
  }
)

Architecture

CrucibleIR follows a hierarchical structure:

Experiment (top-level)
├── BackendRef (which LLM to use)
├── Pipeline (list of StageDef)
├── DatasetRef (what data to evaluate)
├── Reliability.Config
│   ├── Ensemble (multi-model voting)
│   ├── Hedging (latency optimization)
│   ├── Stats (statistical testing)
│   ├── Fairness (bias detection)
│   └── Guardrails (security)
└── Outputs (list of OutputSpec)

Testing

All modules have comprehensive test coverage:

mix test

Current test stats: 174 tests, 0 failures (6 doctests + 168 unit tests)


Documentation

Generate HTML documentation:

mix docs

Integration with Crucible Ecosystem

CrucibleIR is used by:

Design Principles

  1. Immutable Data Structures: All structs are immutable
  2. Type Safety: Full type specifications with @type and @spec
  3. JSON-First: All structs support JSON serialization
  4. Documentation: Every module and public function is documented
  5. Test Coverage: High test coverage with property-based testing
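Principle 1 in practice: Elixir's struct update syntax returns a new struct and leaves the original untouched, so experiment variants can be derived without mutation (a minimal sketch):

```elixir
alias CrucibleIR.{Experiment, BackendRef, StageDef}

exp = %Experiment{
  id: :base,
  backend: %BackendRef{id: :gpt4},
  pipeline: [%StageDef{name: :run}]
}

# `%{exp | ...}` builds a new struct; `exp` itself never changes
variant = %{exp | id: :variant}

exp.id      # => :base
variant.id  # => :variant
```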

Boundary and Serialization Contract

See docs/20251226/ir_boundary/IR_BOUNDARY_AND_CONTRACT.md for the full contract.

Contributing

This library is part of the North-Shore-AI organization. Contributions welcome!

License

MIT License - See LICENSE file for details

Links