Statwise

Statwise is an Elixir statistics library that pairs idiomatic Elixir APIs with results checked against pinned NumPy, SciPy, and Statsmodels references.

This first milestone includes:

Descriptive statistics for lists and one-dimensional Nx tensors.
Normal and Student's t distribution helpers.
One-sample, paired, Welch, and pooled t-tests.
Average-rank utilities.
Asymptotic and exact Mann-Whitney U tests.
Dataframe-style column wrappers for running tests from maps or Explorer dataframes.
Visualization builders with Vega-Lite-compatible output: histograms, ECDFs, QQ plots, box plots, scatter plots, line plots, summary bars and points with intervals, count plots, strip plots, and heatmaps.
Committed JSONL fixtures generated from pinned Python references.

Examples

Statwise.Descriptive.mean([1, 2, 3])
#=> 2.0

Statwise.TTest.independent([1.2, 1.9, 2.4], [2.2, 3.0, 3.4],
  variance: :welch
)
#=> %Statwise.TestResult{}

Statwise.MannWhitney.test([1, 3, 5], [2, 4],
  alternative: :two_sided,
  method: :asymptotic
)
#=> %Statwise.TestResult{}

Statwise.Visualization.histogram([1, 2, 2, 3], bins: 10)
|> Statwise.Visualization.to_vega_lite()
#=> %{"$schema" => "https://vega.github.io/schema/vega-lite/v5.json", ...}

# In Livebook with :jason, :vega_lite, and :kino_vega_lite installed:
Statwise.Visualization.histogram([1, 2, 2, 3], bins: 10)
|> Statwise.Visualization.with_style(width: 420, color: "#2563eb")
|> Statwise.Visualization.show()

rows = [
  %{site: :north, treatment: :control, time: 1, score: 1.2},
  %{site: :north, treatment: :control, time: 2, score: 1.8},
  %{site: :south, treatment: :treated, time: 1, score: 2.4},
  %{site: :south, treatment: :treated, time: 2, score: 2.9}
]

rows
|> Statwise.Visualization.plot(x: :time, y: :score, color: :treatment)
|> Statwise.Visualization.add(:point)
|> Statwise.Visualization.add(:line)
|> Statwise.Visualization.facet(column: :site)
|> Statwise.Visualization.show()

rows
|> Statwise.Visualization.box_plot(x: :treatment, y: :score)
|> Statwise.Visualization.with_test(:t_test, groups: {:control, :treated})
|> Statwise.Visualization.show()

T-Tests

Statwise.TTest.one_sample([2.5, 3.1, 3.6, 4.0], mean: 3.0)

Statwise.TTest.paired(
  [10.2, 11.5, 12.1, 13.8],
  [9.9, 10.8, 11.2, 12.6],
  alternative: :greater
)

Statwise.TTest.independent(
  [1.2, 1.9, 2.4, 2.9],
  [2.2, 3.0, 3.4, 4.1, 4.8],
  variance: :welch,
  alternative: :less,
  null_difference: 0.0,
  confidence_level: 0.95,
  effect_size: true
)

The test APIs can also pull samples from dataframe-like column data. Statwise does not depend on Explorer, but if your application has Explorer loaded, Explorer.DataFrame columns are accepted. Maps of columns work too:

df = %{
  before: [10.2, 11.5, 12.1, 13.8],
  after: [9.9, 10.8, 11.2, 12.6],
  control: [1.2, 1.9, 2.4, 2.9],
  treatment: [2.2, 3.0, 3.4, 4.1]
}

Statwise.TTest.one_sample(df, columns: [:before, :after], mean: 10.0)
#=> %{before: %Statwise.TestResult{}, after: %Statwise.TestResult{}}

Statwise.TTest.paired(df, columns: [:before, :after])
#=> %Statwise.TestResult{}

Statwise.TTest.independent(df, columns: [:control, :treatment], variance: :welch)
#=> %Statwise.TestResult{}

Column extraction defaults to ordinary lists. Pass input: :tensor to extract map or Explorer columns as one-dimensional f64 tensors. With Explorer loaded, Statwise uses Explorer.Series.to_tensor/2 when it is available:

Statwise.TTest.one_sample(df,
  columns: [:before, :after],
  mean: 10.0,
  input: :tensor,
  backend: :tensor
)

Use pairs: to run several two-sample tests in one call:

Statwise.TTest.paired(df,
  pairs: [
    before: :after,
    control: :treatment
  ]
)
#=> %{{:before, :after} => %Statwise.TestResult{}, ...}
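
The way columns: and pairs: fan out into keyed results can be pictured with a small standalone sketch. This is not Statwise's implementation: ColumnFanoutSketch, per_column/3, and per_pair/3 are hypothetical names, and plain means stand in for real tests. It only assumes what the examples above show, namely that columns: yields one result per column name and pairs: yields one result per {left, right} tuple.

```elixir
defmodule ColumnFanoutSketch do
  # columns: apply a one-sample function to each named column and
  # key each result by its column name.
  def per_column(data, columns, fun) do
    Map.new(columns, fn col -> {col, fun.(Map.fetch!(data, col))} end)
  end

  # pairs: apply a two-sample function to each {left, right} pair and
  # key each result by the pair itself.
  def per_pair(data, pairs, fun) do
    Map.new(pairs, fn {left, right} ->
      {{left, right}, fun.(Map.fetch!(data, left), Map.fetch!(data, right))}
    end)
  end
end

data = %{before: [10.2, 11.5, 12.1, 13.8], after: [9.9, 10.8, 11.2, 12.6]}

# Stand-in statistics; a real test would return a result struct instead.
mean = fn xs -> Enum.sum(xs) / length(xs) end
mean_difference = fn a, b -> mean.(a) - mean.(b) end

by_column = ColumnFanoutSketch.per_column(data, [:before, :after], mean)
by_pair = ColumnFanoutSketch.per_pair(data, [before: :after], mean_difference)
```

A keyword list such as [before: :after] is already a list of {left, right} tuples, which is why it can be passed as pairs without conversion.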

Supported alternatives are :two_sided, :greater, and :less. Independent t-tests support variance: :welch and variance: :pooled. T-test results include confidence intervals by default. Pass effect_size: true to include Cohen's d and Hedges' g.
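
For readers who want the arithmetic behind variance: :welch, variance: :pooled, and the two effect sizes, here is a minimal standalone sketch using the textbook formulas. It is not Statwise's code: TTestSketch and its function names are hypothetical, the sign convention (first sample minus second) is an assumption, and Hedges' g uses the common approximate correction factor rather than the exact gamma-function form.

```elixir
defmodule TTestSketch do
  @moduledoc "Illustrative two-sample t-test arithmetic; not Statwise's implementation."

  # Arithmetic mean of a non-empty list of numbers.
  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Unbiased sample variance (divides by n - 1).
  def variance(xs) do
    m = mean(xs)
    Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / (length(xs) - 1)
  end

  # Welch: no equal-variance assumption; Welch-Satterthwaite degrees of freedom.
  def welch(a, b) do
    {na, nb} = {length(a), length(b)}
    {va, vb} = {variance(a) / na, variance(b) / nb}
    s = va + vb
    df = s * s / (va * va / (na - 1) + vb * vb / (nb - 1))
    %{statistic: (mean(a) - mean(b)) / :math.sqrt(s), df: df}
  end

  # Pooled: equal variances assumed; df = na + nb - 2.
  def pooled(a, b) do
    {na, nb} = {length(a), length(b)}
    se = :math.sqrt(pooled_variance(a, b) * (1 / na + 1 / nb))
    %{statistic: (mean(a) - mean(b)) / se, df: na + nb - 2}
  end

  # Cohen's d: mean difference in units of the pooled standard deviation.
  def cohens_d(a, b), do: (mean(a) - mean(b)) / :math.sqrt(pooled_variance(a, b))

  # Hedges' g: Cohen's d shrunk by the approximate factor 1 - 3 / (4 * df - 1).
  def hedges_g(a, b) do
    df = length(a) + length(b) - 2
    cohens_d(a, b) * (1 - 3 / (4 * df - 1))
  end

  defp pooled_variance(a, b) do
    {na, nb} = {length(a), length(b)}
    ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
  end
end

a = [1.2, 1.9, 2.4]
b = [2.2, 3.0, 3.4]
welch = TTestSketch.welch(a, b)
pooled = TTestSketch.pooled(a, b)
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ; the Welch degrees of freedom never exceed the pooled value.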

Nonparametric Tests

Statwise.Nonparametric.Rank.ranks([10, 20, 20, 30])
#=> [1.0, 2.5, 2.5, 4.0]

Statwise.MannWhitney.test(
  [1.0, 3.0, 5.0],
  [2.0, 4.0],
  alternative: :two_sided,
  method: :auto,
  continuity: true
)

Dataframe columns are supported with the same columns: and pairs: options:

Statwise.MannWhitney.test(df, columns: [:control, :treatment], method: :auto)

Statwise.MannWhitney.test(df,
  pairs: [
    control: :treatment,
    before: :after
  ],
  method: :auto
)

Ranking currently supports SciPy-compatible average ranks for ties. Mann-Whitney U supports method: :asymptotic, method: :exact, and method: :auto. Like SciPy, an explicit method: :exact does not apply a tie correction. method: :auto uses exact p-values when there are no ties and the smaller sample has at most 8 observations; otherwise it uses the asymptotic normal approximation.

Mann-Whitney results include common-language and rank-biserial effect sizes. effect_size.cliffs_delta is also provided as an alias of the rank-biserial value.
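
The ranking rule, the U statistic, the effect sizes, and the :auto selection rule described here can be sketched in plain Elixir. This is an illustration under stated assumptions, not Statwise's implementation: MannWhitneySketch and its functions are hypothetical names, U is reported for the first sample, and the rank-biserial sign convention (positive when the first sample tends to be larger) is assumed.

```elixir
defmodule MannWhitneySketch do
  @moduledoc "Illustrative rank arithmetic; not Statwise's implementation."

  # Average ranks: tied values share the mean of the 1-based sorted
  # positions they occupy. Values are converted to floats first so that
  # 2 and 2.0 count as the same value.
  def average_ranks(values) do
    floats = Enum.map(values, &(&1 * 1.0))

    rank_of =
      floats
      |> Enum.sort()
      |> Enum.with_index(1)
      |> Enum.group_by(fn {v, _pos} -> v end, fn {_v, pos} -> pos end)
      |> Map.new(fn {v, positions} -> {v, Enum.sum(positions) / length(positions)} end)

    Enum.map(floats, &Map.fetch!(rank_of, &1))
  end

  # U for the first sample: its rank sum in the pooled data minus the
  # smallest rank sum it could possibly have.
  def u_statistic(x, y) do
    n1 = length(x)
    rank_sum = (x ++ y) |> average_ranks() |> Enum.take(n1) |> Enum.sum()
    rank_sum - n1 * (n1 + 1) / 2
  end

  # Common-language effect size is U / (n1 * n2); rank-biserial rescales
  # it from the interval [0, 1] to [-1, 1].
  def effect_sizes(x, y) do
    common_language = u_statistic(x, y) / (length(x) * length(y))
    %{common_language: common_language, rank_biserial: 2 * common_language - 1}
  end

  # The :auto rule as stated above: exact only when there are no ties
  # and the smaller sample has at most 8 observations.
  def choose_method(x, y) do
    pooled = Enum.map(x ++ y, &(&1 * 1.0))
    ties? = length(Enum.uniq(pooled)) < length(pooled)
    if not ties? and min(length(x), length(y)) <= 8, do: :exact, else: :asymptotic
  end
end

ranks = MannWhitneySketch.average_ranks([10, 20, 20, 30])
method = MannWhitneySketch.choose_method([1.0, 3.0, 5.0], [2.0, 4.0])
```

Because ties contribute half a count to U under average ranking, the rank-biserial value computed this way agrees with Cliff's delta, which is consistent with the alias noted above.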

Stage-one behavior is intentionally strict: raw samples must be finite numeric lists or one-dimensional Nx tensors. Test APIs can also extract raw samples from dataframe-style columns with columns: or pairs:. Tensor-native Nx reductions are opt-in with backend: :tensor; the default path still favors the fastest scalar implementation for the current Nx binary backend. NaN behavior is controlled with nan_policy: :raise | :propagate | :omit; see docs/compatibility.md. Degenerate t-tests with zero standard error return explicit :nan, :infinity, or :neg_infinity statistics according to the compatibility contract.
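
As a rough picture of what a degenerate statistic and the three NaN policies could look like, consider the sketch below. Everything in it is an assumption made for illustration: StrictInputSketch and its functions are hypothetical, NaN is represented by the atom :nan (BEAM floats cannot hold NaN or infinities), and the mapping from zero standard error to :nan, :infinity, or :neg_infinity follows IEEE-style division. The actual contract is the one in docs/compatibility.md.

```elixir
defmodule StrictInputSketch do
  @moduledoc "Illustrative only; the real contract lives in docs/compatibility.md."

  # Zero standard error: mimic IEEE division with atoms.
  # Assumed mapping: 0/0 -> :nan, positive/0 -> :infinity, negative/0 -> :neg_infinity.
  def t_statistic(diff, se) when se == 0 do
    cond do
      diff > 0 -> :infinity
      diff < 0 -> :neg_infinity
      true -> :nan
    end
  end

  def t_statistic(diff, se), do: diff / se

  # :raise rejects any sample containing NaN.
  def apply_nan_policy(sample, :raise) do
    if has_nan?(sample) do
      raise ArgumentError, "sample contains NaN"
    else
      {:ok, sample}
    end
  end

  # :omit drops NaN entries and continues with the remaining values.
  def apply_nan_policy(sample, :omit) do
    {:ok, Enum.reject(sample, &(&1 == :nan))}
  end

  # :propagate short-circuits to :nan when any entry is NaN.
  def apply_nan_policy(sample, :propagate) do
    if has_nan?(sample), do: :nan, else: {:ok, sample}
  end

  defp has_nan?(sample), do: Enum.any?(sample, &(&1 == :nan))
end

degenerate = StrictInputSketch.t_statistic(0.0, 0.0)
cleaned = StrictInputSketch.apply_nan_policy([1.0, :nan, 2.0], :omit)
```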

Python Compatibility

The Elixir tests use committed fixtures from:

NumPy 2.3.0 for descriptive statistics.
SciPy 1.16.0 for distributions and Mann-Whitney U.
Statsmodels 0.14.6 for independent t-tests.
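
To give a feel for how a committed JSONL fixture line might be compared against an Elixir result, here is a hedged sketch. The fixture schema is not documented in this README, so the "case", "input", and "expected" keys, the FixtureCheckSketch module, and the tolerance defaults are all hypothetical. It uses the JSON module built into Elixir 1.18 and later; on older versions a library such as Jason would be needed instead.

```elixir
defmodule FixtureCheckSketch do
  @moduledoc "Illustrative fixture comparison; field names are hypothetical."

  # True when actual matches expected within a combined absolute and
  # relative tolerance.
  def close?(actual, expected, opts \\ []) do
    rtol = Keyword.get(opts, :rtol, 1.0e-12)
    atol = Keyword.get(opts, :atol, 1.0e-12)
    abs(actual - expected) <= atol + rtol * abs(expected)
  end

  # Decode one JSONL line, run the Elixir function on its input, and
  # compare against the reference value stored in the fixture.
  def check_line(line, fun) do
    %{"input" => input, "expected" => expected} = JSON.decode!(line)
    close?(fun.(input), expected)
  end
end

# A made-up fixture line; real fixtures may use a different shape.
line = ~s({"case": "mean_basic", "input": [1, 2, 3], "expected": 2.0})
mean = fn xs -> Enum.sum(xs) / length(xs) end

matches? = FixtureCheckSketch.check_line(line, mean)
```

Comparing with a tolerance rather than exact equality keeps such checks stable across floating-point summation orders.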

Python is not required for the normal test suite. To intentionally refresh fixtures:

cd reference/python
uv sync
uv run python generate_fixtures.py
cd ../..
mix test

Review fixture diffs before committing refreshed values.

For randomized pre-release checks against Python references:

cd reference/python
uv sync
uv run python differential_check.py --cases 250 --seed 202607

See docs/release_checklist.md for the release readiness checklist.

For runnable tutorials, see docs/statistical_tests_gallery.livemd and docs/visualization_gallery.livemd.

CI

The CI checks can be run locally with:

mix format --check-formatted
mix compile --warnings-as-errors
mix test