Statwise

Statwise is an Elixir statistics library that aims for idiomatic Elixir APIs with results checked against well-known Python references.

This first milestone includes:

Examples

Statwise.Descriptive.mean([1, 2, 3])
#=> 2.0

Statwise.TTest.independent([1.2, 1.9, 2.4], [2.2, 3.0, 3.4],
  variance: :welch
)
#=> %Statwise.TestResult{}

Statwise.MannWhitney.test([1, 3, 5], [2, 4],
  alternative: :two_sided,
  method: :asymptotic
)
#=> %Statwise.TestResult{}

Statwise.Visualization.histogram([1, 2, 2, 3], bins: 10)
|> Statwise.Visualization.to_vega_lite()
#=> %{"$schema" => "https://vega.github.io/schema/vega-lite/v5.json", ...}

# In Livebook with :jason, :vega_lite, and :kino_vega_lite installed:
Statwise.Visualization.histogram([1, 2, 2, 3], bins: 10)
|> Statwise.Visualization.with_style(width: 420, color: "#2563eb")
|> Statwise.Visualization.show()

rows = [
  %{site: :north, treatment: :control, time: 1, score: 1.2},
  %{site: :north, treatment: :control, time: 2, score: 1.8},
  %{site: :south, treatment: :treated, time: 1, score: 2.4},
  %{site: :south, treatment: :treated, time: 2, score: 2.9}
]

rows
|> Statwise.Visualization.plot(x: :time, y: :score, color: :treatment)
|> Statwise.Visualization.add(:point)
|> Statwise.Visualization.add(:line)
|> Statwise.Visualization.facet(column: :site)
|> Statwise.Visualization.show()

rows
|> Statwise.Visualization.box_plot(x: :treatment, y: :score)
|> Statwise.Visualization.with_test(:t_test, groups: {:control, :treated})
|> Statwise.Visualization.show()

T-Tests

Statwise.TTest.one_sample([2.5, 3.1, 3.6, 4.0], mean: 3.0)

Statwise.TTest.paired(
  [10.2, 11.5, 12.1, 13.8],
  [9.9, 10.8, 11.2, 12.6],
  alternative: :greater
)

Statwise.TTest.independent(
  [1.2, 1.9, 2.4, 2.9],
  [2.2, 3.0, 3.4, 4.1, 4.8],
  variance: :welch,
  alternative: :less,
  null_difference: 0.0,
  confidence_level: 0.95,
  effect_size: true
)

The test APIs can also pull samples from dataframe-like column data. Statwise does not depend on Explorer, but if your application has Explorer loaded, Explorer.DataFrame columns are accepted. Maps of columns work too:

df = %{
  before: [10.2, 11.5, 12.1, 13.8],
  after: [9.9, 10.8, 11.2, 12.6],
  control: [1.2, 1.9, 2.4, 2.9],
  treatment: [2.2, 3.0, 3.4, 4.1]
}

Statwise.TTest.one_sample(df, columns: [:before, :after], mean: 10.0)
#=> %{before: %Statwise.TestResult{}, after: %Statwise.TestResult{}}

Statwise.TTest.paired(df, columns: [:before, :after])
#=> %Statwise.TestResult{}

Statwise.TTest.independent(df, columns: [:control, :treatment], variance: :welch)
#=> %Statwise.TestResult{}

Column extraction defaults to ordinary lists. Pass input: :tensor to extract map or Explorer columns as one-dimensional f64 tensors. With Explorer loaded, Statwise uses Explorer.Series.to_tensor/2 when it is available:

Statwise.TTest.one_sample(df,
  columns: [:before, :after],
  mean: 10.0,
  input: :tensor,
  backend: :tensor
)

Use pairs: to run several two-sample tests in one call:

Statwise.TTest.paired(df,
  pairs: [
    before: :after,
    control: :treatment
  ]
)
#=> %{{:before, :after} => %Statwise.TestResult{}, ...}
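
The shape of that result can be illustrated with a small standalone sketch. It is not Statwise's implementation: PairsSketch, run_pairs/3, and the stand-in test function are hypothetical names used only to show how a pairs: keyword list could map onto tuple keys.

```elixir
defmodule PairsSketch do
  # Hypothetical helper, not part of Statwise.
  # Runs `test_fun` once per {left, right} pair and keys each result
  # by the pair itself, mirroring the %{{:before, :after} => ...} shape.
  def run_pairs(columns, pairs, test_fun) do
    Map.new(pairs, fn {left, right} ->
      left_sample = Map.fetch!(columns, left)
      right_sample = Map.fetch!(columns, right)
      {{left, right}, test_fun.(left_sample, right_sample)}
    end)
  end
end

# Stand-in for a real test: the difference between the two sample means.
mean_difference = fn xs, ys ->
  Enum.sum(xs) / length(xs) - Enum.sum(ys) / length(ys)
end

columns = %{before: [10.2, 11.5], after: [9.9, 10.8]}
PairsSketch.run_pairs(columns, [before: :after], mean_difference)
```

Because pairs: is a keyword list, the left-hand column of each pair must be an atom; the sketch uses Map.fetch!/2 so that a misspelled column name fails loudly.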

Supported alternatives are :two_sided, :greater, and :less. Independent t-tests support variance: :welch and variance: :pooled. T-test results include confidence intervals by default. Pass effect_size: true to include Cohen's d and Hedges' g.
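
For readers who want to see what variance: :welch and effect_size: true refer to, here is a minimal pure-Elixir sketch of the textbook formulas. WelchSketch and its functions are hypothetical and are not Statwise's code. The Hedges' g correction shown is the common approximation 1 - 3 / (4 * (n1 + n2) - 9); whether Statwise uses this approximation or an exact correction is not stated here, so treat it as an assumption.

```elixir
defmodule WelchSketch do
  # Hypothetical helper module, not part of Statwise.

  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Unbiased sample variance (divides by n - 1).
  def variance(xs) do
    m = mean(xs)
    Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / (length(xs) - 1)
  end

  # Welch t statistic with Welch-Satterthwaite degrees of freedom.
  def welch(xs, ys) do
    {nx, ny} = {length(xs), length(ys)}
    # Squared standard error contributed by each sample.
    {vx, vy} = {variance(xs) / nx, variance(ys) / ny}
    t = (mean(xs) - mean(ys)) / :math.sqrt(vx + vy)
    df = (vx + vy) * (vx + vy) / (vx * vx / (nx - 1) + vy * vy / (ny - 1))
    %{statistic: t, df: df}
  end

  # Cohen's d with a pooled standard deviation, and Hedges' g using the
  # approximate small-sample correction (an assumption, see lead-in).
  def effect_sizes(xs, ys) do
    {nx, ny} = {length(xs), length(ys)}
    pooled = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    d = (mean(xs) - mean(ys)) / :math.sqrt(pooled)
    g = d * (1 - 3 / (4 * (nx + ny) - 9))
    %{cohens_d: d, hedges_g: g}
  end
end

# Same samples as the independent t-test example above.
WelchSketch.welch([1.2, 1.9, 2.4, 2.9], [2.2, 3.0, 3.4, 4.1, 4.8])
WelchSketch.effect_sizes([1.2, 1.9, 2.4, 2.9], [2.2, 3.0, 3.4, 4.1, 4.8])
```

The sketch omits p-values and confidence intervals, which need the Student t distribution.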

Nonparametric Tests

Statwise.Nonparametric.Rank.ranks([10, 20, 20, 30])
#=> [1.0, 2.5, 2.5, 4.0]

Statwise.MannWhitney.test(
  [1.0, 3.0, 5.0],
  [2.0, 4.0],
  alternative: :two_sided,
  method: :auto,
  continuity: true
)

Dataframe columns are supported with the same columns: and pairs: options:

Statwise.MannWhitney.test(df, columns: [:control, :treatment], method: :auto)

Statwise.MannWhitney.test(df,
  pairs: [
    control: :treatment,
    before: :after
  ],
  method: :auto
)

Ranking currently assigns SciPy-compatible average ranks to ties. Mann-Whitney U supports method: :asymptotic, method: :exact, and method: :auto. Like SciPy, an explicit method: :exact does not apply a tie correction. With method: :auto, exact p-values are used when there are no ties and the smaller sample has at most 8 observations; otherwise the asymptotic normal approximation is used.

Mann-Whitney results include common-language and rank-biserial effect sizes. effect_size.cliffs_delta is also provided as an alias of the rank-biserial effect size.
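
The ranking and method-selection rules can be sketched in plain Elixir. RankSketch is a hypothetical module, not Statwise's implementation. The sign convention for the rank-biserial value (2U / (n1 * n2) - 1, with U computed for the first sample) is an assumption, since conventions differ between references.

```elixir
defmodule RankSketch do
  # Hypothetical helper module, not part of Statwise.

  # Average ranks: tied values share the mean of the 1-based positions
  # they occupy in sorted order.
  def ranks(values) do
    # Normalize to floats so that 1 and 1.0 count as the same value.
    floats = Enum.map(values, &(&1 * 1.0))

    rank_by_value =
      floats
      |> Enum.sort()
      |> Enum.with_index(1)
      |> Enum.group_by(fn {v, _pos} -> v end, fn {_v, pos} -> pos end)
      |> Map.new(fn {v, positions} -> {v, Enum.sum(positions) / length(positions)} end)

    Enum.map(floats, &Map.fetch!(rank_by_value, &1))
  end

  # U statistic for the first sample, plus two effect sizes derived from it.
  def mann_whitney_u(xs, ys) do
    {nx, ny} = {length(xs), length(ys)}
    rank_sum_x = (xs ++ ys) |> ranks() |> Enum.take(nx) |> Enum.sum()
    u = rank_sum_x - nx * (nx + 1) / 2

    %{
      u: u,
      common_language: u / (nx * ny),
      rank_biserial: 2 * u / (nx * ny) - 1
    }
  end

  # The :auto rule as described above: exact only when there are no ties
  # and the smaller sample has at most 8 observations.
  def auto_method(xs, ys) do
    all = Enum.map(xs ++ ys, &(&1 * 1.0))
    ties? = length(Enum.uniq(all)) != length(all)
    small? = min(length(xs), length(ys)) <= 8

    if not ties? and small?, do: :exact, else: :asymptotic
  end
end

RankSketch.ranks([10, 20, 20, 30])
#=> [1.0, 2.5, 2.5, 4.0]
```

The sketch stops at the U statistic; exact and asymptotic p-values are left out.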

Stage-one behavior is intentionally strict: raw samples must be finite numeric lists or one-dimensional Nx tensors. Test APIs can also extract raw samples from dataframe-style columns with columns: or pairs:. Tensor-native Nx reductions are opt-in with backend: :tensor; the default path still favors the fastest scalar implementation for the current Nx binary backend.

NaN behavior is controlled with nan_policy: :raise | :propagate | :omit; see docs/compatibility.md. Degenerate t-tests with zero standard error return explicit :nan, :infinity, or :neg_infinity statistics according to the compatibility contract.
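
As a rough illustration of the three policies and of the degenerate-statistic atoms, the sketch below uses a hypothetical NanPolicySketch module. It assumes NaN appears in a list as the atom :nan, and that the policies behave like SciPy's options of the same names. Both are assumptions; docs/compatibility.md remains the authoritative description.

```elixir
defmodule NanPolicySketch do
  # Hypothetical helper module, not part of Statwise.
  # Assumption: NaN is represented in a sample list as the atom :nan.

  def apply_policy(sample, policy) do
    has_nan? = Enum.any?(sample, &(&1 == :nan))

    case {policy, has_nan?} do
      # No NaN present: every policy passes the sample through.
      {_, false} -> {:ok, sample}
      # :raise rejects the input outright.
      {:raise, true} -> raise ArgumentError, "sample contains NaN"
      # :propagate turns the whole result into NaN.
      {:propagate, true} -> :nan
      # :omit drops NaN entries and continues with the rest.
      {:omit, true} -> {:ok, Enum.reject(sample, &(&1 == :nan))}
    end
  end

  # Ratio that returns explicit atoms when the standard error is zero,
  # because Elixir floats cannot hold NaN or infinity.
  def ratio(_difference, :nan), do: :nan

  def ratio(difference, standard_error) when standard_error == 0 do
    cond do
      difference > 0 -> :infinity
      difference < 0 -> :neg_infinity
      true -> :nan
    end
  end

  def ratio(difference, standard_error), do: difference / standard_error
end

NanPolicySketch.apply_policy([1.0, :nan, 2.0], :omit)
#=> {:ok, [1.0, 2.0]}
```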

Python Compatibility

The Elixir tests use committed fixtures from:

Python is not required for the normal test suite. To intentionally refresh fixtures:

cd reference/python
uv sync
uv run python generate_fixtures.py
cd ../..
mix test

Review fixture diffs before committing refreshed values.

For randomized pre-release checks against Python references:

cd reference/python
uv sync
uv run python differential_check.py --cases 250 --seed 202607

See docs/release_checklist.md for the release readiness checklist.

For runnable tutorials, see docs/statistical_tests_gallery.livemd and docs/visualization_gallery.livemd.

CI

Run:

mix format --check-formatted
mix compile --warnings-as-errors
mix test