FastDecimal

Fast arbitrary-precision decimal arithmetic for Elixir.

A pure-Elixir alternative to decimal — designed for the hot paths fintech, ledger, and pricing code live in: add, sub, mult, div, sum, parse, format. Drop-in via a compat shim, ships with Ecto integration. No native dependencies.

import FastDecimal
~d"1.23"
|> FastDecimal.add(~d"4.567")
|> FastDecimal.mult(~d"2")
|> FastDecimal.to_string()
# => "11.594"
FastDecimal.sum([~d"1.5", ~d"2.5", ~d"3"])
# => ~d"7.0"
FastDecimal.round(~d"1.236", 2) # ~d"1.24"
FastDecimal.sqrt(~d"2", precision: 10) # ~d"1.414213562"
FastDecimal.div(~d"10", ~d"3", precision: 5) # ~d"3.3333"

Benchmarks

mix bench reproduces the headline summary in about a minute.

Methodology

Every number below is the median across 7 independent samples × 200,000 iterations per scenario. Each sample runs in a fresh process (resetting BEAM state), with 1,000 warmup iterations and a forced GC before measurement. Times use :erlang.monotonic_time(:nanosecond).
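
The sampling procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code in bench/: the module name `BenchSketch` and its functions are hypothetical, and it assumes "fresh process" means a fresh Erlang process (spawned here via `Task`).

```elixir
defmodule BenchSketch do
  # Hypothetical helper for illustration; not part of FastDecimal's bench suite.
  # Runs `fun` in a fresh Erlang process and returns nanoseconds per iteration.
  def sample(fun, iterations, warmup \\ 1_000) do
    task =
      Task.async(fn ->
        # Warm up so modules are loaded and caches are hot.
        run(fun, warmup)
        # Force a GC so warmup garbage is not collected inside the timed loop.
        :erlang.garbage_collect()

        t0 = :erlang.monotonic_time(:nanosecond)
        run(fun, iterations)
        t1 = :erlang.monotonic_time(:nanosecond)

        (t1 - t0) / iterations
      end)

    Task.await(task, :infinity)
  end

  # Collects `n` independent samples, each in its own process.
  def samples(fun, n, iterations) do
    for _ <- 1..n, do: sample(fun, iterations)
  end

  defp run(_fun, 0), do: :ok

  defp run(fun, n) do
    fun.()
    run(fun, n - 1)
  end
end

# Usage: 7 samples of a stand-in workload (integer addition), with a reduced
# iteration count so the sketch finishes quickly.
results = BenchSketch.samples(fn -> 123_456 + 654_321 end, 7, 20_000)
IO.inspect(results, label: "ns/iter per sample")
```

Each element of `results` is one per-iteration time in nanoseconds; the median and IQR are then taken across those samples.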

We report median (p25–p75 IQR) — the interquartile range survives outliers from GC pauses and scheduler steals. A row is marked stable when even the pessimistic ratio (FastDecimal's p75 vs Decimal's p25) clears 2×.
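
As a rough sketch of how the median, IQR, and "stable" flag could be derived from a set of samples: the module `StatsSketch`, the linear-interpolation percentile, and the sample values below are all assumptions made for illustration, not the suite's actual code or data.

```elixir
defmodule StatsSketch do
  # Hypothetical helpers for illustration; the real logic lives in bench/.

  # Linear-interpolated percentile of a list, with p in 0.0..1.0.
  def percentile(values, p) do
    sorted = Enum.sort(values)
    last = length(sorted) - 1
    rank = p * last
    lo = trunc(rank)
    hi = min(lo + 1, last)
    frac = rank - lo
    Enum.at(sorted, lo) * (1 - frac) + Enum.at(sorted, hi) * frac
  end

  def summary(values) do
    %{
      p25: percentile(values, 0.25),
      median: percentile(values, 0.5),
      p75: percentile(values, 0.75)
    }
  end

  # Pessimistic ratio: FastDecimal at its slow edge (p75) against
  # decimal at its fast edge (p25).
  def pessimistic_ratio(decimal, fast), do: decimal.p25 / fast.p75

  # A row counts as stable when even the pessimistic ratio clears 2x.
  def stable?(decimal, fast), do: pessimistic_ratio(decimal, fast) > 2.0
end

# Made-up per-sample timings in nanoseconds, seven per library.
decimal = StatsSketch.summary([264, 270, 280, 282, 290, 294, 300])
fast = StatsSketch.summary([13, 13, 14, 15, 16, 16, 17])

IO.inspect(StatsSketch.pessimistic_ratio(decimal, fast), label: "pessimistic ratio")
IO.inspect(StatsSketch.stable?(decimal, fast), label: "stable?")
```

Comparing the unfavorable edges of both ranges means a row is only flagged stable if the speedup survives run-to-run noise on both sides.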

The geometric mean speedup is reproducible across runs (observed: 11.11× – 11.28× across 4 consecutive runs on the same JIT-enabled OTP install). Specific per-op nanosecond values shift 5-10% per run due to macOS scheduler noise (E-core vs P-core dispatch, GC interactions); the speedup ratios are stable. Numbers below are from one representative run — run mix bench to see your own.
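
For readers unfamiliar with the geometric mean of ratios, a minimal sketch follows. The module name `GeoMeanSketch` is hypothetical and the input ratios are invented; the point is only that speedups are averaged multiplicatively, so one very large ratio does not dominate the summary the way it would in an arithmetic mean.

```elixir
defmodule GeoMeanSketch do
  # Geometric mean computed via logarithms: exp(mean(ln x)).
  # Hypothetical helper for illustration only.
  def geometric_mean(ratios) when ratios != [] do
    log_sum = ratios |> Enum.map(&:math.log/1) |> Enum.sum()
    :math.exp(log_sum / length(ratios))
  end
end

# Two invented speedups: 2x and 8x. Their geometric mean is about 4x,
# whereas the arithmetic mean would be 5x.
IO.inspect(GeoMeanSketch.geometric_mean([2.0, 8.0]), label: "geometric mean")
```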

Headline summary (mix bench)

Tested on macOS arm64 / 10 cores against two flavors of OTP 26 on the same hardware:

| Emulator | Geometric mean speedup | Scenarios faster | Stable ≥2× at IQR edges |
| --- | --- | --- | --- |
| OTP 26, BEAMAsm JIT (asdf 26.0.2, emu_flavor=jit) | 9.87× | 21/22 | 19/22 |
| OTP 26, threaded-code interpreter (asdf 26.2.4, emu_flavor=emu) | 7.71× | 21/22 | 18/22 |

JIT helps FastDecimal proportionally more than decimal (FastDecimal's hot paths have more inlining opportunities per work-unit), so the speedup ratio is larger on JIT. Even on the threaded-code interpreter without JIT, FastDecimal is still about 7.7× faster by geometric mean.

Detailed table (BEAMAsm JIT, OTP 26)

Format: median (p25–p75 IQR). Speedup column: median (pessimistic – optimistic ratios).

| op | size | decimal | FastDecimal | speedup |
| --- | --- | --- | --- | --- |
| add | medium | 282 ns (264–294) | 15 ns (13–16) | 19× (17–23) |
| add | large | 1.72 µs (1.68–1.77) | 20 ns (20–21) | 81× (79–86) |
| sub | medium | 359 ns (332–585) | 14 ns (12–24) | 25× (14–47) |
| sub | large | 747 ns (744–797) | 22 ns (21–23) | 34× (33–37) |
| mult | medium | 224 ns (224–227) | 13 ns (13–13) | 18× (17–18) |
| mult | large | 1.94 µs (1.93–1.96) | 20 ns (20–21) | 97× (92–98) |
| div p=28 | medium | 2.92 µs (2.91–2.94) | 374 ns (371–378) | 7.8× (7.7–7.9) |
| div p=28 | large | 6.88 µs (6.84–6.98) | 416 ns (414–421) | 17× (16–17) |
| div_int | medium | 128 ns (128–131) | 15 ns (15–16) | 8.4× (8.2–8.6) |
| div_rem | medium | 139 ns (137–141) | 50 ns (50–51) | 2.8× (2.7–2.8) |
| compare | medium | 85 ns (84–85) | 8.5 ns (8.5–8.8) | 10× (10–10) |
| compare | large | 302 ns (298–304) | 16 ns (15–17) | 19× (17–20) |
| negate | medium | 181 ns (178–182) | 15 ns (15–16) | 12× (11–12) |
| abs | medium | 162 ns (159–164) | 15 ns (14–15) | 11× (11–12) |
| round (3dp) | medium | 433 ns (427–435) | 33 ns (32–35) | 13× (12–14) |
| normalize | medium | 180 ns (176–181) | 18 ns (18–18) | 10× (10–10) |
| parse | small | 179 ns (177–181) | 52 ns (51–57) | 3.4× (3.1–3.5) |
| parse | medium | 242 ns (235–246) | 65 ns (64–66) | 3.7× (3.6–3.8) |
| to_string | medium | 137 ns (137–138) | 135 ns (134–136) | 1.0× (parity) |
| to_string sci | medium | 137 ns (136–138) | 181 ns (180–182) | 0.76× (regression) |
| to_integer | medium | 16 ns (16–17) | 11 ns (10–11) | 1.5× (1.5–1.6) |
| sum of 100 | | 23.4 µs (22.7–24.3) | 785 ns (775–804) | 30× (28–31) |

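At-parity ops (called out honestly): per the table above, to_string in the default format is at parity with decimal (1.0×), and to_string in scientific format is a regression (0.76×).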

Realistic workloads (mix run bench/realistic.exs)

Production-style code patterns. Speedups vary 10-25% across runs (the workload code allocates more, so GC interactions vary). Every arithmetic workload comes in at 10× or more faster than decimal; the parse-bound CSV workload is about 3× faster:

| Workload | typical speedup |
| --- | --- |
| Invoice total (50 line items × price) | 14-17× |
| 10% discount + 8.25% tax × 100 prices | 18-22× |
| FX conversion + round 2dp × 100 prices | 12-15× |
| Sum + min + max over 1000 amounts | 23-28× |
| Parse 100 CSV strings | 2.7-3.2× |

Allocations + reductions (mix run bench/profile.exs)

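| op | dec time | fd time | dec alloc | fd alloc | dec reds | fd reds |
| --- | --- | --- | --- | --- | --- | --- |
| add (medium) | 266 ns | 12 ns | 266 B | 53 B | 63 | 4 |
| add (large) | 1536 ns | 19 ns | 552 B | 12 B | 164 | 4 |
| mult (large) | 1970 ns | 20 ns | 777 B | 11 B | 273 | 4 |
| compare | 85 ns | 8 ns | 0 B | 0 B | 20 | 4 |
| sum of 100 | 22.0 µs | 0.88 µs | 983 B | 4947 B | 6214 | 307 |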

4 reductions per add is at the BEAM floor — no operation on a struct can do less.

Reproduce

The whole suite is in bench/ and runs from mix. No Docker, no setup beyond mix deps.get. See the Benchmark suite page for methodology and per-file detail.

mix deps.get
mix test # 15 doctests + 37 properties + 313 unit tests = 365 total
mix bench # → bench/summary.exs (headline table, ~1 minute)
mix bench.all # → every bench file end-to-end (~20 minutes)
# Or run a specific bench:
mix run bench/division.exs # div / div_int / div_rem / rem
mix run bench/rounding.exs # round/3 × 7 modes
mix run bench/sqrt.exs # sqrt at 6 precisions
mix run bench/conversion.exs # to_string formats, cast, to_int/float
mix run bench/special_values.exs # NaN/Inf overhead
mix run bench/realistic.exs # fintech-style workloads
mix run bench/batch.exs # sum/product at 4 list sizes
mix run bench/profile.exs # per-op time + alloc + reductions
mix run bench/parse.exs # parser strategy shootout
mix run bench/representation.exs # struct vs raw tuple
mix run bench/targeted.exs # parse / compare-large-gap / normalize-multi-zero / to_integer
mix run bench/disasm.exs # BEAM bytecode dump

See bench/README.md for what each script measures and the design decision it backed.

Test coverage

The suite is the regression gate for future optimization work and the correctness floor for trusting outputs.

Run with mix test. Full suite finishes in under a second.

Total: 365 tests/properties/doctests — stable across consecutive runs. Includes 23 dedicated security regression tests covering CVE-2026-32686-class exponent-amplification DoS protection.

Security

FastDecimal is not vulnerable to CVE-2026-32686 (exponent-amplification DoS that affected ericmj/decimal < 2.4.0). Three layers of defense:

  1. Parser rejects scientific-notation inputs with explicit exponent magnitude > 65,535. FastDecimal.parse("1e1000000000") returns :error rather than producing a value whose materialization would OOM the BEAM.
  2. pow10/1 internal cap raises on n > 100,000. Catches operations that would materialize huge values even when the value was constructed directly via new(coef, exp) bypassing the parser.
  3. to_string(_, :normal) refuses to produce output larger than 1 MB. The :scientific and :raw formats remain available for legitimate large-exponent values (they don't materialize the zeros).

These bounds are well above any practical use case (IEEE 754 decimal128 itself tops out at exp ±6,144) but kill the runaway path. Regression tests live at test/fastdecimal/security_test.exs.
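
The first two layers can be illustrated with a small stand-alone sketch. It is not FastDecimal's parser: the module `ExponentCapSketch` and its function names are hypothetical, the mantissa is deliberately left unvalidated, and only the two limits (65,535 and 100,000) are taken from the list above.

```elixir
defmodule ExponentCapSketch do
  # The two limits come from the text above; everything else is hypothetical.
  @max_exponent 65_535
  @max_pow10 100_000

  # Splits "1.5e10"-style input and rejects oversized exponents before any
  # large integer is built. Returns {:ok, {mantissa, exp}} or :error.
  # The mantissa string is not validated in this sketch.
  def check_exponent(string) do
    case String.split(String.downcase(string), "e", parts: 2) do
      [mantissa, exp_str] ->
        case Integer.parse(exp_str) do
          {exp, ""} when abs(exp) <= @max_exponent -> {:ok, {mantissa, exp}}
          _ -> :error
        end

      [mantissa] ->
        {:ok, {mantissa, 0}}
    end
  end

  # Second layer: refuse to materialize 10^n for very large n, even when the
  # value was built without going through the parser.
  def pow10(n) when is_integer(n) and n >= 0 and n <= @max_pow10 do
    Integer.pow(10, n)
  end

  def pow10(n) do
    raise ArgumentError, "pow10 exponent out of range: #{inspect(n)}"
  end
end

IO.inspect(ExponentCapSketch.check_exponent("1e1000000000"))
IO.inspect(ExponentCapSketch.check_exponent("1.5e10"))
```

The key property is that the exponent is checked as a small integer first; the expensive step of building a number with that many digits never starts.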

Where the two libraries legitimately diverge

FastDecimal does exact arithmetic; decimal rounds to its Context.precision (28 by default). For inputs whose true result has >28 significant digits, the two libraries produce different values — that's a documented design difference, not a bug. The differential tests constrain inputs so the result stays within 28 sig figs (where the libs should agree); the property tests document the divergence explicitly.
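
To make the divergence concrete without depending on either library, the sketch below models a decimal as a plain `{coef, exp}` tuple. That tuple model, the module `ExactVsRoundedSketch`, and the simplified half-up rounding are assumptions for illustration; neither library's actual code is shown.

```elixir
defmodule ExactVsRoundedSketch do
  # A decimal is modeled as {coef, exp}, meaning coef * 10^exp.
  # This tuple model is an assumption made for illustration only.

  # Exact multiplication: multiply coefficients, add exponents.
  def mult({c1, e1}, {c2, e2}), do: {c1 * c2, e1 + e2}

  # Rounds a non-negative coefficient to `precision` significant digits,
  # half-up. A carry such as 999 -> 1000 can leave one extra digit; that
  # case is not renormalized in this sketch.
  def round_sig({coef, exp}, precision) when coef >= 0 do
    digits = coef |> Integer.digits() |> length()

    if digits <= precision do
      {coef, exp}
    else
      drop = digits - precision
      scale = Integer.pow(10, drop)
      {div(coef + div(scale, 2), scale), exp + drop}
    end
  end
end

# 10^19 + 1 has 20 digits; its square has 39 significant digits.
a = {Integer.pow(10, 19) + 1, 0}

exact = ExactVsRoundedSketch.mult(a, a)
rounded = ExactVsRoundedSketch.round_sig(exact, 28)

IO.inspect(exact, label: "exact")
IO.inspect(rounded, label: "rounded to 28 digits")
```

The exact product keeps the trailing `+ 1`, while the 28-digit result drops it. Any test comparing the two approaches on such inputs will see different values by design.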

Installation

def deps do
  [
    {:fastdecimal, "~> 1.0"}
  ]
end

Feature surface

Construction

import FastDecimal
~d"1.23" # Compile-time literal (zero parse cost at runtime)
~d"1.23e10" # Scientific notation
~d"Infinity" # +∞
~d"-Inf" # -∞
~d"NaN" # NaN
FastDecimal.new("1.23") # Runtime parse, raises on bad input
FastDecimal.new(42) # From integer
FastDecimal.new(123, -2) # From coef + exp
FastDecimal.parse("1.23") # {:ok, t} | :error — no raise
FastDecimal.cast(value) # Soft parse, accepts FastDecimal/Decimal/int/string/float/nil

Arithmetic

FastDecimal.add(a, b)
FastDecimal.sub(a, b)
FastDecimal.mult(a, b)
FastDecimal.div(a, b, precision: 28, rounding: :half_even)
FastDecimal.div_int(a, b) # Truncated integer division
FastDecimal.div_rem(a, b) # {quotient, remainder}
FastDecimal.rem(a, b)
FastDecimal.negate(a)
FastDecimal.abs(a)
FastDecimal.sqrt(a, precision: 28) # Newton-Raphson
FastDecimal.round(a, places, mode) # All 7 rounding modes

Batch

FastDecimal.sum(list) # Tight Elixir-side reduce
FastDecimal.product(list)

Comparison & predicates

FastDecimal.compare(a, b) # :lt | :eq | :gt | :nan
FastDecimal.equal?(a, b)
FastDecimal.lt?(a, b) ; FastDecimal.gt?(a, b)
FastDecimal.min(a, b) ; FastDecimal.max(a, b)
FastDecimal.zero?(d) ; FastDecimal.positive?(d) ; FastDecimal.negative?(d)
FastDecimal.nan?(d) ; FastDecimal.inf?(d) ; FastDecimal.finite?(d)

Conversion

FastDecimal.to_string(d) # "1.23"
FastDecimal.to_string(d, :scientific) # "1.23" — IEEE compact (only emits E for very small/large)
FastDecimal.to_string(d, :raw) # "123E-2"
FastDecimal.to_string(d, :xsd) # XSD canonical (= :normal for our repr)
FastDecimal.to_integer(d) # raises on fractional
FastDecimal.to_float(d) # lossy for non-terminating binaries
FastDecimal.normalize(d) # strips trailing zeros

Guard-safe macro

require FastDecimal
def process(d) when FastDecimal.is_decimal(d), do: ...

Migrating from decimal

The 30-second version, for the common case:

defmodule MyLedger do
  alias FastDecimal.Compat, as: Decimal # add this line, rest stays the same

  def total(items) do
    Enum.reduce(items, Decimal.new(0), fn item, acc ->
      Decimal.add(acc, item.amount)
    end)
  end
end

The Compat shim mirrors decimal's public surface and auto-coerces inputs (real %Decimal{}, %FastDecimal{}, strings, integers, floats). It adds roughly 5-15% overhead compared with calling FastDecimal.* directly.

Five things that don't translate cleanly and how to handle each:

See MIGRATION.md for the full guide — decision tree, mechanical steps, real before/after examples, and an FAQ. Most projects migrate in under an hour; some need a wrapper module around precision-policy code; a few should stay on decimal.

Differences from decimal (summary)

| | decimal | FastDecimal |
| --- | --- | --- |
| Precision context | Per-process (Decimal.Context) | Per call (only div, sqrt, round take precision) |
| Default rounding mode | :half_up | :half_even (the Compat shim uses :half_up for parity) |
| NaN distinction | :sNaN, :qNaN | Single :nan (no signaling NaN) |
| Sign storage | Separate sign field | In coef |
| Negative zero | -0 distinguishable from 0 | Collapsed to 0 |
| Arithmetic semantics | Bounded by context precision | Exact: chain add/mult without rounding |
| compare/2 with NaN | Raises | Returns :nan |
| DoS protection (CVE-2026-32686) | Sticky-bit precision-bounded scaling, per-call :max_digits/:max_exponent opts | Hardcoded global limits (parser caps at exp ±65,535; pow10 caps internally at n=100,000; to_string :normal caps output at 1 MB). No per-call options. |

Ecto integration

defmodule MyApp.Invoice do
  use Ecto.Schema

  schema "invoices" do
    field :total, FastDecimal.Ecto.Type
  end
end

FastDecimal.Ecto.Type is automatically compiled when Ecto is in your deps. It bridges between Decimal (what the database adapter speaks) and FastDecimal (what your code holds). cast/1, load/1, dump/1, equal?/2 are all implemented.

Design philosophy

Every operation's implementation was chosen by running a benchmark, not by guessing. The full decision record — with the measurements behind each call — lives in bench/README.md. A few highlights:

The rule: if you have a hypothesis about a faster way, write the bench, run it, commit the script. Negative results stay in the tree so we don't re-test the same idea.

Why pure Elixir (the bench data)

NIF dispatch overhead is ~36 ns on this machine, while a complete pure-Elixir add takes ~12 ns, so the dispatch cost alone is 3× the cost of the work for every cheap op. The Rust NIF prototype we built and benchmarked confirmed this: it lost on every per-op arithmetic benchmark and only won at high-precision div and long-string parse. That is not enough to justify shipping a binary dependency that requires Rust on every consumer's machine. FastDecimal is pure Elixir, with no native compilation step.

License

MIT. See LICENSE.