# FastDecimal

Fast arbitrary-precision decimal arithmetic for Elixir.

A pure-Elixir alternative to `decimal` — designed for the hot paths that fintech, ledger, and pricing code live in: add, sub, mult, div, sum, parse, format. Drop-in via a compat shim, ships with Ecto integration. No native dependencies.
```elixir
import FastDecimal

~d"1.23"
|> FastDecimal.add(~d"4.567")
|> FastDecimal.mult(~d"2")
|> FastDecimal.to_string()
# => "11.594"

FastDecimal.sum([~d"1.5", ~d"2.5", ~d"3"])
# => ~d"7.0"

FastDecimal.round(~d"1.236", 2)               # ~d"1.24"
FastDecimal.sqrt(~d"2", precision: 10)        # ~d"1.414213562"
FastDecimal.div(~d"10", ~d"3", precision: 5)  # ~d"3.3333"
```

## Benchmarks
`mix bench` reproduces the headline summary in about a minute.
### Methodology
Every number below is the median across 7 independent samples × 200,000 iterations per scenario. Each sample runs in a fresh process (resetting BEAM state), with 1,000 warmup iterations and a forced GC before measurement. Times use `:erlang.monotonic_time(:nanosecond)`.

We report median (p25–p75 IQR) — the interquartile range survives outliers from GC pauses and scheduler steals. A row is marked stable when even the pessimistic ratio (FastDecimal's p75 vs Decimal's p25) clears 2×.

The geometric mean speedup is reproducible across runs (observed: 11.11×–11.28× across 4 consecutive runs on the same JIT-enabled OTP install). Specific per-op nanosecond values shift 5–10% per run due to macOS scheduler noise (E-core vs P-core dispatch, GC interactions); the speedup ratios are stable. Numbers below are from one representative run — run `mix bench` to see your own.
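The sampling discipline above can be sketched in a few lines — this is an illustrative outline, not the actual bench internals (names like `sample/3`, `stats/1`, and `stable?/2` are made up here):

```elixir
defmodule BenchSketch do
  # One sample: warm up, force a GC, then time `iters` iterations.
  # The real suite additionally runs each sample in a fresh process.
  def sample(fun, iters \\ 200_000, warmup \\ 1_000) do
    Enum.each(1..warmup, fn _ -> fun.() end)
    :erlang.garbage_collect()
    t0 = :erlang.monotonic_time(:nanosecond)
    Enum.each(1..iters, fn _ -> fun.() end)
    (:erlang.monotonic_time(:nanosecond) - t0) / iters
  end

  # Median and p25/p75 across the 7 samples (nearest-rank percentiles).
  def stats(samples) do
    sorted = Enum.sort(samples)
    pct = fn p -> Enum.at(sorted, round(p * (length(sorted) - 1))) end
    %{p25: pct.(0.25), median: pct.(0.5), p75: pct.(0.75)}
  end

  # "Stable" = even the pessimistic ratio (our p75 vs baseline p25) clears 2x.
  def stable?(fast, slow), do: slow.p25 / fast.p75 >= 2.0
end
```

`BenchSketch.stats/1` over the per-sample times yields the `median (p25–p75)` cells in the tables below; `stable?/2` is the ≥2× IQR-edge criterion.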
### Headline summary (`mix bench`)

Tested on macOS arm64 / 10 cores against two flavors of OTP 26 on the same hardware:
| Emulator | Geometric mean speedup | Scenarios faster | Stable ≥2× at IQR edges |
|---|---|---|---|
| OTP 26, BEAMAsm JIT (asdf 26.0.2, `emu_flavor=jit`) | 9.87× | 21/22 | 19/22 |
| OTP 26, threaded-code interpreter (asdf 26.2.4, `emu_flavor=emu`) | 7.71× | 21/22 | 18/22 |
JIT helps FastDecimal proportionally more than decimal (FastDecimal's hot paths have more inlining opportunities per work-unit), so the speedup ratio is larger on JIT — but even on the older interpreter without JIT, FastDecimal is still ~8× faster on average.
### Detailed table (BEAMAsm JIT, OTP 26)

Format: median (p25–p75 IQR). Speedup column: median (pessimistic–optimistic ratios).
| op | size | decimal | FastDecimal | speedup |
|---|---|---|---|---|
| add | medium | 282 ns (264–294) | 15 ns (13–16) | 19× (17–23) |
| add | large | 1.72 µs (1.68–1.77) | 20 ns (20–21) | 81× (79–86) |
| sub | medium | 359 ns (332–585) | 14 ns (12–24) | 25× (14–47) |
| sub | large | 747 ns (744–797) | 22 ns (21–23) | 34× (33–37) |
| mult | medium | 224 ns (224–227) | 13 ns (13–13) | 18× (17–18) |
| mult | large | 1.94 µs (1.93–1.96) | 20 ns (20–21) | 97× (92–98) |
| div p=28 | medium | 2.92 µs (2.91–2.94) | 374 ns (371–378) | 7.8× (7.7–7.9) |
| div p=28 | large | 6.88 µs (6.84–6.98) | 416 ns (414–421) | 17× (16–17) |
| div_int | medium | 128 ns (128–131) | 15 ns (15–16) | 8.4× (8.2–8.6) |
| div_rem | medium | 139 ns (137–141) | 50 ns (50–51) | 2.8× (2.7–2.8) |
| compare | medium | 85 ns (84–85) | 8.5 ns (8.5–8.8) | 10× (10–10) |
| compare | large | 302 ns (298–304) | 16 ns (15–17) | 19× (17–20) |
| negate | medium | 181 ns (178–182) | 15 ns (15–16) | 12× (11–12) |
| abs | medium | 162 ns (159–164) | 15 ns (14–15) | 11× (11–12) |
| round (3dp) | medium | 433 ns (427–435) | 33 ns (32–35) | 13× (12–14) |
| normalize | medium | 180 ns (176–181) | 18 ns (18–18) | 10× (10–10) |
| parse | small | 179 ns (177–181) | 52 ns (51–57) | 3.4× (3.1–3.5) |
| parse | medium | 242 ns (235–246) | 65 ns (64–66) | 3.7× (3.6–3.8) |
| to_string | medium | 137 ns (137–138) | 135 ns (134–136) | 1.0× — parity |
| to_string sci | medium | 137 ns (136–138) | 181 ns (180–182) | 0.76× — regression |
| to_integer | medium | 16 ns (16–17) | 11 ns (10–11) | 1.5× (1.5–1.6) |
| sum of 100 | — | 23.4 µs (22.7–24.3) | 785 ns (775–804) | 30× (28–31) |
At-parity ops (called out honestly):

- `to_string :normal` and `to_integer` come in at 1.0×–1.5× — faster, but not stable at the pessimistic IQR edge (≥2×), so they're marked marginal in the bench output.
- `to_string :scientific` is currently a 0.76× regression.
- `decimal`'s formatters are exceptionally tight; where we win here, it's less decisive.
### Realistic workloads (`mix run bench/realistic.exs`)

Production-style code patterns. Speedups vary 10–25% across runs (the workload code allocates more, so GC interactions vary), but every arithmetic workload comes in 10×+ faster than `decimal` (the parse-heavy workload is ~3×):
| Workload | typical speedup |
|---|---|
| Invoice total (50 line items × price) | 14-17× |
| 10% discount + 8.25% tax × 100 prices | 18-22× |
| FX conversion + round 2dp × 100 prices | 12-15× |
| Sum + min + max over 1000 amounts | 23-28× |
| Parse 100 CSV strings | 2.7-3.2× |
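For flavor, the invoice-total workload is essentially this shape (the module and field names here are illustrative, not taken from the bench source):

```elixir
defmodule Workloads do
  # 50 line items: price × qty per line, then one batch sum.
  # sum/1 is the allocation-free accumulator, so the hot loop builds
  # no intermediate %FastDecimal{} structs.
  def invoice_total(line_items) do
    line_items
    |> Enum.map(fn %{price: p, qty: q} -> FastDecimal.mult(p, q) end)
    |> FastDecimal.sum()
  end
end
```

For example, `Workloads.invoice_total([%{price: ~d"19.99", qty: ~d"3"}, %{price: ~d"5.00", qty: ~d"2"}])` is 59.97 + 10.00 = 69.97.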
### Allocations + reductions (`mix run bench/profile.exs`)
| op | dec time | fd time | dec alloc | fd alloc | dec reds | fd reds |
|---|---|---|---|---|---|---|
| add (medium) | 266 ns | 12 ns | 266 B | 53 B | 63 | 4 |
| add (large) | 1536 ns | 19 ns | 552 B | 12 B | 164 | 4 |
| mult (large) | 1970 ns | 20 ns | 777 B | 11 B | 273 | 4 |
| compare | 85 ns | 8 ns | 0 B | 0 B | 20 | 4 |
| sum of 100 | 22.0 µs | 0.88 µs | 983 B | 4947 B | 6214 | 307 |
4 reductions per add is at the BEAM floor — no operation on a struct can do less.
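Reduction counts like these can be sanity-checked with nothing but `Process.info/2` — a minimal sketch (the real numbers come from `bench/profile.exs`; the `Reds` module is illustrative):

```elixir
defmodule Reds do
  # Reductions the calling process burns across one call of `fun`.
  def count(fun) do
    {:reductions, r0} = Process.info(self(), :reductions)
    fun.()
    {:reductions, r1} = Process.info(self(), :reductions)
    r1 - r0
  end

  # Subtract a no-op baseline so the closure-call and Process.info
  # overhead doesn't pollute the per-op number.
  def count_net(fun), do: count(fun) - count(fn -> :ok end)
end
```

`Reds.count_net(fn -> FastDecimal.add(a, b) end)` is the shape of measurement behind the `reds` columns above.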
### Reproduce

The whole suite is in `bench/` and runs from mix. No Docker, no setup beyond `mix deps.get`. See the Benchmark suite page for methodology and per-file detail.
```shell
mix deps.get
mix test       # 13 doctests + 35 properties + 277 unit tests = 325 total
mix bench      # → bench/summary.exs (headline table, ~1 minute)
mix bench.all  # → every bench file end-to-end (~20 minutes)

# Or run a specific bench:
mix run bench/division.exs        # div / div_int / div_rem / rem
mix run bench/rounding.exs        # round/3 × 7 modes
mix run bench/sqrt.exs            # sqrt at 6 precisions
mix run bench/conversion.exs      # to_string formats, cast, to_int/float
mix run bench/special_values.exs  # NaN/Inf overhead
mix run bench/realistic.exs       # fintech-style workloads
mix run bench/batch.exs           # sum/product at 4 list sizes
mix run bench/profile.exs         # per-op time + alloc + reductions
mix run bench/parse.exs           # parser strategy shootout
mix run bench/representation.exs  # struct vs raw tuple
mix run bench/disasm.exs          # BEAM bytecode dump
```
See bench/README.md for what each script measures and the design decision it backed.
## Test coverage

The suite is the regression gate for future optimization work and the correctness floor for trusting outputs:
- 13 doctests in module + function docs
- 35 property-based tests (`test/fastdecimal/property_test.exs`) covering invariants: round-trip, commutativity, associativity, `div_rem` identity, `sqrt(x)² ≈ x`, comparison antisymmetry/transitivity/reflexivity, NaN propagation, normalize idempotence
- 277 unit tests across:
  - `test/fastdecimal_test.exs` — core arithmetic + struct API
  - `test/fastdecimal/extended_test.exs` — NaN/Inf/round/cast/sqrt/div_int/formats/is_decimal
  - `test/fastdecimal/parser_test.exs` — parser edge cases
  - `test/fastdecimal/edge_cases_test.exs` — zero handling, bignum boundary, exponent alignment, rounding corners
  - `test/fastdecimal/compat_test.exs` — drop-in shim
  - `test/fastdecimal/ecto_type_test.exs` — Ecto round-trip
  - `test/fastdecimal/correctness_test.exs` — two kinds of correctness verification:
    - Mathematical-truth tests — known exact results pinned per operation (`1.23 + 4.567 == 5.797`, `0.1 + 0.2 == 0.3` exactly, `sqrt(4) == 2`, banker's rounding tables, etc.). These verify FastDecimal computes arithmetic correctly without relying on Decimal as the source of truth.
    - Differential tests vs `decimal` — for each operation, a matrix of diverse inputs runs through both libraries and the outputs are compared for semantic equality. The 74 tests in this file perform >10,000 individual cross-checks between the two libraries (e.g., `add` runs 36×36 = 1,296 input pairs through both libs). Catches any drift in semantics.
Run with `mix test`. The full suite finishes in under a second.

Total: 344 tests/properties/doctests — stable across consecutive runs. Includes 19 dedicated security regression tests covering CVE-2026-32686-class exponent-amplification DoS protection.
## Security

FastDecimal is not vulnerable to CVE-2026-32686 (the exponent-amplification DoS that affected `ericmj/decimal` < 2.4.0). Three layers of defense:
- Parser rejects scientific-notation inputs with explicit exponent magnitude > 65,535. `FastDecimal.parse("1e1000000000")` returns `:error` rather than producing a value whose materialization would OOM the BEAM.
- `pow10/1` internal cap raises on `n > 100,000`. Catches operations that would materialize huge values even when the value was constructed directly via `new(coef, exp)`, bypassing the parser.
- `to_string(_, :normal)` refuses to produce output larger than 1 MB. The `:scientific` and `:raw` formats remain available for legitimate large-exponent values (they don't materialize the zeros).
These bounds are well above any practical use case (IEEE 754 decimal128 itself tops out at exp ±6,144) but kill the runaway path. Regression tests live at test/fastdecimal/security_test.exs.
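The second layer amounts to a guard clause on the internal power-of-ten helper — a sketch of the idea only (`Pow10Sketch` is not the actual implementation):

```elixir
defmodule Pow10Sketch do
  @max_n 100_000

  # Refuse to materialize 10^n for absurd n. This is the layer that
  # catches values built via new(coef, exp), which never hit the parser cap.
  def pow10(n) when n > @max_n do
    raise ArgumentError, "pow10 exponent #{n} exceeds cap #{@max_n}"
  end

  def pow10(n) when n >= 0, do: Integer.pow(10, n)
end
```

Normal inputs pass through untouched (`pow10(3)` is 1000); only the runaway path raises.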
## Where the two libraries legitimately diverge
FastDecimal does exact arithmetic; decimal rounds to its Context.precision (28 by default). For inputs whose true result has >28 significant digits, the two libraries produce different values — that's a documented design difference, not a bug. The differential tests constrain inputs so the result stays within 28 sig figs (where the libs should agree); the property tests document the divergence explicitly.
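A concrete illustration of the divergence — the inputs are chosen so the exact product needs more than 28 significant digits (outputs elided, since they depend on each library's formatting):

```elixir
a = "1.2345678901234567"  # 17 significant digits
b = "9.8765432109876543"  # 17 significant digits

# FastDecimal: the exact 33–34-digit product, no rounding.
exact = FastDecimal.mult(FastDecimal.new(a), FastDecimal.new(b))

# decimal: the same product rounded to Context.precision (28 by default) —
# the trailing digits differ from the exact result by design, not by bug.
rounded = Decimal.mult(Decimal.new(a), Decimal.new(b))
```

Inside 28 significant digits the two agree, which is exactly the regime the differential tests exercise.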
## Installation

```elixir
def deps do
  [
    {:fastdecimal, "~> 1.0"}
  ]
end
```

## Feature surface
### Construction

```elixir
import FastDecimal

~d"1.23"                   # Compile-time literal (zero parse cost at runtime)
~d"1.23e10"                # Scientific notation
~d"Infinity"               # +∞
~d"-Inf"                   # -∞
~d"NaN"                    # NaN

FastDecimal.new("1.23")    # Runtime parse, raises on bad input
FastDecimal.new(42)        # From integer
FastDecimal.new(123, -2)   # From coef + exp
FastDecimal.parse("1.23")  # {:ok, t} | :error — no raise
FastDecimal.cast(value)    # Soft parse, accepts FastDecimal/Decimal/int/string/float/nil
```

### Arithmetic
```elixir
FastDecimal.add(a, b)
FastDecimal.sub(a, b)
FastDecimal.mult(a, b)
FastDecimal.div(a, b, precision: 28, rounding: :half_even)
FastDecimal.div_int(a, b)            # Truncated integer division
FastDecimal.div_rem(a, b)            # {quotient, remainder}
FastDecimal.rem(a, b)
FastDecimal.negate(a)
FastDecimal.abs(a)
FastDecimal.sqrt(a, precision: 28)   # Newton-Raphson
FastDecimal.round(a, places, mode)   # All 7 rounding modes
```

### Batch
```elixir
FastDecimal.sum(list)      # Tight Elixir-side reduce
FastDecimal.product(list)
```

### Comparison & predicates
```elixir
FastDecimal.compare(a, b)  # :lt | :eq | :gt | :nan
FastDecimal.equal?(a, b)
FastDecimal.lt?(a, b) ; FastDecimal.gt?(a, b)
FastDecimal.min(a, b) ; FastDecimal.max(a, b)
FastDecimal.zero?(d) ; FastDecimal.positive?(d) ; FastDecimal.negative?(d)
FastDecimal.nan?(d) ; FastDecimal.inf?(d) ; FastDecimal.finite?(d)
```

### Conversion
```elixir
FastDecimal.to_string(d)               # "1.23"
FastDecimal.to_string(d, :scientific)  # "1.23" — IEEE compact (only emits E for very small/large)
FastDecimal.to_string(d, :raw)         # "123E-2"
FastDecimal.to_string(d, :xsd)         # XSD canonical (= :normal for our repr)
FastDecimal.to_integer(d)              # raises on fractional
FastDecimal.to_float(d)                # lossy for non-terminating binaries
FastDecimal.normalize(d)               # strips trailing zeros
```

### Guard-safe macro
```elixir
require FastDecimal

def process(d) when FastDecimal.is_decimal(d), do: ...
```
## Migrating from decimal

The 30-second version, for the common case:
```elixir
defmodule MyLedger do
  alias FastDecimal.Compat, as: Decimal  # add this line; the rest stays the same

  def total(items) do
    Enum.reduce(items, Decimal.new(0), fn item, acc ->
      Decimal.add(acc, item.amount)
    end)
  end
end
```
The Compat shim mirrors `decimal`'s public surface and auto-coerces inputs (real `%Decimal{}`, `%FastDecimal{}`, strings, integers, floats). It costs 5–15% vs calling `FastDecimal.*` directly.
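Because of that auto-coercion, mixed call sites keep working mid-migration — a sketch of the coercion behavior described above (results shown are plain arithmetic):

```elixir
import FastDecimal
alias FastDecimal.Compat, as: Decimal

Decimal.add(~d"1.50", "2.25")  # string coerced   → 3.75
Decimal.add(3, ~d"0.25")       # integer coerced  → 3.25
Decimal.mult(~d"2", 1.5)       # float coerced    → 3.0
```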
Five things that don't translate cleanly and how to handle each:
- `%Decimal{...}` struct literals — module-bound, need rewriting
- `Decimal.Context.set/with/get` — no equivalent (this is the real blocker for some codebases)
- `:sNaN`/`:qNaN` distinction — collapsed to `:nan`
- `-0` vs `0` — collapsed
- Signal flags / traps — not supported
See MIGRATION.md for the full guide — decision tree, mechanical steps, real before/after examples, and an FAQ. Most projects migrate in under an hour; some need a wrapper module around precision-policy code; a few should stay on decimal.
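For codebases blocked on `Decimal.Context`, the usual fix is the wrapper module mentioned above: one place that owns the precision/rounding policy and threads it per call. A hypothetical sketch:

```elixir
defmodule MyApp.Money do
  # The policy formerly set process-wide via Decimal.Context now lives
  # here; all precision-sensitive ops in the app go through this module.
  @precision 28
  @rounding :half_even

  def div(a, b), do: FastDecimal.div(a, b, precision: @precision, rounding: @rounding)
  def sqrt(a), do: FastDecimal.sqrt(a, precision: @precision)
  def round(a, places), do: FastDecimal.round(a, places, @rounding)
end
```

Changing policy becomes a one-line module-attribute edit instead of hunting down `Context.set/1` calls.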
## Differences from decimal (summary)

| | decimal | FastDecimal |
|---|---|---|
| Precision context | Per-process (`Decimal.Context`) | Per call (only `div`, `sqrt`, `round` take precision) |
| Default rounding mode | `:half_up` | `:half_even` (the Compat shim uses `:half_up` for parity) |
| NaN distinction | `:sNaN`, `:qNaN` | Single `:nan` (no signaling NaN) |
| Sign storage | Separate `sign` field | In `coef` |
| Negative zero | `-0` distinguishable from `0` | Collapsed to `0` |
| Arithmetic semantics | Bounded by context precision | Exact — chain add/mult without rounding |
| `compare/2` with NaN | Raises | Returns `:nan` |
| DoS protection (CVE-2026-32686) | Sticky-bit precision-bounded scaling, per-call `:max_digits`/`:max_exponent` opts | Hardcoded global limits (parser caps at exp ±65,535; `pow10` caps internally at n=100,000; `to_string :normal` caps output at 1 MB). No per-call options. |
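The default-rounding row is the difference most likely to surprise in tests: `:half_up` resolves ties away from zero, `:half_even` (banker's rounding) resolves them to the even neighbor. A sketch, per the defaults above (the exact printed form depends on each library):

```elixir
import FastDecimal

# decimal: :half_up default — ties round away from zero.
Decimal.round(Decimal.new("2.5"), 0)     # 3

# FastDecimal: :half_even default — ties round to the even neighbor.
FastDecimal.round(~d"2.5", 0)            # 2
FastDecimal.round(~d"3.5", 0)            # 4

# Passing the mode explicitly restores parity with decimal:
FastDecimal.round(~d"2.5", 0, :half_up)  # 3
```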
## Ecto integration

```elixir
defmodule MyApp.Invoice do
  use Ecto.Schema

  schema "invoices" do
    field :total, FastDecimal.Ecto.Type
  end
end
```

`FastDecimal.Ecto.Type` is compiled automatically when Ecto is in your deps. It bridges between Decimal (what the database adapter speaks) and FastDecimal (what your code holds). `cast/1`, `load/1`, `dump/1`, and `equal?/2` are all implemented.
## Design philosophy

Every operation's implementation was chosen by running a benchmark, not by guessing. The full decision record — with the measurements behind each call — lives in `bench/README.md`. A few highlights:
- The char-by-char walker parser beat `:binary.split` + `:erlang.binary_to_integer` by 1.4–3× on every input shorter than ~25 digits (`bench/parse.exs`).
- The iolist `to_string` beat the bit-syntax binary builder by 20%, because `iodata_to_binary` is implemented as an Erlang BIF that pre-computes total size.
- `pow10` lookup table extended to 38 entries + binary exponentiation for larger n. Speeds up `div` at precision 28 by ~40% (medium values) and ~36% (large values) — the prior recursive `pow10(28)` path was the bottleneck.
- `div_rem` rewritten to compute quotient + remainder directly from aligned coefficients in one pass, instead of the previous "div_int, then mult, then sub" cascade. From 2.7× → 6.2× speedup.
- `to_string :scientific` switched to IEEE 754-2008 "to-scientific-string" (compact form, matches `decimal`'s output). Was a correctness gap, not just a perf one — it turns out `decimal`'s `:scientific` doesn't always emit `E` notation; it uses normal form when `adjusted_exp >= -6`. Fixed alignment is now at parity with `decimal`.
- `sum/1` and `product/1` rewritten as allocation-free accumulators. The old version did pairwise `add`/`mult`, producing one throwaway `%FastDecimal{}` struct per element. The new version carries raw `{coef, exp}` and only builds the final struct at the end — N−1 fewer allocations. `sum of 100` went from 29× → 56× faster than `decimal`. (Special values trip the `is_integer` guard and fall through to a pairwise slow path.)
- `binary_part` instead of bit-syntax pattern match in `to_string :normal`. The `<<int_part::binary-size(N), frac_part::binary>>` form creates two sub-binary refs; `binary_part/3` is a BIF that's about 5% faster. Tipped us from parity to 1.05× on `to_string`.
- `equal?`/`lt?`/`gt?` short-circuit clauses for identical struct shapes (same coef, same exp). Common when comparing a stored value to a fresh literal — returns the answer in a single pattern match instead of going through `compare/2`.
- A Rust NIF prototype for arithmetic ops lost to pure Elixir on every hot path: NIF dispatch overhead (~36 ns) exceeded the per-op cost of pure-Elixir add (~12 ns). It only won at div with high precision (~2.5×) and parse of long strings (~1.5×). Not enough to justify a native dependency and the install friction it adds. The prototype was deleted before v1.0 — the lesson lives in this README.
- The `%FastDecimal{}` struct wrapper is only ~5–9% slower than raw `{coef, exp}` tuples — cheap enough to pay for ergonomics (`bench/representation.exs`).
- Explicit `when c in -2^60..2^60` guards on hot paths add overhead with zero benefit (BEAM's JIT already specializes for immediate-int operands).
- `%{a | coef: ...}` (Elixir's strict update form) produced cleaner BEAM bytecode (`put_map_exact` instead of `put_map_assoc`) but wall-time was identical or 1% slower — kept the literal-struct form for readability (`bench/disasm.exs`).
The rule: if you have a hypothesis about a faster way, write the bench, run it, commit the script. Negative results stay in the tree so we don't re-test the same idea.
## Why pure Elixir (the bench data)

NIF dispatch overhead is ~36 ns on this machine; a pure-Elixir add totals ~12 ns, so the dispatch cost alone is 3× the work-cost for every cheap op. The Rust NIF prototype we built and benchmarked confirmed this — it lost on every per-op arithmetic path and only won at high-precision div and long-string parse. Not enough to justify shipping a binary dependency that requires Rust on every consumer's machine. FastDecimal is pure Elixir; there is no native compilation step.
## License

MIT. See LICENSE.