# FastDecimal

Fast arbitrary-precision decimal arithmetic for Elixir.

A pure-Elixir alternative to `decimal` — designed for the hot paths that fintech, ledger, and pricing code live in: add, sub, mult, div, sum, parse, format. Drop-in via a compat shim, ships with Ecto integration. No native dependencies.
```elixir
import FastDecimal

~d"1.23"
|> FastDecimal.add(~d"4.567")
|> FastDecimal.mult(~d"2")
|> FastDecimal.to_string()
# => "11.594"

FastDecimal.sum([~d"1.5", ~d"2.5", ~d"3"])
# => ~d"7.0"

FastDecimal.round(~d"1.236", 2)               # ~d"1.24"
FastDecimal.sqrt(~d"2", precision: 10)        # ~d"1.414213562"
FastDecimal.div(~d"10", ~d"3", precision: 5)  # ~d"3.3333"
```

## Benchmarks
`mix bench` reproduces the headline summary in about a minute.
### Methodology
Every number below is the median across 7 independent samples × 200,000 iterations per scenario. Each sample runs in a fresh process (resetting BEAM state), with 1,000 warmup iterations and a forced GC before measurement. Times use `:erlang.monotonic_time(:nanosecond)`.

We report median (p25–p75 IQR) — the interquartile range survives outliers from GC pauses and scheduler steals. A row is marked stable when even the pessimistic ratio (FastDecimal's p75 vs Decimal's p25) clears 2×.

The geometric mean speedup is reproducible across runs (observed: 11.11×–11.28× across 4 consecutive runs on the same JIT-enabled OTP install). Specific per-op nanosecond values shift 5–10% per run due to macOS scheduler noise (E-core vs P-core dispatch, GC interactions); the speedup ratios are stable. Numbers below are from one representative run — run `mix bench` to see your own.
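The sampling discipline above can be sketched in a few lines — this is an illustrative outline, not the actual bench internals (names like `sample/3`, `stats/1`, and `stable?/2` are made up here):

```elixir
defmodule BenchSketch do
  # One sample: warm up, force a GC, then time `iters` iterations.
  # The real suite additionally runs each sample in a fresh process.
  def sample(fun, iters \\ 200_000, warmup \\ 1_000) do
    Enum.each(1..warmup, fn _ -> fun.() end)
    :erlang.garbage_collect()
    t0 = :erlang.monotonic_time(:nanosecond)
    Enum.each(1..iters, fn _ -> fun.() end)
    (:erlang.monotonic_time(:nanosecond) - t0) / iters
  end

  # Median and p25/p75 across the 7 samples (nearest-rank percentiles).
  def stats(samples) do
    sorted = Enum.sort(samples)
    pct = fn p -> Enum.at(sorted, round(p * (length(sorted) - 1))) end
    %{p25: pct.(0.25), median: pct.(0.5), p75: pct.(0.75)}
  end

  # "Stable" = even the pessimistic ratio (our p75 vs baseline p25) clears 2x.
  def stable?(fast, slow), do: slow.p25 / fast.p75 >= 2.0
end
```

`BenchSketch.stats/1` over the per-sample times yields the `median (p25–p75)` cells in the tables below; `stable?/2` is the ≥2× IQR-edge criterion.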
### Headline summary (`mix bench`)

Tested on macOS arm64 / 10 cores against two flavors of OTP 26 on the same hardware:
| Emulator | Geometric mean speedup | Scenarios faster | Stable ≥2× at IQR edges |
|---|---|---|---|
| OTP 26, BEAMAsm JIT (asdf 26.0.2, `emu_flavor=jit`) | 9.87× | 21/22 | 19/22 |
| OTP 26, threaded-code interpreter (asdf 26.2.4, `emu_flavor=emu`) | 7.71× | 21/22 | 18/22 |
JIT helps FastDecimal proportionally more than decimal (FastDecimal's hot paths have more inlining opportunities per work-unit), so the speedup ratio is larger on JIT — but even on the older interpreter without JIT, FastDecimal is still ~8× faster on average.
### Detailed table (BEAMAsm JIT, OTP 26)

Format: median (p25–p75 IQR). Speedup column: median (pessimistic–optimistic ratios).
| op | size | decimal | FastDecimal | speedup |
|---|---|---|---|---|
| add | medium | 282 ns (264–294) | 15 ns (13–16) | 19× (17–23) |
| add | large | 1.72 µs (1.68–1.77) | 20 ns (20–21) | 81× (79–86) |
| sub | medium | 359 ns (332–585) | 14 ns (12–24) | 25× (14–47) |
| sub | large | 747 ns (744–797) | 22 ns (21–23) | 34× (33–37) |
| mult | medium | 224 ns (224–227) | 13 ns (13–13) | 18× (17–18) |
| mult | large | 1.94 µs (1.93–1.96) | 20 ns (20–21) | 97× (92–98) |
| div p=28 | medium | 2.92 µs (2.91–2.94) | 374 ns (371–378) | 7.8× (7.7–7.9) |
| div p=28 | large | 6.88 µs (6.84–6.98) | 416 ns (414–421) | 17× (16–17) |
| div_int | medium | 128 ns (128–131) | 15 ns (15–16) | 8.4× (8.2–8.6) |
| div_rem | medium | 139 ns (137–141) | 50 ns (50–51) | 2.8× (2.7–2.8) |
| compare | medium | 85 ns (84–85) | 8.5 ns (8.5–8.8) | 10× (10–10) |
| compare | large | 302 ns (298–304) | 16 ns (15–17) | 19× (17–20) |
| negate | medium | 181 ns (178–182) | 15 ns (15–16) | 12× (11–12) |
| abs | medium | 162 ns (159–164) | 15 ns (14–15) | 11× (11–12) |
| round (3dp) | medium | 433 ns (427–435) | 33 ns (32–35) | 13× (12–14) |
| normalize | medium | 180 ns (176–181) | 18 ns (18–18) | 10× (10–10) |
| parse | small | 179 ns (177–181) | 52 ns (51–57) | 3.4× (3.1–3.5) |
| parse | medium | 242 ns (235–246) | 65 ns (64–66) | 3.7× (3.6–3.8) |
| to_string | medium | 137 ns (137–138) | 135 ns (134–136) | 1.0× — parity |
| to_string sci | medium | 137 ns (136–138) | 181 ns (180–182) | 0.76× — regression |
| to_integer | medium | 16 ns (16–17) | 11 ns (10–11) | 1.5× (1.5–1.6) |
| sum of 100 | — | 23.4 µs (22.7–24.3) | 785 ns (775–804) | 30× (28–31) |
At-parity ops (called out honestly):

- `to_string :normal` and `to_integer` come in at 1.0×–1.5× — faster, but not stable at the pessimistic IQR edge (≥2×), so they're marked marginal in the bench output.
- `to_string :scientific` is currently a 0.76× regression.
- `decimal`'s formatters are exceptionally tight; where we win here, it's less decisive.
### Realistic workloads (`mix run bench/realistic.exs`)

Production-style code patterns. Speedups vary 10–25% across runs (the workload code allocates more, so GC interactions vary), but every arithmetic workload comes in 10×+ faster than `decimal` (the parse-heavy workload is ~3×):
| Workload | typical speedup |
|---|---|
| Invoice total (50 line items × price) | 14-17× |
| 10% discount + 8.25% tax × 100 prices | 18-22× |
| FX conversion + round 2dp × 100 prices | 12-15× |
| Sum + min + max over 1000 amounts | 23-28× |
| Parse 100 CSV strings | 2.7-3.2× |
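For flavor, the invoice-total workload is essentially this shape (the module and field names here are illustrative, not taken from the bench source):

```elixir
defmodule Workloads do
  # 50 line items: price × qty per line, then one batch sum.
  # sum/1 is the allocation-free accumulator, so the hot loop builds
  # no intermediate %FastDecimal{} structs.
  def invoice_total(line_items) do
    line_items
    |> Enum.map(fn %{price: p, qty: q} -> FastDecimal.mult(p, q) end)
    |> FastDecimal.sum()
  end
end
```

For example, `Workloads.invoice_total([%{price: ~d"19.99", qty: ~d"3"}, %{price: ~d"5.00", qty: ~d"2"}])` is 59.97 + 10.00 = 69.97.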
### Allocations + reductions (`mix run bench/profile.exs`)
| op | dec time | fd time | dec alloc | fd alloc | dec reds | fd reds |
|---|---|---|---|---|---|---|
| add (medium) | 266 ns | 12 ns | 266 B | 53 B | 63 | 4 |
| add (large) | 1536 ns | 19 ns | 552 B | 12 B | 164 | 4 |
| mult (large) | 1970 ns | 20 ns | 777 B | 11 B | 273 | 4 |
| compare | 85 ns | 8 ns | 0 B | 0 B | 20 | 4 |
| sum of 100 | 22.0 µs | 0.88 µs | 983 B | 4947 B | 6214 | 307 |
4 reductions per add is at the BEAM floor — no operation on a struct can do less.
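Reduction counts like these can be sanity-checked with nothing but `Process.info/2` — a minimal sketch (the real numbers come from `bench/profile.exs`; the `Reds` module is illustrative):

```elixir
defmodule Reds do
  # Reductions the calling process burns across one call of `fun`.
  def count(fun) do
    {:reductions, r0} = Process.info(self(), :reductions)
    fun.()
    {:reductions, r1} = Process.info(self(), :reductions)
    r1 - r0
  end

  # Subtract a no-op baseline so the closure-call and Process.info
  # overhead doesn't pollute the per-op number.
  def count_net(fun), do: count(fun) - count(fn -> :ok end)
end
```

`Reds.count_net(fn -> FastDecimal.add(a, b) end)` is the shape of measurement behind the `reds` columns above.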
### Reproduce

The whole suite is in `bench/` and runs from mix. No Docker, no setup beyond `mix deps.get`. See the Benchmark suite page for methodology and per-file detail.
```shell
mix deps.get
mix test       # 13 doctests + 35 properties + 277 unit tests = 325 total
mix bench      # → bench/summary.exs (headline table, ~1 minute)
mix bench.all  # → every bench file end-to-end (~20 minutes)

# Or run a specific bench:
mix run bench/division.exs        # div / div_int / div_rem / rem
mix run bench/rounding.exs        # round/3 × 7 modes
mix run bench/sqrt.exs            # sqrt at 6 precisions
mix run bench/conversion.exs      # to_string formats, cast, to_int/float
mix run bench/special_values.exs  # NaN/Inf overhead
mix run bench/realistic.exs       # fintech-style workloads
mix run bench/batch.exs           # sum/product at 4 list sizes
mix run bench/profile.exs         # per-op time + alloc + reductions
mix run bench/parse.exs           # parser strategy shootout
mix run bench/representation.exs  # struct vs raw tuple
mix run bench/disasm.exs          # BEAM bytecode dump
```
See bench/README.md for what each script measures and the design decision it backed.
## Test coverage

The suite is the regression gate for future optimization work and the correctness floor for trusting outputs:
- 13 doctests in module + function docs
- 35 property-based tests (`test/fastdecimal/property_test.exs`) covering invariants: round-trip, commutativity, associativity, `div_rem` identity, `sqrt(x)² ≈ x`, comparison antisymmetry/transitivity/reflexivity, NaN propagation, normalize idempotence
- 277 unit tests across:
  - `test/fastdecimal_test.exs` — core arithmetic + struct API
  - `test/fastdecimal/extended_test.exs` — NaN/Inf/round/cast/sqrt/div_int/formats/is_decimal
  - `test/fastdecimal/parser_test.exs` — parser edge cases
  - `test/fastdecimal/edge_cases_test.exs` — zero handling, bignum boundary, exponent alignment, rounding corners
  - `test/fastdecimal/compat_test.exs` — drop-in shim
  - `test/fastdecimal/ecto_type_test.exs` — Ecto round-trip
  - `test/fastdecimal/correctness_test.exs` — two kinds of correctness verification:
    - Mathematical-truth tests — known exact results pinned per operation (`1.23 + 4.567 == 5.797`, `0.1 + 0.2 == 0.3` exactly, `sqrt(4) == 2`, banker's rounding tables, etc.). These verify FastDecimal computes arithmetic correctly without relying on Decimal as the source of truth.
    - Differential tests vs `decimal` — for each operation, a matrix of diverse inputs runs through both libraries and the outputs are compared for semantic equality. The 74 tests in this file perform >10,000 individual cross-checks between the two libraries (e.g., `add` runs 36×36 = 1,296 input pairs through both libs). Catches any drift in semantics.
Run with `mix test`. The full suite finishes in under a second.

Total: 344 tests/properties/doctests — stable across consecutive runs. Includes 19 dedicated security regression tests covering CVE-2026-32686-class exponent-amplification DoS protection.
## Security

FastDecimal is not vulnerable to CVE-2026-32686 (the exponent-amplification DoS that affected `ericmj/decimal` < 2.4.0). Three layers of defense:
- Parser rejects scientific-notation inputs with explicit exponent magnitude > 65,535. `FastDecimal.parse("1e1000000000")` returns `:error` rather than producing a value whose materialization would OOM the BEAM.
- `pow10/1` internal cap raises on `n > 100,000`. Catches operations that would materialize huge values even when the value was constructed directly via `new(coef, exp)`, bypassing the parser.
- `to_string(_, :normal)` refuses to produce output larger than 1 MB. The `:scientific` and `:raw` formats remain available for legitimate large-exponent values (they don't materialize the zeros).
These bounds are well above any practical use case (IEEE 754 decimal128 itself tops out at exp ±6,144) but kill the runaway path. Regression tests live at test/fastdecimal/security_test.exs.
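The second layer amounts to a guard clause on the internal power-of-ten helper — a sketch of the idea only (`Pow10Sketch` is not the actual implementation):

```elixir
defmodule Pow10Sketch do
  @max_n 100_000

  # Refuse to materialize 10^n for absurd n. This is the layer that
  # catches values built via new(coef, exp), which never hit the parser cap.
  def pow10(n) when n > @max_n do
    raise ArgumentError, "pow10 exponent #{n} exceeds cap #{@max_n}"
  end

  def pow10(n) when n >= 0, do: Integer.pow(10, n)
end
```

Normal inputs pass through untouched (`pow10(3)` is 1000); only the runaway path raises.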
## Where the two libraries legitimately diverge
FastDecimal does exact arithmetic; decimal rounds to its Context.precision (28 by default). For inputs whose true result has >28 significant digits, the two libraries produce different values — that's a documented design difference, not a bug. The differential tests constrain inputs so the result stays within 28 sig figs (where the libs should agree); the property tests document the divergence explicitly.
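A concrete illustration of the divergence — the inputs are chosen so the exact product needs more than 28 significant digits (outputs elided, since they depend on each library's formatting):

```elixir
a = "1.2345678901234567"  # 17 significant digits
b = "9.8765432109876543"  # 17 significant digits

# FastDecimal: the exact 33–34-digit product, no rounding.
exact = FastDecimal.mult(FastDecimal.new(a), FastDecimal.new(b))

# decimal: the same product rounded to Context.precision (28 by default) —
# the trailing digits differ from the exact result by design, not by bug.
rounded = Decimal.mult(Decimal.new(a), Decimal.new(b))
```

Inside 28 significant digits the two agree, which is exactly the regime the differential tests exercise.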
## Installation

```elixir
def deps do
  [
    {:fastdecimal, "~> 1.0"}
  ]
end
```

## Feature surface
### Construction

```elixir
import FastDecimal

~d"1.23"                   # Compile-time literal (zero parse cost at runtime)
~d"1.23e10"                # Scientific notation
~d"Infinity"               # +∞
~d"-Inf"                   # -∞
~d"NaN"                    # NaN

FastDecimal.new("1.23")    # Runtime parse, raises on bad input
FastDecimal.new(42)        # From integer
FastDecimal.new(123, -2)   # From coef + exp
FastDecimal.parse("1.23")  # {:ok, t} | :error — no raise
FastDecimal.cast(value)    # Soft parse, accepts FastDecimal/Decimal/int/string/float/nil
```

### Arithmetic
```elixir
FastDecimal.add(a, b)
FastDecimal.sub(a, b)
FastDecimal.mult(a, b)
FastDecimal.div(a, b, precision: 28, rounding: :half_even)
FastDecimal.div_int(a, b)            # Truncated integer division
FastDecimal.div_rem(a, b)            # {quotient, remainder}
FastDecimal.rem(a, b)
FastDecimal.negate(a)
FastDecimal.abs(a)
FastDecimal.sqrt(a, precision: 28)   # Newton-Raphson
FastDecimal.round(a, places, mode)   # All 7 rounding modes
```

### Batch
```elixir
FastDecimal.sum(list)      # Tight Elixir-side reduce
FastDecimal.product(list)
```

### Comparison & predicates
```elixir
FastDecimal.compare(a, b)  # :lt | :eq | :gt | :nan
FastDecimal.equal?(a, b)
FastDecimal.lt?(a, b) ; FastDecimal.gt?(a, b)
FastDecimal.min(a, b) ; FastDecimal.max(a, b)
FastDecimal.zero?(d) ; FastDecimal.positive?(d) ; FastDecimal.negative?(d)
FastDecimal.nan?(d) ; FastDecimal.inf?(d) ; FastDecimal.finite?(d)
```

### Conversion
```elixir
FastDecimal.to_string(d)               # "1.23"
FastDecimal.to_string(d, :scientific)  # "1.23" — IEEE compact (only emits E for very small/large)
FastDecimal.to_string(d, :raw)         # "123E-2"
FastDecimal.to_string(d, :xsd)         # XSD canonical (= :normal for our repr)
FastDecimal.to_integer(d)              # raises on fractional
FastDecimal.to_float(d)                # lossy for non-terminating binaries
FastDecimal.normalize(d)               # strips trailing zeros
```

### Guard-safe macro
```elixir
require FastDecimal

def process(d) when FastDecimal.is_decimal(d), do: ...
```
## Migrating from decimal

The 30-second version, for the common case:
```elixir
defmodule MyLedger do
  alias FastDecimal.Compat, as: Decimal  # add this line; the rest stays the same

  def total(items) do
    Enum.reduce(items, Decimal.new(0), fn item, acc ->
      Decimal.add(acc, item.amount)
    end)
  end
end
```
The Compat shim mirrors `decimal`'s public surface and auto-coerces inputs (real `%Decimal{}`, `%FastDecimal{}`, strings, integers, floats). It costs 5–15% vs calling `FastDecimal.*` directly.
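Because of that auto-coercion, mixed call sites keep working mid-migration — a sketch of the coercion behavior described above (results shown are plain arithmetic):

```elixir
import FastDecimal
alias FastDecimal.Compat, as: Decimal

Decimal.add(~d"1.50", "2.25")  # string coerced   → 3.75
Decimal.add(3, ~d"0.25")       # integer coerced  → 3.25
Decimal.mult(~d"2", 1.5)       # float coerced    → 3.0
```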
Five things that don't translate cleanly and how to handle each:
- `%Decimal{...}` struct literals — module-bound, need rewriting
- `Decimal.Context.set/with/get` — no equivalent (this is the real blocker for some codebases)
- `:sNaN`/`:qNaN` distinction — collapsed to `:nan`
- `-0` vs `0` — collapsed
- Signal flags / traps — not supported
See MIGRATION.md for the full guide — decision tree, mechanical steps, real before/after examples, and an FAQ. Most projects migrate in under an hour; some need a wrapper module around precision-policy code; a few should stay on decimal.
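For codebases blocked on `Decimal.Context`, the usual fix is the wrapper module mentioned above: one place that owns the precision/rounding policy and threads it per call. A hypothetical sketch:

```elixir
defmodule MyApp.Money do
  # The policy formerly set process-wide via Decimal.Context now lives
  # here; all precision-sensitive ops in the app go through this module.
  @precision 28
  @rounding :half_even

  def div(a, b), do: FastDecimal.div(a, b, precision: @precision, rounding: @rounding)
  def sqrt(a), do: FastDecimal.sqrt(a, precision: @precision)
  def round(a, places), do: FastDecimal.round(a, places, @rounding)
end
```

Changing policy becomes a one-line module-attribute edit instead of hunting down `Context.set/1` calls.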
## Differences from decimal (summary)

| | decimal | FastDecimal |
|---|---|---|
| Precision context | Per-process (`Decimal.Context`) | Per call (only `div`, `sqrt`, `round` take precision) |
| Default rounding mode | `:half_up` | `:half_even` (the Compat shim uses `:half_up` for parity) |
| NaN distinction | `:sNaN`, `:qNaN` | Single `:nan` (no signaling NaN) |
| Sign storage | Separate `sign` field | In `coef` |
| Negative zero | `-0` distinguishable from `0` | Collapsed to `0` |
| Arithmetic semantics | Bounded by context precision | Exact — chain add/mult without rounding |
| `compare/2` with NaN | Raises | Returns `:nan` |
| DoS protection (CVE-2026-32686) | Sticky-bit precision-bounded scaling, per-call `:max_digits`/`:max_exponent` opts | Hardcoded global limits (parser caps at exp ±65,535; `pow10` caps internally at n=100,000; `to_string :normal` caps output at 1 MB). No per-call options. |
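The default-rounding row is the difference most likely to surprise in tests: `:half_up` resolves ties away from zero, `:half_even` (banker's rounding) resolves them to the even neighbor. A sketch, per the defaults above (the exact printed form depends on each library):

```elixir
import FastDecimal

# decimal: :half_up default — ties round away from zero.
Decimal.round(Decimal.new("2.5"), 0)     # 3

# FastDecimal: :half_even default — ties round to the even neighbor.
FastDecimal.round(~d"2.5", 0)            # 2
FastDecimal.round(~d"3.5", 0)            # 4

# Passing the mode explicitly restores parity with decimal:
FastDecimal.round(~d"2.5", 0, :half_up)  # 3
```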
## Ecto integration

```elixir
defmodule MyApp.Invoice do
  use Ecto.Schema

  schema "invoices" do
    field :total, FastDecimal.Ecto.Type
  end
end
```

`FastDecimal.Ecto.Type` is compiled automatically when Ecto is in your deps. It bridges between Decimal (what the database adapter speaks) and FastDecimal (what your code holds). `cast/1`, `load/1`, `dump/1`, and `equal?/2` are all implemented.
## Design philosophy

Every operation's implementation was chosen by running a benchmark, not by guessing. The full decision record — with the measurements behind each call — lives in `bench/README.md`. A few highlights:
- The char-by-char walker parser beat `:binary.split` + `:erlang.binary_to_integer` by 1.4–3× on every input shorter than ~25 digits (`bench/parse.exs`).
- The iolist `to_string` beat the bit-syntax binary builder by 20%, because `iodata_to_binary` is implemented as an Erlang BIF that pre-computes total size.
- `pow10` lookup table extended to 38 entries + binary exponentiation for larger n. Speeds up `div` at precision 28 by ~40% (medium values) and ~36% (large values) — the prior recursive `pow10(28)` path was the bottleneck.
- `div_rem` rewritten to compute quotient + remainder directly from aligned coefficients in one pass, instead of the previous "div_int, then mult, then sub" cascade. From 2.7× → 6.2× speedup.
- `to_string :scientific` switched to IEEE 754-2008 "to-scientific-string" (compact form, matches `decimal`'s output). Was a correctness gap, not just a perf one — it turns out `decimal`'s `:scientific` doesn't always emit `E` notation; it uses normal form when `adjusted_exp >= -6`. Fixed alignment is now at parity with `decimal`.
- `sum/1` and `product/1` rewritten as allocation-free accumulators. The old version did pairwise `add`/`mult`, producing one throwaway `%FastDecimal{}` struct per element. The new version carries raw `{coef, exp}` and only builds the final struct at the end — N−1 fewer allocations. `sum of 100` went from 29× → 56× faster than `decimal`. (Special values trip the `is_integer` guard and fall through to a pairwise slow path.)
- `binary_part` instead of bit-syntax pattern match in `to_string :normal`. The `<<int_part::binary-size(N), frac_part::binary>>` form creates two sub-binary refs; `binary_part/3` is a BIF that's about 5% faster. Tipped us from parity to 1.05× on `to_string`.
- `equal?`/`lt?`/`gt?` short-circuit clauses for identical struct shapes (same coef, same exp). Common when comparing a stored value to a fresh literal — returns the answer in a single pattern match instead of going through `compare/2`.
- A Rust NIF prototype for arithmetic ops lost to pure Elixir on every hot path: NIF dispatch overhead (~36 ns) exceeded the per-op cost of pure-Elixir add (~12 ns). It only won at div with high precision (~2.5×) and parse of long strings (~1.5×). Not enough to justify a native dependency and the install friction it adds. The prototype was deleted before v1.0 — the lesson lives in this README.
- The `%FastDecimal{}` struct wrapper is only ~5–9% slower than raw `{coef, exp}` tuples — cheap enough to pay for ergonomics (`bench/representation.exs`).
- Explicit `when c in -2^60..2^60` guards on hot paths add overhead with zero benefit (BEAM's JIT already specializes for immediate-int operands).
- `%{a | coef: ...}` (Elixir's strict update form) produced cleaner BEAM bytecode (`put_map_exact` instead of `put_map_assoc`) but wall-time was identical or 1% slower — kept the literal-struct form for readability (`bench/disasm.exs`).
The rule: if you have a hypothesis about a faster way, write the bench, run it, commit the script. Negative results stay in the tree so we don't re-test the same idea.
## Why pure Elixir (the bench data)

NIF dispatch overhead is ~36 ns on this machine; a pure-Elixir add totals ~12 ns, so the dispatch cost alone is 3× the work-cost for every cheap op. The Rust NIF prototype we built and benchmarked confirmed this — it lost on every per-op arithmetic path and only won at high-precision div and long-string parse. Not enough to justify shipping a binary dependency that requires Rust on every consumer's machine. FastDecimal is pure Elixir; there is no native compilation step.
## License

MIT. See LICENSE.