ExDataSketch

Production-grade streaming data sketching algorithms for Elixir.

ExDataSketch provides probabilistic data structures for approximate counting, frequency estimation, quantile computation, heavy-hitter detection, membership testing with deletion, and set reconciliation on streaming data. All sketch state is stored as Elixir-owned binaries, enabling straightforward serialization, distribution, and persistence.

CIHex versionHex docsLicense

Supported Algorithms

Algorithm Purpose Status
HyperLogLog (HLL) Cardinality estimation Implemented (Pure + Rust)
Count-Min Sketch (CMS) Frequency estimation Implemented (Pure + Rust)
Theta Sketch Set operations on cardinalities Implemented (Pure + Rust)
KLL Quantiles Rank and quantile estimation Implemented (Pure + Rust)
DDSketch Relative-error quantile estimation Implemented (Pure + Rust)
FrequentItems (SpaceSaving) Heavy-hitter / top-k detection Implemented (Pure + Rust)
Bloom Filter Probabilistic membership testing Implemented (Pure + Rust)
Cuckoo Filter Membership testing with deletion Implemented (Pure + Rust)
Quotient Filter Membership with deletion and merge Implemented (Pure + Rust)
CQF (Counting Quotient) Multiset membership with counting Implemented (Pure + Rust)
XorFilter Static immutable membership testing Implemented (Pure + Rust)
IBLT Set reconciliation Implemented (Pure + Rust)
REQ Sketch Relative-error quantile estimation Implemented (Pure)
Misra-Gries Deterministic heavy-hitter detection Implemented (Pure)
UltraLogLog (ULL) Improved cardinality estimation Implemented (Pure + Rust)

Capability Matrix

Structure insert delete merge count serialize static reconciliation
Bloom yes -- yes yes* yes -- --
Cuckoo yes yes -- yes yes -- --
Quotient yes yes yes yes yes -- --
CQF yes yes yes yes yes -- --
XorFilter -- -- -- yes yes yes --
IBLT yes yes yes yes yes -- yes

*Bloom count is a popcount-based cardinality estimate.

When to Choose

Installation

Add ex_data_sketch to your list of dependencies in mix.exs:

def deps do
  [
    {:ex_data_sketch, "~> 0.7.1"}
  ]
end

Quick Start

# HLL: count distinct elements
hll = ExDataSketch.HLL.new() |> ExDataSketch.HLL.update_many(1..100_000)
ExDataSketch.HLL.estimate(hll)  # ~100_000

# KLL: quantile estimation
kll = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100_000)
ExDataSketch.KLL.quantile(kll, 0.5)   # approximate median (~50_000)
ExDataSketch.KLL.quantile(kll, 0.99)  # 99th percentile (~99_000)

# Bloom: membership testing
bloom = ExDataSketch.Bloom.new(capacity: 100_000)
bloom = ExDataSketch.Bloom.put_many(bloom, 1..50_000)
ExDataSketch.Bloom.member?(bloom, 42)      # true
ExDataSketch.Bloom.member?(bloom, 99_999)  # false (probably)

# Cuckoo: membership testing with deletion
cuckoo = ExDataSketch.Cuckoo.new(capacity: 100_000)
{:ok, cuckoo} = ExDataSketch.Cuckoo.put(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42")  # true
{:ok, cuckoo} = ExDataSketch.Cuckoo.delete(cuckoo, "user_42")
ExDataSketch.Cuckoo.member?(cuckoo, "user_42")  # false

See the Quick Start Guide for more examples.

Documentation

Full documentation is available at HexDocs.

Architecture

Compatibility and Stability

The following guarantees apply within the v0.x release series:

Not guaranteed:

Development

# Get dependencies
mix deps.get

# Run tests with coverage
mix test --cover

# Run lints
mix lint

# Run benchmarks
mix bench

# Generate docs
mix docs

Roadmap

Version Focus Status
v0.1.0 Core sketches (HLL, CMS, Theta) + Rust NIFs Released
v0.2.0 KLL quantiles Released
v0.2.1 DDSketch relative-error quantiles Released
v0.3.0 FrequentItems (SpaceSaving) Released
v0.4.0 Bloom filter (membership testing) Released
v0.5.0 Advanced membership filters (Cuckoo, Quotient, CQF, XorFilter, IBLT, FilterChain) Released
v0.6.0 REQ sketch, Misra-Gries, XXHash3, Rust NIF parity for all membership filters Released
v0.7.0 ULL (UltraLogLog) -- improved cardinality estimation with Pure + Rust NIF Released
v0.7.1 NIF batch hashing, hash customization, quotient filter fix, merge safety Released
v0.8.0 Massive Static Data & Industry Interop -- Binary Fuse Filters, Ribbon Filter, Apache DataSketches Interop Planned
v0.9.0 Dequantized HLL and Sphinx (Succinct Perfect Hash Index) Planned

License

MIT License. See LICENSE for details.