ExDataSketch

Production-grade streaming data sketching algorithms for Elixir.

ExDataSketch provides probabilistic data structures for approximate counting, frequency estimation, and quantile computation on streaming data. All sketch state is stored as Elixir-owned binaries, enabling straightforward serialization, distribution, and persistence.


Supported Algorithms

Algorithm                 Purpose                          Status
HyperLogLog (HLL)         Cardinality estimation           Implemented (Pure + Rust)
Count-Min Sketch (CMS)    Frequency estimation             Implemented (Pure + Rust)
Theta Sketch              Set operations on cardinalities  Implemented (Pure + Rust)
KLL Quantiles             Rank and quantile estimation     Implemented (Pure + Rust)
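
To give a sense of what one of these structures does, here is a minimal pure-Elixir illustration of the Count-Min Sketch idea. It is not ExDataSketch's implementation: the module name CMSDemo, its functions, and the map-based counter table are hypothetical and chosen for readability, whereas the library keeps sketch state in binaries.

```elixir
defmodule CMSDemo do
  # Hypothetical teaching module; not part of ExDataSketch.
  defstruct width: 256, depth: 4, counts: %{}

  # width = counters per row, depth = number of hash rows.
  def new(width \\ 256, depth \\ 4) do
    %__MODULE__{width: width, depth: depth}
  end

  # Increment one counter per row for the given item.
  def update(%__MODULE__{} = sketch, item) do
    counts =
      Enum.reduce(0..(sketch.depth - 1), sketch.counts, fn row, acc ->
        Map.update(acc, {row, column(sketch, row, item)}, 1, &(&1 + 1))
      end)

    %{sketch | counts: counts}
  end

  # The estimate is the minimum counter across rows. Hash collisions can
  # only inflate a counter, so the result never undercounts.
  def estimate(%__MODULE__{} = sketch, item) do
    0..(sketch.depth - 1)
    |> Enum.map(fn row -> Map.get(sketch.counts, {row, column(sketch, row, item)}, 0) end)
    |> Enum.min()
  end

  # Salting the hashed term with the row index yields one hash per row.
  defp column(sketch, row, item), do: :erlang.phash2({row, item}, sketch.width)
end

sketch = Enum.reduce(["a", "b", "a", "c", "a"], CMSDemo.new(), &CMSDemo.update(&2, &1))
CMSDemo.estimate(sketch, "a")  # at least 3, the true count of "a"
```

The memory footprint is fixed by width and depth, regardless of how many distinct items are seen. That fixed size is the trade-off that makes sketches attractive for streaming data.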

Installation

Add ex_data_sketch to your list of dependencies in mix.exs:

def deps do
  [
    {:ex_data_sketch, "~> 0.2.0"}
  ]
end

Quick Start

# HLL: count distinct elements
hll = ExDataSketch.HLL.new() |> ExDataSketch.HLL.update_many(1..100_000)
ExDataSketch.HLL.estimate(hll)  # ~100_000

# KLL: quantile estimation
kll = ExDataSketch.KLL.new() |> ExDataSketch.KLL.update_many(1..100_000)
ExDataSketch.KLL.quantile(kll, 0.5)   # approximate median (~50_000)
ExDataSketch.KLL.quantile(kll, 0.99)  # 99th percentile (~99_000)
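
Because sketch state is held in Elixir-owned binaries, a sketch can be persisted or sent to another node as ordinary data. The round trip below is a sketch under assumptions: it uses the generic :erlang.term_to_binary/1 and :erlang.binary_to_term/1 functions rather than any library-specific serialization API (the library may well provide one; it is not shown here), and it assumes the package is installed as described under Installation.

```elixir
# Build a sketch using only the calls shown in the Quick Start above.
hll = ExDataSketch.HLL.new() |> ExDataSketch.HLL.update_many(1..1_000)

# Generic Erlang term encoding; works for any term, not specific to this library.
bin = :erlang.term_to_binary(hll)

# Decode only binaries that come from a trusted source.
restored = :erlang.binary_to_term(bin)

# The restored sketch carries the same state, so both estimates are equal.
ExDataSketch.HLL.estimate(restored) == ExDataSketch.HLL.estimate(hll)
```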

See the Quick Start Guide for more examples.

Documentation

Full documentation is available at HexDocs.

Architecture

Compatibility and Stability

The following guarantees apply within the v0.x release series:

Not guaranteed:

Development

# Get dependencies
mix deps.get

# Run tests with coverage
mix test --cover

# Run lints
mix lint

# Run benchmarks
mix bench

# Generate docs
mix docs

License

MIT License. See LICENSE for details.