Spacesaving

Simple algorithm to estimate distinct elements in a stream using bounded space. The estimate is the upper bound on the element's actual count.

Docs on hex

Usage

Add it to you mix.exs deps

{:spacesaving, "~> 0.0.2"}

Init with 3 spaces, so we track 3 elements

import Spacesaving

state = init(3)

Push some elements

state = state
|> push(:foo) |> push(:foo) |> push(:foo) |> push(:foo)
|> push(:bar) |> push(:bar) |> push(:bar)
|> push(:baz) |> push(:baz)
|> push(:buzz)

Get the top k elements

top(state, 2) # This will be [foo: 4, bar: 3]
top(state, 3) # This will be [foo: 4, bar: 3, buzz: 3], so the inaccuracy starts to come into play when an element is kicked out, and the estimate is the upper bound

Merge two states

left  = init(4) |> push(:foo) |> push(:bar)
right = init(4) |> push(:foo) |> push(:baz)

merge(left, right)
|> top(3) # Would be [foo: 2, bar: 1, baz: 1]

References

Original Paper