High Dynamic Range (HDR) Histogram for Elixir

Get percentile values from a stream of input. Each histogram has configurable min, max and precision values, which control both its accuracy and its memory requirements.

Learn more about HDR Histograms.

Writes (which ought to substantially outnumber reads) run in constant time regardless of the configuration. Querying a histogram for a specific percentile or another metric (max value, min value) becomes slightly slower as the range and/or precision grows. Because the implementation relies on write-optimized ETS tables, operations are not serialized (beyond the locks used by ETS).

A histogram with a range of 1..1_000_000 and a precision of 3 takes roughly 16000 bytes.

Usage

The first step is to create a registry:

defmodule MyApp.Stats do
  use Histogrex

  histogrex :load_user, min: 1, max: 10_000_000, precision: 3
  histogrex :db_save_settings, min: 1, max: 10_000, precision: 2
  # ...
end

Then add this module as a worker to your application's supervision tree:

worker(MyApp.Stats, [])
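worker/2 comes from the deprecated Supervisor.Spec module. On current Elixir versions the same child can be described with a plain child-spec map. This is a minimal sketch; the MyApp.Application module is hypothetical, and it assumes the generated MyApp.Stats.start_link/0 takes no arguments, which is exactly what worker(MyApp.Stats, []) calls.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Equivalent to worker(MyApp.Stats, []): calls MyApp.Stats.start_link()
      # with no arguments and restarts it permanently if it exits.
      %{id: MyApp.Stats, start: {MyApp.Stats, :start_link, []}}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```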

Values can then be recorded via the record! or record functions:

alias MyApp.Stats

Stats.record!(:load_user, 233)
Stats.record!(:db_save_settings, 84)
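HDR histograms track integers, so the unit of a recorded value is up to you. One common pattern is timing a block of code and recording the elapsed microseconds; :timer.tc/1 returns {microseconds, result}. A minimal sketch with a hypothetical MyApp.Timing.measure/3 helper; the registry module is passed in so the helper is not tied to one registry:

```elixir
defmodule MyApp.Timing do
  # Hypothetical helper: runs `fun`, records its wall-clock duration in
  # microseconds against `metric` in the `stats` registry (e.g. MyApp.Stats),
  # and returns whatever `fun` returned.
  def measure(stats, metric, fun) when is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    stats.record!(metric, micros)
    result
  end
end
```

Called as MyApp.Timing.measure(MyApp.Stats, :load_user, fn -> ... end). With :load_user configured with max: 10_000_000, microseconds cover calls of up to 10 seconds.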

mean, min, max, total_count and value_at_quantile are used to query the histogram:

alias MyApp.Stats
Stats.mean(:load_user)
Stats.max(:db_save_settings)
Stats.total_count(:db_save_settings)
Stats.value_at_quantile(:load_user, 99.9)

A reasonable approach is to have a GenServer dump these statistics to a log ingestor every 10 or 60 seconds. That GenServer would then be the only reader, though concurrent reads are fully supported.
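A minimal sketch of such a reporter, assuming the MyApp.Stats registry above. The module name MyApp.StatsReporter, the 60-second default interval and the use of Logger as a stand-in for a real log ingestor are all assumptions; only the query functions (total_count, mean, max, value_at_quantile) come from the examples above.

```elixir
defmodule MyApp.StatsReporter do
  use GenServer
  require Logger

  # Hypothetical reporter: every `:interval` ms it reads a few values per
  # metric and hands them to Logger, standing in for a real log ingestor.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: Keyword.get(opts, :name, __MODULE__))
  end

  @impl true
  def init(opts) do
    state = %{
      stats: Keyword.get(opts, :stats, MyApp.Stats),
      metrics: Keyword.fetch!(opts, :metrics),
      interval: Keyword.get(opts, :interval, 60_000)
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:report, state) do
    Enum.each(state.metrics, fn metric ->
      Logger.info(fn -> "#{metric} #{inspect(snapshot(state.stats, metric))}" end)
    end)

    schedule(state.interval)
    {:noreply, state}
  end

  # Collects the values shipped for one metric.
  def snapshot(stats, metric) do
    %{
      count: stats.total_count(metric),
      mean: stats.mean(metric),
      max: stats.max(metric),
      p99_9: stats.value_at_quantile(metric, 99.9)
    }
  end

  defp schedule(interval), do: Process.send_after(self(), :report, interval)
end
```

It would be started under the same supervisor, for example with {MyApp.StatsReporter, metrics: [:load_user, :db_save_settings]}.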

Implementation

The core histogram implementation is taken from the Go version.
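For readers unfamiliar with the Go code, the sketch below shows the standard HDR layout it follows: values map into a flat array of counters split into power-of-two buckets that share the same number of sub-buckets, so locating a value's counter takes a bit-length, a shift and an addition regardless of configuration. HdrIndex is a hypothetical module written from the Go algorithm, not Histogrex's own source, and it assumes min >= 1 and a precision between 1 and 5.

```elixir
defmodule HdrIndex do
  import Bitwise

  # Derives the counter layout from min, max and precision (significant
  # figures), following the constructor of the Go implementation.
  def layout(min_value, max_value, precision)
      when min_value >= 1 and max_value > min_value and precision in 1..5 do
    largest_single_unit = 2 * Integer.pow(10, precision)
    # ceil(log2(largest_single_unit)), computed with integer bit lengths
    sub_bucket_count_magnitude = bit_len(largest_single_unit - 1)
    half_magnitude = max(sub_bucket_count_magnitude, 1) - 1
    # floor(log2(min_value))
    unit_magnitude = bit_len(min_value) - 1
    sub_bucket_count = bsl(1, half_magnitude + 1)
    half_count = div(sub_bucket_count, 2)
    bucket_count = buckets_needed(bsl(sub_bucket_count, unit_magnitude), max_value, 1)

    %{
      unit_magnitude: unit_magnitude,
      half_magnitude: half_magnitude,
      half_count: half_count,
      mask: bsl(sub_bucket_count - 1, unit_magnitude),
      counts_len: (bucket_count + 1) * half_count
    }
  end

  # Maps a value to the index of the counter a write would increment.
  def counts_index(layout, value) do
    bucket =
      bit_len(bor(value, layout.mask)) - layout.unit_magnitude - (layout.half_magnitude + 1)

    sub_bucket = bsr(value, bucket + layout.unit_magnitude)
    bsl(bucket + 1, layout.half_magnitude) + (sub_bucket - layout.half_count)
  end

  # Doubles the smallest untrackable value until it covers max_value.
  defp buckets_needed(untrackable, max_value, n) when untrackable < max_value,
    do: buckets_needed(bsl(untrackable, 1), max_value, n + 1)

  defp buckets_needed(_untrackable, _max_value, n), do: n

  # Number of bits needed to represent a positive integer.
  defp bit_len(n) when n > 0, do: length(Integer.digits(n, 2))
end
```

Values below the sub-bucket count map one-to-one onto counters; larger values share counters with neighbours, which is where the precision trade-off comes from.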

To maintain high write throughput, data access is not serialized through a single process. Instead, ETS tables optimized for concurrent writes (via the write_concurrency: true option) are used, and every function runs entirely in the calling process. A write consists of a single :ets.update_counter call; a read consists of a single :ets.lookup.
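The write and read paths can be sketched with plain ETS. This hypothetical EtsCounters module stores one histogram as a single tuple {name, c0, c1, ...} in a :public table created with write_concurrency: true, so a write is one :ets.update_counter/3 on a tuple position and a read is one :ets.lookup/2. The one-tuple-per-histogram layout is an assumption for illustration, not necessarily how Histogrex organizes its table.

```elixir
defmodule EtsCounters do
  # Creates a public, write-optimized table holding one counters tuple per name.
  def new(table, names, counts_len) do
    tid = :ets.new(table, [:set, :public, write_concurrency: true])

    for name <- names do
      :ets.insert(tid, List.to_tuple([name | List.duplicate(0, counts_len)]))
    end

    tid
  end

  # A write: one atomic increment. Tuple position 1 holds the key, so
  # counter i (0-based) lives at position i + 2. Returns the new count.
  def increment(tid, name, index), do: :ets.update_counter(tid, name, {index + 2, 1})

  # A read: one lookup, returning the counters as a list.
  def counters(tid, name) do
    [tuple] = :ets.lookup(tid, name)
    tuple |> Tuple.to_list() |> tl()
  end
end
```

Because update_counter is atomic on a single object, many processes can increment the same histogram without going through a GenServer.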

It is possible to use multiple registries, though I suspect this will have no impact unless thousands of high-load metrics are used. Benchmark it.
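A rough way to benchmark this is to time many concurrent writers. MyApp.WriteBench is hypothetical and not part of Histogrex: it calls record!/2 on whatever registry module you pass in from `tasks` concurrent processes and returns the elapsed wall-clock time in microseconds. Running it against one registry and against a split set of registries, under a load resembling production, is the comparison that matters.

```elixir
defmodule MyApp.WriteBench do
  # Runs `tasks` processes that each call stats.record!(metric, value)
  # `per_task` times, and returns the total elapsed time in microseconds.
  def run(stats, metric, tasks, per_task, value \\ 100) do
    {micros, :ok} =
      :timer.tc(fn ->
        1..tasks
        |> Task.async_stream(
          fn _ -> for _ <- 1..per_task, do: stats.record!(metric, value) end,
          max_concurrency: tasks,
          timeout: :infinity
        )
        |> Stream.run()
      end)

    micros
  end
end
```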

You can get the memory used by a registry (ETS reports it in words, not bytes) by calling:

:ets.info(MyApp.Stats)[:memory]

There's a bit of additional overhead for each histogram, but that will give you the bulk of it.
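:ets.info/2 reports :memory as a number of words, so converting to bytes means multiplying by the VM's word size (8 bytes on a 64-bit system). A minimal sketch with a hypothetical MyApp.StatsMemory.bytes/1 helper:

```elixir
defmodule MyApp.StatsMemory do
  # Converts an ETS table's :memory (in words) to bytes using the VM's
  # word size. Returns nil if the table does not exist.
  def bytes(table) do
    case :ets.info(table, :memory) do
      :undefined -> nil
      words -> words * :erlang.system_info(:wordsize)
    end
  end
end
```

MyApp.StatsMemory.bytes(MyApp.Stats) then gives the registry's table size in bytes.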