GorillaStream

A high-performance, lossless compression library for time series data in Elixir, implementing Facebook's Gorilla (VLDB 2015) and the Chimp and Chimp128 (VLDB 2022) compression algorithms.

Features

Installation

Add gorilla_stream to your list of dependencies in mix.exs:

def deps do
  [
    {:gorilla_stream, "~> 3.0"}
  ]
end

For better compression ratios, optionally add zstd support:

def deps do
  [
    {:gorilla_stream, "~> 3.0"},
    {:ezstd, "~> 1.2"}  # Optional - enables zstd compression
  ]
end

Quick Start

# Sample time series data: {timestamp, value} tuples
data = [
  {1609459200, 23.5},
  {1609459260, 23.7},
  {1609459320, 23.4},
  {1609459380, 23.6},
  {1609459440, 23.8}
]

# Compress with Gorilla (default)
{:ok, compressed} = GorillaStream.compress(data)

# Compress with Chimp
{:ok, compressed} = GorillaStream.compress(data, algorithm: :chimp)

# Compress with Chimp128 (ring buffer of 128 previous values)
{:ok, compressed} = GorillaStream.compress(data, algorithm: :chimp128)

# Decompress — auto-detects algorithm
{:ok, decompressed} = GorillaStream.decompress(compressed)

# Verify lossless compression
decompressed == data  # => true

Algorithms

Gorilla (default)

Facebook's original algorithm (VLDB 2015). XOR-based encoding with 3-case variable-length flags. Best general-purpose choice.
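
The three flag cases can be sketched in Elixir roughly as follows. This is an illustrative sketch under simplifying assumptions, not the library's internal API; `GorillaSketch` and its helper functions are hypothetical names:

```elixir
defmodule GorillaSketch do
  import Bitwise

  # Encode one value against the previous one, returning {bits, new_window}.
  # `{prev_lead, prev_len}` is the meaningful-bit window carried over from
  # the previous step (leading-zero count and meaningful-bit length).
  def encode(value, prev, {prev_lead, prev_len}) do
    <<bits::64>> = <<value::float>>
    <<prev_bits::64>> = <<prev::float>>
    x = bxor(bits, prev_bits)
    prev_trail = 64 - prev_lead - prev_len

    cond do
      # Case 1: value unchanged -> a single '0' bit
      x == 0 ->
        {<<0::1>>, {prev_lead, prev_len}}

      # Case 2: meaningful bits fit the previous window -> '10' + those bits
      leading_zeros(x) >= prev_lead and trailing_zeros(x) >= prev_trail ->
        meaningful = x >>> prev_trail
        {<<2::2, meaningful::size(prev_len)>>, {prev_lead, prev_len}}

      # Case 3: new window -> '11' + 5-bit lead + 6-bit length + the bits
      true ->
        # the 5-bit lead field is capped at 31, as in real implementations
        lead = min(leading_zeros(x), 31)
        trail = trailing_zeros(x)
        len = 64 - lead - trail
        meaningful = x >>> trail
        {<<3::2, lead::5, len::6, meaningful::size(len)>>, {lead, len}}
    end
  end

  def leading_zeros(0), do: 64
  def leading_zeros(x), do: 64 - bits_used(x)
  def trailing_zeros(0), do: 64
  def trailing_zeros(x) when (x &&& 1) == 1, do: 0
  def trailing_zeros(x), do: 1 + trailing_zeros(x >>> 1)
  defp bits_used(0), do: 0
  defp bits_used(x), do: 1 + bits_used(x >>> 1)
end
```

An unchanged value costs one bit; a value whose XOR fits the previous window costs 2 bits plus the meaningful bits, which is why slowly changing series compress so well.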

Chimp

Improved XOR encoding (VLDB 2022). It refines Gorilla's flag scheme and leading-zero representation, yielding a modest improvement over Gorilla (~2 bits per value saved on typical data).

Chimp128

Chimp with a ring buffer of 128 previous values. For each new value, it selects the reference value (from the last 128) that produces the most trailing zeros in the XOR result. This benefits data with repeating patterns or values that revisit previous states.
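
The selection step can be sketched as below. This is a simplified illustration, not the library's internals (`Chimp128Sketch` is a hypothetical name, and a real implementation indexes the ring rather than scanning all 128 entries):

```elixir
defmodule Chimp128Sketch do
  import Bitwise

  # Pick, from the ring of previously seen 64-bit patterns, the reference
  # whose XOR with the new value yields the most trailing zeros.
  def best_reference(value, ring) when ring != [] do
    <<bits::64>> = <<value::float>>
    Enum.max_by(ring, fn prev -> trailing_zeros(bxor(bits, prev)) end)
  end

  def trailing_zeros(0), do: 64
  def trailing_zeros(x) when (x &&& 1) == 1, do: 0
  def trailing_zeros(x), do: 1 + trailing_zeros(x >>> 1)
end
```

When a value exactly repeats any of the last 128, the XOR against that reference is zero, which is the best possible case for the encoder.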

All three algorithms are fully streaming — encode/decode one point at a time with no lookahead or chunking required. Timestamps use the same delta-of-delta encoding across all algorithms.
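
The timestamp side can be illustrated with a minimal delta-of-delta sketch (illustrative only; `DodSketch` is not part of the library). The first timestamp is stored as-is, then each step records the change in the delta, so regularly spaced series reduce to runs of zeros:

```elixir
defmodule DodSketch do
  # Returns {first_timestamp, [first_delta | deltas-of-deltas]}.
  def encode([t0 | _] = timestamps) do
    deltas =
      timestamps
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.map(fn [a, b] -> b - a end)

    dod =
      case deltas do
        [] ->
          []

        [d0 | _] ->
          # subsequent entries store how much the delta itself changed
          rest =
            deltas
            |> Enum.chunk_every(2, 1, :discard)
            |> Enum.map(fn [a, b] -> b - a end)

          [d0 | rest]
      end

    {t0, dod}
  end
end
```

For the 60-second-interval sample from Quick Start, every delta-of-delta after the first entry is 0, each of which a Gorilla-style bitstream encodes as a single bit.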

# Algorithm selection
{:ok, _} = GorillaStream.compress(data)                        # Gorilla
{:ok, _} = GorillaStream.compress(data, algorithm: :chimp)     # Chimp
{:ok, _} = GorillaStream.compress(data, algorithm: :chimp128)  # Chimp128

# Decompress auto-detects — no algorithm option needed
{:ok, _} = GorillaStream.decompress(compressed)

Container Compression

GorillaStream supports optional container compression on top of the value encoding:

Option  Description
:none   No container compression (default)
:zlib   zlib compression (always available, built into Erlang)
:zstd   zstd compression (requires the ezstd package)
:auto   Use zstd if available, fall back to zlib

# Zstd compression (best ratio, requires ezstd)
{:ok, compressed} = GorillaStream.compress(data, compression: :zstd)
{:ok, decompressed} = GorillaStream.decompress(compressed, compression: :zstd)

# Combine algorithm + container
{:ok, compressed} = GorillaStream.compress(data, algorithm: :chimp128, compression: :zstd)
{:ok, decompressed} = GorillaStream.decompress(compressed, compression: :zstd)

# Check zstd availability at runtime
GorillaStream.zstd_available?()  # => true or false

Streaming and Chunked Processing

Process large datasets efficiently using chunked streams:

alias GorillaStream.Stream, as: GStream

large_dataset
|> GStream.compress_stream(chunk_size: 10_000)
|> Enum.to_list()
# => [{:ok, chunk1, metadata1}, {:ok, chunk2, metadata2}, ...]

See the User Guide for streaming, GenStage, Broadway, and Flow integration examples.

Analysis Tools

GorillaStream includes Mix tasks to help evaluate compression strategies:

# Analyze compression ratios across data patterns
mix gorilla_stream.compression_analysis

# Benchmark VictoriaMetrics preprocessing
mix gorilla_stream.vm_benchmark 10000

When to Use

Ideal for:

- Monitoring, sensor, and IoT metrics sampled at regular intervals
- Slowly changing floating-point values, where consecutive XORs share many zero bits
- Large volumes of time series data that must remain lossless

Not optimal for:

- High-entropy or effectively random values, where XOR-based encoding saves little
- Data that does not fit the {timestamp, value} shape

Documentation

License

MIT - see LICENSE for details.