Deflate

Pure Elixir implementation of DEFLATE (RFC 1951) and zlib (RFC 1950) decompression with exact byte consumption tracking.


Why This Library?

Erlang's :zlib module is fast (it's a C NIF), but it has a critical limitation: you can't determine exactly how many bytes were consumed when decompressing a stream.

This matters when parsing binary formats with concatenated compressed streams, like:

# With :zlib - no way to know where stream 1 ends and stream 2 begins
:zlib.uncompress(concatenated_streams)  # Returns data, but how many bytes consumed?

# With Deflate - exact byte tracking
{:ok, data1, bytes_consumed} = Deflate.inflate(concatenated_streams)
remaining = binary_part(concatenated_streams, bytes_consumed, byte_size(concatenated_streams) - bytes_consumed)
{:ok, data2, _} = Deflate.inflate(remaining)

Installation

Add deflate to your list of dependencies in mix.exs:

def deps do
  [
    {:deflate, "~> 1.0"}
  ]
end

Usage

Decompress zlib-wrapped data

# Standard zlib format (2-byte header + deflate + 4-byte adler32)
compressed = :zlib.compress("hello world")

{:ok, "hello world", bytes_consumed} = Deflate.inflate(compressed)
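The 2-byte header mentioned in the comment can be pulled apart with a plain binary match. This is a sketch of the RFC 1950 layout, not this library's parser: CMF carries the method and window size, and FLG carries a check value that makes the header a multiple of 31.

```elixir
import Bitwise

# Take apart the 2-byte zlib header on real :zlib output
<<cmf, flg, _deflate_body::binary>> = :zlib.compress("hello world")

method = cmf &&& 0x0F             # compression method: 8 = deflate
window = 1 <<< ((cmf >>> 4) + 8)  # LZ77 window size: 32768 bytes for CINFO = 7
0 = rem(cmf * 256 + flg, 31)      # FCHECK makes CMF*256 + FLG divisible by 31
```

The 4-byte Adler-32 checksum sits at the very end of the stream, after the DEFLATE body.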

Decompress raw DEFLATE data

# Raw DEFLATE without zlib wrapper
{:ok, data, bytes_consumed} = Deflate.inflate_raw(raw_deflate_data)
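If you need raw DEFLATE test data, Erlang's :zlib can produce it: a negative window size tells zlib to omit the header and Adler-32 trailer. This sketch uses only the stock :zlib API.

```elixir
# Generate a bare DEFLATE stream (no zlib wrapper) for testing
z = :zlib.open()
:ok = :zlib.deflateInit(z, :default, :deflated, -15, 8, :default)
raw = IO.iodata_to_binary(:zlib.deflate(z, "hello world", :finish))
:zlib.deflateEnd(z)
:zlib.close(z)

# raw now starts directly with a DEFLATE block header:
# {:ok, "hello world", _consumed} = Deflate.inflate_raw(raw)
```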

Parse concatenated streams (e.g., git pack files)

defmodule GitPack do
  def parse_objects(data, count, objects \\ [])

  def parse_objects(_data, 0, objects), do: Enum.reverse(objects)

  def parse_objects(data, remaining, objects) do
    # parse_object_header/1 is your format-specific header parser
    {type, _size, header_len} = parse_object_header(data)
    rest = binary_part(data, header_len, byte_size(data) - header_len)

    # Deflate tells us exactly how many compressed bytes were consumed
    {:ok, content, consumed} = Deflate.inflate(rest)

    after_object = binary_part(rest, consumed, byte_size(rest) - consumed)
    parse_objects(after_object, remaining - 1, [{type, content} | objects])
  end
end

Streaming decompression (hash/store without full memory)

For large files or when you want to compute hashes without holding the entire decompressed content in memory, use the streaming API:

# Stream directly to SHA-256 hasher
hasher = :crypto.hash_init(:sha256)

{:ok, final_hasher, bytes_consumed} = Deflate.inflate_stream(compressed, hasher, fn chunk, h ->
  :crypto.hash_update(h, chunk)
end)

hash = :crypto.hash_final(final_hasher)

# Stream to file
{:ok, file} = File.open("output.bin", [:write, :binary])

{:ok, _, _} = Deflate.inflate_stream(compressed, nil, fn chunk, _ ->
  IO.binwrite(file, chunk)
  nil
end)

File.close(file)

Note: :zlib cannot combine streaming output with byte-consumption tracking; you can have one or the other, but not both. This library provides both streaming output AND exact byte consumption tracking.

Chunked input (for network streams)

When data arrives in chunks (e.g., from SSH or HTTP streaming), use the stateful decoder:

alias Deflate.Decoder

# Initialize decoder
{:ok, decoder} = Decoder.new()

# Feed chunks as they arrive from network
{:ok, output1, decoder} = Decoder.decode(decoder, packet1)
{:ok, output2, decoder} = Decoder.decode(decoder, packet2)
{:ok, output3, decoder} = Decoder.decode(decoder, packet3)

# When stream ends, get final results
{:done, <<>>, bytes_consumed} = Decoder.finish(decoder)

The decoder maintains state between chunks, resuming decompression exactly where the previous packet stopped, even in the middle of a Huffman symbol or block header.

This is essential for protocols like git-over-SSH where pack data arrives in network packets.

Performance

The Honest Numbers

For decompression speed, :zlib (C NIF) is faster:

| Input Type  | Deflate (Elixir) vs :zlib (C NIF) | Difference                      |
|-------------|-----------------------------------|---------------------------------|
| Random 100KB| 13x faster                        | Stored blocks, minimal decoding |
| Text 100KB  | 6x slower                         | ~177 μs difference              |
| Text 10KB   | 30x slower                        | ~149 μs difference              |

Why It Doesn't Matter

Those ratios sound scary, but look at the absolute times:

| Input      | Deflate | :zlib | Actual Difference |
|------------|---------|-------|-------------------|
| Text 100KB | 213 μs  | 36 μs | 0.18 ms           |
| Text 10KB  | 154 μs  | 5 μs  | 0.15 ms           |
| Text 100B  | 102 μs  | 2 μs  | 0.10 ms           |

In real-world applications, decompression is rarely the bottleneck:

Typical file processing pipeline:
  Decompress:     ~200 μs  (this library)
  Parse content:  ~5,000 μs  (JSON/AST/etc)
  Database I/O:   ~2,000 μs
  Network I/O:    ~10,000+ μs
  ─────────────────────────────
  Decompression: ~1% of total time

Use this library when you need byte tracking. Use :zlib when you need raw speed and don't care about consumption tracking.

Features

Limitations

How It Works

The library implements:

  1. zlib header parsing (RFC 1950) - CMF, FLG bytes, optional dict, Adler-32 checksum
  2. DEFLATE decompression (RFC 1951):
    • Stored blocks (type 0) - uncompressed data
    • Fixed Huffman (type 1) - predefined code tables
    • Dynamic Huffman (type 2) - custom code tables in stream
  3. Bit-level reading - LSB-first bit extraction with precise tracking
  4. LZ77 back-references - Copy from sliding window with overlap handling
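Item 3 above can be sketched as follows. DEFLATE packs bits LSB-first, so the first bit read from a byte is its least significant bit, and carrying a bit offset through every read is what makes exact consumption tracking possible. This is an illustrative sketch, assuming nothing about the library's internal reader:

```elixir
defmodule LsbBits do
  import Bitwise

  # Read `n` bits LSB-first from `data`, starting at absolute bit offset `pos`.
  # Returns {value, next_pos} so the caller always knows exactly how many
  # bits have been consumed.
  def take(data, pos, n) do
    value =
      Enum.reduce(0..(n - 1), 0, fn i, acc ->
        bit_index = pos + i
        byte = :binary.at(data, div(bit_index, 8))
        bit = (byte >>> rem(bit_index, 8)) &&& 1
        acc ||| (bit <<< i)
      end)

    {value, pos + n}
  end
end

# A DEFLATE block header: BFINAL (1 bit), then BTYPE (2 bits)
{bfinal, pos} = LsbBits.take(<<0b011>>, 0, 1)   # bfinal = 1 (last block)
{btype, _pos} = LsbBits.take(<<0b011>>, pos, 2) # btype = 1 (fixed Huffman)
```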

Compile-time generated lookup tables provide O(1) Huffman symbol decoding.
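The LZ77 step (item 4 in the list above) has a subtlety worth sketching: when a back-reference's distance is smaller than its length, the copy overlaps bytes it is itself producing, so it must proceed byte by byte. An illustrative sketch, not the library's actual implementation:

```elixir
defmodule LZ77Sketch do
  # Append `length` bytes copied from `distance` bytes back in `window`.
  # When distance < length, each copied byte may itself have just been
  # written; this is how DEFLATE encodes runs like "abababab" compactly.
  def copy(window, distance, length) do
    do_copy(window, byte_size(window) - distance, length)
  end

  defp do_copy(window, _pos, 0), do: window

  defp do_copy(window, pos, remaining) do
    byte = :binary.at(window, pos)
    do_copy(window <> <<byte>>, pos + 1, remaining - 1)
  end
end

LZ77Sketch.copy("ab", 2, 6)
# => "abababab" (the overlapping copy re-reads bytes it just appended)
```

Appending one byte at a time is quadratic here; a real decoder batches copies where safe, but the overlap rule is the same.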

License

MIT License - see LICENSE file.