Deflate
Pure Elixir implementation of DEFLATE (RFC 1951) and zlib (RFC 1950) decompression with exact byte consumption tracking.
Why This Library?
Erlang's :zlib module is fast (it's a C NIF), but it has a critical limitation: you can't determine exactly how many bytes were consumed when decompressing a stream.
This matters when parsing binary formats that embed or concatenate compressed streams, such as:
- Git pack files - multiple zlib-compressed objects back-to-back
- PNG chunks - IDAT chunks contain zlib streams
- PDF streams - FlateDecode compressed content
- ZIP files - DEFLATE compressed entries
# With :zlib - no way to know where stream 1 ends and stream 2 begins
:zlib.uncompress(concatenated_streams) # Returns data, but how many bytes consumed?
# With Deflate - exact byte tracking
{:ok, data1, bytes_consumed} = Deflate.inflate(concatenated_streams)
remaining = binary_part(concatenated_streams, bytes_consumed, byte_size(concatenated_streams) - bytes_consumed)
{:ok, data2, _} = Deflate.inflate(remaining)
Installation
Add deflate to your list of dependencies in mix.exs:
def deps do
  [
    {:deflate, "~> 1.0"}
  ]
end
Usage
Decompress zlib-wrapped data
# Standard zlib format (2-byte header + deflate + 4-byte adler32)
compressed = :zlib.compress("hello world")
{:ok, "hello world", bytes_consumed} = Deflate.inflate(compressed)
Decompress raw DEFLATE data
# Raw DEFLATE without zlib wrapper
{:ok, data, bytes_consumed} = Deflate.inflate_raw(raw_deflate_data)
Parse concatenated streams (e.g., Git pack files)
defmodule GitPack do
  def parse_objects(data, count, objects \\ [])
  def parse_objects(_data, 0, objects), do: Enum.reverse(objects)
  def parse_objects(data, remaining, objects) do
    {type, size, header_len} = parse_object_header(data)
    rest = binary_part(data, header_len, byte_size(data) - header_len)
    # Deflate tells us exactly how many compressed bytes were consumed
    {:ok, content, consumed} = Deflate.inflate(rest)
    after_object = binary_part(rest, consumed, byte_size(rest) - consumed)
    parse_objects(after_object, remaining - 1, [{type, content} | objects])
  end
end
Streaming decompression (hash/store without full memory)
For large files or when you want to compute hashes without holding the entire decompressed content in memory, use the streaming API:
# Stream directly to SHA-256 hasher
hasher = :crypto.hash_init(:sha256)
{:ok, final_hasher, bytes_consumed} = Deflate.inflate_stream(compressed, hasher, fn chunk, h ->
  :crypto.hash_update(h, chunk)
end)
hash = :crypto.hash_final(final_hasher)
# Stream to file
{:ok, file} = File.open("output.bin", [:write, :binary])
{:ok, _, _} = Deflate.inflate_stream(compressed, nil, fn chunk, _ ->
  IO.binwrite(file, chunk)
  nil
end)
File.close(file)
Note: :zlib cannot do streaming with byte tracking at all. It gives you either:
- Streaming decompression (:zlib.inflateInit/1 + :zlib.inflate/2) but no way to know where the stream ends
- Full decompression (:zlib.uncompress/1) but no byte tracking
This library provides both streaming output AND exact byte consumption tracking.
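For contrast, the :zlib streaming path named in the note can be sketched as follows. It uses only OTP functions (:zlib.open/0, :zlib.inflateInit/1, :zlib.inflate/2, :zlib.inflateEnd/1, :zlib.close/1); the module name ZlibStreamDemo is made up for this illustration. Each :zlib.inflate/2 call returns output iodata only, so nothing in these return values says how many input bytes belonged to the stream.

```elixir
defmodule ZlibStreamDemo do
  # Inflate a list of compressed chunks with :zlib's streaming API and fold
  # the output into a SHA-256 state, never holding the whole output at once.
  def sha256(chunks) do
    z = :zlib.open()
    :ok = :zlib.inflateInit(z)

    try do
      state =
        Enum.reduce(chunks, :crypto.hash_init(:sha256), fn chunk, h ->
          # :zlib.inflate/2 returns iodata; no consumed-byte count comes back.
          :crypto.hash_update(h, :zlib.inflate(z, chunk))
        end)

      # Raises if the end of the zlib stream was never reached.
      :zlib.inflateEnd(z)
      :crypto.hash_final(state)
    after
      :zlib.close(z)
    end
  end
end

data = :binary.copy("hello world ", 1000)
<<first::binary-size(10), second::binary>> = :zlib.compress(data)
digest = ZlibStreamDemo.sha256([first, second])
```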
Chunked input (for network streams)
When data arrives in chunks (e.g., from SSH or HTTP streaming), use the stateful decoder:
alias Deflate.Decoder
# Initialize decoder
{:ok, decoder} = Decoder.new()
# Feed chunks as they arrive from network
{:ok, output1, decoder} = Decoder.decode(decoder, packet1)
{:ok, output2, decoder} = Decoder.decode(decoder, packet2)
{:ok, output3, decoder} = Decoder.decode(decoder, packet3)
# When stream ends, get final results
{:done, <<>>, bytes_consumed} = Decoder.finish(decoder)
The decoder maintains state between chunks, handling all the complexity of:
- Bit-level boundaries (Huffman codes can span byte boundaries)
- Back-references that cross chunk boundaries
- Dynamic Huffman table parsing across chunks
This is essential for protocols like git-over-SSH where pack data arrives in network packets.
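The first point, bit fields that straddle chunk boundaries, can be illustrated with a small LSB-first bit buffer that keeps leftover bits between calls. This is a minimal sketch of the general technique, not the internals of Deflate.Decoder; the module BitBuffer and its functions are hypothetical names.

```elixir
defmodule BitBuffer do
  import Bitwise

  # acc holds unread bits, least significant bit first; count is how many.
  defstruct acc: 0, count: 0

  def new, do: %BitBuffer{}

  # Append the bytes of a newly arrived chunk above the bits already buffered.
  def push(%BitBuffer{} = buf, <<>>), do: buf

  def push(%BitBuffer{acc: acc, count: n} = buf, <<byte, rest::binary>>) do
    push(%{buf | acc: acc ||| (byte <<< n), count: n + 8}, rest)
  end

  # Take `bits` bits if available; otherwise ask the caller for more input
  # and leave the buffered bits untouched.
  def take(%BitBuffer{acc: acc, count: n} = buf, bits) when bits <= n do
    value = acc &&& ((1 <<< bits) - 1)
    {:ok, value, %{buf | acc: acc >>> bits, count: n - bits}}
  end

  def take(%BitBuffer{} = buf, _bits), do: {:more, buf}
end

buf = BitBuffer.new() |> BitBuffer.push(<<0b10110101>>)
{:ok, 5, buf} = BitBuffer.take(buf, 3)      # low three bits: 0b101
{:more, buf} = BitBuffer.take(buf, 7)       # only 5 bits left, wait for data
buf = BitBuffer.push(buf, <<0xFF>>)         # next network packet arrives
{:ok, 118, _buf} = BitBuffer.take(buf, 7)   # 7-bit field spanning both bytes
```

Returning {:more, buf} instead of raising lets the caller suspend decoding mid-symbol and resume when the next packet arrives.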
Performance
The Honest Numbers
For decompression speed, :zlib (C NIF) is faster on compressible input. Only incompressible data, which is stored rather than Huffman-coded, favors this library:
| Input Type | Deflate (Elixir) relative to :zlib (C NIF) | Notes |
|---|---|---|
| Random 100KB | 13x faster | Stored blocks, minimal decoding |
| Text 100KB | 6x slower | ~177 μs difference |
| Text 10KB | 30x slower | ~149 μs difference |
Why It Doesn't Matter
Those ratios sound scary, but look at the absolute times:
| Input | Deflate | :zlib | Actual Difference |
|---|---|---|---|
| Text 100KB | 213 μs | 36 μs | 0.18 ms |
| Text 10KB | 154 μs | 5 μs | 0.15 ms |
| Text 100B | 102 μs | 2 μs | 0.10 ms |
In real-world applications, decompression is rarely the bottleneck:
Typical file processing pipeline:
Decompress: ~200 μs (this library)
Parse content: ~5,000 μs (JSON/AST/etc)
Database I/O: ~2,000 μs
Network I/O: ~10,000+ μs
─────────────────────────────
Decompression: ~1% of total time
Use this library when you need byte tracking. Use :zlib when you need raw speed and don't care about consumption tracking.
Features
- Pure Elixir - No NIFs, no ports, works everywhere BEAM runs
- Exact byte tracking - Know precisely how many bytes were consumed
- Streaming output - Decompress to callbacks for hashing/storage without full memory
- Chunked input - Feed data as it arrives from network (SSH, HTTP streams)
- Full DEFLATE support - Stored, fixed Huffman, and dynamic Huffman blocks
- zlib wrapper support - Handles standard zlib header/trailer
- Zero dependencies - Only uses Erlang/OTP standard library
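To make "exact byte tracking" and "zlib wrapper support" concrete, here is a minimal sketch that inflates a zlib stream made only of stored blocks and reports how many input bytes it used. It is not this library's implementation: the module StoredZlib and its error atoms are invented for illustration, and Huffman-coded blocks are deliberately out of scope. The layout follows RFC 1950 (2-byte header, 4-byte big-endian Adler-32 trailer) and RFC 1951 (stored block: header byte, LEN, NLEN, raw bytes).

```elixir
defmodule StoredZlib do
  import Bitwise

  # Returns {:ok, data, bytes_consumed} or {:error, reason}.
  def inflate(<<cmf, flg, rest::binary>>) do
    cond do
      rem(cmf * 256 + flg, 31) != 0 -> {:error, :bad_header_check}
      (cmf &&& 0x0F) != 8 -> {:error, :unsupported_method}
      (flg &&& 0x20) != 0 -> {:error, :preset_dictionary}
      true -> blocks(rest, [], 2)
    end
  end

  def inflate(_), do: {:error, :truncated}

  # Stored block: BFINAL is bit 0 and BTYPE is bits 1-2 of the header byte,
  # followed by LEN and its one's complement NLEN (both little-endian).
  defp blocks(
         <<_pad::5, 0::2, bfinal::1, len::little-16, nlen::little-16,
           payload::binary-size(len), rest::binary>>,
         acc,
         consumed
       )
       when bxor(len, nlen) == 0xFFFF do
    acc = [acc, payload]
    consumed = consumed + 5 + len
    if bfinal == 1, do: trailer(rest, acc, consumed), else: blocks(rest, acc, consumed)
  end

  defp blocks(_, _, _), do: {:error, :unsupported_or_truncated_block}

  # The trailer is the Adler-32 of the uncompressed data.
  defp trailer(<<adler::big-32, _next::binary>>, acc, consumed) do
    data = IO.iodata_to_binary(acc)
    if :erlang.adler32(data) == adler, do: {:ok, data, consumed + 4}, else: {:error, :bad_checksum}
  end

  defp trailer(_, _, _), do: {:error, :truncated}
end

# Two hand-built stored-block zlib streams, back to back.
hello = <<0x78, 0x01, 0x01, 5::little-16, 0xFFFA::little-16, "hello", :erlang.adler32("hello")::32>>
world = <<0x78, 0x01, 0x01, 5::little-16, 0xFFFA::little-16, "world", :erlang.adler32("world")::32>>
both = hello <> world

{:ok, "hello", used} = StoredZlib.inflate(both)
{:ok, "world", _} = StoredZlib.inflate(binary_part(both, used, byte_size(both) - used))
```

Because the consumed count covers header, blocks, and trailer, the second stream starts exactly at offset `used`.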
Limitations
- Decompression only - This library does not compress data (use :zlib.compress/1 for that)
- No preset dictionaries - zlib preset dictionary feature not supported
How It Works
The library implements:
- zlib header parsing (RFC 1950) - CMF, FLG bytes, optional dict, Adler-32 checksum
- DEFLATE decompression (RFC 1951):
- Stored blocks (type 0) - uncompressed data
- Fixed Huffman (type 1) - predefined code tables
- Dynamic Huffman (type 2) - custom code tables in stream
- Bit-level reading - LSB-first bit extraction with precise tracking
- LZ77 back-references - Copy from sliding window with overlap handling
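The "overlap handling" in the last item refers to matches whose length exceeds their distance, so the copy reads bytes it has just written. A minimal sketch of that rule, assuming the output so far is held as a single binary (the library may well store its window differently; Lz77Copy is a hypothetical name):

```elixir
defmodule Lz77Copy do
  # Append a <len, distance> match to `output`. When len > distance the source
  # overlaps the destination, which amounts to repeating the last `distance`
  # bytes until `len` bytes have been produced.
  def append_match(output, len, distance)
      when len >= 0 and distance >= 1 and distance <= byte_size(output) do
    window = binary_part(output, byte_size(output) - distance, distance)
    repeated = :binary.copy(window, div(len, distance) + 1)
    output <> binary_part(repeated, 0, len)
  end
end

"abcbc" = Lz77Copy.append_match("abc", 2, 2)       # no overlap
"abababab" = Lz77Copy.append_match("ab", 6, 2)     # overlapping copy
"aaaaa" = Lz77Copy.append_match("a", 4, 1)         # run-length case
```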
Compile-time generated lookup tables provide O(1) Huffman symbol decoding.
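As an illustration of compile-time table generation, the sketch below builds the fixed literal/length Huffman codes of RFC 1951 while the module compiles and emits one function clause per code. It is an assumption about the general approach, not a copy of the library's tables; FixedLitLen and decode/2 are hypothetical names. Codes are given as numeric values in the RFC's most-significant-bit-first notation, so a decoder that reads bits LSB-first would need to reverse them.

```elixir
defmodule FixedLitLen do
  # Code lengths of the fixed literal/length alphabet (RFC 1951, 3.2.6).
  lengths =
    for sym <- 0..287 do
      cond do
        sym <= 143 -> 8
        sym <= 255 -> 9
        sym <= 279 -> 7
        true -> 8
      end
    end

  # Canonical Huffman assignment (RFC 1951, 3.2.2): compute the first code
  # of each length, then hand out consecutive codes in symbol order.
  counts = Enum.frequencies(lengths)

  {first_codes, _} =
    Enum.reduce(1..9, {%{}, 0}, fn bits, {acc, code} ->
      code = (code + Map.get(counts, bits - 1, 0)) * 2
      {Map.put(acc, bits, code), code}
    end)

  {entries, _} =
    lengths
    |> Enum.with_index()
    |> Enum.map_reduce(first_codes, fn {len, sym}, next ->
      code = Map.fetch!(next, len)
      {{len, code, sym}, Map.put(next, len, code + 1)}
    end)

  # One clause per (length, code) pair, generated during compilation, so no
  # table is built at run time.
  for {len, code, sym} <- entries do
    def decode(unquote(len), unquote(code)), do: {:ok, unquote(sym)}
  end

  def decode(_len, _code), do: :error
end

{:ok, 256} = FixedLitLen.decode(7, 0b0000000)    # end-of-block symbol
{:ok, 0} = FixedLitLen.decode(8, 0b00110000)     # literal byte 0
{:ok, 144} = FixedLitLen.decode(9, 0b110010000)  # first 9-bit literal
```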
License
MIT License - see LICENSE file.