ExZarr


Elixir implementation of Zarr: compressed, chunked, N-dimensional arrays designed for parallel computing and scientific data storage.

Full Zarr v3 Support: ExZarr implements both Zarr v2 and v3 specifications with production-ready support for v3's unified codec pipeline, improved metadata format, and modern features. Automatic version detection ensures seamless interoperability. See ZARR_V3_STATUS.md for complete v3 support details.

Features

Installation

Add ex_zarr to your list of dependencies in mix.exs:

def deps do
[
{:ex_zarr, "~> 1.1"}
]
end

Quick Start

Creating an Array

# Create a Zarr v3 array (recommended for new projects)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
codecs: [
%{name: "bytes"},
%{name: "gzip", configuration: %{level: 5}}
],
zarr_version: 3,
storage: :memory
)
# Or use v2 format for compatibility with older tools
{:ok, array_v2} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
compressor: :zlib,
zarr_version: 2,
storage: :memory
)
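To make the relationship between shape and chunks concrete, here is a minimal sketch of the chunk-grid arithmetic for the arrays above. The module ChunkGridSketch is hypothetical and not part of ExZarr; it only uses plain Elixir.

```elixir
defmodule ChunkGridSketch do
  @moduledoc "Illustrative chunk-grid arithmetic; not part of ExZarr."

  # Number of chunks along each dimension. Ceiling division is used so that
  # a partial chunk at the edge of the array still counts as a chunk.
  def grid_shape(shape, chunks) do
    shape
    |> Tuple.to_list()
    |> Enum.zip(Tuple.to_list(chunks))
    |> Enum.map(fn {dim, chunk} -> div(dim + chunk - 1, chunk) end)
    |> List.to_tuple()
  end

  # Total number of chunks in the array.
  def chunk_count(shape, chunks) do
    shape |> grid_shape(chunks) |> Tuple.to_list() |> Enum.product()
  end

  # Uncompressed size in bytes of one full chunk, given the item size.
  def chunk_bytes(chunks, item_size) do
    (chunks |> Tuple.to_list() |> Enum.product()) * item_size
  end
end

IO.inspect(ChunkGridSketch.grid_shape({1000, 1000}, {100, 100}))  # {10, 10}
IO.inspect(ChunkGridSketch.chunk_count({1000, 1000}, {100, 100})) # 100
IO.inspect(ChunkGridSketch.chunk_bytes({100, 100}, 8))            # 80000 (float64 is 8 bytes)
```

So a 1000×1000 float64 array with 100×100 chunks is stored as 100 chunks of 80,000 bytes each before compression.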

Saving and Loading Arrays

# Save array to filesystem
:ok = ExZarr.save(array, path: "/tmp/my_array")
# Open existing array
{:ok, array} = ExZarr.open(path: "/tmp/my_array")
# Load entire array into memory
{:ok, data} = ExZarr.load(path: "/tmp/my_array")

Streaming Large Arrays (v1.1+)

Process arrays larger than memory with lazy chunk streaming:

{:ok, array} = ExZarr.open(path: "/data/large_dataset")
array
|> ExZarr.Array.stream_chunks(concurrency: 8, ordered: false)
|> Stream.map(fn {_index, data} -> process_chunk(data) end)
|> Stream.run()
# Row-wise slice streaming
array
|> ExZarr.Array.stream_slices(0, start: {0, 0}, stop: {100, 10})
|> Enum.each(fn {_start, row} -> process_row(row) end)
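For intuition about what the concurrency and ordered options mean, the following sketch shows how lazy, concurrent chunk processing can be expressed with Task.async_stream from the Elixir standard library. It is not ExZarr's implementation: ChunkStreamSketch and its load_chunk/1 function are hypothetical stand-ins for reading and decoding a chunk.

```elixir
defmodule ChunkStreamSketch do
  # Lazily enumerate every chunk index of a 2-D chunk grid.
  def chunk_indices({rows, cols}) do
    Stream.flat_map(0..(rows - 1), fn r ->
      Stream.map(0..(cols - 1), fn c -> {r, c} end)
    end)
  end

  # Hypothetical loader: stands in for fetching and decoding one chunk.
  def load_chunk({r, c}), do: {{r, c}, List.duplicate(r * 10 + c, 4)}

  # Load chunks concurrently. With ordered: false, results are emitted as
  # soon as each task finishes instead of in index order.
  def stream_chunks(grid, opts \\ []) do
    concurrency = Keyword.get(opts, :concurrency, System.schedulers_online())
    ordered = Keyword.get(opts, :ordered, true)

    grid
    |> chunk_indices()
    |> Task.async_stream(&load_chunk/1, max_concurrency: concurrency, ordered: ordered)
    |> Stream.map(fn {:ok, result} -> result end)
  end
end

# Only a bounded number of chunks is in flight at any time.
{2, 3}
|> ChunkStreamSketch.stream_chunks(concurrency: 4, ordered: false)
|> Stream.map(fn {index, data} -> {index, Enum.sum(data)} end)
|> Enum.each(&IO.inspect/1)
```

Because nothing is loaded until the stream is consumed, memory use is bounded by the concurrency setting rather than by the size of the array.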

Attach telemetry handlers for production observability - see guides/telemetry.md.

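# Write a stream of chunks to an array in batches
# (chunk_stream and save_progress/1 are placeholders defined by your application)
ExZarr.Array.write_stream(array, chunk_stream,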
batch_size: 4,
checkpoint: fn stats -> save_progress(stats) end
)

See migration_guide_v1_1_0.md and docs/educational/v1_1_streaming_guide.md.

Performance

ExZarr v0.8+ includes major performance optimizations:

Benchmark results (400×400 array, 16 chunks):

See Performance Guide for tuning recommendations and Benchmarks for running your own tests.

# Run quick performance check (completes in 6 seconds)
mix run benchmarks/slicing_bench_quick.exs

Zarr Format Support

ExZarr provides production-ready support for both Zarr v2 and v3 specifications. Arrays can be created in either format, and opening arrays automatically detects the version.

Zarr v3 is fully implemented with a unified codec pipeline and improved metadata format:

# Create v3 array with unified codec pipeline
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
codecs: [
%{name: "bytes"}, # Required array-to-bytes codec
%{name: "gzip", configuration: %{level: 5}} # Optional compression
],
zarr_version: 3,
storage: :filesystem,
path: "/tmp/my_v3_array"
)

Zarr v2 (Default for Compatibility)

Zarr v2 uses separate filters and compressor configuration:

# Create v2 array (explicit version)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
filters: [{:shuffle, [elementsize: 8]}],
compressor: :zlib,
zarr_version: 2,
storage: :filesystem,
path: "/tmp/my_v2_array"
)
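The shuffle filter in the example above is a byte-shuffle: it regroups the bytes of fixed-size elements so that all first bytes come first, then all second bytes, and so on. The sketch below illustrates that general technique in plain Elixir; ShuffleSketch is a hypothetical module, not ExZarr's filter implementation.

```elixir
defmodule ShuffleSketch do
  # Group byte 0 of every element, then byte 1 of every element, and so on.
  def shuffle(<<>>, _elementsize), do: <<>>

  def shuffle(data, elementsize) when rem(byte_size(data), elementsize) == 0 do
    elements = for <<element::binary-size(elementsize) <- data>>, do: :binary.bin_to_list(element)
    transpose_to_binary(elements)
  end

  # Inverse operation: split into `elementsize` byte planes and interleave them.
  def unshuffle(<<>>, _elementsize), do: <<>>

  def unshuffle(data, elementsize) when rem(byte_size(data), elementsize) == 0 do
    count = div(byte_size(data), elementsize)
    planes = for <<plane::binary-size(count) <- data>>, do: :binary.bin_to_list(plane)
    transpose_to_binary(planes)
  end

  # Transpose a list of equal-length byte lists and flatten to a binary.
  defp transpose_to_binary(rows) do
    rows |> Enum.zip_with(& &1) |> IO.iodata_to_binary()
  end
end

# Two 2-byte elements <<1, 2>> and <<3, 4>>
IO.inspect(ShuffleSketch.shuffle(<<1, 2, 3, 4>>, 2))    # <<1, 3, 2, 4>>
IO.inspect(ShuffleSketch.unshuffle(<<1, 3, 2, 4>>, 2))  # <<1, 2, 3, 4>>
```

For numeric data whose values are close together, the high-order bytes tend to repeat, so grouping them usually gives the compressor longer runs to work with.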

Automatic Version Detection

When opening arrays, ExZarr automatically detects the format version:

# Opens v2 or v3 transparently
{:ok, array} = ExZarr.open(path: "/tmp/my_array")
# Check which version was detected
array.version # Returns 2 or 3
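One plausible way to detect the version is to look at which metadata document is present: .zarray for v2 and zarr.json for v3. The sketch below assumes exactly that and is not taken from ExZarr's source. VersionDetectSketch is a hypothetical module, and preferring v3 when both files exist is an arbitrary choice made for this example.

```elixir
defmodule VersionDetectSketch do
  # Decide the format version from the file names found in an array directory.
  def detect(entries) when is_list(entries) do
    cond do
      "zarr.json" in entries -> {:ok, 3}
      ".zarray" in entries -> {:ok, 2}
      true -> {:error, :not_a_zarr_array}
    end
  end
end

IO.inspect(VersionDetectSketch.detect([".zarray", ".zattrs", "0.0"]))  # {:ok, 2}
IO.inspect(VersionDetectSketch.detect(["zarr.json", "c"]))             # {:ok, 3}
IO.inspect(VersionDetectSketch.detect(["README.md"]))                  # {:error, :not_a_zarr_array}
```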

Key Differences Between v2 and v3

| Feature | v2 | v3 |
| --- | --- | --- |
| Metadata file | .zarray | zarr.json |
| Chunk keys | Dot-separated (0.1.2) | Slash-separated with prefix (c/0/1/2) |
| Codec organization | Separate filters and compressor | Unified codecs array |
| Data types | NumPy-style strings (<f8) | Simplified names (float64) |
| Groups | Separate .zgroup files | Unified zarr.json with node_type |
| Attributes | Separate .zattrs files | Embedded in zarr.json |
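The chunk-key and metadata-file differences can be sketched as a pair of small functions. ChunkKeySketch is a hypothetical helper written for this README; it reproduces only the two key layouts listed above and is not ExZarr's API.

```elixir
defmodule ChunkKeySketch do
  # v2: chunk indices joined with dots, e.g. {0, 1, 2} -> "0.1.2"
  def chunk_key(index, 2) when is_tuple(index) do
    index |> Tuple.to_list() |> Enum.join(".")
  end

  # v3: a "c" prefix followed by slash-separated indices, e.g. "c/0/1/2"
  def chunk_key(index, 3) when is_tuple(index) do
    Enum.join(["c" | Tuple.to_list(index)], "/")
  end

  # Name of the metadata document that describes an array in each version.
  def metadata_file(2), do: ".zarray"
  def metadata_file(3), do: "zarr.json"
end

IO.inspect(ChunkKeySketch.chunk_key({0, 1, 2}, 2))  # "0.1.2"
IO.inspect(ChunkKeySketch.chunk_key({0, 1, 2}, 3))  # "c/0/1/2"
```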

Converting from v2 to v3

v2-style configuration is automatically converted when creating v3 arrays:

# This v2-style configuration
{:ok, array} = ExZarr.create(
shape: {1000},
chunks: {100},
dtype: :int64,
filters: [{:shuffle, [elementsize: 8]}],
compressor: :zlib,
zarr_version: 3 # Request v3 format
)
# Automatically converts to v3 codec pipeline:
# [
# %{name: "shuffle", configuration: %{elementsize: 8}},
# %{name: "bytes"},
# %{name: "gzip", configuration: %{level: 5}}
# ]

For detailed migration guidance, see docs/V2_TO_V3_MIGRATION.md.

Working with Groups

# Create a hierarchical group structure
{:ok, root} = ExZarr.Group.create("/data",
storage: :filesystem,
path: "/tmp/zarr_data"
)
# Create arrays within the group
{:ok, measurements} = ExZarr.Group.create_array(root, "measurements",
shape: {1000},
chunks: {100},
dtype: :float64
)
# Create subgroups
{:ok, subgroup} = ExZarr.Group.create_group(root, "experiments")

Interoperability with Python

ExZarr is fully compatible with Python's zarr library. Arrays created by one can be read by the other:

# Run the interoperability demo
elixir examples/python_interop_demo.exs

This demonstrates:

For detailed interoperability information, see INTEROPERABILITY.md which covers:

Custom Codecs Example

See how to create and use custom compression codecs:

# Run the custom codec example
mix run examples/custom_codec_example.exs

This demonstrates:

Custom Storage Backend Example

See the test suite for a complete example of implementing a custom storage backend:

# View the custom storage tests
cat test/ex_zarr_custom_storage_test.exs

The example demonstrates:

Supported Data Types

ExZarr supports the following data types:

All data types use little-endian byte order by default, consistent with the Zarr specification.
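Little-endian means the least significant byte is stored first. Elixir's binary syntax can show this directly; the module name LittleEndianSketch below is hypothetical and the code does not call ExZarr.

```elixir
defmodule LittleEndianSketch do
  # Pack a list of numbers as consecutive little-endian float64 values.
  def encode_float64(values) do
    for v <- values, into: <<>>, do: <<v::float-little-64>>
  end

  # Unpack a binary of little-endian float64 values back into a list.
  def decode_float64(binary) when rem(byte_size(binary), 8) == 0 do
    for <<v::float-little-64 <- binary>>, do: v
  end
end

# 1.0 is 0x3FF0000000000000 in IEEE 754; little-endian stores the 0x3F byte last.
IO.inspect(LittleEndianSketch.encode_float64([1.0]))  # <<0, 0, 0, 0, 0, 0, 240, 63>>

# Round trip through the binary representation.
[1.5, -2.0] = [1.5, -2.0] |> LittleEndianSketch.encode_float64() |> LittleEndianSketch.decode_float64()
```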

Compression Codecs

ExZarr provides the following built-in compression options:

The :zlib codec uses Erlang's built-in :zlib module for maximum reliability and compatibility.
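As a rough illustration of what compressing a chunk with Erlang's :zlib module looks like, the sketch below compresses a binary at an explicit level and restores it. ZlibSketch is a hypothetical wrapper written for this README and is not ExZarr's codec code.

```elixir
defmodule ZlibSketch do
  # Compress with an explicit level (0 = none, 9 = best), zlib framing.
  def compress(data, level) when is_binary(data) and level in 0..9 do
    z = :zlib.open()

    try do
      :ok = :zlib.deflateInit(z, level)
      compressed = :zlib.deflate(z, data, :finish)
      :ok = :zlib.deflateEnd(z)
      IO.iodata_to_binary(compressed)
    after
      :zlib.close(z)
    end
  end

  # Restore zlib-framed data.
  def decompress(data) when is_binary(data), do: :zlib.uncompress(data)
end

chunk = :binary.copy(<<0::float-little-64>>, 1_000)  # 8,000 bytes of zeros
packed = ZlibSketch.compress(chunk, 5)
^chunk = ZlibSketch.decompress(packed)
IO.inspect({byte_size(chunk), byte_size(packed)})

# zlib and gzip wrap the same deflate stream in different headers.
<<0x78, _::binary>> = packed
<<0x1F, 0x8B, _::binary>> = :zlib.gzip(chunk)
```

The last two lines show why the framing matters for interoperability: a reader expecting gzip framing will reject a zlib-framed stream and vice versa.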

Custom Codecs

ExZarr supports custom codecs through a behavior-based plugin system. You can create your own compression, checksum, or transformation codecs:

defmodule MyCustomCodec do
@behaviour ExZarr.Codecs.Codec
@impl true
def codec_id, do: :my_codec
@impl true
def codec_info do
%{
name: "My Custom Codec",
version: "1.0.0",
type: :compression, # or :transformation
description: "My custom compression algorithm"
}
end
@impl true
def available?, do: true
@impl true
def encode(data, _opts) when is_binary(data) do
# Your encoding logic here; this placeholder returns the data unchanged
{:ok, data}
end
@impl true
def decode(data, _opts) when is_binary(data) do
# Your decoding logic here; it must invert encode/2
{:ok, data}
end
@impl true
def validate_config(_opts) do
# Validate options; return :ok when they are acceptable
:ok
end
end
# Register your codec
:ok = ExZarr.Codecs.register_codec(MyCustomCodec)
# Use it like any built-in codec
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
compressor: :my_codec
)
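To show what the encode/2 and decode/2 bodies of such a codec might contain, here is a self-contained run-length encoding sketch. RunLengthSketch is a hypothetical module that deliberately does not declare the ExZarr.Codecs.Codec behaviour, so it runs without ExZarr; the {:ok, binary} return shape mirrors the template above, and the error tuple is an assumption.

```elixir
defmodule RunLengthSketch do
  # Encode as <<count, byte>> pairs; runs longer than 255 bytes are split.
  def encode(data, _opts \\ []) when is_binary(data) do
    {:ok, do_encode(data, [])}
  end

  # Expand each <<count, byte>> pair back into `count` copies of `byte`.
  def decode(data, _opts \\ []) when is_binary(data) do
    if rem(byte_size(data), 2) == 0 do
      {:ok, for(<<count, byte <- data>>, into: <<>>, do: :binary.copy(<<byte>>, count))}
    else
      {:error, :invalid_run_length_data}
    end
  end

  defp do_encode(<<>>, acc), do: acc |> Enum.reverse() |> IO.iodata_to_binary()

  # Same byte as the current run and the run is not full: extend the run.
  defp do_encode(<<byte, rest::binary>>, [[count, byte] | acc]) when count < 255 do
    do_encode(rest, [[count + 1, byte] | acc])
  end

  # Otherwise start a new run.
  defp do_encode(<<byte, rest::binary>>, acc), do: do_encode(rest, [[1, byte] | acc])
end

{:ok, packed} = RunLengthSketch.encode(<<1, 1, 1, 2>>)
IO.inspect(packed)                                    # <<3, 1, 1, 2>>
{:ok, <<1, 1, 1, 2>>} = RunLengthSketch.decode(packed)
```

Run-length encoding only pays off on data with long runs of identical bytes; it is used here because it is short enough to read in full, not because it is a recommended codec.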

For complete examples, see examples/custom_codec_example.exs which includes:

Custom codec features:

Storage Backends

ExZarr includes three built-in storage backends:

Arrays stored on the filesystem use the standard Zarr format:

Using Zip Storage

Zip storage stores the entire array (metadata + all chunks) in a single zip file:

# Create array with zip storage
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
dtype: :float64,
storage: :zip,
path: "/tmp/my_array.zip"
)
# Write data
ExZarr.Array.set_slice(array, data, start: {0, 0}, stop: {100, 100})
# Save to zip file
:ok = ExZarr.save(array, path: "/tmp/my_array.zip")
# Open existing zip
{:ok, reopened} = ExZarr.open(path: "/tmp/my_array.zip", storage: :zip)
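The idea of keeping metadata and chunks together in one archive can be illustrated with Erlang's built-in :zip module. This is only a sketch of the concept: the entry names and contents below are made up for the example, ZipLayoutSketch is a hypothetical module, and nothing here reflects how ExZarr lays out its zip files internally.

```elixir
defmodule ZipLayoutSketch do
  # Build an in-memory zip archive from a map of entry name => binary.
  def pack(entries) when is_map(entries) do
    files = for {name, data} <- entries, do: {String.to_charlist(name), data}
    {:ok, {_name, zip_binary}} = :zip.create(~c"array.zip", files, [:memory])
    zip_binary
  end

  # Read every entry of an in-memory zip archive back into a map.
  def unpack(zip_binary) when is_binary(zip_binary) do
    {:ok, files} = :zip.unzip(zip_binary, [:memory])
    Map.new(files, fn {name, data} -> {List.to_string(name), data} end)
  end
end

# Hypothetical contents: one metadata document and two chunks.
entries = %{
  ".zarray" => ~s({"zarr_format": 2}),
  "0.0" => <<1, 2, 3>>,
  "0.1" => <<4, 5, 6>>
}

zip_binary = ZipLayoutSketch.pack(entries)
^entries = ZipLayoutSketch.unpack(zip_binary)
```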

Custom Storage Backends

ExZarr supports custom storage backends through a behavior-based plugin system, similar to custom codecs. Create backends for S3, databases, cloud storage, or any other storage system:

defmodule MyApp.S3Storage do
@behaviour ExZarr.Storage.Backend
@impl true
def backend_id, do: :s3
@impl true
def init(config) do
# Initialize S3 connection
bucket = Keyword.fetch!(config, :bucket)
{:ok, %{bucket: bucket, client: setup_s3_client()}}
end
@impl true
def read_chunk(state, chunk_index) do
# Read chunk from S3
key = build_s3_key(chunk_index)
AWS.S3.get_object(state.client, state.bucket, key)
end
@impl true
def write_chunk(state, chunk_index, data) do
# Write chunk to S3
key = build_s3_key(chunk_index)
AWS.S3.put_object(state.client, state.bucket, key, data)
end
# Implement other required callbacks...
end
# Register your backend
:ok = ExZarr.Storage.Registry.register(MyApp.S3Storage)
# Use it like any built-in backend
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :s3,
bucket: "my-zarr-data"
)
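The read_chunk and write_chunk callbacks are easiest to understand with a backend that has no external service behind it. The sketch below keeps chunks in an Agent. InMemoryBackendSketch is a hypothetical module that does not declare the ExZarr.Storage.Backend behaviour, and the return values ({:ok, data}, {:error, :not_found}, :ok) are assumptions rather than the documented contract.

```elixir
defmodule InMemoryBackendSketch do
  # Start the process that holds the chunk map; the pid acts as backend state.
  def init(_config), do: Agent.start_link(fn -> %{} end)

  # Store one encoded chunk under its index tuple.
  def write_chunk(pid, chunk_index, data) when is_tuple(chunk_index) and is_binary(data) do
    Agent.update(pid, &Map.put(&1, chunk_index, data))
  end

  # Fetch one encoded chunk, or report that it was never written.
  def read_chunk(pid, chunk_index) when is_tuple(chunk_index) do
    case Agent.get(pid, &Map.fetch(&1, chunk_index)) do
      {:ok, data} -> {:ok, data}
      :error -> {:error, :not_found}
    end
  end
end

{:ok, backend} = InMemoryBackendSketch.init([])
:ok = InMemoryBackendSketch.write_chunk(backend, {0, 0}, <<1, 2, 3>>)
{:ok, <<1, 2, 3>>} = InMemoryBackendSketch.read_chunk(backend, {0, 0})
{:error, :not_found} = InMemoryBackendSketch.read_chunk(backend, {9, 9})
```

A real backend would replace the Agent calls with requests to the storage service and translate its errors into the tuples the behaviour expects.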

Custom storage backend features:

Required callbacks:

Cloud and Database Storage Backends

ExZarr includes several pre-built storage backends for cloud services and databases:

AWS S3 Storage

# Add dependencies
{:ex_aws, "~> 2.5"},
{:ex_aws_s3, "~> 2.5"}
# Register and use
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.S3)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :s3,
bucket: "my-zarr-bucket",
prefix: "experiments/array1",
region: "us-west-2"
)

Azure Blob Storage

# Add dependency
{:azurex, "~> 0.3"}
# Register and use
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.AzureBlob)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :azure_blob,
account_name: "mystorageaccount",
account_key: System.get_env("AZURE_STORAGE_KEY"),
container: "zarr-data",
prefix: "experiments/array1"
)

Google Cloud Storage

# Add dependencies
{:goth, "~> 1.4"},
{:req, "~> 0.4"}
# Register and use
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.GCS)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :gcs,
bucket: "my-zarr-bucket",
prefix: "experiments/array1",
credentials: "/path/to/service-account.json"
)

Mnesia (Distributed Database)

# No external dependencies - Mnesia is built into Erlang/OTP
# Initialize Mnesia
:mnesia.create_schema([node()])
:mnesia.start()
# Register and use
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.Mnesia)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :mnesia,
array_id: "experiment_001",
table_name: :zarr_storage
)

MongoDB GridFS

# Add dependency
{:mongodb_driver, "~> 1.4"}
# Register and use
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.MongoGridFS)
{:ok, array} = ExZarr.create(
shape: {1000, 1000},
chunks: {100, 100},
storage: :mongo_gridfs,
url: "mongodb://localhost:27017",
database: "zarr_db",
bucket: "arrays",
array_id: "experiment_001"
)

Mock Storage (Testing)

# No dependencies - built-in for testing
:ok = ExZarr.Storage.Registry.register(ExZarr.Storage.Backend.Mock)
# Test with error simulation
{:ok, array} = ExZarr.create(
shape: {100},
chunks: {10},
storage: :mock,
pid: self(),
error_mode: :random,
delay: 50 # Simulate 50ms latency
)
# Verify operations
assert_received {:mock_storage, :write_chunk, _}

Cloud Storage Features:

Database Storage Features:

Mock Storage Features:

Architecture

ExZarr uses:

Development

Requires Elixir ~> 1.14, OTP 25+, and Zig 0.16.0 for codec NIF compilation (via zigler 0.16). Install compression libraries before compiling:

# macOS
brew install zstd lz4 snappy c-blosc bzip2
# Ubuntu/Debian
sudo apt-get install libzstd-dev liblz4-dev libsnappy-dev libblosc-dev libbz2-dev
# Install dependencies
mix deps.get
# Compile the project (requires zig 0.16 on PATH)
mix compile
# Run tests
mix test
# Run tests with coverage
mix coveralls
# Run specific test suites
mix test test/ex_zarr_property_test.exs # Property-based tests
mix test test/ex_zarr_python_integration_test.exs # Python integration tests
# Run static analysis
mix credo
# Run type checking
mix dialyzer
# Generate documentation
mix docs

Quality Checks

Before committing, ensure all quality checks pass:

# Run all tests
mix test
# Check code style
mix credo --strict
# Run type checker
mix dialyzer
# Verify test coverage
mix coveralls

CI/CD

The project uses GitHub Actions for continuous integration. The CI pipeline:

Testing

ExZarr includes comprehensive test coverage:

Key testing areas:

Python Integration Tests

ExZarr includes integration tests that verify compatibility with Python's zarr library:

# Install Python dependencies (one-time setup)
./test/support/setup_python_tests.sh
# Run integration tests
mix test test/ex_zarr_python_integration_test.exs

These tests verify that:

Requirements: Python 3.6+, zarr-python 2.x, numpy

Documentation

Guides

Comprehensive guides for all skill levels:

Examples

Practical examples demonstrating real-world usage:

API Documentation

Full API documentation is available at hexdocs.pm/ex_zarr.

Key modules:

Roadmap

See ROADMAP.md for the full release plan.

v1.1.0 (current) - BEAM-native streaming: stream_chunks/2, stream_slices/3, write_stream/3, telemetry, Flow/GenStage/Broadway integrations, cloud patterns guide, and production cookbook.

Upcoming (high level):

Contributing

Contributions are welcome. Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass with mix test
  5. Run code quality checks with mix credo and mix dialyzer
  6. Submit a pull request

License

MIT

Credits

Inspired by zarr-python. Implements both Zarr v2 and v3 specifications for full compatibility with the broader Zarr ecosystem.