Natch
β‘ High-performance native ClickHouse client for Elixir
Natch provides fast access to ClickHouse using the native TCP protocol (port 9000) via C++ NIFs. Native protocol benefits include binary columnar format, efficient compression, and reduced overhead compared to HTTP-based clients.
Why Natch?
- π Native Protocol Performance - Direct TCP connection using ClickHouse's binary protocol
- π Columnar-First Design - API designed for analytics workloads, not OLTP
- π§ Production Ready - 227 tests covering all ClickHouse types including complex nested structures
- πͺ Type Complete - Full support for all ClickHouse types: primitives, dates, decimals, UUIDs, arrays, maps, tuples, nullables, enums, and low cardinality
- π― Zero-Copy Efficiency - Bulk operations with minimal overhead
- π Memory Safe - Built with FINE for crash-proof NIFs
Requirements
- Elixir: 1.18+ / Erlang 27+
- ClickHouse: Server 20.3+
- Build: C++17 compiler, CMake 3.15+, clickhouse-cpp dependencies
Installation
Add natch to your list of dependencies in mix.exs:
def deps do
[
{:natch, "~> 0.2.0"}
]
endPrebuilt binaries are available for macOS (x86_64, ARM64) and Linux (x86_64, ARM64) and will be downloaded automatically during installation.
Building from Source
If prebuilt binaries are not available for your platform, or if you prefer to build from source:
# Clone the repository
git clone https://github.com/Intellection/natch.git
cd natch
# Initialize the clickhouse-cpp submodule
git submodule update --init --recursive
# Build
mix deps.get
mix compileBuild Requirements:
- C++17 compiler (GCC 7+, Clang 5+, or MSVC 2017+)
- CMake 3.15+
- OpenSSL development headers
- Git (for submodule)
Quick Start
Local ClickHouse
# Start a connection
{:ok, conn} = Natch.Connection.start_link(
host: "localhost",
port: 9000,
database: "default"
)
# Create a table
Natch.Connection.execute(conn, """
CREATE TABLE events (
id UInt64,
user_id UInt32,
event_type LowCardinality(String),
properties Map(String, String),
tags Array(String),
timestamp DateTime,
metadata Nullable(String)
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
""")
# Insert data (columnar format - optimal performance!)
columns = %{
id: [1, 2, 3],
user_id: [100, 101, 100],
event_type: ["click", "view", "click"],
properties: [
%{"page" => "home", "referrer" => "google"},
%{"page" => "about"},
%{"page" => "pricing"}
],
tags: [["web", "desktop"], ["mobile"], ["web"]],
timestamp: [~U[2024-01-01 10:00:00Z], ~U[2024-01-01 10:01:00Z], ~U[2024-01-01 10:02:00Z]],
metadata: ["extra", nil, "data"]
}
schema = [
id: :uint64,
user_id: :uint32,
event_type: {:low_cardinality, :string},
properties: {:map, :string, :string},
tags: {:array, :string},
timestamp: :datetime,
metadata: {:nullable, :string}
]
:ok = Natch.insert(conn, "events", columns, schema)
# Query data
{:ok, results} = Natch.Connection.select_rows(conn, "SELECT * FROM events WHERE user_id = 100")
IO.inspect(results)
# => [
# %{id: 1, user_id: 100, event_type: "click", ...},
# %{id: 3, user_id: 100, event_type: "click", ...}
# ]ClickHouse Cloud (SSL)
ClickHouse Cloud requires SSL/TLS connections on port 9440:
{:ok, conn} = Natch.Connection.start_link(
host: "your-instance.clickhouse.cloud",
port: 9440,
database: "default",
user: "default",
password: "your-password",
ssl: true # Enable SSL/TLS
)Note: SSL support requires clickhouse-cpp to be built with OpenSSL. If you get a Natch.OpenSSLError saying "Library was built with no SSL support", the C++ library needs to be rebuilt with -DWITH_OPENSSL=ON CMake flag. This is typically handled automatically by package managers on systems with OpenSSL development libraries installed.
Timeout Configuration
Configure socket-level timeouts to prevent operations from hanging indefinitely in production:
{:ok, conn} = Natch.Connection.start_link(
host: "localhost",
port: 9000,
connect_timeout: 5_000, # Time to establish TCP connection (default: 5000ms)
recv_timeout: 60_000, # Time to receive data from server (default: 0 = infinite)
send_timeout: 60_000 # Time to send data to server (default: 0 = infinite)
)Important: The default recv_timeout is 0 (no timeout), which allows long-running analytical queries to complete. For production use, consider setting explicit timeouts based on your workload. When a timeout occurs, a Natch.ConnectionError is raised.
Benchmarks
Real-world performance comparison vs Pillar (HTTP-based client) on M3 Pro, tested with 7-column schema.
Important: Benchmarks use Pillar.select/2 which parses JSON responses. Using Pillar.query/2 (which returns unparsed TSV strings) is not a fair comparison.
INSERT Performance
| Rows | Natch | Pillar | Speedup | Memory (Natch) | Memory (Pillar) |
|---|---|---|---|---|---|
| 10k | 13.5 ms | 63.9 ms | 4.7x faster | 976 B | 45 MB |
| 100k | 184 ms | 626 ms | 3.4x faster | 976 B | 452 MB |
| 1M | 2,094 ms | 5,545 ms | 2.6x faster | 976 B | 4.5 GB |
Natch uses ~4.6 million times less memory than Pillar for inserts due to columnar format.
SELECT Performance
| Query Type | Natch | Pillar | Speedup | Memory (Natch) | Memory (Pillar) |
|---|---|---|---|---|---|
| Aggregation | 3.6 ms | 4.9 ms | 1.4x faster | 544 B | 17 KB |
| Filtered (10k rows) | 12 ms | 53 ms | 4.4x faster | 128 B | 30 MB |
| Full scan (1M rows) | 802 ms | 4,980 ms | 6.2x faster | 128 B | 3 GB |
Natch uses ~5.5 million times less memory than Pillar for large SELECT queries due to streaming columnar format vs materialized row-oriented maps.
Key Takeaways
- Native protocol is faster - Natch's native TCP protocol with binary columnar format outperforms HTTP+JSON
- Massive memory efficiency - Millions of times less memory usage due to streaming and columnar format
- Scales better - Performance advantage increases with data size (6.2x for 1M rows vs 1.4x for aggregations)
See bench/README.md and BINARY_PASSTHROUGH.md for detailed analysis and methodology.
Core Concepts
Columnar Format (Recommended)
Natch uses a columnar-first API that matches ClickHouse's native storage format:
# β
GOOD: Columnar format - 3 NIF calls for any number of rows
columns = %{
id: [1, 2, 3, 4, 5],
name: ["Alice", "Bob", "Charlie", "Dave", "Eve"],
value: [100.0, 200.0, 300.0, 400.0, 500.0]
}
Natch.insert(conn, "table", columns, schema)Why columnar?
- 100x faster - M NIF calls instead of NΓM (rows Γ columns)
- Natural fit - ClickHouse is a columnar database
- Analytics-first - Matches how you work with data (SUM, AVG, GROUP BY operate on columns)
- Better compression - Column values compressed together
Type System
Natch supports all ClickHouse types with full roundtrip fidelity:
Primitive Types
schema = [
id: :uint64, # UInt8, UInt16, UInt32, UInt64
count: :int32, # Int8, Int16, Int32, Int64
price: :float64, # Float32, Float64
name: :string, # String
active: :bool # Bool (UInt8)
]Date and Time
schema = [
created: :date, # Date (days since epoch)
updated: :datetime, # DateTime (seconds since epoch)
logged: :datetime64 # DateTime64(6) - microsecond precision
]
# Works with Elixir DateTime structs or integers
columns = %{
created: [~D[2024-01-01], ~D[2024-01-02]],
updated: [~U[2024-01-01 10:00:00Z], ~U[2024-01-01 11:00:00Z]],
logged: [~U[2024-01-01 10:00:00.123456Z], 1704103200123456]
}Decimals and UUIDs
schema = [
amount: :decimal64, # Decimal64(9) - fixed-point decimals
user_id: :uuid # UUID - 128-bit identifiers
]
columns = %{
amount: [Decimal.new("99.99"), Decimal.new("149.50")],
user_id: ["550e8400-e29b-41d4-a716-446655440000", "6ba7b810-9dad-11d1-80b4-00c04fd430c8"]
}Nullable Types
schema = [
description: {:nullable, :string},
count: {:nullable, :uint64}
]
columns = %{
description: ["text", nil, "more text"],
count: [100, nil, 200]
}Arrays
schema = [
tags: {:array, :string},
matrix: {:array, {:array, :uint64}}, # Nested arrays
nullable_list: {:array, {:nullable, :string}} # Arrays with nulls
]
columns = %{
tags: [["web", "mobile"], ["desktop"], []],
matrix: [[[1, 2], [3, 4]], [[5, 6]]],
nullable_list: [["a", nil, "b"], [nil, "c"]]
}Maps and Tuples
schema = [
properties: {:map, :string, :uint64},
location: {:tuple, [:string, :float64, :float64]},
metrics: {:map, :string, {:nullable, :uint64}} # Maps with nullable values
]
columns = %{
properties: [%{"clicks" => 10, "views" => 100}, %{"shares" => 5}],
location: [{"NYC", 40.7128, -74.0060}, {"LA", 34.0522, -118.2437}],
metrics: [%{"count" => 100, "missing" => nil}, %{"total" => nil}]
}Enums and LowCardinality
schema = [
status: {:enum8, [{"pending", 1}, {"active", 2}, {"archived", 3}]},
category: {:low_cardinality, :string},
tags: {:array, {:low_cardinality, {:nullable, :string}}} # Complex nesting!
]
columns = %{
status: ["pending", "active", "pending"],
category: ["news", "sports", "news"],
tags: [["tech", nil], ["sports"], ["tech", "startup"]]
}Usage Guide
Connection Management
# Basic connection
{:ok, conn} = Natch.Connection.start_link(
host: "localhost",
port: 9000
)
# With authentication and options
{:ok, conn} = Natch.Connection.start_link(
host: "clickhouse.example.com",
port: 9000,
database: "analytics",
user: "app_user",
password: "secret",
compression: :lz4,
name: MyApp.ClickHouse
)Connection options:
:host- Server hostname (default:"localhost"):port- Native TCP port (default:9000):database- Database name (default:"default"):user- Username (optional):password- Password (optional):compression- Compression::lz4,:none(default::lz4):name- Register connection with a name (optional)
Executing Queries
DDL Operations
# Create table
:ok = Natch.Connection.execute(conn, """
CREATE TABLE users (
id UInt64,
name String,
created DateTime
) ENGINE = MergeTree()
ORDER BY id
""")
# Drop table
:ok = Natch.Connection.execute(conn, "DROP TABLE users")
# Alter table
:ok = Natch.Connection.execute(conn, "ALTER TABLE users ADD COLUMN age UInt8")SELECT Queries
Natch provides two query formats to suit different use cases:
Row-Major Format (Traditional)
Returns results as a list of maps, where each map represents a row:
# Simple query
{:ok, rows} = Natch.Connection.select_rows(conn, "SELECT * FROM users")
# => {:ok, [%{id: 1, name: "Alice"}, %{id: 2, name: "Bob"}]}
# With WHERE clause
{:ok, rows} = Natch.Connection.select_rows(conn, "SELECT * FROM users WHERE id > 100")
# Aggregations
{:ok, [result]} = Natch.Connection.select_rows(conn, """
SELECT
event_type,
count() as count,
uniqExact(user_id) as unique_users
FROM events
GROUP BY event_type
ORDER BY count DESC
""")Columnar Format (Efficient for Analytics)
Returns results as a map of column lists, ideal for large result sets and data analysis:
# Query returns columnar format
{:ok, cols} = Natch.Connection.select_cols(conn, "SELECT * FROM users")
# => {:ok, %{id: [1, 2, 3], name: ["Alice", "Bob", "Charlie"]}}
# Perfect for analytics workflows
{:ok, data} = Natch.Connection.select_cols(conn, "SELECT user_id, value FROM events")
# => {:ok, %{user_id: [1, 2, 1, 3], value: [10.5, 20.0, 15.5, 30.0]}}
# Easy integration with data processing libraries
%{user_id: user_ids, value: values} = data
total = Enum.sum(values)Inserting Data
High-Level API (Recommended)
# Columnar format - optimal performance
columns = %{
id: [1, 2, 3],
name: ["Alice", "Bob", "Charlie"]
}
schema = [id: :uint64, name: :string]
:ok = Natch.insert(conn, "users", columns, schema)Low-Level API (Advanced)
# Build block manually for maximum control
block = Natch.Native.block_create()
# Create and populate columns
id_col = Natch.Column.new(:uint64)
Natch.Column.append_bulk(id_col, [1, 2, 3])
Natch.Native.block_append_column(block, "id", id_col.ref)
name_col = Natch.Column.new(:string)
Natch.Column.append_bulk(name_col, ["Alice", "Bob", "Charlie"])
Natch.Native.block_append_column(block, "name", name_col.ref)
# Get client and insert
client_ref = GenServer.call(conn, :get_client)
Natch.Native.client_insert(client_ref, "users", block)Performance Tips
1. Use Columnar Format
# β BAD: Row-oriented (requires conversion)
rows = [
%{id: 1, name: "Alice"},
%{id: 2, name: "Bob"}
]
# β
GOOD: Columnar (direct insertion)
columns = %{
id: [1, 2],
name: ["Alice", "Bob"]
}2. Batch Your Inserts
# Insert in batches of 10,000-100,000 rows for optimal throughput
chunk_size = 50_000
data
|> Stream.chunk_every(chunk_size)
|> Enum.each(fn chunk ->
columns = transpose_to_columnar(chunk)
Natch.insert(conn, "table", columns, schema)
end)3. Use Appropriate Types
# β
GOOD: LowCardinality for repeated strings
schema = [status: {:low_cardinality, :string}]
# β
GOOD: Enum for known values
schema = [priority: {:enum8, [{"low", 1}, {"medium", 2}, {"high", 3}]}]
# β
GOOD: Use smallest integer type that fits
schema = [age: :uint8] # Not :uint644. Enable Compression
# LZ4 compression reduces bandwidth by ~70% for typical workloads
{:ok, conn} = Natch.Connection.start_link(
host: "localhost",
port: 9000,
compression: :lz4 # Enabled by default
)Complex Nesting Examples
Natch supports arbitrarily complex nested types:
# Triple-nested arrays with nullables
schema = [matrix: {:array, {:array, {:nullable, :uint64}}}]
columns = %{matrix: [[[1, nil, 3], [nil, 5]], [[10, 20], [], [nil]]]}
# Maps with array values
schema = [data: {:map, :string, {:array, :uint64}}]
columns = %{data: [%{"ids" => [1, 2, 3], "counts" => [10, 20]}]}
# Tuples with complex elements
schema = [record: {:tuple, [:string, {:array, :uint64}, {:nullable, :float64}]}]
columns = %{record: [{"Alice", [1, 2, 3], 99.9}, {"Bob", [4, 5], nil}]}
# Array of low cardinality nullable strings (triple wrapper!)
schema = [tags: {:array, {:low_cardinality, {:nullable, :string}}}]
columns = %{tags: [["tech", nil, "startup"], [nil, "news"]]}All these patterns work with full INSERTβSELECT roundtrip fidelity.
Architecture
Natch uses a three-layer architecture:
βββββββββββββββββββββββββββββββββββββββ
β Elixir Application Layer β
β - Natch.insert/4 β
β - Natch.Connection GenServer β
β - Idiomatic Elixir API β
βββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββ
β FINE NIF Layer (C++) β
β - Type conversion Elixir β C++ β
β - Resource management β
β - Exception handling β
βββββββββββββββββββββββββββββββββββββββ
β
βββββββββββββββββββββββββββββββββββββββ
β clickhouse-cpp Library β
β - Native TCP protocol β
β - Binary columnar format β
β - LZ4/ZSTD compression β
βββββββββββββββββββββββββββββββββββββββWhy FINE + clickhouse-cpp?
- Native Protocol - Binary columnar format with efficient compression
- Mature Library - Leverage official ClickHouse C++ client
- Type Safety - FINE provides crash-proof NIFs
- Fast Development - 4-6 weeks vs 4-6 months for pure Elixir
Development
Running ClickHouse Locally
# Start ClickHouse
docker-compose up -d
# Check it's running
clickhouse-client --query "SELECT version()"Running Tests
# Run all tests
mix test
# Run with tracing
mix test --trace
# Run specific test file
mix test test/nesting_integration_test.exsTest Coverage
- β 227 tests passing (as of Phase 5E)
- β All primitive types (integers, floats, strings, bools)
- β All temporal types (Date, DateTime, DateTime64)
- β All special types (UUID, Decimal64, Enum8/16, LowCardinality)
- β All complex types (Array, Map, Tuple, Nullable)
- β 14 comprehensive nesting integration tests
- β Full INSERTβSELECT roundtrip validation
Roadmap
Completed (Phase 1-5)
- β Native TCP protocol support
- β All ClickHouse primitive types
- β All temporal types (Date, DateTime, DateTime64)
- β UUID and Decimal64 support
- β Nullable types
- β Array types with arbitrary nesting
- β Map and Tuple types
- β Enum8/Enum16 types
- β LowCardinality types
- β Complex type nesting (Array(Map(String, Nullable(T))), etc.)
- β Columnar insert API
- β LZ4 compression
Planned (Phase 6+)
- β³ Explorer DataFrame integration (zero-copy)
- β³ SSL/TLS support
- β³ Connection pooling
- β³ Async query execution
- β³ Prepared statements
Not Planned
- β Ecto integration (ClickHouse is OLAP, not OLTP - not a good fit)
- β HTTP protocol support (use native TCP for better performance)
Contributing
Contributions are welcome! Areas where we'd love help:
- Additional type support - FixedString, IPv4/IPv6, Geo types
- Performance optimization - Zero-copy paths, SIMD operations
- Documentation - More examples, guides
- Testing - Edge cases, stress tests
Please feel free to submit a Pull Request or open an issue.
License
MIT License - See LICENSE file for details.
Acknowledgments
- Built with FINE for crash-proof NIFs
- Powered by clickhouse-cpp official C++ client
- Inspired by the excellent work of the ClickHouse and Elixir communities
Resources
- ClickHouse Documentation
- ClickHouse Data Types
- FINE Documentation
- Implementation Plan - Detailed architecture and design decisions