Natch

⚑ High-performance native ClickHouse client for Elixir

Natch provides fast access to ClickHouse using the native TCP protocol (port 9000) via C++ NIFs. Native protocol benefits include binary columnar format, efficient compression, and reduced overhead compared to HTTP-based clients.

Why Natch?

Requirements

Installation

Add natch to your list of dependencies in mix.exs:

def deps do
  [
    {:natch, "~> 0.2.0"}
  ]
end

Prebuilt binaries are available for macOS (x86_64, ARM64) and Linux (x86_64, ARM64) and will be downloaded automatically during installation.

Building from Source

If prebuilt binaries are not available for your platform, or if you prefer to build from source:

# Clone the repository
git clone https://github.com/Intellection/natch.git
cd natch

# Initialize the clickhouse-cpp submodule
git submodule update --init --recursive

# Build
mix deps.get
mix compile

Build Requirements:

Quick Start

Local ClickHouse

# Start a connection
{:ok, conn} = Natch.Connection.start_link(
  host: "localhost",
  port: 9000,
  database: "default"
)

# Create a table
Natch.Connection.execute(conn, """
CREATE TABLE events (
  id UInt64,
  user_id UInt32,
  event_type LowCardinality(String),
  properties Map(String, String),
  tags Array(String),
  timestamp DateTime,
  metadata Nullable(String)
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
""")

# Insert data (columnar format - optimal performance!)
columns = %{
  id: [1, 2, 3],
  user_id: [100, 101, 100],
  event_type: ["click", "view", "click"],
  properties: [
    %{"page" => "home", "referrer" => "google"},
    %{"page" => "about"},
    %{"page" => "pricing"}
  ],
  tags: [["web", "desktop"], ["mobile"], ["web"]],
  timestamp: [~U[2024-01-01 10:00:00Z], ~U[2024-01-01 10:01:00Z], ~U[2024-01-01 10:02:00Z]],
  metadata: ["extra", nil, "data"]
}

schema = [
  id: :uint64,
  user_id: :uint32,
  event_type: {:low_cardinality, :string},
  properties: {:map, :string, :string},
  tags: {:array, :string},
  timestamp: :datetime,
  metadata: {:nullable, :string}
]

:ok = Natch.insert(conn, "events", columns, schema)

# Query data
{:ok, results} = Natch.Connection.select_rows(conn, "SELECT * FROM events WHERE user_id = 100")
IO.inspect(results)
# => [
#      %{id: 1, user_id: 100, event_type: "click", ...},
#      %{id: 3, user_id: 100, event_type: "click", ...}
#    ]

ClickHouse Cloud (SSL)

ClickHouse Cloud requires SSL/TLS connections on port 9440:

{:ok, conn} = Natch.Connection.start_link(
  host: "your-instance.clickhouse.cloud",
  port: 9440,
  database: "default",
  user: "default",
  password: "your-password",
  ssl: true  # Enable SSL/TLS
)

Note: SSL support requires clickhouse-cpp to be built with OpenSSL. If you get a Natch.OpenSSLError saying "Library was built with no SSL support", the C++ library needs to be rebuilt with -DWITH_OPENSSL=ON CMake flag. This is typically handled automatically by package managers on systems with OpenSSL development libraries installed.

Timeout Configuration

Configure socket-level timeouts to prevent operations from hanging indefinitely in production:

{:ok, conn} = Natch.Connection.start_link(
  host: "localhost",
  port: 9000,
  connect_timeout: 5_000,   # Time to establish TCP connection (default: 5000ms)
  recv_timeout: 60_000,     # Time to receive data from server (default: 0 = infinite)
  send_timeout: 60_000      # Time to send data to server (default: 0 = infinite)
)

Important: The default recv_timeout is 0 (no timeout), which allows long-running analytical queries to complete. For production use, consider setting explicit timeouts based on your workload. When a timeout occurs, a Natch.ConnectionError is raised.

Benchmarks

Real-world performance comparison vs Pillar (HTTP-based client) on M3 Pro, tested with 7-column schema.

Important: Benchmarks use Pillar.select/2 which parses JSON responses. Using Pillar.query/2 (which returns unparsed TSV strings) is not a fair comparison.

INSERT Performance

Rows Natch Pillar Speedup Memory (Natch) Memory (Pillar)
10k 13.5 ms 63.9 ms 4.7x faster 976 B 45 MB
100k 184 ms 626 ms 3.4x faster 976 B 452 MB
1M 2,094 ms 5,545 ms 2.6x faster 976 B 4.5 GB

Natch uses ~4.6 million times less memory than Pillar for inserts due to columnar format.

SELECT Performance

Query Type Natch Pillar Speedup Memory (Natch) Memory (Pillar)
Aggregation 3.6 ms 4.9 ms 1.4x faster 544 B 17 KB
Filtered (10k rows) 12 ms 53 ms 4.4x faster 128 B 30 MB
Full scan (1M rows) 802 ms 4,980 ms 6.2x faster 128 B 3 GB

Natch uses ~5.5 million times less memory than Pillar for large SELECT queries due to streaming columnar format vs materialized row-oriented maps.

Key Takeaways

See bench/README.md and BINARY_PASSTHROUGH.md for detailed analysis and methodology.

Core Concepts

Columnar Format (Recommended)

Natch uses a columnar-first API that matches ClickHouse's native storage format:

# βœ… GOOD: Columnar format - 3 NIF calls for any number of rows
columns = %{
  id: [1, 2, 3, 4, 5],
  name: ["Alice", "Bob", "Charlie", "Dave", "Eve"],
  value: [100.0, 200.0, 300.0, 400.0, 500.0]
}

Natch.insert(conn, "table", columns, schema)

Why columnar?

Type System

Natch supports all ClickHouse types with full roundtrip fidelity:

Primitive Types

schema = [
  id: :uint64,           # UInt8, UInt16, UInt32, UInt64
  count: :int32,         # Int8, Int16, Int32, Int64
  price: :float64,       # Float32, Float64
  name: :string,         # String
  active: :bool          # Bool (UInt8)
]

Date and Time

schema = [
  created: :date,        # Date (days since epoch)
  updated: :datetime,    # DateTime (seconds since epoch)
  logged: :datetime64    # DateTime64(6) - microsecond precision
]

# Works with Elixir DateTime structs or integers
columns = %{
  created: [~D[2024-01-01], ~D[2024-01-02]],
  updated: [~U[2024-01-01 10:00:00Z], ~U[2024-01-01 11:00:00Z]],
  logged: [~U[2024-01-01 10:00:00.123456Z], 1704103200123456]
}

Decimals and UUIDs

schema = [
  amount: :decimal64,    # Decimal64(9) - fixed-point decimals
  user_id: :uuid         # UUID - 128-bit identifiers
]

columns = %{
  amount: [Decimal.new("99.99"), Decimal.new("149.50")],
  user_id: ["550e8400-e29b-41d4-a716-446655440000", "6ba7b810-9dad-11d1-80b4-00c04fd430c8"]
}

Nullable Types

schema = [
  description: {:nullable, :string},
  count: {:nullable, :uint64}
]

columns = %{
  description: ["text", nil, "more text"],
  count: [100, nil, 200]
}

Arrays

schema = [
  tags: {:array, :string},
  matrix: {:array, {:array, :uint64}},           # Nested arrays
  nullable_list: {:array, {:nullable, :string}}  # Arrays with nulls
]

columns = %{
  tags: [["web", "mobile"], ["desktop"], []],
  matrix: [[[1, 2], [3, 4]], [[5, 6]]],
  nullable_list: [["a", nil, "b"], [nil, "c"]]
}

Maps and Tuples

schema = [
  properties: {:map, :string, :uint64},
  location: {:tuple, [:string, :float64, :float64]},
  metrics: {:map, :string, {:nullable, :uint64}}  # Maps with nullable values
]

columns = %{
  properties: [%{"clicks" => 10, "views" => 100}, %{"shares" => 5}],
  location: [{"NYC", 40.7128, -74.0060}, {"LA", 34.0522, -118.2437}],
  metrics: [%{"count" => 100, "missing" => nil}, %{"total" => nil}]
}

Enums and LowCardinality

schema = [
  status: {:enum8, [{"pending", 1}, {"active", 2}, {"archived", 3}]},
  category: {:low_cardinality, :string},
  tags: {:array, {:low_cardinality, {:nullable, :string}}}  # Complex nesting!
]

columns = %{
  status: ["pending", "active", "pending"],
  category: ["news", "sports", "news"],
  tags: [["tech", nil], ["sports"], ["tech", "startup"]]
}

Usage Guide

Connection Management

# Basic connection
{:ok, conn} = Natch.Connection.start_link(
  host: "localhost",
  port: 9000
)

# With authentication and options
{:ok, conn} = Natch.Connection.start_link(
  host: "clickhouse.example.com",
  port: 9000,
  database: "analytics",
  user: "app_user",
  password: "secret",
  compression: :lz4,
  name: MyApp.ClickHouse
)

Connection options:

Executing Queries

DDL Operations

# Create table
:ok = Natch.Connection.execute(conn, """
CREATE TABLE users (
  id UInt64,
  name String,
  created DateTime
) ENGINE = MergeTree()
ORDER BY id
""")

# Drop table
:ok = Natch.Connection.execute(conn, "DROP TABLE users")

# Alter table
:ok = Natch.Connection.execute(conn, "ALTER TABLE users ADD COLUMN age UInt8")

SELECT Queries

Natch provides two query formats to suit different use cases:

Row-Major Format (Traditional)

Returns results as a list of maps, where each map represents a row:

# Simple query
{:ok, rows} = Natch.Connection.select_rows(conn, "SELECT * FROM users")
# => {:ok, [%{id: 1, name: "Alice"}, %{id: 2, name: "Bob"}]}

# With WHERE clause
{:ok, rows} = Natch.Connection.select_rows(conn, "SELECT * FROM users WHERE id > 100")

# Aggregations
{:ok, [result]} = Natch.Connection.select_rows(conn, """
  SELECT
    event_type,
    count() as count,
    uniqExact(user_id) as unique_users
  FROM events
  GROUP BY event_type
  ORDER BY count DESC
""")
Columnar Format (Efficient for Analytics)

Returns results as a map of column lists, ideal for large result sets and data analysis:

# Query returns columnar format
{:ok, cols} = Natch.Connection.select_cols(conn, "SELECT * FROM users")
# => {:ok, %{id: [1, 2, 3], name: ["Alice", "Bob", "Charlie"]}}

# Perfect for analytics workflows
{:ok, data} = Natch.Connection.select_cols(conn, "SELECT user_id, value FROM events")
# => {:ok, %{user_id: [1, 2, 1, 3], value: [10.5, 20.0, 15.5, 30.0]}}

# Easy integration with data processing libraries
%{user_id: user_ids, value: values} = data
total = Enum.sum(values)

Inserting Data

High-Level API (Recommended)

# Columnar format - optimal performance
columns = %{
  id: [1, 2, 3],
  name: ["Alice", "Bob", "Charlie"]
}

schema = [id: :uint64, name: :string]

:ok = Natch.insert(conn, "users", columns, schema)

Low-Level API (Advanced)

# Build block manually for maximum control
block = Natch.Native.block_create()

# Create and populate columns
id_col = Natch.Column.new(:uint64)
Natch.Column.append_bulk(id_col, [1, 2, 3])
Natch.Native.block_append_column(block, "id", id_col.ref)

name_col = Natch.Column.new(:string)
Natch.Column.append_bulk(name_col, ["Alice", "Bob", "Charlie"])
Natch.Native.block_append_column(block, "name", name_col.ref)

# Get client and insert
client_ref = GenServer.call(conn, :get_client)
Natch.Native.client_insert(client_ref, "users", block)

Performance Tips

1. Use Columnar Format

# ❌ BAD: Row-oriented (requires conversion)
rows = [
  %{id: 1, name: "Alice"},
  %{id: 2, name: "Bob"}
]

# βœ… GOOD: Columnar (direct insertion)
columns = %{
  id: [1, 2],
  name: ["Alice", "Bob"]
}

2. Batch Your Inserts

# Insert in batches of 10,000-100,000 rows for optimal throughput
chunk_size = 50_000

data
|> Stream.chunk_every(chunk_size)
|> Enum.each(fn chunk ->
  columns = transpose_to_columnar(chunk)
  Natch.insert(conn, "table", columns, schema)
end)

3. Use Appropriate Types

# βœ… GOOD: LowCardinality for repeated strings
schema = [status: {:low_cardinality, :string}]

# βœ… GOOD: Enum for known values
schema = [priority: {:enum8, [{"low", 1}, {"medium", 2}, {"high", 3}]}]

# βœ… GOOD: Use smallest integer type that fits
schema = [age: :uint8]  # Not :uint64

4. Enable Compression

# LZ4 compression reduces bandwidth by ~70% for typical workloads
{:ok, conn} = Natch.Connection.start_link(
  host: "localhost",
  port: 9000,
  compression: :lz4  # Enabled by default
)

Complex Nesting Examples

Natch supports arbitrarily complex nested types:

# Triple-nested arrays with nullables
schema = [matrix: {:array, {:array, {:nullable, :uint64}}}]
columns = %{matrix: [[[1, nil, 3], [nil, 5]], [[10, 20], [], [nil]]]}

# Maps with array values
schema = [data: {:map, :string, {:array, :uint64}}]
columns = %{data: [%{"ids" => [1, 2, 3], "counts" => [10, 20]}]}

# Tuples with complex elements
schema = [record: {:tuple, [:string, {:array, :uint64}, {:nullable, :float64}]}]
columns = %{record: [{"Alice", [1, 2, 3], 99.9}, {"Bob", [4, 5], nil}]}

# Array of low cardinality nullable strings (triple wrapper!)
schema = [tags: {:array, {:low_cardinality, {:nullable, :string}}}]
columns = %{tags: [["tech", nil, "startup"], [nil, "news"]]}

All these patterns work with full INSERT→SELECT roundtrip fidelity.

Architecture

Natch uses a three-layer architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Elixir Application Layer           β”‚
β”‚  - Natch.insert/4                    β”‚
β”‚  - Natch.Connection GenServer         β”‚
β”‚  - Idiomatic Elixir API             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  FINE NIF Layer (C++)               β”‚
β”‚  - Type conversion Elixir ↔ C++     β”‚
β”‚  - Resource management              β”‚
β”‚  - Exception handling               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
              ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  clickhouse-cpp Library             β”‚
β”‚  - Native TCP protocol              β”‚
β”‚  - Binary columnar format           β”‚
β”‚  - LZ4/ZSTD compression             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why FINE + clickhouse-cpp?

Development

Running ClickHouse Locally

# Start ClickHouse
docker-compose up -d

# Check it's running
clickhouse-client --query "SELECT version()"

Running Tests

# Run all tests
mix test

# Run with tracing
mix test --trace

# Run specific test file
mix test test/nesting_integration_test.exs

Test Coverage

Roadmap

Completed (Phase 1-5)

Planned (Phase 6+)

Not Planned

Contributing

Contributions are welcome! Areas where we'd love help:

  1. Additional type support - FixedString, IPv4/IPv6, Geo types
  2. Performance optimization - Zero-copy paths, SIMD operations
  3. Documentation - More examples, guides
  4. Testing - Edge cases, stress tests

Please feel free to submit a Pull Request or open an issue.

License

MIT License - See LICENSE file for details.

Acknowledgments

Resources