QuackDB
Remote DuckDB analytics from Elixir, backed by DuckDB's experimental Quack protocol.
quackdb lets Elixir applications query and append to a remote DuckDB process without embedding DuckDB in the BEAM. It provides a DBConnection-first client, direct Quack protocol decoding, streaming result fetches, native append writes, source helpers for analytical files and lakehouse tables, and optional Ecto/Explorer integrations.
Warning
QuackDB itself is experimental and not production-ready. The package API, result shapes, Ecto adapter behavior, and supported type coverage may change as the project evolves. It also targets DuckDB's experimental Quack protocol, which may change across DuckDB releases. Use it at your own risk, validate behavior against your DuckDB version, and avoid relying on it for critical production workloads yet.
Status
QuackDB currently focuses on the remote protocol and DBConnection client core. It supports:
DBConnection-backed remote DuckDB queries over Quack HTTP endpoints- streaming and fetch continuation for large analytical result sets
- native append writes through DuckDB
DataChunks viaQuackDB.insert_rows/4 - rich DuckDB scalar fidelity including decimals, UUIDs, enums,
BIGNUM, nanosecond temporal values, intervals, andTIME WITH TIME ZONE - nested DuckDB values such as
LIST,STRUCT,ARRAY, andMAP - source helpers for Parquet, CSV, JSON, XLSX, Delta, and Iceberg table functions
- column-oriented result helpers for analytical/vector-style workflows
- optional Ecto SQL adapter support for raw SQL, analytical reads,
insert/2, andinsert_all/3 - optional Explorer dataframe handoff helpers
- supervised local DuckDB Quack server processes for development
Raw SQL can use the full DuckDB surface. Ecto query generation is growing toward analytical DuckDB usage while keeping unsupported features explicit.
Why QuackDB?
Use QuackDB when you want DuckDB's analytical SQL from Elixir while keeping DuckDB in a separate process.
That gives you:
- a normal OTP/
DBConnectionclient surface - remote DuckDB execution instead of embedding native database state in your BEAM
- efficient protocol-level result decoding
- explicit unsupported-feature errors instead of silent lossy behavior
- optional Ecto and Explorer layers without making them required
Installation
Add :quackdb to your dependencies:
def deps do
[
{:quackdb, "~> 0.2.0"}
]
end
DuckDB's Quack protocol is currently experimental. For local testing, use DuckDB 1.5.3 or newer with the quack extension.
Optional integrations are compiled only when their packages are available. Add Explorer when you want dataframe handoff helpers:
def deps do
[
{:quackdb, "~> 0.2.0"},
{:explorer, "~> 0.11"}
]
end
Start a DuckDB Quack server
For local development, QuackDB can supervise DuckDB's external CLI process for you:
children = [
{QuackDB.Server,
name: MyApp.DuckDB,
endpoint: "quack:localhost:9494",
uri: "http://[::1]:9494",
token: "super_secret"},
{QuackDB,
name: MyApp.QuackDB,
uri: "http://[::1]:9494",
token: "super_secret"}
]
Or start DuckDB manually:
duckdb -interactive -init /dev/null \
-cmd "LOAD quack; CALL quack_serve('quack:localhost', token='super_secret');"
quack:localhost may bind on IPv6 localhost, so the examples use http://[::1]:9494.
Usage
Connect
{:ok, conn} =
QuackDB.start_link(
uri: "http://[::1]:9494",
token: "super_secret"
)
Query
:ok = QuackDB.ping(conn)
{:ok, result} = QuackDB.query(conn, "SELECT 1 AS n")
result.columns
#=> ["n"]
result.rows
#=> [[1]]
QuackDB formats positional parameters as DuckDB SQL literals client-side because the current Quack request path does not expose server-side bind parameters:
{:ok, result} = QuackDB.query(conn, "SELECT ? AS name, ? AS n", ["duck", 42])
result.rows
#=> [["duck", 42]]
Placeholders inside strings and comments are ignored while formatting, and unsupported parameter values raise explicit errors.
Results use compact IEx-friendly inspection so large result sets do not flood the console:
#QuackDB.Result<command: :select, columns: ["n"], rows: 1, preview: [[1]], connection_id: "...", needs_more_fetch?: false>
Nested DuckDB values
DuckDB nested types decode to ordinary Elixir terms:
{:ok, result} =
QuackDB.query(conn, """
SELECT
[1, 2, 3] AS xs,
{'name': 'duck', 'count': 2} AS obj,
array_value(1, 2, 3) AS arr,
map(['a', 'b'], [1, 2]) AS m
""")
result.rows
#=> [[[1, 2, 3], %{"name" => "duck", "count" => 2}, [1, 2, 3], %{"a" => 1, "b" => 2}]]
Source helpers
DuckDB can scan files, object stores, and lakehouse table formats directly. QuackDB.Source builds safe table-function fragments for raw SQL:
source =
QuackDB.Source.parquet("s3://bucket/events/*.parquet",
hive_partitioning: true,
union_by_name: true
)
QuackDB.query!(conn, ["SELECT category, count(*) FROM ", source, " GROUP BY category"])
Available helpers include parquet/2, csv/2, json/2, xlsx/2, delta/2, and iceberg/2. Options are emitted as DuckDB named parameters, and paths/options are formatted as SQL literals instead of interpolated directly.
The same fragments can be used as Ecto sources for read-oriented analytical queries:
source = QuackDB.Source.csv("events.csv", header: true)
MyApp.AnalyticsRepo.all(
from event in source,
where: event.id > 1,
select: %{id: event.id, name: event.name}
)
Streaming
QuackDB.query/4 materializes the full result. Use streaming helpers for large analytical result sets.
QuackDB.stream/4 yields %QuackDB.Result{} batches:
row_count =
conn
|> QuackDB.stream("SELECT i FROM range(0, 50_000) t(i)")
|> Enum.reduce(0, fn result, count -> count + result.num_rows end)
row_count
#=> 50_000
QuackDB.rows/4 yields row lists:
conn
|> QuackDB.rows("SELECT i FROM range(0, ?) t(i)", [50_000])
|> Enum.take(3)
#=> [[0], [1], [2]]
QuackDB.maps/4 yields maps keyed by column names. Duplicate column names are disambiguated with suffixes such as _2 and _3:
conn
|> QuackDB.maps("SELECT i AS n FROM range(0, ?) t(i)", [50_000])
|> Enum.take(2)
#=> [%{"n" => 0}, %{"n" => 1}]
Use QuackDB.columnar/4 when an analytical workflow wants vectors plus column order and metadata:
{:ok, columns} = QuackDB.columnar(conn, "SELECT id, name FROM events ORDER BY id")
columns.names
#=> ["id", "name"]
columns["id"]
#=> [1, 2]
QuackDB.columns/4 returns just the column map:
{:ok, columns} = QuackDB.columns(conn, "SELECT id, name FROM events ORDER BY id")
columns
#=> %{"id" => [1, 2], "name" => ["duck", "goose"]}
For large results, QuackDB.columnar_batches/4 streams QuackDB.Columns fetch batches without materializing the whole result set. QuackDB.column_batches/4 returns just the map from each batch:
conn
|> QuackDB.column_batches("SELECT i AS n FROM range(0, 50_000) t(i)", [], max_rows: 1_000)
|> Enum.take(1)
#=> [%{"n" => [0, 1, 2, ...]}]
This is not Arrow IPC yet, but it exposes a column-oriented shape that can back future Arrow integration without changing the query API.
Explorer DataFrames
When :explorer is available, QuackDB exposes optional helpers for building Explorer.DataFrame values from query results:
{:ok, df} =
QuackDB.Explorer.dataframe(conn, "SELECT id, name FROM events ORDER BY id")
You can also pass Ecto queries directly, including source helpers:
source = QuackDB.Source.csv("events.csv", header: true)
query =
from event in source,
where: event.id > ^1,
select: %{id: event.id, name: event.name}
{:ok, df} = QuackDB.Explorer.dataframe(conn, query)
The Explorer integration materializes query results in Elixir before constructing a dataframe. It is useful for interactive analysis and downstream Explorer pipelines, but it is not a zero-copy Arrow IPC path yet.
You can also convert existing results:
{:ok, result} = QuackDB.query(conn, "SELECT 1 AS id, 'duck' AS name")
{:ok, df} = QuackDB.Explorer.from_result(result)
{:ok, columns} = QuackDB.columnar(conn, "SELECT 1 AS id, 'duck' AS name")
{:ok, df} = QuackDB.Explorer.from_columns(columns)
Command results
DuckDB returns affected-row counts through a Count column. QuackDB normalizes those into num_rows for command results:
{:ok, _} = QuackDB.query(conn, "CREATE TEMP TABLE events(id INTEGER)")
{:ok, result} = QuackDB.query(conn, "INSERT INTO events VALUES (1), (2)")
result.command
#=> :insert
result.num_rows
#=> 2
result.rows
#=> nil
The original DuckDB shape is preserved in metadata for debugging:
result.metadata[:duckdb_columns]
#=> ["Count"]
result.metadata[:duckdb_rows]
#=> [[2]]
Append rows
QuackDB.insert_rows/4 uses Quack's append protocol to send a DuckDB DataChunk directly to a table:
QuackDB.query!(conn, "CREATE TEMP TABLE events(id INTEGER, name VARCHAR, active BOOLEAN)")
{:ok, result} =
QuackDB.insert_rows(conn, "events", [
[id: 1, name: "duck", active: true],
[id: 2, name: "goose", active: false]
])
result.command
#=> :insert
result.num_rows
#=> 2
Keyword rows preserve append order and allow QuackDB to infer the column list from the first row. Map rows are also accepted, but pass :columns for stable append order and types. Explicit columns are still required for empty batches or all-null columns. Use batch_size: n to split large inputs across multiple append requests while returning the total inserted row count.
Native append columns can be declared with scalar QuackDB.Type specs and nested specs such as {:list, :varchar}, {:struct, [source: :varchar, count: :integer]}, {:array, :integer, 3}, and {:map, :varchar, :varchar}. Temporal append values are normalized through Elixir's Calendar-aware Date, Time, NaiveDateTime, and DateTime conversion APIs before encoding.
Prepare and execute
{:ok, query, result} = QuackDB.prepare_execute(conn, "SELECT 1 AS n")
query.columns
#=> ["n"]
result.rows
#=> [[1]]
Ecto raw SQL
QuackDB includes an initial Ecto SQL adapter for raw SQL queries. If your app does not already depend on Ecto SQL, add it alongside QuackDB:
def deps do
[
{:quackdb, "~> 0.2.0"},
{:ecto_sql, "~> 3.13"}
]
end
Then define a repo:
defmodule MyApp.AnalyticsRepo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.QuackDB
end
Configure the repo with the same connection options used by QuackDB.start_link/1:
config :my_app, MyApp.AnalyticsRepo,
uri: "http://[::1]:9494",
token: "super_secret"
Then use Repo.query/3:
{:ok, result} = MyApp.AnalyticsRepo.query("SELECT 1 AS n")
result.rows
#=> [[1]]
Raw SQL and generated DDL also work inside Ecto transactions:
{:ok, :committed} =
MyApp.AnalyticsRepo.transaction(fn ->
MyApp.AnalyticsRepo.query!(
QuackDB.DDL.create_table("events", [id: :integer], temporary: true)
)
MyApp.AnalyticsRepo.query!("INSERT INTO events VALUES (1), (2)")
:committed
end)
Use Repo.rollback/1 to abort transaction work:
{:error, :rolled_back} =
MyApp.AnalyticsRepo.transaction(fn ->
MyApp.AnalyticsRepo.query!("INSERT INTO events VALUES (3)")
MyApp.AnalyticsRepo.rollback(:rolled_back)
end)
Read-only Ecto queries against table names are also supported, including CTEs, window functions, joins, grouping, having, distinct, aggregate FILTER, arithmetic expressions, in/2, predicates, ordering, limits, aggregates, fragments, and DuckDB analytical helpers:
import Ecto.Query
import QuackDB.Ecto.Analytics
MyApp.AnalyticsRepo.all(
from event in "events",
where: event.id > ^min_id and like(event.name, "d%"),
group_by: event.category,
select: %{
category: event.category,
median_score: median(event.score),
p95_score: quantile_cont(event.score, 0.95),
scores: duckdb_list(event.score)
}
)
Ecto insert/2 and insert_all/3 are supported for straightforward row inserts. DuckDB RETURNING works through the SQL insert path:
MyApp.AnalyticsRepo.insert!(%Event{id: 1, name: "duck"})
MyApp.AnalyticsRepo.insert_all(
"events",
[[id: 2, name: "goose"]],
returning: [:id]
)
Use insert_method: :append to opt into Quack's native append protocol for plain insert_all workloads. This fast path does not support query inserts, :returning, placeholders, or upserts.
MyApp.AnalyticsRepo.insert_all(
"events",
[[id: 1, name: "duck"], [id: 2, name: "goose"]],
insert_method: :append,
chunk_every: 10_000
)
For temporary analytical setup, QuackDB.DDL.create_table/3 builds quoted DuckDB CREATE TABLE statements:
MyApp.AnalyticsRepo.query!(
QuackDB.DDL.create_table("events",
[payload: :json, occurred_at: :timestamp],
temporary: true
)
)
Ecto support is analytical rather than CRUD-shaped, but still early. Migrations, set combinations, locks, updates, deletes, and upserts raise explicit unsupported-feature errors for now.
Current limitations
- Bind parameters are not exposed through this Quack client path yet.
- Native appends support row batches but not Arrow IPC or automatic local-file/data staging yet.
- Ecto support is limited to raw SQL, read-only analytical table queries, and straightforward
insert_all/3row inserts. - The low-level protocol is experimental and tracks DuckDB's Quack extension behavior.
Supervision and connection options
Use QuackDB under your application supervisor when you want a long-lived connection pool:
children = [
{QuackDB,
uri: "http://[::1]:9494",
token: "super_secret",
name: MyApp.QuackDB,
pool_size: 5}
]
The client accepts QuackDB options such as :uri, :token, and :transport, plus DBConnection pool options such as :name, :pool_size, :queue_target, :queue_interval, and :timeout on individual calls.
QuackDB.query(MyApp.QuackDB, "SELECT 1", [], timeout: 10_000)
For local development, tests, or notebooks, QuackDB can also supervise a local DuckDB Quack server process with MuonTrap:
children = [
{QuackDB.Server,
name: MyApp.DuckDB,
endpoint: "quack:localhost:9494",
uri: "http://[::1]:9494",
token: System.fetch_env!("QUACKDB_TOKEN")},
{QuackDB,
name: MyApp.QuackDB,
uri: "http://[::1]:9494",
token: System.fetch_env!("QUACKDB_TOKEN")}
]
QuackDB.Server starts the external duckdb executable and serves the Quack protocol. It is a convenience process supervisor, not an embedded DuckDB driver and not required for remote DuckDB servers.
Development
mix deps.get
mix ci
Integration tests are skipped by default. To run them against a Quack server:
QUACKDB_TEST_URI='http://[::1]:9494' \
QUACKDB_TEST_TOKEN=super_secret \
mix test --include integration
See guides/getting-started.md for a longer walkthrough, guides/type-support.md for the current DuckDB type matrix, and docs/research.md for protocol notes.