QuackDB

Remote DuckDB analytics from Elixir, backed by DuckDB's experimental Quack protocol.

quackdb lets Elixir applications query and append to a remote DuckDB process without embedding DuckDB in the BEAM. It provides a DBConnection-first client, direct Quack protocol decoding, streaming result fetches, native append writes, source helpers for analytical files and lakehouse tables, and optional Ecto/Explorer integrations.

Warning

QuackDB itself is experimental and not production-ready. The package API, result shapes, Ecto adapter behavior, and supported type coverage may change as the project evolves. It also targets DuckDB's experimental Quack protocol, which may change across DuckDB releases. Use it at your own risk, validate behavior against your DuckDB version, and avoid relying on it for critical production workloads yet.

Status

QuackDB currently focuses on the remote protocol and DBConnection client core. It supports:

DBConnection-backed remote DuckDB queries over Quack HTTP endpoints
streaming and fetch continuation for large analytical result sets
native append writes through DuckDB DataChunks via QuackDB.insert_rows/4
rich DuckDB scalar fidelity including decimals, UUIDs, enums, BIGNUM, nanosecond temporal values, intervals, and TIME WITH TIME ZONE
nested DuckDB values such as LIST, STRUCT, ARRAY, and MAP
source helpers for Parquet, CSV, JSON, XLSX, Delta, and Iceberg table functions
column-oriented result helpers for analytical/vector-style workflows
optional Ecto SQL adapter support for raw SQL, analytical reads, insert/2, and insert_all/3
optional Explorer dataframe handoff helpers
supervised local DuckDB Quack server processes for development

Raw SQL can use the full DuckDB surface. Ecto query generation is growing toward analytical DuckDB usage while keeping unsupported features explicit.

Why QuackDB?

Use QuackDB when you want DuckDB's analytical SQL from Elixir while keeping DuckDB in a separate process.

That gives you:

a normal OTP/DBConnection client surface
remote DuckDB execution instead of embedding native database state in your BEAM
efficient protocol-level result decoding
explicit unsupported-feature errors instead of silent lossy behavior
optional Ecto and Explorer layers without making them required

Installation

Add :quackdb to your dependencies:

def deps do
  [
    {:quackdb, "~> 0.2.0"}
  ]
end

DuckDB's Quack protocol is currently experimental. For local testing, use DuckDB 1.5.3 or newer with the quack extension.

Optional integrations are compiled only when their packages are available. Add Explorer when you want dataframe handoff helpers:

def deps do
  [
    {:quackdb, "~> 0.2.0"},
    {:explorer, "~> 0.11"}
  ]
end

Start a DuckDB Quack server

For local development, QuackDB can supervise DuckDB's external CLI process for you:

children = [
  {QuackDB.Server,
   name: MyApp.DuckDB,
   endpoint: "quack:localhost:9494",
   uri: "http://[::1]:9494",
   token: "super_secret"},

  {QuackDB,
   name: MyApp.QuackDB,
   uri: "http://[::1]:9494",
   token: "super_secret"}
]

Or start DuckDB manually:

duckdb -interactive -init /dev/null \
  -cmd "LOAD quack; CALL quack_serve('quack:localhost', token='super_secret');"

quack:localhost may bind on IPv6 localhost, so the examples use http://[::1]:9494.

Usage

Connect

{:ok, conn} =
  QuackDB.start_link(
    uri: "http://[::1]:9494",
    token: "super_secret"
  )

Query

:ok = QuackDB.ping(conn)

{:ok, result} = QuackDB.query(conn, "SELECT 1 AS n")

result.columns
#=> ["n"]

result.rows
#=> [[1]]

QuackDB formats positional parameters as DuckDB SQL literals client-side because the current Quack request path does not expose server-side bind parameters:

{:ok, result} = QuackDB.query(conn, "SELECT ? AS name, ? AS n", ["duck", 42])

result.rows
#=> [["duck", 42]]

Placeholders inside strings and comments are ignored while formatting, and unsupported parameter values raise explicit errors.

Results use compact IEx-friendly inspection so large result sets do not flood the console:

#QuackDB.Result<command: :select, columns: ["n"], rows: 1, preview: [[1]], connection_id: "...", needs_more_fetch?: false>

Nested DuckDB values

DuckDB nested types decode to ordinary Elixir terms:

{:ok, result} =
  QuackDB.query(conn, """
  SELECT
    [1, 2, 3] AS xs,
    {'name': 'duck', 'count': 2} AS obj,
    array_value(1, 2, 3) AS arr,
    map(['a', 'b'], [1, 2]) AS m
  """)

result.rows
#=> [[[1, 2, 3], %{"name" => "duck", "count" => 2}, [1, 2, 3], %{"a" => 1, "b" => 2}]]

Source helpers

DuckDB can scan files, object stores, and lakehouse table formats directly. QuackDB.Source builds safe table-function fragments for raw SQL:

source =
  QuackDB.Source.parquet("s3://bucket/events/*.parquet",
    hive_partitioning: true,
    union_by_name: true
  )

QuackDB.query!(conn, ["SELECT category, count(*) FROM ", source, " GROUP BY category"])

Available helpers include parquet/2, csv/2, json/2, xlsx/2, delta/2, and iceberg/2. Options are emitted as DuckDB named parameters, and paths/options are formatted as SQL literals instead of interpolated directly.

The same fragments can be used as Ecto sources for read-oriented analytical queries:

source = QuackDB.Source.csv("events.csv", header: true)

MyApp.AnalyticsRepo.all(
  from event in source,
    where: event.id > 1,
    select: %{id: event.id, name: event.name}
)

Streaming

QuackDB.query/4 materializes the full result. Use streaming helpers for large analytical result sets.

QuackDB.stream/4 yields %QuackDB.Result{} batches:

row_count =
  conn
  |> QuackDB.stream("SELECT i FROM range(0, 50_000) t(i)")
  |> Enum.reduce(0, fn result, count -> count + result.num_rows end)

row_count
#=> 50_000

QuackDB.rows/4 yields row lists:

conn
|> QuackDB.rows("SELECT i FROM range(0, ?) t(i)", [50_000])
|> Enum.take(3)
#=> [[0], [1], [2]]

QuackDB.maps/4 yields maps keyed by column names. Duplicate column names are disambiguated with suffixes such as _2 and _3:

conn
|> QuackDB.maps("SELECT i AS n FROM range(0, ?) t(i)", [50_000])
|> Enum.take(2)
#=> [%{"n" => 0}, %{"n" => 1}]

Use QuackDB.columnar/4 when an analytical workflow wants vectors plus column order and metadata:

{:ok, columns} = QuackDB.columnar(conn, "SELECT id, name FROM events ORDER BY id")

columns.names
#=> ["id", "name"]

columns["id"]
#=> [1, 2]

QuackDB.columns/4 returns just the column map:

{:ok, columns} = QuackDB.columns(conn, "SELECT id, name FROM events ORDER BY id")

columns
#=> %{"id" => [1, 2], "name" => ["duck", "goose"]}

For large results, QuackDB.columnar_batches/4 streams QuackDB.Columns fetch batches without materializing the whole result set. QuackDB.column_batches/4 returns just the map from each batch:

conn
|> QuackDB.column_batches("SELECT i AS n FROM range(0, 50_000) t(i)", [], max_rows: 1_000)
|> Enum.take(1)
#=> [%{"n" => [0, 1, 2, ...]}]

This is not Arrow IPC yet, but it exposes a column-oriented shape that can back future Arrow integration without changing the query API.

Explorer DataFrames

When :explorer is available, QuackDB exposes optional helpers for building Explorer.DataFrame values from query results:

{:ok, df} =
  QuackDB.Explorer.dataframe(conn, "SELECT id, name FROM events ORDER BY id")

You can also pass Ecto queries directly, including source helpers:

source = QuackDB.Source.csv("events.csv", header: true)

query =
  from event in source,
    where: event.id > ^1,
    select: %{id: event.id, name: event.name}

{:ok, df} = QuackDB.Explorer.dataframe(conn, query)

The Explorer integration materializes query results in Elixir before constructing a dataframe. It is useful for interactive analysis and downstream Explorer pipelines, but it is not a zero-copy Arrow IPC path yet.

You can also convert existing results:

{:ok, result} = QuackDB.query(conn, "SELECT 1 AS id, 'duck' AS name")
{:ok, df} = QuackDB.Explorer.from_result(result)

{:ok, columns} = QuackDB.columnar(conn, "SELECT 1 AS id, 'duck' AS name")
{:ok, df} = QuackDB.Explorer.from_columns(columns)

Command results

DuckDB returns affected-row counts through a Count column. QuackDB normalizes those into num_rows for command results:

{:ok, _} = QuackDB.query(conn, "CREATE TEMP TABLE events(id INTEGER)")
{:ok, result} = QuackDB.query(conn, "INSERT INTO events VALUES (1), (2)")

result.command
#=> :insert

result.num_rows
#=> 2

result.rows
#=> nil

The original DuckDB shape is preserved in metadata for debugging:

result.metadata[:duckdb_columns]
#=> ["Count"]

result.metadata[:duckdb_rows]
#=> [[2]]

Append rows

QuackDB.insert_rows/4 uses Quack's append protocol to send a DuckDB DataChunk directly to a table:

QuackDB.query!(conn, "CREATE TEMP TABLE events(id INTEGER, name VARCHAR, active BOOLEAN)")

{:ok, result} =
  QuackDB.insert_rows(conn, "events", [
    [id: 1, name: "duck", active: true],
    [id: 2, name: "goose", active: false]
  ])

result.command
#=> :insert

result.num_rows
#=> 2

Keyword rows preserve append order and allow QuackDB to infer the column list from the first row. Map rows are also accepted, but pass :columns for stable append order and types. Explicit columns are still required for empty batches or all-null columns. Use batch_size: n to split large inputs across multiple append requests while returning the total inserted row count.

Native append columns can be declared with scalar QuackDB.Type specs and nested specs such as {:list, :varchar}, {:struct, [source: :varchar, count: :integer]}, {:array, :integer, 3}, and {:map, :varchar, :varchar}. Temporal append values are normalized through Elixir's Calendar-aware Date, Time, NaiveDateTime, and DateTime conversion APIs before encoding.

Prepare and execute

{:ok, query, result} = QuackDB.prepare_execute(conn, "SELECT 1 AS n")

query.columns
#=> ["n"]

result.rows
#=> [[1]]

Ecto raw SQL

QuackDB includes an initial Ecto SQL adapter for raw SQL queries. If your app does not already depend on Ecto SQL, add it alongside QuackDB:

def deps do
  [
    {:quackdb, "~> 0.2.0"},
    {:ecto_sql, "~> 3.13"}
  ]
end

Then define a repo:

defmodule MyApp.AnalyticsRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.QuackDB
end

Configure the repo with the same connection options used by QuackDB.start_link/1:

config :my_app, MyApp.AnalyticsRepo,
  uri: "http://[::1]:9494",
  token: "super_secret"

Then use Repo.query/3:

{:ok, result} = MyApp.AnalyticsRepo.query("SELECT 1 AS n")

result.rows
#=> [[1]]

Raw SQL and generated DDL also work inside Ecto transactions:

{:ok, :committed} =
  MyApp.AnalyticsRepo.transaction(fn ->
    MyApp.AnalyticsRepo.query!(
      QuackDB.DDL.create_table("events", [id: :integer], temporary: true)
    )

    MyApp.AnalyticsRepo.query!("INSERT INTO events VALUES (1), (2)")
    :committed
  end)

Use Repo.rollback/1 to abort transaction work:

{:error, :rolled_back} =
  MyApp.AnalyticsRepo.transaction(fn ->
    MyApp.AnalyticsRepo.query!("INSERT INTO events VALUES (3)")
    MyApp.AnalyticsRepo.rollback(:rolled_back)
  end)

Read-only Ecto queries against table names are also supported, including CTEs, window functions, joins, grouping, having, distinct, aggregate FILTER, arithmetic expressions, in/2, predicates, ordering, limits, aggregates, fragments, and DuckDB analytical helpers:

import Ecto.Query
import QuackDB.Ecto.Analytics

MyApp.AnalyticsRepo.all(
  from event in "events",
    where: event.id > ^min_id and like(event.name, "d%"),
    group_by: event.category,
    select: %{
      category: event.category,
      median_score: median(event.score),
      p95_score: quantile_cont(event.score, 0.95),
      scores: duckdb_list(event.score)
    }
)

Ecto insert/2 and insert_all/3 are supported for straightforward row inserts. DuckDB RETURNING works through the SQL insert path:

MyApp.AnalyticsRepo.insert!(%Event{id: 1, name: "duck"})

MyApp.AnalyticsRepo.insert_all(
  "events",
  [[id: 2, name: "goose"]],
  returning: [:id]
)

Use insert_method: :append to opt into Quack's native append protocol for plain insert_all workloads. This fast path does not support query inserts, :returning, placeholders, or upserts.

MyApp.AnalyticsRepo.insert_all(
  "events",
  [[id: 1, name: "duck"], [id: 2, name: "goose"]],
  insert_method: :append,
  chunk_every: 10_000
)

For temporary analytical setup, QuackDB.DDL.create_table/3 builds quoted DuckDB CREATE TABLE statements:

MyApp.AnalyticsRepo.query!(
  QuackDB.DDL.create_table("events",
    [payload: :json, occurred_at: :timestamp],
    temporary: true
  )
)

Ecto support is analytical rather than CRUD-shaped, but still early. Migrations, set combinations, locks, updates, deletes, and upserts raise explicit unsupported-feature errors for now.

Current limitations

Bind parameters are not exposed through this Quack client path yet.
Native appends support row batches but not Arrow IPC or automatic local-file/data staging yet.
Ecto support is limited to raw SQL, read-only analytical table queries, and straightforward insert_all/3 row inserts.
The low-level protocol is experimental and tracks DuckDB's Quack extension behavior.

Supervision and connection options

Use QuackDB under your application supervisor when you want a long-lived connection pool:

children = [
  {QuackDB,
   uri: "http://[::1]:9494",
   token: "super_secret",
   name: MyApp.QuackDB,
   pool_size: 5}
]

The client accepts QuackDB options such as :uri, :token, and :transport, plus DBConnection pool options such as :name, :pool_size, :queue_target, :queue_interval, and :timeout on individual calls.

QuackDB.query(MyApp.QuackDB, "SELECT 1", [], timeout: 10_000)

For local development, tests, or notebooks, QuackDB can also supervise a local DuckDB Quack server process with MuonTrap:

children = [
  {QuackDB.Server,
   name: MyApp.DuckDB,
   endpoint: "quack:localhost:9494",
   uri: "http://[::1]:9494",
   token: System.fetch_env!("QUACKDB_TOKEN")},

  {QuackDB,
   name: MyApp.QuackDB,
   uri: "http://[::1]:9494",
   token: System.fetch_env!("QUACKDB_TOKEN")}
]

QuackDB.Server starts the external duckdb executable and serves the Quack protocol. It is a convenience process supervisor, not an embedded DuckDB driver and not required for remote DuckDB servers.

Development

mix deps.get
mix ci

Integration tests are skipped by default. To run them against a Quack server:

QUACKDB_TEST_URI='http://[::1]:9494' \
QUACKDB_TEST_TOKEN=super_secret \
mix test --include integration

See guides/getting-started.md for a longer walkthrough, guides/type-support.md for the current DuckDB type matrix, and docs/research.md for protocol notes.