Dux

DuckDB-native DataFrames for Elixir.

Dux is a dataframe library where DuckDB is the execution engine and the BEAM is the distributed runtime. Pipelines are lazy, operations compile to SQL CTEs, and DuckDB handles all the heavy lifting.

require Dux
Dux.from_parquet("s3://data/sales/**/*.parquet")
|> Dux.filter(amount > 100 and region == ^selected_region)
|> Dux.mutate(revenue: price * quantity)
|> Dux.group_by(:product)
|> Dux.summarise(total: sum(revenue), orders: count(product))
|> Dux.sort_by(desc: :total)
|> Dux.to_parquet("results.parquet", compression: :zstd)

Why Dux?

Installation

def deps do
[
{:dux, "~> 0.1.0"}
]
end

Precompiled NIF binaries are available for macOS (arm64, x86_64), Linux (gnu, musl), and Windows. No Rust or DuckDB compilation needed.

To force a local build (requires Rust toolchain):

DUX_BUILD=true mix deps.compile dux --force

Quick start

require Dux
# Read data
df = Dux.from_csv("sales.csv")
# Transform
result =
df
|> Dux.filter(amount > 100)
|> Dux.mutate(tax: amount * 0.08)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(amount), avg_tax: avg(tax))
|> Dux.sort_by(desc: :total)
|> Dux.to_rows()
# result is a list of maps:
# [%{"region" => "US", "total" => 15000, "avg_tax" => 120.0}, ...]

Verbs

All operations are verbs on %Dux{} structs:

VerbDescription
filter/2Filter rows (macro: filter(df, x > 10))
mutate/2Add/replace columns (macro: mutate(df, y: x * 2))
select/2Keep columns
discard/2Drop columns
sort_by/2Sort rows (asc/desc)
group_by/2Group for aggregation
summarise/2Aggregate (macro: summarise(df, total: sum(x)))
join/3Inner, left, right, cross, anti, semi joins
head/2First N rows
slice/3Offset + limit
distinct/1Deduplicate
drop_nil/2Remove rows with nil values
rename/2Rename columns
pivot_wider/4Long → wide (DuckDB PIVOT)
pivot_longer/3Wide → long (DuckDB UNPIVOT)
concat_rows/1UNION ALL
compute/1Execute the pipeline
to_rows/1Execute and return list of maps (atom_keys: true option)
to_columns/1Execute and return column map
peek/2Print formatted table preview
n_rows/1Count rows
sql_preview/2Show generated SQL (pretty: true option)

The _with variants (filter_with/2, mutate_with/2, summarise_with/2) accept raw SQL strings for programmatic use.

IO

DuckDB handles all file formats and remote access natively:

# Read
Dux.from_csv("data.csv", delimiter: "\t")
Dux.from_parquet("data/**/*.parquet")
Dux.from_ndjson("events.ndjson")
Dux.from_query("SELECT * FROM read_parquet('s3://bucket/data.parquet')")
# Write
Dux.to_csv(df, "output.csv")
Dux.to_parquet(df, "output.parquet", compression: :zstd)
Dux.to_ndjson(df, "output.ndjson")

S3, HTTP, Postgres, MySQL, SQLite — all via DuckDB extensions. No separate libraries needed.

Distributed queries

Dux distributes analytical workloads across a BEAM cluster:

# Workers auto-register via :pg
workers = Dux.Remote.Worker.list()
# Mark for distributed, then use the same verbs
Dux.from_parquet("data/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(amount > 100)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(amount))
|> Dux.to_rows()

No function serialization — %Dux{} is plain data. Ship it anywhere, compile to SQL there. No cluster manager — just libcluster + :pg. No heavyweight RPC — just :erpc.multicall.

Graph analytics

graph = Dux.Graph.new(vertices: users, edges: follows)
# All algorithms are verb compositions
graph |> Dux.Graph.pagerank() |> Dux.sort_by(desc: :rank) |> Dux.head(10)
graph |> Dux.Graph.shortest_paths(start_node)
graph |> Dux.Graph.connected_components()
graph |> Dux.Graph.triangle_count()
# Distribute graph across workers
graph |> Dux.Graph.distribute(workers) |> Dux.Graph.pagerank()

Nx interop

Numeric columns become tensors:

tensor = Dux.to_tensor(df, :price)
# #Nx.Tensor<f64[1000] [...]>

Dux implements Nx.LazyContainer for use in defn.

Raw SQL escape hatch

For anything the macro doesn't support — window functions, CASE WHEN, PIVOT, CTEs — use the _with variants with raw DuckDB SQL:

# Window functions
Dux.mutate_with(df, rank: "ROW_NUMBER() OVER (PARTITION BY \"dept\" ORDER BY \"salary\" DESC)")
# CASE WHEN
Dux.mutate_with(df, tier: "CASE WHEN amount > 1000 THEN 'high' ELSE 'low' END")
# Pivot
Dux.from_query("PIVOT sales ON product USING SUM(amount) GROUP BY region")
# Any DuckDB SQL
Dux.from_query("SELECT * FROM read_parquet('s3://bucket/data.parquet') WHERE year = 2025")

License

Dual-licensed under Apache 2.0 and MIT. See LICENSE-APACHE and LICENSE-MIT.