Dux
DuckDB-native DataFrames for Elixir.
Dux is a dataframe library where DuckDB is the execution engine and the BEAM is the distributed runtime. Pipelines are lazy, operations compile to SQL CTEs, and DuckDB handles all the heavy lifting.
require Dux
Dux.from_parquet("s3://data/sales/**/*.parquet")
|> Dux.filter(amount > 100 and region == ^selected_region)
|> Dux.mutate(revenue: price * quantity)
|> Dux.group_by(:product)
|> Dux.summarise(total: sum(revenue), orders: count(product))
|> Dux.sort_by(desc: :total)
|> Dux.to_parquet("results.parquet", compression: :zstd)Why Dux?
- The module IS the dataframe.
Dux.filter(df, ...)— noDux.DataFrame, noDux.Series. Just verbs that pipe. - Everything is lazy. Operations accumulate until
compute/1. DuckDB optimizes the full pipeline. - DuckDB-only. No pluggable backends, no abstraction tax. Full access to DuckDB extensions, window functions, recursive CTEs.
- Elixir expressions compile to SQL.
Dux.filter(df, x > ^min_val)becomesWHERE x > $1with parameter bindings. SQL injection safe by construction. - Distributed. Ship
%Dux{}structs to any BEAM node, compile to SQL there, execute against that node’s local DuckDB. Fan out with the Coordinator, merge results. - Graph analytics.
Dux.Graph— a graph is two dataframes (vertices + edges). PageRank, shortest paths, connected components as verb compositions. - Nx interop. Numeric columns become tensors via
Nx.LazyContainer. Zero-copy where possible.
Installation
def deps do
[
{:dux, "~> 0.1.1"}
]
endPrecompiled NIF binaries are available for macOS (arm64, x86_64), Linux (gnu, musl), and Windows. No Rust or DuckDB compilation needed.
To force a local build (requires Rust toolchain):
DUX_BUILD=true mix deps.compile dux --forceQuick start
require Dux
# Read data
df = Dux.from_csv("sales.csv")
# Transform
result =
df
|> Dux.filter(amount > 100)
|> Dux.mutate(tax: amount * 0.08)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(amount), avg_tax: avg(tax))
|> Dux.sort_by(desc: :total)
|> Dux.to_rows()
# result is a list of maps:
# [%{"region" => "US", "total" => 15000, "avg_tax" => 120.0}, ...]Verbs
All operations are verbs on %Dux{} structs:
| Verb | Description |
|---|---|
filter/2 |
Filter rows (macro: filter(df, x > 10)) |
mutate/2 |
Add/replace columns (macro: mutate(df, y: x * 2)) |
select/2 | Keep columns |
discard/2 | Drop columns |
sort_by/2 | Sort rows (asc/desc) |
group_by/2 | Group for aggregation |
summarise/2 |
Aggregate (macro: summarise(df, total: sum(x))) |
join/3 | Inner, left, right, cross, anti, semi joins |
head/2 | First N rows |
slice/3 | Offset + limit |
distinct/1 | Deduplicate |
drop_nil/2 | Remove rows with nil values |
rename/2 | Rename columns |
pivot_wider/4 | Long → wide (DuckDB PIVOT) |
pivot_longer/3 | Wide → long (DuckDB UNPIVOT) |
concat_rows/1 | UNION ALL |
compute/1 | Execute the pipeline |
to_rows/1 |
Execute and return list of maps (atom_keys: true option) |
to_columns/1 | Execute and return column map |
peek/2 | Print formatted table preview |
n_rows/1 | Count rows |
sql_preview/2 |
Show generated SQL (pretty: true option) |
The _with variants (filter_with/2, mutate_with/2, summarise_with/2) accept raw SQL strings for programmatic use.
IO
DuckDB handles all file formats and remote access natively:
# Read
Dux.from_csv("data.csv", delimiter: "\t")
Dux.from_parquet("data/**/*.parquet")
Dux.from_ndjson("events.ndjson")
Dux.from_query("SELECT * FROM read_parquet('s3://bucket/data.parquet')")
# Write
Dux.to_csv(df, "output.csv")
Dux.to_parquet(df, "output.parquet", compression: :zstd)
Dux.to_ndjson(df, "output.ndjson")S3, HTTP, Postgres, MySQL, SQLite — all via DuckDB extensions. No separate libraries needed.
Distributed queries
Dux distributes analytical workloads across a BEAM cluster:
# Workers auto-register via :pg
workers = Dux.Remote.Worker.list()
# Mark for distributed, then use the same verbs
Dux.from_parquet("data/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(amount > 100)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(amount))
|> Dux.to_rows()
No function serialization — %Dux{} is plain data. Ship it anywhere, compile to SQL there. No cluster manager — just libcluster + :pg. No heavyweight RPC — just :erpc.multicall.
Graph analytics
graph = Dux.Graph.new(vertices: users, edges: follows)
# All algorithms are verb compositions
graph |> Dux.Graph.pagerank() |> Dux.sort_by(desc: :rank) |> Dux.head(10)
graph |> Dux.Graph.shortest_paths(start_node)
graph |> Dux.Graph.connected_components()
graph |> Dux.Graph.triangle_count()
# Distribute graph across workers
graph |> Dux.Graph.distribute(workers) |> Dux.Graph.pagerank()Nx interop
Numeric columns become tensors:
tensor = Dux.to_tensor(df, :price)
# #Nx.Tensor<f64[1000] [...]>Dux implements Nx.LazyContainer for use in defn.
Raw SQL escape hatch
For anything the macro doesn’t support — window functions, CASE WHEN, PIVOT, CTEs — use the _with variants with raw DuckDB SQL:
# Window functions
Dux.mutate_with(df, rank: "ROW_NUMBER() OVER (PARTITION BY \"dept\" ORDER BY \"salary\" DESC)")
# CASE WHEN
Dux.mutate_with(df, tier: "CASE WHEN amount > 1000 THEN 'high' ELSE 'low' END")
# Pivot
Dux.from_query("PIVOT sales ON product USING SUM(amount) GROUP BY region")
# Any DuckDB SQL
Dux.from_query("SELECT * FROM read_parquet('s3://bucket/data.parquet') WHERE year = 2025")License
Dual-licensed under Apache 2.0 and MIT. See LICENSE-APACHE and LICENSE-MIT.