# Dux

DuckDB-native DataFrames for Elixir.

Dux is a dataframe library where DuckDB is the execution engine and the BEAM is the distributed runtime. Pipelines are lazy, operations compile to SQL CTEs, and DuckDB handles all the heavy lifting.

```elixir
require Dux

Dux.from_parquet("s3://data/sales/**/*.parquet")
|> Dux.filter(amount > 100 and region == ^selected_region)
|> Dux.mutate(revenue: price * quantity)
|> Dux.group_by(:product)
|> Dux.summarise(total: sum(revenue), orders: count(product))
|> Dux.sort_by(desc: :total)
|> Dux.to_parquet("results.parquet", compression: :zstd)
```

## Why Dux?

## Installation

```elixir
def deps do
  [
    {:dux, "~> 0.1.1"}
  ]
end
```

Precompiled NIF binaries are available for macOS (arm64, x86_64), Linux (gnu, musl), and Windows. No Rust or DuckDB compilation needed.

To force a local build (requires Rust toolchain):

```shell
DUX_BUILD=true mix deps.compile dux --force
```

## Quick start

```elixir
require Dux

# Read data
df = Dux.from_csv("sales.csv")

# Transform
result =
  df
  |> Dux.filter(amount > 100)
  |> Dux.mutate(tax: amount * 0.08)
  |> Dux.group_by(:region)
  |> Dux.summarise(total: sum(amount), avg_tax: avg(tax))
  |> Dux.sort_by(desc: :total)
  |> Dux.to_rows()

# result is a list of maps:
# [%{"region" => "US", "total" => 15000, "avg_tax" => 120.0}, ...]
```

## Verbs

All operations are verbs on `%Dux{}` structs:

| Verb | Description |
| --- | --- |
| `filter/2` | Filter rows (macro: `filter(df, x > 10)`) |
| `mutate/2` | Add/replace columns (macro: `mutate(df, y: x * 2)`) |
| `select/2` | Keep columns |
| `discard/2` | Drop columns |
| `sort_by/2` | Sort rows (asc/desc) |
| `group_by/2` | Group for aggregation |
| `summarise/2` | Aggregate (macro: `summarise(df, total: sum(x))`) |
| `join/3` | Inner, left, right, cross, anti, semi joins |
| `head/2` | First N rows |
| `slice/3` | Offset + limit |
| `distinct/1` | Deduplicate |
| `drop_nil/2` | Remove rows with nil values |
| `rename/2` | Rename columns |
| `pivot_wider/4` | Long → wide (DuckDB PIVOT) |
| `pivot_longer/3` | Wide → long (DuckDB UNPIVOT) |
| `concat_rows/1` | UNION ALL |
| `compute/1` | Execute the pipeline |
| `to_rows/1` | Execute and return a list of maps (`atom_keys: true` option) |
| `to_columns/1` | Execute and return a column map |
| `peek/2` | Print a formatted table preview |
| `n_rows/1` | Count rows |
| `sql_preview/2` | Show generated SQL (`pretty: true` option) |

The `_with` variants (`filter_with/2`, `mutate_with/2`, `summarise_with/2`) accept raw SQL strings for programmatic use.
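As a quick sketch, `sql_preview/2` from the table above makes it easy to see what any pipeline compiles to before running it (the file name here is a placeholder):

```elixir
require Dux

# Print the SQL that this pipeline would execute, without running it.
Dux.from_csv("sales.csv")
|> Dux.filter(amount > 100)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(amount))
|> Dux.sql_preview(pretty: true)
```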

## IO

DuckDB handles all file formats and remote access natively:

```elixir
# Read
Dux.from_csv("data.csv", delimiter: "\t")
Dux.from_parquet("data/**/*.parquet")
Dux.from_ndjson("events.ndjson")
Dux.from_query("SELECT * FROM read_parquet('s3://bucket/data.parquet')")

# Write
Dux.to_csv(df, "output.csv")
Dux.to_parquet(df, "output.parquet", compression: :zstd)
Dux.to_ndjson(df, "output.ndjson")
```

S3, HTTP, Postgres, MySQL, SQLite — all via DuckDB extensions. No separate libraries needed.
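For example, a Postgres table can be read through `Dux.from_query/1` and DuckDB's `postgres_scan` table function; a minimal sketch, assuming DuckDB's `postgres` extension is installed and the connection string, schema, and table names are placeholders:

```elixir
# Scan a remote Postgres table directly through DuckDB.
df =
  Dux.from_query(
    "SELECT * FROM postgres_scan('dbname=appdb host=localhost', 'public', 'orders')"
  )
```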

## Distributed queries

Dux distributes analytical workloads across a BEAM cluster:

```elixir
# Workers auto-register via :pg
workers = Dux.Remote.Worker.list()

# Mark for distributed execution, then use the same verbs
Dux.from_parquet("data/**/*.parquet")
|> Dux.distribute(workers)
|> Dux.filter(amount > 100)
|> Dux.group_by(:region)
|> Dux.summarise(total: sum(amount))
|> Dux.to_rows()
```

No function serialization: a `%Dux{}` is plain data, so you can ship it to any node and compile it to SQL there. No cluster manager: just libcluster + `:pg`. No heavyweight RPC: just `:erpc.multicall`.
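Because the struct is plain data, a pipeline built on one node can be executed on another with nothing but `:erpc`; a minimal sketch, where the node name is a placeholder:

```elixir
require Dux

# Build the pipeline locally; nothing executes yet.
df =
  Dux.from_parquet("data/**/*.parquet")
  |> Dux.filter(amount > 100)

# Ship the plain %Dux{} struct to a remote node and run it there.
rows = :erpc.call(:"worker@host", Dux, :to_rows, [df])
```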

## Graph analytics

```elixir
graph = Dux.Graph.new(vertices: users, edges: follows)

# All algorithms are verb compositions
graph |> Dux.Graph.pagerank() |> Dux.sort_by(desc: :rank) |> Dux.head(10)
graph |> Dux.Graph.shortest_paths(start_node)
graph |> Dux.Graph.connected_components()
graph |> Dux.Graph.triangle_count()

# Distribute the graph across workers
graph |> Dux.Graph.distribute(workers) |> Dux.Graph.pagerank()
```

## Nx interop

Numeric columns become tensors:

```elixir
tensor = Dux.to_tensor(df, :price)
# #Nx.Tensor<f64[1000] [...]>
```

Dux implements `Nx.LazyContainer` for use in `defn`.
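For instance, a column can feed straight into a numerical definition; a sketch where `zscore/1` is an illustrative function, not part of Dux:

```elixir
defmodule Stats do
  import Nx.Defn

  # Standardize a tensor: subtract the mean, divide by the standard deviation.
  defn zscore(t) do
    (t - Nx.mean(t)) / Nx.standard_deviation(t)
  end
end

df
|> Dux.to_tensor(:price)
|> Stats.zscore()
```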

## Raw SQL escape hatch

For anything the macro layer doesn't support (window functions, CASE WHEN, PIVOT, CTEs), use the `_with` variants with raw DuckDB SQL:

```elixir
# Window functions
Dux.mutate_with(df, rank: "ROW_NUMBER() OVER (PARTITION BY \"dept\" ORDER BY \"salary\" DESC)")

# CASE WHEN
Dux.mutate_with(df, tier: "CASE WHEN amount > 1000 THEN 'high' ELSE 'low' END")

# Pivot
Dux.from_query("PIVOT sales ON product USING SUM(amount) GROUP BY region")

# Any DuckDB SQL
Dux.from_query("SELECT * FROM read_parquet('s3://bucket/data.parquet') WHERE year = 2025")
```

## License

Dual-licensed under Apache 2.0 and MIT. See LICENSE-APACHE and LICENSE-MIT.

## Links