ExArrow


Native Apache Arrow for the BEAM: IPC streaming, Arrow Flight, and ADBC database bindings. Column data lives in Rust buffers; Elixir holds lightweight opaque handles. Precompiled NIFs for Linux, macOS, and Windows — no Rust required to use.


Contents

- Why ExArrow was built
- What it brings to the Elixir ecosystem
- How ExArrow differs from Explorer, Nx, ADBC, and ExZarr
- Where ExArrow fits
- What this enables
- Requirements
- Installation
- Quick start
- Livebook tutorials
- IPC: stream and file
- Arrow Flight: client and server
- ADBC: database to Arrow streams
- Using ExArrow with Explorer
- Use case examples
- Benchmarks
- Documentation
- Development
- Roadmap
- FAQ
- License


Why ExArrow was built

The Arrow ecosystem has become the de facto interchange standard for columnar data. Python, R, Rust, Java, Go, and C++ all speak Arrow natively. Data warehouses, query engines, stream processors, ML frameworks, and databases expose Arrow Flight endpoints or ADBC interfaces. The BEAM had no first-class way to participate in this ecosystem.

ExArrow was written to fill that gap. It gives Elixir and Erlang applications the same low-level, zero-copy Arrow primitives that the rest of the ecosystem already takes for granted — without requiring callers to understand NIF memory management, dirty schedulers, or the Arrow C Data Interface.

The design goal is intentionally narrow: be the reliable Arrow transport and interchange layer for the BEAM, and let other libraries (Explorer, Nx, etc.) do the analysis on top of it.


What it brings to the Elixir ecosystem

Prior to ExArrow, an Elixir application that needed to exchange data with a Flight server, query a database via ADBC, or read/write an Arrow IPC file had three options: shell out to Python, implement the protocol manually in Elixir (row-by-row, with all the copying that entails), or simply not do it.

ExArrow adds:

- Arrow IPC stream and file reading and writing, with column data kept in native buffers
- An Arrow Flight client and a built-in Flight server
- ADBC bindings that return Arrow result streams from SQL databases
- Precompiled NIFs for Linux, macOS, and Windows, so no Rust toolchain is needed


How ExArrow differs from Explorer, Nx, ADBC, and ExZarr

These libraries are complementary, not competing. Each has a distinct role.

Library Role Overlap with ExArrow
Explorer In-memory dataframe analysis (filter, group, sort, plot). Backed by Polars/Arrow internally. Explorer can load/dump Arrow IPC streams. ExArrow is the transport; Explorer is the analysis layer.
Nx Numerical computing and tensor operations (multi-dimensional arrays, GPU support, ML). Nx tensors and Arrow columns are both typed flat arrays. There is currently no direct bridge, but ExArrow IPC can produce data for downstream tensor conversion.
adbc (livebook-dev) Elixir wrapper around the ADBC C library for driver management — downloading and configuring drivers. ExArrow uses adbc optionally for driver download; adbc's core purpose is driver lifecycle, not Arrow streaming or Flight.
ExZarr Read/write Zarr v2/v3 chunked array format (used in climate science, genomics, cloud-native ND arrays). Zarr and Arrow are complementary storage formats. ExZarr addresses ND chunk storage; ExArrow addresses columnar interchange and network transport.

In short: ExArrow is a transport and interchange library. It moves Arrow data between processes, databases, services, and files as efficiently as possible. It does not analyse, transform, or visualise data — that is the job of Explorer, Nx, or your own application logic.


Where ExArrow fits

flowchart TB
    App("Your Elixir Application")

    App --> Explorer("Explorer\ndataframes & analysis")
    App --> Nx("Nx\ntensors & ML")
    App --> ExArrow("ExArrow\nIPC · Flight · ADBC")
    App --> ExZarr("ExZarr\nZarr chunked arrays")

    ExArrow --> IPC("Arrow IPC\nstream & file")
    ExArrow --> FlightSvr("Arrow Flight\ngRPC server")
    ExArrow --> ADBCDrv("ADBC\ndriver")

    IPC -. "interop via IPC binary" .-> Explorer

    FlightSvr --> FlightSvcs("Dremio · InfluxDB IOx\nDuckDB · Snowflake")
    ADBCDrv   --> Databases("PostgreSQL · SQLite\nDuckDB · BigQuery")

    classDef app      fill:#1a1a2e,stroke:#4a90d9,color:#e0e0e0,rx:6
    classDef lib      fill:#16213e,stroke:#4a90d9,color:#e0e0e0,rx:6
    classDef proto    fill:#0f3460,stroke:#4a90d9,color:#e0e0e0,rx:6
    classDef external fill:#1a1a2e,stroke:#888,color:#aaa,rx:6,stroke-dasharray:4 4

    class App app
    class Explorer,Nx,ExArrow,ExZarr lib
    class IPC,FlightSvr,ADBCDrv proto
    class FlightSvcs,Databases external

ExArrow sits at the boundary between the BEAM and the Arrow ecosystem. It speaks the protocols that data infrastructure uses — IPC, Flight, ADBC — and surfaces them as idiomatic Elixir APIs. Explorer and Nx sit above it and consume the data it delivers.


What this enables


Requirements


Installation

Add the dependency:

def deps do
  [{:ex_arrow, "~> 0.1.0"}]
end

Using precompiled NIFs (default)

After mix deps.get and mix compile, ExArrow downloads a prebuilt NIF for your platform from the project's GitHub releases. No Rust or C toolchain is required. Supported platforms: Linux x86_64/aarch64, macOS x86_64/arm64, Windows x86_64.

Building from source

If no precompiled NIF exists for your platform, or you are developing ExArrow itself, set EX_ARROW_BUILD=1 and have Rust installed:

EX_ARROW_BUILD=1 mix deps.get
EX_ARROW_BUILD=1 mix compile

Source builds require rustler, which ExArrow already lists in its own mix.exs as {:rustler, "~> 0.32.0", optional: true}.

For path dependencies (e.g. Livebook or Mix.install), add rustler explicitly and have Rust available:

Mix.install([
  {:ex_arrow, path: "/path/to/ex_arrow"},
  {:rustler, "~> 0.37.3", optional: true}
])

Alternatively, use the published Hex package so the precompiled NIF is used and no Rust is needed: Mix.install([{:ex_arrow, "~> 0.1.0"}]).


Quick start

Read an Arrow IPC stream:

{:ok, stream} = ExArrow.IPC.Reader.from_file("/path/to/data.arrow")
{:ok, schema} = ExArrow.Stream.schema(stream)
fields = ExArrow.Schema.fields(schema)

case ExArrow.Stream.next(stream) do
  %ExArrow.RecordBatch{} = batch -> IO.inspect(ExArrow.RecordBatch.num_rows(batch))
  nil -> :done
  {:error, msg} -> IO.puts("Error: #{msg}")
end

Connect to an Arrow Flight server:

{:ok, client} = ExArrow.Flight.Client.connect("localhost", 9999, [])
{:ok, stream} = ExArrow.Flight.Client.do_get(client, "my_ticket")
{:ok, schema} = ExArrow.Stream.schema(stream)
batch = ExArrow.Stream.next(stream)

Query a database with ADBC:

{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: ":memory:")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT 1 AS n")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)
batch = ExArrow.Stream.next(stream)

Livebook tutorials

Interactive notebooks (open in Livebook):

See livebook/README.md for run instructions.


IPC: stream and file

Stream (sequential) — from binary or file path:

{:ok, stream} = ExArrow.IPC.Reader.from_binary(ipc_bytes)
{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/events.arrow")

{:ok, schema} = ExArrow.Stream.schema(stream)

Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
|> Enum.take_while(&(&1 != nil and not match?({:error, _}, &1)))
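
This drain pattern recurs throughout this README, so it can be wrapped in a small helper. The module below is a sketch for your own application code, not part of the ExArrow API:

defmodule MyApp.ArrowHelper do
  # Drain an ExArrow stream into a list of record batches,
  # stopping at end-of-stream (nil) or on the first {:error, _}.
  def collect_batches(stream) do
    Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
    |> Enum.take_while(fn
      nil -> false
      {:error, _} -> false
      _ -> true
    end)
  end
end

batches = MyApp.ArrowHelper.collect_batches(stream)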

Write to binary or file:

{:ok, binary} = ExArrow.IPC.Writer.to_binary(schema, batches)
:ok = ExArrow.IPC.Writer.to_file("/out/result.arrow", schema, batches)

File format (random access):

{:ok, file} = ExArrow.IPC.File.from_file("/data/large.arrow")
{:ok, schema} = ExArrow.IPC.File.schema(file)
n = ExArrow.IPC.File.batch_count(file)
{:ok, batch} = ExArrow.IPC.File.get_batch(file, 0)

Arrow Flight: client and server

Start the built-in echo server:

{:ok, server} = ExArrow.Flight.Server.start_link(9999)
{:ok, port} = ExArrow.Flight.Server.port(server)
:ok = ExArrow.Flight.Server.stop(server)

Transfer data:

{:ok, client} = ExArrow.Flight.Client.connect("localhost", 9999, [])

:ok = ExArrow.Flight.Client.do_put(client, schema, [batch1, batch2])

{:ok, stream} = ExArrow.Flight.Client.do_get(client, "echo")
batch = ExArrow.Stream.next(stream)

Metadata:

{:ok, flights} = ExArrow.Flight.Client.list_flights(client, <<>>)
{:ok, info}    = ExArrow.Flight.Client.get_flight_info(client, {:cmd, "echo"})
{:ok, schema}  = ExArrow.Flight.Client.get_schema(client, {:cmd, "echo"})
{:ok, actions} = ExArrow.Flight.Client.list_actions(client)
{:ok, ["pong"]} = ExArrow.Flight.Client.do_action(client, "ping", <<>>)

Flight connections are plaintext by default; TLS configuration is covered in the FAQ. Products that speak Arrow Flight include Dremio, InfluxDB IOx, and custom analytics servers.


ADBC: database to Arrow streams

SQLite in-memory:

{:ok, db}   = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: ":memory:")
{:ok, conn} = ExArrow.ADBC.Connection.open(db)
{:ok, stmt} = ExArrow.ADBC.Statement.new(conn, "SELECT 1 AS n, 'hello' AS s")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
batch = ExArrow.Stream.next(stream)

PostgreSQL:

{:ok, db} = ExArrow.ADBC.Database.open(
  driver_name: "adbc_driver_postgresql",
  uri: "postgresql://user:pass@localhost:5432/mydb"
)
{:ok, conn}   = ExArrow.ADBC.Connection.open(db)
{:ok, stmt}   = ExArrow.ADBC.Statement.new(conn, "SELECT id, name FROM users")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)

Metadata:

{:ok, types_stream} = ExArrow.ADBC.Connection.get_table_types(conn)
{:ok, schema}       = ExArrow.ADBC.Connection.get_table_schema(conn, nil, nil, "users")
{:ok, objs_stream}  = ExArrow.ADBC.Connection.get_objects(conn, depth: "tables")

Optional driver download via the adbc package:

# Add {:adbc, "~> 0.7"} to deps, then:
Adbc.download_driver!(:sqlite)
{:ok, db} = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: ":memory:")

Or use the convenience helper ExArrow.ADBC.DriverHelper.ensure_driver_and_open/2, which calls Adbc.download_driver!/1 when the package is available.
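
A possible call shape, assuming the helper takes a driver identifier and the open options — the arguments shown here are an assumption, so check the ExArrow.ADBC.DriverHelper API docs for the real signature:

# Hypothetical arguments — verify against the DriverHelper docs
{:ok, db} = ExArrow.ADBC.DriverHelper.ensure_driver_and_open(:sqlite, uri: ":memory:")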


Using ExArrow with Explorer

Explorer handles in-memory analysis. ExArrow handles streaming and transport. They connect via Arrow IPC.

ExArrow to Explorer:

{:ok, stream} = ExArrow.IPC.Reader.from_file("/data/source.arrow")
{:ok, schema} = ExArrow.Stream.schema(stream)
batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn
    nil -> false
    {:error, _} -> false
    _ -> true
  end)
{:ok, binary} = ExArrow.IPC.Writer.to_binary(schema, batches)
df = Explorer.DataFrame.load_ipc_stream!(binary)

Explorer to ExArrow:

df = Explorer.DataFrame.new(x: [1, 2, 3], y: ["a", "b", "c"])
binary = Explorer.DataFrame.dump_ipc_stream!(df)
{:ok, stream} = ExArrow.IPC.Reader.from_binary(binary)
batch = ExArrow.Stream.next(stream)

Use case examples

Ingest IPC from HTTP or Kafka and write to file

ipc_bytes = get_arrow_stream_from_http_or_kafka()
{:ok, stream} = ExArrow.IPC.Reader.from_binary(ipc_bytes)
{:ok, schema} = ExArrow.Stream.schema(stream)
batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn
    nil -> false
    {:error, _} -> false
    _ -> true
  end)
:ok = ExArrow.IPC.Writer.to_file("/data/ingested.arrow", schema, batches)

Query a database and forward via Flight

{:ok, db}     = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_sqlite", uri: "file:report.db")
{:ok, conn}   = ExArrow.ADBC.Connection.open(db)
{:ok, stmt}   = ExArrow.ADBC.Statement.new(conn, "SELECT * FROM sales WHERE year = 2024")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)
batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn
    nil -> false
    {:error, _} -> false
    _ -> true
  end)

{:ok, client} = ExArrow.Flight.Client.connect("flight.example.com", 32010, [])
:ok = ExArrow.Flight.Client.do_put(client, schema, batches)

Connect to Dremio, InfluxDB IOx, or a custom Flight service

{:ok, client}  = ExArrow.Flight.Client.connect("dremio.example.com", 32010, connect_timeout_ms: 5_000)
{:ok, flights} = ExArrow.Flight.Client.list_flights(client, <<>>)
{:ok, stream}  = ExArrow.Flight.Client.do_get(client, ticket_from_service)
batch = ExArrow.Stream.next(stream)

Interchange with Python or R

# Read a file written by PyArrow or Pandas
{:ok, file}   = ExArrow.IPC.File.from_file("/data/from_python.arrow")
{:ok, schema} = ExArrow.IPC.File.schema(file)
n = ExArrow.IPC.File.batch_count(file)
for i <- 0..(n - 1) do
  {:ok, batch} = ExArrow.IPC.File.get_batch(file, i)
  # process batch
end

# Write for Python, R, or DuckDB
:ok = ExArrow.IPC.Writer.to_file("/data/for_python.arrow", schema, batches)

End-to-end: ADBC to Flight

{:ok, db}     = ExArrow.ADBC.Database.open(driver_name: "adbc_driver_postgresql",
                  uri: "postgresql://localhost/mydb")
{:ok, conn}   = ExArrow.ADBC.Connection.open(db)
{:ok, stmt}   = ExArrow.ADBC.Statement.new(conn, "SELECT * FROM sensor_readings")
{:ok, stream} = ExArrow.ADBC.Statement.execute(stmt)
{:ok, schema} = ExArrow.Stream.schema(stream)
batches =
  Stream.repeatedly(fn -> ExArrow.Stream.next(stream) end)
  |> Enum.take_while(fn
    nil -> false
    {:error, _} -> false
    _ -> true
  end)

{:ok, client} = ExArrow.Flight.Client.connect("flight.internal", 32010, [])
:ok = ExArrow.Flight.Client.do_put(client, schema, batches)

Benchmarks

ExArrow ships a Benchee-based benchmark suite in bench/ that quantifies the zero-copy streaming advantage over row-oriented alternatives.

Running locally

Benchee is a :dev-only dependency; MIX_ENV=dev is required.

MIX_ENV=dev mix run bench/ipc_read_bench.exs   # single suite
MIX_ENV=dev mix run bench/run_all.exs           # all suites
MIX_ENV=dev mix bench                           # convenience alias

HTML reports are written to bench/output/ (gitignored).

Suites

File What it measures
ipc_read_bench.exs Stream handle vs materialise — BEAM memory saved by keeping data native
ipc_write_bench.exs IPC serialisation vs :erlang.term_to_binary — columnar vs row-oriented write
flight_bench.exs Flight do_put / do_get / roundtrip latency with in-process server
adbc_bench.exs Stream handle vs schema peek vs full collect
pipeline_bench.exs End-to-end: IPC file on disk to Flight do_put without materialising in BEAM

Published results

Results from every push to main are published at: https://thanos.github.io/ex_arrow/dev/bench/

The CI workflow posts a PR alert comment when any scenario regresses more than 20% relative to the previous baseline.


Documentation

API reference: mix docs or hexdocs.pm/ex_arrow.


Development

mix deps.get
EX_ARROW_BUILD=1 mix compile    # build NIF from source
mix test                         # exclude :adbc / :adbc_package tags if no drivers installed
mix docs                         # generate ExDoc
MIX_ENV=dev mix bench            # run benchmark suite

Local CI script (runs format, credo, dialyzer, tests, coverage, docs):

script/ci

Roadmap

The items below represent the planned direction for ExArrow. Contributions are welcome for any of them.

Near-term (v0.3)

Longer-term


FAQ

When should I use ExArrow? Use ExArrow when you need to read or write Arrow IPC (stream or file), connect to an Arrow Flight server (Dremio, InfluxDB IOx, custom), or run SQL via ADBC and receive Arrow result streams. Good fit for data pipelines, ETL, and interchange with systems that already speak Arrow.

When should I not use ExArrow? Do not use it as a dataframe or query engine. For in-memory analysis, filtering, grouping, and plotting, use Explorer. Do not use it as a replacement for Ecto when you only need normal SQL results. For Parquet-only workflows with no Flight/ADBC, consider Explorer's Parquet support first.

Can I use ExArrow and Explorer together? Yes. ExArrow handles transport and protocol layers. Use ExArrow.IPC.Writer.to_binary/2 to produce IPC, then Explorer.DataFrame.load_ipc_stream!/1 to load it. In the other direction, Explorer.DataFrame.dump_ipc_stream!/1 produces bytes that ExArrow.IPC.Reader.from_binary/1 can read.

Why do I get a 404 or "couldn't fetch NIF" on compile? Precompiled NIFs are hosted on GitHub releases. If you are on an unsupported platform or an unreleased version, the download fails. Set EX_ARROW_BUILD=1, install Rust, and run mix compile to build from source.

Is Arrow Flight over TLS supported? Yes, as of v0.2. Pass tls: [cert_pem: ..., key_pem: ...] to Server.start_link/2 for one-way TLS, or add ca_cert_pem: for mutual TLS. The client automatically selects TLS for non-loopback hosts; use tls: [ca_cert_pem: pem] for a custom CA.
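
As a sketch of the options described above (the PEM file paths are placeholders):

# Server with one-way TLS — certificate paths are placeholders
cert = File.read!("certs/server-cert.pem")
key  = File.read!("certs/server-key.pem")
{:ok, server} = ExArrow.Flight.Server.start_link(9999, tls: [cert_pem: cert, key_pem: key])

# Client trusting a custom CA
ca = File.read!("certs/ca.pem")
{:ok, client} = ExArrow.Flight.Client.connect("flight.example.com", 9999, tls: [ca_cert_pem: ca])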

Which ADBC drivers are supported? Any ADBC driver that provides a shared library — for example adbc_driver_sqlite, adbc_driver_postgresql, or the DuckDB ADBC driver. You must install the driver and pass its path, or ensure the driver manager can find it. Metadata and binding support depend on the individual driver.


License

MIT. See LICENSE for details. Copyright (c) 2025 Thanos Vassilakis.