# EctoBigquery
Ecto adapter for Google BigQuery focused on schema management and data seeding: creating and evolving BigQuery datasets and tables through Ecto migrations, and loading initial data via the Seeds helper.
## Why this library exists
I work as a data engineer and part of my day-to-day is maintaining BigQuery schemas across multiple environments. For a long time our team used Terraform for that, and while it works well for provisioning infrastructure, it falls short when it comes to schema evolution — ALTER TABLE operations like adding or dropping columns aren't natively supported, which forced us to write workarounds or apply changes manually.
Having spent years working with tools like Active Record and Ecto Migrations, I believe the migration pattern is simply the best way to manage a database schema: changes are versioned, ordered, reproducible, and reversible. That mental model transfers directly to BigQuery, so I built this library to bring it there.
> **Scope:** This library is intentionally limited to DDL operations (`mix ecto.create/drop/migrate/rollback`) and seeding (INSERT/DELETE via `EctoBigquery.Seeds`). It does not implement a general-purpose query interface: `Repo.all`, `Repo.insert`, `Repo.update`, and `Repo.delete` are not supported for application tables. This is a deliberate design decision — the goal of this library is schema and seed management, not runtime data access.
## Features
- `mix ecto.create` — creates the BigQuery dataset
- `mix ecto.drop` — drops the BigQuery dataset
- `mix ecto.migrate` — runs DDL migrations (create/drop tables, add/drop/rename columns)
- `mix ecto.rollback` — reverts migrations
- Seeds helper — INSERT and DELETE via BigQuery DML
## Installation
Add `ecto_bigquery` to your dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:ecto_bigquery, "~> 0.1"}
  ]
end
```

## Setup
### 1. Define your Repo
```elixir
defmodule MyApp.Repo do
  use EctoBigquery.Repo, otp_app: :my_app
end
```

### 2. Configure your application
In `config/config.exs`:
```elixir
config :my_app, MyApp.Repo,
  project_id: "my-gcp-project",
  dataset_id: "my_dataset",
  goth: MyApp.Goth
```

### 3. Add to your supervision tree
```elixir
# lib/my_app/application.ex
children = [
  {Goth, name: MyApp.Goth, source: {:default, []}},
  MyApp.Repo
]
```

### 4. Authentication
EctoBigquery uses Goth for Google Cloud authentication.
**Application Default Credentials** (recommended for local development):

```shell
gcloud auth application-default login
```

**Service Account** (recommended for production):
```elixir
{Goth,
 name: MyApp.Goth,
 source: {:service_account, Jason.decode!(File.read!("credentials.json"))}}
```

## Migrations
Migrations use standard Ecto syntax. Supported operations:
### Create table
```elixir
defmodule MyApp.Repo.Migrations.CreateEvents do
  use Ecto.Migration

  def change do
    create table(:events) do
      add :user_id, :string, null: false
      add :event_type, :string
      add :amount, :decimal
      add :active, :boolean
      add :metadata, :map

      timestamps()
    end
  end
end
```

### Alter table
```elixir
defmodule MyApp.Repo.Migrations.AlterEvents do
  use Ecto.Migration

  def change do
    alter table(:events) do
      add :contact_email, :string
      remove :legacy_field
    end

    rename table(:events), :amount, to: :total_amount
  end
end
```

### Drop table
```elixir
defmodule MyApp.Repo.Migrations.DropEvents do
  use Ecto.Migration

  def change do
    drop table(:events)
  end
end
```

## Type Mapping
### Standard types
| Ecto type | BigQuery type |
|---|---|
| `:string` | `STRING` |
| `:integer` | `INTEGER` |
| `:float` | `FLOAT` |
| `:boolean` | `BOOLEAN` |
| `:decimal` | `NUMERIC` |
| `:date` | `DATE` |
| `:time` | `TIME` |
| `:naive_datetime` | `DATETIME` |
| `:utc_datetime` | `TIMESTAMP` |
| `:binary` | `BYTES` |
| `:map` | `JSON` |
### Column modes
BigQuery columns support three modes as an alternative to `null: false`:
| Option | BigQuery mode | SQL generated |
|---|---|---|
| `mode: :nullable` | `NULLABLE` | (no constraint) |
| `mode: :required` | `REQUIRED` | `NOT NULL` |
| `mode: :repeated` | `REPEATED` | `ARRAY<T>` |
```elixir
create table(:analytics) do
  add :user_id, :string, mode: :required   # STRING NOT NULL
  add :tags, :string, mode: :repeated      # ARRAY<STRING>
  add :notes, :string, mode: :nullable     # STRING (explicit)
end
```

`mode: :repeated` is a shorthand for `{:array, type}`. Both are equivalent:
```elixir
add :tags, :string, mode: :repeated   # ARRAY<STRING>
add :tags, {:array, :string}          # ARRAY<STRING>
```
The generator supports `:repeated` and `:nullable` as modifiers:
```shell
mix ecto_bigquery.gen.migration create_analytics user_id:string:required tags:string:repeated notes:string:nullable
```

### Special BigQuery types
| Ecto type | BigQuery type |
|---|---|
| `:geography` | `GEOGRAPHY` |
| `:bignumeric` | `BIGNUMERIC` |
| `{:array, :string}` | `ARRAY<STRING>` |
| `{:array, :integer}` | `ARRAY<INTEGER>` |
| `{:struct, [field: :type, ...]}` | `STRUCT<field TYPE, ...>` |
| `{:array, {:struct, [...]}}` | `ARRAY<STRUCT<...>>` |
```elixir
create table(:locations) do
  add :point, :geography
  add :exact_amount, :bignumeric
  add :tags, {:array, :string}
  add :scores, {:array, :integer}
  add :address, {:struct, [street: :string, city: :string, zip: :string]}
  add :contacts, {:array, {:struct, [name: :string, email: :string]}}
end
```

Generated SQL:

```sql
CREATE TABLE `locations` (
  point GEOGRAPHY,
  exact_amount BIGNUMERIC,
  tags ARRAY<STRING>,
  scores ARRAY<INTEGER>,
  address STRUCT<street STRING, city STRING, zip STRING>,
  contacts ARRAY<STRUCT<name STRING, email STRING>>
)
```

## Seeds
Use `EctoBigquery.Seeds` in `priv/repo/seeds.exs`:
```elixir
# Option A: via Repo (Goth must be running in the supervision tree)
EctoBigquery.Seeds.insert(MyApp.Repo, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob", active: false}
])

# Atomic multi-table insert — rolls back all if any insert fails
EctoBigquery.Seeds.insert_all(MyApp.Repo, [
  {"users", [%{id: "1", name: "Alice"}]},
  {"products", [%{sku: "P1", price: 9.99}]}
])

EctoBigquery.Seeds.delete(MyApp.Repo, "users", "active = FALSE")
EctoBigquery.Seeds.truncate(MyApp.Repo, "users")
```

```elixir
# Option B: explicit connection (useful in standalone scripts)
config = MyApp.Repo.config()
project_id = Keyword.get(config, :project_id)
dataset_id = Keyword.get(config, :dataset_id)

{token, 0} = System.cmd("gcloud", ["auth", "print-access-token"])
conn = EctoBigquery.Connection.new(String.trim(token))

EctoBigquery.Seeds.insert_with_conn(conn, project_id, dataset_id, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob", active: false}
])

EctoBigquery.Seeds.delete_with_conn(conn, project_id, dataset_id, "users", "active = FALSE")
EctoBigquery.Seeds.truncate_with_conn(conn, project_id, dataset_id, "users")
```

## Migration generator
`mix ecto_bigquery.gen.migration` generates migration files with pre-filled column definitions based on the migration name pattern.
```shell
# Create table
mix ecto_bigquery.gen.migration create_users name:string age:integer active:boolean

# NOT NULL columns
mix ecto_bigquery.gen.migration create_products sku:string:required price:decimal:required

# Add column
mix ecto_bigquery.gen.migration add_email_to_users email:string:required

# Remove column
mix ecto_bigquery.gen.migration remove_phone_from_users phone

# Drop table
mix ecto_bigquery.gen.migration drop_legacy_data

# Special types
mix ecto_bigquery.gen.migration create_places location:geography amount:bignumeric "tags:array[string]"
```

### Generator type aliases
| Alias | Ecto type |
|---|---|
| `string`, `text` | `:string` |
| `integer`, `int` | `:integer` |
| `float` | `:float` |
| `decimal`, `numeric` | `:decimal` |
| `boolean`, `bool` | `:boolean` |
| `date` | `:date` |
| `datetime`, `utc_datetime` | `:utc_datetime` |
| `naive_datetime` | `:naive_datetime` |
| `binary` | `:binary` |
| `map`, `json` | `:map` |
| `geography` | `:geography` |
| `bignumeric` | `:bignumeric` |
| `array[TYPE]` | `{:array, TYPE}` |
> **Note:** `STRUCT` types must be written manually in the migration file since their nested fields cannot be expressed in a single CLI argument.
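As a sketch, a struct column can be declared by hand in the generated migration body using the `{:struct, ...}` syntax shown in the type-mapping section (the table and field names below are illustrative):

```elixir
defmodule MyApp.Repo.Migrations.CreatePlaces do
  use Ecto.Migration

  def change do
    create table(:places) do
      # Written manually: STRUCT<street STRING, city STRING>
      add :address, {:struct, [street: :string, city: :string]}
      # Nested structs work too: ARRAY<STRUCT<name STRING, email STRING>>
      add :contacts, {:array, {:struct, [name: :string, email: :string]}}
    end
  end
end
```

A convenient workflow is to run the generator with the flat columns first, then edit the file to add the `STRUCT` fields.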
## Mix tasks
```shell
# Create the BigQuery dataset
mix ecto.create

# Drop the dataset (deletes all tables)
mix ecto.drop

# Run pending migrations
mix ecto.migrate

# Revert the last migration
mix ecto.rollback

# Run seeds
mix run priv/repo/seeds.exs
```

## Out of scope
The following are intentionally not supported:
- **Generic query interface** — `Repo.all`, `Repo.insert`, `Repo.update`, `Repo.delete` for application tables. Use `google_api_big_query` directly for runtime data access.
- **DDL transactions** — BigQuery does not support transactional DDL. If a migration with multiple changes fails midway, previously applied changes remain permanent.
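For runtime reads, a minimal sketch using the official `google_api_big_query` client might look like the following. This is not part of EctoBigquery; it assumes `{:google_api_big_query, "~> 0.x"}` in your deps, the `MyApp.Goth` process from the setup section running, and the project/dataset/table names are illustrative:

```elixir
# Fetch an OAuth token from the Goth process started in the supervision tree
token = Goth.fetch!(MyApp.Goth).token
conn = GoogleApi.BigQuery.V2.Connection.new(token)

# Run a standard-SQL query via the Jobs API
{:ok, response} =
  GoogleApi.BigQuery.V2.Api.Jobs.bigquery_jobs_query(
    conn,
    "my-gcp-project",
    body: %GoogleApi.BigQuery.V2.Model.QueryRequest{
      query: "SELECT user_id, event_type FROM `my_dataset.events` LIMIT 10",
      useLegacySql: false
    }
  )

# Rows come back as cell lists; map them to plain values
rows = Enum.map(response.rows || [], fn row -> Enum.map(row.f, & &1.v) end)
```

Treat the exact module and field names as assumptions to verify against the `google_api_big_query` docs for the version you install; the generated client surface has changed across releases.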