EctoBigquery

Ecto adapter for Google BigQuery focused on schema management and data seeding: creating and evolving BigQuery datasets and tables through Ecto migrations, and loading initial data via the Seeds helper.

Why this library exists

I work as a data engineer and part of my day-to-day is maintaining BigQuery schemas across multiple environments. For a long time our team used Terraform for that, and while it works well for provisioning infrastructure, it falls short when it comes to schema evolution — ALTER TABLE operations like adding or dropping columns aren't natively supported, which forced us to write workarounds or apply changes manually.

Having spent years working with tools like Active Record and Ecto Migrations, I believe the migration pattern is simply the best way to manage a database schema: changes are versioned, ordered, reproducible, and reversible. That mental model transfers directly to BigQuery, so I built this library to bring it there.

Scope: This library is intentionally limited to DDL operations (mix ecto.create/drop/migrate/rollback) and seeding (INSERT/DELETE via EctoBigquery.Seeds). It does not implement a general-purpose query interface — Repo.all, Repo.insert, Repo.update, and Repo.delete are not supported for application tables. This is a deliberate design decision: the goal of this library is schema and seed management, not runtime data access.

Features

- Dataset and table management through Ecto migrations (create, alter, drop, rename)
- Standard Mix tasks: mix ecto.create, ecto.drop, ecto.migrate, ecto.rollback
- Migration generator with BigQuery-aware type aliases: mix ecto_bigquery.gen.migration
- Data seeding via EctoBigquery.Seeds (insert, atomic multi-table insert, delete, truncate)
- BigQuery-specific types: GEOGRAPHY, BIGNUMERIC, ARRAY and STRUCT columns
- Google Cloud authentication via Goth

Installation

Add ecto_bigquery to your dependencies in mix.exs:

def deps do
  [
    {:ecto_bigquery, "~> 0.1"}
  ]
end

Setup

1. Define your Repo

defmodule MyApp.Repo do
  use EctoBigquery.Repo, otp_app: :my_app
end

2. Configure your application

In config/config.exs:

config :my_app, MyApp.Repo,
  project_id: "my-gcp-project",
  dataset_id: "my_dataset",
  goth: MyApp.Goth

3. Add to your supervision tree

# lib/my_app/application.ex
children = [
  {Goth, name: MyApp.Goth, source: {:default, []}},
  MyApp.Repo
]
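
Put together, steps 1-3 might look like this in a full application module (a sketch; module names follow the examples above):

```elixir
# lib/my_app/application.ex
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Goth starts before the Repo so a token is available for BigQuery calls
      {Goth, name: MyApp.Goth, source: {:default, []}},
      MyApp.Repo
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```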

4. Authentication

EctoBigquery uses Goth for Google Cloud authentication.

Application Default Credentials (recommended for local development):

gcloud auth application-default login

Service Account (recommended for production):

{Goth,
  name: MyApp.Goth,
  source: {:service_account, Jason.decode!(File.read!("credentials.json"))}}
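
In production you may prefer not to hardcode the credentials path. One common pattern is to read it from an environment variable (a sketch; GOOGLE_APPLICATION_CREDENTIALS is the usual Google Cloud convention, adjust to your deployment):

```elixir
# In the supervision tree: load the service-account JSON from a path in an env var
credentials =
  System.fetch_env!("GOOGLE_APPLICATION_CREDENTIALS")
  |> File.read!()
  |> Jason.decode!()

{Goth, name: MyApp.Goth, source: {:service_account, credentials}}
```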

Migrations

Migrations use standard Ecto syntax. Supported operations:

Create table

defmodule MyApp.Repo.Migrations.CreateEvents do
  use Ecto.Migration

  def change do
    create table(:events) do
      add :user_id,    :string,       null: false
      add :event_type, :string
      add :amount,     :decimal
      add :active,     :boolean
      add :metadata,   :map
      timestamps()
    end
  end
end

Alter table

defmodule MyApp.Repo.Migrations.AlterEvents do
  use Ecto.Migration

  def change do
    alter table(:events) do
      add    :contact_email, :string
      remove :legacy_field
    end

    rename table(:events), :amount, to: :total_amount
  end
end

Drop table

defmodule MyApp.Repo.Migrations.DropEvents do
  use Ecto.Migration

  def change do
    drop table(:events)
  end
end

Type Mapping

Standard types

Ecto type          BigQuery type
:string            STRING
:integer           INTEGER
:float             FLOAT
:boolean           BOOLEAN
:decimal           NUMERIC
:date              DATE
:time              TIME
:naive_datetime    DATETIME
:utc_datetime      TIMESTAMP
:binary            BYTES
:map               JSON

Column modes

BigQuery columns support three modes, set with the mode: option as an alternative to null: false:

Option             BigQuery mode    SQL generated
mode: :nullable    NULLABLE         (no constraint)
mode: :required    REQUIRED         NOT NULL
mode: :repeated    REPEATED         ARRAY<T>

create table(:analytics) do
  add :user_id,    :string,  mode: :required   # STRING NOT NULL
  add :tags,       :string,  mode: :repeated   # ARRAY<STRING>
  add :notes,      :string,  mode: :nullable   # STRING (explicit)
end

mode: :repeated is a shorthand for {:array, type}. Both are equivalent:

add :tags, :string,          mode: :repeated   # ARRAY<STRING>
add :tags, {:array, :string}                   # ARRAY<STRING>

The generator supports :required, :repeated, and :nullable as modifiers:

mix ecto_bigquery.gen.migration create_analytics user_id:string:required tags:string:repeated notes:string:nullable

Special BigQuery types

Ecto type                         BigQuery type
:geography                        GEOGRAPHY
:bignumeric                       BIGNUMERIC
{:array, :string}                 ARRAY<STRING>
{:array, :integer}                ARRAY<INTEGER>
{:struct, [field: :type, ...]}    STRUCT<field TYPE, ...>
{:array, {:struct, [...]}}        ARRAY<STRUCT<...>>

create table(:locations) do
  add :point,        :geography
  add :exact_amount, :bignumeric
  add :tags,         {:array, :string}
  add :scores,       {:array, :integer}
  add :address,      {:struct, [street: :string, city: :string, zip: :string]}
  add :contacts,     {:array, {:struct, [name: :string, email: :string]}}
end

Generated SQL:

CREATE TABLE `locations` (
  point GEOGRAPHY,
  exact_amount BIGNUMERIC,
  tags ARRAY<STRING>,
  scores ARRAY<INTEGER>,
  address STRUCT<street STRING, city STRING, zip STRING>,
  contacts ARRAY<STRUCT<name STRING, email STRING>>
)

Seeds

Use EctoBigquery.Seeds in priv/repo/seeds.exs:

# Option A: via Repo (Goth must be running in the supervision tree)
EctoBigquery.Seeds.insert(MyApp.Repo, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob",   active: false}
])

# Atomic multi-table insert — rolls back all if any insert fails
EctoBigquery.Seeds.insert_all(MyApp.Repo, [
  {"users",    [%{id: "1", name: "Alice"}]},
  {"products", [%{sku: "P1", price: 9.99}]}
])

EctoBigquery.Seeds.delete(MyApp.Repo, "users", "active = FALSE")
EctoBigquery.Seeds.truncate(MyApp.Repo, "users")

# Option B: explicit connection (useful in standalone scripts)
config     = MyApp.Repo.config()
project_id = Keyword.get(config, :project_id)
dataset_id = Keyword.get(config, :dataset_id)
{token, 0} = System.cmd("gcloud", ["auth", "print-access-token"])
conn = EctoBigquery.Connection.new(String.trim(token))

EctoBigquery.Seeds.insert_with_conn(conn, project_id, dataset_id, "users", [
  %{id: "1", name: "Alice", active: true},
  %{id: "2", name: "Bob",   active: false}
])

EctoBigquery.Seeds.delete_with_conn(conn, project_id, dataset_id, "users", "active = FALSE")
EctoBigquery.Seeds.truncate_with_conn(conn, project_id, dataset_id, "users")

Migration generator

mix ecto_bigquery.gen.migration generates migration files with pre-filled column definitions based on the migration name pattern.

# Create table
mix ecto_bigquery.gen.migration create_users name:string age:integer active:boolean

# NOT NULL columns
mix ecto_bigquery.gen.migration create_products sku:string:required price:decimal:required

# Add column
mix ecto_bigquery.gen.migration add_email_to_users email:string:required

# Remove column
mix ecto_bigquery.gen.migration remove_phone_from_users phone

# Drop table
mix ecto_bigquery.gen.migration drop_legacy_data

# Special types
mix ecto_bigquery.gen.migration create_places location:geography amount:bignumeric "tags:array[string]"

Generator type aliases

Alias                     Ecto type
string, text              :string
integer, int              :integer
float                     :float
decimal, numeric          :decimal
boolean, bool             :boolean
date                      :date
datetime, utc_datetime    :utc_datetime
naive_datetime            :naive_datetime
binary                    :binary
map, json                 :map
geography                 :geography
bignumeric                :bignumeric
array[TYPE]               {:array, TYPE}

Note: STRUCT types must be written manually in the migration file since their nested fields cannot be expressed in a single CLI argument.
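
For example, after generating an empty migration, a STRUCT column can be added by hand (a sketch using the type syntax from the table above; table and field names are illustrative):

```elixir
defmodule MyApp.Repo.Migrations.AddAddressToUsers do
  use Ecto.Migration

  def change do
    alter table(:users) do
      # STRUCT<street STRING, city STRING, zip STRING>, written manually
      # because nested fields cannot be expressed in a single CLI argument
      add :address, {:struct, [street: :string, city: :string, zip: :string]}
    end
  end
end
```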

Mix tasks

# Create the BigQuery dataset
mix ecto.create

# Drop the dataset (deletes all tables)
mix ecto.drop

# Run pending migrations
mix ecto.migrate

# Revert the last migration
mix ecto.rollback

# Run seeds
mix run priv/repo/seeds.exs
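
The usual Ecto convention of grouping these into Mix aliases works here too; a sketch for mix.exs (the ecto.setup/ecto.reset alias names are a common community convention, not part of this library):

```elixir
# In mix.exs
defp aliases do
  [
    "ecto.setup": ["ecto.create", "ecto.migrate", "run priv/repo/seeds.exs"],
    "ecto.reset": ["ecto.drop", "ecto.setup"]
  ]
end
```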

Out of scope

The following are intentionally not supported:

- Repo.all, Repo.insert, Repo.update, and Repo.delete for application tables
- Any general-purpose runtime query interface; this library is limited to schema and seed management