Continuum
English | 简体中文
Continuum is a durable execution engine for Elixir. Write a multi-step business process as straight-line Elixir code; failures, restarts, and node death cause the workflow to resume exactly where it left off with identical state, by replaying its event history through the same pure orchestration code.
It is OTP-native and Postgres-backed — no separate cluster service, no paid SaaS dependency, no polyglot SDK. Continuum lives in your application's supervision tree and uses the database you already run.
Why Continuum
Continuum is to durable execution what Phoenix is to web and Oban is to job queues: the obvious answer to "how do I run a multi-step business process that survives a crash?" for Elixir-first teams.
- Straight-line code. Express orchestration as ordinary Elixir control
flow —
case,with, comprehensions. Effects go throughactivity/2,await signal, andtimer; everything else is pure. - Deterministic replay. A run re-executes from the top on every wake.
Structured cursor identity means any divergence between replay and the
original execution surfaces as a loud
Continuum.ReplayDriftError, never silent corruption. - One dependency. Postgres is the only thing you need to operate — it is
the journal, the lease store, the timer wheel, and the signal bus
(
LISTEN/NOTIFY). - It's just OTP. Continuum is a supervision tree you add to your own app. Crash recovery, leasing, and back-pressure are built on processes, not an external coordinator.
Deliberately out of scope: polyglot SDKs, cross-language activities, a separate cluster service, and Kubernetes operators.
Quickstart
defmodule MyApp.OrderFlow do
use Continuum.Workflow, version: 1
def run(%{order_id: id, items: items}) do
{:ok, validated} = activity Validation.check(items)
{:ok, charge} =
activity Payments.charge(id, validated.total),
retry: [max_attempts: 5, backoff: :exponential],
compensate: {Payments, :refund, [id]}
case await signal(:fraud_review, timeout: hours(24)) do
:approved -> activity Fulfillment.ship(id)
:rejected ->
compensate(charge)
{:error, :fraud_rejected}
:timeout -> activity Fulfillment.ship(id)
end
end
end
{:ok, run_id} = Continuum.start(MyApp.OrderFlow, %{order_id: "o1", items: [...]})
# from anywhere — durable mailbox, survives restarts
:ok = Continuum.signal(run_id, :fraud_review, :approved)
# blocks via PubSub with poll fallback
{:ok, %{state: :completed, result: result}} = Continuum.await(run_id, 30_000)
Installation
Add Continuum and a Postgres driver to your dependencies:
def deps do
[
{:continuum, "~> 0.5"},
{:postgrex, "~> 0.19"}
]
end
Point Continuum at your repo:
# config/config.exs
config :continuum, repo: MyApp.Repo, journal: Continuum.Runtime.Journal.Postgres
Generate and run the migration:
mix continuum.gen.migration --repo MyApp.Repo
mix ecto.migrate
Add Continuum's runtime children to your supervision tree, after your repo:
def start(_type, _args) do
children =
[
MyApp.Repo,
{Phoenix.PubSub, name: MyApp.PubSub}
] ++
Continuum.children() ++
[MyAppWeb.Endpoint]
Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
end
Features
Determinism by construction
- Workflow code is pure-by-construction and re-executed top-to-bottom on every wake; only effects produce side-visible work.
- A compile-time AST scanner rejects non-deterministic calls
(
DateTime.utc_now,:rand.*,:ets.*,Process.send,Kernel.apply, …) with remediation hints. Helper modules opt in viause Continuum.Pureor aconfig :continuum, trusted_modules: [...]allowlist. - Deterministic primitives —
Continuum.now/0,today/0,uuid4/0,random/0, and theside_effect/1escape hatch — capture stable cursor identity at compile time.
Durable execution
- Postgres journal with lease + fencing-token CAS on every write. A stolen lease produces a write failure and terminates the stale engine — it never corrupts history.
- Built-in activity worker pool (no Oban dependency):
FOR UPDATE SKIP LOCKEDclaim, exponential backoff, per-task fencing, and an atomic result-and-task commit, with retry/timeout policy viause Continuum.Activity. - Durable timers and signals over
pg_notify/LISTEN.await signal(name, timeout: ms)resolves the signal/timeout race deterministically. - Crash survival. Kill the engine pid mid-flight; the dispatcher re-leases the run and replay completes from the journaled history. Boot-time recovery rescues orphaned runs, tasks, and timers without stealing live remote leases.
- Cross-run idempotency keyed on
(activity_module, idempotency_key), so activities are exactly-once-ish across runs.
Workflow composition
- Sagas / compensation — attach
compensate:to an activity, thencompensate/1orcompensate_all/0to roll back completed work in deterministic LIFO (or parallel) order. - Parent/child workflows —
await child Mod.run(input),start_child/3, andawait_child/1for durable fan-out/fan-in. continue_as_new/1— complete the current run and start a successor with fresh history for long-running loops.- Workflow versioning — journaled
Continuum.patched?/1markers for safe in-place edits, and content-addressed(workflow, version_hash)dispatch that marks missing code:stuck_unknown_versionrather than replaying through changed logic.
Operations & observability
Continuum.Observer— an optional Phoenix LiveView with a runs index, a decoded per-run event timeline, and operator actions for cancelling a run and injecting a signal.Continuum.OpenTelemetry— an opt-in bridge that turns Continuum telemetry intorun_attempt/activity_attemptspans, linked back through a persisted W3Ctraceparent.- 24+ documented telemetry events under the
[:continuum, …]prefix. - Operator tooling — monthly-partitioned events, opt-in history snapshots,
the read-only
mix continuum.audit, and dry-run-by-default cleanup tasks.
Multi-tenancy & clustering
- Named multi-instance runtimes via
Continuum.children(name:, repo:), each bound to its own Ecto repo. - Namespaces — a soft tenant boundary for list/query; single-run operations
stay keyed by global
run_id. - Search attributes and structured queries —
attributes:/Continuum.set_attributes/3plusContinuum.query/1,2. - Cluster-aware wake routing over
:pgfor cross-node wakeups. The Postgres lease and fencing token remain the sole authority for writes.
Testing
Continuum.Test provides an in-memory journal for fast unit tests, Postgres
helpers for integration tests, signal/timer injection, golden-history replay,
and an opt-in paranoid re-replay mode that catches divergence.
Parent/child example
defmodule MyApp.BatchFlow do
use Continuum.Workflow, version: 1
def run(%{order_ids: ids}) do
ids
|> Enum.map(fn id ->
start_child MyApp.OrderFlow, %{order_id: id}, id: id
end)
|> Enum.map(&await_child/1)
end
end
Observer
The optional Continuum.Observer LiveView lists runs, renders the journal
event timeline per run, and exposes operator actions for cancelling a run and
sending a signal. It is mounted from a host Phoenix router and ships no
authentication of its own — wrap it in your existing admin pipeline.
import Continuum.Observer.Router
scope "/admin" do
pipe_through [:browser, :authenticate_admin]
continuum_observer "/continuum", instance: :myapp_continuum
end
To see the UI locally, the repo bundles a self-contained demo:
docker compose up -d
MIX_ENV=test iex -S mix run dev/observer_demo.exs
# then open http://localhost:4000/continuum
The demo seeds three runs in different states and prints iex helpers for
spawning more, sending signals, and cancelling. See
guides/observer.md for production mount instructions.
Documentation
Full docs are published on HexDocs. The guides cover the entire surface:
- Your first workflow
- Activities, retries, and idempotency
- Determinism rules and replay drift
- Sagas and compensation · Child workflows · Long-running workflows
- Patching workflows · Workflow versioning
- Multi-instance Continuum · Clustering · Namespaces
- Search attributes and structured queries
- Operations · Auditing · Observer · Observability (OpenTelemetry) · Snapshots
See examples/continuum_example_orders
for a Phoenix app exercising activity → signal/timeout → compensation,
parent/child batches, continue_as_new, per-workflow snapshots, namespaces,
the Observer, and OpenTelemetry.
Upgrading? See the migration guides .
Status
Continuum is v0.5 (pre-1.0). The durable engine, determinism enforcement,
workflow composition, observability, and clustering surface are implemented and
covered by tests, including crash-resume, lease-fencing races, and
property-based replay. APIs may still change before 1.0 — pin to a specific
0.x in production. See CHANGELOG.md for release history.
Development
A docker-compose.yml brings up Postgres for local development and tests.
mix deps.get
docker compose up -d # Postgres on localhost:5432
mix compile --warnings-as-errors
mix test # unit + integration suite
mix test.cluster # real :peer cluster tests (run separately)
mix format
License
Copyright 2026 The Continuum Authors. (yyeger)
Licensed under the Apache License, Version 2.0.