Squidie



Squidie is an embedded durable workflow runtime for Elixir applications.

Define workflow modules, persist runs in your application database, and execute visible work from host-owned workers with Squidie.execute_next/1.

{:ok, run} =
  Squidie.start(MyApp.Workflows.Checkout, :manual, %{order_id: "order_123"})

{:ok, _snapshot} = Squidie.execute_next(owner_id: "checkout-worker-1")

Squidie stores workflow state, step attempts, retries, approvals, transitions, audit events, and recovery history in the host application's database. It does not run as a separate service, broker, or orchestration cluster.

The host application keeps its supervision tree, deployment model, repository, schedulers, queue backend, and operator surfaces. Squidie owns workflow progression, transition routing, retry semantics, pause and approval handling, replay and recovery policy, durable execution history, and graph inspection.

Queue delivery, worker supervision, and backend leasing remain host-owned concerns. Storage portability is defined by the journal storage adapter contract; the production relational implementation uses a Postgres-compatible Ecto adapter. See the storage strategy for adapter guarantees.

Adoption status

Squidie provides a supported 0.1.x journal runtime for embedded host-app workflows.

Treat production rollout as an application-owned integration: run the host-app smoke and resilience checks, review the operational boundaries, and adopt the queue/leasing strategy that matches your deployment. See Production Readiness for the current baseline.

Start Here

The fastest way to start is the guided Livebook. It demonstrates creating a workflow, starting a durable run, executing work, and inspecting the result.


| Goal | Resource |
| --- | --- |
| Find the right guide | Documentation guide |
| Run a guided interactive example | Getting Started Livebook |
| Integrate Squidie into an existing application | Getting Started guide |
| Review a complete working example | Minimal host app |
| Add backend-owned delivery and leases | Bedrock minimal host app |
| Review a small external OTP host app | The Beacon |

The written guide covers installation, workflow creation, execution, run inspection, retries, manual gates, cron triggers, and Bedrock-backed leases. The Beacon is a compact OTP host application that uses Squidie for scheduled monitoring notifications with host-owned Bedrock delivery, cron scheduling, Discord webhooks, and file-backed seen-state.

Getting Started

Documentation and examples:

| Reference | Description |
| --- | --- |
| Getting Started | Setup and first workflow run |
| Workflow Authoring | Triggers, steps, transitions, retries, and compensation |
| Host App Integration | Phoenix and OTP integration |
| Reference Workflows | Approval, recovery, saga, and cron examples |
| Minimal Host App | Executable example application |
| Bedrock Minimal Host App | Backend-owned delivery with leases and retry requeue |
| Architecture | Runtime flow and component boundaries |

Installation

Add Squidie to your dependencies:

defp deps do
  [
    {:squidie, "~> 0.1.2"}
  ]
end

Configure the repo and default queue:

config :squidie,
  repo: MiddleEarth.Repo,
  queue: "default"

Install and run the migration:

mix deps.get
mix squidie.install
mix ecto.migrate

To keep workflow modules formatted consistently as DSL-style declarations, import Squidie formatter rules in .formatter.exs:

[
  import_deps: [:squidie],
  inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]

Finally, start one host-owned executor loop. The loop is not a separate Squidie service; it is just a supervised process in your application that asks Squidie for the next visible workflow attempt.

This example uses a GenServer because it is a small OTP shape for scheduling the next drain. A queue worker, cron process, or existing host scheduler can own the same Squidie.execute_next/1 call. Hosts can use Bedrock, Oban, a custom queue, or any other executor they already operate:

defmodule MyApp.SquidieWorker do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts) do
    owner_id = Keyword.get(opts, :owner_id, "my-app-squidie")
    {:ok, %{owner_id: owner_id}, {:continue, :drain}}
  end

  @impl true
  def handle_continue(:drain, state), do: {:noreply, drain_once(state)}

  @impl true
  def handle_info(:drain, state), do: {:noreply, drain_once(state)}

  # Run one attempt, then schedule the next drain based on the outcome.
  defp drain_once(state) do
    interval =
      case Squidie.execute_next(owner_id: state.owner_id) do
        {:ok, :none} -> 100
        {:ok, _snapshot} -> 0
        {:error, _reason} -> 1_000
      end

    Process.send_after(self(), :drain, interval)
    state
  end
end

Add capacity limits, metrics, shutdown policy, and placement rules around the same Squidie.execute_next/1 boundary. See Host App Integration for the full host shape.
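
As one illustration of a capacity limit at that boundary, the sketch below drains at most a fixed number of attempts per tick. `BoundedDrain` is a hypothetical host-side helper, not part of Squidie; it takes the executor as an injected zero-arity function and assumes only the three result shapes matched in the worker above.

```elixir
defmodule BoundedDrain do
  @moduledoc """
  Hypothetical host-side helper (not part of Squidie): drains at most `limit`
  attempts per tick through an injected zero-arity function.
  """

  # `execute_next` stands in for `fn -> Squidie.execute_next(owner_id: id) end`.
  def run(execute_next, limit)
      when is_function(execute_next, 0) and is_integer(limit) and limit >= 0 do
    drain(execute_next, limit, 0)
  end

  # Capacity reached: report how many attempts ran so the caller can reschedule.
  defp drain(_execute_next, limit, count) when count >= limit do
    {:limit_reached, count}
  end

  defp drain(execute_next, limit, count) do
    case execute_next.() do
      {:ok, :none} -> {:idle, count}
      {:ok, _snapshot} -> drain(execute_next, limit, count + 1)
      {:error, reason} -> {:error, reason, count}
    end
  end
end
```

A worker could call `BoundedDrain.run(fn -> Squidie.execute_next(owner_id: id) end, 25)` and pick its next poll interval from the returned tag; the limit of 25 is only an example value.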

Optional: Bedrock Job Runner And Leases

Use Bedrock when the host application needs backend-owned delivery, delayed visibility, job leases, heartbeat/lease extension, retry requeue, and recovery. Keep workflow modules backend-neutral; Bedrock belongs behind host adapter modules.

If the supervised worker loop above can call Squidie.execute_next/1 often enough for your workload, start there. Add Bedrock only when the host needs a durable job backend for payload delivery, delayed visibility, worker leases, and redelivery after worker or node failure.

1. Configure Squidie

Point Squidie at the host repo. Use the same queue your host payload worker passes to Squidie.execute_next/1:

config :squidie,
  repo: MyApp.Repo,
  queue: "tenant_a"

2. Configure Payload Delivery

Keep the delivery adapter in the host app. It maps Squidie cron activations or drain requests into Bedrock jobs:

config :my_app, MyApp.SquidieDeliveryAdapter,
  queue_id: "tenant_a",
  topic: "squidie:payload"

3. Start The Host Runtime

Start the repo, Bedrock cluster, and Bedrock queue under the host supervision tree:

children = [
  MyApp.Repo,
  {MyApp.BedrockCluster, []},
  {MyApp.JobQueue, concurrency: 5, batch_size: 10}
]

Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)

4. Add A Delivery Adapter

The adapter owns Bedrock job enqueueing. Workflow modules should not know Bedrock exists:

defmodule MyApp.SquidieDeliveryAdapter do
  alias Squidie.Executor.Payload

  def enqueue_cron(_config, workflow, trigger, opts) do
    payload =
      Payload.cron(
        workflow,
        trigger,
        Keyword.take(opts, [:signal_id, :intended_window])
      )

    MyApp.JobQueue.insert(%{
      topic: "squidie:payload",
      queue_id: "tenant_a",
      payload: payload,
      scheduled_in: opts[:schedule_in]
    })
  end
end

5. Add A Host Payload Worker

Bedrock leases a payload job and invokes the host callback. From there, the code is host-owned: perform/2 delivers the Squidie payload, then this example runs a bounded drain loop while the Bedrock job lease is held.

The loop comes from the host callback, not from Bedrock. A host can call Squidie.execute_next/1 once per job instead; bounded draining is just a capacity choice for this example:

flowchart LR
    Bedrock["Bedrock leases payload job"] --> Callback["Host perform/2 callback"]
    Callback --> Payload["Runner.perform(payload)"]
    Callback --> Choice{"Host drain choice"}
    Choice --> Once["execute_next/1 once"]
    Choice --> Loop["bounded execute_next/1 loop"]
    Once --> Journal["Squidie journal"]
    Loop --> Journal

defmodule MyApp.Jobs.SquidiePayload do
  use Bedrock.JobQueue.Job,
    topic: "squidie:payload",
    max_retries: 3,
    priority: 100

  alias Squidie.Runtime.Runner

  def perform(payload, _meta) when is_map(payload) do
    case Runner.perform(payload) do
      :ok -> drain_journal_attempts("tenant_a", 0)
      {:ok, _snapshot} -> drain_journal_attempts("tenant_a", 0)
      {:error, reason} -> {:error, reason}
    end
  end

  # Stop after 50 attempts so one job cannot hold its lease indefinitely.
  defp drain_journal_attempts(_queue, 50) do
    {:error, :journal_drain_limit_exceeded}
  end

  defp drain_journal_attempts(queue, count) do
    case Squidie.execute_next(
           queue: queue,
           owner_id: "my-app-bedrock-worker",
           heartbeat_interval_ms: 10_000
         ) do
      {:ok, :none} -> :ok
      {:ok, _snapshot} -> drain_journal_attempts(queue, count + 1)
      {:error, reason} -> {:error, reason}
    end
  end
end

6. Configure Both Lease Layers

The Bedrock lease protects job delivery. The Squidie heartbeat protects the workflow attempt claimed by execute_next/1:

config :my_app, MyApp.Jobs.SquidiePayload,
  journal_heartbeat_interval_ms: 10_000,
  max_journal_attempts: 50

Do not enqueue one Bedrock job per workflow step, and do not model workflow step retries as Bedrock job retries. A normal step failure, retry, or terminal run is durable Squidie state returned by Squidie.execute_next/1.

Treat {:ok, snapshot} from execute_next/1 as successful host-worker progress even when the snapshot describes a failed workflow run. Return {:error, reason} to Bedrock only when payload delivery or the host drain itself failed and should be redelivered.
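
The rule above can be written as a small mapping from a drain result to a job outcome. This is a minimal sketch: `HostJobOutcome` is a hypothetical helper name, and the `%{status: :failed}` snapshot shape in the usage note is an assumption made only for illustration.

```elixir
defmodule HostJobOutcome do
  @moduledoc """
  Hypothetical helper (not part of Squidie or Bedrock): turns an
  execute_next/1-shaped result into the value a job callback returns.
  """

  # Nothing left to run: the job succeeded.
  def from_drain({:ok, :none}), do: :ok

  # Any snapshot is host-worker progress, even one describing a failed run.
  # The snapshot is deliberately not inspected here.
  def from_drain({:ok, _snapshot}), do: :ok

  # Only delivery or drain failures are surfaced for redelivery.
  def from_drain({:error, reason}), do: {:error, reason}
end
```

With this mapping, `HostJobOutcome.from_drain({:ok, %{status: :failed}})` returns `:ok`, so the backend does not redeliver a job whose only "failure" is durable workflow state.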

For the concrete setup, see Bedrock Lease Backend Setup and the Bedrock Minimal Host App.

Workflows

Workflows are Elixir modules. A trigger declares the entrypoint and validates the payload before the run is persisted.

Steps declare their inputs, outputs, retry policy, and compensation behavior. Transitions wire them together.

This workflow demonstrates manual gates, approval flows, conditional routing, retries, saga compensation, and irreversible steps:

defmodule MiddleEarth.Workflows.RingErrand do
  use Squidie.Workflow

  workflow do
    trigger :leave_shire do
      manual()

      payload do
        field :bearer, :string, default: "Frodo"
        field :ring_id, :string
        field :route_preference, :string, default: "moria"
      end
    end

    step :pack_provisions, Hobbiton.Steps.PackProvisions,
      output: :provisions

    step :hide_at_prancing_pony, :pause

    approval_step :council_vote,
      output: :council,
      deadline: [within: 300_000, due_soon: 60_000, escalation: :operator_action]

    step :choose_path, Rivendell.Steps.ChoosePath,
      input: [bearer: [:bearer], decision: [:council, :decision]],
      output: :route

    step :cross_moria, Fellowship.Steps.CrossMoria,
      input: [:bearer, :provisions, :route],
      retry: [max_attempts: 3, backoff: [type: :exponential]],
      deadline: [within: 30_000, due_soon: 5_000, escalation: :diagnostic]

    step :reserve_eagle, Eagles.Steps.ReserveRide,
      compensate: Eagles.Steps.CancelRide

    step :toss_ring, Mordor.Steps.TossRing,
      irreversible: true

    transition :pack_provisions, on: :ok, to: :hide_at_prancing_pony
    transition :hide_at_prancing_pony, on: :ok, to: :council_vote
    transition :council_vote, on: :ok, to: :choose_path
    transition :choose_path, on: :ok, to: :cross_moria
    transition :cross_moria, on: :ok, to: :reserve_eagle
    transition :cross_moria, on: :error, to: :complete, recovery: :undo
    transition :reserve_eagle, on: :ok, to: :toss_ring
    transition :toss_ring, on: :ok, to: :complete
  end
end

Steps and approvals can declare diagnostic deadlines with deadline: [...]. Squidie persists the due timestamps in runnable and manual-control facts and surfaces evaluated states such as :on_time, :due_soon, :overdue, and :escalated through list_runs/2, inspect_run/2, inspect_run_graph/2, and explain_run/2. Alert delivery, paging, and operator escalation remain host-owned; the runtime only records durable deadline evidence and safe next actions.
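
To make the evaluated states concrete, here is a rough sketch of how a host dashboard might reason about them. It rests on assumptions that the text above does not confirm: `within` is read as the total budget in milliseconds and `due_soon` as a warning window before that deadline. `DeadlineSketch` is a hypothetical module, and `:escalated` is left out because it depends on recorded escalation evidence rather than on elapsed time alone.

```elixir
defmodule DeadlineSketch do
  @moduledoc """
  Hypothetical illustration only. Assumes `within` is the total budget (ms)
  and `due_soon` is the warning window (ms) before the deadline; Squidie's
  own evaluation rules may differ.
  """

  def state(elapsed_ms, opts) when is_integer(elapsed_ms) and elapsed_ms >= 0 do
    within = Keyword.fetch!(opts, :within)
    due_soon = Keyword.fetch!(opts, :due_soon)

    cond do
      # The budget is used up.
      elapsed_ms >= within -> :overdue
      # Inside the warning window that precedes the deadline.
      elapsed_ms >= within - due_soon -> :due_soon
      true -> :on_time
    end
  end
end
```

Under these assumptions, the `:council_vote` deadline from the workflow above (`within: 300_000, due_soon: 60_000`) would read as `:on_time` for the first four minutes and `:due_soon` for the last one.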

Cron-triggered workflows use scheduling declarations:

defmodule Gondor.Workflows.BeaconWatch do
  use Squidie.Workflow

  workflow do
    trigger :nightly_beacon_check do
      cron "0 21 * * *", timezone: "Etc/UTC"

      payload do
        field :beacon_count, :integer, default: 7
      end
    end

    step :inspect_hilltops, Gondor.Steps.InspectHilltops,
      retry: [max_attempts: 3]

    step :light_beacon, Gondor.Steps.LightBeacon,
      compensate: Gondor.Steps.ExtinguishBeacon

    transition :inspect_hilltops, on: :ok, to: :light_beacon
    transition :light_beacon, on: :ok, to: :complete
  end
end

Dependency-based workflows use after: [...] for parallel execution:

defmodule Gondor.Workflows.ParallelAttack do
  use Squidie.Workflow

  workflow do
    trigger :start do
      manual()
    end

    step :march_to_gate, Gondor.Steps.MarchToGate
    step :rally_rohan, Rohan.Steps.RallyArmy
    step :distract_sauron, Fellowship.Steps.DistractEnemy

    step :declare_victory, Gondor.Steps.DeclareVictory,
      after: [:march_to_gate, :rally_rohan, :distract_sauron]
  end
end
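
The idea behind `after: [...]` can be sketched without Squidie at all: a step becomes runnable once every prerequisite has completed, and steps with no prerequisites are runnable together. `DependencySketch` is a hypothetical module that models this readiness rule on a plain map; it is not how Squidie plans runnables internally.

```elixir
defmodule DependencySketch do
  @moduledoc """
  Hypothetical illustration of dependency-based readiness.
  `deps` maps each step to the list of steps it must run after.
  """

  def ready(deps, completed) when is_map(deps) and is_list(completed) do
    done = MapSet.new(completed)

    deps
    |> Enum.filter(fn {step, prereqs} ->
      # Runnable: not finished yet, and every prerequisite is finished.
      not MapSet.member?(done, step) and Enum.all?(prereqs, &MapSet.member?(done, &1))
    end)
    |> Enum.map(fn {step, _prereqs} -> step end)
    |> Enum.sort()
  end
end
```

For the workflow above, the three independent steps are ready immediately, and `:declare_victory` only appears once all three are in the completed list.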

Running Workflows

Start a workflow run:

{:ok, run} =
  Squidie.start(
    MiddleEarth.Workflows.RingErrand,
    :leave_shire,
    %{ring_id: "one-ring"}
  )

Inspect a run with full history:

Squidie.inspect_run(run.run_id, include_history: true)

Get an operator-facing explanation:

{:ok, explanation} = Squidie.explain_run(run.run_id)
explanation.reason #=> :waiting_for_retry
explanation.evidence.command_counts #=> %{"start_run" => 1, "cancel_run" => 2}

The explain_run/2 function summarizes the current state, valid next actions, and supporting evidence for dashboards and operational tooling.

Approvals and Manual Gates

Pause steps and approval steps block progression until explicitly resolved:

# Resume a paused step
Squidie.resume(run.run_id, %{actor: "strider", reason: "ready to proceed"})

# Approve or reject an approval gate
Squidie.approve(run.run_id, %{actor: "elrond", note: "approved"})
Squidie.reject(run.run_id, %{actor: "elrond", note: "rejected"})

For idempotent command delivery, use explicit runtime signals:

alias Squidie.Runtime.Signal

{:ok, signal} =
  Signal.approve_run(run.run_id, %{actor: "elrond", note: "approved"},
    idempotency_key: "approval-#{run.run_id}"
  )

{:ok, approved_run} = Squidie.apply_signal(signal)

Reusing an idempotency key returns the existing result without creating duplicate command receipts. Approval steps persist their resolved targets and output metadata, surviving deploys and restarts.
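
The idempotency behavior can be pictured with a small in-memory model. `ReceiptSketch` is a hypothetical stand-in for durable command receipts, not Squidie's storage: the first call under a key runs the command and stores its result, and any later call under the same key returns the stored result without running the command again.

```elixir
defmodule ReceiptSketch do
  @moduledoc """
  Hypothetical in-memory model of idempotent command delivery.
  `receipts` is a plain map from idempotency key to the stored result.
  """

  def apply_once(receipts, key, command)
      when is_map(receipts) and is_function(command, 0) do
    case Map.fetch(receipts, key) do
      # Key already seen: return the existing result, do not run the command.
      {:ok, result} ->
        {result, receipts}

      # First delivery: run the command and remember its result under the key.
      :error ->
        result = command.()
        {result, Map.put(receipts, key, result)}
    end
  end
end
```

A retried delivery therefore observes the same result as the first one, which is the property a host relies on when it redelivers signals after a timeout.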

Compensation and Recovery

Workflow authors can mark completed side effects as compensatable so operators and host tools can see the rollback contract when later work fails:

step :borrow_rope, Lothlorien.Steps.BorrowRope,
  compensate: Lothlorien.Steps.ReturnRope

step :reserve_eagle, Eagles.Steps.ReserveRide,
  compensate: Eagles.Steps.CancelRide

step :cross_moria, Fellowship.Steps.CrossMoria,
  retry: [max_attempts: 3]

A failed :cross_moria exposes the completed compensatable steps and their declared callbacks through inspect_run/2, inspect_run_graph/2, and explain_run/2. The callback metadata is persisted with each runnable so dashboards can show rollback availability even if the workflow module changes.

For side effects that cannot be reversed, mark steps as irreversible: true or compensatable: false. Squidie exposes these boundaries during inspection and blocks replay by default after irreversible execution.

Child Workflows

Steps can spawn child workflow runs for dynamic work expansion:

defmodule Hobbiton.Steps.SendInvites do
  use Squidie.Step, name: :send_invites

  @impl true
  def run(%{party_id: party_id, guests: guests}, %Squidie.Step.Context{} = context) do
    children =
      for guest <- guests do
        {:ok, child} =
          Squidie.start_child_run(
            context,
            Hobbiton.Workflows.DeliverInvite,
            %{party_id: party_id, guest_id: guest.id},
            child_key: "invite_#{guest.id}"
          )

        child.run_id
      end

    {:ok, %{child_run_ids: children}}
  end
end

Each child run has independent inspection, retry, replay, and cancellation. Repeating the same child_key returns the existing child instead of creating duplicates.

Inspectable Dynamic Work

Host code can preview, record, or schedule bounded dynamic work for an active run. Preview is read-only, record persists inspection metadata, and schedule persists the same dynamic-work fact while planning executable runnable intents:

registry = %{"digest.deliver" => MyApp.Steps.DeliverDigest}

{:ok, preview} =
  Squidie.preview_dynamic_work(
    run.run_id,
    %{
      dynamic_key: "subscription_digest_fanout",
      origin: %{
        runnable_key: "run_123:schedule_digest:1",
        step: "schedule_digest",
        attempt: 1
      },
      reason: :runtime_fanout,
      nodes: [
        %{id: "deliver_digest:chat_1", action: "digest.deliver"}
      ]
    },
    action_registry: registry
  )

preview.origin_node_id
preview.added_node_ids
preview.added_edge_ids
preview.recordable?
preview.graph.nodes

After previewing, choose one durable write path. Use record_dynamic_work/3 when the dynamic structure should be inspectable only:

{:ok, snapshot} =
  Squidie.record_dynamic_work(
    run.run_id,
    %{
      dynamic_key: "subscription_digest_fanout",
      origin: %{
        runnable_key: "run_123:schedule_digest:1",
        step: "schedule_digest",
        attempt: 1
      },
      reason: :runtime_fanout,
      nodes: [
        %{id: "deliver_digest:chat_1", action: "digest.deliver"}
      ]
    },
    action_registry: registry
  )

Use schedule_dynamic_work/3 instead when the dynamic nodes should execute:

{:ok, snapshot} =
  Squidie.schedule_dynamic_work(
    run.run_id,
    %{
      dynamic_key: "subscription_digest_fanout",
      origin: %{
        runnable_key: "run_123:schedule_digest:1",
        step: "schedule_digest",
        attempt: 1
      },
      reason: :runtime_fanout,
      nodes: [
        %{
          id: "deliver_digest:chat_1",
          action: "digest.deliver",
          input: %{subscription_id: "sub_123"}
        }
      ]
    },
    action_registry: registry
  )

Think of dynamic work as a late graph patch attached to an already-applied runnable. The three public calls all validate the same proposal; they differ in how much of that proposal becomes durable.

| Call | Journal write | Runnable work | Best fit |
| --- | --- | --- | --- |
| preview_dynamic_work/3 | None | None | Show the proposed graph change before committing it |
| record_dynamic_work/3 | Inspection fact | None | Make generated structure visible to operators and dashboards |
| schedule_dynamic_work/3 | Inspection fact and runnable intents | Yes | Add executable dynamic nodes to the run |

flowchart LR
    Origin[Applied origin runnable] --> Proposal[Dynamic work proposal]
    Proposal --> Preview[preview_dynamic_work/3]
    Proposal --> Record[record_dynamic_work/3]
    Proposal --> Schedule[schedule_dynamic_work/3]
    Preview --> Overlay[Graph overlay]
    Record --> Fact[Durable inspection fact]
    Schedule --> Fact
    Schedule --> Intents[Runnable intents]
    Intents --> Executor[execute_next/1]

Every proposal is checked against the current run snapshot:

| Rule | Why it matters |
| --- | --- |
| Stable dynamic_key, node ids, and optional edge ids | Prevents duplicate or drifting graph patches |
| Origin metadata with runnable key, step, and attempt | Ties the patch to the work that produced it |
| Applied origin runnable for scheduling | Prevents executable work from appearing before its producer finished |
| :action_registry for scheduling | Keeps executable action keys behind a host-owned allowlist |
| Terminal run rejection | Keeps completed runs closed to new work |
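
A host that builds proposals programmatically may want to pre-check a few of these rules before calling Squidie. The sketch below is a hypothetical client-side check, not Squidie's validator: the module name, the error atoms, and the set of terminal statuses (`:completed`, `:failed`, `:cancelled`) are all assumptions chosen for illustration.

```elixir
defmodule ProposalSketch do
  @moduledoc """
  Hypothetical pre-check for a dynamic work proposal. Status names and
  error atoms are assumptions, not Squidie's actual values.
  """

  @terminal [:completed, :failed, :cancelled]
  @origin_keys [:runnable_key, :step, :attempt]

  def validate(%{origin: origin, nodes: nodes}, run_status)
      when is_map(origin) and is_list(nodes) do
    ids = Enum.map(nodes, & &1.id)

    cond do
      # Terminal runs stay closed to new work.
      run_status in @terminal -> {:error, :terminal_run}
      # The patch must be tied to the runnable that produced it.
      not Enum.all?(@origin_keys, &Map.has_key?(origin, &1)) -> {:error, :incomplete_origin}
      # Node ids must be unique within the proposal.
      length(Enum.uniq(ids)) != length(ids) -> {:error, :duplicate_node_id}
      true -> :ok
    end
  end
end
```

A pre-check like this only shortens the feedback loop for the caller; the runtime's own validation against the current run snapshot remains the authority.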

Preview returns normalized dynamic work plus a graph overlay. Visual editors get stable metadata from that overlay: producer node id, added node ids, added edge ids, whether recording would append a durable fact, and warnings such as duplicate dynamic work.

Recording and scheduling are alternatives, not a promotion flow. Recording stores only the inspection fact. Scheduling stores that fact and the runnable intents in one run-thread write; the normal execute_next/1 path then claims, executes, retries, applies, and inspects the dynamic attempts.

Executable dynamic nodes must use approved action keys and may include an input map for the attempt. They can opt into persisted retry with retry: [max_attempts: n]. Dynamic edges are graph-inspection metadata for now; scheduled dynamic nodes are queued as independent runnable intents.

Dynamic steps are replay-unsafe by default and require manual review before irreversible replay. Scheduling an already-recorded node with the same id is rejected by duplicate-node validation.

inspect_run_graph/2 also exposes dynamic_work_overlays so dashboards and visual editors can show producer nodes, added node ids, and added edge ids without reconstructing them from raw dynamic-work records.

Long-Running Steps

Workers can ask the journal executor to renew the active claim while a step is running:

Squidie.execute_next(
  owner_id: "billing-worker-1",
  lease_for: 30,
  heartbeat_interval_ms: 10_000
)

The executor keeps raw claim tokens internal. Durable heartbeat entries store only the claim-token hash and are fenced by the same claim id and token used for completion or failure. The minimum heartbeat interval is 50ms; production workers should choose a much larger interval relative to lease_for.
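
A host can sanity-check its own settings before starting workers. The sketch below is hypothetical and makes two assumptions the text does not state: that `lease_for` is expressed in seconds, and that a reasonable house rule is for the heartbeat to fire at least twice within one lease. Only the 50ms minimum comes from the paragraph above.

```elixir
defmodule HeartbeatSketch do
  @moduledoc """
  Hypothetical host-side configuration check. Assumes `lease_for` is in
  seconds and uses a "two heartbeats per lease" rule of thumb that is a
  house policy, not a Squidie requirement.
  """

  @min_interval_ms 50

  def check(lease_for_s, heartbeat_ms)
      when is_integer(lease_for_s) and lease_for_s > 0 and is_integer(heartbeat_ms) do
    lease_ms = lease_for_s * 1_000

    cond do
      # Documented lower bound on the heartbeat interval.
      heartbeat_ms < @min_interval_ms -> {:error, :below_minimum_interval}
      # Assumed policy: leave room for at least two renewals per lease.
      heartbeat_ms * 2 > lease_ms -> {:error, :too_close_to_lease}
      true -> :ok
    end
  end
end
```

Under these assumptions, the values shown above (`lease_for: 30`, `heartbeat_interval_ms: 10_000`) pass the check, since three heartbeats fit inside one lease.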

Runtime-Authored Specs

Host-owned editors or databases can activate validated workflow specs without runtime code generation. Use stable action keys, resolve them through an allowlist, then start the resolved spec through the public API:

registry = %{"digest.record_delivery" => MyApp.Steps.RecordDigestDelivery}

:ok = Squidie.Workflow.validate_spec(spec, action_registry: registry)

{:ok, run} =
  Squidie.start_spec(spec, :manual_digest, payload,
    action_registry: registry
  )

Squidie persists the resolved definition with the run so workers and inspect_run_graph/2 can inspect and execute it later. Replay for runtime-authored spec runs is intentionally rejected until that lifecycle is supported.
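
The allowlist idea itself is simple enough to sketch independently of Squidie: every action key in a spec must map to a module the host has explicitly registered, and the first unknown key rejects the whole spec. `AllowlistSketch` and its error tuple are hypothetical; Squidie's own resolution and error shapes may differ.

```elixir
defmodule AllowlistSketch do
  @moduledoc """
  Hypothetical illustration of resolving action keys through a host-owned
  allowlist. Not Squidie's resolver.
  """

  def resolve(action_keys, registry) when is_list(action_keys) and is_map(registry) do
    result =
      Enum.reduce_while(action_keys, {:ok, []}, fn key, {:ok, acc} ->
        case Map.fetch(registry, key) do
          # Known key: keep the module the host registered for it.
          {:ok, module} -> {:cont, {:ok, [module | acc]}}
          # Unknown key: stop at once and reject the spec.
          :error -> {:halt, {:error, {:unknown_action, key}}}
        end
      end)

    # Modules were accumulated in reverse; restore the original order.
    with {:ok, modules} <- result, do: {:ok, Enum.reverse(modules)}
  end
end
```

Because the registry is a plain map owned by the host, a spec stored in a database can only ever name code the application has chosen to expose.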

Visual-editor JSON can use the same host-owned action allowlist before a draft graph with top-level action keys is accepted:

:ok = Squidie.Workflow.EditorSpec.validate_map(editor_map, action_registry: registry)
{:ok, graph} = Squidie.Workflow.EditorSpec.preview_graph(editor_map, action_registry: registry)
{:ok, diff} = Squidie.Workflow.EditorSpec.diff(source_spec, editor_map, action_registry: registry)

These editor APIs still validate, preview, and compare data only. Starting a runtime-authored run remains the separate start_spec/3 or start_spec/4 boundary.

Cancellation, Replay, and Listing

{:ok, running_runs} = Squidie.list_runs(status: :running)

{:ok, _} = Squidie.cancel(run.run_id)
{:ok, _} = Squidie.replay(run.run_id)

# Replay past irreversible steps requires an explicit override
{:ok, _} = Squidie.replay(run.run_id, allow_irreversible: true)

Graph Inspection

Inspect the workflow graph with execution state:

{:ok, graph} = Squidie.inspect_run_graph(run.run_id)

graph
|> Squidie.Runs.GraphInspection.to_map()
|> Map.take([:status, :current_node_ids, :nodes, :edges])

The graph includes nodes, edges, and the selected transition path for conditional routing. Nested workflow starts stay as separate runs; parent graph maps include child_links so dashboards and visual editors can render subflow links without treating child workflows as inline executable nodes.

Node Visibility and Redaction

Graph nodes can include host-domain inputs, outputs, errors, manual metadata, and dynamic-work metadata. By default, inspect_run_graph/2 omits detailed payload fields; request include_history: true only for trusted operator surfaces.

Before exposing graph payloads outside a trusted boundary, apply a host-owned visibility policy:

{:ok, graph} = Squidie.inspect_run_graph(run.run_id, include_history: true)

{:ok, visible_graph} =
  Squidie.ReadModel.Visibility.redact(graph, current_actor, MyApp.VisibilityPolicy)

External/operator views preserve node ids, status, current state, recovery availability, dynamic-work shape, and safe edge topology while removing node payloads, errors, attempt details, command history, and sensitive metadata.
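
The effect of redaction on a single node can be pictured as dropping a fixed set of fields for less trusted scopes. This is a minimal sketch with hypothetical field names (`:input`, `:output`, `:error`, `:attempts`, `:commands`); it is not the implementation behind `Squidie.ReadModel.Visibility.redact/3`, and the real node shape may differ.

```elixir
defmodule RedactionSketch do
  @moduledoc """
  Hypothetical illustration of scope-based node redaction. Field names are
  assumptions, not Squidie's graph node schema.
  """

  @sensitive [:input, :output, :error, :attempts, :commands]

  # Auditors see the node unchanged.
  def redact_node(node, :auditor) when is_map(node), do: node

  # Operator and external views keep ids, status, and topology-level fields
  # while dropping payloads, errors, attempt details, and command history.
  def redact_node(node, scope) when is_map(node) and scope in [:operator, :external] do
    Map.drop(node, @sensitive)
  end
end
```

Keeping the drop list in one place makes it easy to review what leaves the trusted boundary when the node shape grows new fields.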

Actor Visibility

Squidie provides built-in support for actor-scoped visibility to safely expose workflow data to different users. The runtime tracks actor information in manual actions and provides flexible redaction policies:

# Define a visibility policy
defmodule MyApp.VisibilityPolicy do
  @behaviour Squidie.ReadModel.Visibility.Policy

  def visibility_scope(actor, _view) do
    cond do
      actor.role == "admin" -> :auditor     # Full access
      actor.role == "support" -> :operator  # Operational details
      true -> :external                     # Minimal information
    end
  end
end

# Apply redaction at API boundaries
{:ok, snapshot} = Squidie.inspect_run(run_id)

{:ok, safe_view} =
  Squidie.ReadModel.Visibility.redact(snapshot, current_user, MyApp.VisibilityPolicy)

The three standard scopes are :auditor (full access), :operator (operational details), and :external (minimal information).

See the Actor Visibility Guide for comprehensive documentation on implementing multi-tenant access patterns, role-based visibility, and security best practices.

Optional Dashboard

SquidSonar is the optional read-only Phoenix LiveView dashboard for Squidie. Mount it inside a Phoenix host application to inspect recent runs, filter by status, search runtime metadata, and view run detail pages with diagnosis, history counts, last error information, and workflow graph visualization.

Contributing

Please review the existing runtime model and workflow semantics before proposing substantial changes. Contributions are most welcome in: runtime reliability, workflow ergonomics, inspection tooling, recovery semantics, documentation improvements, backend integrations, and executable examples.

License

Copyright 2024, released under the Apache 2.0 License.