Lockstep
Coyote-style controlled concurrency testing for the BEAM.
You write what looks like a regular ExUnit test. Lockstep runs it many times, each time picking a different message-passing schedule via a configurable strategy. When it finds a bug, it prints the schedule that caused it and saves it to disk so you can reproduce.
Contents
- Installation
- Using Lockstep with AI tools
- Need this on your system?
- Why bother
- What's shipping
- What's not shipping yet
- Quick start
- Showcase: bug classes Lockstep finds
- Real libraries verified under Lockstep
- Configuration
- Testing methodology
- API
- Reproducing a bug
- Shrinking a trace
- Long runs and progress
- Bug-finding regression suite
- Design
- License
Installation
Add to your mix.exs:
```elixir
def deps do
  [
    {:lockstep, "~> 0.1.0", only: :test}
  ]
end
```

Status: 0.1.0 / initial release. The API is stable for the surface documented here, but expect breakage as we tighten things and respond to real-world usage. SemVer applies starting at 1.0.
Using Lockstep with AI tools
Lockstep ships with first-class AI integration:
- `AGENTS.md` — guide for AI coding agents (Claude Code, Cursor, Copilot) to invoke Lockstep via the CLI. Documents the input/output contract for `mix test` / `mix lockstep.replay` / `mix lockstep.shrink`.
- `skill/lockstep/SKILL.md` — Anthropic Skill format. Loadable by the Anthropic Agent SDK so agents understand when and how to use Lockstep.
- `mcp_server/` — Model Context Protocol server. Wraps the `mix` commands as MCP tools (`lockstep_test`, `lockstep_replay`, `lockstep_shrink`, `lockstep_dump_trace`). Plug into Claude Code via `claude_desktop_config.json`.
The killer agentic workflow:
- Agent reads code, hypothesizes a race
- Agent generates a Lockstep test file
- Agent calls `lockstep_test` (MCP) — Lockstep runs N iterations
- If a bug is found, agent reads the trace via `lockstep_replay`
- Agent generates a fix
- Agent re-runs `lockstep_test` with the same seed to verify
Need this on your system?
If you maintain a high-stakes BEAM codebase — a payment processor, a distributed coordinator, a high-throughput pipeline, a custom GenServer that must be linearizable — and you want a thorough concurrency audit rather than a one-off bug report, I take on this work directly.
That includes: applying Lockstep to your code, writing custom test scaffolds and models, running multi-node race-hunting campaigns, hardening hot paths, and delivering a written report with reproducible counterexamples and fixes.
Why bother
Concrete numbers from
test/value_prop/lost_update_comparison_test.exs
running the same lost-update race two ways: ordinary GenServer.call
in a tight loop (200 runs) vs Lockstep.GenServer.call under PCT
(5 seeds × up to 200 iterations).
| | Vanilla GenServer.call × 200 | Lockstep PCT × 5 seeds |
|---|---|---|
| Bug detected | 198/200 runs (99%) | 5/5 seeds (100%) |
| Time to first detection | ~3.6 ms wall-clock | iterations 1, 2, 2, 4, 15 |
| Failing schedule recorded | No | Yes (.lockstep file) |
| Failing schedule prints in human form | No | Yes (alias-mapped trace) |
| Same input → same failing schedule | No | Yes (same seed → same iteration → identical trace) |
| Replay on demand | No | Yes (Lockstep.Replay.run/2) |
Vanilla testing finds this bug just fine — running the body in a tight loop catches it 99% of the time. What it can't do is hand you a specific failing schedule that you can stare at, save, debug, and replay. Each failing run is an opaque "the counter ended at 1 instead of 2" with no record of which interleaving caused it.
Lockstep's value isn't finding races — multi-core BEAM finds them too, eventually. It's giving you a reproducible handle to a specific failing schedule:
- Same seed → same iteration → byte-identical trace. Three runs of the lost-update test with `seed: 0xC0DE_C0FFEE` all find the bug at iteration 2 with structurally identical traces.
- The trace prints as a human-readable schedule with aliased pids (`P0`, `P1`, ...), step numbers, and a `<-- FAILED HERE` marker.
- Replay re-fires the bug through `Lockstep.Replay.run/2` so you can attach a debugger, add `IO.inspect`, or step through.
Vanilla assert_raise in a loop tells you the test is flaky.
Lockstep tells you which interleaving makes it flaky.
Which strategy?
Lockstep ships five strategies (:random, :pct, :fair_pct,
:pos, :idpct) plus :replay for trace-driven reruns. They
behave noticeably differently as race depth grows. From
test/value_prop/strategy_comparison_test.exs
on a 3-subscribe + 1-publish ordering race, 10 seeds × up to 100
iterations each:
| strategy | bug found | min iter | median | mean | max |
|---|---|---|---|---|---|
| :random | 10/10 | 1 | 3.0 | 3.5 | 7 |
| :pct | 10/10 | 1 | 1.0 | 1.4 | 2 |
| :fair_pct | 10/10 | 1 | 1.0 | 1.4 | 2 |
| :pos | 10/10 | 1 | 1.5 | 1.8 | 3 |
Lower iteration counts mean the strategy steered toward the bug faster. PCT and Fair-PCT find this race in 1-2 iterations every time; random takes ~3.5 on average. POS sits between them. On shallower races (the depth-2 lost-update demo) all four strategies tie at iteration 1 — the differentiation only shows up once the race has real depth.
Default is :pct because it gives the most consistent low-iteration
finds across both shallow and deep races. Switch to :fair_pct if
your code has spin loops where pure PCT can starve a low-priority
process; switch to :pos if PCT plateaus on a particular bug shape.
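Switching is a one-line change on the test module; a minimal sketch using the options documented under Configuration below:

```elixir
# Default: PCT.
use Lockstep.Test, iterations: 200, strategy: :pct

# Spin loops that starve under pure PCT: Fair-PCT.
use Lockstep.Test, iterations: 200, strategy: :fair_pct

# PCT plateaus on a particular bug shape: POS.
use Lockstep.Test, iterations: 200, strategy: :pos
```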
What's shipping
Engine
- `Lockstep.Test` — ExUnit case template with a `ctest/3` macro that runs the body N times.
- A user-level scheduler (`Lockstep.Controller`) that intercepts every Lockstep sync point and decides who runs next.
- Five strategies: `:random`, `:pct`, `:fair_pct`, `:pos`, `:idpct`.
  - PCT is Burckhardt et al., ASPLOS 2010 — bug-depth probabilistic guarantees.
  - Fair-PCT is Coyote's hybrid: PCT then random, for liveness.
  - POS is Yuan et al., OOPSLA 2025 (Fray) — re-randomizes priorities each step.
  - IDPCT is iterative-deepening PCT — cycles the depth within a configured range per iteration.
- Trace-driven replay: the `:replay` strategy plus `Lockstep.Replay.run/2`.
Test surface
- Selective receive via `Lockstep.recv_first/1` — predicate-based pattern matching over the controller-side mailbox.
- `Lockstep.GenServer` — `start_link/2` / `call/2` / `cast/2` / `stop/2` wrapper for ordinary GenServer-style modules. Returns `{:ok, pid}` so OTP-shaped patterns (`{:ok, srv} = ...`) work identically after rewriting.
- `Lockstep.GenStatem` — minimal `:gen_statem`-shaped wrapper (`handle_event_function` mode + `:reply` actions).
- `Lockstep.Task` — `async/1` / `await/1` / `await_many/1`.
- Virtual time — `Lockstep.now/0`, `Lockstep.send_after/3`, `Lockstep.cancel_timer/1`. Time advances only when no managed process is ready and a timer is the only way forward.
- Process monitors — `Lockstep.monitor/1`, `Lockstep.demonitor/2`, with `:DOWN` messages delivered through the controller's mailbox so monitor-based crash detection (NimblePool, Phoenix.PubSub, supervisor patterns) works under controlled scheduling. Monitored crashes don't trigger the child-crash bug verdict — the monitor is treated as "I expected this death."
- Links + `:trap_exit` — `Lockstep.spawn_link/1`, `Lockstep.link/1`, `Lockstep.unlink/1`, `Lockstep.flag/2`. The controller maintains the link graph and propagates abnormal exits with cycle detection: non-trapping linkers cascade-die, trapping linkers receive `{:EXIT, dying_pid, reason}`, normal exits don't propagate. Lets Lockstep run real OTP code that depends on link semantics (NimblePool's worker monitoring, Task supervision, GenServer trees).
- Process.alive? — `Lockstep.alive?/1` consults the controller's view; the call is itself a sync point, so the classic `if Process.alive?(pid), do: GenServer.call(pid, ...)` TOCTOU race surfaces (see `test/examples/toctou_alive_test.exs` — Lockstep finds it on iteration 3 under PCT; a sketch follows this list).
- Registry — `Lockstep.Registry` with `:unique` / `:duplicate` keys, monitor-driven automatic cleanup on registering-process death, `dispatch/3` broadcast, and the full `:via` callback contract, so `Lockstep.GenServer.start_link(M, A, name: {:via, Lockstep.Registry, {reg, :foo}})` works. Most-common-subset implementation; partitioning, listeners, and `match/4` patterns aren't modeled.
- Supervisor — `Lockstep.Supervisor`, `:one_for_one` strategy, built on `spawn_link` + `trap_exit`. Restart options `:permanent`, `:transient`, `:temporary`. Enforces `max_restarts` / `max_seconds` intensity using Lockstep's virtual clock, so restart-loop limits are reproducible across replays.
- Compile-time linter (`Lockstep.Linter`) — warns on bare `send` / `receive` / `spawn`, plain `GenServer.call` / `Task.async` / `Process.send_after`, and ExUnit `assert_receive` / `refute_receive` inside `ctest` bodies.
- Compile-time rewriter (`Lockstep.Rewriter`) — automatically maps vanilla `GenServer.call`, `Task.async`, `send`, `spawn`, `Process.send_after`, and `receive` blocks to their `Lockstep.*` equivalents inside a test body. Lets you write vanilla-OTP-shaped tests and still get controlled scheduling. Enable per-module via `use Lockstep.Test, rewrite: true`, or per-test via `Lockstep.Test.vanilla_run/2`.
- Mix compiler (`Mix.Tasks.Compile.LockstepRewrite`) — applies the same rewriter at the project level: every `lib/**/*.ex` is rewritten to a build directory before the standard Elixir compiler runs. Lets you point Lockstep at unmodified production modules in your own application. Opt in via env-specific `compilers:` / `elixirc_paths:` settings (see "Project-level rewriting" below).
- Erlang AST rewriter (`Lockstep.ErlangRewriter`) — the same vanilla-OTP-routing trick, but on `.erl` source. Handles `gen_server:call`, `erlang:spawn`, `Pid ! Msg`, bare `receive`, `process_flag(:trap_exit, ...)`, the bare BIF auto-imports (`spawn(F)`), and 4-arg `gen_server:start_link({local, Name}, ...)` atom registration. Lets us pull in foundational Erlang libraries whose `gen_server` machinery is the actual concurrency surface (`:pg`, supervisors, `gen_statem`, `:ssl`'s session manager, etc.). Demonstrated on OTP's `pg.erl` (~720 LOC of pure Erlang, the foundation of Phoenix.PubSub) — it compiles through the rewriter and runs under Lockstep's controlled scheduling.
- `Lockstep.Agent` — drop-in for OTP `Agent`. Without it, `Agent.get/3-4` bypasses Lockstep's controller entirely (it's a bare `:gen_server.call`); with it, every agent operation is a sync point. Necessary for finding races in `Agent`-backed code.
- `Lockstep.Task` / `Lockstep.Task.Supervisor` — extended to cover `Task.async_stream/3,4` (honouring `:max_concurrency`), `Task.async/3` (MFA form), and `Task.Supervisor.{start_link, async, start_child, async_stream}`.
- NIF-backed shared state (`Lockstep.ETS`, `Lockstep.Atomics`, `Lockstep.PersistentTerm`) — ETS / `:atomics` / `:persistent_term` ops are individually atomic but compose racefully (the classic `lookup; insert(v+1)` lost-update). Each wrapper inserts a sync point before delegating to the underlying NIF, so the strategy can interleave between calls. The rewriter routes `:ets.*`, `:atomics.*`, `:persistent_term.*` calls through these wrappers. Outside a Lockstep test (no controller in scope) the wrappers fall through to the underlying NIF — the same code runs identically in production.
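To make the `Lockstep.alive?/1` sync point concrete, here is a minimal sketch of the TOCTOU shape. The `Echo` module and the exact scenario are illustrative, not the contents of `test/examples/toctou_alive_test.exs`:

```elixir
defmodule MyApp.ToctouSketchTest do
  use Lockstep.Test, iterations: 100, strategy: :pct

  defmodule Echo do
    def init(state), do: {:ok, state}
    def handle_call(:ping, _from, state), do: {:reply, :pong, state}
  end

  ctest "alive?-then-call TOCTOU" do
    {:ok, srv} = Lockstep.GenServer.start_link(Echo, :ok)

    # A second managed process stops the server at some point.
    Lockstep.spawn(fn -> Lockstep.GenServer.stop(srv, :normal) end)

    # The strategy may schedule the stop between the check and the call,
    # so the call hits a dead server: the TOCTOU window.
    if Lockstep.alive?(srv) do
      Lockstep.GenServer.call(srv, :ping)
    end
  end
end
```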
Tooling
- Trace capture as `:erlang.term_to_binary` to `.lockstep` files.
- `mix lockstep.replay --trace path` — print the saved schedule + repro recipe.
- `mix lockstep.replay --trace path --run "Mod.fn" --file path.exs` — re-run a specific test body under the recorded schedule.
- `mix lockstep.dump_trace --trace path` — raw dump for tooling.
- Deterministic seeds: the same top-level seed reproduces the same iterations.
What's not shipping yet
- No DPOR / exhaustive search. Lockstep is a randomized tester, not a verifier. Concuerror integration is on the roadmap.
- No distributed Erlang. Single-node only.
- No `async: true` ExUnit modules. `use Lockstep.Test` forces `async: false`.
- No live iteration progress from `ctest` itself. A long-running `iterations: 10_000` run is silent until it either finishes or finds a bug (see "Long runs and progress" for the `Lockstep.Runner` opt-in).
- `Lockstep.GenStatem` is `handle_event_function`-mode only; `:state_functions` mode and built-in timeout actions aren't wrapped.
Quick start
```elixir
# mix.exs
{:lockstep, "~> 0.1.0", only: :test}
```

Write a controlled test:
```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test, iterations: 200, strategy: :pct

  ctest "lost update on shared counter" do
    parent = self()
    state = Lockstep.spawn(fn -> stateful_loop(0) end)

    for _ <- 1..2 do
      Lockstep.spawn(fn ->
        Lockstep.send(state, {:get, self()})
        v = Lockstep.recv()
        Lockstep.send(state, {:put, v + 1})
        Lockstep.send(parent, :done)
      end)
    end

    for _ <- 1..2, do: Lockstep.recv()
    Lockstep.send(state, {:get, self()})
    assert Lockstep.recv() == 2
  end

  defp stateful_loop(n) do
    case Lockstep.recv() do
      {:get, from} ->
        Lockstep.send(from, n)
        stateful_loop(n)

      {:put, x} ->
        stateful_loop(x)
    end
  end
end
```

Run it:
```sh
mix test test/my_app/race_test.exs
```

Expected output on a buggy schedule:
```text
Lockstep found a concurrency bug on iteration 2.
seed: 828370911214
iter seed: 1417509209
strategy: :pct
trace path: traces/counter_race-iter2-seed828370911214.lockstep

Reason:
assertion failed: lost update detected; counter is 1

Processes:
P0(root) = #PID<0.186.0>
P1 = #PID<0.187.0>
...

Schedule:
step 6 send P3 -> P1 {:get, ...}
step 8 send P2 -> P1 {:get, ...}
step 11 recv P3 0
step 18 recv P2 0 # both workers read the same value!
...

Replay with:
mix lockstep.replay --trace traces/counter_race-iter2-seed828370911214.lockstep
```

Showcase: bug classes Lockstep finds
The test/examples/ directory contains worked demos. Each is a
side-by-side of a buggy implementation that Lockstep flags within tens
of iterations and a fixed implementation that it explores at length
without finding a problem.
Atomicity violation: lost-uniqueness in a job queue
test/examples/unique_jobs_test.exs.
A textbook bug from the Elixir job-queue ecosystem: naive submit_unique
does a lookup then an insert, which under concurrency lets two
callers both observe the key as missing and both insert. DB-backed
queues have hit this exact race historically; the fix is always
atomic upsert / INSERT ... ON CONFLICT.
```elixir
# Buggy: split read+write — Lockstep finds duplicate submissions on iter ~15.
case Lockstep.GenServer.call(db, {:lookup, key}) do
  nil -> Lockstep.GenServer.call(db, {:insert, key, val}); :submitted
  _ -> :already_exists
end

# Fixed: single atomic call serialized by the GenServer — 100/100 iters pass.
Lockstep.GenServer.call(db, {:put_new, key, val})
```
The trace pinpoints the failure: both workers call :lookup (steps 6
and 8), both receive nil (steps 11 and 15), both call :insert.
Atomicity violation: read-modify-write on shared state
test/examples/genserver_race_test.exs.
Two workers each do Counter.get then Counter.put(val + 1). With
N=2 workers expecting the counter to reach 2, some interleavings let
both reads happen first, both writes set the counter to 1, and one
update is lost. PCT finds it on iteration 4.
The control test in the same file uses cast({:add, 1}) instead — the
GenServer serializes the read-modify-write — and Lockstep proves it
race-free across all explored schedules.
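As a sketch, the two shapes side by side (using the Counter callbacks from the GenServer wrapper example further down; illustrative, not the exact test file):

```elixir
# Buggy: the read and the write are two separate calls, so another
# worker can be scheduled in between and the later write clobbers it.
v = Lockstep.GenServer.call(srv, :get)
:ok = Lockstep.GenServer.call(srv, {:put, v + 1})

# Race-free control: the increment travels as one message and the
# GenServer serializes the read-modify-write inside handle_cast.
Lockstep.GenServer.cast(srv, {:add, 1})
```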
Deadlock: circular GenServer calls
test/examples/circular_call_deadlock_test.exs.
Two Echo GenServers wired up as peers. Each one's handle_call(:echo, ...) synchronously delegates to its peer. Two concurrent callers — one
to A, one to B — and both servers end up mid-handle_call waiting for
the other to reply. Classic A-waits-for-B-waits-for-A.
Lockstep's check_progress step notices that every alive process is
blocked on recv_match with no pending timer that could unblock anyone,
and aborts the iteration with :lockstep_deadlock. The trace shows the
exact sequence of gen_calls that closed the cycle.
This bug class is hard to catch with property-based testing because it needs a specific interleaving (both callers in flight before either peer-call returns) — Lockstep's PCT strategy finds it on iteration 1.
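A sketch of the shape (hypothetical `PeerEcho` module, not the actual test file): each server answers `:echo` by synchronously asking its peer, so two concurrent `:echo` callers can wedge both servers.

```elixir
defmodule PeerEcho do
  def init(peer), do: {:ok, peer}

  # Wire the peer in after both servers have started.
  def handle_call({:set_peer, pid}, _from, _peer), do: {:reply, :ok, pid}

  # The peer answers this directly, without delegating further.
  def handle_call(:ping, _from, peer), do: {:reply, :pong, peer}

  # :echo delegates to the peer. If both servers are inside this clause
  # at once, each waits for the other's reply: A-waits-for-B-waits-for-A.
  def handle_call(:echo, _from, peer) do
    reply = Lockstep.GenServer.call(peer, :ping)
    {:reply, reply, peer}
  end
end
```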
Auto-rewrite: vanilla OTP code, controlled scheduling
test/examples/auto_rewrite_demo_test.exs.
Same lost-update race, written entirely with vanilla GenServer.call,
Task.async, Task.await_many — no Lockstep.* calls in the test
body. The Lockstep.Test.vanilla_run/2 macro walks the body's AST
and rewrites every recognized OTP call to its Lockstep equivalent
before handing it to the runner.
test "auto-rewrite finds the race in vanilla OTP code" do
assert_raise Lockstep.BugFound, fn ->
Lockstep.Test.vanilla_run iterations: 100, strategy: :pct, seed: 1 do
{:ok, c} = GenServer.start_link(Counter, 0)
tasks =
for _ <- 1..2 do
Task.async(fn ->
v = GenServer.call(c, :get)
:ok = GenServer.call(c, {:set, v + 1})
end)
end
Task.await_many(tasks)
final = GenServer.call(c, :get)
if final != 2, do: raise "lost update; counter is #{final}"
end
end
end
Lockstep finds the race on iteration 2 with the same trace it would
have produced for the manually-written equivalent. Useful when you
want test bodies to read like ordinary Elixir, when you're porting
an existing flaky test to Lockstep, or when you want to demonstrate a
race in a vanilla snippet without forcing readers to learn the
Lockstep.send/Lockstep.recv API up-front.
The rewriter handles GenServer.{call, cast, start_link, stop},
Task.{async, await, await_many}, send, Kernel.send,
:erlang.send, spawn, spawn_link, Process.send_after,
Process.cancel_timer, and receive do clauses end (translates to
Lockstep.recv_first + case). Helper functions defined outside the
test body are not rewritten by vanilla_run/ctest rewrite: true
— inline race code into the body, or use the project-level Mix
compiler below to rewrite an entire lib/.
Project-level rewriting (Mix compiler)
To test unmodified modules in your lib/ (or in a dep) without
changing source, plug Lockstep into your test compile pipeline:
```elixir
# mix.exs
def project, do: [
  ...
  compilers: compilers(Mix.env()),
  elixirc_paths: elixirc_paths(Mix.env()),
  lockstep_rewrite: lockstep_rewrite(Mix.env()),
]

defp compilers(:test), do: [:lockstep_rewrite] ++ Mix.compilers()
defp compilers(_), do: Mix.compilers()

defp elixirc_paths(:test), do: ["_build/test/lockstep_rewritten/lib", "test/support"]
defp elixirc_paths(_), do: ["lib"]

defp lockstep_rewrite(:test), do: %{paths: ["lib/**/*.ex"]}
defp lockstep_rewrite(_), do: nil
```
In :test, every matching .ex is rewritten under
_build/test/lockstep_rewritten/lib/ and the standard Elixir
compiler picks it up from there. Production / dev compilation is
untouched — no Lockstep.* runtime cost.
The same machinery works for dep code: include "deps/<dep>/lib/**/*.ex"
in the paths: list. The rewriter handles GenServer.{call, cast, start_link, stop, reply}, Task.{async, await, await_many},
send, spawn, spawn_link, Process.{send_after, cancel_timer, monitor, demonitor}, and receive blocks. Process.flag(:trap_exit, …)
and :erlang.spawn/3 (MFA form) are left alone.
End-to-end NimblePool dep-rewrite demo:
test/phase2_dep_rewrite_test.exs.
CubDB historical race (commit 60fb38f)
test/examples/cubdb_compaction_race_test.exs.
Models the race upstream CubDB fixed in
commit 60fb38f
("Fix race condition during compaction"). Quoting the maintainer:
> The compaction process actually happens in two phases: actual compaction, and catch-up. The previous implementation was starting the catch-up phase asynchronously, introducing a race condition: right after actual compaction completed, and before catch-up started, there was the possibility to erroneously start another concurrent compaction.
The shape: a multi-phase operation guarded by a busy flag. Phase 1
completion clears busy AND queues phase 2 as a separate message.
Between those two events a new operation passes the busy? check and
starts a concurrent compaction. Phase 2 then sees state overwritten
by the second compaction.
Detection in our model: each compaction has a unique ID, written to
state.current_id on start_compaction. Phase 2 verifies the ID it
holds matches state.current_id; if a second compaction has run in
between and overwritten the field, the assertion fires.
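A sketch of that detection logic as a plain Lockstep process loop (message and field names are illustrative, not the actual model):

```elixir
defp compactor_loop(state) do
  case Lockstep.recv() do
    {:start_compaction, from} ->
      id = make_ref()
      # Phase 1 finishes here; phase 2 is queued as a *separate* message.
      # The buggy version also clears busy at this point, which is the race window.
      Lockstep.send(self(), {:catch_up, id})
      Lockstep.send(from, :started)
      compactor_loop(%{state | busy: false, current_id: id})

    {:catch_up, id} ->
      # Phase 2: if another compaction started in between, the ID was overwritten.
      if id != state.current_id, do: raise("catch-up ran against an overwritten compaction")
      compactor_loop(state)
  end
end
```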
Sweep result over 4 strategies × 5 seeds × 200 iterations each:
| strategy | finds the race | typical iter |
|---|---|---|
| :pos | 5/5 seeds | 1, 1, 3, 5, 6 |
| :random | 5/5 seeds | 1, 3, 3, 4, 9 |
| :fair_pct | 1/5 seeds | 53 |
| :pct | 0/5 seeds | n/a |
PCT's bug-depth heuristic happens to bias away from this particular race shape — the change-point insertion at d-1 random steps doesn't align with the phase-1/phase-2 mailbox window. POS's per-step priority resampling finds it consistently. A real consequence of having multiple strategies bundled.
The fix in the model (and upstream): keep busy=true until BOTH
phases complete. The intermediate mailbox state — phase 2 queued but
not run — no longer admits a new :start_compaction. Verified
race-free across the same 200 iterations × :pct.
Honeydew historical race (commit 41831c8)
test/examples/honeydew_status_race_test.exs.
Models the race upstream Honeydew fixed in
commit 41831c8:
> Fixing race condition with Honeydew.status/1 where monitors could die before being queried for their current job.
The original buggy pattern:
```elixir
busy_workers =
  queue
  |> get_all_members(:monitors)
  |> Enum.map(&GenServer.call(&1, :current_job))
```
Race: between get_all_members/2 returning a list of monitor pids
and GenServer.call/2 reaching each one, a monitor can exit. The
call raises :exit, crashing the entire status/1 invocation.
We can't compile and run upstream Honeydew under Lockstep — it's a
full OTP application with :pg, supervisors, and Mnesia plumbing
beyond our rewriter's reach — but the race shape is universal.
The minimal reproduction (3 monitors, one killed concurrently with
the iteration) fails on iteration 2 under PCT. The fixed pattern
(try/catch :exit, _ around each call) passes 200 iterations.
The point isn't that we found this bug — the maintainer fixed it upstream in 41831c8. The point is that the same pattern in any monitor-based library would surface under Lockstep, and the failing schedule is a one-line change in user code (the missing `try/catch`) away from a clean test.
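For reference, the fixed shape is the same pipeline with each call wrapped so a monitor that died in the window is skipped instead of crashing the whole status query. A sketch, not Honeydew's actual code:

```elixir
busy_workers =
  queue
  |> get_all_members(:monitors)
  |> Enum.flat_map(fn monitor ->
    try do
      [GenServer.call(monitor, :current_job)]
    catch
      # The monitor exited between the member lookup and the call; skip it.
      :exit, _ -> []
    end
  end)
```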
Real bug found: Phoenix.Tracker's pre-fc5686f ETS race
test/phoenix_tracker_prefix_race_test.exs.
Lockstep finds the race fixed upstream in
fc5686f
("Update ETS atomically on tracker update").
Pre-fix Tracker.update did State.leave then put_presence —
two ETS writes back-to-back. A concurrent Tracker.list could
observe the table in between, with the entry temporarily missing.
This is exactly the kind of race that's invisible at the
message-passing level — it lives at the ETS NIF level.
With Lockstep.ETS providing sync points at every ETS call,
the strategy can interleave Tracker.list between the leave and
the put. POS finds it on iteration 1. The whole-of-Tracker
plus Phoenix.PubSub plus rewritten OTP :pg (~3,700 LOC) runs
unmodified, with sync points injected at NIF boundaries by our
rewriter.
Real bug found in NimblePool's history
Lockstep finds a real race in NimblePool's
checkout! timeout handling, fixed upstream in commit
e18f45f
("Cancel requests on client timeout").
test/nimble_pool_timeout_race_test.exs
takes the pre-fix NimblePool source (just one commit before the
fix), compiles it through Lockstep.MixCompiler, and runs a
realistic scenario:
- Capacity-1 pool
- Holder client checks out and refuses to release
- Waiter client calls `checkout!(pool, ..., 50ms)` and uses `try/catch :exit, _` to handle the deadline (a real production pattern)
- Holder is then told to release
Under wall-clock testing this race rarely surfaces — it depends on the scheduler interleaving the timeout and the holder's release in a particular way. Lockstep's PCT strategy finds it on iteration 1, producing the exit:
```text
child crashed: exit(:unexpected_remove)
```
The pool process itself dies. In production, anyone calling
checkout! with a try/catch :exit, _ on this version would
intermittently bring down their entire pool.
The upstream fix has two parts: send a cancel message from the
client before exiting, and turn the :unexpected_remove crash into
a no-op for late-arriving removals. With the fix applied (current
NimblePool releases) Lockstep's same scenario runs clean.
Real-world race patterns
Beyond the libraries the rewriter handles directly, the real-world-patterns suite reproduces bug shapes seen in production BEAM systems in standalone form, exercising Lockstep's Registry / Supervisor / link / monitor support.
test/examples/real_world_patterns/cache_thundering_herd_test.exs.
The classic "lookup → miss → populate → put" anti-pattern. N
concurrent clients each see a miss, each populate, end up
overwriting each other. Lockstep finds it on iteration 1 under
PCT. The locked-populate fix runs 100 iterations clean.
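The anti-pattern in miniature (a hypothetical sketch; the cache server, keys, and `expensive_compute/1` helper are illustrative, not the test file's model):

```elixir
# Buggy: every client that sees a miss computes and writes its own value,
# so N concurrent misses mean N populates overwriting each other.
case Lockstep.GenServer.call(cache, {:lookup, key}) do
  nil ->
    value = expensive_compute(key)
    Lockstep.GenServer.call(cache, {:put, key, value})
    value

  cached ->
    cached
end

# Fixed shape (one way to "lock" the populate): a single call that the cache
# process handles atomically, so the miss-check and the write are serialized.
Lockstep.GenServer.call(cache, {:fetch_or_populate, key})
```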
test/examples/real_world_patterns/pubsub_fanout_ack_test.exs.
PubSub broadcast that waits for acks from every subscriber. If one
subscriber dies between Registry.lookup and acking, the broadcaster
blocks forever. Found on iteration 1 under PCT. The fix is
monitor-based ack accounting — treat :DOWN as a counted ack — and
runs clean across 50 iterations.
These patterns are what Lockstep.Registry, Lockstep.Supervisor,
Lockstep.spawn_link, Lockstep.flag(:trap_exit, true), and
Lockstep.alive? were built for. Each lives in the bug zoo
regression suite so future engine changes can't quietly break their
findability.
Real libraries verified under Lockstep
Beyond the bug-class showcase above, Lockstep has been pointed at production BEAM libraries — both pre-fix versions to confirm it catches known races, and current versions to gather verification evidence ("no race in N controlled schedules").
| Library | Mode | Result | Test |
|---|---|---|---|
| Hammer | rewrite + run | Real bug — Atomic.FixWindow.inc/4 lost-update vs hit/5, found under PCT | test/hammer_test.exs |
| Phoenix.Tracker (pre-fc5686f) | rewrite + run | Real bug — ETS race fixed in fc5686f; POS finds it iter 1 | test/phoenix_tracker_prefix_race_test.exs |
| NimblePool (pre-e18f45f) | rewrite + run | Real bug — checkout timeout race; PCT finds it iter 1 | test/nimble_pool_timeout_race_test.exs |
| Phoenix.PubSub (current) | rewrite + run | Verified clean — 70 PCT iterations across 3 race-hunt scenarios | test/phoenix_pubsub_race_hunt_test.exs |
| GenStage (current) | rewrite + run | Verified clean — single + fan-out pipeline scenarios | test/gen_stage_test.exs |
| OTP :pg (Erlang) | Erlang rewriter + run | Verified clean — joins/broadcasts under interleaved scheduling | test/erlang_pg_test.exs |
| Mutex (lud/mutex) | rewrite + run | Verified clean — 10,000 schedules across 5 scenarios | test/real_libraries_test.exs |
| ConCache.Lock | rewrite + run | Verified clean — 4,000 schedules across 5 scenarios | test/concache_lock_hunt_test.exs |
| NimblePool port | reimplemented | Verified clean — 1,200 schedules across 6 scenarios | test/examples/nimble_pool_demo_test.exs |
| Cachex | test-only Hex dep | Linearizable — 200 iter mixed ops + 500 iter R-M-W | test/cachex_test.exs |
The work modes:

- rewrite + run — `Lockstep.MixCompiler` rewrites the upstream source, recompiles it, and the test drives the actual upstream API under controlled scheduling. Sync points sit at every `GenServer.call`, `:ets.*`, `Process.monitor`, `send`, `receive`, etc.
- Erlang rewriter + run — the same pattern but for `.erl` source: `Lockstep.ErlangRewriter` walks the Erlang abstract format and routes `gen_server:call`, `erlang:spawn`, `Pid ! Msg`, bare `receive`, etc. through the controller. Used for OTP modules like `:pg`, `:gen_statem`, supervisor internals.
- test-only Hex dep — the library compiles unmodified; the test wraps its API in `Lockstep.GenServer.call`-shaped sync points at the boundary. Useful when the lib is too large or macro-heavy to rewrite cleanly.
For the methodology behind picking a mode, writing the model, and sizing iterations, see METHODOLOGY.md.
Configuration
use Lockstep.Test accepts:
| Option | Default | Meaning |
|---|---|---|
| :iterations | 100 | How many randomized schedules to try. |
| :strategy | :pct | One of :random, :pct, :fair_pct, :pos, :idpct. |
| :seed | random | Top-level RNG seed; per-iteration seeds derive from it. |
| :max_steps | 1000 | Hard limit on scheduling steps per iteration. |
| :iter_timeout | 30_000 | Hard wall-clock timeout per iteration (ms). |
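All options together, with illustrative (not recommended) values:

```elixir
use Lockstep.Test,
  iterations: 500,          # randomized schedules to try
  strategy: :fair_pct,      # :random | :pct | :fair_pct | :pos | :idpct
  seed: 0xC0DE_C0FFEE,      # fix the top-level seed for reproducible runs
  max_steps: 2_000,         # abort an iteration after this many scheduling steps
  iter_timeout: 60_000      # per-iteration wall-clock limit, in ms
```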
Testing methodology
For applying Lockstep to a real BEAM library — the model-based
linearizability template, the compile-rewrite vs test-only-dep paths,
strategy selection, fault injection, and case studies (Cachex,
ConCache, lud/mutex, nimble_pool, Hammer) — see
METHODOLOGY.md. Includes a real bug we found in
Hammer's Hammer.Atomic.FixWindow.inc/4 (lost-update race against
hit/5).
API
The runtime API for use inside ctest bodies:
| Call | Description |
|---|---|
| `Lockstep.spawn/1` | Spawn a managed child process. |
| `Lockstep.spawn_link/1` | Spawn a managed child and bidirectionally link to it (atomic). |
| `Lockstep.send/2` | Send to another managed process (recorded by the controller). |
| `Lockstep.recv/0` | Receive the next message in delivery order. |
| `Lockstep.recv_first/1` | Selective receive: take the first message matching a predicate. |
| `Lockstep.now/0` | Read the controller's virtual clock. |
| `Lockstep.send_after/3` | Schedule a message after `delay_ms` of virtual time. Returns a timer ref. |
| `Lockstep.cancel_timer/1` | Cancel a pending timer; returns the ms remaining or `false`. |
| `Lockstep.sleep/1` | Virtual-time sleep (sentinel timer + selective receive under the hood). |
| `Lockstep.monitor/1` | Monitor a managed process; `:DOWN` is delivered through Lockstep's mailbox. |
| `Lockstep.demonitor/2` | Stop monitoring; `:flush` removes any already-delivered `:DOWN`. |
| `Lockstep.alive?/1` | Consult the controller's view; falls back to vanilla for unmanaged pids. |
| `Lockstep.link/1` | Bidirectionally link to another managed process. |
| `Lockstep.unlink/1` | Remove the link. |
| `Lockstep.flag/2` | `:trap_exit` is modeled at the controller level; other flags are placeholders. |
Higher-level wrappers built on these primitives:
- `Lockstep.GenServer.{start_link, call, cast, stop}` — test modules that follow the GenServer callback contract. `start_link/3` accepts `name: {:via, Module, term}`.
- `Lockstep.Task.{async, await, await_many}` — `Task.async`-style fan-out / fan-in with selective receive on a unique ref.
- `Lockstep.Registry.{start_link, register, unregister, lookup, dispatch, count, keys}` — controller-aware key/pid registry with `:via` integration.
- `Lockstep.Supervisor.{start_link, which_children, count_children, start_child, terminate_child, restart_child}` — `:one_for_one` supervisor with permanent/transient/temporary restart semantics and intensity limits.
Every primitive is a sync point — every call hands control back to the
controller, which picks the next process to run per the configured
strategy. The send / spawn variants don't block the user-side caller waiting for delivery; the controller records and queues the action, then resumes the caller at its next scheduling turn. `recv/0` and `recv_first/1` block until the controller delivers a message.
Lockstep.alive?/1 being a sync point is the point: TOCTOU bugs
(if Process.alive?(pid), do: GenServer.call(pid, ...)) surface
naturally because the strategy may interleave another process between
the check and the subsequent call.
Lockstep.link/1 and friends model the BEAM link graph at the
controller level. When a linked process dies abnormally, the cascade
is propagated through Lockstep's link graph (with cycle detection):
non-trapping linkers die transitively, and the chain terminates either
at a flag(:trap_exit, true) process — which receives a synthetic
{:EXIT, dying_pid, reason} message — or at the test root, which
finalizes the iteration as a bug.
Selective receive
```elixir
ref = make_ref()
Lockstep.send(server, {:request, self(), ref})

reply =
  Lockstep.recv_first(fn
    {^ref, payload} -> {:ok, payload}
    _ -> false # any non-truthy return means "skip this message"
  end)
```

Other messages already in the mailbox are not disturbed.
GenServer wrapper
```elixir
defmodule Counter do
  def init(initial), do: {:ok, initial}
  def handle_call(:get, _from, n), do: {:reply, n, n}
  def handle_call({:put, x}, _from, _n), do: {:reply, :ok, x}
  def handle_cast({:add, x}, n), do: {:noreply, n + x}
end

ctest "lost update on counter" do
  srv = Lockstep.GenServer.start_link(Counter, 0)

  for _ <- 1..2 do
    Lockstep.spawn(fn ->
      v = Lockstep.GenServer.call(srv, :get)
      :ok = Lockstep.GenServer.call(srv, {:put, v + 1})
    end)
  end

  # ... barrier ...
  assert Lockstep.GenServer.call(srv, :get) == 2
end
```
The wrapper itself is a thin loop on top of Lockstep.recv/Lockstep.send/
Lockstep.recv_first. See lib/lockstep/gen_server.ex for the
~80-line implementation.
Virtual time
ctest "GenServer with periodic tick" do
pid = Lockstep.GenServer.start_link(MyTimerServer, [])
Lockstep.send_after(pid, :start, 0)
# Drain the test forward until 350ms of virtual time has passed.
Lockstep.send_after(self(), :wakeup, 351)
_ = Lockstep.recv()
assert Lockstep.GenServer.call(pid, :tick_count) >= 3
end
Virtual time only advances when the controller would otherwise
deadlock — when every managed process is blocked on receive and the
only way to unblock someone is to fire the next pending timer. So
Lockstep.send_after(pid, :ding, 86_400_000) followed by recv returns
in milliseconds of wall-clock time, not a day. Same trick :meck/Mox
let you do for :erlang.system_time, but built into the scheduler.
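In other words, a two-line sketch of the behaviour described above:

```elixir
Lockstep.send_after(self(), :ding, 86_400_000)  # a full day of virtual time
:ding = Lockstep.recv()                          # returns in wall-clock milliseconds
```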
Reproducing a bug
Three ways, in order of laziness:
1. Re-run the suite with the original seed. The trace records both
the top-level seed and the iteration. Re-run with seed: <top_seed> and
iterations: >= <iter>. Per-iteration seed derivation is
phash2({top_seed, iteration}), so iteration K of the same top seed
always behaves identically.
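Concretely, for the Quick start failure above (seed 828370911214, found on iteration 2, under `:pct`), a sketch of the re-run configuration:

```elixir
use Lockstep.Test,
  iterations: 2,          # >= the failing iteration
  strategy: :pct,         # same strategy as the original run
  seed: 828370911214      # the top-level seed from the failure report
```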
2. Replay from the CLI. Extract the test body into a named function, then:
```sh
mix lockstep.replay --trace traces/foo.lockstep \
  --run "MyApp.RaceTest.lost_update_body" \
  --file test/my_app/race_test.exs
```
The mix task loads the test file (so the module becomes available),
then runs Lockstep.Replay.run/2 with the schedule from the trace.
3. Replay programmatically. Same idea, written as an ordinary ExUnit test:
```elixir
defmodule MyApp.RaceTest do
  use Lockstep.Test, ...

  # The body is extracted into a function so it can be called from
  # both the ctest and a one-off replay test.
  def lost_update_body do
    # ... the actual test scenario ...
  end

  ctest "lost update" do
    lost_update_body()
  end
end

defmodule MyApp.ReplayTest do
  use ExUnit.Case

  test "replay the lost-update bug" do
    Lockstep.Replay.run(
      &MyApp.RaceTest.lost_update_body/0,
      "traces/MyApp-iter2-seed828370911214.lockstep"
    )
  end
end
```
If the user code is fully deterministic (no raw :rand, no real I/O,
no NIFs that yield asymmetrically), the replay produces a structurally
identical trace and re-fires the bug. If something diverges, you get a
Lockstep.ReplayDivergence with a step number and a hint.
You can also compare two traces structurally:
```elixir
Lockstep.Replay.compare_traces!(original_trace, replayed_trace)
```

Pids and refs are sanitized before comparison since they're recreated on each run.
Shrinking a trace
A failing trace can be hundreds of events. Most of those are necessary
setup but obscure the actual race. Lockstep.Shrink.shrink/3 searches
for a shorter replay_pid_order that still triggers the same bug
signature under strict replay, then saves a .shrunk.lockstep next
to the original.
```sh
mix lockstep.shrink --trace traces/counter_race-iter5-seed42.lockstep \
  --run "MyApp.CounterRaceTest.lost_update_body" \
  --file test/my_app/counter_race_test.exs
```

Output:
```text
Lockstep shrink succeeded:
original schedule decisions: 87
shrunk schedule decisions: 12
attempts: 156
bug signature: {:exception, RuntimeError}
shrunk trace: traces/counter_race-iter5-seed42.shrunk.lockstep
```
The shrunk trace replays exactly like the original — point
mix lockstep.replay at it for the same bug at fewer scheduling
decisions. Programmatic version:
```elixir
{:ok, info} =
  Lockstep.Shrink.shrink(
    fn -> MyApp.CounterRaceTest.lost_update_body() end,
    "traces/counter_race-iter5-seed42.lockstep",
    verbose: true
  )

info.original_length  # 87
info.new_length       # 12
info.new_trace_path   # "traces/...shrunk.lockstep"
```

The algorithm is two-pass — truncation (shorter prefixes) then decimation (dropping individual decisions) — repeated to a fixed point. Bug-signature comparison is coarse (same exception type, same deadlock category, etc.), so the shrunk trace is a cleaner repro, not a byte-identical reproduction.
Long runs and progress
iterations: 10_000 is silent by default until it either finishes
or finds a bug. Pass progress: to opt in:
```elixir
Lockstep.Runner.run(
  fn -> MyTest.race_body() end,
  iterations: 10_000,
  strategy: :pct,
  progress: true # prints every ~5%
  # progress: 100 # or: every 100 iterations
  # progress: fn iter -> ... end # or: a custom hook (telemetry, ...)
)
```

Bug-finding regression suite
test/bug_zoo_test.exs is a curated set of known concurrency races,
each declared with a (strategy, seed, max_iterations, expected_at_most)
configuration. The test fails if any race takes more iterations than
its baseline — catching engine regressions in bug-finding power as
the controller and strategies evolve.
When making engine changes, run mix test test/bug_zoo_test.exs to
confirm bug-finding hasn't slipped. Add new entries as you wire up new
classes of bug.
Design
The shape mirrors Microsoft Coyote, mapped to BEAM:
| Coyote (.NET) | Lockstep (BEAM) |
|---|---|
| `coyote rewrite` (IL rewriter) | Runtime `Lockstep.send/recv/spawn` API |
| `TestingEngine` | `Lockstep.Runner.run/2` |
| `IActorRuntime` | `Lockstep.Controller` GenServer |
| Strategies (random, PCT, ...) | `Lockstep.Strategy` behaviour |
| `coyote replay <trace>` | `mix lockstep.replay` |
License
Apache-2.0.