LetItCrash


A testing library for crash recovery and OTP supervision behavior in Elixir.

Embrace the "let it crash" philosophy in your tests by easily simulating crashes and verifying that your GenServers and supervised processes recover correctly.

Why Use LetItCrash?

We know Elixir/OTP supervision works. LetItCrash doesn't test whether processes restart; it tests whether your application handles restarts correctly.

Real bugs this library helps catch:

Think of it as integration testing for your crash recovery logic, not unit testing the BEAM.

Installation

Add let_it_crash to your list of dependencies in mix.exs:

def deps do
  [
    {:let_it_crash, "~> 0.5.0", only: :test}
  ]
end

Testing async work in Phoenix + Oban

The most common production bug LetItCrash helps catch is the silent swallow: a Task raises but no one is awaiting it, the supervisor moves on, the test passes, and a real user gets stuck. LetItCrash.Async wraps your test code in an observer block that subscribes to telemetry exception events and lets you name the failure modes explicitly:

defmodule MyAppWeb.WidgetControllerTest do
  use MyAppWeb.ConnCase
  use LetItCrash

  test "POST /api/widgets fires a Task that doesn't silently swallow", %{conn: conn} do
    report =
      observe_async(fn ->
        conn = post(conn, "/api/widgets", %{name: "alpha"})
        assert json_response(conn, 202)
      end)

    assert :ok = assert_no_silent_swallow(report)
    assert :ok = assert_all_completed(report, within: 2_000)
  end

  test "the create-widget Oban worker is idempotent" do
    assert :ok =
             assert_idempotent(
               fn -> MyApp.Widgets.do_create(%{name: "alpha"}) end,
               state: fn -> MyApp.Repo.aggregate(MyApp.Widget, :count) end
             )
  end
end

See LetItCrash.Async for the full surface: failure-mode definitions, options, limitations, and the Ecto.Sandbox interaction note.
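To make the silent-swallow failure mode concrete, here is a minimal sketch that uses only the standard library. It does not call LetItCrash.Async or telemetry; a process monitor stands in for the observer, and the task body is a made-up example. Running it also prints an error log entry for the crashed task.

```elixir
{:ok, sup} = Task.Supervisor.start_link()

# Fire-and-forget: the caller gets a PID back but never awaits a result.
{:ok, pid} =
  Task.Supervisor.start_child(sup, fn ->
    # Wait for a go-ahead so the monitor below is in place before the crash.
    receive do
      :go -> raise "boom"
    end
  end)

# The monitor plays the role of an observer; without it nothing reports the failure.
ref = Process.monitor(pid)
send(pid, :go)

exit_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

# The task died with an exception, yet the supervisor and the caller carry on.
supervisor_alive? = Process.alive?(sup)
```

Without the monitor, the caller in this sketch would have no signal that the work failed, which is the gap the observer block is meant to close.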

Usage

defmodule MyAppTest do
  use ExUnit.Case
  use LetItCrash

  test "supervised genserver recovers after crash" do
    # Start a supervisor with your GenServer
    {:ok, supervisor} = MySupervisor.start_link()
    {:ok, _pid} = MySupervisor.start_worker(supervisor, :my_worker)

    # Crash by name (automatic PID tracking)
    LetItCrash.crash(:my_worker)

    # Verify recovery - waits for new PID
    assert LetItCrash.recovered?(:my_worker)

    # Clean up
    Process.exit(supervisor, :shutdown)
  end

  test "process state resets after restart" do
    {:ok, supervisor} = MySupervisor.start_link()
    {:ok, _pid} = MySupervisor.start_worker(supervisor, :stateful_server)

    LetItCrash.test_restart(:stateful_server, fn ->
      # This function runs before AND after the crash
      # State will be reset to initial after restart
      MyStatefulServer.increment()
      count = MyStatefulServer.get_count()
      IO.puts("Count: #{count}") # Will be 1 before crash, 1 after (reset + increment)
    end)

    Process.exit(supervisor, :shutdown)
  end

  test "manual PID tracking" do
    {:ok, supervisor} = MySupervisor.start_link()
    {:ok, _pid} = MySupervisor.start_worker(supervisor, :manual_worker)

    # Store original PID manually
    original_pid = Process.whereis(:manual_worker)
    LetItCrash.crash(:manual_worker)

    # Check recovery with original PID and custom timeout
    assert LetItCrash.recovered?(:manual_worker, original_pid, timeout: 2000)

    Process.exit(supervisor, :shutdown)
  end
end
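The examples above call MySupervisor.start_link/0 and MySupervisor.start_worker/2 without defining them. One way such a module might look is sketched below, assuming a DynamicSupervisor and a trivial GenServer worker. Both modules are hypothetical stand-ins for your own code, not part of LetItCrash.

```elixir
defmodule MySupervisor.Worker do
  use GenServer

  # Registers under `name` so a restarted process can be found again by name.
  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule MySupervisor do
  use DynamicSupervisor

  def start_link(opts \\ []), do: DynamicSupervisor.start_link(__MODULE__, :ok, opts)

  # Starts a named worker that is restarted whenever it exits.
  def start_worker(supervisor, name) do
    spec = %{
      id: name,
      start: {MySupervisor.Worker, :start_link, [name]},
      restart: :permanent
    }

    DynamicSupervisor.start_child(supervisor, spec)
  end

  @impl true
  def init(:ok), do: DynamicSupervisor.init(strategy: :one_for_one)
end
```

Registering the worker by name matters here: the restarted process gets a new PID, so the name is the only stable handle a test can use.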

API

crash/1 and crash/2

Crashes a process by PID or registered name. Follows the same convention as Process.exit/2 with the process as the first argument to enable easy piping.

# crash/1 - Sends :shutdown signal (can be trapped)
LetItCrash.crash(pid)             # Crash by PID
LetItCrash.crash(:process_name)   # Crash by name + auto tracking

# crash/2 - Specify the signal type
LetItCrash.crash(pid, :shutdown)        # Equivalent to crash/1
LetItCrash.crash(pid, :kill)            # :kill signal (cannot be trapped)
LetItCrash.crash(:process_name, :kill)  # With registered name

# Piping support:
Process.whereis(:my_process)
|> LetItCrash.crash(:kill)

When to use :kill?

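Use crash(process, :kill) when testing processes that call Process.flag(:trap_exit, true), which is common in GenServers that need to run cleanup logic when they exit. Such a process can trap the default :shutdown signal and keep running, whereas :kill cannot be trapped: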

defmodule ScoreCoordinator do
  use GenServer

  def init(_) do
    Process.flag(:trap_exit, true) # Exit signals now arrive as {:EXIT, pid, reason} messages
    {:ok, %{}}
  end

  def handle_info({:EXIT, _pid, _reason}, state) do
    # Cleanup logic here
    {:noreply, state}
  end
end

# In tests:
test "coordinator recovers from forced crash" do
  {:ok, supervisor} = MySupervisor.start_link()
  {:ok, _pid} = MySupervisor.start_coordinator(supervisor, :coordinator)

  # Use :kill to guarantee termination even with trap_exit
  LetItCrash.crash(:coordinator, :kill)
  assert LetItCrash.recovered?(:coordinator)
end
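The difference between the two signals can be seen with plain processes. The following sketch uses only Process.exit/2 and a monitor, not LetItCrash itself; the 100 ms wait is an arbitrary value chosen for the illustration.

```elixir
parent = self()

pid =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    # Block on a message that never comes; trapped {:EXIT, ...} messages just sit in the mailbox.
    receive do
      :never_sent -> :ok
    end
  end)

# Make sure the flag is set before any signal is sent.
receive do
  :ready -> :ok
end

ref = Process.monitor(pid)

# :shutdown is converted into a message, so the process keeps running.
Process.exit(pid, :shutdown)

survived_shutdown? =
  receive do
    {:DOWN, ^ref, :process, ^pid, _reason} -> false
  after
    100 -> true
  end

# :kill cannot be trapped; the process exits with reason :killed.
Process.exit(pid, :kill)

kill_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :still_alive
  end
```

Here survived_shutdown? ends up true and kill_reason is :killed, which is why a recovery test against a trapping process needs the stronger signal.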

wait_for_process/1,2

Waits for a registered process to exist and be alive. Useful in test setup when you need to ensure a process is available before interacting with it.

# Basic usage - waits up to 1000ms (default)
:ok = LetItCrash.wait_for_process(:my_worker)
# With custom timeout for slow-starting processes
:ok = LetItCrash.wait_for_process(:heavy_worker, timeout: 5000)
# With custom polling interval
:ok = LetItCrash.wait_for_process(:worker, timeout: 2000, interval: 100)
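For intuition, waiting with a timeout and a polling interval can be sketched as a small recursive loop. This is not the library's implementation: the WaitSketch module, the 50 ms default interval, and the {:error, :timeout} return value are all assumptions made for the illustration.

```elixir
defmodule WaitSketch do
  # Hypothetical helper: polls until `name` is registered to a live process.
  def wait_for(name, timeout \\ 1_000, interval \\ 50) do
    deadline = System.monotonic_time(:millisecond) + timeout
    poll(name, deadline, interval)
  end

  defp poll(name, deadline, interval) do
    pid = Process.whereis(name)

    cond do
      is_pid(pid) and Process.alive?(pid) ->
        :ok

      System.monotonic_time(:millisecond) >= deadline ->
        # Assumed return shape for the sketch.
        {:error, :timeout}

      true ->
        Process.sleep(interval)
        poll(name, deadline, interval)
    end
  end
end
```

A shorter interval notices the process sooner at the cost of more polling; a deadline based on monotonic time keeps the total wait bounded even if individual sleeps overrun.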

Options:

Returns:

recovered?/1,2,3

Checks if a registered process has recovered after a crash. Multiple signatures available:

# Uses stored PID from crash/1 (recommended)
LetItCrash.recovered?(:process_name)
# With custom timeout/options
LetItCrash.recovered?(:process_name, timeout: 2000, interval: 100)
# Manual PID comparison
LetItCrash.recovered?(:process_name, original_pid)
# Manual PID + options
LetItCrash.recovered?(:process_name, original_pid, timeout: 3000)
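What "recovered" means in the manual-PID form can be illustrated with the standard library alone: the name must resolve to a live PID that differs from the original. The sketch below assumes a named Agent under a :one_for_one Supervisor; the name :recovery_demo and the polling limits are made up for the example.

```elixir
children = [
  %{id: :counter, start: {Agent, :start_link, [fn -> 0 end, [name: :recovery_demo]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

original_pid = Process.whereis(:recovery_demo)
ref = Process.monitor(original_pid)
Process.exit(original_pid, :kill)

# Wait until the old process is confirmed dead.
receive do
  {:DOWN, ^ref, :process, ^original_pid, _reason} -> :ok
end

# Poll until the supervisor has registered a replacement under the same name.
new_pid =
  Enum.find_value(1..100, fn _attempt ->
    case Process.whereis(:recovery_demo) do
      pid when is_pid(pid) and pid != original_pid ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)

# The replacement starts from the initial state again.
state_after_restart = Agent.get(:recovery_demo, & &1)
```

Comparing PIDs rather than only checking liveness avoids a false positive when the check runs before the old process has actually exited.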

Options:

test_restart/2,3

Tests that a process recovers by running the same function before and after crash.

# Basic usage
LetItCrash.test_restart(:process_name, fn ->
  # Test logic executed before AND after crash
end)

# With options
LetItCrash.test_restart(:process_name, fn ->
  # Test logic
end, timeout: 2000)

assert_clean_registry/2,3

Verifies that Registry entries are properly cleaned up when a process crashes and recreated when it recovers.
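The cleanup half of that check rests on standard Registry behavior: an entry disappears once its owning process dies. A minimal sketch of that behavior, independent of LetItCrash, with RegistryDemo and :demo_key as made-up names:

```elixir
{:ok, _registry} = Registry.start_link(keys: :unique, name: RegistryDemo)
parent = self()

owner =
  spawn(fn ->
    {:ok, _} = Registry.register(RegistryDemo, :demo_key, :some_value)
    send(parent, :registered)

    receive do
      :stop -> :ok
    end
  end)

receive do
  :registered -> :ok
end

entries_before = Registry.lookup(RegistryDemo, :demo_key)

ref = Process.monitor(owner)
Process.exit(owner, :kill)

receive do
  {:DOWN, ^ref, :process, ^owner, _reason} -> :ok
end

# Poll briefly rather than assuming the entry vanishes at the same instant the owner dies.
entries_after =
  Enum.reduce_while(1..50, nil, fn _attempt, _acc ->
    case Registry.lookup(RegistryDemo, :demo_key) do
      [] ->
        {:halt, []}

      stale ->
        Process.sleep(10)
        {:cont, stale}
    end
  end)
```

Re-registration is the part your own code is responsible for: the restarted process has to call Registry.register/3 again, typically in init/1 or through a :via name.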

# Basic usage - verifies cleanup and re-registration
LetItCrash.assert_clean_registry(MyApp.Registry, :process_name)
# With custom timeout
LetItCrash.assert_clean_registry(MyApp.Registry, :process_name, timeout: 3000)

This function ensures your processes properly:

verify_ets_cleanup/2,3

Monitors ETS table entries to verify proper cleanup during process crashes.

# Verify entry is cleaned up (default behavior)
LetItCrash.verify_ets_cleanup(:my_cache, :process_data)

# Custom cleanup expectations
LetItCrash.verify_ets_cleanup(:shared_table, :key,
  expect_cleanup: true,
  expect_recreate: false,
  timeout: 1500
)

# Verify recreation after cleanup
LetItCrash.verify_ets_cleanup(:cache_table, :data_key,
  expect_cleanup: true,
  expect_recreate: true
)
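Why ETS cleanup deserves its own check becomes clearer when comparing a table owned by the crashing process with a table owned by someone else. The sketch below uses only :ets and plain processes; the table names are invented for the example.

```elixir
# Table owned by the calling process, so it survives the worker's crash.
:ets.new(:shared_cache_demo, [:named_table, :public, :set])
parent = self()

worker =
  spawn(fn ->
    # Table owned by the worker: the VM deletes it when the worker exits.
    :ets.new(:owned_cache_demo, [:named_table, :public, :set])
    :ets.insert(:owned_cache_demo, {:key, 1})

    # Entry in a table owned by another process: nothing removes it automatically.
    :ets.insert(:shared_cache_demo, {:key, 1})
    send(parent, :ready)

    receive do
      :stop -> :ok
    end
  end)

receive do
  :ready -> :ok
end

ref = Process.monitor(worker)
Process.exit(worker, :kill)

receive do
  {:DOWN, ^ref, :process, ^worker, _reason} -> :ok
end

owned_table = :ets.whereis(:owned_cache_demo)
leftover = :ets.lookup(:shared_cache_demo, :key)
```

The worker's own table is gone after the crash, while its entry in the shared table stays behind. Stale entries of that second kind are what a restarted process has to overwrite or clear explicitly.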

Options:

assert_supervision_impact/3

Crashes a child process and verifies the expected impact on its siblings within a supervision tree. Validates that your chosen supervision strategy (:one_for_one, :one_for_all, :rest_for_one) behaves as expected for your specific tree.

# Verify one_for_one: only the crashed child restarts
LetItCrash.assert_supervision_impact(:my_supervisor, :worker_a,
  expect: [
    worker_a: :restarted,
    worker_b: :alive,
    worker_c: :alive
  ]
)

# Verify one_for_all: all children restart
LetItCrash.assert_supervision_impact(:my_supervisor, :worker_a,
  expect: [
    worker_a: :restarted,
    worker_b: :restarted,
    worker_c: :restarted
  ]
)

# Verify rest_for_one: crashed child and later siblings restart
LetItCrash.assert_supervision_impact(:my_supervisor, :worker_b,
  expect: [
    worker_a: :alive,
    worker_b: :restarted,
    worker_c: :restarted
  ]
)
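The three expectations above follow directly from how OTP supervisors treat siblings. The sketch below reproduces them with the standard library only, by comparing PIDs before and after killing the middle child. StrategyDemo and the :demo_* names are hypothetical, and the polling limits are arbitrary.

```elixir
defmodule StrategyDemo do
  @names [:demo_a, :demo_b, :demo_c]

  # Starts three named Agents under `strategy`, kills :demo_b,
  # and returns the names whose PID changed.
  def restarted_after_kill(strategy) do
    children =
      for name <- @names do
        %{id: name, start: {Agent, :start_link, [fn -> nil end, [name: name]]}}
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    before = snapshot()

    Process.exit(before.demo_b, :kill)
    now = await_settled(before.demo_b, 100)

    Supervisor.stop(sup)
    Enum.filter(@names, fn name -> now[name] != before[name] end)
  end

  defp snapshot, do: Map.new(@names, fn name -> {name, Process.whereis(name)} end)

  defp await_settled(_old_b, 0), do: raise("supervisor did not settle in time")

  defp await_settled(old_b, attempts) do
    new_b = Process.whereis(:demo_b)

    # Snapshot only once :demo_b has been replaced; affected siblings are stopped before that.
    now = if is_pid(new_b) and new_b != old_b, do: snapshot(), else: %{}

    if map_size(now) == 3 and Enum.all?(Map.values(now), &is_pid/1) do
      now
    else
      Process.sleep(10)
      await_settled(old_b, attempts - 1)
    end
  end
end
```

Calling StrategyDemo.restarted_after_kill/1 with :one_for_one, :one_for_all, and :rest_for_one yields the crashed child alone, all three children, and the crashed child plus its later sibling, respectively.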

Each status can be paired with an assertion function to verify application-level behavior after the supervision event. This checks not just that processes restarted, but that your code actually handles the restart correctly:

LetItCrash.assert_supervision_impact(:my_supervisor, :coordinator,
  expect: [
    coordinator: {:restarted, fn ->
      # Verify the coordinator came back in a valid state
      assert MyCoordinator.get_status() == :idle
    end},
    worker_a: {:alive, fn ->
      # Verify the sibling is still functional
      assert MyWorker.ready?(:worker_a)
    end},
    worker_b: :restarted
  ]
)

Expected Statuses:

Options:

Advanced Usage Examples

Testing Registry and ETS Cleanup

defmodule MyAppTest do
  use ExUnit.Case
  use LetItCrash

  test "server cleans up resources properly on crash" do
    # Setup: Start Registry and ETS table
    {:ok, _} = Registry.start_link(keys: :unique, name: MyApp.Registry)
    :ets.new(:app_cache, [:set, :public, :named_table])
    {:ok, supervisor} = MySupervisor.start_link()
    {:ok, _pid} = MySupervisor.start_worker(supervisor, :resource_server)

    # Server registers itself and creates ETS entries
    assert [{_pid, _}] = Registry.lookup(MyApp.Registry, :resource_server)
    :ets.insert(:app_cache, {:server_data, "important_data"})

    # Crash and verify proper cleanup + recovery
    LetItCrash.crash(:resource_server)

    # Verify Registry cleanup and re-registration
    assert :ok = LetItCrash.assert_clean_registry(MyApp.Registry, :resource_server)

    # Verify ETS cleanup
    assert :ok = LetItCrash.verify_ets_cleanup(:app_cache, :server_data)

    Process.exit(supervisor, :shutdown)
  end
end

Testing Supervision Strategy Impact

A common real-world scenario: you have a scoring system with a coordinator and multiple workers under a :one_for_all strategy. When the coordinator crashes mid-calculation, you need to verify that workers don't retain stale partial results and that the coordinator comes back ready to accept new work.

defmodule ScoringSystemTest do
  use ExUnit.Case
  use LetItCrash

  test "coordinator crash resets workers to clean state" do
    {:ok, supervisor} = ScoringSupervisor.start_link()

    # Workers are processing scores
    ScoreWorker.submit(:worker_a, %{team: "A", score: 42})
    ScoreWorker.submit(:worker_b, %{team: "B", score: 38})
    ScoreCoordinator.begin_normalization(:coordinator)

    # Coordinator crashes mid-normalization
    LetItCrash.assert_supervision_impact(supervisor, :coordinator,
      signal: :kill,
      expect: [
        coordinator: {:restarted, fn ->
          # Coordinator must come back idle, not stuck in :normalizing
          assert ScoreCoordinator.get_status(:coordinator) == :idle
          # Must be able to accept new work immediately
          assert :ok = ScoreCoordinator.begin_normalization(:coordinator)
        end},
        worker_a: {:restarted, fn ->
          # Workers must not retain partial/stale scores
          assert ScoreWorker.get_pending(:worker_a) == []
        end},
        worker_b: {:restarted, fn ->
          assert ScoreWorker.get_pending(:worker_b) == []
        end}
      ]
    )

    Process.exit(supervisor, :shutdown)
  end
end

Combined Testing Workflow

test "complete crash recovery validation" do
  {:ok, supervisor} = MySupervisor.start_link()
  {:ok, _pid} = MySupervisor.start_worker(supervisor, :full_test_server)

  # Test complete recovery workflow
  LetItCrash.test_restart(:full_test_server, fn ->
    # This runs before AND after crash
    MyServer.increment_counter()
    assert MyServer.get_counter() == 1 # Will be reset to 0, then incremented to 1
  end)

  # Verify additional cleanup
  LetItCrash.assert_clean_registry(MyApp.Registry, :full_test_server)
  LetItCrash.verify_ets_cleanup(:server_cache, :counter_data)

  Process.exit(supervisor, :shutdown)
end

Important Notes

⚠️ Requires Supervision: The recovered?/1 and test_restart/2 functions only work with supervised processes. Unsupervised processes won't restart after crashes.

🔄 State Reset: Process state is reset to initial values after restart (this is normal OTP behavior).

🏷️ Named Processes: Recovery detection only works with registered (named) processes.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details on:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support


Embrace the crash, test the recovery! 💥➡️✅