Reach

Program dependence graph for Elixir and Erlang.

Reach builds a graph that captures what depends on what in your code. Given any expression, you can ask: what affects it? What does it affect? Can these two statements be safely reordered? Does user input reach this database call without sanitization?

Use cases

Security: taint analysis

Does user input reach a dangerous sink without sanitization?

graph = Reach.file_to_graph!("lib/my_app_web/controllers/user_controller.ex")

Reach.taint_analysis(graph,
  sources: [type: :call, function: :params],
  sinks: [type: :call, module: System, function: :cmd, arity: 2],
  sanitizers: [type: :call, function: :sanitize]
)
#=> [%{source: ..., sink: ..., path: [...], sanitized: false}]

Code quality: dead code detection

graph = Reach.file_to_graph!("lib/accounts.ex")

for node <- Reach.dead_code(graph) do
  IO.warn("Dead code at #{inspect(node.source_span)}: #{node.type}")
end

Refactoring: safe reordering

graph = Reach.file_to_graph!("lib/pipeline.ex")

for block <- Reach.nodes(graph, type: :block),
    {a, b} <- pairs(block.children) do
  if Reach.independent?(graph, a.id, b.id) do
    IO.puts("Statements at lines #{a.source_span.start_line} and " <>
      "#{b.source_span.start_line} can be safely reordered")
  end
end

Clone detection: reordering-equivalent code

Two blocks with the same statements in different order are clones if the statements are independent. canonical_order sorts independent siblings by structural hash so both orderings produce the same sequence:

# For ExDNA integration
order = Reach.canonical_order(graph, block_node_id)
hash = :erlang.phash2(Enum.map(order, fn {_, node} -> node end))

OTP: GenServer state flow analysis

graph = Reach.file_to_graph!("lib/my_server.ex")
edges = Reach.edges(graph)

# Which callbacks read state?
state_reads = Enum.filter(edges, &(&1.label == :state_read))

# Does state flow between callbacks?
state_passes = Enum.filter(edges, &(&1.label == :state_pass))

# ETS write-then-read dependencies
ets_deps = Enum.filter(edges, &match?({:ets_dep, _table}, &1.label))

Concurrency: crash propagation

graph = Reach.file_to_graph!("lib/my_supervisor.ex")
edges = Reach.edges(graph)

# Which monitors connect to which :DOWN handlers?
Enum.filter(edges, &(&1.label == :monitor_down))

# Does this process trap exits? Which handler receives them?
Enum.filter(edges, &(&1.label == :trap_exit))

# Task.async → Task.await data flow
Enum.filter(edges, &(&1.label == :task_result))

# Supervisor child startup ordering
Enum.filter(edges, &(&1.label == :startup_order))

Installation

def deps do
  [{:reach, "~> 1.0"}]
end

Building a graph

# From Elixir source
{:ok, graph} = Reach.string_to_graph("def foo(x), do: x + 1")
{:ok, graph} = Reach.file_to_graph("lib/my_module.ex")

# From Erlang source
{:ok, graph} = Reach.string_to_graph(source, language: :erlang)
{:ok, graph} = Reach.file_to_graph("src/my_module.erl")  # auto-detected

# From pre-parsed AST (for Credo/ExDNA integration)
{:ok, ast} = Code.string_to_quoted(source)
{:ok, graph} = Reach.ast_to_graph(ast)

# From compiled BEAM bytecode (sees macro-expanded code)
{:ok, graph} = Reach.module_to_graph(MyApp.Accounts)
{:ok, graph} = Reach.compiled_to_graph(source)

# Bang variants
graph = Reach.file_to_graph!("lib/my_module.ex")

The BEAM frontend captures code invisible to source-level analysis — use GenServer callbacks, macro-expanded try/rescue, generated functions:

# Source: only sees init/1 and handle_call/3
Reach.string_to_graph(genserver_source)

# BEAM: also sees child_spec/1, terminate/2, handle_info/2
Reach.compiled_to_graph(genserver_source)

Multi-file project analysis

# Analyze a whole Mix project
project = Reach.Project.from_mix_project()

# Or specific paths
project = Reach.Project.from_glob("lib/**/*.ex")

# Cross-module taint analysis
Reach.Project.taint_analysis(project,
  sources: [type: :call, function: :params],
  sinks: [type: :call, module: System, function: :cmd]
)

# Dependency summaries for external modules
summaries = Reach.Project.summarize_dependency(Ecto.Adapters.SQL)
project = Reach.Project.from_mix_project(summaries: summaries)

API reference

Nodes

Reach.nodes(graph)                                    # all nodes
Reach.nodes(graph, type: :call)                       # by type
Reach.nodes(graph, type: :call, module: Enum)         # by module
Reach.nodes(graph, type: :call, module: Repo, function: :insert, arity: 1)

node = Reach.node(graph, node_id)
node.type        #=> :call
node.meta        #=> %{module: Repo, function: :insert, arity: 1}
node.source_span #=> %{file: "lib/accounts.ex", start_line: 12, ...}

Slicing

Reach.backward_slice(graph, node_id)   # what affects this?
Reach.forward_slice(graph, node_id)    # what does this affect?
Reach.chop(graph, source_id, sink_id)  # how does A influence B?

Data flow and independence

Reach.data_flows?(graph, source_id, sink_id)
Reach.depends?(graph, id_a, id_b)
Reach.independent?(graph, id_a, id_b)
Reach.has_dependents?(graph, node_id)
Reach.passes_through?(graph, source_id, sink_id, &predicate/1)

Effects

Reach.pure?(node)              #=> true
Reach.classify_effect(node)    #=> :pure | :io | :read | :write | :send | :receive | :exception | :unknown

Covers Enum, Map, List, String, Keyword, Tuple, Integer, Float, Atom, MapSet, Range, Regex, URI, Path, Base, and Erlang equivalents. Enum.each correctly classified as impure.

Dependencies

Reach.control_deps(graph, node_id)   #=> [{controller_id, label}, ...]
Reach.data_deps(graph, node_id)      #=> [{source_id, :variable_name}, ...]
Reach.edges(graph)                   # all dependence edges

Interprocedural

Reach.function_graph(graph, {MyModule, :my_function, 2})
Reach.context_sensitive_slice(graph, node_id)   # Horwitz-Reps-Binkley
Reach.call_graph(graph)                         # {module, function, arity} vertices

Taint analysis

# Keyword filters (same format as nodes/2)
Reach.taint_analysis(graph,
  sources: [type: :call, function: :params],
  sinks: [type: :call, module: Repo, function: :query],
  sanitizers: [type: :call, function: :sanitize]
)

# Predicates also work
Reach.taint_analysis(graph,
  sources: &(&1.meta[:function] in [:params, :body_params]),
  sinks: [type: :call, module: System, function: :cmd]
)
#=> [%{source: node, sink: node, path: [node_id, ...], sanitized: boolean}]

Dead code

Reach.dead_code(graph)
#=> [%Reach.IR.Node{type: :call, meta: %{function: :upcase}}, ...]

Canonical ordering

Reach.canonical_order(graph, block_id)
#=> [{node_id, %Reach.IR.Node{}}, ...] sorted so independent
#   siblings have deterministic order regardless of source order

Neighbors

Reach.neighbors(graph, node_id)                # all direct neighbors
Reach.neighbors(graph, node_id, :state_read)   # only state_read edges
Reach.neighbors(graph, node_id, :data)         # matches {:data, _} labels

Raw graph access

raw = Reach.to_graph(graph)     # returns the underlying libgraph Graph.t()
Graph.vertices(raw)             # use any libgraph function
Graph.get_shortest_path(raw, id_a, id_b)

Export

{:ok, dot} = Reach.to_dot(graph)
File.write!("graph.dot", dot)
# dot -Tpng graph.dot -o graph.png

Edge types

Label	Source	Meaning
`{:data, var}`	DDG	Data flows through variable `var`
`:containment`	DDG	Parent expression depends on child sub-expression
`{:control, label}`	CDG	Execution controlled by branch condition
`:call`	SDG	Call site to callee function
`:parameter_in`	SDG	Argument flows to formal parameter
`:parameter_out`	SDG	Return value flows back to caller
`:summary`	SDG	Shortcut: parameter flows to return value
`:state_read`	OTP	GenServer callback reads state parameter
`:state_pass`	OTP	State flows between consecutive callbacks
`{:ets_dep, table}`	OTP	ETS write → read on same table
`{:pdict_dep, key}`	OTP	Process dictionary put → get on same key
`:message_order`	OTP	Sequential sends to same pid
`:monitor_down`	Concurrency	`Process.monitor` → `handle_info({:DOWN, ...})`
`:trap_exit`	Concurrency	`Process.flag(:trap_exit)` → `handle_info({:EXIT, ...})`
`:link_exit`	Concurrency	`spawn_link` / `Process.link` → `:EXIT` handler
`:task_result`	Concurrency	`Task.async` → `Task.await` data flow
`:startup_order`	Concurrency	Supervisor child A starts before child B
`{:message_content, tag}`	OTP	`send(pid, {tag, data})` payload → handler pattern vars
`:call_reply`	OTP	`{:reply, value, state}` → `GenServer.call` return
`:match_binding`	DDG	Match RHS flows to LHS definition var
`:higher_order`	SDG	Argument flows through higher-order function to result

Architecture

graph TD
    Source["Source (.ex / .erl / .beam)"] --> Frontend
    Frontend["Frontend → IR Nodes<br/><i>Elixir AST · Erlang abstract forms · BEAM bytecode</i>"] --> CFG
    CFG["Control Flow Graph<br/><i>entry/exit · branching · exceptions</i>"] --> Dom
    Dom["Dominators<br/><i>post-dominator tree · dominance frontier</i>"] --> CD
    CD["Control Dependence<br/><i>Ferrante et al. 1987</i>"] --> PDG
    CFG --> DD
    DD["Data Dependence<br/><i>def-use chains · containment edges</i>"] --> PDG
    PDG["Program Dependence Graph<br/><i>control ∪ data</i>"] --> SDG
    SDG["System Dependence Graph + OTP + Concurrency<br/><i>call/summary edges · GenServer · ETS · monitors · tasks</i>"] --> Query
    Query["Query / Effects<br/><i>slicing · independence · taint · purity</i>"]

Performance

Benchmarked on real projects (single-threaded, Apple M1 Pro):

Project	Files	Functions	IR Nodes	Time	ms/file
ex_slop	26	109	5,186	36ms	1.4
ex_dna	32	208	10,649	87ms	2.7
Livebook	72	589	22,156	160ms	2.2
Oban	64	606	31,413	195ms	3.0
Keila	190	1,074	49,354	282ms	1.5
Phoenix	74	1,090	48,292	333ms	4.5
Absinthe	282	1,411	68,416	375ms	1.3

740 files, zero crashes.

References

Ferrante, Ottenstein, Warren — The Program Dependence Graph and Its Use in Optimization (1987)
Horwitz, Reps, Binkley — Interprocedural Slicing Using Dependence Graphs (1990)
Silva, Tamarit, Tomás — System Dependence Graphs for Erlang Programs (2012)
Cooper, Harvey, Kennedy — A Simple, Fast Dominance Algorithm (2001)

License

MIT