Reach

Program dependence graph for Elixir and Erlang.

Reach builds a graph that captures what depends on what in your code. Given any expression, you can ask: what affects it? What does it affect? Can these two statements be safely reordered? Does user input reach this database call without sanitization?

Use cases

Security: taint analysis

Does user input reach a dangerous sink without sanitization?

graph = Reach.file_to_graph!("lib/my_app_web/controllers/user_controller.ex")

Reach.taint_analysis(graph,
  sources: [type: :call, function: :params],
  sinks: [type: :call, module: System, function: :cmd, arity: 2],
  sanitizers: [type: :call, function: :sanitize]
)
#=> [%{source: ..., sink: ..., path: [...], sanitized: false}]

Code quality: dead code detection

graph = Reach.file_to_graph!("lib/accounts.ex")

for node <- Reach.dead_code(graph) do
  IO.warn("Dead code at #{inspect(node.source_span)}: #{node.type}")
end

Refactoring: safe reordering

graph = Reach.file_to_graph!("lib/pipeline.ex")

for block <- Reach.nodes(graph, type: :block),
    {a, b} <- pairs(block.children) do
  if Reach.independent?(graph, a.id, b.id) do
    IO.puts("Statements at lines #{a.source_span.start_line} and " <>
      "#{b.source_span.start_line} can be safely reordered")
  end
end

Clone detection: reordering-equivalent code

Two blocks with the same statements in different order are clones if the statements are independent. canonical_order sorts independent siblings by structural hash so both orderings produce the same sequence:

# For ExDNA integration
order = Reach.canonical_order(graph, block_node_id)
hash = :erlang.phash2(Enum.map(order, fn {_, node} -> node end))

OTP: GenServer state flow analysis

graph = Reach.file_to_graph!("lib/my_server.ex")
edges = Reach.edges(graph)

# Which callbacks read state?
state_reads = Enum.filter(edges, &(&1.label == :state_read))

# Does state flow between callbacks?
state_passes = Enum.filter(edges, &(&1.label == :state_pass))

# ETS write-then-read dependencies
ets_deps = Enum.filter(edges, &match?({:ets_dep, _table}, &1.label))

Concurrency: crash propagation

graph = Reach.file_to_graph!("lib/my_supervisor.ex")
edges = Reach.edges(graph)

# Which monitors connect to which :DOWN handlers?
Enum.filter(edges, &(&1.label == :monitor_down))

# Does this process trap exits? Which handler receives them?
Enum.filter(edges, &(&1.label == :trap_exit))

# Task.async → Task.await data flow
Enum.filter(edges, &(&1.label == :task_result))

# Supervisor child startup ordering
Enum.filter(edges, &(&1.label == :startup_order))

Installation

def deps do
  [{:reach, "~> 1.0"}]
end

Building a graph

# From Elixir source
{:ok, graph} = Reach.string_to_graph("def foo(x), do: x + 1")
{:ok, graph} = Reach.file_to_graph("lib/my_module.ex")

# From Erlang source
{:ok, graph} = Reach.string_to_graph(source, language: :erlang)
{:ok, graph} = Reach.file_to_graph("src/my_module.erl")  # auto-detected

# From pre-parsed AST (for Credo/ExDNA integration)
{:ok, ast} = Code.string_to_quoted(source)
{:ok, graph} = Reach.ast_to_graph(ast)

# From compiled BEAM bytecode (sees macro-expanded code)
{:ok, graph} = Reach.module_to_graph(MyApp.Accounts)
{:ok, graph} = Reach.compiled_to_graph(source)

# Bang variants
graph = Reach.file_to_graph!("lib/my_module.ex")

The BEAM frontend captures code invisible to source-level analysis — use GenServer callbacks, macro-expanded try/rescue, generated functions:

# Source: only sees init/1 and handle_call/3
Reach.string_to_graph(genserver_source)

# BEAM: also sees child_spec/1, terminate/2, handle_info/2
Reach.compiled_to_graph(genserver_source)

Multi-file project analysis

# Analyze a whole Mix project
project = Reach.Project.from_mix_project()

# Or specific paths
project = Reach.Project.from_glob("lib/**/*.ex")

# Cross-module taint analysis
Reach.Project.taint_analysis(project,
  sources: [type: :call, function: :params],
  sinks: [type: :call, module: System, function: :cmd]
)

# Dependency summaries for external modules
summaries = Reach.Project.summarize_dependency(Ecto.Adapters.SQL)
project = Reach.Project.from_mix_project(summaries: summaries)

API reference

Nodes

Reach.nodes(graph)                                    # all nodes
Reach.nodes(graph, type: :call)                       # by type
Reach.nodes(graph, type: :call, module: Enum)         # by module
Reach.nodes(graph, type: :call, module: Repo, function: :insert, arity: 1)

node = Reach.node(graph, node_id)
node.type        #=> :call
node.meta        #=> %{module: Repo, function: :insert, arity: 1}
node.source_span #=> %{file: "lib/accounts.ex", start_line: 12, ...}

Slicing

Reach.backward_slice(graph, node_id)   # what affects this?
Reach.forward_slice(graph, node_id)    # what does this affect?
Reach.chop(graph, source_id, sink_id)  # how does A influence B?

Data flow and independence

Reach.data_flows?(graph, source_id, sink_id)
Reach.depends?(graph, id_a, id_b)
Reach.independent?(graph, id_a, id_b)
Reach.has_dependents?(graph, node_id)
Reach.passes_through?(graph, source_id, sink_id, &predicate/1)

Effects

Reach.pure?(node)              #=> true
Reach.classify_effect(node)    #=> :pure | :io | :read | :write | :send | :receive | :exception | :unknown

Covers Enum, Map, List, String, Keyword, Tuple, Integer, Float, Atom, MapSet, Range, Regex, URI, Path, Base, and Erlang equivalents. Enum.each correctly classified as impure.

Dependencies

Reach.control_deps(graph, node_id)   #=> [{controller_id, label}, ...]
Reach.data_deps(graph, node_id)      #=> [{source_id, :variable_name}, ...]
Reach.edges(graph)                   # all dependence edges

Interprocedural

Reach.function_graph(graph, {MyModule, :my_function, 2})
Reach.context_sensitive_slice(graph, node_id)   # Horwitz-Reps-Binkley
Reach.call_graph(graph)                         # {module, function, arity} vertices

Taint analysis

# Keyword filters (same format as nodes/2)
Reach.taint_analysis(graph,
  sources: [type: :call, function: :params],
  sinks: [type: :call, module: Repo, function: :query],
  sanitizers: [type: :call, function: :sanitize]
)

# Predicates also work
Reach.taint_analysis(graph,
  sources: &(&1.meta[:function] in [:params, :body_params]),
  sinks: [type: :call, module: System, function: :cmd]
)
#=> [%{source: node, sink: node, path: [node_id, ...], sanitized: boolean}]

Dead code

Reach.dead_code(graph)
#=> [%Reach.IR.Node{type: :call, meta: %{function: :upcase}}, ...]

Canonical ordering

Reach.canonical_order(graph, block_id)
#=> [{node_id, %Reach.IR.Node{}}, ...] sorted so independent
#   siblings have deterministic order regardless of source order

Neighbors

Reach.neighbors(graph, node_id)                # all direct neighbors
Reach.neighbors(graph, node_id, :state_read)   # only state_read edges
Reach.neighbors(graph, node_id, :data)         # matches {:data, _} labels

Raw graph access

raw = Reach.to_graph(graph)     # returns the underlying libgraph Graph.t()
Graph.vertices(raw)             # use any libgraph function
Graph.get_shortest_path(raw, id_a, id_b)

Export

{:ok, dot} = Reach.to_dot(graph)
File.write!("graph.dot", dot)
# dot -Tpng graph.dot -o graph.png

Edge types

Label Source Meaning
{:data, var} DDG Data flows through variable var
:containment DDG Parent expression depends on child sub-expression
{:control, label} CDG Execution controlled by branch condition
:call SDG Call site to callee function
:parameter_in SDG Argument flows to formal parameter
:parameter_out SDG Return value flows back to caller
:summary SDG Shortcut: parameter flows to return value
:state_read OTP GenServer callback reads state parameter
:state_pass OTP State flows between consecutive callbacks
{:ets_dep, table} OTP ETS write → read on same table
{:pdict_dep, key} OTP Process dictionary put → get on same key
:message_order OTP Sequential sends to same pid
:monitor_down Concurrency Process.monitorhandle_info({:DOWN, ...})
:trap_exit Concurrency Process.flag(:trap_exit)handle_info({:EXIT, ...})
:link_exit Concurrency spawn_link / Process.link:EXIT handler
:task_result Concurrency Task.asyncTask.await data flow
:startup_order Concurrency Supervisor child A starts before child B
{:message_content, tag} OTP send(pid, {tag, data}) payload → handler pattern vars
:call_reply OTP {:reply, value, state}GenServer.call return
:match_binding DDG Match RHS flows to LHS definition var
:higher_order SDG Argument flows through higher-order function to result

Architecture

graph TD
    Source["Source (.ex / .erl / .beam)"] --> Frontend
    Frontend["Frontend → IR Nodes<br/><i>Elixir AST · Erlang abstract forms · BEAM bytecode</i>"] --> CFG
    CFG["Control Flow Graph<br/><i>entry/exit · branching · exceptions</i>"] --> Dom
    Dom["Dominators<br/><i>post-dominator tree · dominance frontier</i>"] --> CD
    CD["Control Dependence<br/><i>Ferrante et al. 1987</i>"] --> PDG
    CFG --> DD
    DD["Data Dependence<br/><i>def-use chains · containment edges</i>"] --> PDG
    PDG["Program Dependence Graph<br/><i>control ∪ data</i>"] --> SDG
    SDG["System Dependence Graph + OTP + Concurrency<br/><i>call/summary edges · GenServer · ETS · monitors · tasks</i>"] --> Query
    Query["Query / Effects<br/><i>slicing · independence · taint · purity</i>"]

Performance

Benchmarked on real projects (single-threaded, Apple M1 Pro):

Project Files Functions IR Nodes Time ms/file
ex_slop 26 109 5,186 36ms 1.4
ex_dna 32 208 10,649 87ms 2.7
Livebook 72 589 22,156 160ms 2.2
Oban 64 606 31,413 195ms 3.0
Keila 190 1,074 49,354 282ms 1.5
Phoenix 74 1,090 48,292 333ms 4.5
Absinthe 282 1,411 68,416 375ms 1.3

740 files, zero crashes.

References

License

MIT