Metastatic

Cross-language code analysis through unified MetaAST representation

Metastatic is a library that provides a unified MetaAST (Meta-level Abstract Syntax Tree) intermediate representation, built on a three-layer meta-model architecture, for parsing, transforming, and analyzing code across multiple programming languages.

Vision

Build tools once, apply them everywhere. Create a universal meta-model for program syntax that enables cross-language code analysis, transformation, and tooling.

Metastatic provides the foundation: the MetaAST meta-model and language adapters. Tools that leverage this foundation (mutation testing, purity analysis, complexity metrics) are built separately.

Key Features

Scope

What Metastatic Provides:

What Metastatic Does NOT Provide:

Metastatic is a foundation library that other tools build upon.

Quick Start

CLI Tools

Metastatic provides command-line tools for cross-language translation, AST inspection, and semantic analysis:

# Cross-language translation
mix metastatic.translate --from python --to elixir hello.py
mix metastatic.translate --from elixir --to python lib/module.ex --output py_output/

# AST inspection (tree format)
mix metastatic.inspect hello.py

# AST inspection (JSON format)
mix metastatic.inspect --format json hello.py

# Filter by layer
mix metastatic.inspect --layer core hello.py

# Extract variables only
mix metastatic.inspect --variables hello.py

# Analyze MetaAST metrics
mix metastatic.analyze hello.py

# Validate with strict mode
mix metastatic.analyze --validate strict hello.py

# Check semantic equivalence
mix metastatic.validate_equivalence hello.py hello.ex

# Show detailed differences
mix metastatic.validate_equivalence --verbose file1.py file2.ex

# Cross-language code duplication detection
mix metastatic.detect_duplicates file1.py file2.ex
mix metastatic.detect_duplicates --dir lib/ --format json

Using Language Adapters

Metastatic currently supports 5 language adapters: Python, Elixir, Erlang, Ruby, and Haskell.

Elixir & Erlang

alias Metastatic.Adapters.{Elixir, Erlang}
alias Metastatic.{Adapter, Document}

# Parse Elixir source code
{:ok, doc} = Adapter.abstract(Elixir, "x + 5", :elixir)
doc.ast  # => {:binary_op, [category: :arithmetic, operator: :+], 
         #      [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

# Parse Erlang source code
{:ok, doc} = Adapter.abstract(Erlang, "X + 5.", :erlang)
doc.ast  # => {:binary_op, [category: :arithmetic, operator: :+], 
         #      [{:variable, [], "X"}, {:literal, [subtype: :integer], 5}]}

# Round-trip transformation
source = "x + y * 2"
{:ok, result} = Adapter.round_trip(Elixir, source)
result == source  # => true

# Convert back to source
{:ok, source} = Adapter.reify(Elixir, doc)

# Cross-language equivalence
elixir_source = "x + 5"
erlang_source = "X + 5."

{:ok, elixir_doc} = Adapter.abstract(Elixir, elixir_source, :elixir)
{:ok, erlang_doc} = Adapter.abstract(Erlang, erlang_source, :erlang)

# Both produce semantically equivalent MetaAST!
# (only variable naming differs: "x" vs "X")
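The equivalence claim above can be checked with a short, self-contained sketch in plain Elixir (no Metastatic APIs; the `VarNorm` helper is invented for illustration). It downcases every variable name in a MetaAST 3-tuple, after which the two trees compare equal:

```elixir
defmodule VarNorm do
  # Hypothetical helper (not part of Metastatic): downcase every
  # variable name so trees that differ only in naming compare equal.
  def normalize({:variable, meta, name}), do: {:variable, meta, String.downcase(name)}
  def normalize({type, meta, children}) when is_list(children),
    do: {type, meta, Enum.map(children, &normalize/1)}
  def normalize(leaf), do: leaf
end

# MetaAST shapes as shown above for "x + 5" (Elixir) and "X + 5." (Erlang)
elixir_ast = {:binary_op, [category: :arithmetic, operator: :+],
              [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
erlang_ast = {:binary_op, [category: :arithmetic, operator: :+],
              [{:variable, [], "X"}, {:literal, [subtype: :integer], 5}]}

VarNorm.normalize(elixir_ast) == VarNorm.normalize(erlang_ast)  # => true
```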

Python

alias Metastatic.Adapters.Python

# Parse Python arithmetic
{:ok, doc} = Adapter.abstract(Python, "x + 5", :python)
doc.ast  # => {:binary_op, [category: :arithmetic, operator: :+], 
         #      [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

# Parse Python class
source = """
class Calculator:
    def __init__(self, value=0):
        self.value = value
    
    def add(self, x):
        self.value += x
        return self
"""
{:ok, doc} = Adapter.abstract(Python, source, :python)
# doc.ast contains {:language_specific, :python, ...} for class definition

Ruby

alias Metastatic.Adapters.Ruby

# Parse Ruby code
{:ok, doc} = Adapter.abstract(Ruby, "x + 5", :ruby)
doc.ast  # => {:binary_op, [category: :arithmetic, operator: :+], 
         #      [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

# Parse Ruby class with method chaining
source = """
class Calculator
  attr_reader :value
  
  def initialize(initial = 0)
    @value = initial
  end
  
  def add(x)
    @value += x
    self
  end
end
"""
{:ok, doc} = Adapter.abstract(Ruby, source, :ruby)
# doc.ast contains {:language_specific, :ruby, ...} for class definition

Haskell

alias Metastatic.Adapters.Haskell

# Parse Haskell arithmetic
{:ok, doc} = Adapter.abstract(Haskell, "x + 5", :haskell)
doc.ast  # => {:binary_op, [category: :arithmetic, operator: :+], 
         #      [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}

# Parse Haskell function with type signature
source = """
factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n - 1)
"""
{:ok, doc} = Adapter.abstract(Haskell, source, :haskell)
# doc.ast contains {:language_specific, :haskell, ...} for type signature and function

# Parse data type definition
source = "data Maybe a = Nothing | Just a"
{:ok, doc} = Adapter.abstract(Haskell, source, :haskell)
# doc.ast contains {:language_specific, :haskell, ...} for algebraic data type

Working with MetaAST Directly

alias Metastatic.{AST, Document, Validator}

# Create a MetaAST document (uniform 3-tuple format)
ast = {:binary_op, [category: :arithmetic, operator: :+], 
       [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
doc = Document.new(ast, :elixir)

# Validate conformance
{:ok, meta} = Validator.validate(doc)
meta.level  # => :core
meta.variables  # => MapSet.new(["x"])

# Extract variables
AST.variables(ast)  # => MapSet.new(["x"])

# Check conformance
AST.conforms?(ast)  # => true

AST Traversal & Manipulation

MetaAST trees need to be walked, searched, and transformed for code analysis, refactoring, linting, and building new cross-language tools. Metastatic provides a full set of traversal and manipulation functions that mirror Elixir's Macro module, adapted to the MetaAST 3-tuple format. All are available both on Metastatic.AST (canonical) and as convenience wrappers on the Metastatic module itself.

Why traversal matters

Unlike Elixir's native AST, MetaAST nodes come from many languages. A single traversal API means you write a variable renamer, a dead-code finder, or a complexity counter once and it works on Python, Ruby, Erlang, Haskell, and Elixir code.

Walking the tree

alias Metastatic.AST

{:ok, ast} = Metastatic.quote("x + y * 2", :python)

# Transform-only walk (no accumulator) -- like Macro.postwalk/2
new_ast = Metastatic.postwalk(ast, fn
  {:variable, meta, name} -> {:variable, meta, String.upcase(name)}
  node -> node
end)

# Walk with accumulator -- like Macro.prewalk/3
{_ast, var_names} = Metastatic.prewalk(ast, [], fn
  {:variable, _, name} = node, acc -> {node, [name | acc]}
  node, acc -> {node, acc}
end)
# var_names => ["y", "x"]

# Full pre+post traverse -- like Macro.traverse/4
{_ast, count} = Metastatic.traverse(ast, 0,
  fn node, acc -> {node, acc + 1} end,   # pre
  fn node, acc -> {node, acc} end         # post
)

Lazy enumeration

# Stream all nodes depth-first -- like Macro.prewalker/1
ast |> Metastatic.prewalker() |> Enum.filter(&AST.operator?/1)

# Post-order stream -- like Macro.postwalker/1
ast |> Metastatic.postwalker() |> Enum.count()

Finding nodes

# Path from a matching node up to the root -- like Macro.path/2
path = Metastatic.path(ast, fn
  {:literal, _, 42} -> true
  _ -> false
end)
# => [{:literal, ...42}, {:binary_op, ...}, ...root]

Pipe utilities

# Decompose pipe chains -- like Macro.unpipe/1
steps = Metastatic.unpipe(pipe_ast)
# => [{initial_expr, 0}, {call1, 0}, {call2, 0}]

# Inject an expression into a function call -- like Macro.pipe/3
Metastatic.pipe_into(expr, call_node, 0)

Predicates and inspection

# Is the whole subtree purely literal? -- like Macro.quoted_literal?/1
Metastatic.literal?({:list, [], [{:literal, [subtype: :integer], 1}]})  # => true

# Is it an operator node?
Metastatic.operator?(ast)  # => true for :binary_op / :unary_op

# Human-readable representation -- like Macro.to_string/1
Metastatic.to_string(ast)  # => "x + y * 2"

# Decompose a function call -- like Macro.decompose_call/1
Metastatic.decompose_call(call_node)  # => {"add", [arg1, arg2]}

# Validate structure with diagnostics -- like Macro.validate/1
Metastatic.validate(ast)  # => :ok | {:error, {:invalid_node, ...}}

# Generate a fresh variable for transformations -- like Macro.unique_var/2
Metastatic.unique_var("tmp")  # => {:variable, [], "tmp_42"}

Supplemental Modules

Supplemental modules extend MetaAST with library-specific integrations, enabling cross-language transformations:

alias Metastatic.Supplemental.Transformer

# Transform actor patterns to Python Pykka library calls
ast = {:actor_call, {:variable, "worker"}, "process", [data]}
{:ok, python_ast} = Transformer.transform(ast, :python)
# Result: {:function_call, "worker.ask", [{:literal, :string, "process"}, data]}

# Check what supplementals are available for a language
Transformer.supported_constructs(:python)
# => [:actor_call, :actor_cast, :spawn_actor, :async_await, :async_context, :gather]

# Validate what supplementals a document needs
alias Metastatic.Supplemental.Validator
{:ok, analysis} = Validator.validate(doc)
analysis.required_supplementals  # => [:pykka, :asyncio]

Available supplementals:

See Supplemental Modules for comprehensive guide on using and creating supplementals.

Code Duplication Detection

Detect code clones within a single language or across different languages using the unified MetaAST representation:

# Detect duplicates (note: requires language adapters, Phase 2+)
mix metastatic.detect_duplicates file1.py file2.ex

# Scan entire directory
mix metastatic.detect_duplicates --dir lib/

# JSON output with custom threshold
mix metastatic.detect_duplicates file1.py file2.ex --format json --threshold 0.85

# Save detailed report
mix metastatic.detect_duplicates --dir lib/ --format detailed --output report.txt

alias Metastatic.{Document, Analysis.Duplication}
alias Metastatic.Analysis.Duplication.Reporter

# Detect duplication between two documents (uniform 3-tuple format)
ast1 = {:binary_op, [category: :arithmetic, operator: :+], 
        [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+], 
        [{:variable, [], "y"}, {:literal, [subtype: :integer], 5}]}
doc1 = Document.new(ast1, :python)
doc2 = Document.new(ast2, :elixir)

{:ok, result} = Duplication.detect(doc1, doc2)

result.duplicate?         # => true
result.clone_type         # => :type_ii (renamed clone)
result.similarity_score   # => 1.0

# Format results
Reporter.format(result, :text)
# "Duplicate detected: Type II (Renamed Clone)
#  Similarity score: 1.0
#  ..."

# Detect across multiple documents
ast3 = {:binary_op, [category: :arithmetic, operator: :+], 
        [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
doc3 = Document.new(ast3, :elixir)

{:ok, groups} = Duplication.detect_in_list([doc1, doc2, doc3])
length(groups)  # => 1 (all three form a clone group)

# Format clone groups
Reporter.format_groups(groups, :detailed)
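Type II (renamed) clone detection can be illustrated with a standalone sketch: α-rename variables to positional placeholders in order of first occurrence, then compare trees. This is plain Elixir, independent of the `Duplication` module, and the `AlphaRename` helper name is made up for illustration:

```elixir
defmodule AlphaRename do
  # Hypothetical sketch: replace each variable with a placeholder
  # ("v0", "v1", ...) by order of first occurrence, so structurally
  # identical code that only renames variables compares equal (Type II).
  def canonical(ast), do: ast |> walk(%{}) |> elem(0)

  defp walk({:variable, meta, name}, seen) do
    seen = Map.put_new(seen, name, "v#{map_size(seen)}")
    {{:variable, meta, seen[name]}, seen}
  end
  defp walk({type, meta, children}, seen) when is_list(children) do
    {children, seen} = Enum.map_reduce(children, seen, &walk/2)
    {{type, meta, children}, seen}
  end
  defp walk(leaf, seen), do: {leaf, seen}
end

ast1 = {:binary_op, [category: :arithmetic, operator: :+],
        [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
ast2 = {:binary_op, [category: :arithmetic, operator: :+],
        [{:variable, [], "y"}, {:literal, [subtype: :integer], 5}]}

AlphaRename.canonical(ast1) == AlphaRename.canonical(ast2)  # => true
```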

Clone Types Detected:

Features:

Based on:

Purity Analysis

Analyze code for side effects and functional purity across all supported languages:

# Check if code is pure
mix metastatic.purity_check my_file.py
# Output: PURE or IMPURE: [effects]

# Detailed analysis
mix metastatic.purity_check my_file.ex --format detailed

# JSON output for CI/CD
mix metastatic.purity_check my_file.erl --format json

alias Metastatic.{Document, Analysis.Purity}

# Pure arithmetic (uniform 3-tuple format)
ast = {:binary_op, [category: :arithmetic, operator: :+], 
       [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]}
doc = Document.new(ast, :python)
{:ok, result} = Purity.analyze(doc)

result.pure?              # => true
result.effects            # => []
result.confidence         # => :high

# Impure with I/O
ast = {:function_call, [name: "print"], [{:literal, [subtype: :string], "hello"}]}
doc = Document.new(ast, :python)
{:ok, result} = Purity.analyze(doc)

result.pure?              # => false
result.effects            # => [:io]
result.summary            # => "Function is impure due to I/O operations"

Detected Effects:

Direct Native AST Input

All analyzers accept native language AST directly as {language, native_ast} tuples for integration with existing tooling:

alias Metastatic.Analysis.Purity

# Python native AST (from Python's ast module)
python_ast = %{"_type" => "Constant", "value" => 42}
{:ok, result} = Purity.analyze({:python, python_ast})
result.pure?  # => true

# Elixir native AST  
elixir_ast = {:+, [], [{:x, [], nil}, 5]}
{:ok, result} = Purity.analyze({:elixir, elixir_ast})
result.pure?  # => true

# Supports all analyzers
alias Metastatic.Analysis.Complexity
{:ok, result} = Complexity.analyze({:python, python_ast})

# Error handling for unsupported languages
{:error, {:unsupported_language, _}} = Purity.analyze({:unsupported, :some_ast})

This enables seamless integration with language-specific parsers and build tools without requiring Document struct creation.

Complexity Analysis

Analyze code complexity with six comprehensive metrics that work uniformly across all supported languages:

# Analyze complexity
mix metastatic.complexity my_file.py

# JSON output
mix metastatic.complexity my_file.ex --format json

# Detailed report with recommendations
mix metastatic.complexity my_file.erl --format detailed

# Custom thresholds
mix metastatic.complexity my_file.py --max-cyclomatic 15 --max-cognitive 20

alias Metastatic.{Document, Analysis.Complexity}

# Analyze all metrics (uniform 3-tuple format)
ast = {:conditional, [], [
  {:variable, [], "x"}, 
  {:conditional, [], [
    {:variable, [], "y"}, 
    {:literal, [subtype: :integer], 1}, 
    {:literal, [subtype: :integer], 2}]},
  {:literal, [subtype: :integer], 3}]}
doc = Document.new(ast, :python)
{:ok, result} = Complexity.analyze(doc)

result.cyclomatic      # => 3 (McCabe complexity)
result.cognitive       # => 3 (with nesting penalties)
result.max_nesting     # => 2
result.halstead.volume # => 45.6 (program volume)
result.loc.logical     # => 2
result.warnings        # => []
result.summary         # => "Code has low complexity"
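The cyclomatic number above follows McCabe's definition: one plus the number of decision points. A minimal standalone sketch that counts only `:conditional` nodes (a full implementation would presumably also count loops, boolean short-circuits, and other branches) reproduces the value for the example AST:

```elixir
defmodule CycloSketch do
  # Simplified McCabe metric: 1 + number of decision points.
  # Only :conditional nodes are counted in this sketch.
  def cyclomatic(ast), do: 1 + decisions(ast)

  defp decisions({:conditional, _meta, children}),
    do: 1 + Enum.sum(Enum.map(children, &decisions/1))
  defp decisions({_type, _meta, children}) when is_list(children),
    do: Enum.sum(Enum.map(children, &decisions/1))
  defp decisions(_leaf), do: 0
end

ast = {:conditional, [], [
  {:variable, [], "x"},
  {:conditional, [], [
    {:variable, [], "y"},
    {:literal, [subtype: :integer], 1},
    {:literal, [subtype: :integer], 2}]},
  {:literal, [subtype: :integer], 3}]}

CycloSketch.cyclomatic(ast)  # => 3
```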

Available Metrics:

Default Thresholds:

Advanced Analysis Features

Metastatic provides six additional static analysis capabilities that work uniformly across all supported languages:

Dead Code Detection

Identify unreachable code paths and constant conditional branches:

# Detect dead code
mix metastatic.dead_code my_file.py

# JSON output
mix metastatic.dead_code my_file.ex --format json

# Filter by confidence level
mix metastatic.dead_code my_file.rb --confidence high

alias Metastatic.{Document, Analysis.DeadCode}

# Code after return statement (uniform 3-tuple format)
ast = {:block, [], [
  {:early_return, [], [{:literal, [subtype: :integer], 42}]},
  {:function_call, [name: "print"], [{:literal, [subtype: :string], "unreachable"}]}
]}
doc = Document.new(ast, :python)
{:ok, result} = DeadCode.analyze(doc)

result.has_dead_code?  # => true
result.issues          # => [{:code_after_return, :high, ...}]
result.summary         # => "1 dead code issue detected"
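The simplest of these checks, statements after an unconditional return, can be sketched in a few lines of plain Elixir (helper name invented for illustration; the real analyzer covers more cases):

```elixir
defmodule DeadAfterReturn do
  # Hypothetical sketch: within each :block, flag any statement that
  # appears after an :early_return, then recurse into children.
  def dead?({:block, _meta, stmts}) do
    after_return =
      stmts
      |> Enum.drop_while(fn node -> elem(node, 0) != :early_return end)
      |> Enum.drop(1)

    after_return != [] or Enum.any?(stmts, &dead?/1)
  end
  def dead?({_type, _meta, children}) when is_list(children),
    do: Enum.any?(children, &dead?/1)
  def dead?(_leaf), do: false
end

ast = {:block, [], [
  {:early_return, [], [{:literal, [subtype: :integer], 42}]},
  {:function_call, [name: "print"], [{:literal, [subtype: :string], "unreachable"}]}
]}

DeadAfterReturn.dead?(ast)  # => true
```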

Detects:

Unused Variables

Track variable definitions and usage with scope-aware analysis:

# Find unused variables
mix metastatic.unused_vars my_file.py

# Ignore underscore-prefixed variables
mix metastatic.unused_vars my_file.ex --ignore-underscore

# JSON output
mix metastatic.unused_vars my_file.erl --format json

alias Metastatic.Analysis.UnusedVariables

ast = {:block, [], [
  {:assignment, [], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]},
  {:assignment, [], [{:variable, [], "y"}, {:literal, [subtype: :integer], 10}]},
  {:variable, [], "y"}
]}
doc = Document.new(ast, :python)
{:ok, result} = UnusedVariables.analyze(doc)

result.has_unused?     # => true
result.unused          # => MapSet.new(["x"])
result.summary         # => "1 unused variable: x"
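The core of such an analysis is a defs-versus-uses walk. A standalone sketch (no scope handling, `UnusedSketch` is an invented name) reproduces the result above:

```elixir
defmodule UnusedSketch do
  # Hypothetical sketch: collect assigned names and read names in one
  # walk; unused = defined - used. The real analysis is scope-aware.
  def unused(ast) do
    {defs, uses} = walk(ast, {MapSet.new(), MapSet.new()})
    MapSet.difference(defs, uses)
  end

  defp walk({:assignment, _, [{:variable, _, name}, value]}, {defs, uses}),
    do: walk(value, {MapSet.put(defs, name), uses})
  defp walk({:variable, _, name}, {defs, uses}),
    do: {defs, MapSet.put(uses, name)}
  defp walk({_type, _meta, children}, acc) when is_list(children),
    do: Enum.reduce(children, acc, &walk/2)
  defp walk(_leaf, acc), do: acc
end

ast = {:block, [], [
  {:assignment, [], [{:variable, [], "x"}, {:literal, [subtype: :integer], 5}]},
  {:assignment, [], [{:variable, [], "y"}, {:literal, [subtype: :integer], 10}]},
  {:variable, [], "y"}
]}

UnusedSketch.unused(ast)  # => MapSet.new(["x"])
```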

Features:

Control Flow Graph

Generate control flow graphs with multiple export formats:

# Generate CFG in DOT format (for Graphviz)
mix metastatic.control_flow my_file.py --format dot

# Generate D3.js JSON for interactive visualization
mix metastatic.control_flow my_file.ex --format d3 --output cfg.json

# Text representation
mix metastatic.control_flow my_file.rb --format text

alias Metastatic.Analysis.ControlFlow

ast = {:conditional, [], [
  {:variable, [], "x"}, 
  {:early_return, [], [{:literal, [subtype: :integer], 1}]},
  {:literal, [subtype: :integer], 2}
]}
doc = Document.new(ast, :python)
{:ok, result} = ControlFlow.analyze(doc)

result.node_count      # => 5
result.edge_count      # => 4
result.has_cycles?     # => false
ControlFlow.to_dot(result)      # => "digraph CFG { ... }"
ControlFlow.to_d3_json(result)  # => %{nodes: [...], links: [...]}

Export Formats:

Features:

Taint Analysis

Track data flow from untrusted sources to sensitive operations:

# Check for taint vulnerabilities
mix metastatic.taint_check my_file.py

# JSON output
mix metastatic.taint_check my_file.ex --format json

alias Metastatic.Analysis.Taint

# Dangerous pattern: eval(input())
ast = {:function_call, [name: "eval"], [
  {:function_call, [name: "input"], []}
]}
doc = Document.new(ast, :python)
{:ok, result} = Taint.analyze(doc)

result.has_vulnerabilities?  # => true
result.vulnerabilities       # => [{:code_injection, ...}]
result.summary               # => "1 taint vulnerability detected"
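A direct source-to-sink flow like `eval(input())` can be caught by checking whether any call to a sink has a call to a source somewhere in its argument subtree. A minimal sketch in plain Elixir (the source/sink lists and `TaintSketch` name are illustrative, not the library's):

```elixir
defmodule TaintSketch do
  @sources ["input"]
  @sinks ["eval"]

  # Hypothetical sketch: a vulnerability exists when a call to a sink
  # contains a call to a source anywhere in its arguments.
  # Direct flows only; no variable tracking.
  def vulnerable?({:function_call, meta, args}) do
    (meta[:name] in @sinks and Enum.any?(args, &calls_source?/1)) or
      Enum.any?(args, &vulnerable?/1)
  end
  def vulnerable?({_type, _meta, children}) when is_list(children),
    do: Enum.any?(children, &vulnerable?/1)
  def vulnerable?(_leaf), do: false

  defp calls_source?({:function_call, meta, args}),
    do: meta[:name] in @sources or Enum.any?(args, &calls_source?/1)
  defp calls_source?({_type, _meta, children}) when is_list(children),
    do: Enum.any?(children, &calls_source?/1)
  defp calls_source?(_leaf), do: false
end

ast = {:function_call, [name: "eval"], [
  {:function_call, [name: "input"], []}
]}

TaintSketch.vulnerable?(ast)  # => true
```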

Detects:

Note: Current implementation detects direct flows. Variable tracking and interprocedural analysis planned for future releases.

Security Vulnerability Detection

Pattern-based security scanning with CWE identifiers:

# Scan for security issues
mix metastatic.security_scan my_file.py

# JSON output with CWE details
mix metastatic.security_scan my_file.ex --format json

alias Metastatic.Analysis.Security

# Hardcoded password
ast = {:assignment, [], [{:variable, [], "password"}, {:literal, [subtype: :string], "admin123"}]}
doc = Document.new(ast, :python)
{:ok, result} = Security.analyze(doc)

result.has_vulnerabilities?  # => true
[vuln | _] = result.vulnerabilities
vuln.type      # => :hardcoded_secret
vuln.severity  # => :high
vuln.cwe       # => "CWE-798"

Vulnerability Categories:

Severity Levels: Critical, High, Medium, Low

Code Smell Detection

Identify maintainability issues and anti-patterns:

# Detect code smells
mix metastatic.code_smells my_file.py

# Detailed report
mix metastatic.code_smells my_file.ex --format detailed

# JSON output
mix metastatic.code_smells my_file.rb --format json

alias Metastatic.Analysis.Smells

# Long function with deep nesting
ast = {:block, [], [
  # ... 25+ statements with nesting depth 6
]}
doc = Document.new(ast, :python)
{:ok, result} = Smells.analyze(doc)

result.has_smells?     # => true
result.smells          # => [:long_function, :deep_nesting]
result.severity        # => :high

Detected Smells:

Integration: Leverages existing complexity metrics for detection.

Business Logic Analyzers

Detect universal anti-patterns and code quality issues across all supported languages using 20 language-agnostic analyzers:

# Run specific analyzer
mix metastatic.analyze --analyzer callback_hell my_file.py

# Run all business logic analyzers
mix metastatic.analyze --business-logic my_file.ex

# JSON output for CI/CD
mix metastatic.analyze --format json my_file.rb

alias Metastatic.Analysis.Runner
alias Metastatic.Document

# Run all analyzers (uniform 3-tuple format)
ast = {:conditional, [], [
  {:variable, [], "x"},
  {:conditional, [], [
    {:variable, [], "y"}, 
    {:conditional, [], [
      {:variable, [], "z"}, 
      {:literal, [subtype: :integer], 1}, 
      {:literal, [subtype: :integer], 2}]},
    {:literal, [subtype: :integer], 3}]},
  {:literal, [subtype: :integer], 4}]}
doc = Document.new(ast, :python)

{:ok, issues} = Runner.run(doc)
# Returns issues from all enabled analyzers

# Run specific analyzers
config = %{analyzers: [:callback_hell, :missing_error_handling]}
{:ok, issues} = Runner.run(doc, config)

Available Analyzers (20 total):

Tier 1 - Pure MetaAST (9 analyzers):

Tier 2 - Function Name Heuristics (4 analyzers):

Tier 3 - Naming Conventions (4 analyzers):

Tier 4 - Content Analysis (3 analyzers):

Key Features:

Cross-Language Detection: These analyzers demonstrate that many "language-specific" patterns are actually universal anti-patterns:

Documentation

Architecture

Three-Layer MetaAST

Layer 1: Core (M2.1) - Universal concepts (ALL languages)
Common constructs: literals, variables, operators, conditionals, function calls, assignments

Layer 2: Extended (M2.2) - Common patterns (MOST languages)
Control flow: loops, lambdas, collection operations, pattern matching, exception handling

Layer 2s: Structural/Organizational (M2.2s) - Top-level constructs (MOST languages)
Organizational: containers (modules/classes/namespaces), function definitions, properties, attribute access, augmented assignments

Layer 3: Native (M2.3) - Language-specific escape hatches
Language-specific: lifetimes, async models, advanced type systems, metaprogramming
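As a rough illustration of how the layers partition node types, the sketch below maps a node to its layer. The mapping is inferred from the layer descriptions above (and the `:loop`/`:lambda` atoms are assumptions); it is not Metastatic's actual classifier:

```elixir
defmodule LayerSketch do
  # Illustrative mapping from MetaAST node type to meta-model layer,
  # inferred from the layer descriptions; not Metastatic's API.
  @core [:literal, :variable, :binary_op, :unary_op, :conditional,
         :function_call, :assignment]
  @extended [:loop, :lambda, :pattern_match, :exception_handling]

  def layer_of({:language_specific, _lang, _node}), do: :native
  def layer_of({type, _meta, _children}) when type in @core, do: :core
  def layer_of({type, _meta, _children}) when type in @extended, do: :extended
  def layer_of(_other), do: :unknown
end

LayerSketch.layer_of({:variable, [], "x"})               # => :core
LayerSketch.layer_of({:language_specific, :python, nil}) # => :native
```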

Examples

Shopping Cart Example

A comprehensive real-world example demonstrating Metastatic's capabilities using an e-commerce shopping cart:

# From project root
mix compile

# Run interactive demo
elixir examples/shopping_cart/demo.exs

# Visualize MetaAST tree structures
elixir examples/shopping_cart/visualize_ast.exs

What you'll learn:

Files:

See examples/README.md for more details.

Use Cases

Foundation for Cross-Language Tools

Metastatic provides the MetaAST foundation that other tools build upon:

# Mutation testing (in muex library, NYI)
Muex.mutate_file("src/calculator.py", :python)
Muex.mutate_file("src/calculator.js", :javascript)
# Both use Metastatic's MetaAST under the hood!

Cross-Language Code Transformation

Transform code between languages (for supported constructs):

# Parse Python
{:ok, doc} = Metastatic.Builder.from_source(python_source, :python)

# Transform to Elixir (with supplemental modules for unsupported constructs)
{:ok, elixir_source} = Metastatic.Builder.to_source(doc, :elixir)

Semantic Equivalence Validation

Verify that code across languages has identical semantics:

{:ok, py_doc} = Metastatic.Builder.from_source("x + 5", :python)
{:ok, ex_doc} = Metastatic.Builder.from_source("x + 5", :elixir)

py_doc.ast == ex_doc.ast  # => true (same MetaAST)

AST Analysis Infrastructure

Build language-agnostic analysis tools:

# Extract all variables from any supported language
{:ok, doc} = Metastatic.Builder.from_source(source, language)
variables = Metastatic.AST.variables(doc.ast)

Contributing

This project is currently in the research/foundation phase. Contributions welcome!

Research Background

Metastatic is inspired by research from:

Credits

Created as part of the Oeditus code quality tooling ecosystem.

Research synthesis from muex and propwise multi-language analysis projects.

Installation

def deps do
  [
    {:metastatic, "~> 0.12"}
  ]
end

Documentation.