Edifice

A comprehensive ML architecture library for Elixir, built on Nx and Axon.

92 neural network architectures across 16 families — from MLPs to Mamba, transformers to graph networks, VAEs to spiking neurons.

Why Edifice?

The Elixir ML ecosystem has excellent numerical computing (Nx) and model building (Axon) foundations, but no comprehensive collection of ready-to-use architectures. Edifice fills that gap.

Installation

Add edifice to your dependencies in mix.exs:

def deps do
  [
    {:edifice, "~> 0.1.0"}
  ]
end

Edifice requires Nx ~> 0.10 and Axon ~> 0.8. For GPU acceleration, add EXLA:

{:exla, "~> 0.10"}
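
To route tensor operations through EXLA by default, set it as the Nx backend. This is standard Nx configuration rather than anything Edifice-specific; a typical config/config.exs entry:

# config/config.exs
import Config

config :nx, default_backend: EXLA.Backend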

Quick Start

# Build any architecture by name
model = Edifice.build(:mamba, embed_size: 256, hidden_size: 512, num_layers: 4)

# Or use the module directly for more control
model = Edifice.SSM.Mamba.build(
  embed_size: 256,
  hidden_size: 512,
  state_size: 16,
  num_layers: 4,
  window_size: 60
)

# Build and run
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 256}, :f32), Axon.ModelState.empty())
input = Nx.broadcast(0.5, {1, 60, 256})
output = predict_fn.(params, input)

# Explore what's available
Edifice.list_architectures()
# => [:attention, :bayesian, :capsule, :deep_sets, :densenet, :diffusion, ...]

Edifice.list_families()
# => %{ssm: [:mamba, :mamba_ssd, :s5, ...], attention: [:attention, :retnet, ...], ...}
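
Because Axon models are plain data structures, you can also inspect the layer graph and parameter shapes before training. A quick sketch using Axon's display helper, assuming your Axon version includes Axon.Display (it relies on the optional :table_rex dependency):

# Print a per-layer summary for the model built above
model
|> Axon.Display.as_table(Nx.template({1, 60, 256}, :f32))
|> IO.puts()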

Architecture Families

Feedforward

Architecture | Module | Key Feature
--- | --- | ---
MLP | Edifice.Feedforward.MLP | Multi-layer perceptron with configurable hidden sizes
KAN | Edifice.Feedforward.KAN | Kolmogorov-Arnold Networks, learnable activation functions
TabNet | Edifice.Feedforward.TabNet | Attentive feature selection for tabular data

State Space Models

Architecture | Module | Key Feature
--- | --- | ---
S4 | Edifice.SSM.S4 | HiPPO DPLR initialization, long-range memory
S4D | Edifice.SSM.S4D | Diagonal state space, simplified S4
S5 | Edifice.SSM.S5 | MIMO diagonal SSM with D skip connection
H3 | Edifice.SSM.H3 | Two SSMs with multiplicative gating + short convolution
Hyena | Edifice.SSM.Hyena | Long convolution hierarchy, implicit filters
Mamba | Edifice.SSM.Mamba | Selective SSM, parallel associative scan
Mamba-2 (SSD) | Edifice.SSM.MambaSSD | Structured state space duality, chunk-wise matmul
Mamba (Cumsum) | Edifice.SSM.MambaCumsum | Mamba with configurable scan algorithm
Mamba (Hillis-Steele) | Edifice.SSM.MambaHillisSteele | Mamba with max-parallelism scan
BiMamba | Edifice.SSM.BiMamba | Bidirectional Mamba for non-causal tasks
GatedSSM | Edifice.SSM.GatedSSM | Gated temporal SSM with gradient checkpointing
Jamba | Edifice.SSM.Hybrid | Mamba + Attention hybrid (configurable ratio)
Zamba | Edifice.SSM.Zamba | Mamba + single shared attention layer

Attention & Linear Attention

Architecture | Module | Key Feature
--- | --- | ---
Multi-Head Attention | Edifice.Attention.MultiHead | Sliding window, QK LayerNorm
GQA | Edifice.Attention.GQA | Grouped Query Attention, fewer KV heads
Perceiver | Edifice.Attention.Perceiver | Cross-attention to learned latents, input-agnostic
FNet | Edifice.Attention.FNet | Fourier Transform replacing attention
Linear Transformer | Edifice.Attention.LinearTransformer | Kernel-based O(N) attention
Nystromformer | Edifice.Attention.Nystromformer | Nystrom approximation of attention matrix
Performer | Edifice.Attention.Performer | FAVOR+ random feature attention
RetNet | Edifice.Attention.RetNet | Multi-scale retention, O(1) recurrent inference
RWKV-7 | Edifice.Attention.RWKV | Linear attention, O(1) space, "Goose" architecture
GLA | Edifice.Attention.GLA | Gated Linear Attention with data-dependent decay
HGRN-2 | Edifice.Attention.HGRN | Hierarchically gated linear RNN, state expansion
Griffin/Hawk | Edifice.Attention.Griffin | RG-LRU + local attention (Griffin) or pure RG-LRU (Hawk)
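
Every entry here is also reachable through the unified registry with the common options used elsewhere in this README. A minimal sketch building RetNet, assuming the same [batch, seq, embed] input contract as the Mamba Quick Start:

model = Edifice.build(:retnet, embed_size: 256, hidden_size: 512, num_layers: 4)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 256}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, Nx.broadcast(0.5, {1, 60, 256}))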

Recurrent Networks

Architecture | Module | Key Feature
--- | --- | ---
LSTM/GRU | Edifice.Recurrent | Classic recurrent networks with multi-layer stacking
xLSTM | Edifice.Recurrent.XLSTM | Exponential gating, matrix memory (sLSTM/mLSTM)
MinGRU | Edifice.Recurrent.MinGRU | Minimal GRU, parallel-scannable
MinLSTM | Edifice.Recurrent.MinLSTM | Minimal LSTM, parallel-scannable
DeltaNet | Edifice.Recurrent.DeltaNet | Delta rule-based linear RNN
TTT | Edifice.Recurrent.TTT | Test-Time Training, self-supervised at inference
Titans | Edifice.Recurrent.Titans | Neural long-term memory, surprise-gated
Reservoir | Edifice.Recurrent.Reservoir | Echo State Networks with fixed random reservoir

Vision

Architecture | Module | Key Feature
--- | --- | ---
ViT | Edifice.Vision.ViT | Vision Transformer, patch embedding
DeiT | Edifice.Vision.DeiT | Data-efficient ViT with distillation token
Swin | Edifice.Vision.SwinTransformer | Shifted window attention, hierarchical features
U-Net | Edifice.Vision.UNet | Encoder-decoder with skip connections
ConvNeXt | Edifice.Vision.ConvNeXt | Modernized ConvNet with transformer-inspired design
MLP-Mixer | Edifice.Vision.MLPMixer | Pure MLP with token/channel mixing

Convolutional

Architecture | Module | Key Feature
--- | --- | ---
Conv1D/2D | Edifice.Convolutional.Conv | Configurable convolution blocks with BN, activation, dropout
ResNet | Edifice.Convolutional.ResNet | Residual/bottleneck blocks, configurable depth
DenseNet | Edifice.Convolutional.DenseNet | Dense connections, feature reuse
TCN | Edifice.Convolutional.TCN | Dilated causal convolutions for sequences
MobileNet | Edifice.Convolutional.MobileNet | Depthwise separable convolutions
EfficientNet | Edifice.Convolutional.EfficientNet | Compound scaling (depth, width, resolution)

Generative Models

Architecture | Module | Key Feature
--- | --- | ---
VAE | Edifice.Generative.VAE | Reparameterization trick, KL divergence, beta-VAE
VQ-VAE | Edifice.Generative.VQVAE | Discrete codebook, straight-through estimator
GAN | Edifice.Generative.GAN | Generator/discriminator, WGAN-GP support
Diffusion (DDPM) | Edifice.Generative.Diffusion | Denoising diffusion, sinusoidal time embedding
DDIM | Edifice.Generative.DDIM | Deterministic diffusion sampling, fast inference
DiT | Edifice.Generative.DiT | Diffusion Transformer, AdaLN-Zero conditioning
Latent Diffusion | Edifice.Generative.LatentDiffusion | Diffusion in compressed latent space
Consistency Model | Edifice.Generative.ConsistencyModel | Single-step generation via consistency training
Score SDE | Edifice.Generative.ScoreSDE | Continuous SDE framework (VP-SDE, VE-SDE)
Flow Matching | Edifice.Generative.FlowMatching | ODE-based generation, multiple loss variants
Normalizing Flow | Edifice.Generative.NormalizingFlow | Affine coupling layers (RealNVP-style)

Contrastive & Self-Supervised

Architecture | Module | Key Feature
--- | --- | ---
SimCLR | Edifice.Contrastive.SimCLR | NT-Xent contrastive loss, projection head
BYOL | Edifice.Contrastive.BYOL | No negatives, momentum encoder
Barlow Twins | Edifice.Contrastive.BarlowTwins | Cross-correlation redundancy reduction
MAE | Edifice.Contrastive.MAE | Masked Autoencoder, 75% patch masking
VICReg | Edifice.Contrastive.VICReg | Variance-Invariance-Covariance regularization

Graph & Set Networks

Architecture | Module | Key Feature
--- | --- | ---
GCN | Edifice.Graph.GCN | Spectral graph convolutions (Kipf & Welling)
GAT | Edifice.Graph.GAT | Graph attention with multi-head support
GIN | Edifice.Graph.GIN | Graph Isomorphism Network, maximally expressive
GraphSAGE | Edifice.Graph.GraphSAGE | Inductive learning, neighborhood sampling
Graph Transformer | Edifice.Graph.GraphTransformer | Full attention over nodes with edge features
PNA | Edifice.Graph.PNA | Principal Neighbourhood Aggregation
SchNet | Edifice.Graph.SchNet | Continuous-filter convolutions for molecules
DeepSets | Edifice.Sets.DeepSets | Permutation-invariant set functions
PointNet | Edifice.Sets.PointNet | Point cloud processing with T-Net alignment

Energy, Probabilistic & Memory

Architecture | Module | Key Feature
--- | --- | ---
EBM | Edifice.Energy.EBM | Energy-based models, contrastive divergence
Hopfield | Edifice.Energy.Hopfield | Modern continuous Hopfield networks
Neural ODE | Edifice.Energy.NeuralODE | Continuous-depth networks via ODE solvers
Bayesian NN | Edifice.Probabilistic.Bayesian | Weight uncertainty, variational inference
MC Dropout | Edifice.Probabilistic.MCDropout | Uncertainty estimation via dropout at inference
Evidential NN | Edifice.Probabilistic.EvidentialNN | Dirichlet priors for uncertainty
NTM | Edifice.Memory.NTM | Neural Turing Machine, differentiable memory
Memory Network | Edifice.Memory.MemoryNetwork | End-to-end memory with multi-hop attention

Meta-Learning & Specialized

Architecture | Module | Key Feature
--- | --- | ---
MoE | Edifice.Meta.MoE | Mixture of Experts with top-k/hash routing
Switch MoE | Edifice.Meta.SwitchMoE | Top-1 routing with load balancing
Soft MoE | Edifice.Meta.SoftMoE | Fully differentiable soft token routing
LoRA | Edifice.Meta.LoRA | Low-Rank Adaptation for parameter-efficient fine-tuning
Adapter | Edifice.Meta.Adapter | Bottleneck adapter modules for transfer learning
Hypernetwork | Edifice.Meta.Hypernetwork | Networks that generate other networks' weights
Capsule | Edifice.Meta.Capsule | Dynamic routing between capsules
Liquid NN | Edifice.Liquid | Continuous-time ODE dynamics (LTC cells)
SNN | Edifice.Neuromorphic.SNN | Leaky integrate-and-fire, surrogate gradients
ANN2SNN | Edifice.Neuromorphic.ANN2SNN | Convert trained ANNs to spiking networks

Building Blocks

Block | Module | Key Feature
--- | --- | ---
RMSNorm | Edifice.Blocks.RMSNorm | Root Mean Square normalization
SwiGLU | Edifice.Blocks.SwiGLU | Gated FFN with SiLU activation
RoPE | Edifice.Blocks.RoPE | Rotary position embedding
ALiBi | Edifice.Blocks.ALiBi | Attention with linear biases
Patch Embed | Edifice.Blocks.PatchEmbed | Image-to-patch tokenization
Sinusoidal PE | Edifice.Blocks.SinusoidalPE | Fixed sinusoidal position encoding
Adaptive Norm | Edifice.Blocks.AdaptiveNorm | Condition-dependent normalization (AdaLN)
Cross Attention | Edifice.Blocks.CrossAttention | Cross-attention between two sequences
Conv1D/2D | Edifice.Convolutional.Conv | Configurable convolution blocks
FFN | Edifice.Blocks.FFN | Standard and gated feed-forward networks
Message Passing | Edifice.Graph.MessagePassing | Generic MPNN framework, global pooling
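
Blocks are meant to be composed inside your own Axon graphs. The sketch below wires a residual feed-forward sub-block; the layer/2-style builder names are hypothetical placeholders, so check each module's docs for the actual function names:

# Hypothetical composition; RMSNorm.layer/2 and FFN.layer/2 are assumed names
input = Axon.input("tokens", shape: {nil, 60, 256})

block =
  input
  |> Edifice.Blocks.RMSNorm.layer(name: "pre_norm")
  |> Edifice.Blocks.FFN.layer(hidden_size: 1024)
  |> then(&Axon.add(&1, input))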

Guides

New to ML?

Start here if you're new to machine learning. These guides build from zero to fluency with Edifice's API and architecture families.

  1. ML Foundations — What neural networks are, how they learn, tensors and shapes
  2. Core Vocabulary — Essential terminology used across all guides
  3. The Problem Landscape — Classification, generation, sequence modeling — which architectures solve which problems
  4. Reading Edifice — The build/init/predict pattern, Axon graphs, shapes, and runnable examples
  5. Learning Path — A guided tour through the 19 architecture families

Architecture Guides

Conceptual guides covering theory, architecture evolution, and decision tables for each family:

  - Sequence Processing
  - Representation Learning
  - Generative & Dynamic
  - Composition & Enhancement

Examples

See examples/ for runnable scripts including mlp_basics.exs, sequence_comparison.exs, graph_classification.exs, vae_generation.exs, and architecture_tour.exs.

Mamba for Sequence Modeling

model = Edifice.SSM.Mamba.build(
  embed_size: 128,
  hidden_size: 256,
  state_size: 16,
  num_layers: 4,
  window_size: 100
)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 100, 128}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, Nx.broadcast(0.5, {1, 100, 128}))
# => {1, 256}
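
A built model plugs into Axon's standard training loop. A minimal sketch with synthetic data; the loss and optimizer here are placeholders for your actual task:

# Synthetic {input, target} batches matching the shapes above
data =
  Stream.repeatedly(fn ->
    {Nx.broadcast(0.5, {8, 100, 128}), Nx.broadcast(1.0, {8, 256})}
  end)

trained_state =
  model
  |> Axon.Loop.trainer(:mean_squared_error, :adam)
  |> Axon.Loop.run(data, %{}, epochs: 1, iterations: 100)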

Graph Classification with GCN

model = Edifice.Graph.GCN.build_classifier(
  input_dim: 16,
  hidden_dims: [64, 64],
  num_classes: 2,
  pool: :mean
)

{init_fn, predict_fn} = Axon.build(model)

params = init_fn.(
  %{
    "nodes" => Nx.template({4, 10, 16}, :f32),
    "adjacency" => Nx.template({4, 10, 10}, :f32)
  },
  Axon.ModelState.empty()
)

output = predict_fn.(params, %{
  "nodes" => Nx.broadcast(0.5, {4, 10, 16}),
  "adjacency" => Nx.eye(10) |> Nx.broadcast({4, 10, 10})
})
# => {4, 2}
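
The broadcast identity matrix above is just a smoke-test adjacency (self-loops only). For a real graph, build the adjacency from your edges with plain Nx; a small sketch for a 10-node ring, leaving any normalization to the GCN layer:

# Ring edges i -> i+1 (mod 10), made symmetric, plus self-loops
ring = Nx.take(Nx.eye(10), Nx.remainder(Nx.add(Nx.iota({10}), 1), 10))

adjacency =
  Nx.eye(10)
  |> Nx.add(ring)
  |> Nx.add(Nx.transpose(ring))
  |> Nx.broadcast({4, 10, 10})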

VAE with Reparameterization

{encoder, decoder} = Edifice.Generative.VAE.build(
  input_size: 784,
  latent_size: 32,
  encoder_sizes: [512, 256],
  decoder_sizes: [256, 512]
)

# Encoder outputs mu and log_var
{init_fn, predict_fn} = Axon.build(encoder)
params = init_fn.(Nx.template({1, 784}, :f32), Axon.ModelState.empty())
%{mu: mu, log_var: log_var} = predict_fn.(params, Nx.broadcast(0.5, {1, 784}))

# Sample latent vector (requires PRNG key for stochastic sampling)
key = Nx.Random.key(42)
{z, _new_key} = Edifice.Generative.VAE.reparameterize(mu, log_var, key)

# KL divergence for training
kl_loss = Edifice.Generative.VAE.kl_divergence(mu, log_var)
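
The decoder half follows the same build/init/predict pattern, and the loss helper shown under API Design combines the reconstruction and KL terms. A sketch of the full round trip, assuming the decoder takes the {1, 32} latent as input:

# Decode the sampled latent back to input space
{dec_init, dec_predict} = Axon.build(decoder)
dec_params = dec_init.(Nx.template({1, 32}, :f32), Axon.ModelState.empty())
reconstruction = dec_predict.(dec_params, z)

# Combined objective (see API Design below)
target = Nx.broadcast(0.5, {1, 784})
loss = Edifice.Generative.VAE.loss(reconstruction, target, mu, log_var)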

Permutation-Invariant Set Processing

model = Edifice.Sets.DeepSets.build(
  input_dim: 3,
  hidden_dim: 64,
  output_dim: 10,
  pool: :mean
)

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({4, 20, 3}, :f32), Axon.ModelState.empty())
# Process sets of 20 3D points
output = predict_fn.(params, Nx.broadcast(0.5, {4, 20, 3}))
# => {4, 10}
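
Permutation invariance is easy to verify empirically: shuffle the points within each set and the pooled output should not change. A quick check using Nx's random utilities:

# Random sets, then the same sets with their 20 points shuffled
key = Nx.Random.key(0)
{set, key} = Nx.Random.uniform(key, shape: {4, 20, 3})
{perm, _key} = Nx.Random.shuffle(key, Nx.iota({20}))
shuffled = Nx.take(set, perm, axis: 1)

# Should be 1 (true) up to floating-point tolerance
Nx.all_close(predict_fn.(params, set), predict_fn.(params, shuffled))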

API Design

Every architecture module follows the same pattern:

# Module.build(opts) returns an Axon model
model = Edifice.SSM.Mamba.build(embed_size: 256, hidden_size: 512)

# Some modules expose layer-level builders for composition
layer = Edifice.Graph.GCN.gcn_layer(nodes, adjacency, output_dim)

# Generative models may return tuples
{encoder, decoder} = Edifice.Generative.VAE.build(input_size: 784)

# Utility functions for training
loss = Edifice.Generative.VAE.loss(reconstruction, target, mu, log_var)
energy = Edifice.Energy.Hopfield.energy(query, patterns, beta)

The unified registry lets you build any architecture by name:

# Useful for hyperparameter search, config-driven experiments
for arch <- [:mamba, :retnet, :griffin, :gla] do
  model = Edifice.build(arch, embed_size: 256, hidden_size: 512, num_layers: 4)
  # ... train and evaluate
end
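
When comparing architectures in a sweep like this, parameter counts are a useful sanity check. A rough sketch, assuming the initialized Axon.ModelState keeps its parameter tensors one level deep under :data:

# Approximate parameter count for an initialized model state
count_params = fn %Axon.ModelState{data: data} ->
  data
  |> Enum.flat_map(fn {_layer, params} -> Map.values(params) end)
  |> Enum.map(&Nx.size/1)
  |> Enum.sum()
end

model = Edifice.build(:mamba, embed_size: 256, hidden_size: 512, num_layers: 4)
{init_fn, _predict_fn} = Axon.build(model)
count_params.(init_fn.(Nx.template({1, 60, 256}, :f32), Axon.ModelState.empty()))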

Requirements

  - Nx ~> 0.10 and Axon ~> 0.8
  - Optional: EXLA ~> 0.10 for accelerated (CPU/GPU) execution

License

MIT License. See LICENSE for details.