Edifice
A comprehensive ML architecture library for Elixir, built on Nx and Axon.
92 neural network architectures across 16 families — from MLPs to Mamba, transformers to graph networks, VAEs to spiking neurons.
Why Edifice?
The Elixir ML ecosystem has excellent numerical computing (Nx) and model building (Axon) foundations, but no comprehensive collection of ready-to-use architectures. Edifice fills that gap:
- One dependency for all major architecture families
- Consistent API — every architecture follows
Module.build(opts) returning an Axon model - Unified registry —
Edifice.build(:mamba, opts) discovers and builds any architecture by name - Pure Elixir — no Python, no ONNX imports, just Nx/Axon all the way down
- GPU-ready — works with EXLA/CUDA out of the box
Installation
Add edifice to your dependencies in mix.exs:
def deps do
[
{:edifice, "~> 0.1.0"}
]
end
Edifice requires Nx ~> 0.10 and Axon ~> 0.8. For GPU acceleration, add EXLA:
{:exla, "~> 0.10"}
Quick Start
# Build any architecture by name
model = Edifice.build(:mamba, embed_size: 256, hidden_size: 512, num_layers: 4)
# Or use the module directly for more control
model = Edifice.SSM.Mamba.build(
embed_size: 256,
hidden_size: 512,
state_size: 16,
num_layers: 4,
window_size: 60
)
# Build and run
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 256}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, input)
# Explore what's available
Edifice.list_architectures()
# => [:attention, :bayesian, :capsule, :deep_sets, :densenet, :diffusion, ...]
Edifice.list_families()
# => %{ssm: [:mamba, :mamba_ssd, :s5, ...], attention: [:attention, :retnet, ...], ...}
Architecture Families
Feedforward
|
Architecture |
Module |
Key Feature |
|---|
| MLP | Edifice.Feedforward.MLP |
Multi-layer perceptron with configurable hidden sizes |
| KAN | Edifice.Feedforward.KAN |
Kolmogorov-Arnold Networks, learnable activation functions |
| TabNet | Edifice.Feedforward.TabNet |
Attentive feature selection for tabular data |
State Space Models
|
Architecture |
Module |
Key Feature |
|---|
| S4 | Edifice.SSM.S4 |
HiPPO DPLR initialization, long-range memory |
| S4D | Edifice.SSM.S4D |
Diagonal state space, simplified S4 |
| S5 | Edifice.SSM.S5 |
MIMO diagonal SSM with D skip connection |
| H3 | Edifice.SSM.H3 |
Two SSMs with multiplicative gating + short convolution |
| Hyena | Edifice.SSM.Hyena |
Long convolution hierarchy, implicit filters |
| Mamba | Edifice.SSM.Mamba |
Selective SSM, parallel associative scan |
| Mamba-2 (SSD) | Edifice.SSM.MambaSSD |
Structured state space duality, chunk-wise matmul |
| Mamba (Cumsum) | Edifice.SSM.MambaCumsum |
Mamba with configurable scan algorithm |
| Mamba (Hillis-Steele) | Edifice.SSM.MambaHillisSteele |
Mamba with max-parallelism scan |
| BiMamba | Edifice.SSM.BiMamba |
Bidirectional Mamba for non-causal tasks |
| GatedSSM | Edifice.SSM.GatedSSM |
Gated temporal with gradient checkpointing |
| Jamba | Edifice.SSM.Hybrid |
Mamba + Attention hybrid (configurable ratio) |
| Zamba | Edifice.SSM.Zamba |
Mamba + single shared attention layer |
Attention & Linear Attention
|
Architecture |
Module |
Key Feature |
|---|
| Multi-Head Attention | Edifice.Attention.MultiHead |
Sliding window, QK LayerNorm |
| GQA | Edifice.Attention.GQA |
Grouped Query Attention, fewer KV heads |
| Perceiver | Edifice.Attention.Perceiver |
Cross-attention to learned latents, input-agnostic |
| FNet | Edifice.Attention.FNet |
Fourier Transform replacing attention |
| Linear Transformer | Edifice.Attention.LinearTransformer |
Kernel-based O(N) attention |
| Nystromformer | Edifice.Attention.Nystromformer |
Nystrom approximation of attention matrix |
| Performer | Edifice.Attention.Performer |
FAVOR+ random feature attention |
| RetNet | Edifice.Attention.RetNet |
Multi-scale retention, O(1) recurrent inference |
| RWKV-7 | Edifice.Attention.RWKV |
Linear attention, O(1) space, "Goose" architecture |
| GLA | Edifice.Attention.GLA |
Gated Linear Attention with data-dependent decay |
| HGRN-2 | Edifice.Attention.HGRN |
Hierarchically gated linear RNN, state expansion |
| Griffin/Hawk | Edifice.Attention.Griffin |
RG-LRU + local attention (Griffin) or pure RG-LRU (Hawk) |
Recurrent Networks
|
Architecture |
Module |
Key Feature |
|---|
| LSTM/GRU | Edifice.Recurrent |
Classic recurrent with multi-layer stacking |
| xLSTM | Edifice.Recurrent.XLSTM |
Exponential gating, matrix memory (sLSTM/mLSTM) |
| MinGRU | Edifice.Recurrent.MinGRU |
Minimal GRU, parallel-scannable |
| MinLSTM | Edifice.Recurrent.MinLSTM |
Minimal LSTM, parallel-scannable |
| DeltaNet | Edifice.Recurrent.DeltaNet |
Delta rule-based linear RNN |
| TTT | Edifice.Recurrent.TTT |
Test-Time Training, self-supervised at inference |
| Titans | Edifice.Recurrent.Titans |
Neural long-term memory, surprise-gated |
| Reservoir | Edifice.Recurrent.Reservoir |
Echo State Networks with fixed random reservoir |
Vision
|
Architecture |
Module |
Key Feature |
|---|
| ViT | Edifice.Vision.ViT |
Vision Transformer, patch embedding |
| DeiT | Edifice.Vision.DeiT |
Data-efficient ViT with distillation token |
| Swin | Edifice.Vision.SwinTransformer |
Shifted window attention, hierarchical features |
| U-Net | Edifice.Vision.UNet |
Encoder-decoder with skip connections |
| ConvNeXt | Edifice.Vision.ConvNeXt |
Modernized ConvNet with transformer-inspired design |
| MLP-Mixer | Edifice.Vision.MLPMixer |
Pure MLP with token/channel mixing |
Convolutional
|
Architecture |
Module |
Key Feature |
|---|
| Conv1D/2D | Edifice.Convolutional.Conv |
Configurable convolution blocks with BN, activation, dropout |
| ResNet | Edifice.Convolutional.ResNet |
Residual/bottleneck blocks, configurable depth |
| DenseNet | Edifice.Convolutional.DenseNet |
Dense connections, feature reuse |
| TCN | Edifice.Convolutional.TCN |
Dilated causal convolutions for sequences |
| MobileNet | Edifice.Convolutional.MobileNet |
Depthwise separable convolutions |
| EfficientNet | Edifice.Convolutional.EfficientNet |
Compound scaling (depth, width, resolution) |
Generative Models
|
Architecture |
Module |
Key Feature |
|---|
| VAE | Edifice.Generative.VAE |
Reparameterization trick, KL divergence, beta-VAE |
| VQ-VAE | Edifice.Generative.VQVAE |
Discrete codebook, straight-through estimator |
| GAN | Edifice.Generative.GAN |
Generator/discriminator, WGAN-GP support |
| Diffusion (DDPM) | Edifice.Generative.Diffusion |
Denoising diffusion, sinusoidal time embedding |
| DDIM | Edifice.Generative.DDIM |
Deterministic diffusion sampling, fast inference |
| DiT | Edifice.Generative.DiT |
Diffusion Transformer, AdaLN-Zero conditioning |
| Latent Diffusion | Edifice.Generative.LatentDiffusion |
Diffusion in compressed latent space |
| Consistency Model | Edifice.Generative.ConsistencyModel |
Single-step generation via consistency training |
| Score SDE | Edifice.Generative.ScoreSDE |
Continuous SDE framework (VP-SDE, VE-SDE) |
| Flow Matching | Edifice.Generative.FlowMatching |
ODE-based generation, multiple loss variants |
| Normalizing Flow | Edifice.Generative.NormalizingFlow |
Affine coupling layers (RealNVP-style) |
Contrastive & Self-Supervised
|
Architecture |
Module |
Key Feature |
|---|
| SimCLR | Edifice.Contrastive.SimCLR |
NT-Xent contrastive loss, projection head |
| BYOL | Edifice.Contrastive.BYOL |
No negatives, momentum encoder |
| Barlow Twins | Edifice.Contrastive.BarlowTwins |
Cross-correlation redundancy reduction |
| MAE | Edifice.Contrastive.MAE |
Masked Autoencoder, 75% patch masking |
| VICReg | Edifice.Contrastive.VICReg |
Variance-Invariance-Covariance regularization |
Graph & Set Networks
|
Architecture |
Module |
Key Feature |
|---|
| GCN | Edifice.Graph.GCN |
Spectral graph convolutions (Kipf & Welling) |
| GAT | Edifice.Graph.GAT |
Graph attention with multi-head support |
| GIN | Edifice.Graph.GIN |
Graph Isomorphism Network, maximally expressive |
| GraphSAGE | Edifice.Graph.GraphSAGE |
Inductive learning, neighborhood sampling |
| Graph Transformer | Edifice.Graph.GraphTransformer |
Full attention over nodes with edge features |
| PNA | Edifice.Graph.PNA |
Principal Neighbourhood Aggregation |
| SchNet | Edifice.Graph.SchNet |
Continuous-filter convolutions for molecules |
| DeepSets | Edifice.Sets.DeepSets |
Permutation-invariant set functions |
| PointNet | Edifice.Sets.PointNet |
Point cloud processing with T-Net alignment |
Energy, Probabilistic & Memory
|
Architecture |
Module |
Key Feature |
|---|
| EBM | Edifice.Energy.EBM |
Energy-based models, contrastive divergence |
| Hopfield | Edifice.Energy.Hopfield |
Modern continuous Hopfield networks |
| Neural ODE | Edifice.Energy.NeuralODE |
Continuous-depth networks via ODE solvers |
| Bayesian NN | Edifice.Probabilistic.Bayesian |
Weight uncertainty, variational inference |
| MC Dropout | Edifice.Probabilistic.MCDropout |
Uncertainty estimation via dropout at inference |
| Evidential NN | Edifice.Probabilistic.EvidentialNN |
Dirichlet priors for uncertainty |
| NTM | Edifice.Memory.NTM |
Neural Turing Machine, differentiable memory |
| Memory Network | Edifice.Memory.MemoryNetwork |
End-to-end memory with multi-hop attention |
Meta-Learning & Specialized
|
Architecture |
Module |
Key Feature |
|---|
| MoE | Edifice.Meta.MoE |
Mixture of Experts with top-k/hash routing |
| Switch MoE | Edifice.Meta.SwitchMoE |
Top-1 routing with load balancing |
| Soft MoE | Edifice.Meta.SoftMoE |
Fully differentiable soft token routing |
| LoRA | Edifice.Meta.LoRA |
Low-Rank Adaptation for parameter-efficient fine-tuning |
| Adapter | Edifice.Meta.Adapter |
Bottleneck adapter modules for transfer learning |
| Hypernetwork | Edifice.Meta.Hypernetwork |
Networks that generate other networks' weights |
| Capsule | Edifice.Meta.Capsule |
Dynamic routing between capsules |
| Liquid NN | Edifice.Liquid |
Continuous-time ODE dynamics (LTC cells) |
| SNN | Edifice.Neuromorphic.SNN |
Leaky integrate-and-fire, surrogate gradients |
| ANN2SNN | Edifice.Neuromorphic.ANN2SNN |
Convert trained ANNs to spiking networks |
Building Blocks
|
Block |
Module |
Key Feature |
|---|
| RMSNorm | Edifice.Blocks.RMSNorm |
Root Mean Square normalization |
| SwiGLU | Edifice.Blocks.SwiGLU |
Gated FFN with SiLU activation |
| RoPE | Edifice.Blocks.RoPE |
Rotary position embedding |
| ALiBi | Edifice.Blocks.ALiBi |
Attention with linear biases |
| Patch Embed | Edifice.Blocks.PatchEmbed |
Image-to-patch tokenization |
| Sinusoidal PE | Edifice.Blocks.SinusoidalPE |
Fixed sinusoidal position encoding |
| Adaptive Norm | Edifice.Blocks.AdaptiveNorm |
Condition-dependent normalization (AdaLN) |
| Cross Attention | Edifice.Blocks.CrossAttention |
Cross-attention between two sequences |
| Conv1D/2D | Edifice.Convolutional.Conv |
Configurable convolution blocks |
| FFN | Edifice.Blocks.FFN |
Standard and gated feed-forward networks |
| Message Passing | Edifice.Graph.MessagePassing |
Generic MPNN framework, global pooling |
Guides
New to ML?
Start here if you're new to machine learning. These guides build from zero to fluency with Edifice's API and architecture families.
- ML Foundations — What neural networks are, how they learn, tensors and shapes
- Core Vocabulary — Essential terminology used across all guides
- The Problem Landscape — Classification, generation, sequence modeling — which architectures solve which problems
- Reading Edifice — The build/init/predict pattern, Axon graphs, shapes, and runnable examples
- Learning Path — A guided tour through the 19 architecture families
Architecture Guides
Conceptual guides covering theory, architecture evolution, and decision tables for each family.
Sequence Processing
Representation Learning
- Vision Architectures — ViT, Swin, UNet, ConvNeXt, MLP-Mixer
- Convolutional Networks — ResNet, DenseNet, MobileNet, TCN
- Contrastive Learning — SimCLR, BYOL, BarlowTwins, MAE, VICReg
- Graph & Set Networks — Message passing, spectral, invariance
Generative & Dynamic
Composition & Enhancement
Examples
See examples/ for runnable scripts including mlp_basics.exs, sequence_comparison.exs, graph_classification.exs, vae_generation.exs, and architecture_tour.exs.
Mamba for Sequence Modeling
model = Edifice.SSM.Mamba.build(
embed_size: 128,
hidden_size: 256,
state_size: 16,
num_layers: 4,
window_size: 100
)
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 100, 128}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, Nx.broadcast(0.5, {1, 100, 128}))
# => {1, 256}
Graph Classification with GCN
model = Edifice.Graph.GCN.build_classifier(
input_dim: 16,
hidden_dims: [64, 64],
num_classes: 2,
pool: :mean
)
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(
%{
"nodes" => Nx.template({4, 10, 16}, :f32),
"adjacency" => Nx.template({4, 10, 10}, :f32)
},
Axon.ModelState.empty()
)
output = predict_fn.(params, %{
"nodes" => Nx.broadcast(0.5, {4, 10, 16}),
"adjacency" => Nx.eye(10) |> Nx.broadcast({4, 10, 10})
})
# => {4, 2}
VAE with Reparameterization
{encoder, decoder} = Edifice.Generative.VAE.build(
input_size: 784,
latent_size: 32,
encoder_sizes: [512, 256],
decoder_sizes: [256, 512]
)
# Encoder outputs mu and log_var
{init_fn, predict_fn} = Axon.build(encoder)
params = init_fn.(Nx.template({1, 784}, :f32), Axon.ModelState.empty())
%{mu: mu, log_var: log_var} = predict_fn.(params, Nx.broadcast(0.5, {1, 784}))
# Sample latent vector (requires PRNG key for stochastic sampling)
key = Nx.Random.key(42)
{z, _new_key} = Edifice.Generative.VAE.reparameterize(mu, log_var, key)
# KL divergence for training
kl_loss = Edifice.Generative.VAE.kl_divergence(mu, log_var)
Permutation-Invariant Set Processing
model = Edifice.Sets.DeepSets.build(
input_dim: 3,
hidden_dim: 64,
output_dim: 10,
pool: :mean
)
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({4, 20, 3}, :f32), Axon.ModelState.empty())
# Process sets of 20 3D points
output = predict_fn.(params, Nx.broadcast(0.5, {4, 20, 3}))
# => {4, 10}
API Design
Every architecture module follows the same pattern:
# Module.build(opts) returns an Axon model
model = Edifice.SSM.Mamba.build(embed_size: 256, hidden_size: 512)
# Some modules expose layer-level builders for composition
layer = Edifice.Graph.GCN.gcn_layer(nodes, adjacency, output_dim)
# Generative models may return tuples
{encoder, decoder} = Edifice.Generative.VAE.build(input_size: 784)
# Utility functions for training
loss = Edifice.Generative.VAE.loss(reconstruction, target, mu, log_var)
energy = Edifice.Energy.Hopfield.energy(query, patterns, beta)
The unified registry lets you build any architecture by name:
# Useful for hyperparameter search, config-driven experiments
for arch <- [:mamba, :retnet, :griffin, :gla] do
model = Edifice.build(arch, embed_size: 256, hidden_size: 512, num_layers: 4)
# ... train and evaluate
end
Requirements
-
Elixir >= 1.18
-
Nx ~> 0.10
-
Axon ~> 0.8
-
Polaris ~> 0.1
-
EXLA ~> 0.10 (optional, for GPU acceleration)
License
MIT License. See LICENSE for details.