# Edifice

A comprehensive ML architecture library for Elixir, built on Nx and Axon.

186 neural network architectures across 25 families — from MLPs to Mamba, transformers to graph networks, VAEs to spiking neurons, audio codecs to robotics, scientific ML to 3D generation.
## Why Edifice?

The Elixir ML ecosystem has excellent numerical computing (Nx) and model building (Axon) foundations, but no comprehensive collection of ready-to-use architectures. Edifice fills that gap:

- **One dependency** for all major architecture families
- **Consistent API** — every architecture follows `Module.build(opts)`, returning an Axon model
- **Unified registry** — `Edifice.build(:mamba, opts)` discovers and builds any architecture by name
- **Pure Elixir** — no Python, no ONNX imports, just Nx/Axon all the way down
- **GPU-ready** — works with EXLA/CUDA out of the box
## Installation

Add `edifice` to your dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:edifice, "~> 0.2.0"}
  ]
end
```

Edifice requires Nx ~> 0.10 and Axon ~> 0.8. For GPU acceleration, add EXLA:

```elixir
{:exla, "~> 0.10"}
```

> **Tip:** On Elixir 1.19+, set `MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4` to compile dependencies in parallel (up to 4x faster first build).
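The tip can also be applied per invocation instead of exporting the variable globally — a minimal sketch (partition count of 4 is just an example value):

```shell
# Compile dependencies across 4 parallel OS processes (Elixir 1.19+ only).
# Prefixing the command scopes the variable to this one invocation.
MIX_OS_DEPS_COMPILE_PARTITION_COUNT=4 mix deps.compile
```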
## Quick Start

```elixir
# Build any architecture by name
model = Edifice.build(:mamba, embed_size: 256, hidden_size: 512, num_layers: 4)

# Or use the module directly for more control
model =
  Edifice.SSM.Mamba.build(
    embed_size: 256,
    hidden_size: 512,
    state_size: 16,
    num_layers: 4,
    window_size: 60
  )

# Build and run
{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 60, 256}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, input)

# Explore what's available
Edifice.list_architectures()
# => [:attention, :bayesian, :capsule, :deep_sets, :densenet, :diffusion, ...]

Edifice.list_families()
# => %{ssm: [:mamba, :mamba_ssd, :s5, ...], attention: [:attention, :retnet, ...], ...}
```
## Architecture Families
### Feedforward

| Architecture | Module | Key Feature |
|---|---|---|
| MLP | `Edifice.Feedforward.MLP` | Multi-layer perceptron with configurable hidden sizes |
| KAN | `Edifice.Feedforward.KAN` | Kolmogorov-Arnold Networks, learnable activation functions |
| KAT | `Edifice.Feedforward.KAT` | Kolmogorov-Arnold Transformer (KAN + attention) |
| TabNet | `Edifice.Feedforward.TabNet` | Attentive feature selection for tabular data |
| BitNet | `Edifice.Feedforward.BitNet` | Ternary/binary weight quantization (1.58-bit) |
### Transformer

| Architecture | Module | Key Feature |
|---|---|---|
| Decoder-Only | `Edifice.Transformer.DecoderOnly` | GPT-style with GQA, RoPE/iRoPE, SwiGLU, RMSNorm |
| Multi-Token Prediction | `Edifice.Transformer.MultiTokenPrediction` | Predict next N tokens simultaneously |
| Byte Latent Transformer | `Edifice.Transformer.ByteLatentTransformer` | Byte-level processing via encoder-latent-decoder |
| Nemotron-H | `Edifice.Transformer.NemotronH` | NVIDIA’s hybrid Mamba-Transformer |
### State Space Models

| Architecture | Module | Key Feature |
|---|---|---|
| S4 | `Edifice.SSM.S4` | HiPPO DPLR initialization, long-range memory |
| S4D | `Edifice.SSM.S4D` | Diagonal state space, simplified S4 |
| S5 | `Edifice.SSM.S5` | MIMO diagonal SSM with D skip connection |
| H3 | `Edifice.SSM.H3` | Two SSMs with multiplicative gating + short convolution |
| Hyena | `Edifice.SSM.Hyena` | Long convolution hierarchy, implicit filters |
| Mamba | `Edifice.SSM.Mamba` | Selective SSM, parallel associative scan |
| Mamba-2 (SSD) | `Edifice.SSM.MambaSSD` | Structured state space duality, chunk-wise matmul |
| Mamba (Cumsum) | `Edifice.SSM.MambaCumsum` | Mamba with configurable scan algorithm |
| Mamba (Hillis-Steele) | `Edifice.SSM.MambaHillisSteele` | Mamba with max-parallelism scan |
| BiMamba | `Edifice.SSM.BiMamba` | Bidirectional Mamba for non-causal tasks |
| GatedSSM | `Edifice.SSM.GatedSSM` | Gated temporal with gradient checkpointing |
| Jamba | `Edifice.SSM.Hybrid` | Mamba + Attention hybrid (configurable ratio) |
| Zamba | `Edifice.SSM.Zamba` | Mamba + single shared attention layer |
| StripedHyena | `Edifice.SSM.StripedHyena` | Interleaved Hyena long conv + gated conv |
| Mamba-3 | `Edifice.SSM.Mamba3` | Complex states, trapezoidal discretization, MIMO |
| GSS | `Edifice.SSM.GSS` | Gated State Space (simplified S4 with gating) |
| Hymba | `Edifice.SSM.Hymba` | Hybrid Mamba + attention with learnable meta tokens |
| SS Transformer | `Edifice.SSM.SSTransformer` | State Space Transformer |
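Because every family member shares the registry interface, SSM variants can be swapped without touching the surrounding code. A sketch, assuming the option names shown in Quick Start apply to each variant (whether every variant accepts exactly these options may differ per module):

```elixir
# Build three SSM variants with identical hyperparameters and compare
# their output shapes. The registry atoms come from Edifice.list_families/0.
input = Nx.broadcast(0.5, {1, 32, 128})

for arch <- [:mamba, :s5, :gss] do
  model = Edifice.build(arch, embed_size: 128, hidden_size: 256, num_layers: 2)
  {init_fn, predict_fn} = Axon.build(model)
  params = init_fn.(Nx.template({1, 32, 128}, :f32), Axon.ModelState.empty())
  IO.inspect(Nx.shape(predict_fn.(params, input)), label: arch)
end
```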
### Attention & Linear Attention

| Architecture | Module | Key Feature |
|---|---|---|
| Multi-Head Attention | `Edifice.Attention.MultiHead` | Sliding window, QK LayerNorm |
| GQA | `Edifice.Attention.GQA` | Grouped Query Attention, fewer KV heads |
| Perceiver | `Edifice.Attention.Perceiver` | Cross-attention to learned latents, input-agnostic |
| FNet | `Edifice.Attention.FNet` | Fourier transform replacing attention |
| Linear Transformer | `Edifice.Attention.LinearTransformer` | Kernel-based O(N) attention |
| Nystromformer | `Edifice.Attention.Nystromformer` | Nystrom approximation of attention matrix |
| Performer | `Edifice.Attention.Performer` | FAVOR+ random feature attention |
| RetNet | `Edifice.Attention.RetNet` | Multi-scale retention, O(1) recurrent inference |
| RWKV-7 | `Edifice.Attention.RWKV` | Linear attention, O(1) space, “Goose” architecture |
| GLA | `Edifice.Attention.GLA` | Gated Linear Attention with data-dependent decay |
| HGRN-2 | `Edifice.Attention.HGRN` | Hierarchically gated linear RNN, state expansion |
| Griffin/Hawk | `Edifice.Attention.Griffin` | RG-LRU + local attention (Griffin) or pure RG-LRU (Hawk) |
| Diff Transformer | `Edifice.Attention.DiffTransformer` | Noise-cancelling dual softmax subtraction |
| MLA | `Edifice.Attention.MLA` | Multi-Head Latent Attention (DeepSeek KV compression) |
| Based | `Edifice.Attention.Based` | Taylor expansion linear attention |
| Mega | `Edifice.Attention.Mega` | Moving average + gated attention |
| InfiniAttention | `Edifice.Attention.InfiniAttention` | Compressive memory for unbounded context |
| Conformer | `Edifice.Attention.Conformer` | Conv-augmented transformer for audio/speech |
| Ring Attention | `Edifice.Attention.RingAttention` | Distributed chunked attention for long sequences |
| Lightning Attention | `Edifice.Attention.LightningAttention` | Hybrid linear/softmax with I/O-aware tiling |
| Gated Attention | `Edifice.Attention.GatedAttention` | Sigmoid post-attention gate (NeurIPS 2025) |
| NSA | `Edifice.Attention.NSA` | Native Sparse Attention (DeepSeek three-path) |
| KDA | `Edifice.Attention.KDA` | Kimi Delta Attention, channel-wise decay |
| Flash Linear Attention | `Edifice.Attention.FlashLinearAttention` | Optimized linear attention |
| YaRN | `Edifice.Attention.YARN` | RoPE context extension via frequency scaling |
| Dual Chunk | `Edifice.Attention.DualChunk` | Dual Chunk Attention for long contexts |
| TMRoPE | `Edifice.Attention.TMRoPE` | Time-aligned Multimodal RoPE |
| RNoPE-SWA | `Edifice.Attention.RNoPESWA` | No positional encoding + sliding window |
### Recurrent Networks

| Architecture | Module | Key Feature |
|---|---|---|
| LSTM/GRU | `Edifice.Recurrent` | Classic recurrent with multi-layer stacking |
| xLSTM | `Edifice.Recurrent.XLSTM` | Exponential gating, matrix memory (sLSTM/mLSTM) |
| MinGRU | `Edifice.Recurrent.MinGRU` | Minimal GRU, parallel-scannable |
| MinLSTM | `Edifice.Recurrent.MinLSTM` | Minimal LSTM, parallel-scannable |
| DeltaNet | `Edifice.Recurrent.DeltaNet` | Delta rule-based linear RNN |
| TTT | `Edifice.Recurrent.TTT` | Test-Time Training, self-supervised at inference |
| Titans | `Edifice.Recurrent.Titans` | Neural long-term memory, surprise-gated |
| Reservoir | `Edifice.Recurrent.Reservoir` | Echo State Networks with fixed random reservoir |
| sLSTM | `Edifice.Recurrent.SLSTM` | Scalar LSTM with exponential gating |
| xLSTM v2 | `Edifice.Recurrent.XLSTMv2` | Updated mLSTM with matrix memory |
| Gated DeltaNet | `Edifice.Recurrent.GatedDeltaNet` | Linear attention with data-dependent gating |
| TTT-E2E | `Edifice.Recurrent.TTTE2E` | End-to-end test-time training |
| Native Recurrence | `Edifice.Recurrent.NativeRecurrence` | Native recurrence block |
### Vision

| Architecture | Module | Key Feature |
|---|---|---|
| ViT | `Edifice.Vision.ViT` | Vision Transformer, patch embedding |
| DeiT | `Edifice.Vision.DeiT` | Data-efficient ViT with distillation token |
| Swin | `Edifice.Vision.SwinTransformer` | Shifted window attention, hierarchical features |
| U-Net | `Edifice.Vision.UNet` | Encoder-decoder with skip connections |
| ConvNeXt | `Edifice.Vision.ConvNeXt` | Modernized ConvNet with transformer-inspired design |
| MLP-Mixer | `Edifice.Vision.MLPMixer` | Pure MLP with token/channel mixing |
| FocalNet | `Edifice.Vision.FocalNet` | Focal modulation, hierarchical context |
| PoolFormer | `Edifice.Vision.PoolFormer` | Average pooling token mixer (MetaFormer) |
| NeRF | `Edifice.Vision.NeRF` | Neural radiance field, coordinate-to-color mapping |
| Gaussian Splat | `Edifice.Vision.GaussianSplat` | 3D Gaussian Splatting (NeRF successor) |
| MambaVision | `Edifice.Vision.MambaVision` | 4-stage hierarchical CNN+Mamba+Attention |
| DINOv2 | `Edifice.Vision.DINOv2` | Self-distillation vision backbone |
| MetaFormer | `Edifice.Vision.MetaFormer` | Architecture-first framework (+ CAFormer variant) |
| EfficientViT | `Edifice.Vision.EfficientViT` | Linear attention ViT |
### Convolutional

| Architecture | Module | Key Feature |
|---|---|---|
| Conv1D/2D | `Edifice.Convolutional.Conv` | Configurable convolution blocks with BN, activation, dropout |
| ResNet | `Edifice.Convolutional.ResNet` | Residual/bottleneck blocks, configurable depth |
| DenseNet | `Edifice.Convolutional.DenseNet` | Dense connections, feature reuse |
| TCN | `Edifice.Convolutional.TCN` | Dilated causal convolutions for sequences |
| MobileNet | `Edifice.Convolutional.MobileNet` | Depthwise separable convolutions |
| EfficientNet | `Edifice.Convolutional.EfficientNet` | Compound scaling (depth, width, resolution) |
### Generative Models

| Architecture | Module | Key Feature |
|---|---|---|
| VAE | `Edifice.Generative.VAE` | Reparameterization trick, KL divergence, beta-VAE |
| VQ-VAE | `Edifice.Generative.VQVAE` | Discrete codebook, straight-through estimator |
| GAN | `Edifice.Generative.GAN` | Generator/discriminator, WGAN-GP support |
| Diffusion (DDPM) | `Edifice.Generative.Diffusion` | Denoising diffusion, sinusoidal time embedding |
| DDIM | `Edifice.Generative.DDIM` | Deterministic diffusion sampling, fast inference |
| DiT | `Edifice.Generative.DiT` | Diffusion Transformer, AdaLN-Zero conditioning |
| Latent Diffusion | `Edifice.Generative.LatentDiffusion` | Diffusion in compressed latent space |
| Consistency Model | `Edifice.Generative.ConsistencyModel` | Single-step generation via consistency training |
| Score SDE | `Edifice.Generative.ScoreSDE` | Continuous SDE framework (VP-SDE, VE-SDE) |
| Flow Matching | `Edifice.Generative.FlowMatching` | ODE-based generation, multiple loss variants |
| Normalizing Flow | `Edifice.Generative.NormalizingFlow` | Affine coupling layers (RealNVP-style) |
| MMDiT | `Edifice.Generative.MMDiT` | Multimodal Diffusion Transformer (FLUX.1, SD3) |
| SoFlow | `Edifice.Generative.SoFlow` | Flow matching + consistency loss |
| VAR | `Edifice.Generative.VAR` | Visual Autoregressive (next-scale prediction) |
| Linear DiT (SANA) | `Edifice.Generative.LinearDiT` | Linear attention for diffusion, 100x speedup |
| SiT | `Edifice.Generative.SiT` | Scalable Interpolant Transformer |
| Transfusion | `Edifice.Generative.Transfusion` | Unified AR text + diffusion images |
| MAR | `Edifice.Generative.MAR` | Masked Autoregressive generation |
| CogVideoX | `Edifice.Generative.CogVideoX` | 3D causal VAE + expert transformer for video |
| TRELLIS | `Edifice.Generative.TRELLIS` | Sparse 3D lattice + rectified flow |
### Contrastive & Self-Supervised

| Architecture | Module | Key Feature |
|---|---|---|
| SimCLR | `Edifice.Contrastive.SimCLR` | NT-Xent contrastive loss, projection head |
| BYOL | `Edifice.Contrastive.BYOL` | No negatives, momentum encoder |
| Barlow Twins | `Edifice.Contrastive.BarlowTwins` | Cross-correlation redundancy reduction |
| MAE | `Edifice.Contrastive.MAE` | Masked Autoencoder, 75% patch masking |
| VICReg | `Edifice.Contrastive.VICReg` | Variance-Invariance-Covariance regularization |
| JEPA | `Edifice.Contrastive.JEPA` | Joint Embedding Predictive Architecture |
| Temporal JEPA | `Edifice.Contrastive.TemporalJEPA` | V-JEPA for video/temporal sequences |
| SigLIP | `Edifice.Contrastive.SigLIP` | Sigmoid contrastive learning (CLIP improvement) |
### Graph & Set Networks

| Architecture | Module | Key Feature |
|---|---|---|
| GCN | `Edifice.Graph.GCN` | Spectral graph convolutions (Kipf & Welling) |
| GAT | `Edifice.Graph.GAT` | Graph attention with multi-head support |
| GIN | `Edifice.Graph.GIN` | Graph Isomorphism Network, maximally expressive |
| GraphSAGE | `Edifice.Graph.GraphSAGE` | Inductive learning, neighborhood sampling |
| Graph Transformer | `Edifice.Graph.GraphTransformer` | Full attention over nodes with edge features |
| PNA | `Edifice.Graph.PNA` | Principal Neighbourhood Aggregation |
| GINv2 | `Edifice.Graph.GINv2` | GIN with edge features |
| SchNet | `Edifice.Graph.SchNet` | Continuous-filter convolutions for molecules |
| EGNN | `Edifice.Graph.EGNN` | E(n)-equivariant GNN for molecular simulation |
| DeepSets | `Edifice.Sets.DeepSets` | Permutation-invariant set functions |
| PointNet | `Edifice.Sets.PointNet` | Point cloud processing with T-Net alignment |
### Energy, Probabilistic & Memory

| Architecture | Module | Key Feature |
|---|---|---|
| EBM | `Edifice.Energy.EBM` | Energy-based models, contrastive divergence |
| Hopfield | `Edifice.Energy.Hopfield` | Modern continuous Hopfield networks |
| Neural ODE | `Edifice.Energy.NeuralODE` | Continuous-depth networks via ODE solvers |
| Bayesian NN | `Edifice.Probabilistic.Bayesian` | Weight uncertainty, variational inference |
| MC Dropout | `Edifice.Probabilistic.MCDropout` | Uncertainty estimation via dropout at inference |
| Evidential NN | `Edifice.Probabilistic.EvidentialNN` | Dirichlet priors for uncertainty |
| NTM | `Edifice.Memory.NTM` | Neural Turing Machine, differentiable memory |
| Memory Network | `Edifice.Memory.MemoryNetwork` | End-to-end memory with multi-hop attention |
| Engram | `Edifice.Memory.Engram` | O(1) hash-based associative memory |
### Meta-Learning & Specialized

| Architecture | Module | Key Feature |
|---|---|---|
| MoE | `Edifice.Meta.MoE` | Mixture of Experts with top-k/hash routing |
| Switch MoE | `Edifice.Meta.SwitchMoE` | Top-1 routing with load balancing |
| Soft MoE | `Edifice.Meta.SoftMoE` | Fully differentiable soft token routing |
| LoRA | `Edifice.Meta.LoRA` | Low-Rank Adaptation for parameter-efficient fine-tuning |
| Adapter | `Edifice.Meta.Adapter` | Bottleneck adapter modules for transfer learning |
| Hypernetwork | `Edifice.Meta.Hypernetwork` | Networks that generate other networks’ weights |
| Capsule | `Edifice.Meta.Capsule` | Dynamic routing between capsules |
| MixtureOfDepths | `Edifice.Meta.MixtureOfDepths` | Dynamic per-token compute allocation |
| MixtureOfAgents | `Edifice.Meta.MixtureOfAgents` | Multi-model proposer + aggregator |
| RLHF Head | `Edifice.Meta.RLHFHead` | Reward model and preference heads |
| DPO | `Edifice.Meta.DPO` | Direct Preference Optimization |
| GRPO | `Edifice.Meta.GRPO` | Group Relative Policy Optimization (DeepSeek-R1) |
| KTO | `Edifice.Meta.KTO` | Kahneman-Tversky Optimization (binary feedback) |
| MoE v2 | `Edifice.Meta.MoEv2` | Expert-choice routing + shared experts + bias balancing |
| DoRA | `Edifice.Meta.DoRA` | Weight-decomposed LoRA |
| Speculative Decoding | `Edifice.Meta.SpeculativeDecoding` | Draft + verify inference acceleration |
| Test-Time Compute | `Edifice.Meta.TestTimeCompute` | Adaptive test-time compute |
| Mixture of Tokenizers | `Edifice.Meta.MixtureOfTokenizers` | Multi-tokenization expert routing |
| QAT | `Edifice.Meta.QAT` | Quantization-Aware Training |
| Hybrid Builder | `Edifice.Meta.HybridBuilder` | Configurable SSM/Attention ratio |
| Liquid NN | `Edifice.Liquid` | Continuous-time ODE dynamics (LTC cells) |
| SNN | `Edifice.Neuromorphic.SNN` | Leaky integrate-and-fire, surrogate gradients |
| ANN2SNN | `Edifice.Neuromorphic.ANN2SNN` | Convert trained ANNs to spiking networks |
### Interpretability

| Architecture | Module | Key Feature |
|---|---|---|
| Sparse Autoencoder | `Edifice.Interpretability.SparseAutoencoder` | Feature extraction from model activations |
| Transcoder | `Edifice.Interpretability.Transcoder` | Cross-layer mechanistic interpretability |
### Scientific ML

| Architecture | Module | Key Feature |
|---|---|---|
| FNO | `Edifice.Scientific.FNO` | Fourier Neural Operator for solving PDEs |
### Audio

| Architecture | Module | Key Feature |
|---|---|---|
| EnCodec | `Edifice.Audio.EnCodec` | Neural audio codec (encoder → RVQ → decoder) |
| VALL-E | `Edifice.Audio.VALLE` | Codec language model for zero-shot TTS |
| SoundStorm | `Edifice.Audio.SoundStorm` | Parallel audio token generation |
### Robotics

| Architecture | Module | Key Feature |
|---|---|---|
| ACT | `Edifice.Robotics.ACT` | Action Chunking Transformer for imitation learning |
| OpenVLA | `Edifice.Robotics.OpenVLA` | Vision-Language-Action model for robot control |
### RL & World Models

| Architecture | Module | Key Feature |
|---|---|---|
| PolicyValue | `Edifice.RL.PolicyValue` | Actor-critic policy-value network |
| World Model | `Edifice.WorldModel.WorldModel` | Encoder + dynamics + reward head |
| Medusa | `Edifice.Inference.Medusa` | Multi-head speculative decoding |
### Multimodal

| Architecture | Module | Key Feature |
|---|---|---|
| Multimodal Fusion | `Edifice.Multimodal.Fusion` | MLP projection, cross-attention, Perceiver resampler |
## Building Blocks

| Block | Module | Key Feature |
|---|---|---|
| RMSNorm | `Edifice.Blocks.RMSNorm` | Root Mean Square normalization |
| SwiGLU | `Edifice.Blocks.SwiGLU` | Gated FFN with SiLU activation |
| RoPE | `Edifice.Blocks.RoPE` | Rotary position embedding |
| ALiBi | `Edifice.Blocks.ALiBi` | Attention with linear biases |
| Patch Embed | `Edifice.Blocks.PatchEmbed` | Image-to-patch tokenization |
| Sinusoidal PE | `Edifice.Blocks.SinusoidalPE` | Fixed sinusoidal position encoding |
| Adaptive Norm | `Edifice.Blocks.AdaptiveNorm` | Condition-dependent normalization (AdaLN) |
| Cross Attention | `Edifice.Blocks.CrossAttention` | Cross-attention between two sequences |
| Conv1D/2D | `Edifice.Convolutional.Conv` | Configurable convolution blocks |
| FFN | `Edifice.Blocks.FFN` | Standard and gated feed-forward networks |
| Transformer Block | `Edifice.Blocks.TransformerBlock` | Pre-norm block with pluggable attention |
| Causal Mask | `Edifice.Blocks.CausalMask` | Unified causal mask creation |
| Depthwise Conv | `Edifice.Blocks.DepthwiseConv` | 1D depthwise separable convolution |
| Model Builder | `Edifice.Blocks.ModelBuilder` | Sequence/vision model skeletons |
| Message Passing | `Edifice.Graph.MessagePassing` | Generic MPNN framework, global pooling |
| Scalable-Softmax | `Edifice.Blocks.SSMax` | Drop-in softmax replacement for long sequences |
| Softpick | `Edifice.Blocks.Softpick` | Non-saturating sparse attention function |
| KV Cache | `Edifice.Blocks.KVCache` | Inference-time KV caching |
## Guides

### New to ML?

Start here if you’re new to machine learning. These guides build from zero to fluency with Edifice’s API and architecture families.

- ML Foundations — What neural networks are, how they learn, tensors and shapes
- Core Vocabulary — Essential terminology used across all guides
- The Problem Landscape — Classification, generation, sequence modeling — which architectures solve which problems
- Reading Edifice — The build/init/predict pattern, Axon graphs, shapes, and runnable examples
- Learning Path — A guided tour through the architecture families

### Reference

- Architecture Taxonomy — Comprehensive catalog of architectures: descriptions, paper references, strengths/weaknesses, adoption context, and gap analysis

### Architecture Guides

Conceptual guides covering theory, architecture evolution, and decision tables for each family.

#### Sequence Processing

#### Representation Learning

- Vision Architectures — ViT, Swin, UNet, ConvNeXt, MLP-Mixer
- Convolutional Networks — ResNet, DenseNet, MobileNet, TCN
- Contrastive Learning — SimCLR, BYOL, BarlowTwins, MAE, VICReg
- Graph & Set Networks — Message passing, spectral, invariance

#### Generative & Dynamic

#### Composition & Enhancement
## Examples

See `examples/` for runnable scripts including `mlp_basics.exs`, `sequence_comparison.exs`, `graph_classification.exs`, `vae_generation.exs`, and `architecture_tour.exs`.
### Mamba for Sequence Modeling

```elixir
model =
  Edifice.SSM.Mamba.build(
    embed_size: 128,
    hidden_size: 256,
    state_size: 16,
    num_layers: 4,
    window_size: 100
  )

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({1, 100, 128}, :f32), Axon.ModelState.empty())
output = predict_fn.(params, Nx.broadcast(0.5, {1, 100, 128}))
# => {1, 256}
```
### Graph Classification with GCN

```elixir
model =
  Edifice.Graph.GCN.build_classifier(
    input_dim: 16,
    hidden_dims: [64, 64],
    num_classes: 2,
    pool: :mean
  )

{init_fn, predict_fn} = Axon.build(model)

params =
  init_fn.(
    %{
      "nodes" => Nx.template({4, 10, 16}, :f32),
      "adjacency" => Nx.template({4, 10, 10}, :f32)
    },
    Axon.ModelState.empty()
  )

output =
  predict_fn.(params, %{
    "nodes" => Nx.broadcast(0.5, {4, 10, 16}),
    "adjacency" => Nx.eye(10) |> Nx.broadcast({4, 10, 10})
  })

# => {4, 2}
```
### VAE with Reparameterization

```elixir
{encoder, decoder} =
  Edifice.Generative.VAE.build(
    input_size: 784,
    latent_size: 32,
    encoder_sizes: [512, 256],
    decoder_sizes: [256, 512]
  )

# Encoder outputs mu and log_var
{init_fn, predict_fn} = Axon.build(encoder)
params = init_fn.(Nx.template({1, 784}, :f32), Axon.ModelState.empty())
%{mu: mu, log_var: log_var} = predict_fn.(params, Nx.broadcast(0.5, {1, 784}))

# Sample a latent vector (requires a PRNG key for stochastic sampling)
key = Nx.Random.key(42)
{z, _new_key} = Edifice.Generative.VAE.reparameterize(mu, log_var, key)

# KL divergence for training
kl_loss = Edifice.Generative.VAE.kl_divergence(mu, log_var)
```
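The decoder half returned by `build/1` can then map the sampled latent back to input space — a sketch assuming the decoder follows the same build/init/predict pattern as the encoder, and that it takes the latent tensor directly as input:

```elixir
# Decode the sampled latent z (shape {1, 32}) back toward a {1, 784}
# reconstruction. The decoder's input contract is an assumption here.
{dec_init, dec_predict} = Axon.build(decoder)
dec_params = dec_init.(Nx.template({1, 32}, :f32), Axon.ModelState.empty())
reconstruction = dec_predict.(dec_params, z)
```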
### Permutation-Invariant Set Processing

```elixir
model =
  Edifice.Sets.DeepSets.build(
    input_dim: 3,
    hidden_dim: 64,
    output_dim: 10,
    pool: :mean
  )

{init_fn, predict_fn} = Axon.build(model)
params = init_fn.(Nx.template({4, 20, 3}, :f32), Axon.ModelState.empty())

# Process sets of 20 3D points
output = predict_fn.(params, Nx.broadcast(0.5, {4, 20, 3}))
# => {4, 10}
```
## API Design

Every architecture module follows the same pattern:

```elixir
# Module.build(opts) returns an Axon model
model = Edifice.SSM.Mamba.build(embed_size: 256, hidden_size: 512)

# Some modules expose layer-level builders for composition
layer = Edifice.Graph.GCN.gcn_layer(nodes, adjacency, output_dim)

# Generative models may return tuples
{encoder, decoder} = Edifice.Generative.VAE.build(input_size: 784)

# Utility functions for training
loss = Edifice.Generative.VAE.loss(reconstruction, target, mu, log_var)
energy = Edifice.Energy.Hopfield.energy(query, patterns, beta)
```

The unified registry lets you build any architecture by name:

```elixir
# Useful for hyperparameter search, config-driven experiments
for arch <- [:mamba, :retnet, :griffin, :gla] do
  model = Edifice.build(arch, embed_size: 256, hidden_size: 512, num_layers: 4)
  # ... train and evaluate
end
```
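Combined with `Edifice.list_families/0`, the registry can also sweep an entire family without hard-coding its members — a sketch, assuming each member accepts the common options used throughout this README:

```elixir
# Sweep every architecture in the :ssm family with one shared config.
# The %{family => [atoms]} map shape follows the Quick Start example.
common = [embed_size: 256, hidden_size: 512, num_layers: 4]

Edifice.list_families()
|> Map.fetch!(:ssm)
|> Enum.each(fn arch ->
  model = Edifice.build(arch, common)
  # ... train and evaluate
end)
```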
## Requirements

- Elixir >= 1.18
- Nx ~> 0.10
- Axon ~> 0.8
- Polaris ~> 0.1
- EXLA ~> 0.10 (optional, for GPU acceleration)
## License

MIT License. See LICENSE for details.