Vsr
Viewstamped Replication for Elixir
A distributed consensus system implementing the Viewstamped Replication (VSR) protocol, providing fault-tolerant state machine replication with automatic failure recovery.
Features
✅ Core VSR Protocol
- Primary-backup replication with view changes
- Automatic primary failure detection and recovery
- Log-based operation ordering and consistency
- Quorum-based consensus decisions
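As a rough illustration of the quorum rule, the sketch below computes the majority size and the number of tolerated failures for a given cluster size. The module `QuorumSketch` and its function names are hypothetical and are not part of this library's API; they assume the standard VSR arrangement of 2f + 1 replicas with quorums of f + 1.

```elixir
defmodule QuorumSketch do
  @moduledoc "Hypothetical helper for illustration; not part of the Vsr API."

  # Smallest majority of a cluster with `cluster_size` replicas.
  def quorum(cluster_size) when is_integer(cluster_size) and cluster_size > 0 do
    div(cluster_size, 2) + 1
  end

  # Number of replicas that can fail while a quorum can still be formed.
  def tolerated_failures(cluster_size) when is_integer(cluster_size) and cluster_size > 0 do
    cluster_size - quorum(cluster_size)
  end

  # `acks` is a MapSet of replica IDs, so a repeated acknowledgement
  # from the same replica is only counted once.
  def quorum_reached?(acks, cluster_size) do
    MapSet.size(acks) >= quorum(cluster_size)
  end
end
```

Under these assumptions a 3-node cluster needs 2 acknowledgements and tolerates 1 failure, while a 4-node cluster needs 3 and still tolerates only 1, which is why odd cluster sizes are the usual choice.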
✅ Implemented Components
- Sequential operation validation with gap detection
- Heartbeat mechanism for failure detection
- Automatic view change triggering on primary timeout
- Memory management (cleanup of committed operation metadata)
- Pluggable state machines, log storage, and communication layers
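Sequential validation with gap detection can be pictured as comparing an incoming operation number with the last one in the local log. The following is a minimal sketch of that idea only; `GapCheckSketch.check/2` and its return values are assumptions made for illustration and may not match how this library represents the check internally.

```elixir
defmodule GapCheckSketch do
  @moduledoc "Hypothetical illustration of sequential validation; not the Vsr API."

  # Classify an incoming operation number against the last logged one.
  def check(last_op, op_number) when is_integer(last_op) and is_integer(op_number) do
    cond do
      # The next expected operation: safe to append.
      op_number == last_op + 1 -> :ok
      # Already in the log: a retransmission that can be ignored.
      op_number <= last_op -> :duplicate
      # Operations are missing between the log tail and this one.
      true -> {:gap, (last_op + 1)..(op_number - 1)}
    end
  end
end
```

In this sketch a replica whose log ends at operation 3 accepts operation 4, treats operation 3 as a duplicate, and reports operations 4 through 5 as missing when it receives operation 6.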
✅ Observability
- Comprehensive telemetry instrumentation following Erlang/Elixir conventions
- Leadership span tracking (when nodes are primary/leader)
- Protocol event tracking (prepare, commit, view changes)
- State machine operation spans with duration metrics
- Timer and heartbeat event tracking
- See TELEMETRY_EVENTS.md for complete event documentation
Current Limitations
⚠️ Client Request Deduplication: Currently only implemented for read-only operations. Write operations may be processed multiple times if clients retry requests due to network timeouts. This does not affect VSR protocol correctness or safety properties, but may impact user experience.
- Workaround: Implement request deduplication at the application layer using unique request IDs
- Future Work: Full write operation deduplication requires propagating client identifiers through the entire VSR protocol
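One way to approach the application-layer workaround is to carry a unique request ID inside each replicated write and have the state machine remember the result it produced for that ID. The sketch below is a standalone illustration under that assumption; `DedupSketch`, its state shape, and the `{:incr, key, amount}` operation are hypothetical and are not part of this library or of VsrKv.

```elixir
defmodule DedupSketch do
  @moduledoc "Hypothetical application-layer deduplication; not part of the Vsr API."

  # State holds the application data plus results keyed by request ID.
  def new, do: %{data: %{}, seen: %{}}

  # Apply a non-idempotent write at most once per request ID.
  # Returns {result, new_state}.
  def apply_write(state, request_id, {:incr, key, amount}) do
    case Map.fetch(state.seen, request_id) do
      {:ok, cached} ->
        # Retried request: return the cached result and leave state unchanged.
        {cached, state}

      :error ->
        new_value = Map.get(state.data, key, 0) + amount

        new_state = %{
          state
          | data: Map.put(state.data, key, new_value),
            seen: Map.put(state.seen, request_id, new_value)
        }

        {new_value, new_state}
    end
  end
end
```

Because the `seen` table lives inside the replicated state, every replica that applies the same log reaches the same decision about a retry. The table grows with every request in this sketch; pruning old entries is left out.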
⚠️ No Reconfiguration Support: The implementation assumes a static cluster with fixed membership. Dynamic addition or removal of replicas (reconfiguration protocol from the VSR paper) is not currently supported.
- Limitation: Cluster size and membership must be determined at startup and cannot be changed during operation
- Workaround: Plan cluster capacity ahead of time to accommodate expected load
- Future Work: Implement the reconfiguration protocol described in "Viewstamped Replication Revisited"
Installation
If available in Hex, the package can be installed
by adding vsr to your list of dependencies in mix.exs:
def deps do
  [
    {:vsr, "~> 0.1.0"}
  ]
end
Usage
# Start a VSR replica with a key-value state machine
{:ok, replica} = Vsr.start_link(
  log: [],
  state_machine: VsrKv,
  cluster_size: 3
)
# Perform operations
VsrKv.put(replica, "key", "value")
result = VsrKv.get(replica, "key") # Returns "value"
Testing
Unit Tests
mix test
Test Status: 106/106 tests passing
Jepsen Maelstrom Testing
VSR includes integration with Jepsen Maelstrom, a workbench for learning distributed systems by writing your own implementations and testing them against fault injection.
Download Maelstrom
Download the latest Maelstrom release:
wget https://github.com/jepsen-io/maelstrom/releases/download/v0.2.3/maelstrom.tar.bz2
tar -xjf maelstrom.tar.bz2
Or download and extract in one step:
curl -L https://github.com/jepsen-io/maelstrom/releases/download/v0.2.3/maelstrom.tar.bz2 | tar -xj
Run Maelstrom Tests
The repository includes a convenience script for running linearizable key-value tests:
./maelstrom-kv
This runs the lin-kv workload which tests:
- Linearizable key-value operations (read, write, cas)
- Fault tolerance with network partitions
- Consistency under concurrent operations
Manual Maelstrom Testing
You can also run Maelstrom tests manually:
cd maelstrom
java -jar maelstrom.jar test \
  -w lin-kv \
  --bin ../run-vsr-node \
  --node-count 3 \
  --time-limit 10 \
  --concurrency 6
Workload Options:
- lin-kv - Linearizable key-value store (read, write, cas operations)
- --node-count - Number of VSR replicas to run
- --time-limit - Duration of test in seconds
- --concurrency - Number of concurrent client operations
Interpreting Results
After a test run, check:
- Test results: Maelstrom will report if linearizability was maintained
- Logs: Found in store/lin-kv/latest/
  - jepsen.log - Test runner logs and errors
  - node-logs/n*.log - Individual node logs
Success criteria:
- All operations must satisfy linearizability
- Minimal network timeouts (some expected during partitions)
- No crashes or protocol violations
Architecture
See SPECIFICATION.md for detailed VSR protocol specification.
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/vsr.