RaftFleet
Elixir library to run multiple Raft consensus groups in a cluster of ErlangVMs
Feature & Design
- Easy hosting of multiple "cluster-wide state"s
  - Flexible data model (defined by rafted_value)
  - Decentralized architecture and fault tolerance
- Reasonably scalable placement of processes for multiple Raft consensus groups
  - Consensus member processes are distributed to ErlangVMs in a data center-aware manner using rendezvous hashing
  - Automatic rebalancing on adding/removing nodes
- Location transparency
  - Each consensus group leader is accessible using the name (an atom) of the consensus group
  - Actual pids of consensus leader processes are cached in a local ETS table for fast access
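The placement idea in the list above can be illustrated with a small rendezvous (highest-random-weight) hashing sketch. This is not raft_fleet's actual code: the module name `RendezvousSketch`, the use of `:erlang.phash2/1` as the scoring hash, and the zone interleaving strategy are all assumptions made for illustration.

```elixir
defmodule RendezvousSketch do
  # Hypothetical scoring function: hash the (group, candidate) pair.
  defp score(group, candidate), do: :erlang.phash2({group, candidate})

  # Rank candidates for a group, highest score first.
  # The candidate itself is the tie-breaker, so input order never matters.
  def rank(group, candidates) do
    Enum.sort_by(candidates, fn c -> {-score(group, c), c} end)
  end

  # Pick the n best-ranked nodes for a group.
  def pick(group, nodes, n), do: group |> rank(nodes) |> Enum.take(n)

  # Zone-aware variant (an assumed strategy): rank zones, rank nodes inside
  # each zone, then take nodes round-robin across zones.
  def pick_zone_aware(group, nodes_by_zone, n) do
    nodes_by_zone
    |> Enum.sort_by(fn {zone, _nodes} -> {-score(group, zone), zone} end)
    |> Enum.map(fn {_zone, nodes} -> rank(group, nodes) end)
    |> interleave()
    |> Enum.take(n)
  end

  # Take the head of every non-empty list, then recurse on the tails.
  defp interleave(lists) do
    case Enum.reject(lists, &(&1 == [])) do
      [] -> []
      nonempty -> Enum.map(nonempty, &hd/1) ++ interleave(Enum.map(nonempty, &tl/1))
    end
  end
end

IO.inspect(RendezvousSketch.pick(:consensus1, [:n1, :n2, :n3, :n4], 3))
```

Because each node's score depends only on the group name and that node, removing a node that was not picked leaves the chosen members unchanged, which is what keeps rebalancing cheap when nodes come and go.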
Notes on backward compatibility
- Users of <= 0.6.0 should upgrade to 0.6.1 before upgrading to 0.7.x due to a change in an internal data structure. While <= 0.6.0 and 0.7.x are not compatible, 0.6.1 should be able to interact with both <= 0.6.0 and 0.7.x.
Usage example
Suppose we have a cluster of 4 Erlang nodes:
$ iex --sname 1 -S mix
iex(1@skirino-Manjaro)>
$ iex --sname 2 -S mix
iex(2@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")
$ iex --sname 3 -S mix
iex(3@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")
$ iex --sname 4 -S mix
iex(4@skirino-Manjaro)> Node.connect(:"1@skirino-Manjaro")
Load the following module, which implements the RaftedValue.Data behaviour, on all nodes in the cluster.
defmodule JustAnInt do
  @behaviour RaftedValue.Data
  def new(), do: 0
  def command(i, {:set, j}), do: {i, j    }
  def command(i, :inc     ), do: {i, i + 1}
  def query(i, :get), do: i
end
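To see what these callbacks do without a running cluster, they can be called as plain functions. The sketch below copies the module under a hypothetical name, `JustAnIntSketch`, and drops the `@behaviour` line so that it runs without the rafted_value dependency. It assumes, based on the return values shown in the session later in this section, that `command` returns a tuple of the value handed back to the caller and the next state.

```elixir
defmodule JustAnIntSketch do
  # Same callbacks as JustAnInt, minus `@behaviour RaftedValue.Data`,
  # so the module can be loaded standalone (an assumption for illustration).
  def new(), do: 0
  def command(i, {:set, j}), do: {i, j}
  def command(i, :inc), do: {i, i + 1}
  def query(i, :get), do: i
end

state0 = JustAnIntSketch.new()
# Each command yields {value_returned_to_caller, next_state}.
{ret1, state1} = JustAnIntSketch.command(state0, :inc)
{ret2, state2} = JustAnIntSketch.command(state1, {:set, 10})
IO.inspect({ret1, state1})
IO.inspect({ret2, state2})
IO.inspect(JustAnIntSketch.query(state2, :get))
```

Both commands hand back the previous integer while storing the new one, which is why an `:inc` on a fresh group replies with 0 and a following `:get` replies with 1.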
Call RaftFleet.activate/1 on all nodes.
iex(1@skirino-Manjaro)> RaftFleet.activate("zone1")
iex(2@skirino-Manjaro)> RaftFleet.activate("zone2")
iex(3@skirino-Manjaro)> RaftFleet.activate("zone1")
iex(4@skirino-Manjaro)> RaftFleet.activate("zone2")
Create 5 consensus groups, each of which replicates an integer and has 3 consensus members.
iex(1@skirino-Manjaro)> rv_config = RaftedValue.make_config(JustAnInt)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus1, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus2, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus3, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus4, 3, rv_config)
iex(1@skirino-Manjaro)> RaftFleet.add_consensus_group(:consensus5, 3, rv_config)
Now we can run queries and commands from any node in the cluster:
iex(1@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 0}
iex(2@skirino-Manjaro)> RaftFleet.command(:consensus1, :inc)
{:ok, 0}
iex(3@skirino-Manjaro)> RaftFleet.query(:consensus1, :get)
{:ok, 1}
Activating/deactivating a node in the cluster triggers rebalancing of consensus member processes.
Deployment notes
To run raft_fleet within an ErlangVM cluster, we generally recommend the following.
- The cluster should consist of at least 3 nodes to tolerate 1 node failure. Similarly, cluster nodes should span 3 (or more) data centers, so that the system keeps functioning in the face of 1 data center failure.
- When you add new ErlangVM nodes, each node should run the following initialization steps:
  - establish connections to other running nodes,
  - call RaftFleet.activate/1.
  These steps are typically done within start/2 of the main OTP application. Information about other running nodes should be available from e.g. an IaaS API.
- When terminating a node you should proceed as follows (although raft_fleet tolerates failures as long as quorums are maintained, it's much better to tell raft_fleet to make preparations beforehand):
  - call RaftFleet.deactivate/0 within the node-to-be-terminated,
  - wait for a while (say, 10 min) so that existing consensus group members are migrated to the other nodes, then
  - finally shut down the node.
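The startup and termination steps above can be sketched as a small helper module. Only `RaftFleet.activate/1` and `RaftFleet.deactivate/0` come from this README. The module name `NodeLifecycleSketch`, its function names, and the injectable function arguments are hypothetical, added so the flow can be exercised without a real cluster. How the list of peers is obtained (e.g. from an IaaS API) is left out.

```elixir
defmodule NodeLifecycleSketch do
  # Joining: connect to the already-running peers, then activate this node
  # in the given zone. Returns whatever the activate function returns.
  def join(peers, zone, connect \\ &Node.connect/1, activate \\ &RaftFleet.activate/1) do
    Enum.each(peers, connect)
    activate.(zone)
  end

  # Leaving: deactivate, wait so that consensus members can migrate to the
  # other nodes, then stop the VM.
  def leave(wait_ms, deactivate \\ &RaftFleet.deactivate/0, stop \\ &System.stop/0) do
    deactivate.()
    Process.sleep(wait_ms)
    stop.()
  end
end

# Dry run with stand-in functions instead of a real cluster.
NodeLifecycleSketch.join(
  [:"1@host", :"2@host"],
  "zone1",
  fn node -> IO.puts("connect #{node}") end,
  fn zone -> IO.puts("activate #{zone}") end
)
```

In a real application, `NodeLifecycleSketch.join(peers, zone)` would be called from start/2 with the default arguments, and `NodeLifecycleSketch.leave(10 * 60 * 1000)` would correspond to the 10 minute wait suggested above.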
Links
- Raft official website
- The original paper and the thesis about the Raft protocol
- rafted_value: Elixir implementation of the Raft consensus protocol
- My slides to introduce rafted_value and raft_fleet