HyParView


A clean, BEAM-native Elixir implementation of the HyParView hybrid partial-view membership protocol from Leitão, Pereira, and Rodrigues (DSN 2007).

Why this exists

If your Elixir cluster is approaching the point where libcluster's full-mesh distribution starts to crack (≈50–100 nodes), HyParView is the membership primitive that Riak Core and Partisan are built on.

Each node maintains two views:

- a small active view: the peers it holds open connections to, over which all protocol traffic flows
- a larger passive view: standby peers, refreshed by periodic shuffles, used to repair the active view when a member fails

The result: bounded per-node connection cost, fast failure detection (TCP), and high reliability under massive node churn — the paper demonstrates >90% delivery with 95% of nodes failed.

Status

Pre-release. v0.1.0 is feature-complete and tested. Not yet on hex.pm.

Installation

Once published, add to mix.exs:

def deps do
  [{:hyparview, "~> 0.1"}]
end

Quick start

Built-in test transport (in-process, ideal for tests)

contact_peer = HyParView.Peer.new("contact", make_ref())

{:ok, contact_pid} = HyParView.start_link(
  peer: contact_peer,
  transport: HyParView.Transport.Test
)

{:ok, joiner_pid} = HyParView.start_link(
  peer: HyParView.Peer.new("joiner", make_ref()),
  contacts: [contact_peer],  # must reuse the contact's peer so the address matches
  transport: HyParView.Transport.Test
)

HyParView.active_view(contact_pid)
# => [%HyParView.Peer{id: "joiner", ...}]

Real TCP transport

contact = HyParView.Peer.new("node-a", {{127, 0, 0, 1}, 4000})
joiner  = HyParView.Peer.new("node-b", {{127, 0, 0, 1}, 4001})

{:ok, _} = HyParView.start_link(
  peer: contact,
  transport: HyParView.Transport.TCP
)

{:ok, joiner_pid} = HyParView.start_link(
  peer: joiner,
  contacts: [contact],
  transport: HyParView.Transport.TCP
)

HyParView.subscribe(joiner_pid)
receive do
  {:hyparview, {:peer_up, peer}} -> IO.puts("up: #{peer.id}")
end

Power-user mode: pure protocol core

For applications that want to drive the protocol from their own event loop (no GenServer, no transport), HyParView.State is a pure functional state machine:

state = HyParView.State.new(local_peer, HyParView.Config.new())
{state, actions} = HyParView.State.handle_message(state, message)
{state, actions} = HyParView.State.tick_shuffle(state)
{state, actions} = HyParView.State.connection_lost(state, peer)

# `actions` is a list; each element is one of:
#   {:notify_up, peer}
#   {:notify_down, peer}
#   {:send, peer, message}

This is the canonical core; HyParView.Server is a thin wrapper around it with timers, transport, and subscriber notifications.
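As a sketch of how such a driver might look: the message and action shapes below follow the list above, while the process mailbox messages (`:peer_message`, `:shuffle_tick`, `:connection_lost`) and the `deliver/2` helper are hypothetical stand-ins for whatever event source and I/O layer you own.

```elixir
# Hand-rolled event loop around the pure core. Action shapes
# ({:send, _, _}, {:notify_up, _}, {:notify_down, _}) come from the list
# above; the incoming message tags and deliver/2 are assumptions.
defmodule MyApp.Driver do
  def loop(state) do
    receive do
      {:peer_message, message} ->
        {state, actions} = HyParView.State.handle_message(state, message)
        state |> run(actions) |> loop()

      :shuffle_tick ->
        {state, actions} = HyParView.State.tick_shuffle(state)
        state |> run(actions) |> loop()

      {:connection_lost, peer} ->
        {state, actions} = HyParView.State.connection_lost(state, peer)
        state |> run(actions) |> loop()
    end
  end

  defp run(state, actions) do
    Enum.each(actions, fn
      {:send, peer, message} -> deliver(peer, message)
      {:notify_up, peer} -> IO.puts("up: #{peer.id}")
      {:notify_down, peer} -> IO.puts("down: #{peer.id}")
    end)

    state
  end

  # Hypothetical transport hook: replace with your own I/O.
  defp deliver(_peer, _message), do: :ok
end
```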

Comparison

                    libcluster                 Horde                             HyParView
Concern             Discovery + Node.connect   Distributed registry/supervisor   Membership
Topology            Full mesh                  Full mesh (uses libcluster)       Partial mesh (active view)
Scale ceiling       ~50–100 nodes              ~50–100 nodes                     Hundreds to thousands
Failure detection   BEAM net_kernel            Inherited                         TCP, fast
Use this when       You want a small cluster   You want a registry/supervisor    You want membership at scale

A future companion package, libcluster_hyparview, will combine HyParView for membership with Node.connect calls scoped to the active view — giving Elixir users partial-mesh BEAM distribution.
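Until that package exists, the idea can be approximated with the subscription API shown in the quick start. The GenServer below is illustrative, not part of this library, and it assumes peer ids are node names; the `{:hyparview, {:peer_down, peer}}` message shape mirrors the documented `:peer_up` one.

```elixir
# Illustrative sketch: mirror the HyParView active view into BEAM
# distribution by connecting on :peer_up and disconnecting on :peer_down.
# Assumes peer.id is a stringified node name.
defmodule MyApp.ActiveViewConnector do
  use GenServer

  def start_link(hyparview_pid),
    do: GenServer.start_link(__MODULE__, hyparview_pid)

  @impl true
  def init(hyparview_pid) do
    # Subscribe this process to view-change notifications.
    HyParView.subscribe(hyparview_pid)
    {:ok, nil}
  end

  @impl true
  def handle_info({:hyparview, {:peer_up, peer}}, state) do
    Node.connect(String.to_atom(peer.id))
    {:noreply, state}
  end

  def handle_info({:hyparview, {:peer_down, peer}}, state) do
    Node.disconnect(String.to_atom(peer.id))
    {:noreply, state}
  end
end
```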

Architecture

                   ┌──────────────────────────────┐
                   │  HyParView.State (pure)      │
                   │  views, transitions, actions │
                   └──────────────┬───────────────┘
                                  │ %{peer, message}
                                  │ %{action, ...}
                   ┌──────────────┴───────────────┐
                   │  HyParView.Server (GenServer)│
                   │  timers, subscribers,        │
                   │  telemetry, transport plumb. │
                   └──────────────┬───────────────┘
                                  │
                   ┌──────────────┴───────────────┐
                   │  HyParView.Transport         │
                   │  (behaviour)                 │
                   └────┬────────────────┬────────┘
                        │                │
              ┌─────────┴───┐   ┌────────┴──────────┐
              │ Transport.  │   │ Transport.TCP     │
              │ Test        │   │ + Connection      │
              │ (in-proc)   │   │ (gen_statem/peer) │
              └─────────────┘   └───────────────────┘

Configuration

HyParView.start_link(
  peer: %Peer{...},
  transport: HyParView.Transport.TCP,
  contacts: [%Peer{...}, ...],
  config: [
    active_view_size: 5,           # paper default
    passive_view_size: 30,         # paper default
    arwl: 6,                       # active random-walk length
    prwl: 3,                       # passive random-walk length
    shuffle_active_count: 3,       # ka — active samples per shuffle
    shuffle_passive_count: 4,      # kp — passive samples per shuffle
    shuffle_interval: 30_000,      # ms between shuffle ticks
    shuffle_ttl: 6                 # walk depth for SHUFFLE
  ]
)

Telemetry

All view changes emit events under the configured prefix (default [:hyparview]):

require Logger

:telemetry.attach_many(
  "hyparview-handler",
  Enum.map(HyParView.Telemetry.event_paths(), &([:hyparview | &1])),
  fn event, _measurements, metadata, _config ->
    Logger.info("#{inspect(event)}: #{inspect(metadata)}")
  end,
  nil
)

See HyParView.Telemetry for the full event catalog.

Custom transports

The HyParView.Transport behaviour has three callbacks. A minimal custom transport (e.g., over Erlang distribution) is around 30 lines. See HyParView.Transport.Test for the reference implementation.
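As a rough illustration only: the callback names below (`connect/1`, `send/2`, `close/1`) are guesses at what a three-callback transport behaviour might look like, not the behaviour's real signatures. Consult HyParView.Transport before writing one.

```elixir
# HYPOTHETICAL sketch of a transport over Erlang distribution. The three
# callback names and the {HyParView.Server, node} send target are ASSUMED;
# check the HyParView.Transport behaviour for the real contract.
defmodule MyApp.DistTransport do
  # @behaviour HyParView.Transport  # enable once signatures are confirmed

  # Assumes the peer's address is an Erlang node name; "connecting" is just
  # ensuring BEAM distribution is up to that node.
  def connect(%HyParView.Peer{address: node_name}) do
    if Node.connect(node_name), do: {:ok, node_name}, else: {:error, :unreachable}
  end

  # Deliver a protocol message to the (assumed) server process on the
  # remote node.
  def send(node_name, message) do
    Kernel.send({HyParView.Server, node_name}, message)
    :ok
  end

  # Nothing to tear down per-peer; distribution links are shared.
  def close(_node_name), do: :ok
end
```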

What this library is, and isn't

Question                             Answer
Membership?                          Yes — JOIN, FORWARD_JOIN, NEIGHBOR, DISCONNECT, SHUFFLE.
Broadcast (Plumtree)?                No. A separate library — membership and broadcast are different concerns.
Distributed registry / supervisor?   No. That's Horde.
Replacement for libcluster?          Adjacent. See Comparison above.
TLS / auth?                          Out of scope. Wrap the transport behaviour with your own.

Contributing

See CONTRIBUTING.md. All commits require DCO sign-off.

License

Apache 2.0. See LICENSE.

Citation

João Leitão, José Pereira, Luís Rodrigues. HyParView: a membership protocol for reliable gossip-based broadcast. In Proc. of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '07), Edinburgh, UK, June 2007.