SuperCache

Introduction

High-performance in-memory caching library for Elixir backed by partitioned ETS tables with experimental distributed cluster support. SuperCache provides transparent local and distributed modes with configurable consistency guarantees, batch operations, and multiple data structures.

Features

Partitioned ETS Storage — Reduces contention by splitting data across multiple ETS tables
Multiple Data Structures — Tuples, key-value namespaces, queues, stacks, and struct storage
Distributed Clustering — Automatic node discovery, partition assignment, and replication
Configurable Consistency — Choose between async, sync (quorum), or strong (WAL) replication
Batch Operations — High-throughput bulk writes with put_batch!/1, add_batch/2, remove_batch/2
Performance Optimized — Compile-time log elimination, partition resolution inlining, worker pools, and early termination quorum reads
Health Monitoring — Built-in cluster health checks with telemetry integration

Architecture

SuperCache contains 34 modules organized into 7 layers:

┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                          │
│  SuperCache │ KeyValue │ Queue │ Stack │ Struct               │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                    Routing Layer                              │
│  Partition Router (local) │ Cluster Router (distributed)     │
│  Cluster.DistributedStore (shared helpers)                    │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                  Replication Layer                            │
│  Replicator (async/sync) │ WAL (strong) │ ThreePhaseCommit   │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                   Storage Layer                               │
│  Storage (ETS wrapper) │ EtsHolder (table lifecycle)         │
│  Partition (hashing) │ Partition.Holder (registry)           │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                Cluster Infrastructure                         │
│  Manager │ NodeMonitor │ HealthMonitor │ Metrics │ Stats     │
│  TxnRegistry │ Router                                        │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                Buffer System (lazy_put)                       │
│  Buffer (scheduler-affine) → Internal.Queue → Internal.Stream│
└─────────────────────────────────────────────────────────────┘

Module Overview

Layer	Modules	Responsibility
API	`SuperCache`, `KeyValue`, `Queue`, `Stack`, `Struct`	Public interfaces for all data structures
Routing	`Partition`, `Cluster.Router`, `Cluster.DistributedStore`	Hash-based partition routing and distributed request routing
Replication	`Cluster.Replicator`, `Cluster.WAL`, `Cluster.ThreePhaseCommit`	Async/sync/strong replication engines
Storage	`Storage`, `EtsHolder`, `Partition.Holder`	ETS table management and lifecycle
Cluster	`Cluster.Manager`, `Cluster.NodeMonitor`, `Cluster.HealthMonitor`	Membership, discovery, and health monitoring
Observability	`Cluster.Metrics`, `Cluster.Stats`, `Cluster.TxnRegistry`	Counters, latency tracking, and transaction logs
Buffer	`Buffer`, `Internal.Queue`, `Internal.Stream`	Scheduler-affine write buffers for `lazy_put/1`

Installation

Requirements: Erlang/OTP 25 or later, Elixir 1.15 or later.

Add super_cache to your dependencies in mix.exs:

def deps do
  [
    {:super_cache, "~> 1.2"}
  ]
end

Quick Start

Local Mode

# Start with defaults (num_partition = schedulers, key_pos = 0, partition_pos = 0)
SuperCache.start!()

# Or with custom config
opts = [key_pos: 0, partition_pos: 1, table_type: :bag, num_partition: 4]
SuperCache.start!(opts)

# Basic tuple operations
SuperCache.put!({:user, 1, "Alice"})
SuperCache.get!({:user, 1})
# => [{:user, 1, "Alice"}]

SuperCache.delete!({:user, 1})

Key-Value API

alias SuperCache.KeyValue

KeyValue.add("session", :user_1, %{name: "Alice"})
KeyValue.get("session", :user_1)
# => %{name: "Alice"}

# Batch operations (10-100x faster than individual calls)
KeyValue.add_batch("session", [
  {:user_2, %{name: "Bob"}},
  {:user_3, %{name: "Charlie"}}
])

KeyValue.remove_batch("session", [:user_1, :user_2])

Queue & Stack

alias SuperCache.{Queue, Stack}

# FIFO Queue
Queue.add("jobs", "process_order_1")
Queue.add("jobs", "process_order_2")
Queue.out("jobs")
# => "process_order_1"

Queue.peak("jobs")
# => "process_order_2"

# LIFO Stack
Stack.push("history", "page_a")
Stack.push("history", "page_b")
Stack.pop("history")
# => "page_b"

Struct Storage

alias SuperCache.Struct

defmodule User do
  defstruct [:id, :name, :email]
end

Struct.init(%User{}, :id)
Struct.add(%User{id: 1, name: "Alice", email: "alice@example.com"})
{:ok, user} = Struct.get(%User{id: 1})
# => {:ok, %User{id: 1, name: "Alice", email: "alice@example.com"}}

Complete API Reference

SuperCache (Main API)

Primary entry point for tuple storage with transparent local/distributed mode support.

Lifecycle

SuperCache.start!()
SuperCache.start!(opts)
SuperCache.start()
SuperCache.start(opts)
SuperCache.started?()
SuperCache.stop()

Write Operations

SuperCache.put!(data)
SuperCache.put(data)
SuperCache.lazy_put(data)
SuperCache.put_batch!(data_list)

Read Operations

SuperCache.get!(data, opts \\ [])
SuperCache.get(data, opts \\ [])
SuperCache.get_by_key_partition!(key, partition_data, opts \\ [])
SuperCache.get_same_key_partition!(key, opts \\ [])
SuperCache.get_by_match!(partition_data, pattern, opts \\ [])
SuperCache.get_by_match!(pattern)
SuperCache.get_by_match_object!(partition_data, pattern, opts \\ [])
SuperCache.get_by_match_object!(pattern)
SuperCache.scan!(partition_data, fun, acc)
SuperCache.scan!(fun, acc)

Delete Operations

SuperCache.delete!(data)
SuperCache.delete(data)
SuperCache.delete_all()
SuperCache.delete_by_match!(partition_data, pattern)
SuperCache.delete_by_match!(pattern)
SuperCache.delete_by_key_partition!(key, partition_data)
SuperCache.delete_same_key_partition!(key)

Partition-Specific Operations

SuperCache.put_partition!(data, partition)
SuperCache.get_partition!(key, partition)
SuperCache.delete_partition!(key, partition)
SuperCache.put_partition_by_idx!(data, partition_idx)
SuperCache.get_partition_by_idx!(key, partition_idx)
SuperCache.delete_partition_by_idx!(key, partition_idx)

Statistics & Mode

SuperCache.stats()
SuperCache.cluster_stats()
SuperCache.distributed?()

KeyValue

In-memory key-value namespaces backed by ETS partitions. Multiple independent namespaces coexist using different kv_name values.

KeyValue.add(kv_name, key, value)
KeyValue.get(kv_name, key, default \\ nil, opts \\ [])
KeyValue.remove(kv_name, key)
KeyValue.remove_all(kv_name)

KeyValue.keys(kv_name, opts \\ [])
KeyValue.values(kv_name, opts \\ [])
KeyValue.count(kv_name, opts \\ [])
KeyValue.to_list(kv_name, opts \\ [])

KeyValue.add_batch(kv_name, pairs)
KeyValue.remove_batch(kv_name, keys)

Queue

Named FIFO queues backed by ETS partitions.

Queue.add(queue_name, value)
Queue.out(queue_name, default \\ nil)
Queue.peak(queue_name, default \\ nil, opts \\ [])
Queue.count(queue_name, opts \\ [])
Queue.get_all(queue_name)

Stack

Named LIFO stacks backed by ETS partitions.

Stack.push(stack_name, value)
Stack.pop(stack_name, default \\ nil)
Stack.count(stack_name, opts \\ [])
Stack.get_all(stack_name)

Struct

In-memory struct store backed by ETS partitions. Call init/2 once per struct type before using.

Struct.init(struct, key \\ :id)
Struct.add(struct)
Struct.get(struct, opts \\ [])
Struct.get_all(struct, opts \\ [])
Struct.remove(struct)
Struct.remove_all(struct)

Distributed Mode

SuperCache supports distributing data across a cluster of Erlang nodes with configurable consistency guarantees.

Configuration

All nodes must share identical partition configuration:

# config/config.exs
config :super_cache,
  auto_start:         true,
  key_pos:            0,
  partition_pos:      0,
  cluster:            :distributed,
  replication_mode:   :async,      # :async | :sync | :strong
  replication_factor: 2,           # primary + 1 replica
  table_type:         :set,
  num_partition:      8            # Must match across ALL nodes

# config/runtime.exs
config :super_cache,
  cluster_peers: [
    :"node1@10.0.0.1",
    :"node2@10.0.0.2",
    :"node3@10.0.0.3"
  ]

Replication Modes

Mode	Guarantee	Latency	Use Case
`:async`	Eventual consistency	~50-100µs	High-throughput caches, session data
`:sync`	Majority ack (adaptive quorum)	~100-300µs	Balanced durability/performance
`:strong`	WAL-based strong consistency	~200µs	Critical data requiring durability

Async Mode: Fire-and-forget replication via Task.Supervisor worker pool. Returns immediately after local write.

Sync Mode: Adaptive quorum writes — returns :ok once a strict majority of replicas acknowledge, avoiding waits for slow stragglers.

Strong Mode: Write-Ahead Log (WAL) replaces heavy 3PC. Writes locally first, then async replicates with majority acknowledgment. ~7x faster than traditional 3PC.

Read Modes (Distributed)

# Local read (fastest, may be stale)
SuperCache.get!({:user, 1})

# Primary read (consistent with primary node)
SuperCache.get!({:user, 1}, read_mode: :primary)

# Quorum read (majority agreement, early termination)
SuperCache.get!({:user, 1}, read_mode: :quorum)

Quorum reads use early termination — returns as soon as a strict majority agrees, avoiding waits for slow replicas.

Manual Bootstrap

SuperCache.Cluster.Bootstrap.start!(
  key_pos: 0,
  partition_pos: 0,
  cluster: :distributed,
  replication_mode: :strong,
  replication_factor: 2,
  num_partition: 8
)

Performance

Benchmarks (Local Mode, 4 partitions)

Operation	Throughput	Notes
`put!`	~1.2M ops/sec	~33% overhead vs raw ETS
`get!`	~2.1M ops/sec	Near raw ETS speed
`KeyValue.add_batch` (10k)	~1.1M ops/sec	Single ETS insert

Distributed Latency

Operation	Async	Sync (Quorum)	Strong (WAL)
Write	~50-100µs	~100-300µs	~200µs
Read (local)	~10µs	~10µs	~10µs
Read (quorum)	~100-200µs	~100-200µs	~100-200µs

Performance Optimizations

Compile-time log elimination — Debug macros expand to :ok when disabled (zero overhead)
Partition resolution inlining — Single function call with @compile {:inline}
Batch ETS operations — :ets.insert/2 with lists instead of per-item calls
Async replication worker pool — Task.Supervisor eliminates per-operation spawn/1 overhead
Adaptive quorum writes — Returns on majority ack, not all replicas
Quorum read early termination — Stops waiting once majority is reached
WAL-based strong consistency — Replaces 3PC with fast local write + async replication + majority ack
Persistent-term config — Hot-path config keys served from :persistent_term for O(1) access
Scheduler-affine buffers — lazy_put/1 routes to buffer on same scheduler
Protected ETS tables — Partition.Holder uses :protected ETS for lock-free reads

WAL Configuration

config :super_cache, :wal,
  majority_timeout: 2_000,  # ms to wait for majority ack
  cleanup_interval: 5_000,  # ms between WAL cleanup cycles
  max_pending: 10_000       # max uncommitted entries

Examples

The examples/ directory contains runnable examples:

examples/local_mode_example.exs — Complete local mode demonstration covering all APIs (tuple storage, KeyValue, Queue, Stack, Struct, batch operations)
examples/distributed_mode_example.exs — Distributed mode demonstration with cluster configuration, replication modes, read modes, and health monitoring

Run examples with:

mix run examples/local_mode_example.exs
mix run examples/distributed_mode_example.exs

Configuration Options

Option	Type	Default	Description
`key_pos`	integer	`0`	Tuple index for ETS key lookup
`partition_pos`	integer	`0`	Tuple index for partition hashing
`num_partition`	integer	schedulers	Number of ETS partitions
`table_type`	atom	`:set`	ETS table type (`:set`, `:bag`, `:ordered_set`, `:duplicate_bag`)
`table_prefix`	string	`"SuperCache.Storage.Ets"`	Prefix for ETS table atom names
`cluster`	atom	`:local`	`:local` or `:distributed`
`replication_mode`	atom	`:async`	`:async`, `:sync`, or `:strong`
`replication_factor`	integer	`2`	Total copies (primary + replicas)
`cluster_peers`	list	`[]`	List of peer node atoms
`auto_start`	boolean	`false`	Auto-start on application boot
`debug_log`	boolean	`false`	Enable debug logging (compile-time)

Health Monitoring

SuperCache includes a built-in health monitor that continuously tracks:

Node connectivity — RTT measurement via :erpc
Replication lag — Probe-based delay measurement
Partition balance — Size variance across nodes
Operation success rates — Failed vs total operations

Access health data:

SuperCache.Cluster.HealthMonitor.cluster_health()
SuperCache.Cluster.HealthMonitor.node_health(node)
SuperCache.Cluster.HealthMonitor.replication_lag(partition_idx)
SuperCache.Cluster.HealthMonitor.partition_balance()
SuperCache.Cluster.HealthMonitor.force_check()

Health data is also emitted via :telemetry events:

[:super_cache, :health, :check] — Periodic health check results
[:super_cache, :health, :alert] — Threshold violations

Debug Logging

Enable at compile time (zero overhead in production):

# config/config.exs
config :super_cache, debug_log: true

Or toggle at runtime:

SuperCache.Log.enable(true)
SuperCache.Log.enable(false)

Troubleshooting

Common Issues

"tuple size is lower than key_pos" — Ensure your tuples have enough elements for the configured key_pos.

"Partition count mismatch" — All nodes in a cluster must have the same num_partition value.

"Replication lag increasing" — Check network connectivity between nodes. Use HealthMonitor.cluster_health() to diagnose.

"Quorum reads timing out" — Ensure majority of nodes are reachable, check :erpc connectivity.

Performance Tips

Use put_batch!/1 for bulk inserts (10-100x faster)
Use KeyValue.add_batch/2 for key-value bulk operations
Prefer :async replication mode for high-throughput caches
Use read_mode: :local when eventual consistency is acceptable
Enable compile-time debug_log: false for production (default)
Monitor health metrics and wire telemetry to Prometheus/Datadog

Guides

Usage Guide — Complete API reference and usage examples
Distributed Guide — Detailed distributed mode documentation
Developer Guide — Development, testing, benchmarking, and contribution guide

Testing

# All tests (includes cluster tests)
mix test

# Unit tests only — no distribution needed
mix test --exclude cluster

# Cluster tests only
mix test.cluster

# Specific test file
mix test test/kv_test.exs

# With warnings as errors
mix test --warnings-as-errors

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Code Style

Run formatter: mix format
Check for warnings: mix compile --warnings-as-errors
Run tests: mix test --exclude cluster

License

MIT License. See LICENSE for details.

Changelog

v1.2.1

Unified API — Local and distributed modes now use the same modules (no separate Distributed.* namespaces)
Health Monitor — Added cluster_health/0, node_health/1, replication_lag/1, partition_balance/0
Read-Your-Writes — Router tracks recent writes and forces :primary reads for consistency
NodeMonitor — Supports static :nodes, dynamic :nodes_mfa, and legacy all-node watching
Buffer System — Scheduler-affine write buffers for lazy_put/1 with Internal.Queue and Internal.Stream
Examples — Added examples/local_mode_example.exs and examples/distributed_mode_example.exs
Documentation — Complete module reference with all 34 modules documented

v1.1.0

WAL-based strong consistency — Replaces 3PC with ~7x faster writes (~200µs vs ~1500µs)
Adaptive quorum writes — Sync mode returns on majority ack, not all replicas
Replication worker pool — Eliminates per-operation spawn/1 overhead
Batch API optimizations — add_batch/2 uses single ETS insert
Quorum read early termination — Stops waiting once majority is reached
Compile-time log elimination — Zero overhead when debug disabled
Partition resolution inlining — Faster hot-path lookups

v1.0.0

Initial release with ETS-backed caching
Distributed mode with 3PC consistency
Queue, Stack, KeyValue, and Struct APIs