# Hedged
126 tests, zero warnings, Dialyzer + Credo strict clean.
Hedged requests for Elixir — fire a backup request after a delay, take whichever finishes first, cancel the rest. A tail-latency optimization with adaptive delay tuning.
## Why hedged requests?
A single slow backend call can dominate your p99. Retries help with failures, but they don't help when the server is just slow. Hedged requests solve this by racing a backup against the original — your latency becomes the minimum of two attempts, not the maximum.
Google's "Tail at Scale" paper showed that issuing a redundant request after a brief delay can reduce p99 latency by up to 50% while adding only ~5% extra load.
No existing Elixir library does this. Go has cristalhq/hedgedhttp (HTTP-only) and gRPC has built-in hedging. We built a generic, composable version for any Elixir function.
## Hedging vs retries
| | Retries | Hedging |
|---|---|---|
| Trigger | Failure | Timeout (slow response) |
| Concurrent requests | No (sequential) | Yes (parallel) |
| Solves | Transient errors | Tail latency |
| Extra load | Only on failure | ~5-10% steady state |
| Latency impact | Additive (delay + retry) | Subtractive (min of two) |
They're complementary — use both. See Composing with BackoffRetry below.
## Design goals
- Generic — works with any `fn -> result end`, not just HTTP. Database queries, RPC calls, DNS lookups, file reads — anything
- Adaptive delay — auto-tunes from observed latency percentiles so you don't have to guess a static delay
- Token bucket — prevents hedge storms under sustained load (~10% hedge rate at defaults)
- Non-fatal fast-forward — transient errors like `:timeout` or `:econnrefused` immediately fire the next hedge without waiting
- Race mode — `delay: 0` fires all requests simultaneously, takes the fastest
- Staggered dispatch — fire up to N requests with configurable delays between them
- Automatic cancellation — losers are shut down immediately, no wasted work
- Composable — stateless mode for simple cases, supervised tracker for production
- Testable — injectable `now_fn` for deterministic, instant test suites
- Observable — built-in stats: total requests, hedge rate, hedge win rate, p50/p95/p99, current delay, token level
- Zero runtime deps — just Elixir/OTP
- Supervision-ready — `child_spec/1` and `start_link/1` for your supervision tree
## Installation
```elixir
def deps do
  [{:hedged, "~> 0.1.0"}]
end
```

## Quick start
### Stateless — fixed delay
```elixir
# Fire a backup after 100ms if the first hasn't responded
{:ok, body} = Hedged.run(fn -> fetch(url) end)

# With options
{:ok, body} = Hedged.run(fn -> fetch(url) end,
  delay: 50,
  max_requests: 3,
  timeout: 2_000
)
```

### Adaptive — delay auto-tunes from observed latency
```elixir
# Add to your supervision tree
children = [
  {Hedged, name: MyApp.Hedged, percentile: 95, min_delay: 5, max_delay: 500}
]

# Delay adapts automatically based on p95 latency
{:ok, body} = Hedged.run(MyApp.Hedged, fn -> fetch(url) end, [])

# Check how it's doing
Hedged.Tracker.stats(MyApp.Hedged)
# => %{total_requests: 1042, hedged_requests: 98, hedge_won: 31,
#      p50: 12, p95: 45, p99: 120, current_delay: 45, tokens: 8.2}
```

## Real-world examples
### HTTP with non-fatal errors
```elixir
Hedged.run(
  fn -> HTTPClient.get(url) end,
  delay: 50,
  max_requests: 3,
  non_fatal: fn
    :timeout -> true
    :econnrefused -> true
    _ -> false
  end,
  on_hedge: fn attempt ->
    Logger.info("Firing hedge ##{attempt}")
  end
)
```
When `non_fatal` returns `true`, the next hedge fires immediately instead of waiting for the delay — you don't waste time sleeping on errors you know are transient.
### Database query with tight deadline
```elixir
Hedged.run(fn -> Repo.query("SELECT ...") end,
  delay: 20,
  timeout: 1_000
)
```

### DNS resolution — race mode
Fire all at once, take the fastest:
```elixir
Hedged.run(fn -> dns_lookup(host) end,
  delay: 0,
  max_requests: 3
)
```

### Adaptive with multiple services
```elixir
# Each service gets its own tracker with tuned settings
children = [
  {Hedged, name: MyApp.PaymentHedge, percentile: 99, max_delay: 2_000},
  {Hedged, name: MyApp.SearchHedge, percentile: 90, max_delay: 200},
  {Hedged, name: MyApp.CacheHedge, percentile: 95, min_delay: 1, max_delay: 50}
]

# In your code
{:ok, result} = Hedged.run(MyApp.PaymentHedge, fn -> charge(card) end, [])
{:ok, results} = Hedged.run(MyApp.SearchHedge, fn -> search(query) end, [])
```

## Composing with BackoffRetry
Hedge the outer call, retry the inner:
```elixir
Hedged.run(fn ->
  BackoffRetry.retry(fn -> flaky_api_call() end,
    max_attempts: 2,
    backoff: :constant,
    base_delay: 50
  )
end, delay: 200)
```

## How adaptive delay works
The adaptive tracker solves the "what delay should I pick?" problem by learning from your traffic:
Request lifecycle:

```
1. get_config()  →  {delay_ms, allow_hedge?}    # Tracker computes from recent latencies
2. Runner fires request #1, waits delay_ms
3. If no response, fires hedge #2
4. Winner returns, losers cancelled
5. record(%{latency_ms, hedged?, hedge_won?})   # Tracker learns from this request
6. Next request → delay has shifted
```

**Percentile-based delay:** A circular buffer (default 1000 samples) tracks recent latencies. The delay is set to the target percentile (e.g., p95) of that buffer, clamped to `[min_delay, max_delay]`. Old samples are evicted FIFO, so the delay naturally tracks recent conditions.
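As a rough sketch of that computation (illustrative names, not Hedged's internal `Percentile` module), the adaptive delay reduces to a percentile lookup plus clamping:

```elixir
defmodule DelaySketch do
  # Illustrative only; the library keeps a circular buffer, while this
  # sketch just sorts the sample list on every call.
  def adaptive_delay(samples, percentile, min_delay, max_delay) when samples != [] do
    sorted = Enum.sort(samples)
    # Index of the target percentile rank within the sorted samples
    index = min(round(length(sorted) * percentile / 100), length(sorted) - 1)

    sorted
    |> Enum.at(index)
    |> max(min_delay)
    |> min(max_delay)
  end
end

# DelaySketch.adaptive_delay([12, 45, 8, 200, 30], 95, 1, 5_000)
# => 200
```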
```
Requests 1-9:  delay = 100ms (cold start, not enough samples)
Request 50:    p95 = 22ms  → delay 22ms  (learned from traffic)
[service degrades]
Request 200:   p95 = 180ms → delay 180ms (adapted to new conditions)
[service recovers]
Request 400:   p95 = 25ms  → delay 25ms  (old slow samples evicted)
```

**Token bucket:** Prevents hedge storms. Each request earns a small credit (default 0.1 tokens). Each hedge costs more (default 1.0 token). When tokens drop below threshold, hedging is disabled entirely. At defaults this naturally limits the hedge rate to ~10% under steady state. After a burst of hedging depletes tokens, normal traffic replenishes them.
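A minimal sketch of that accounting (illustrative names, not the library's internals):

```elixir
defmodule BucketSketch do
  # Requests trickle tokens in, hedges drain them, and hedging pauses
  # whenever the level falls below the threshold.
  defstruct level: 10.0, max: 10.0, credit: 0.1, cost: 1.0, threshold: 1.0

  def allow_hedge?(%__MODULE__{} = b), do: b.level >= b.threshold

  # Called once per completed request
  def record_request(%__MODULE__{} = b),
    do: %{b | level: min(b.level + b.credit, b.max)}

  # Called once per hedge fired
  def record_hedge(%__MODULE__{} = b),
    do: %{b | level: max(b.level - b.cost, 0.0)}
end
```

At the default credit of 0.1 and cost of 1.0, the bucket breaks even only when at most one request in ten hedges, which is where the ~10% steady-state rate comes from.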
## Options
### Stateless (`run/2`)
| Option | Default | Description |
|---|---|---|
| `delay` | `100` | ms before firing the next hedge |
| `max_requests` | `2` | total concurrent attempts |
| `timeout` | `5_000` | overall deadline in ms |
| `non_fatal` | `fn _ -> false end` | predicate: `true` fires next hedge immediately |
| `on_hedge` | `nil` | `fn attempt -> any` callback before each hedge |
| `now_fn` | `System.monotonic_time/1` | injectable clock for testing |
### Adaptive tracker (`start_link/1`)
| Option | Default | Description |
|---|---|---|
| `name` | required | registered name |
| `percentile` | `95` | target percentile for adaptive delay |
| `buffer_size` | `1000` | max latency samples to keep |
| `min_delay` | `1` | floor for adaptive delay (ms) |
| `max_delay` | `5_000` | ceiling for adaptive delay (ms) |
| `initial_delay` | `100` | delay used before enough samples collected |
| `min_samples` | `10` | samples needed before adapting |
| `token_max` | `10` | token bucket capacity |
| `token_success_credit` | `0.1` | tokens earned per request |
| `token_hedge_cost` | `1.0` | tokens spent per hedge |
| `token_threshold` | `1.0` | min tokens to allow hedging |
## Tuning the token bucket
The defaults give ~10% hedge rate. To adjust:
| Desired behavior | Configuration |
|---|---|
| More aggressive hedging (~20%) | `token_success_credit: 0.2` |
| Conservative hedging (~5%) | `token_success_credit: 0.05` |
| Always allow hedging | `token_threshold: 0` |
| Disable hedging temporarily | `token_max: 0` |
| Larger hedge budget bursts | `token_max: 20` |
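For example, a tracker tuned for conservative hedging with a larger burst budget (the name is hypothetical):

```elixir
children = [
  # ~5% steady-state hedge rate, with room for bigger bursts
  {Hedged,
   name: MyApp.ConservativeHedge,
   token_success_credit: 0.05,
   token_max: 20}
]
```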
## When not to hedge
Hedging adds ~5% extra load at defaults. Don't use it when:
- Non-idempotent operations — double-charging a credit card is bad. Only hedge reads or idempotent writes
- Resource-constrained backends — if your DB is at capacity, extra queries make things worse
- Already fast — if p99 is already acceptable, hedging adds complexity for no gain
- Single backend instance — hedging helps when slowness is per-request (GC pauses, network jitter), not when the entire service is slow
## Algorithm
- Fire request #1 immediately
- Wait `delay` ms
- If a response arrived — return it (nothing else is pending to cancel)
- Fire request #2 (the hedge)
- Wait for any response: first success wins, losers cancelled
- If a failure is `non_fatal` — fire the next hedge immediately (fast-forward)
- If all attempts fail — return `{:error, last_reason}`
- If the overall `timeout` is hit — cancel everything, return `{:error, :timeout}`
Key behaviors:

- If a request fails before the delay expires, the next hedge fires when the delay would have normally triggered (or immediately if `non_fatal`)
- If all pending requests have failed but `max_requests` isn't reached, the next hedge fires immediately — no point waiting
- Raises, exits, and throws in tasks are captured via `:DOWN` messages — they don't crash the caller
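To make the shape of that loop concrete, here is a deliberately simplified sketch (ours, not Hedged's actual Runner). It covers only the two-attempt path and, unlike the library, uses linked tasks, so a raise inside the function would crash the caller:

```elixir
defmodule HedgeSketch do
  # Simplified two-attempt hedged call. Omits staggered dispatch,
  # non-fatal fast-forward, and crash isolation.
  def run(fun, delay, timeout \\ 5_000) do
    first = Task.async(fun)

    case Task.yield(first, delay) do
      {:ok, result} ->
        # First attempt answered within the hedge delay; no hedge needed.
        result

      nil ->
        # Slow: fire the hedge and race both attempts.
        hedge = Task.async(fun)
        await_first([first, hedge], timeout)
    end
  end

  defp await_first(tasks, timeout) do
    refs = Map.new(tasks, &{&1.ref, &1})

    receive do
      {ref, result} when is_map_key(refs, ref) ->
        # First reply wins; stop monitoring it and shut the losers down.
        Process.demonitor(ref, [:flush])
        for {r, task} <- refs, r != ref, do: Task.shutdown(task, :brutal_kill)
        result
    after
      timeout ->
        for {_r, task} <- refs, do: Task.shutdown(task, :brutal_kill)
        {:error, :timeout}
    end
  end
end
```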
## Error handling
Exceptions, exits, and throws inside hedged tasks are captured — they never crash the caller:
| Source | Wrapped as |
|---|---|
| `raise "boom"` | `{:error, %RuntimeError{}}` |
| `exit(:reason)` | `{:error, {:reason, stacktrace}}` |
| `throw(:value)` | `{:error, {{:nocatch, :value}, stacktrace}}` |
If multiple tasks are in flight and one crashes, the others keep running. You still get a result as long as any task succeeds.
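Assuming the exception struct is preserved as the table suggests, a function that always raises comes back as a value:

```elixir
# Both attempts raise; the caller gets the wrapped error, not a crash.
{:error, %RuntimeError{message: "boom"}} =
  Hedged.run(fn -> raise "boom" end, delay: 10)
```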
## Return values
| Scenario | Return |
|---|---|
| Function returns `{:ok, value}` | `{:ok, value}` |
| Bare value (e.g. `42`) | `{:ok, 42}` |
| `:ok` | `{:ok, :ok}` |
| `{:ok, {:error, _}}` | `{:ok, {:error, _}}` (inner value preserved) |
| All attempts return `{:error, r}` | `{:error, r}` (last error) |
| Overall timeout exceeded | `{:error, :timeout}` |
| All tasks raise / exit / throw | `{:error, reason}` (from `:DOWN`) |
| `:error` | `{:error, :error}` |
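A few of these normalizations as one-liners, straight from the table above:

```elixir
{:ok, 42} = Hedged.run(fn -> 42 end)          # bare value wrapped
{:ok, :ok} = Hedged.run(fn -> :ok end)        # bare :ok wrapped
{:error, :error} = Hedged.run(fn -> :error end)  # all attempts returned :error
```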
## Stats & observability
```elixir
Hedged.Tracker.stats(MyApp.Hedged)
```

Returns:
| Field | Description |
|---|---|
| `total_requests` | Total requests processed |
| `hedged_requests` | Requests that triggered at least one hedge |
| `hedge_won` | Times the hedge beat the original request |
| `p50` | Median observed latency (ms) |
| `p95` | 95th percentile latency (ms) |
| `p99` | 99th percentile latency (ms) |
| `current_delay` | Current adaptive delay being used (ms) |
| `tokens` | Current token bucket level |
## Architecture
```
lib/
  hedged.ex           # Public API: run/2 (stateless), run/3 (adaptive),
                      # start_link/1, child_spec/1
  hedged/
    runner.ex         # Core engine: staggered dispatch + receive loop
    tracker.ex        # GenServer: adaptive delay + token bucket + stats
    percentile.ex     # Circular buffer + percentile calculation
```

Three layers, each independently useful:
- Runner — pure hedging engine, no state, no GenServer
- Tracker — adaptive delay + throttling, plugs into Runner
- Percentile — data structure, no processes, usable standalone
## License
MIT