# Hedged
126 tests, zero warnings, Dialyzer + Credo strict clean.
Hedged requests for Elixir — fire a backup request after a delay, take whichever finishes first, cancel the rest. A tail-latency optimization with adaptive delay tuning.
## Why hedged requests?
A single slow backend call can dominate your p99. Retries help with failures, but they don't help when the server is just slow. Hedged requests solve this by racing a backup against the original — your latency becomes the minimum of two attempts, not the maximum.
Google's "Tail at Scale" paper showed that issuing a redundant request after a brief delay can reduce p99 latency by up to 50% while adding only ~5% extra load.
No existing Elixir library does this. Go has cristalhq/hedgedhttp (HTTP-only) and gRPC has built-in hedging. We built a generic, composable version for any Elixir function.
## Hedging vs retries
| | Retries | Hedging |
|---|---|---|
| Trigger | Failure | Timeout (slow response) |
| Concurrent requests | No (sequential) | Yes (parallel) |
| Solves | Transient errors | Tail latency |
| Extra load | Only on failure | ~5-10% steady state |
| Latency impact | Additive (delay + retry) | Subtractive (min of two) |
They're complementary — use both. See Composing with BackoffRetry below.
## Design goals
- Generic — works with any `fn -> result end`, not just HTTP. Database queries, RPC calls, DNS lookups, file reads — anything
- Adaptive delay — auto-tunes from observed latency percentiles so you don't have to guess a static delay
- Token bucket — prevents hedge storms under sustained load (~10% hedge rate at defaults)
- Non-fatal fast-forward — transient errors like `:timeout` or `:econnrefused` immediately fire the next hedge without waiting
- Race mode — `delay: 0` fires all requests simultaneously, takes the fastest
- Staggered dispatch — fire up to N requests with configurable delays between them
- Automatic cancellation — losers are shut down immediately, no wasted work
- Composable — stateless mode for simple cases, supervised tracker for production
- Testable — injectable `now_fn` for deterministic, instant test suites
- Observable — built-in stats: total requests, hedge rate, hedge win rate, p50/p95/p99, current delay, token level
- Zero runtime deps — just Elixir/OTP
- Supervision-ready — `child_spec/1` and `start_link/1` for your supervision tree
## Installation
```elixir
def deps do
  [{:hedged, "~> 0.1.0"}]
end
```

## Quick start
### Stateless — fixed delay
```elixir
# Fire a backup after 100ms if the first hasn't responded
{:ok, body} = Hedged.run(fn -> fetch(url) end)

# With options
{:ok, body} = Hedged.run(fn -> fetch(url) end,
  delay: 50,
  max_requests: 3,
  timeout: 2_000
)
```

### Adaptive — delay auto-tunes from observed latency
```elixir
# Add to your supervision tree
children = [
  {Hedged, name: MyApp.Hedged, percentile: 95, min_delay: 5, max_delay: 500}
]

# Delay adapts automatically based on p95 latency
{:ok, body} = Hedged.run(MyApp.Hedged, fn -> fetch(url) end, [])

# Check how it's doing
Hedged.Tracker.stats(MyApp.Hedged)
# => %{total_requests: 1042, hedged_requests: 98, hedge_won: 31,
#      p50: 12, p95: 45, p99: 120, current_delay: 45, tokens: 8.2}
```

## Real-world examples
### HTTP with non-fatal errors
```elixir
Hedged.run(
  fn -> HTTPClient.get(url) end,
  delay: 50,
  max_requests: 3,
  non_fatal: fn
    :timeout -> true
    :econnrefused -> true
    _ -> false
  end,
  on_hedge: fn attempt ->
    Logger.info("Firing hedge ##{attempt}")
  end
)
```
When `non_fatal` returns `true`, the next hedge fires immediately instead of waiting for the delay — you don't waste time sleeping on errors you know are transient.
### Database query with tight deadline
```elixir
Hedged.run(fn -> Repo.query("SELECT ...") end,
  delay: 20,
  timeout: 1_000
)
```

### DNS resolution — race mode
Fire all at once, take the fastest:
```elixir
Hedged.run(fn -> dns_lookup(host) end,
  delay: 0,
  max_requests: 3
)
```

### Adaptive with multiple services
```elixir
# Each service gets its own tracker with tuned settings
children = [
  {Hedged, name: MyApp.PaymentHedge, percentile: 99, max_delay: 2_000},
  {Hedged, name: MyApp.SearchHedge, percentile: 90, max_delay: 200},
  {Hedged, name: MyApp.CacheHedge, percentile: 95, min_delay: 1, max_delay: 50}
]

# In your code
{:ok, result} = Hedged.run(MyApp.PaymentHedge, fn -> charge(card) end, [])
{:ok, results} = Hedged.run(MyApp.SearchHedge, fn -> search(query) end, [])
```

## Composing with BackoffRetry
Hedge the outer call, retry the inner:
```elixir
Hedged.run(fn ->
  BackoffRetry.retry(fn -> flaky_api_call() end,
    max_attempts: 2,
    backoff: :constant,
    base_delay: 50
  )
end, delay: 200)
```

## How adaptive delay works
The adaptive tracker solves the "what delay should I pick?" problem by learning from your traffic:
Request lifecycle:

```
1. get_config()  →  {delay_ms, allow_hedge?}    # Tracker computes from recent latencies
2. Runner fires request #1, waits delay_ms
3. If no response, fires hedge #2
4. Winner returns, losers cancelled
5. record(%{latency_ms, hedged?, hedge_won?})   # Tracker learns from this request
6. Next request → delay has shifted
```

**Percentile-based delay:** A circular buffer (default 1000 samples) tracks recent latencies. The delay is set to the target percentile (e.g., p95) of that buffer, clamped to `[min_delay, max_delay]`. Old samples are evicted FIFO, so the delay naturally tracks recent conditions.
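As a rough sketch of that computation (illustrative names, not Hedged's internal `Percentile` module), the adaptive delay reduces to a percentile lookup plus clamping:

```elixir
defmodule DelaySketch do
  # Illustrative only; the library keeps a circular buffer, while this
  # sketch just sorts the sample list on every call.
  def adaptive_delay(samples, percentile, min_delay, max_delay) when samples != [] do
    sorted = Enum.sort(samples)
    # Index of the target percentile rank within the sorted samples
    index = min(round(length(sorted) * percentile / 100), length(sorted) - 1)

    sorted
    |> Enum.at(index)
    |> max(min_delay)
    |> min(max_delay)
  end
end

# DelaySketch.adaptive_delay([12, 45, 8, 200, 30], 95, 1, 5_000)
# => 200
```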
```
Requests 1-9:  delay = 100ms (cold start, not enough samples)
Request 50:    p95 = 22ms  → delay 22ms  (learned from traffic)
[service degrades]
Request 200:   p95 = 180ms → delay 180ms (adapted to new conditions)
[service recovers]
Request 400:   p95 = 25ms  → delay 25ms  (old slow samples evicted)
```

**Token bucket:** Prevents hedge storms. Each request earns a small credit (default 0.1 tokens). Each hedge costs more (default 1.0 token). When tokens drop below threshold, hedging is disabled entirely. At defaults this naturally limits the hedge rate to ~10% under steady state. After a burst of hedging depletes tokens, normal traffic replenishes them.
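A minimal sketch of that accounting (illustrative names, not the library's internals):

```elixir
defmodule BucketSketch do
  # Requests trickle tokens in, hedges drain them, and hedging pauses
  # whenever the level falls below the threshold.
  defstruct level: 10.0, max: 10.0, credit: 0.1, cost: 1.0, threshold: 1.0

  def allow_hedge?(%__MODULE__{} = b), do: b.level >= b.threshold

  # Called once per completed request
  def record_request(%__MODULE__{} = b),
    do: %{b | level: min(b.level + b.credit, b.max)}

  # Called once per hedge fired
  def record_hedge(%__MODULE__{} = b),
    do: %{b | level: max(b.level - b.cost, 0.0)}
end
```

At the default credit of 0.1 and cost of 1.0, the bucket breaks even only when at most one request in ten hedges, which is where the ~10% steady-state rate comes from.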
## Options
### Stateless (`run/2`)
| Option | Default | Description |
|---|---|---|
| `delay` | `100` | ms before firing the next hedge |
| `max_requests` | `2` | total concurrent attempts |
| `timeout` | `5_000` | overall deadline in ms |
| `non_fatal` | `fn _ -> false end` | predicate: `true` fires next hedge immediately |
| `on_hedge` | `nil` | `fn attempt -> any` callback before each hedge |
| `now_fn` | `System.monotonic_time/1` | injectable clock for testing |
### Adaptive tracker (`start_link/1`)
| Option | Default | Description |
|---|---|---|
| `name` | required | registered name |
| `percentile` | `95` | target percentile for adaptive delay |
| `buffer_size` | `1000` | max latency samples to keep |
| `min_delay` | `1` | floor for adaptive delay (ms) |
| `max_delay` | `5_000` | ceiling for adaptive delay (ms) |
| `initial_delay` | `100` | delay used before enough samples collected |
| `min_samples` | `10` | samples needed before adapting |
| `token_max` | `10` | token bucket capacity |
| `token_success_credit` | `0.1` | tokens earned per request |
| `token_hedge_cost` | `1.0` | tokens spent per hedge |
| `token_threshold` | `1.0` | min tokens to allow hedging |
## Tuning the token bucket
The defaults give ~10% hedge rate. To adjust:
| Desired behavior | Configuration |
|---|---|
| More aggressive hedging (~20%) | `token_success_credit: 0.2` |
| Conservative hedging (~5%) | `token_success_credit: 0.05` |
| Always allow hedging | `token_threshold: 0` |
| Disable hedging temporarily | `token_max: 0` |
| Larger hedge budget bursts | `token_max: 20` |
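For example, a tracker tuned for conservative hedging with a larger burst budget (the name is hypothetical):

```elixir
children = [
  # ~5% steady-state hedge rate, with room for bigger bursts
  {Hedged,
   name: MyApp.ConservativeHedge,
   token_success_credit: 0.05,
   token_max: 20}
]
```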
## When not to hedge
Hedging adds ~5% extra load at defaults. Don't use it when:
- Non-idempotent operations — double-charging a credit card is bad. Only hedge reads or idempotent writes
- Resource-constrained backends — if your DB is at capacity, extra queries make things worse
- Already fast — if p99 is already acceptable, hedging adds complexity for no gain
- Single backend instance — hedging helps when slowness is per-request (GC pauses, network jitter), not when the entire service is slow
## Algorithm
- Fire request #1 immediately
- Wait `delay` ms
- If a response arrived — return it (nothing else is pending to cancel)
- Fire request #2 (the hedge)
- Wait for any response: first success wins, losers cancelled
- If a failure is `non_fatal` — fire the next hedge immediately (fast-forward)
- If all attempts fail — return `{:error, last_reason}`
- If the overall `timeout` is hit — cancel everything, return `{:error, :timeout}`
Key behaviors:

- If a request fails before the delay expires, the next hedge fires when the delay would have normally triggered (or immediately if `non_fatal`)
- If all pending requests have failed but `max_requests` isn't reached, the next hedge fires immediately — no point waiting
- Raises, exits, and throws in tasks are captured via `:DOWN` messages — they don't crash the caller
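To make the shape of that loop concrete, here is a deliberately simplified sketch (ours, not Hedged's actual Runner). It covers only the two-attempt path and, unlike the library, uses linked tasks, so a raise inside the function would crash the caller:

```elixir
defmodule HedgeSketch do
  # Simplified two-attempt hedged call. Omits staggered dispatch,
  # non-fatal fast-forward, and crash isolation.
  def run(fun, delay, timeout \\ 5_000) do
    first = Task.async(fun)

    case Task.yield(first, delay) do
      {:ok, result} ->
        # First attempt answered within the hedge delay; no hedge needed.
        result

      nil ->
        # Slow: fire the hedge and race both attempts.
        hedge = Task.async(fun)
        await_first([first, hedge], timeout)
    end
  end

  defp await_first(tasks, timeout) do
    refs = Map.new(tasks, &{&1.ref, &1})

    receive do
      {ref, result} when is_map_key(refs, ref) ->
        # First reply wins; stop monitoring it and shut the losers down.
        Process.demonitor(ref, [:flush])
        for {r, task} <- refs, r != ref, do: Task.shutdown(task, :brutal_kill)
        result
    after
      timeout ->
        for {_r, task} <- refs, do: Task.shutdown(task, :brutal_kill)
        {:error, :timeout}
    end
  end
end
```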
## Error handling
Exceptions, exits, and throws inside hedged tasks are captured — they never crash the caller:
| Source | Wrapped as |
|---|---|
| `raise "boom"` | `{:error, %RuntimeError{}}` |
| `exit(:reason)` | `{:error, {:reason, stacktrace}}` |
| `throw(:value)` | `{:error, {{:nocatch, :value}, stacktrace}}` |
If multiple tasks are in flight and one crashes, the others keep running. You still get a result as long as any task succeeds.
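Assuming the exception struct is preserved as the table suggests, a function that always raises comes back as a value:

```elixir
# Both attempts raise; the caller gets the wrapped error, not a crash.
{:error, %RuntimeError{message: "boom"}} =
  Hedged.run(fn -> raise "boom" end, delay: 10)
```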
## Return values
| Scenario | Return |
|---|---|
| Function returns `{:ok, value}` | `{:ok, value}` |
| Bare value (e.g. `42`) | `{:ok, 42}` |
| `:ok` | `{:ok, :ok}` |
| `{:ok, {:error, _}}` | `{:ok, {:error, _}}` (inner value preserved) |
| All attempts return `{:error, r}` | `{:error, r}` (last error) |
| Overall timeout exceeded | `{:error, :timeout}` |
| All tasks raise / exit / throw | `{:error, reason}` (from `:DOWN`) |
| `:error` | `{:error, :error}` |
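A few of these normalizations as one-liners, straight from the table above:

```elixir
{:ok, 42} = Hedged.run(fn -> 42 end)          # bare value wrapped
{:ok, :ok} = Hedged.run(fn -> :ok end)        # bare :ok wrapped
{:error, :error} = Hedged.run(fn -> :error end)  # all attempts returned :error
```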
## Stats & observability
```elixir
Hedged.Tracker.stats(MyApp.Hedged)
```

Returns:
| Field | Description |
|---|---|
| `total_requests` | Total requests processed |
| `hedged_requests` | Requests that triggered at least one hedge |
| `hedge_won` | Times the hedge beat the original request |
| `p50` | Median observed latency (ms) |
| `p95` | 95th percentile latency (ms) |
| `p99` | 99th percentile latency (ms) |
| `current_delay` | Current adaptive delay being used (ms) |
| `tokens` | Current token bucket level |
## Architecture
```
lib/
  hedged.ex           # Public API: run/2 (stateless), run/3 (adaptive),
                      # start_link/1, child_spec/1
  hedged/
    runner.ex         # Core engine: staggered dispatch + receive loop
    tracker.ex        # GenServer: adaptive delay + token bucket + stats
    percentile.ex     # Circular buffer + percentile calculation
```

Three layers, each independently useful:
- Runner — pure hedging engine, no state, no GenServer
- Tracker — adaptive delay + throttling, plugs into Runner
- Percentile — data structure, no processes, usable standalone
## License
MIT