Hedged

126 tests, zero warnings, Dialyzer + Credo strict clean.

Hedged requests for Elixir — fire a backup request after a delay, take whichever finishes first, cancel the rest. A tail-latency optimization with adaptive delay tuning.

Why hedged requests?

A single slow backend call can dominate your p99. Retries help with failures, but they don't help when the server is just slow. Hedged requests solve this by racing a backup against the original — your latency becomes the minimum of two attempts, not the maximum.

Google's "Tail at Scale" paper showed that issuing a redundant request after a brief delay can reduce p99 latency by up to 50% while adding only ~5% extra load.

No existing Elixir library does this. Go has cristalhq/hedgedhttp (HTTP-only) and gRPC has built-in hedging. We built a generic, composable version for any Elixir function.

Hedging vs retries

                       Retries                      Hedging
Trigger                Failure                      Timeout (slow response)
Concurrent requests    No (sequential)              Yes (parallel)
Solves                 Transient errors             Tail latency
Extra load             Only on failure              ~5-10% steady state
Latency impact         Additive (delay + retry)     Subtractive (min of two)

They're complementary — use both. See Composing with BackoffRetry below.

Design goals

- Generic: hedge any Elixir function, not just HTTP calls.
- Composable: plays well with retries and other wrappers (see Composing with BackoffRetry).
- Adaptive: the hedge delay tunes itself from observed latency instead of being hand-picked.
- Bounded: a token bucket caps the extra load hedging can add.
- Observable and testable: per-tracker stats and an injectable clock for deterministic tests.

Installation

def deps do
  [{:hedged, "~> 0.1.0"}]
end

Quick start

Stateless — fixed delay

# Fire a backup after 100ms if the first hasn't responded
{:ok, body} = Hedged.run(fn -> fetch(url) end)

# With options
{:ok, body} = Hedged.run(fn -> fetch(url) end,
  delay: 50,
  max_requests: 3,
  timeout: 2_000
)

Adaptive — delay auto-tunes from observed latency

# Add to your supervision tree
children = [
  {Hedged, name: MyApp.Hedged, percentile: 95, min_delay: 5, max_delay: 500}
]

# Delay adapts automatically based on p95 latency
{:ok, body} = Hedged.run(MyApp.Hedged, fn -> fetch(url) end, [])

# Check how it's doing
Hedged.Tracker.stats(MyApp.Hedged)
# => %{total_requests: 1042, hedged_requests: 98, hedge_won: 31,
#       p50: 12, p95: 45, p99: 120, current_delay: 45, tokens: 8.2}

Real-world examples

HTTP with non-fatal errors

Hedged.run(
  fn -> HTTPClient.get(url) end,
  delay: 50,
  max_requests: 3,
  non_fatal: fn
    :timeout -> true
    :econnrefused -> true
    _ -> false
  end,
  on_hedge: fn attempt ->
    Logger.info("Firing hedge ##{attempt}")
  end
)

When non_fatal returns true, the next hedge fires immediately instead of waiting for the delay — you don't waste time sleeping on errors you know are transient.

Database query with tight deadline

Hedged.run(fn -> Repo.query("SELECT ...") end,
  delay: 20,
  timeout: 1_000
)

DNS resolution — race mode

Fire all at once, take the fastest:

Hedged.run(fn -> dns_lookup(host) end,
  delay: 0,
  max_requests: 3
)

Adaptive with multiple services

# Each service gets its own tracker with tuned settings
children = [
  {Hedged, name: MyApp.PaymentHedge, percentile: 99, max_delay: 2_000},
  {Hedged, name: MyApp.SearchHedge, percentile: 90, max_delay: 200},
  {Hedged, name: MyApp.CacheHedge, percentile: 95, min_delay: 1, max_delay: 50}
]

# In your code
{:ok, result} = Hedged.run(MyApp.PaymentHedge, fn -> charge(card) end, [])
{:ok, results} = Hedged.run(MyApp.SearchHedge, fn -> search(query) end, [])

Composing with BackoffRetry

Hedge the outer call, retry the inner:

Hedged.run(fn ->
  BackoffRetry.retry(fn -> flaky_api_call() end,
    max_attempts: 2,
    backoff: :constant,
    base_delay: 50
  )
end, delay: 200)

How adaptive delay works

The adaptive tracker solves the "what delay should I pick?" problem by learning from your traffic:

Request lifecycle:
1. get_config() → {delay_ms, allow_hedge?}   # Tracker computes from recent latencies
2. Runner fires request #1, waits delay_ms
3. If no response, fires hedge #2
4. Winner returns, losers cancelled
5. record(%{latency_ms, hedged?, hedge_won?}) # Tracker learns from this request
6. Next request → delay has shifted
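
That flow, sketched as code. This assumes Hedged.Tracker.get_config/1 and Hedged.Tracker.record/2 exist with the shapes shown above; the library's actual internals may differ.

# Sketch only: the glue run/3 needs around the stateless run/2.
def run_adaptive(tracker, fun, opts) do
  # 1. Ask the tracker for the current delay and whether hedging is allowed.
  {delay_ms, allow_hedge?} = Hedged.Tracker.get_config(tracker)
  max_requests = if allow_hedge?, do: Keyword.get(opts, :max_requests, 2), else: 1

  # 2-4. Run the staggered race with the tracker-chosen delay.
  started = System.monotonic_time(:millisecond)
  result = Hedged.run(fun, Keyword.merge(opts, delay: delay_ms, max_requests: max_requests))

  # 5. Feed the observation back so the next request's delay can shift.
  Hedged.Tracker.record(tracker, %{
    latency_ms: System.monotonic_time(:millisecond) - started,
    hedged?: allow_hedge?,   # the real runner reports whether a hedge actually fired
    hedge_won?: false        # ...and whether the hedge beat the original attempt
  })

  result
end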

Percentile-based delay: A circular buffer (default 1000 samples) tracks recent latencies. The delay is set to the target percentile (e.g., p95) of that buffer, clamped to [min_delay, max_delay]. Old samples are evicted FIFO, so the delay naturally tracks recent conditions.

Requests 1-9:   delay = 100ms          (cold start, not enough samples)
Request 50:     p95 = 22ms → delay 22ms (learned from traffic)
[service degrades]
Request 200:    p95 = 180ms → delay 180ms (adapted to new conditions)
[service recovers]
Request 400:    p95 = 25ms → delay 25ms  (old slow samples evicted)
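
A rough model of what that layer has to do (not the actual Hedged.Percentile module): keep a bounded window of samples, evict oldest-first, and read off a nearest-rank percentile.

# Rough model only: bounded sample window with FIFO eviction + percentile lookup.
defmodule PercentileSketch do
  defstruct samples: [], max: 1000

  # Newest sample goes to the front; anything past `max` falls off the end.
  def record(%__MODULE__{samples: samples, max: max} = buf, latency_ms) do
    %{buf | samples: Enum.take([latency_ms | samples], max)}
  end

  def percentile(%__MODULE__{samples: []}, _p), do: nil

  def percentile(%__MODULE__{samples: samples}, p) do
    sorted = Enum.sort(samples)
    index = min(length(sorted) - 1, trunc(length(sorted) * p / 100))
    Enum.at(sorted, index)
  end
end

# The tracker then clamps the chosen percentile to [min_delay, max_delay]
# and uses that as the hedge delay.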

Token bucket: Prevents hedge storms. Each request earns a small credit (default 0.1 tokens). Each hedge costs more (default 1.0 token). When tokens drop below threshold, hedging is disabled entirely. At defaults this naturally limits the hedge rate to ~10% under steady state. After a burst of hedging depletes tokens, normal traffic replenishes them.
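
A minimal sketch of that bookkeeping with the default numbers from the options table; illustrative only, the tracker's real state layout may differ.

# Sketch only: each request earns a little, each hedge spends a lot.
defmodule TokenBucketSketch do
  @max 10
  @success_credit 0.1
  @hedge_cost 1.0
  @threshold 1.0

  def after_request(tokens), do: min(@max, tokens + @success_credit)
  def after_hedge(tokens), do: max(0.0, tokens - @hedge_cost)
  def allow_hedge?(tokens), do: tokens >= @threshold
end

# Ten ordinary requests earn one token (10 * 0.1), which pays for one hedge
# (1.0): that is where the ~10% steady-state hedge rate comes from.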

Options

Stateless (run/2)

Option          Default                    Description
delay           100                        ms before firing the next hedge
max_requests    2                          total concurrent attempts
timeout         5_000                      overall deadline in ms
non_fatal       fn _ -> false end          predicate: true fires the next hedge immediately
on_hedge        nil                        fn attempt -> any callback before each hedge
now_fn          System.monotonic_time/1    injectable clock for testing

Adaptive tracker (start_link/1)

Option                  Default     Description
name                    required    registered name
percentile              95          target percentile for adaptive delay
buffer_size             1000        max latency samples to keep
min_delay               1           floor for adaptive delay (ms)
max_delay               5_000       ceiling for adaptive delay (ms)
initial_delay           100         delay used before enough samples collected
min_samples             10          samples needed before adapting
token_max               10          token bucket capacity
token_success_credit    0.1         tokens earned per request
token_hedge_cost        1.0         tokens spent per hedge
token_threshold         1.0         min tokens to allow hedging
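
For reference, a child spec that spells out every option in the table; the name MyApp.ReportsHedge and the particular values are only illustrative, not recommendations.

children = [
  {Hedged,
   name: MyApp.ReportsHedge,
   percentile: 99,
   buffer_size: 2_000,
   min_delay: 10,
   max_delay: 1_000,
   initial_delay: 200,
   min_samples: 20,
   token_max: 10,
   token_success_credit: 0.1,
   token_hedge_cost: 1.0,
   token_threshold: 1.0}
]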

Tuning the token bucket

The defaults give ~10% hedge rate. To adjust:

Desired behavior                  Configuration
More aggressive hedging (~20%)    token_success_credit: 0.2
Conservative hedging (~5%)        token_success_credit: 0.05
Always allow hedging              token_threshold: 0
Disable hedging temporarily       token_max: 0
Larger hedge budget bursts        token_max: 20
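
The rates in the table fall out of a simple budget argument: in the long run hedges can only spend what ordinary requests earn, so the hedge rate is roughly token_success_credit / token_hedge_cost.

# hedge_rate * token_hedge_cost ≈ token_success_credit  (long-run balance)
# hedge_rate ≈ token_success_credit / token_hedge_cost
0.1 / 1.0     # => 0.1   defaults: roughly 10% of requests may hedge
0.2 / 1.0     # => 0.2   the "more aggressive" row
0.05 / 1.0    # => 0.05  the "conservative" row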

When not to hedge

Hedging adds ~5% extra load at defaults. Don't use it when:

- The operation isn't idempotent, since a hedge can run the same work twice.
- The backend is already near capacity, where even a few percent of extra load makes tail latency worse, not better.

Algorithm

  1. Fire request #1 immediately
  2. Wait delay ms
  3. If a response has already arrived, return it; nothing else is in flight to cancel
  4. Fire request #2 (the hedge)
  5. Wait for any response: first success wins, losers cancelled
  6. If a failure is non_fatal — fire next hedge immediately (fast-forward)
  7. If all attempts fail — return {:error, last_reason}
  8. If overall timeout hit — cancel everything, return {:error, :timeout}
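
A stripped-down version of that loop using plain Task and receive. This is a sketch of the idea, not Hedged.Runner itself, and it skips non_fatal fast-forwarding, crash handling, and exact deadline bookkeeping.

# Sketch only: staggered dispatch where the first reply wins.
defmodule RunnerSketch do
  def run(fun, delay_ms, max_requests, timeout \\ 5_000) do
    loop(fun, delay_ms, max_requests - 1, timeout, [Task.async(fun)])
  end

  defp loop(fun, delay_ms, hedges_left, timeout, tasks) do
    # While hedges remain, wait only the hedge delay; afterwards wait out
    # the overall timeout.
    wait = if hedges_left > 0, do: delay_ms, else: timeout

    receive do
      # A finished Task.async sends {ref, result} back to the caller.
      {ref, result} when is_reference(ref) ->
        Enum.each(tasks, &Task.shutdown(&1, :brutal_kill))   # cancel the losers
        {:ok, result}
    after
      wait ->
        if hedges_left > 0 do
          loop(fun, delay_ms, hedges_left - 1, timeout, [Task.async(fun) | tasks])
        else
          Enum.each(tasks, &Task.shutdown(&1, :brutal_kill))
          {:error, :timeout}
        end
    end
  end
end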

Key behaviors: the first success wins and everything else in flight is cancelled; non_fatal errors skip the remaining delay instead of waiting it out; and a crash in one attempt never takes down the other attempts or the caller.

Error handling

Exceptions, exits, and throws inside hedged tasks are captured — they never crash the caller:

Source           Wrapped as
raise "boom"     {:error, %RuntimeError{}}
exit(:reason)    {:error, {:reason, stacktrace}}
throw(:value)    {:error, {{:nocatch, :value}, stacktrace}}

If multiple tasks are in flight and one crashes, the others keep running. You still get a result as long as any task succeeds.
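
For example, following the table above, a raise inside the hedged function comes back as an ordinary error tuple rather than crashing the caller:

# Every attempt raises, so the last wrapped error is returned.
{:error, %RuntimeError{}} = Hedged.run(fn -> raise "boom" end, delay: 10)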

Return values

Scenario                           Return
Function returns {:ok, value}      {:ok, value}
Bare value (e.g. 42)               {:ok, 42}
:ok                                {:ok, :ok}
{:ok, {:error, _}}                 {:ok, {:error, _}} (inner value preserved)
All attempts return {:error, r}    {:error, r} (last error)
Overall timeout exceeded           {:error, :timeout}
All tasks raise / exit / throw     {:error, reason} (from :DOWN)
:error                             {:error, :error}
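
Concretely, a few illustrative one-liners matching the table:

{:ok, 42} = Hedged.run(fn -> 42 end)                    # bare value gets wrapped
{:ok, :ok} = Hedged.run(fn -> :ok end)                  # :ok is treated as a value
{:ok, {:error, :nope}} = Hedged.run(fn -> {:ok, {:error, :nope}} end)   # inner tuple preserved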

Stats & observability

Hedged.Tracker.stats(MyApp.Hedged)

Returns:

Field              Description
total_requests     Total requests processed
hedged_requests    Requests that triggered at least one hedge
hedge_won          Times the hedge beat the original request
p50                Median observed latency (ms)
p95                95th percentile latency (ms)
p99                99th percentile latency (ms)
current_delay      Current adaptive delay being used (ms)
tokens             Current token bucket level

Architecture

lib/
  hedged.ex              # Public API: run/2 (stateless), run/3 (adaptive),
                         # start_link/1, child_spec/1
  hedged/
    runner.ex            # Core engine: staggered dispatch + receive loop
    tracker.ex           # GenServer: adaptive delay + token bucket + stats
    percentile.ex        # Circular buffer + percentile calculation

Three layers, each independently useful:

- Hedged: the public API (run/2 stateless, run/3 adaptive, start_link/1, child_spec/1)
- Hedged.Runner: the core engine, staggered dispatch plus the receive loop
- Hedged.Tracker and Hedged.Percentile: adaptive delay, token bucket, and stats on top of a circular latency buffer

License

MIT