StatusCodeTracker

An Elixir library for tracking HTTP status code rates and monitoring service health. It counts 5xx responses within a sliding time window and flags the service as unhealthy when the count crosses a configured threshold.

Installation

Add status_code_tracker to your list of dependencies in mix.exs:

def deps do
  [
    {:status_code_tracker, "~> 0.1.3"}
  ]
end

Configuration

Configure the library in your application config:

config :status_code_tracker, :settings,
  time_window_seconds: 60,
  error_threshold: 10,
  keep_unhealthy?: false,
  unhealthy_action: fn -> YourModule.on_unhealthy() end,
  healthy_action: fn -> YourModule.on_healthy() end,
  extra_checks: fn -> YourModule.extra_checks() end,
  unhealthy_status_code: 503,
  verbose?: true,
  unhealthy_message: "Service unhealthy due to many 5xx",
  extra_checks_error_message: "Extra checks failed"

Configuration Options

| Option | Default | Description |
| --- | --- | --- |
| `time_window_seconds` | `60` | Sliding time window (in seconds) for counting errors |
| `error_threshold` | `10` | Number of 5xx errors within the time window that triggers unhealthy state |
| `keep_unhealthy?` | `false` | If `true`, service stays unhealthy until manually reset. If `false`, service auto-recovers when errors drop below threshold |
| `unhealthy_action` | `fn -> :noop end` | Callback function triggered when service becomes unhealthy |
| `healthy_action` | `fn -> :noop end` | Callback function triggered when service recovers and becomes healthy again |
| `extra_checks` | `fn -> false end` | Custom validation function for additional health checks beyond error rate |
| `unhealthy_status_code` | `503` | HTTP status code returned when service is unhealthy |
| `verbose?` | `false` | Enable detailed logging of health status changes |
| `unhealthy_message` | `"Service unhealthy due to many 5xx"` | Custom message returned when unhealthy due to error threshold |
| `extra_checks_error_message` | `"Extra checks failed"` | Custom message returned when extra checks fail |

Usage

Adding the Health Check Endpoint

You can add the health check endpoint to your router:

scope "/health" do
  get("/", StatusCodeTracker.HealthPlug, [json: true, body: "{\"status\":\"success\"}"])
end

Or add it to your endpoint:

plug StatusCodeTracker.HealthPlug, path: "/health"

Adding the Error Tracker

Add the tracker plug to your endpoint to automatically track all 5xx errors:

plug StatusCodeTracker.Plug

How it Works

Error Tracking

The library uses an ETS table to store the timestamps of 5xx errors. Each time a request finishes with a 5xx status code, its timestamp is recorded. The health check endpoint is excluded from tracking, so its own unhealthy responses (503 by default) are not counted as errors.
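
To make the idea of timestamp storage concrete, here is a minimal sketch of recording 5xx timestamps in ETS. It is not the library's actual implementation: the module name `ErrorLogSketch`, the table name, and the `{:error, timestamp}` row layout are all assumptions made for illustration.

```elixir
defmodule ErrorLogSketch do
  # Hypothetical table name; the library's real table is not documented here.
  @table :error_log_sketch

  # Create a public named table. :duplicate_bag keeps every inserted row,
  # even when two errors share the same timestamp.
  def init do
    :ets.new(@table, [:named_table, :public, :duplicate_bag])
    :ok
  end

  # Record one timestamp for any 5xx status; ignore everything else.
  def record(status, now \\ System.monotonic_time(:millisecond))

  def record(status, now) when status in 500..599 do
    :ets.insert(@table, {:error, now})
    :ok
  end

  def record(_status, _now), do: :ignored

  # Number of rows currently stored.
  def count, do: :ets.info(@table, :size)
end
```

A plug could call something like `ErrorLogSketch.record(conn.status)` once the response status is known; only 5xx statuses would add a row.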

Health Evaluation

When a health check is performed:

The library counts errors that occurred within time_window_seconds
If the count exceeds error_threshold, the service is marked unhealthy
Optionally, the extra_checks function is called for additional validation
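
The steps above can be sketched as a pure function. This is an illustration under stated assumptions, not the library's code: `HealthEvalSketch` and its return values are hypothetical, timestamps are plain integers in seconds, and the strict `>` comparison follows the "exceeds" wording (whether the real boundary is inclusive is not specified here).

```elixir
defmodule HealthEvalSketch do
  # timestamps: list of error times in seconds; now: current time in seconds.
  # Returns :healthy or {:unhealthy, reason}.
  def evaluate(timestamps, now, opts) do
    window = Keyword.get(opts, :time_window_seconds, 60)
    threshold = Keyword.get(opts, :error_threshold, 10)
    extra_checks = Keyword.get(opts, :extra_checks, fn -> false end)

    # Step 1: count only the errors that fall inside the time window.
    recent = Enum.count(timestamps, fn ts -> now - ts <= window end)

    cond do
      # Step 2: too many recent errors (assumed strict comparison).
      recent > threshold -> {:unhealthy, :error_threshold}
      # Step 3: extra_checks returns true when a custom check failed.
      extra_checks.() -> {:unhealthy, :extra_checks}
      true -> :healthy
    end
  end
end
```

For example, `HealthEvalSketch.evaluate([95, 96, 97, 98], 100, error_threshold: 3)` sees four errors inside the default 60-second window and reports `{:unhealthy, :error_threshold}`.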

Automatic Cleanup

A periodic cleanup process removes old timestamps (older than time_window_seconds) to prevent memory growth.
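
One way such a cleanup could work is a single `:ets.select_delete/2` call with a match spec that targets old rows. The sketch below assumes rows shaped like `{key, timestamp}` with integer timestamps in seconds; the module name `CleanupSketch` and the row layout are illustrative, not taken from the library.

```elixir
defmodule CleanupSketch do
  # Delete every {key, timestamp} row whose timestamp is older than the
  # cutoff. Returns the number of rows removed.
  def prune(table, now, window_seconds) do
    cutoff = now - window_seconds

    # Match spec: bind the timestamp to $1 and delete when $1 < cutoff.
    # select_delete removes a row only when the spec body returns true.
    :ets.select_delete(table, [{{:_, :"$1"}, [{:<, :"$1", cutoff}], [true]}])
  end
end

# Usage: with a 60-second window at time 100, only the row at time 10 is stale.
table = :ets.new(:cleanup_sketch, [:duplicate_bag])
:ets.insert(table, [{:error, 10}, {:error, 50}, {:error, 95}])
removed = CleanupSketch.prune(table, 100, 60)
```

A long-running process could invoke `prune/3` on a timer (for instance via `Process.send_after/3`) so the table never holds more than one window of data.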

Health State Behavior

When `keep_unhealthy?: false` (default)

The health status is transient - re-evaluated on every health check:

Errors below threshold → healthy (200 OK)
Errors above threshold → unhealthy (503)
Service automatically recovers when errors drop below threshold

[errors spike] → unhealthy → [errors drop] → automatically healthy

When `keep_unhealthy?: true`

The health status is sticky - once unhealthy, stays unhealthy:

When errors exceed threshold → service marked unhealthy and stays that way until reset
Even if errors drop, service remains unhealthy
Recovery requires manual intervention (e.g., calling StatusCodeTracker.Server.update_healthy(true))

[errors spike] → unhealthy → [errors drop] → STILL unhealthy (requires manual reset)
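
The difference between the two modes comes down to a small state-transition rule. The following sketch is a hypothetical model of that rule, not the library's internals; `StickySketch` and its argument names are assumptions.

```elixir
defmodule StickySketch do
  # previous: the stored health flag (true = healthy).
  # over_threshold?: whether the current error count breaches the threshold.
  # keep_unhealthy?: the configured mode.
  # Returns the new health flag.
  def next_healthy(previous, over_threshold?, keep_unhealthy?)

  # Sticky mode: once unhealthy, the current error count is ignored.
  def next_healthy(false, _over_threshold?, true), do: false

  # Otherwise health simply mirrors the current evaluation.
  def next_healthy(_previous, over_threshold?, _keep_unhealthy?), do: not over_threshold?
end
```

With `keep_unhealthy?` set to `false`, an unhealthy service whose errors have dropped goes back to `true`; with it set to `true`, the same input stays `false` until something outside this function resets the flag.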

Action Callbacks

`unhealthy_action`

Triggered when the service transitions from healthy to unhealthy. Use this for:

Sending alerts/notifications
Logging incidents
Triggering automated recovery procedures

unhealthy_action: fn ->
  Logger.error("Service became unhealthy!")
  AlertService.send_alert("Service down")
end

`healthy_action`

Triggered when the service transitions from unhealthy back to healthy (only when keep_unhealthy?: false). Use this for:

Sending recovery notifications
Logging recovery events
Resetting alert states

healthy_action: fn ->
  Logger.info("Service recovered!")
  AlertService.send_recovery("Service recovered")
end
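
Both callbacks are described as firing on a transition rather than on every health check. A minimal sketch of that dispatch, assuming the previous and current health flags are available as booleans (the module `TransitionSketch` is hypothetical and does not reflect the library's internals):

```elixir
defmodule TransitionSketch do
  # Run a callback only when the health flag actually changes.
  # previous/current: true = healthy, false = unhealthy.
  def notify(previous, current, opts) do
    noop = fn -> :noop end

    case {previous, current} do
      # healthy -> unhealthy
      {true, false} -> Keyword.get(opts, :unhealthy_action, noop).()
      # unhealthy -> healthy
      {false, true} -> Keyword.get(opts, :healthy_action, noop).()
      # No change: nothing to report.
      _unchanged -> :noop
    end
  end
end
```

Dispatching on the state change keeps alerts from repeating on every poll while the service remains unhealthy.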

Extra Checks

You can define custom health checks beyond error rate monitoring:

extra_checks: fn ->
  case check_database_connection() do
    :ok -> false  # false means no issues
    :error -> true  # true means check failed
  end
end

The extra_checks function should return:

false - all checks passed
true - checks failed, service should be marked unhealthy
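
When several dependencies need checking, the individual results can be folded into the single boolean that extra_checks expects. This is a sketch under assumptions: `ExtraChecksSketch` is a hypothetical helper, and each check is assumed to be a zero-arity function returning `:ok` on success.

```elixir
defmodule ExtraChecksSketch do
  # checks: list of zero-arity functions, each returning :ok on success.
  # Returns true (failed) if any check returns something other than :ok,
  # and false (all passed) otherwise, matching the extra_checks contract.
  def failed?(checks) do
    Enum.any?(checks, fn check -> check.() != :ok end)
  end
end
```

It could then be wired in as `extra_checks: fn -> ExtraChecksSketch.failed?([&MyApp.Health.database/0, &MyApp.Health.cache/0]) end`, where the `MyApp.Health` functions are placeholders for your own checks.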

API Reference