StatusCodeTracker
An Elixir library for tracking HTTP status code rates and monitoring service health. It counts 5xx responses within a time window and flags the service as unhealthy when that count reaches a configured threshold.
Installation
Add status_code_tracker to your list of dependencies in mix.exs:
def deps do
  [
    {:status_code_tracker, "~> 0.1.3"}
  ]
end
Configuration
Configure the library in your application config:
config :status_code_tracker, :settings,
  time_window_seconds: 60,
  error_threshold: 10,
  keep_unhealthy?: false,
  unhealthy_action: fn -> YourModule.on_unhealthy() end,
  healthy_action: fn -> YourModule.on_healthy() end,
  extra_checks: fn -> YourModule.extra_checks() end,
  unhealthy_status_code: 503,
  verbose?: true,
  unhealthy_message: "Service unhealthy due to many 5xx",
  extra_checks_error_message: "Extra checks failed"
Configuration Options
| Option | Default | Description |
|---|---|---|
| time_window_seconds | 60 | Sliding time window (in seconds) for counting errors |
| error_threshold | 10 | Number of 5xx errors within the time window that triggers unhealthy state |
| keep_unhealthy? | false | If true, service stays unhealthy until manually reset. If false, service auto-recovers when errors drop below threshold |
| unhealthy_action | fn -> :noop end | Callback function triggered when service becomes unhealthy |
| healthy_action | fn -> :noop end | Callback function triggered when service recovers and becomes healthy again |
| extra_checks | fn -> false end | Custom validation function for additional health checks beyond error rate |
| unhealthy_status_code | 503 | HTTP status code returned when service is unhealthy |
| verbose? | false | Enable detailed logging of health status changes |
| unhealthy_message | "Service unhealthy due to many 5xx" | Custom message returned when unhealthy due to error threshold |
| extra_checks_error_message | "Extra checks failed" | Custom message returned when extra checks fail |
Usage
Adding the Health Check Endpoint
You can add the health check endpoint to your router:
scope "/health" do
  get("/", StatusCodeTracker.HealthPlug, [json: true, body: "{\"status\":\"success\"}"])
end
Or add it to your endpoint:
plug StatusCodeTracker.HealthPlug, path: "/health"
Adding the Error Tracker
Add the tracker plug to your endpoint to automatically track all 5xx errors:
plug StatusCodeTracker.Plug
How it Works
Error Tracking
The library uses an ETS table to store timestamps of 5xx errors. When a request results in a 5xx status code, the timestamp is recorded. The health check endpoint is automatically excluded from tracking to prevent recursive errors.
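As a rough illustration of the idea above, and not the library's actual implementation, timestamps could be kept in a named ETS table like this. The module name, table name, and row shape (`{timestamp}`) are all assumptions made for the sketch.

```elixir
defmodule ErrorTimestampsSketch do
  # Hypothetical table name; the library's real table is not documented here.
  @table :error_timestamps_sketch

  # Create a public named table. :duplicate_bag keeps every row, so several
  # errors recorded in the same second are all counted.
  def init do
    :ets.new(@table, [:named_table, :public, :duplicate_bag])
    :ok
  end

  # Record one 5xx error at `now` (seconds on the monotonic clock by default).
  def record(now \\ System.monotonic_time(:second)) do
    :ets.insert(@table, {now})
    :ok
  end

  # Total number of stored timestamps.
  def count, do: :ets.info(@table, :size)
end
```

A `:set` table would silently collapse errors that share a timestamp, which is why the sketch uses `:duplicate_bag`.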
Health Evaluation
When a health check is performed:
- The library counts errors that occurred within time_window_seconds
- If the count exceeds error_threshold, the service is marked unhealthy
- Optionally, the extra_checks function is called for additional validation
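The evaluation steps can be sketched as a pure function over a list of timestamps. This is a minimal illustration under stated assumptions: the module and function names are hypothetical, timestamps are plain integers in seconds, and the comparison is a strict "greater than" (the library's exact boundary behavior is not specified here).

```elixir
defmodule HealthEvaluationSketch do
  # Count timestamps that fall inside the window ending at `now`.
  def errors_in_window(timestamps, now, window_seconds) do
    Enum.count(timestamps, fn ts -> ts > now - window_seconds end)
  end

  # Returns true when the health check passes. `extra_checks` follows the
  # README convention: it returns true when a check FAILED.
  def pass?(timestamps, now, opts) do
    window = Keyword.get(opts, :time_window_seconds, 60)
    threshold = Keyword.get(opts, :error_threshold, 10)
    extra_checks = Keyword.get(opts, :extra_checks, fn -> false end)

    # Assumption: "exceeds" is treated as strictly greater than the threshold.
    over_threshold? = errors_in_window(timestamps, now, window) > threshold
    not over_threshold? and not extra_checks.()
  end
end
```

For example, `HealthEvaluationSketch.pass?([55, 58], 60, time_window_seconds: 10, error_threshold: 1)` finds two errors in the window, which is over the threshold of one, so the check fails.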
Automatic Cleanup
A periodic cleanup process removes old timestamps (older than time_window_seconds) to prevent memory growth.
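One plausible way to prune old entries, assuming rows are stored as `{timestamp}` tuples in an ETS table, is a single `:ets.select_delete/2` call with a match spec. The module name and row shape are assumptions for illustration; how the library schedules its cleanup is not shown here.

```elixir
defmodule CleanupSketch do
  # Delete every {timestamp} row older than the window.
  # Returns the number of rows deleted.
  def prune(table, now, window_seconds) do
    cutoff = now - window_seconds

    # Match spec: for rows shaped {ts}, delete the row when ts < cutoff.
    :ets.select_delete(table, [{{:"$1"}, [{:<, :"$1", cutoff}], [true]}])
  end
end
```

In a long-running process, a function like this would typically be called on a timer, for instance from a GenServer that reschedules itself with `Process.send_after/3`.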
Health State Behavior
When keep_unhealthy?: false (default)
The health status is transient - re-evaluated on every health check:
- Errors below threshold → healthy (200 OK)
- Errors above threshold → unhealthy (503)
- Service automatically recovers when errors drop below threshold
[errors spike] → unhealthy → [errors drop] → automatically healthy
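The difference between this default mode and the sticky mode enabled by the keep_unhealthy? option can be pictured as a small state-transition function. The sketch below is hypothetical: the module name and the returned atoms are illustrative, and it only models which callback from the configuration would fire on each transition.

```elixir
defmodule HealthStateSketch do
  # next(currently_healthy?, threshold_reached?, keep_unhealthy?)
  # returns {new_healthy?, callback}, where callback is :unhealthy_action,
  # :healthy_action, or :none.

  # Healthy and errors over threshold: become unhealthy.
  def next(true, true, _keep), do: {false, :unhealthy_action}
  # Healthy and under threshold: nothing changes.
  def next(true, false, _keep), do: {true, :none}
  # Unhealthy, errors dropped, transient mode: recover automatically.
  def next(false, false, false), do: {true, :healthy_action}
  # Unhealthy and either still failing or in sticky mode: stay unhealthy.
  def next(false, _reached, _keep), do: {false, :none}
end
```

With `keep_unhealthy?` set to true, the third clause never matches, so the only way back to healthy is an explicit reset.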
When keep_unhealthy?: true
The health status is sticky - once unhealthy, stays unhealthy:
- When errors exceed threshold → service marked unhealthy and stays that way until reset
- Even if errors drop, service remains unhealthy
- Recovery requires manual intervention (e.g., calling StatusCodeTracker.Server.update_healthy(true))
[errors spike] → unhealthy → [errors drop] → STILL unhealthy (requires manual reset)
Action Callbacks
unhealthy_action
Triggered when the service transitions from healthy to unhealthy. Use this for:
- Sending alerts/notifications
- Logging incidents
- Triggering automated recovery procedures
unhealthy_action: fn ->
  Logger.error("Service became unhealthy!")
  AlertService.send_alert("Service down")
end
healthy_action
Triggered when the service transitions from unhealthy back to healthy (only when keep_unhealthy?: false). Use this for:
- Sending recovery notifications
- Logging recovery events
- Resetting alert states
healthy_action: fn ->
  Logger.info("Service recovered!")
  AlertService.send_recovery("Service recovered")
end
Extra Checks
You can define custom health checks beyond error rate monitoring:
extra_checks: fn ->
  case check_database_connection() do
    :ok -> false     # false means no issues
    :error -> true   # true means check failed
  end
end
The extra_checks function should return:
- false - all checks passed
- true - checks failed, service should be marked unhealthy
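Because the return value is inverted relative to what many people expect (true means failure), it can help to wrap several checks in one helper. The following is a minimal sketch, assuming each individual check is a zero-arity function that returns `:ok` on success; the module and function names are hypothetical.

```elixir
defmodule ExtraChecksSketch do
  # Runs each check in order and stops at the first failure.
  # Returns true when at least one check did not return :ok,
  # matching the extra_checks convention (true = failed).
  def any_failed?(checks) do
    Enum.any?(checks, fn check -> check.() != :ok end)
  end
end
```

It could then be wired in as `extra_checks: fn -> ExtraChecksSketch.any_failed?([&MyApp.Health.database/0]) end`, where `MyApp.Health.database/0` stands in for your own check function.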
API Reference
StatusCodeTracker.Server
- track_error/0 - Records a 5xx error timestamp
- health_check_pass?/0 - Returns true if service is healthy
- healthy?/0 - Returns current health state
- update_healthy/1 - Manually set health state (useful with keep_unhealthy?: true)
- error_threshold_reached?/0 - Check if errors exceed threshold
License
MIT License. See LICENSE file for details.