HealthCheck

A very simple library for checking whether an Elixir service is stable.

Motivation

In our production environment, several Elixir services need their health checked, and those checks are used in several scenarios (most of them related to service discovery).

For an HTTP server, we can check whether the port the service listens on returns 200 for a specific path, and this covers all scenarios. For a gRPC server, however, the “rolling upgrade service” scenario is not easy to handle, and it is even more difficult for a “pure Elixir node”.

So I decided to write a library.

Interface

Usage

grpc/http server

For gRPC and HTTP servers, usage is very simple. A specific path, for example:

/api/health

can return a different response depending on whether the current node is stable:

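  # Respond 200 when the node is stable, 503 otherwise.
  get "/health" do
    conn
    |> put_status(return_status(HealthCheck.stable?()))
    |> json("")
  end

  # Map the result of HealthCheck.stable?/0 to an HTTP status.
  defp return_status(true), do: :ok

  defp return_status(_), do: :service_unavailable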

For a gRPC server, you could do something like this:

  def check(params, _stream) do
    if HealthCheck.stable?() do
      # Only report SERVING for the service name this server knows about.
      if params.service == "just like you want" do
        Grpc.Health.V1.HealthCheckResponse.new(%{status: :SERVING})
      else
        raise GRPC.RPCError, status: :not_found
      end
    else
      # The node is not stable, so tell the caller not to route traffic here.
      Grpc.Health.V1.HealthCheckResponse.new(%{status: :NOT_SERVING})
    end
  end

command line

For the command line, health_check assumes that you build your Elixir service with a release tool (such as distillery).

This is because a release tool makes it very simple to call functions inside a running Elixir node from the command line.

You can use:

$ ./bin/your_elixir_node_name rpc Elixir.HealthCheck enable_maint

to put the current node into maint (maintenance) status.

Similarly, you can use:

$ ./bin/your_elixir_node_name rpc Elixir.HealthCheck disable_maint

to cancel maint status for the current node.

And you can use:

$ ./bin/your_elixir_node_name rpc Elixir.HealthCheck stable?

to check whether the current node is stable.

distributed node

In a distributed environment, an Elixir node sometimes needs to check whether a remote node is stable.

You can use:

true == :rpc.call(remote_node, HealthCheck, :stable?, [], 1_000)

to check whether the remote node is stable.
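Note that `:rpc.call/5` returns `{:badrpc, reason}` when the remote node is down or the call times out, so it can be convenient to wrap the call in a small helper. The following is only a minimal sketch: the `RemoteHealth` module, its function names, and the default timeout are hypothetical and not part of health_check. It treats anything other than `true` as "not stable".

```elixir
defmodule RemoteHealth do
  @moduledoc """
  Hypothetical helper (not part of health_check) that turns the result of
  :rpc.call/5 into a plain boolean.
  """

  # Assumed default, matching the 1_000 ms used in the call above.
  @default_timeout 1_000

  # Returns true only when the remote HealthCheck.stable?/0 call returns true.
  # Any other result, including {:badrpc, reason} for an unreachable node or
  # a timeout, is treated as "not stable".
  @spec stable?(node(), timeout()) :: boolean()
  def stable?(remote_node, timeout \\ @default_timeout) do
    case :rpc.call(remote_node, HealthCheck, :stable?, [], timeout) do
      true -> true
      _other -> false
    end
  end

  # Keeps only the nodes that currently report themselves as stable.
  @spec stable_nodes([node()], timeout()) :: [node()]
  def stable_nodes(nodes, timeout \\ @default_timeout) do
    Enum.filter(nodes, &stable?(&1, timeout))
  end
end
```

A caller could then write `RemoteHealth.stable?(remote_node)` and get `false` for an unreachable node, instead of having to handle the `{:badrpc, reason}` tuple at every call site.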

Based on this, it is very simple to check whether a service is stable in each of these scenarios.

License