system_monitor

Erlang telemetry collector

[![Build Status][ci-image]][ci-url]
[![License][license-image]][license-url]
[![Developed at Klarna][klarna-image]][klarna-url]

system_monitor is a BEAM VM monitoring and introspection application that helps troubleshoot live systems. It collects information about Erlang processes and applications and streams that data to Kafka. Unlike observer, system_monitor does not require connecting to the monitored system via the Erlang distribution protocol, so it can be used to monitor systems with very tight access restrictions.

Features

Process top

Information about the top N Erlang processes that consume the most resources (such as reductions or memory) or have the longest message queues is presented on the process top dashboard:

Process top

Historical data can be accessed via the standard Grafana time picker. The status panel can display important information about the node state. Pids of the processes on this dashboard are clickable links that lead to the process history dashboard.

Process history

Process history

The process history dashboard displays time-series data about a particular Erlang process. Note that some data points may be missing if the process didn't consume enough resources to appear in the process top.

Application top

Application top

The application top dashboard contains various metrics aggregated per OTP application.

Usage example

To integrate system_monitor into your system, simply add it to the release apps. Add the following lines to rebar.config:

{deps, [..., system_monitor]}.

{relx,
 [ {release, {my_release, "1.0.0"},
    [kernel, sasl, ..., system_monitor]}
 ]}.
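For quick local experimentation outside a release, the application can also be started from an Erlang shell (for example under rebar3 shell) — a minimal sketch, assuming system_monitor and its dependencies are already on the code path:

```erlang
%% Start system_monitor together with every application it depends on.
%% ensure_all_started/1 returns {ok, Started} where Started is the list
%% of applications that were started as a result of this call.
{ok, Started} = application:ensure_all_started(system_monitor).
```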

Add the following configuration to sys.config to enable export of telemetry to Kafka:

{system_monitor,
   [ {kafka_hosts, [{"localhost", 9094}]}
   , {kafka_topic, <<"system_monitor">>}
   , {kafka_client_config,
      [ {sasl, {plain, "path-to-kafka-credentials"}}
      , {ssl, true}
      ]}
   ]}

Custom node status

system_monitor can export arbitrary node status information that is deemed important for the operator. This is done by defining a callback function that returns an HTML-formatted string (or iolist):

-module(foo).

-export([node_status/0]).

node_status() ->
  ["my node type<br/>",
   case healthy() of
     true  -> "<font color=#0f0>UP</font><br/>";
     false -> "<mark>DEGRADED</mark><br/>"
   end,
   io_lib:format("very important value=~p", [very_important_value()])
  ].
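The healthy/0 and very_important_value/0 helpers above are application-specific and not part of system_monitor; a hypothetical sketch of what they might look like, using standard erlang BIFs:

```erlang
%% Hypothetical helpers for the node_status/0 example above.
%% Real implementations would encode your own health criteria.
healthy() ->
  %% Consider the node healthy while the total run queue is short.
  erlang:statistics(total_run_queue_lengths) < 10.

very_important_value() ->
  %% Report the total memory allocated by the emulator, in bytes.
  erlang:memory(total).
```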

This callback then needs to be added to the system_monitor application environment:

{system_monitor,
   [ {node_status_fun, {foo, node_status}}
   ...
   ]}

More information about configurable options can be found here.

Collection of data

deployment diagram

On the receiving side, Kafka messages are picked up by kflow and stored in Postgres. Finally, Grafana with a Postgres data source presents the data.

Kflow

An example of Kflow configuration for processing system_monitor data can be found here.

Database

Here one can find the required database schema.

Grafana

Grafana dashboard templates are found here.

Development setup

A toy dockerized demo is maintained as part of kflow. It requires Erlang, docker, docker-compose and pwgen to run, and can be launched like this:

git clone https://github.com/klarna-incubator/kflow
cd kflow
make run

After the services come up, all Grafana dashboards will be available at http://localhost:3000/ with the default login "admin" and password "admin".

How to contribute

See our guide on contributing.

Release History

See our changelog.

License

Copyright © 2020 Klarna Bank AB

For license details, see the LICENSE file in the root of this project.