Monitoring


The Monitoring module lets you inject custom user-defined metrics and monitor the process. It supports multiple backends, protocols and data formats.

Table of contents

  1. Installation
  2. Getting started
  3. Advanced features
  4. System monitoring and server-side backends installation and configuration

Installation

Click here if you don't have aliBuild installed

  • Compile Monitoring and its dependencies via aliBuild
aliBuild build Monitoring --defaults o2-dataflow
  • Load the environment for Monitoring (in the alice directory)
alienv load Monitoring/latest

Getting started

Monitoring instance

Get an instance from MonitoringFactory by passing the backend URI(s) as a parameter (comma-separated if more than one). The factory is accessible from the o2::monitoring namespace.

#include <MonitoringFactory.h>
using namespace o2::monitoring;
std::unique_ptr<Monitoring> monitoring = MonitoringFactory::Get("backend[-protocol]://host:port[/verbosity][?query]");
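To make the URI layout concrete, here is a minimal, self-contained sketch that splits a string of the form backend[-protocol]://host:port[/verbosity][?query] into its parts. It is not the library's parser: the BackendUri struct and the parseBackendUri function are hypothetical names introduced for illustration, and the sketch only handles the host:port form (not Unix socket paths).

```cpp
#include <cassert>
#include <string>

// Hypothetical holder for the URI components; not a type from the library.
struct BackendUri {
  std::string backend;    // e.g. "influxdb"
  std::string protocol;   // e.g. "udp"; empty when absent
  std::string host;       // empty when absent
  int port = 0;           // 0 when absent
  std::string verbosity;  // e.g. "debug"; empty when absent
  std::string query;      // empty when absent
};

// Splits "backend[-protocol]://host:port[/verbosity][?query]" into its parts.
BackendUri parseBackendUri(const std::string& uri) {
  BackendUri out;
  const auto schemeEnd = uri.find("://");
  assert(schemeEnd != std::string::npos && "URI must contain ://");

  const std::string scheme = uri.substr(0, schemeEnd);
  // "no-op" is a backend name that itself contains a dash, so it is kept whole.
  const auto dash = (scheme == "no-op") ? std::string::npos : scheme.find('-');
  out.backend = scheme.substr(0, dash);
  if (dash != std::string::npos) out.protocol = scheme.substr(dash + 1);

  // Peel off the optional parts from the right: query first, then verbosity.
  std::string rest = uri.substr(schemeEnd + 3);
  const auto q = rest.find('?');
  if (q != std::string::npos) { out.query = rest.substr(q + 1); rest.erase(q); }
  const auto slash = rest.find('/');
  if (slash != std::string::npos) { out.verbosity = rest.substr(slash + 1); rest.erase(slash); }

  // What remains is "host:port", "host" or an empty string.
  const auto colon = rest.rfind(':');
  out.host = rest.substr(0, colon);
  if (colon != std::string::npos) out.port = std::stoi(rest.substr(colon + 1));
  return out;
}
```

Parsing from the right keeps the sketch short: once the query and verbosity are removed, only the host and port are left to separate.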

See the table below to find URIs for supported backends:

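| Backend name | Transport   | URI backend[-protocol] | URI query | Default verbosity |
|--------------|-------------|------------------------|-----------|-------------------|
| No-op        | -           | no-op                  |           | -                 |
| InfluxDB     | UDP         | influxdb-udp           | -         | info              |
| InfluxDB     | Unix socket | influxdb-unix          | -         | info              |
| ApMon        | UDP         | apmon                  | -         | info              |
| StdOut       | -           | stdout, infologger     | [Prefix]  | debug             |
| Flume        | UDP         | flume                  | -         | info              |
| Kafka        | TCP         | kafka                  | -         | info              |

StdCout output format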
[METRIC] <name>,<type> <value> <timestamp> <tags>

The prefix ([METRIC]) can be changed using the URI query component.

Metrics

A metric consists of five parameters: name, value, timestamp, verbosity and tags.

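| Parameter name | Type                             | Required | Default                |
|----------------|----------------------------------|----------|------------------------|
| name           | string                           | yes      | -                      |
| value          | int / double / string / uint64_t | yes      | -                      |
| timestamp      | time_point<system_clock>         | no       | current time           |
| verbosity      | Enum (Debug/Info/Prod)           | no       | Verbosity::Info        |
| tags           | vector                           | no       | host and process names |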

A metric can be constructed by providing the required parameters (value and name):

Metric{10, "name"}

Verbosity

There are three verbosity levels (the same as for backends): Debug, Info and Prod. The default is Verbosity::Info; it can be overridden using Metric::setDefaultVerbosity(verbosity). To override the verbosity of a single metric, pass the optional third parameter to the Metric constructor:

Metric{10, "name", Verbosity::Prod}

A metric is sent only if its verbosity is at least as high as the backend's, e.g. a backend with /info verbosity accepts only Info and Prod metrics.
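The filtering rule can be sketched as a simple ordered comparison. The snippet below is an illustration only, not the library's code: it assumes the levels are ordered Debug < Info < Prod (consistent with the /info example above), and the numeric values and the isAccepted helper are hypothetical.

```cpp
// Levels ordered from least to most important; the numeric values are an assumption.
enum class Verbosity { Debug = 0, Info = 1, Prod = 2 };

// A metric passes when its level is at least as high as the backend's level.
constexpr bool isAccepted(Verbosity metric, Verbosity backend) {
  return static_cast<int>(metric) >= static_cast<int>(backend);
}
```

Under this rule, a backend configured with /debug accepts everything, while a backend configured with /prod accepts Prod metrics only.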

Tags

Each metric can be tagged with any number of predefined tags. To do so, use the addTag(tags::Key, tags::Value) or addTag(tags::Key, unsigned short) method. The latter assigns a numeric value to a tag.

See the example: examples/2-TaggedMetrics.cxx.

Sending metric

Pass a metric object to the send method as an r-value reference:

monitoring->send({10, "name"});

See how it works in the example: examples/1-Basic.cxx.

Advanced features

Sending more than one metric

To send more than one metric in a single packet, group them into a vector:

monitoring->send(std::vector<Metric>&& metrics);

It's also possible to send multiple grouped values (supported only by the Flume and InfluxDB backends). For example, a cpu metric can be composed of cpuUser and cpuSystem values.

void sendGroupped(std::string name, std::vector<Metric>&& metrics)

See how it works in the example: examples/8-Multiple.cxx
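To illustrate what grouping means on the wire, the sketch below joins several named values into one InfluxDB line-protocol entry of the form "measurement field1=value1,field2=value2". This is an assumption about the general shape of the output, not the library's serializer: toGroupedLine is a hypothetical helper, and tags and the timestamp are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Joins named values into one entry: "<measurement> k1=v1,k2=v2".
std::string toGroupedLine(const std::string& measurement,
                          const std::vector<std::pair<std::string, double>>& fields) {
  assert(!fields.empty() && "at least one field is required");
  std::ostringstream line;
  line << measurement << ' ';
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) line << ',';  // fields are comma-separated, without spaces
    line << fields[i].first << '=' << fields[i].second;
  }
  return line.str();
}
```

With this helper, the cpu example from the text would become a single entry carrying both cpuUser and cpuSystem, rather than two separate metrics.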

Buffering metrics

To avoid sending each metric separately, metrics can be temporarily stored in a buffer and flushed at the most convenient moment. This feature is controlled by the following two methods:

monitoring->enableBuffering(const std::size_t maxSize)
...
monitoring->flushBuffer();

enableBuffering takes the maximum buffer size as its parameter. When the buffer gets full, all values are flushed automatically.

See how it works in the example: examples/10-Buffering.cxx.
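The buffering behaviour described above can be modelled in a few lines. The following is a simplified sketch, not the library's implementation: SimpleMetric and MetricBuffer are hypothetical types, the "sent" list stands in for a backend call, and flushing exactly when the buffer reaches maxSize is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a metric: just a name and a value.
struct SimpleMetric {
  std::string name;
  double value;
};

class MetricBuffer {
 public:
  explicit MetricBuffer(std::size_t maxSize) : mMaxSize(maxSize) {
    assert(maxSize > 0 && "buffer size must be positive");
  }

  // Stores the metric and flushes automatically once maxSize is reached.
  void push(SimpleMetric metric) {
    mPending.push_back(std::move(metric));
    if (mPending.size() >= mMaxSize) flush();
  }

  // Moves all pending metrics to the "sent" list, standing in for a backend call.
  void flush() {
    for (auto& m : mPending) mSent.push_back(std::move(m));
    mPending.clear();
  }

  std::size_t pending() const { return mPending.size(); }
  std::size_t sent() const { return mSent.size(); }

 private:
  std::size_t mMaxSize;
  std::vector<SimpleMetric> mPending;
  std::vector<SimpleMetric> mSent;
};
```

The explicit flush() mirrors flushBuffer(): it lets the caller empty a partially filled buffer, for example before shutting down.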

Unique metric buffering

In addition to the above, you may want to keep only unique metrics (identified by metric name) in the buffer; only the last metric with a given name is kept.

monitoring->enableUniqueBuffering(const std::size_t maxSize)
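A buffer that keeps only the latest metric per name behaves like a map keyed by metric name. The sketch below shows that idea under stated assumptions: UniqueMetricBuffer is a hypothetical class, values are reduced to double, and flushing is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

class UniqueMetricBuffer {
 public:
  // Inserts the value, or overwrites an earlier one stored under the same name.
  void push(const std::string& name, double value) { mLatest[name] = value; }

  // Number of distinct metric names currently buffered.
  std::size_t size() const { return mLatest.size(); }

  // Returns the last value pushed for the given name.
  double latest(const std::string& name) const {
    assert(mLatest.count(name) == 1 && "metric must have been pushed");
    return mLatest.at(name);
  }

 private:
  std::map<std::string, double> mLatest;
};
```

Pushing the same name repeatedly does not grow the buffer, which is the main difference from plain buffering.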

Calculating derived metrics

The module can calculate derived metrics. To do so, use the optional DerivedMetricMode mode parameter of the send method:

send(Metric&& metric, [DerivedMetricMode mode])

Three modes are available:

  • DerivedMetricMode::NONE - no action,
  • DerivedMetricMode::RATE - rate between two consecutive metrics,
  • DerivedMetricMode::AVERAGE - average value of all metrics stored in cache.

Derived metrics are generated each time a new value is passed to the module. Their names are suffixed with the derived mode name.

See how it works in the example: examples/4-RateDerivedMetric.cxx.
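The two non-trivial modes can be written down as plain arithmetic. This is a sketch of one reasonable interpretation, not the library's code: it assumes the rate is the change in value divided by the elapsed time in seconds, and that the average is the arithmetic mean of all cached values. Sample, rate and average are hypothetical names.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical sample: a value and the time it was taken, in seconds.
struct Sample {
  double value;
  double timeSeconds;
};

// Rate: change in value divided by the time elapsed between two consecutive samples.
double rate(const Sample& previous, const Sample& current) {
  const double elapsed = current.timeSeconds - previous.timeSeconds;
  assert(elapsed > 0.0 && "samples must be in increasing time order");
  return (current.value - previous.value) / elapsed;
}

// Average: arithmetic mean of every value seen so far.
double average(const std::vector<double>& values) {
  assert(!values.empty() && "at least one value is required");
  const double sum = std::accumulate(values.begin(), values.end(), 0.0);
  return sum / static_cast<double>(values.size());
}
```

For example, a counter that goes from 10 to 30 over 4 seconds gives a rate of 5 per second under this definition.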

Global tags

Global tags are added to each metric. Two tags, hostname and name (process name), are set as global by the library.

You can add your own global tag by calling addGlobalTag(std::string_view key, std::string_view value) or addGlobalTag(tags::Key, tags::Value).

Process monitoring

enableProcessMonitoring([interval in seconds]);

The following metrics are generated every interval:

  • cpuUsedPercentage - percentage of a single core used over the time interval
  • involuntaryContextSwitches - number of involuntary context switches over the time interval
  • memoryUsagePercentage - ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage (Linux only)
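The two percentage metrics above reduce to simple ratios. The sketch below shows the arithmetic only and makes assumptions the text does not state: that CPU usage is CPU time consumed divided by wall-clock time elapsed, and that both memory sizes are given in bytes. The function names reuse the metric names for readability; they are not library functions.

```cpp
#include <cassert>

// Share of one core used: CPU time consumed divided by wall-clock time elapsed.
double cpuUsedPercentage(double cpuSecondsUsed, double intervalSeconds) {
  assert(intervalSeconds > 0.0 && "interval must be positive");
  return 100.0 * cpuSecondsUsed / intervalSeconds;
}

// Resident set size as a percentage of the machine's physical memory.
double memoryUsagePercentage(unsigned long long rssBytes, unsigned long long physicalBytes) {
  assert(physicalBytes > 0 && "physical memory size must be positive");
  return 100.0 * static_cast<double>(rssBytes) / static_cast<double>(physicalBytes);
}
```

Because the CPU figure is relative to a single core, a multi-threaded process can exceed 100 under this definition.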

Automatic metric updates

Sometimes it's necessary to provide a value at exact, regular time intervals (even if the value does not change). This can be done using AutoPushMetric.

ComplexMetric& metric = monitoring->getAutoPushMetric("exampleMetric");
metric = 10;

See how it works in the example: examples/11-AutoUpdate.cxx.

Regex verbosity policy

Override metric verbosities using a regular expression:

Metric::setVerbosityPolicy(Verbosity verbosity, const std::regex& regex)
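Conceptually, such a policy replaces the verbosity of every metric whose name matches the pattern. The sketch below illustrates this with the standard <regex> header; it is not the library's implementation. VerbosityPolicy and effectiveVerbosity are hypothetical names, and matching the whole name with std::regex_match (rather than searching for a substring) is an assumption.

```cpp
#include <regex>
#include <string>

enum class Verbosity { Debug, Info, Prod };

// Hypothetical policy: a pattern and the verbosity forced on matching metric names.
struct VerbosityPolicy {
  std::regex pattern;
  Verbosity verbosity;
};

// Returns the policy's verbosity when the whole name matches, else the metric's own.
Verbosity effectiveVerbosity(const std::string& name, Verbosity own,
                             const VerbosityPolicy& policy) {
  return std::regex_match(name, policy.pattern) ? policy.verbosity : own;
}
```

A pattern such as cpu.* would then promote or demote every metric whose name starts with cpu, leaving all other metrics untouched.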

System monitoring, server-side backends installation and configuration

This guide explains manual installation. For Ansible deployment, see the AliceO2Group/system-configuration GitLab repo.