Running TTN Prometheus Exporter as systemd Service

Background

In previous posts we presented a Prometheus exporter for monitoring TheThingsNetwork gateways. The exporter can be executed directly in a terminal. However, to use it productively, it should run in the background as a service. This article demonstrates how this can be achieved using systemd, the system and service manager on Linux systems.

Preparation

The first step is to prepare the environment where the Prometheus exporter is installed. Throughout this article we will use the directory /usr/local/bin/prometheus-ttn-exporter. However, any other directory in the local file system is also possible.

Executing the exporter as root is possible, but not ideal. To reduce the privileges as far as possible, it is better to create a dedicated system user and group. In our example both are called ttn. We need to ensure that the directory as well as all contained files are owned by the ttn user and group.
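
On a typical Linux system this could, for example, be done with the following commands executed as root (a sketch; tool names and the path of the nologin shell may differ between distributions):

mkdir -p /usr/local/bin/prometheus-ttn-exporter
groupadd --system ttn
useradd --system --gid ttn --home-dir /usr/local/bin/prometheus-ttn-exporter --shell /usr/sbin/nologin ttn
chown -R ttn:ttn /usr/local/bin/prometheus-ttn-exporter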

Inside the directory we need to place the files ttn_gateway_exporter.py and requirements.txt, both as provided in the GitHub repository. To isolate the exporter and its dependencies from the rest of the system, we will use a Python virtual environment (venv). Please note that with this approach updates are not installed automatically when the rest of the system is updated; you must update the Python venv yourself. The venv can be prepared by executing the following commands as the ttn user.

virtualenv --python=python3 venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
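
As noted above, the venv has to be kept up to date manually. When the dependencies need to be updated later, this can be done in the same way, again as the ttn user, for example:

source venv/bin/activate
pip install --upgrade -r requirements.txt
deactivate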

With this the environment is prepared and the systemd service can be created.

Systemd

Systemd services are defined in text files. A common location for these service files is the directory /etc/systemd/system/. For our example we will create the service file /etc/systemd/system/ttn.service. The minimal content of this file is as follows.

[Unit]
Description=ttn prometheus exporter
Requires=network-online.target
After=network-online.target

[Service]
User=ttn
Group=ttn

ExecStart=/usr/local/bin/prometheus-ttn-exporter/venv/bin/python /usr/local/bin/prometheus-ttn-exporter/ttn_gateway_exporter.py --listen :9715 --key API_KEY

[Install]
WantedBy=multi-user.target

The downside of this minimal example is that the TheThingsNetwork API key is stored directly in the service file, so every user who is allowed to read the service file can access the API key. Depending on your deployment scenario this might be acceptable. If not, another option is to store the API key in a separate file inside the exporter directory and to restrict the file permissions appropriately.
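
One possible sketch of this approach is to put the key into a separate environment file that only the ttn user can read (the file name ttn.env is arbitrary) and to reference it from the unit via systemd's EnvironmentFile directive:

# /usr/local/bin/prometheus-ttn-exporter/ttn.env, owned by ttn, mode 600
TTN_API_KEY=API_KEY

# changed lines in /etc/systemd/system/ttn.service
[Service]
EnvironmentFile=/usr/local/bin/prometheus-ttn-exporter/ttn.env
ExecStart=/usr/local/bin/prometheus-ttn-exporter/venv/bin/python /usr/local/bin/prometheus-ttn-exporter/ttn_gateway_exporter.py --listen :9715 --key ${TTN_API_KEY}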

The last step is to reload the systemd unit files and start the new ttn service by executing:

systemctl daemon-reload
systemctl start ttn.service

You can now verify that the Prometheus exporter is running by opening http://localhost:9715 in a browser.
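
To have the service start automatically at boot and to inspect its state and logs, the usual systemd commands can be used:

systemctl enable ttn.service
systemctl status ttn.service
journalctl -u ttn.service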

The Things Stack Gateway Monitoring

Background

In previous posts we showed how the status of TheThingsNetwork gateways can be monitored. With the switch of the underlying infrastructure to TheThingsStack this becomes even easier, because the API allows direct access to the relevant gateway statistics. It is now also possible to authenticate with a (personal) API key, so the login procedure also becomes easier to handle.

API Key

The first step is to create an API key with the appropriate permissions. This can be done in the console of TheThingsNetwork by clicking on your username in the upper right corner and selecting the option Personal API Keys. You have to create a key with at least the permissions view gateway status and list gateways the user is a collaborator of.
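
To check that the key works and has sufficient permissions, you can, for example, query the gateway list directly; this is the same endpoint the exporter uses:

curl -H "Authorization: Bearer API_KEY" https://eu1.cloud.thethings.network/api/v3/gateways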

Prometheus TTN Gateway Exporter

The changes to the Prometheus exporter are minimal, because authentication becomes easier with the API key. The main difference is the set of API calls compared to the previous version. Apart from that, the points mentioned in the previous article still apply. The code on GitHub is already updated.

import signal
import sys
import threading

import requests
from absl import app
from absl import flags
from absl import logging
from cachetools import cached, TTLCache
from prometheus_client import start_wsgi_server, Gauge

FLAGS = flags.FLAGS
flags.DEFINE_string('listen', ':9714', 'Address:port to listen on')
flags.DEFINE_string('key', None, 'API key')
flags.DEFINE_bool('verbose', False, 'Enable verbose logging')

exit_app = threading.Event()

# Cache API responses for a few seconds so that a single Prometheus scrape
# does not trigger one TTN API request per exposed metric.
cache = TTLCache(maxsize=200, ttl=10)


@cached(cache)
def get_gateway_stats(gateway_id):
    session = requests.Session()
    header = {'Authorization': 'Bearer ' + FLAGS.key}
    res = session.get('https://eu1.cloud.thethings.network/api/v3/gs/gateways/%s/connection/stats' % gateway_id, headers=header)
    return res.json()


@cached(cache)
def get_gateway_ids():
    session = requests.Session()
    header = {'Authorization': 'Bearer ' + FLAGS.key}
    res = session.get('https://eu1.cloud.thethings.network/api/v3/gateways', headers=header)
    return [gateway['ids']['gateway_id'] for gateway in res.json()['gateways']]


def collect_metrics(gateway_id, metric) -> int:
    gateway_stats = get_gateway_stats(gateway_id)
    if metric in gateway_stats:
        return int(gateway_stats[metric])
    return 0


def prepare_metrics():
    logging.debug('prepare metrics')
    for metric in ['uplink_count', 'downlink_count']:
        gauge = Gauge('ttn_gateway_messages_%s' % metric, 'Number of %s messages' % metric, labelnames=['gateway_id'])
        for gateway_id in get_gateway_ids():
            # Bind gateway_id and metric as default arguments so that each
            # label set gets its own callback instead of sharing the loop variables.
            gauge.labels(gateway_id=gateway_id).set_function(lambda i=gateway_id, m=metric: collect_metrics(i, m))


def quit_app(unused_signo, unused_frame):
    exit_app.set()


def main(unused_argv):
    if FLAGS.verbose:
        logging.set_verbosity(logging.DEBUG)
    if FLAGS.key is None:
        logging.error('Provide API key!')
        sys.exit(-1)

    prepare_metrics()

    address, port = FLAGS.listen.rsplit(':', 1)
    start_wsgi_server(port=int(port), addr=address)
    logging.info(f'Listening on {FLAGS.listen}')
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, quit_app)
    exit_app.wait()


if __name__ == '__main__':
    app.run(main)

To execute the exporter you need to pass the API key from the previous step, e.g. by executing python ttn_gateway_exporter.py --key API_KEY.
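
To actually collect the exposed metrics, Prometheus needs a scrape job pointing at the exporter. A minimal example, assuming the exporter runs on the same host with the default port 9714 and a freely chosen job name, could look like this:

scrape_configs:
  - job_name: 'ttn_gateways'
    static_configs:
      - targets: ['localhost:9714']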

Improved TheThingsNetwork Gateway Monitoring

In a previous post we showed how TTN gateways can be monitored using Prometheus. However, the presented solution had some limitations; for example, it was only possible to monitor a single gateway. Also, the code is now available on GitHub.

To support multiple gateways, the output format has been changed slightly. The metrics for the different gateways can now be filtered by the label gateway_id.

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 48.0
python_gc_objects_collected_total{generation="1"} 344.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 59.0
python_gc_collections_total{generation="1"} 5.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="9",patchlevel="1",version="3.9.1"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.87236352e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.1379456e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.60917394427e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.13999999999999999
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0
# HELP ttn_gateway_messages_uplink Number of uplink messages
# TYPE ttn_gateway_messages_uplink gauge
ttn_gateway_messages_uplink{gateway_id="eui-1"} 10.0
ttn_gateway_messages_uplink{gateway_id="eui-2"} 20.0
# HELP ttn_gateway_messages_downlink Number of downlink messages
# TYPE ttn_gateway_messages_downlink gauge
ttn_gateway_messages_downlink{gateway_id="eui-1"} 1.0
ttn_gateway_messages_downlink{gateway_id="eui-2"} 2.0
# HELP ttn_gateway_messages_rx_ok Number of rx_ok messages
# TYPE ttn_gateway_messages_rx_ok gauge
ttn_gateway_messages_rx_ok{gateway_id="eui-1"} 10.0
ttn_gateway_messages_rx_ok{gateway_id="eui-2"} 20.0
# HELP ttn_gateway_messages_tx_in Number of tx_in messages
# TYPE ttn_gateway_messages_tx_in gauge
ttn_gateway_messages_tx_in{gateway_id="eui-1"} 1.0
ttn_gateway_messages_tx_in{gateway_id="eui-02c10afffe41d2a3"} 2.0

TheThingsNetwork Gateway Monitoring

Background

In a previous post we showed how to set up a new TheThingsNetwork gateway. After successfully building a gateway and connecting it to TTN, one of the most interesting pieces of information for gateway operators is how frequently the gateway is used. This information is available on the TheThingsNetwork Console, but of course we want to access this data via an API so that we can use it in multiple ways. One option would be to create a Grafana dashboard and display the transmitted and received messages. TTN provides a public API, but it does not include the interesting information.

Reverse Engineering the TTN API

If we take a closer look at the requests the TTN console performs to display the number of transmitted messages, we can identify the following requests.

POST https://account.thethingsnetwork.org/api/v2/users/login

In this request we have to provide our credentials in the body as {"username": "user", "password": "pass"}. After a successful response we receive one session cookie for thethingsnetwork.org and two cookies for account.thethingsnetwork.org. With these cookies we can perform the second request.

GET https://console.thethingsnetwork.org

With this request we obtain three cookies for console.thethingsnetwork.org and we can directly perform the next request.

GET https://console.thethingsnetwork.org/refresh

This request provides the required JWT access token and its expiration date in its response. Using the token we obtain the gateway information in our last request.

GET https://console.thethingsnetwork.org/api/gateways

We have to use the access token from the previous request as a bearer token and obtain the following JSON object.

[
    {
        "id": "eui-123",
        "activated": false,
        "frequency_plan": "EU_863_870",
        "frequency_plan_url": "https://account.thethingsnetwork.org/api/v2/frequency-plans/EU_863_870",
        "auto_update": false,
        "location_public": true,
        "status_public": true,
        "owner_public": false,
        "antenna_location": {
            "longitude": 9.0,
            "latitude": 48.0,
            "altitude": 0
        },
        "collaborators": [
            {
                "username": "username",
                "rights": [
                    "gateway:settings",
                    "gateway:collaborators",
                    "gateway:status",
                    "gateway:delete",
                    "gateway:location",
                    "gateway:owner",
                    "gateway:messages"
                ]
            }
        ],
        "key": "ttn-account-key",
        "attributes": {
            "brand": "Multi-channel Raspberry Pi gateway",
            "model": "Raspberry Pi with IMST iC880A",
            "placement": "indoor",
            "description": "TTN Gateway"
        },
        "router": {
            "id": "ttn-router-eu",
            "address": "eu.thethings.network:1901",
            "mqtt_address": "mqtts://bridge.eu.thethings.network:8882"
        },
        "fallback_routers": [
            {
                "id": "ttn-router-asia-se",
                "address": "asia-se.thethings.network:1901",
                "mqtt_address": "mqtts://bridge.asia-se.thethings.network"
            },
            {
                "id": "ttn-router-us-west",
                "address": "us-west.thethings.network:1901",
                "mqtt_address": "mqtts://bridge.us-west.thethings.network"
            },
            {
                "id": "ttn-router-brazil",
                "address": "brazil.thethings.network:1901",
                "mqtt_address": "mqtts://bridge.brazil.thethings.network"
            }
        ],
        "beta_updates": false,
        "owner": {
            "id": "",
            "username": ""
        },
        "rights": null,
        "status": {
            "timestamp": "2020-12-14T20:05:14.926987683Z",
            "uplink": 30,
            "downlink": 8,
            "location": {},
            "gps": {},
            "time": 1607976314926987683,
            "rx_ok": 30,
            "tx_in": 8
        }
    }
]

The interesting information is the status section at the bottom of the JSON response, which shows the total number of uplink and downlink messages handled by the gateway.

Prometheus TTN Gateway Exporter

Using the information we have gathered so far, we can build a simple Prometheus exporter written in Python. The exporter performs the requests shown in the previous section and exposes the number of uplink and downlink messages. For this, a web server provided by the prometheus_client library is used. This library also allows us to expose the TTN gateway statistics in a format that can directly be scraped by Prometheus. The source is shown below. Please note that the implementation has several limitations; e.g. only one gateway is supported at the moment.

import datetime
import random
import signal
import threading

import requests
from absl import app
from absl import flags
from absl import logging
from cachetools import cached, TTLCache
from prometheus_client import start_wsgi_server, Gauge

FLAGS = flags.FLAGS
flags.DEFINE_string('listen', ':9714', 'Address:port to listen on')
flags.DEFINE_string('username', None, 'Username to authenticate with')
flags.DEFINE_string('password', None, 'Password to authenticate with')
flags.DEFINE_bool('verbose', False, 'Enable verbose logging')

exit_app = threading.Event()

TOKEN = None
EXPIRES = None


def get_token(session):
    global TOKEN
    global EXPIRES
    logging.debug('get_token')
    now = datetime.datetime.now()
    if TOKEN and EXPIRES and EXPIRES > now:
        logging.debug('reuse existing token')
        return TOKEN
    else:
        logging.debug('get new token')
        login = {'username': FLAGS.username, 'password': FLAGS.password}
        res = session.post('https://account.thethingsnetwork.org/api/v2/users/login', data=login)
        res = session.get('https://console.thethingsnetwork.org')
        res = session.get('https://console.thethingsnetwork.org/refresh')
        json = res.json()
        TOKEN = json['access_token']
        EXPIRES = datetime.datetime.fromtimestamp(json['expires'] / 1000)
        return TOKEN


cache = TTLCache(maxsize=200, ttl=10)


def hashkey(*args, **kwargs):
    return args[0]


@cached(cache, key=hashkey)
def collect_metrics(metric):
    logging.debug('collect_metrics %s' % metric)
    session = requests.Session()
    local_token = get_token(session)
    header = {'Authorization': 'Bearer ' + local_token}
    res = session.get('https://console.thethingsnetwork.org/api/gateways', headers=header)
    gateways = res.json()
    gateway = gateways[0]
    if metric == 'uplink':
        return gateway['status']['uplink']
    elif metric == 'downlink':
        return gateway['status']['downlink']

    # Fallback for unknown metric names (placeholder value).
    return random.random()


def prepare_metrics():
    logging.debug('prepare metrics')
    for metric in ['uplink', 'downlink']:
        g = Gauge('ttn_gateway_%s' % metric, 'Number of %s messages processed by the gateway' % metric)
        g.set_function(lambda m=metric: collect_metrics(m))


def quit_app(unused_signo, unused_frame):
    exit_app.set()


def main(unused_argv):
    if FLAGS.verbose:
        logging.set_verbosity(logging.DEBUG)
    if FLAGS.username is None or FLAGS.password is None:
        logging.error('Provide username and password!')
        exit(-1)

    prepare_metrics()

    address, port = FLAGS.listen.rsplit(':', 1)
    start_wsgi_server(port=int(port), addr=address)
    logging.info(f'Listening on {FLAGS.listen}')
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, quit_app)
    exit_app.wait()


if __name__ == '__main__':
    app.run(main)

To execute the Python script a few requirements have to be met. These are:

absl-py
requests
prometheus_client
cachetools
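
These can be installed, for example, with pip:

pip install absl-py requests prometheus_client cachetools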

After installing the requirements, the script can be executed with python ttn-gateway-exporter.py --username user --password pass. By default it listens on port 9714 on all network interfaces of the local machine. You are now able to access the metrics at http://localhost:9714. Besides the TTN statistics, named ttn_gateway_uplink and ttn_gateway_downlink, the Prometheus Python library also returns information about the process itself. An example output is shown below.

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 7468.0
python_gc_objects_collected_total{generation="1"} 3830.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 83.0
python_gc_collections_total{generation="1"} 7.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="7",patchlevel="3",version="3.7.3"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 8.8424448e+07
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 3.2833536e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.60797857466e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3364.25
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 24.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024.0
# HELP ttn_gateway_uplink Number of uplink messages processed by the gateway
# TYPE ttn_gateway_uplink gauge
ttn_gateway_uplink 33.0
# HELP ttn_gateway_downlink Number of downlink messages processed by the gateway
# TYPE ttn_gateway_downlink gauge
ttn_gateway_downlink 9.0

Outlook

In the future an improved version of the Prometheus exporter might be provided. For now just use the shared code snippets and adapt them to your needs.