Prometheus Exporters and NRPE Checks for WikiOasis

Metrics and health checks in WikiOasis are distributed across every server in the fleet. Prometheus exporters expose machine- and service-level metrics that the monitoring server scrapes; NRPE (Nagios Remote Plugin Executor) agents run active checks that Icinga2 polls. Both systems are wired up automatically via top.sls — role-specific exporters and checks are applied to matching host groups without any manual targeting.

Prometheus Exporters

Seven exporter types expose metrics on fixed ports. The monitoring server’s file_sd targets are auto-generated from the dns_hosts pillar.

NRPE Checks

The base agent (monitoring.nrpe), nrpe_common and nrpe_salt run on every host; role-specific check states add service-level probes on top.

Prometheus exporters overview

Exporter state	Port	Deployed to	Package / binary
`monitoring.node_exporter`	9100	All servers (`*`)	`prometheus-node-exporter`
`monitoring.mysqld_exporter`	9104	`db*`	`prometheus-mysqld-exporter`
`monitoring.haproxy_exporter`	9101	`proxy*`	`prometheus-haproxy-exporter`
`monitoring.redis_exporter`	9121	`redis*`	`prometheus-redis-exporter`
`monitoring.statsd_exporter`	9102	`monitoring*`	Binary from GitHub releases
`monitoring.phpfpm_exporter`	9253	`apps*`, `mw*`, `staging*`, `task*`	Binary from GitHub releases
`monitoring.opensearch_exporter`	9114	`opensearch*`	`prometheus-elasticsearch-exporter`

phpfpm_exporter and statsd_exporter are not available in the Debian apt repository and are installed by extracting upstream release archives. Their systemd unit files are managed directly by Salt.
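The file_sd generation mentioned above pairs each exporter's port with the host globs in the table. The Python sketch below is an illustration, not the actual WikiOasis template. It assumes dns_hosts is a flat mapping of hostname to IP address, and the `exporter` label name is hypothetical.

```python
import json
from fnmatch import fnmatch

# Exporter -> (port, host globs), taken from the table above.
EXPORTERS = {
    "node": (9100, ["*"]),
    "mysqld": (9104, ["db*"]),
    "haproxy": (9101, ["proxy*"]),
    "redis": (9121, ["redis*"]),
    "statsd": (9102, ["monitoring*"]),
    "phpfpm": (9253, ["apps*", "mw*", "staging*", "task*"]),
    "opensearch": (9114, ["opensearch*"]),
}


def build_file_sd(dns_hosts):
    """Build Prometheus file_sd target groups, one per exporter.

    ASSUMPTION: dns_hosts maps hostname -> IP address. Only the hostnames
    are used here, so they must resolve from the monitoring server.
    """
    groups = []
    for name, (port, globs) in EXPORTERS.items():
        targets = sorted(
            f"{host}:{port}"
            for host in dns_hosts
            if any(fnmatch(host, pattern) for pattern in globs)
        )
        if targets:
            # "exporter" is an illustrative label, not taken from the real template.
            groups.append({"targets": targets, "labels": {"exporter": name}})
    return groups


def write_file_sd(path, dns_hosts):
    """Write the target groups as the JSON file that Prometheus watches."""
    with open(path, "w") as fh:
        json.dump(build_file_sd(dns_hosts), fh, indent=2)
```

Prometheus re-reads file_sd files when they change, so regenerating the file is enough to add or remove targets without restarting Prometheus.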

node_exporter (port 9100)

monitoring/node_exporter.sls is applied to every server via top.sls ('*' matcher). It installs prometheus-node-exporter from the Debian repository and ensures the service is running and enabled.

node_exporter.sls

node_exporter_pkg:
  pkg.installed:
    - name: prometheus-node-exporter

prometheus-node-exporter:
  service.running:
    - enable: True
    - require:
      - pkg: node_exporter_pkg

Apply

salt '*' state.apply monitoring.node_exporter

mysqld_exporter (port 9104)

monitoring/mysqld_exporter.sls is applied to db* servers. It installs prometheus-mysqld-exporter and writes a .my.cnf credential file at /etc/prometheus/mysqld.my.cnf, owned root:prometheus with mode 0640. The exporter connects to MariaDB on 127.0.0.1:3306 as the prom_exporter user. The password is sourced from monitoring:mysqld_exporter_password in the private pillar.

/etc/prometheus/mysqld.my.cnf (rendered)

[client]
user     = prom_exporter
password = <mysqld_exporter_password>
host     = 127.0.0.1
port     = 3306

The ARGS variable in /etc/default/prometheus-mysqld-exporter points the exporter at that file:

/etc/default/prometheus-mysqld-exporter

ARGS="--config.my-cnf=/etc/prometheus/mysqld.my.cnf"
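The credential file has to contain the password yet stay unreadable by other users. The following Python sketch shows the same idea as the Salt state: render the [client] section and enforce mode 0640. It is illustrative only. The real state uses Salt's file management, and the chown to root:prometheus appears only as a comment because it requires root.

```python
import configparser
import os


def write_mysqld_cnf(path, password, user="prom_exporter", host="127.0.0.1", port=3306):
    """Render the [client] credential file and restrict it to mode 0640."""
    # interpolation=None so passwords containing "%" are written verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["client"] = {"user": user, "password": password, "host": host, "port": str(port)}
    # A newly created file gets restrictive permissions immediately.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
    with os.fdopen(fd, "w") as fh:
        cfg.write(fh)
    # The umask can only remove bits, and an existing file keeps its old
    # mode, so set 0640 explicitly.
    os.chmod(path, 0o640)
    # On the real host the owner would also be set (requires root):
    # os.chown(path, 0, grp.getgrnam("prometheus").gr_gid)
    return path
```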

Apply

salt 'db*' state.apply monitoring.mysqld_exporter

haproxy_exporter (port 9101)

monitoring/haproxy_exporter.sls is applied to proxy* servers. It installs prometheus-haproxy-exporter and configures it to scrape HAProxy metrics via the Unix stats socket at /run/haproxy/admin.sock. Because the socket has mode 660 and group haproxy, the state adds the prometheus system user to the haproxy group:

haproxy_exporter.sls (group membership)

prometheus_in_haproxy_group:
  user.present:
    - name: prometheus
    - groups:
      - haproxy
    - remove_groups: False

/etc/default/prometheus-haproxy-exporter

ARGS="--haproxy.scrape-uri=unix:/run/haproxy/admin.sock"

Apply

salt 'proxy*' state.apply monitoring.haproxy_exporter

redis_exporter (port 9121)

monitoring/redis_exporter.sls is applied to redis* servers. It installs prometheus-redis-exporter from the Debian repository and starts the service with no additional configuration — the exporter connects to Redis on localhost:6379 by default.

Apply

salt 'redis*' state.apply monitoring.redis_exporter

opensearch_exporter (port 9114)

monitoring/opensearch_exporter.sls is applied to opensearch* servers. It uses the prometheus-elasticsearch-exporter package (which is API-compatible with OpenSearch) and configures it to connect to the local OpenSearch instance on port 9200.

/etc/default/prometheus-elasticsearch-exporter

ARGS="--es.uri=http://localhost:9200"

Apply

salt 'opensearch*' state.apply monitoring.opensearch_exporter

phpfpm_exporter (port 9253)

monitoring/phpfpm_exporter.sls is applied to apps*, mw*, staging*, and task* servers. Because there is no Debian package, Salt downloads the upstream release archive from GitHub and installs it under /opt/phpfpm_exporter/, then creates a symlink at /usr/local/bin/prometheus-phpfpm-exporter.

phpfpm_exporter.sls (archive install)

phpfpm_exporter_binary:
  archive.extracted:
    - name: /opt/phpfpm_exporter
    - source: https://github.com/hipages/php-fpm_exporter/releases/download/v2.2.0/php-fpm_exporter_2.2.0_linux_amd64.tar.gz
    - source_hash: sha256=b1c207fcd89f9be20104fd90bc76b3c584987ea5a769c99d5759f79af8322449
    - if_missing: /opt/phpfpm_exporter/php-fpm_exporter
    - enforce_toplevel: False  # the binary sits at the archive root, not in a subdirectory

The systemd unit file is fully managed by Salt. The PHP version and socket path are resolved from the php:version pillar (defaulting to 8.3):

prometheus-phpfpm-exporter.service (rendered for PHP 8.3)

[Unit]
Description=Prometheus PHP-FPM Exporter
After=network.target php8.3-fpm.service

[Service]
User=www-data
ExecStart=/usr/local/bin/prometheus-phpfpm-exporter server \
  --phpfpm.fix-process-count \
  --phpfpm.scrape-uri "unix:///run/php/php8.3-fpm.sock;/status"
Restart=on-failure

[Install]
WantedBy=multi-user.target

Apply

salt -C 'apps* or mw* or staging* or task*' state.apply monitoring.phpfpm_exporter

statsd_exporter (port 9102)

monitoring/statsd_exporter.sls is applied only to monitoring* servers. Like phpfpm_exporter, there is no Debian package — Salt extracts the binary from the upstream GitHub release into /opt/statsd_exporter/ and symlinks it to /usr/local/bin/prometheus-statsd-exporter. A prometheus system user (no login shell, home /var/lib/prometheus) is created before the service starts:

statsd_exporter.sls (user creation)

prometheus_user:
  user.present:
    - name: prometheus
    - system: True
    - shell: /usr/sbin/nologin
    - home: /var/lib/prometheus
    - createhome: False

prometheus-statsd-exporter.service

[Unit]
Description=Prometheus StatsD Exporter
After=network.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/prometheus-statsd-exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
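The unit starts statsd_exporter with no flags, so it uses its upstream defaults. It serves Prometheus metrics on 9102 (the port in the table) and accepts StatsD traffic on port 9125. The Python sketch below shows an application emitting a StatsD counter to it over UDP. The metric name in the usage line is a placeholder.

```python
import socket


def send_statsd_counter(name, value=1, host="127.0.0.1", port=9125):
    """Send one StatsD counter line ("<name>:<value>|c") over UDP.

    9125 is statsd_exporter's upstream default ingest port. The exporter
    translates the counter and exposes it on :9102/metrics.
    """
    payload = f"{name}:{value}|c".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, `send_statsd_counter("example.requests", host="monitoring1")` would increment a hypothetical counter on a hypothetical monitoring host. UDP is fire-and-forget, so the sender never blocks on the exporter.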

Apply

salt 'monitoring*' state.apply monitoring.statsd_exporter

NRPE system

NRPE (Nagios Remote Plugin Executor) is the active-check agent that Icinga2 uses to run checks on remote hosts. The base agent (monitoring.nrpe) is installed on every server, then role-specific drop-in states add check definitions to /etc/nagios/nrpe.d/.

Base NRPE agent (`monitoring.nrpe`)

monitoring/nrpe/init.sls installs four packages and writes the main nrpe.cfg:

nagios-nrpe-server — the NRPE daemon itself
monitoring-plugins-basic — standard check plugins
monitoring-plugins-standard
monitoring-plugins-contrib

The nrpe.cfg template dynamically builds the allowed_hosts list by filtering the dns_hosts pillar for hosts whose names start with monitoring. All drop-in check definitions are loaded from include_dir=/etc/nagios/nrpe.d.

nrpe.cfg (rendered, excerpt)

server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,::1,10.0.0.5

command[check_load]=/usr/lib/nagios/plugins/check_load -r -w 0.80,0.80,0.80 -c 1.00,1.00,1.00
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_disk_srv]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /srv
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 700 -c 1000
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 40% -c 20%
include_dir=/etc/nagios/nrpe.d
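The allowed_hosts value in the excerpt is built from dns_hosts. The real logic lives in the nrpe.cfg Jinja template. The Python sketch below mirrors it under the assumption that dns_hosts maps hostnames to IP addresses.

```python
def build_allowed_hosts(dns_hosts, prefix="monitoring"):
    """Return the allowed_hosts string: loopback plus every monitoring host's IP.

    ASSUMPTION: dns_hosts maps hostname -> IP address. Loopback addresses
    are always included, matching the rendered excerpt above.
    """
    monitors = sorted(ip for host, ip in dns_hosts.items() if host.startswith(prefix))
    return ",".join(["127.0.0.1", "::1", *monitors])
```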

Apply

salt '*' state.apply monitoring.nrpe

nrpe_common — checks on all servers

monitoring/nrpe_common.sls deploys two drop-in check definitions to every server:

Check command	Plugin	Description
`check_mem`	`check_mem.sh`	Memory usage — warn at 95%, crit at 100%
`check_apt`	`check_apt` (standard plugin)	Pending package upgrades

/etc/nagios/nrpe.d/mem.cfg

command[check_mem]=/usr/lib/nagios/plugins/check_mem.sh 95 100
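The contents of check_mem.sh are not shown here. The Python sketch below illustrates one way to implement the "warn at 95%, crit at 100%" logic. It assumes that "used" means MemTotal minus MemAvailable from /proc/meminfo, and it follows the Nagios plugin convention of exit codes 0/1/2 plus performance data after a "|".

```python
import re

OK, WARNING, CRITICAL = 0, 1, 2


def mem_used_percent(meminfo_text):
    """Percentage of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    total, available = fields["MemTotal"], fields["MemAvailable"]
    return 100.0 * (total - available) / total


def check_mem(meminfo_text, warn=95, crit=100):
    """Return (Nagios exit code, status line) for the given /proc/meminfo text."""
    used = mem_used_percent(meminfo_text)
    if used >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif used >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"MEM {label} - {used:.1f}% used | mem_used={used:.1f}%;{warn};{crit}"
```

With crit=100 this sketch reports CRITICAL only once MemAvailable reaches zero, so in practice the WARNING state carries most of the signal.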

Apply

salt '*' state.apply monitoring.nrpe_common

nrpe_salt — salt-minion check (all servers)

monitoring/nrpe_salt.sls deploys check_systemd_service.sh (a generic systemd unit status script) to all servers and registers check_salt_minion:

/etc/nagios/nrpe.d/salt_minion.cfg

command[check_salt_minion]=/usr/lib/nagios/plugins/check_systemd_service.sh salt-minion
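check_systemd_service.sh is described only as a generic systemd unit status script. The Python sketch below shows the kind of logic such a check typically performs. It assumes the check is based on `systemctl is-active`, and the mapping of transitional states to WARNING is a choice made for this illustration.

```python
import subprocess

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(unit, state):
    """Map a `systemctl is-active` state string to (Nagios code, message)."""
    if state == "active":
        return OK, f"OK - {unit} is active"
    if state in ("activating", "deactivating", "reloading"):
        return WARNING, f"WARNING - {unit} is {state}"
    if state in ("inactive", "failed"):
        return CRITICAL, f"CRITICAL - {unit} is {state}"
    return UNKNOWN, f"UNKNOWN - {unit} reported {state!r}"


def check_systemd_service(unit):
    # `systemctl is-active` prints the state on stdout and exits non-zero for
    # anything other than "active", so the exit code is deliberately ignored.
    result = subprocess.run(["systemctl", "is-active", unit], capture_output=True, text=True)
    return classify(unit, result.stdout.strip())
```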

Apply

salt '*' state.apply monitoring.nrpe_salt

nrpe_salt_master — salt-master check (salt* servers)

monitoring/nrpe_salt_master.sls is applied only to salt* servers. It adds the check_salt_master command using the same check_systemd_service.sh script deployed by nrpe_salt, so nrpe_salt must have installed the script before this drop-in is written.

/etc/nagios/nrpe.d/salt_master.cfg

command[check_salt_master]=/usr/lib/nagios/plugins/check_systemd_service.sh salt-master

Apply

salt 'salt*' state.apply monitoring.nrpe_salt_master

nrpe_haproxy — HAProxy backend checks (proxy* servers)

monitoring/nrpe_haproxy.sls adds two custom HAProxy check scripts and their NRPE drop-in, /etc/nagios/nrpe.d/haproxy.cfg. The nagios user is added to the haproxy group so it can read the stats socket (/run/haproxy/admin.sock, mode 660 haproxy:haproxy).

Check command	Script	Description
`check_haproxy`	`check_haproxy.sh`	Overall HAProxy health via stats socket
`check_haproxy_backends`	`check_haproxy_backends.sh`	Status of all configured backends
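Both checks are shell scripts whose contents are not shown. The following Python sketch illustrates how the backend status can be read from the same stats socket. It relies on HAProxy's runtime API `show stat` command, which returns CSV whose header line starts with "# " and includes the pxname, svname and status columns. Treating any backend that is not UP as a problem is an assumption made for this sketch.

```python
import csv
import io
import socket


def query_stats_socket(path="/run/haproxy/admin.sock"):
    """Send `show stat` to the HAProxy runtime API and return the CSV text."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        # HAProxy closes the connection after answering a single command.
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    return b"".join(chunks).decode()


def down_backends(csv_text):
    """Return the names of BACKEND rows whose status is not UP."""
    # The header line starts with "# ", which must be stripped for DictReader.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    return [
        row["pxname"]
        for row in reader
        if row["svname"] == "BACKEND" and row["status"] != "UP"
    ]
```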

Apply

salt 'proxy*' state.apply monitoring.nrpe_haproxy

nrpe_mediawiki — MediaWiki HTTP health check (mw, staging)

monitoring/nrpe_mediawiki.sls deploys check_mediawiki.sh and registers the check_mediawiki command. The script performs an HTTP health check against the local MediaWiki installation.
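The details of check_mediawiki.sh (URL, expected content) are not documented here. The Python sketch below shows a minimal HTTP health check of the same kind under two assumptions: "healthy" means HTTP 200, and the caller supplies a local URL such as the hypothetical `http://localhost/wiki/Main_Page`.

```python
import urllib.error
import urllib.request

OK, CRITICAL = 0, 2

# Bypass any http_proxy/https_proxy environment settings: this is a local check.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_http(url, timeout=10.0):
    """Return (Nagios code, message) for a plain "HTTP 200 means healthy" check."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise HTTPError
        code = exc.code
    except OSError as exc:  # URLError, refused connections, timeouts
        return CRITICAL, f"CRITICAL - {url} unreachable: {exc}"
    if code == 200:
        return OK, f"OK - {url} returned HTTP 200"
    return CRITICAL, f"CRITICAL - {url} returned HTTP {code}"
```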

Apply

salt -C 'mw* or staging*' state.apply monitoring.nrpe_mediawiki

nrpe_metal — RAID and SMART disk checks (metal* servers)

monitoring/nrpe_metal.sls handles bare-metal disk monitoring. It installs smartmontools, creates a sudoers entry allowing nagios to run smartctl without a password, and deploys two check scripts:

Check command	Script	Description
`check_smart`	`check_smart.sh`	SMART disk health
`check_raid`	`check_raid.sh`	Software RAID array status
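check_raid.sh is not reproduced here. Assuming "software RAID" means Linux md arrays, a RAID check usually reads /proc/mdstat and looks for arrays with fewer active members than expected (for example `[2/1] [U_]`). The Python sketch below illustrates that parsing.

```python
import re

OK, CRITICAL = 0, 2


def degraded_arrays(mdstat_text):
    """Return md arrays whose member map (e.g. [U_]) shows a missing disk."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


def check_raid(mdstat_text):
    """Return (Nagios code, message) for the given /proc/mdstat text."""
    bad = degraded_arrays(mdstat_text)
    if bad:
        return CRITICAL, "CRITICAL - degraded: " + ", ".join(bad)
    return OK, "OK - all md arrays healthy"
```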

/etc/sudoers.d/nagios-smartctl

nagios ALL=(root) NOPASSWD: /usr/sbin/smartctl

Apply

salt 'metal*' state.apply monitoring.nrpe_metal

nrpe_nginx — Nginx error log check (nginx servers)

monitoring/nrpe_nginx.sls adds the nagios user to the adm group (which has read access to /var/log/nginx/) and deploys check_nginx_errors.sh along with its NRPE drop-in, /etc/nagios/nrpe.d/nginx.cfg.

Check command	Script	Description
`check_nginx_errors`	`check_nginx_errors.sh`	Nginx error log rate check

Applied to apps*, mw*, staging*, task*, and monitoring* servers (any host running Nginx).
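The thresholds and time window used by check_nginx_errors.sh are not documented here. The Python sketch below illustrates a rate check over the standard Nginx error log format (`YYYY/MM/DD HH:MM:SS [level] ...`). The five-minute window and the warn=10/crit=50 thresholds are placeholder values.

```python
import re
from datetime import datetime, timedelta

OK, WARNING, CRITICAL = 0, 1, 2
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\]")
SEVERE = {"error", "crit", "alert", "emerg"}


def recent_error_count(lines, now, window=timedelta(minutes=5)):
    """Count error-or-worse entries whose timestamp is within `window` of `now`."""
    cutoff = now - window
    count = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group(2) not in SEVERE:
            continue
        stamp = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        if stamp >= cutoff:
            count += 1
    return count


def check_nginx_errors(lines, now, window=timedelta(minutes=5), warn=10, crit=50):
    """Return (Nagios code, message); warn/crit are hypothetical thresholds."""
    n = recent_error_count(lines, now, window)
    status = CRITICAL if n >= crit else WARNING if n >= warn else OK
    minutes = int(window.total_seconds() // 60)
    return status, f"{n} error-level entries in the last {minutes} min"
```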

Apply

salt -C 'apps* or mw* or staging* or task* or monitoring*' state.apply monitoring.nrpe_nginx

nrpe_opensearch — OpenSearch cluster health (opensearch* servers)

monitoring/nrpe_opensearch.sls deploys check_opensearch.sh and the opensearch.cfg drop-in.

Check command	Script	Description
`check_opensearch`	`check_opensearch.sh`	OpenSearch cluster health (green/yellow/red)
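OpenSearch reports cluster health through the `_cluster/health` endpoint, whose JSON body has a `status` field of green, yellow or red. The Python sketch below maps that status to Nagios codes. It assumes plain HTTP on localhost:9200, as the exporter's --es.uri above does. Mapping yellow to WARNING is this sketch's choice; the real check_opensearch.sh may differ.

```python
import json
import urllib.request

STATUS_MAP = {"green": (0, "OK"), "yellow": (1, "WARNING"), "red": (2, "CRITICAL")}


def classify_health(body):
    """Map a _cluster/health JSON body (str or bytes) to (Nagios code, message)."""
    health = json.loads(body)
    status = health.get("status", "unknown")
    code, label = STATUS_MAP.get(status, (3, "UNKNOWN"))
    return code, f"{label} - cluster {health.get('cluster_name', '?')} is {status}"


def check_opensearch(base_url="http://localhost:9200", timeout=10.0):
    """Fetch cluster health from a local node and classify it."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return classify_health(resp.read())
```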

Apply

salt 'opensearch*' state.apply monitoring.nrpe_opensearch

nrpe_php — PHP-FPM pool and error log checks (apps, mw, staging, task)

monitoring/nrpe_php.sls installs libfcgi-bin (required to query the FPM status page via FastCGI), adds nagios to the adm group for log access, and deploys two check scripts with templated .cfg files (the PHP version and pool name are resolved from the php pillar).

Check command	Script	Description
`check_php_fpm`	`check_php_fpm.sh`	PHP-FPM pool status page (process count, queue depth)
`check_php_errors`	`check_php_errors.sh`	PHP error log rate
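libfcgi-bin provides `cgi-fcgi`, which talks FastCGI directly to the FPM socket. The Python sketch below illustrates how the pool status page can be fetched and evaluated with it. It uses the same socket and `/status` path as the exporter unit above (PHP 8.3). The listen-queue thresholds are hypothetical, and the real check_php_fpm.sh may evaluate different fields.

```python
import os
import subprocess

OK, WARNING, CRITICAL = 0, 1, 2


def fetch_status(socket_path="/run/php/php8.3-fpm.sock"):
    """Query the FPM status page over FastCGI using cgi-fcgi (from libfcgi-bin)."""
    env = dict(os.environ, SCRIPT_NAME="/status", SCRIPT_FILENAME="/status", REQUEST_METHOD="GET")
    result = subprocess.run(
        ["cgi-fcgi", "-bind", "-connect", socket_path],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_status(text):
    """Parse "key: value" lines (response headers are parsed too, harmlessly)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_php_fpm(text, queue_warn=5, queue_crit=20):
    """Return (Nagios code, message) based on the pool's listen queue."""
    fields = parse_status(text)
    queue = int(fields["listen queue"])
    active = fields.get("active processes", "?")
    status = CRITICAL if queue >= queue_crit else WARNING if queue >= queue_warn else OK
    return status, f"listen queue={queue}, active processes={active}"
```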

Apply

salt -C 'apps* or mw* or staging* or task*' state.apply monitoring.nrpe_php

nrpe_redis — Redis ping check (redis* servers)

monitoring/nrpe_redis.sls deploys check_redis.sh and the redis.cfg drop-in, rendered from the redis.cfg.jinja template with the configured Redis port or socket.

Check command	Script	Description
`check_redis`	`check_redis.sh`	Redis PING/PONG health check
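A PING/PONG check needs only a single round trip in the Redis protocol (RESP): send `PING` as a one-element array and expect the simple-string reply `+PONG`. The Python sketch below illustrates this. It assumes Redis on 127.0.0.1:6379 with no password; an instance that requires AUTH would answer with an error, which this sketch reports as CRITICAL.

```python
import socket

OK, CRITICAL = 0, 2


def redis_ping(host="127.0.0.1", port=6379, timeout=3.0):
    """Send a RESP-encoded PING and report whether +PONG came back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = sock.recv(64)
    except OSError as exc:  # refused, unreachable, or timed out
        return CRITICAL, f"CRITICAL - cannot reach redis at {host}:{port}: {exc}"
    if reply.startswith(b"+PONG"):
        return OK, "OK - PONG"
    return CRITICAL, f"CRITICAL - unexpected reply {reply!r}"
```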

Apply

salt 'redis*' state.apply monitoring.nrpe_redis

top.sls assignments

The following excerpt from top.sls shows how all exporter and NRPE states are distributed:

top.sls (monitoring-related assignments)

base:
  '*':
    - monitoring.nrpe
    - monitoring.nrpe_common
    - monitoring.nrpe_salt
    - monitoring.node_exporter
  'apps*':
    - monitoring.nrpe_nginx
    - monitoring.nrpe_php
    - monitoring.phpfpm_exporter
  'db*':
    - monitoring.mysqld_exporter
  'metal*':
    - monitoring.nrpe_metal
  'proxy*':
    - monitoring.nrpe_haproxy
    - monitoring.haproxy_exporter
  'monitoring*':
    - monitoring
    - monitoring.director
    - monitoring.nrpe_nginx
    - monitoring.prometheus
    - monitoring.grafana
    - monitoring.statsd_exporter
  'mw* or staging*':
    - match: compound
    - monitoring.nrpe_nginx
    - monitoring.nrpe_php
    - monitoring.nrpe_mediawiki
    - monitoring.phpfpm_exporter
  'task*':
    - monitoring.nrpe_nginx
    - monitoring.nrpe_php
    - monitoring.phpfpm_exporter
  'opensearch*':
    - monitoring.nrpe_opensearch
    - monitoring.opensearch_exporter
  'redis*':
    - monitoring.nrpe_redis
    - monitoring.redis_exporter
  'salt*':
    - monitoring.nrpe_salt_master

To roll out every monitoring state across the fleet at once (e.g. after a top.sls change), run a full highstate: salt '*' state.apply. Each minion applies only the states top.sls assigns to it, but note that a highstate also applies that minion's non-monitoring states.
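Salt can report which states a given minion receives (for example `salt 'mw1' state.show_top`). The Python sketch below illustrates the matching logic on a subset of the top.sls excerpt. It is deliberately simplified: it understands only globs joined by " or ", while Salt's compound matcher supports much more (grains, pillar, and/not, and so on).

```python
from fnmatch import fnmatch

# A subset of the top.sls excerpt above, as a plain dict (match: compound omitted).
TOP = {
    "*": ["monitoring.nrpe", "monitoring.nrpe_common", "monitoring.nrpe_salt", "monitoring.node_exporter"],
    "db*": ["monitoring.mysqld_exporter"],
    "mw* or staging*": ["monitoring.nrpe_nginx", "monitoring.nrpe_php", "monitoring.nrpe_mediawiki", "monitoring.phpfpm_exporter"],
    "redis*": ["monitoring.nrpe_redis", "monitoring.redis_exporter"],
}


def matches(minion_id, target):
    """True if minion_id matches any glob in an "a or b" style target."""
    return any(fnmatch(minion_id, part.strip()) for part in target.split(" or "))


def states_for(minion_id, top=TOP):
    """Collect the SLS names assigned to minion_id, in top-file order, without duplicates."""
    states = []
    for target, sls_list in top.items():
        if matches(minion_id, target):
            for sls in sls_list:
                if sls not in states:
                    states.append(sls)
    return states
```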

monitoring.phpfpm_exporter and monitoring.statsd_exporter download binaries directly from GitHub. Ensure the target servers have outbound HTTPS access to github.com, and to the githubusercontent.com hosts that release downloads redirect to, when these states are first applied. Alternatively, pre-stage the archives and update the source: URLs in the respective .sls files.
