Monitoring State: Icinga2, Prometheus, and Grafana

The monitoring state provisions the dedicated monitoring server with the full WikiOasis observability stack — Icinga2 for active checks and alerting, Prometheus for metrics collection, and Grafana for dashboards. It is applied exclusively to hosts matching monitoring* in top.sls, alongside monitoring.prometheus, monitoring.grafana, monitoring.nrpe_nginx, and monitoring.statsd_exporter.

Icinga2 + Icingaweb2

Active checks, IDO-MySQL backend, Director module, and Nginx/PHP-FPM front-end

Prometheus

Metric scraping with file_sd auto-discovery; all targets are registered from the dns_hosts pillar

Grafana

Dashboard server pre-configured with a Prometheus datasource and served behind Nginx

Packages installed

monitoring/init.sls installs the following packages after adding the official Icinga apt repository and running apt-get install -f to fix any broken dependencies.

Full package list

Package	Purpose
`icinga2`	Monitoring and alerting engine
`icinga2-ido-mysql`	IDO MySQL back-end for Icinga2
`icingaweb2`	Web UI for Icinga2
`icingacli`	CLI tool for Icingaweb2 management
`icinga-director`	Config management module for Icingaweb2
`mariadb-server` / `mariadb-client`	Local database for IDO and Icingaweb2
`nginx`	Reverse proxy for Icingaweb2 and Grafana
`php-fpm`	PHP FastCGI process manager
`php-mysql`, `php-intl`, `php-curl`, `php-gd`, `php-mbstring`, `php-xml`	PHP extensions required by Icingaweb2
`nagios-nrpe-plugin`	Provides `check_nrpe`, which the monitoring server uses to run checks through NRPE agents
`jq`, `curl`	Used by notification scripts

On Debian Trixie the state falls back to the icinga-bookworm repository because Icinga does not yet publish a Trixie-specific repository. The icinga_dist variable is resolved at render time from grains['oscodename'].

Icinga2 configuration

APT repository

The GPG key is fetched from https://packages.icinga.com/icinga.key, dearmored into /usr/share/keyrings/icinga-archive-keyring.gpg, and the source line is written to /etc/apt/sources.list.d/icinga.list. A cmd.run guard (creates:) ensures the key is only imported once.
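A minimal Python sketch of how the repository suite and source line could be derived. It assumes the Trixie-to-Bookworm fallback described above and Icinga's `icinga-<codename> main` suite naming under https://packages.icinga.com/debian. The function names and the `FALLBACKS` table are hypothetical; the real logic is Jinja inside monitoring/init.sls.

```python
KEYRING = "/usr/share/keyrings/icinga-archive-keyring.gpg"
REPO_URL = "https://packages.icinga.com/debian"  # assumed Debian repository base URL

# Hypothetical table: codenames without their own Icinga suite -> suite to use instead.
FALLBACKS = {"trixie": "bookworm"}


def resolve_icinga_dist(oscodename: str) -> str:
    """Map grains['oscodename'] to an Icinga suite name such as 'icinga-bookworm'."""
    codename = FALLBACKS.get(oscodename, oscodename)
    return f"icinga-{codename}"


def icinga_source_line(oscodename: str) -> str:
    """Build the line written to /etc/apt/sources.list.d/icinga.list."""
    dist = resolve_icinga_dist(oscodename)
    return f"deb [signed-by={KEYRING}] {REPO_URL} {dist} main"
```

For example, `icinga_source_line("trixie")` yields a line ending in `icinga-bookworm main`, so Trixie hosts install packages built for Bookworm.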

zones.conf

/etc/icinga2/zones.conf is rendered from salt://monitoring/files/icinga2/zones.conf.jinja. It creates a single Endpoint and Zone named master using the minion ID (grains['id']).

zones.conf.jinja

{%- set hostname = grains['id'] %}

object Endpoint "{{ hostname }}" { }

object Zone "master" {
  endpoints = [ "{{ hostname }}" ]
}

API feature (api.conf)

/etc/icinga2/features-available/api.conf is rendered from api.conf.jinja. It creates an ApiUser with full permissions and enables command/config acceptance. The feature is enabled with icinga2 feature enable api.

api.conf.jinja

{%- set api_user = salt['pillar.get']('monitoring:icinga_api_user', 'director') %}
{%- set api_password = salt['pillar.get']('monitoring:icinga_api_password') %}

object ApiUser "{{ api_user }}" {
  password = "{{ api_password }}"
  permissions = [ "*" ]
}

object ApiListener "api" {
  accept_commands = true
  accept_config   = true
}
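To confirm that the rendered ApiUser works, you can query the Icinga 2 REST API status endpoint with HTTP basic auth. This is a sketch using only the Python standard library; it assumes the API listens on 127.0.0.1:5665 (the address the Director config below also uses) and that the credentials come from the monitoring:icinga_api_user and monitoring:icinga_api_password pillar keys.

```python
import base64
import json
import ssl
import urllib.request

API_URL = "https://127.0.0.1:5665/v1/status"


def build_status_request(user: str, password: str, url: str = API_URL) -> urllib.request.Request:
    """Build a GET request for /v1/status carrying HTTP basic auth credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_status(user: str, password: str) -> dict:
    """Send the request and decode the JSON body.

    The API certificate is signed by the local Icinga CA, which the system trust
    store does not know, so verification is disabled; only do this against localhost.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    req = build_status_request(user, password)
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)
```

On the monitoring server, `fetch_status("root", "<api password>")["results"]` returns the status objects; an HTTP 401 error means the password in pillar and the rendered api.conf disagree.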

IDO-MySQL feature (ido-mysql.conf)

/etc/icinga2/features-available/ido-mysql.conf is rendered from ido-mysql.conf.jinja. Pillar values for ido_db_name, ido_db_user, and the IDO password (from private pillar) are injected. The feature is enabled with icinga2 feature enable ido-mysql.

ido-mysql.conf.jinja

{%- set db_name = salt['pillar.get']('monitoring:ido_db_name', 'icingadb') %}
{%- set db_user = salt['pillar.get']('monitoring:ido_db_user', 'icingadb') %}
{%- set db_password = salt['pillar.get']('monitoring:ido_db_password') %}

library "db_ido_mysql"

object IdoMysqlConnection "ido-mysql" {
  user     = "{{ db_user }}"
  password = "{{ db_password }}"
  host     = "localhost"
  database = "{{ db_name }}"
  enable_ha = false
}

Notification feature

The notification feature is enabled via icinga2 feature enable notification. A creates: guard prevents re-running once /etc/icinga2/features-enabled/notification.conf exists.

Default conf.d cleanup

The four default configuration files that ship with icinga2 are removed to prevent conflicts with the Salt-managed host and service objects:

/etc/icinga2/conf.d/hosts.conf
/etc/icinga2/conf.d/services.conf
/etc/icinga2/conf.d/users.conf
/etc/icinga2/conf.d/notifications.conf

notification-commands.conf

Four NotificationCommand objects are written to /etc/icinga2/conf.d/notification-commands.conf — one each for Discord and Slack host/service notifications. Each command invokes a shell script under /etc/icinga2/scripts/ and passes context via environment variables.

notification-commands.conf (excerpt)

object NotificationCommand "notify-host-by-discord" {
  command = [ "/etc/icinga2/scripts/discord_host_notification.sh" ]
  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    HOSTNAME         = "$host.name$"
    HOSTSTATE        = "$host.state$"
    HOSTOUTPUT       = "$host.output$"
    LONGDATETIME     = "$icinga.long_date_time$"
  }
}

salt-hosts.conf (dynamic host objects)

/etc/icinga2/conf.d/salt-hosts.conf is rendered from salt-hosts.conf.jinja. It iterates the dns_hosts pillar and generates Host, Service, and Notification objects for every registered server. Host role is inferred from the hostname prefix (e.g. proxy*, db*, mw*) and role-specific services are added automatically.

salt-hosts.conf.jinja (excerpt)

{%- set hosts = salt['pillar.get']('dns_hosts', {}) %}
{%- for hostname, host_data in hosts.items() %}
object Host "{{ hostname }}" {
  import  "generic-salt-host"
  address = "{{ host_data.ip }}"
  vars.os = "Linux"
}
{%- endfor %}

Every host gets both Discord and Slack Notification objects for each service, so alerts fire on both channels without manual configuration.
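The prefix-based role inference can be pictured with the Python sketch below. The service names in `BASE_SERVICES` and `ROLE_SERVICES` are hypothetical placeholders, since the actual checks live in salt-hosts.conf.jinja; the sketch only shows the shape of the logic: every host gets base services, matching prefixes add role services, and each service is paired with both notification channels.

```python
# Hypothetical checks: every host gets BASE_SERVICES plus extras for its role prefix.
BASE_SERVICES = ["ping4", "ssh", "disk"]
ROLE_SERVICES = {
    "proxy": ["haproxy"],
    "db": ["mariadb"],
    "mw": ["nginx", "php-fpm"],
}
CHANNELS = ("discord", "slack")


def services_for(hostname: str) -> list[str]:
    """Return base services plus any role services whose prefix matches the hostname."""
    services = list(BASE_SERVICES)
    for prefix, extra in ROLE_SERVICES.items():
        if hostname.startswith(prefix):
            services.extend(extra)
    return services


def notification_pairs(hostname: str) -> list[tuple[str, str]]:
    """One (service, channel) pair per service and channel, so both channels fire."""
    return [(svc, ch) for svc in services_for(hostname) for ch in CHANNELS]


def render_host(hostname: str, host_data: dict) -> str:
    """Mirror the Host object emitted by the salt-hosts.conf.jinja excerpt above."""
    return (
        f'object Host "{hostname}" {{\n'
        '  import "generic-salt-host"\n'
        f'  address = "{host_data["ip"]}"\n'
        '  vars.os = "Linux"\n'
        "}"
    )
```

With this table, a host named db-other-us-east-011 would get the three base checks plus `mariadb`, and eight notification pairs in total.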

Notification webhook scripts

The notification scripts live in /etc/icinga2/scripts/ and are deployed by the state. All four scripts (discord_host, discord_service, slack_host, slack_service) source a shared config file that injects the webhook URLs from pillar.

webhook_config.sh.jinja

DISCORD_WEBHOOK_URL="{{ salt['pillar.get']('notifications:discord_webhook_url') }}"
SLACK_WEBHOOK_URL="{{ salt['pillar.get']('notifications:slack_webhook_url') }}"

The config file is written to /etc/icinga2/scripts/webhook_config.sh with mode 0640 (readable only by root and the nagios group) and each notification script requires it before running.
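The deployed scripts are shell, but their flow can be sketched in Python: read the webhook URLs from webhook_config.sh, format the environment variables passed by the NotificationCommand, and POST a JSON body. The message format and function names here are assumptions; `content` and `text` are the basic message fields of Discord and Slack incoming webhooks respectively.

```python
import json
import shlex
import urllib.request


def parse_webhook_config(text: str) -> dict:
    """Parse KEY="value" lines from webhook_config.sh without executing it as shell."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips the surrounding quotes
        config[key.strip()] = parts[0] if parts else ""
    return config


def host_message(env) -> str:
    """Format the variables set by the notify-host-by-* NotificationCommands."""
    return (
        f"[{env.get('NOTIFICATIONTYPE', '?')}] Host {env.get('HOSTNAME', '?')} "
        f"is {env.get('HOSTSTATE', '?')}: {env.get('HOSTOUTPUT', '')} "
        f"({env.get('LONGDATETIME', '')})"
    )


def discord_payload(message: str) -> bytes:
    return json.dumps({"content": message}).encode()


def slack_payload(message: str) -> bytes:
    return json.dumps({"text": message}).encode()


def post(url: str, body: bytes) -> int:
    """POST a JSON body to a webhook URL and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

In a real handler you would call `post(cfg["DISCORD_WEBHOOK_URL"], discord_payload(host_message(os.environ)))`, where `cfg` is the parsed config file; the script then needs read access to the 0640 file, which is why group membership matters.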

Icingaweb2 configuration

The Icingaweb2 configuration directory /etc/icingaweb2 is owned by www-data:icingaweb2 with mode 2770 (setgid so new files inherit the group). Four INI files are templated into it:

File	Purpose
`config.ini`	Global Icingaweb2 settings; points `config_resource` at the `icingaweb2` DB resource
`resources.ini`	Database resource definitions — `icinga2`, `icingadb`, `icinga_director`, `icingaweb2` all pointing at the remote MariaDB host
`authentication.ini`	Sets up an `autologin` backend and a DB-backed `auth_db` backend, both using the `icingaweb2` resource
`roles.ini`	Grants all users (`*`) full permissions (`*`) under the `Administrators` role

resources.ini (rendered example)

[icinga2]
type     = "db"
db       = "mysql"
host     = "db-other-us-east-011"
dbname   = "icingadb"
username = "icingadb"
password = "<ido_db_password>"
charset  = "utf8mb4"
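Because config.ini, authentication.ini, and the Director's config.ini all refer to resources by name, a typo leaves Icingaweb2 unable to find its database. A small consistency check with Python's configparser might look like this; it is a sketch that assumes values are double-quoted as in the rendered example and that references use the `resource` or `config_resource` keys.

```python
import configparser


def load_ini(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)  # passwords may contain '%'
    parser.read_string(text)
    return parser


def unquote(value: str) -> str:
    return value.strip().strip('"')


def missing_resources(resources_ini: str, *referencing_inis: str) -> set[str]:
    """Return resource names referenced by other INI files but not defined in resources.ini."""
    defined = set(load_ini(resources_ini).sections())
    referenced = set()
    for text in referencing_inis:
        parser = load_ini(text)
        for section in parser.sections():
            for key in ("resource", "config_resource"):
                if parser.has_option(section, key):
                    referenced.add(unquote(parser.get(section, key)))
    return referenced - defined
```

Pass the contents of resources.ini first, then config.ini, authentication.ini, and modules/director/config.ini; an empty set means every reference resolves.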

Director module

The Director module is configured directly within monitoring/init.sls (not a separate state file). It:

Creates /etc/icingaweb2/modules/director/ with ownership www-data:icingaweb2 and mode 2770.
Writes /etc/icingaweb2/modules/director/config.ini pointing the Director at the Icinga2 API on 127.0.0.1:5665.
Enables the module via icingacli module enable director (runs as www-data, idempotent with unless: guard).

modules/director/config.ini.jinja

{%- set api_user = salt['pillar.get']('monitoring:icinga_api_user', 'root') %}
{%- set api_pass = salt['pillar.get']('monitoring:icinga_api_password') %}

[db]
resource = "icingaweb2"

[icinga2]
api_host     = "127.0.0.1"
api_port     = 5665
api_username = "{{ api_user }}"
api_password = "{{ api_pass }}"

Prometheus state

monitoring/prometheus.sls installs and configures the Prometheus metrics server. It manages four categories of resources: the package, the defaults file (retention), the prometheus.yml configuration, and the file_sd target files.

Retention

Retention is controlled via ARGS in /etc/default/prometheus. The pillar key monitoring:prometheus:retention defaults to 15d in the state itself, but the public pillar default (pillar/monitoring/init.sls) sets it to 30d.

/etc/default/prometheus

ARGS="--storage.tsdb.retention.time=30d --web.enable-lifecycle"
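The defaults-file line can be reproduced in Python to show how the pillar value and the state's own 15d fallback interact. The nested-dict lookup stands in for `pillar.get('monitoring:prometheus:retention', '15d')`, and the duration check is a simplified assumption that accepts a single number-unit pair such as 30d, not the full Prometheus duration grammar.

```python
import re

DEFAULT_RETENTION = "15d"  # fallback in the state; the public pillar overrides it with 30d


def prometheus_args(pillar: dict) -> str:
    """Build the ARGS line for /etc/default/prometheus from pillar data."""
    retention = (
        pillar.get("monitoring", {}).get("prometheus", {}).get("retention", DEFAULT_RETENTION)
    )
    # Simplified validation: one integer followed by one Prometheus duration unit.
    if not re.fullmatch(r"[0-9]+(ms|s|m|h|d|w|y)", retention):
        raise ValueError(f"unsupported retention value: {retention!r}")
    flags = [f"--storage.tsdb.retention.time={retention}", "--web.enable-lifecycle"]
    return f'ARGS="{" ".join(flags)}"'
```

With the public pillar applied this yields the 30d line shown above; without it, the state falls back to 15d.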

prometheus.yml

The configuration file at /etc/prometheus/prometheus.yml sets a global scrape and evaluation interval of 30s and defines one scrape_config per exporter type, each backed by file_sd_configs pointing at a JSON file under /etc/prometheus/file_sd/. Prometheus watches those files and applies changes as soon as they are written; the 5-minute refresh_interval is a fallback that re-reads them periodically in case a change event is missed.

prometheus.yml

global:
  scrape_interval:     30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/node.json']
        refresh_interval: 5m

  - job_name: 'mysqld'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/mysqld.json']
        refresh_interval: 5m

  - job_name: 'haproxy'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/haproxy.json']
        refresh_interval: 5m

  - job_name: 'redis'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/redis.json']
        refresh_interval: 5m

  - job_name: 'statsd'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/statsd.json']
        refresh_interval: 5m

  - job_name: 'phpfpm'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/phpfpm.json']
        refresh_interval: 5m

  - job_name: 'opensearch'
    file_sd_configs:
      - files: ['/etc/prometheus/file_sd/opensearch.json']
        refresh_interval: 5m

file_sd auto-discovery

The file_sd directory at /etc/prometheus/file_sd/ contains one JSON file per exporter type. Each file is rendered from a Jinja template that iterates the dns_hosts pillar, filters by hostname prefix, and writes one target entry per matching host. Adding a host to dns_hosts and re-applying monitoring.prometheus is all that is required to register it as a new scrape target.

file_sd/node.json.jinja — all hosts

{%- set dns_hosts = salt['pillar.get']('dns_hosts', {}) %}
{%- set entries = [] %}
{%- for hostname, data in dns_hosts.items() %}
{%- set clean = hostname.split('.')[0] %}
{%- do entries.append('  {"targets": ["' ~ data.ip ~ ':9100"], "labels": {"instance": "' ~ clean ~ '"}}') %}
{%- endfor %}
[
{{ entries | join(',\n') }}
]

file_sd/mysqld.json.jinja — db* hosts only

{%- set dns_hosts = salt['pillar.get']('dns_hosts', {}) %}
{%- set entries = [] %}
{%- for hostname, data in dns_hosts.items() if hostname.startswith('db') %}
{%- set clean = hostname.split('.')[0] %}
{%- do entries.append('  {"targets": ["' ~ data.ip ~ ':9104"], "labels": {"instance": "' ~ clean ~ '"}}') %}
{%- endfor %}
[
{{ entries | join(',\n') }}
]

The complete set of scrape jobs and their target filters:

Job	Port	Target filter (`dns_hosts` prefix)
`node`	9100	All hosts
`mysqld`	9104	`db*`
`haproxy`	9101	`proxy*`
`redis`	9121	`redis*`
`statsd`	9102	`monitoring*`
`phpfpm`	9253	`apps`, `mw`, `staging*`
`opensearch`	9114	`opensearch*`
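The table and the Jinja templates above can be expressed as one Python sketch that renders every file_sd file from dns_hosts. It assumes each filter is a simple hostname prefix (including `apps` and `mw` for phpfpm), matching the `startswith` test in mysqld.json.jinja. Building the entries as data and serialising them with `json.dumps`, instead of concatenating strings as the templates do, keeps the output valid JSON even if a label ever contains a quote.

```python
import json

# Job -> (exporter port, hostname prefixes); None means "every host".
JOBS = {
    "node": (9100, None),
    "mysqld": (9104, ("db",)),
    "haproxy": (9101, ("proxy",)),
    "redis": (9121, ("redis",)),
    "statsd": (9102, ("monitoring",)),
    "phpfpm": (9253, ("apps", "mw", "staging")),
    "opensearch": (9114, ("opensearch",)),
}


def file_sd_targets(dns_hosts: dict, job: str) -> list[dict]:
    """Return the file_sd entries for one job, one target per matching host."""
    port, prefixes = JOBS[job]
    entries = []
    for hostname, data in dns_hosts.items():
        if prefixes is not None and not hostname.startswith(prefixes):
            continue
        clean = hostname.split(".")[0]  # short hostname used as the instance label
        entries.append({"targets": [f"{data['ip']}:{port}"], "labels": {"instance": clean}})
    return entries


def render_all(dns_hosts: dict) -> dict[str, str]:
    """Map each file_sd filename to its JSON contents."""
    return {f"{job}.json": json.dumps(file_sd_targets(dns_hosts, job), indent=2) for job in JOBS}
```

A job with no matching hosts renders as `[]`, which Prometheus accepts as an empty target list.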

Grafana state

monitoring/grafana.sls installs Grafana from the official APT repository and configures it with a pre-provisioned Prometheus datasource.

grafana.ini

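The main configuration file is rendered from grafana.ini.jinja. It sets the HTTP port to 3000, disables sign-ups and anonymous access, and injects the admin credentials from pillar. Grafana applies admin_password only when it first creates the admin user, so changing the password in pillar later does not reset it on an existing installation.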

grafana.ini.jinja

[server]
http_port = 3000
domain    = grafana.wikioasis.org
root_url  = %(protocol)s://%(domain)s/
serve_from_sub_path = false

[database]
type = sqlite3
path = grafana.db

[security]
admin_user     = {{ admin_user }}
admin_password = {{ admin_pass }}

[users]
allow_sign_up = false

[auth.anonymous]
enabled = false

Prometheus datasource provisioning

/etc/grafana/provisioning/datasources/prometheus.yml is written from datasource.yml.jinja at apply time. This means Grafana starts with the Prometheus datasource already registered — no manual UI steps required.
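datasource.yml.jinja itself is not reproduced here. As an illustration, the sketch below builds a Grafana provisioning document for a Prometheus datasource in Python; the datasource name and the URL `http://localhost:9090` (Prometheus's default listen port on the same host) are assumptions. It emits JSON, which YAML parsers accept, so the stdlib is enough.

```python
import json


def prometheus_datasource(url: str = "http://localhost:9090") -> dict:
    """Grafana datasource provisioning document (apiVersion 1) for one Prometheus source."""
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": "Prometheus",   # assumed display name
                "type": "prometheus",
                "access": "proxy",      # Grafana's backend queries Prometheus
                "url": url,
                "isDefault": True,
            }
        ],
    }


def render_datasource_yaml(url: str = "http://localhost:9090") -> str:
    # JSON is valid YAML, so this text can be written to provisioning/datasources/.
    return json.dumps(prometheus_datasource(url), indent=2) + "\n"
```

Using `access: proxy` means the browser never talks to Prometheus directly, which suits a Prometheus bound to localhost behind the same Nginx front end.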

Nginx vhost

A dedicated Nginx site grafana.conf is enabled (symlinked into sites-enabled/) and triggers an Nginx reload on change via watch_in.

Pillar reference

The following keys from the public pillar (pillar/monitoring/init.sls) are consumed by the monitoring state. Passwords and other secrets are stored in the private pillar and are not listed here — see the private pillar reference for those values.

monitoring namespace (public pillar)

Pillar key	Default	Description
`monitoring:icinga_api_user`	`root`	Username for the Icinga2 API user object
`monitoring:ido_db_name`	`icingadb`	MariaDB database name for IDO
`monitoring:ido_db_user`	`icingadb`	MariaDB user for IDO
`monitoring:web_db_name`	`icingaweb`	MariaDB database for Icingaweb2 session/config
`monitoring:director_db_name`	`icingaweb`	MariaDB database for Director module
`monitoring:director_db_user`	`icingadb`	MariaDB user for Director
`monitoring:grafana:admin_user`	`admin`	Grafana admin username
`monitoring:prometheus:retention`	`30d`	TSDB retention period passed to `--storage.tsdb.retention.time`

notifications namespace

Pillar key	Description
`notifications:discord_webhook_url`	Discord incoming webhook URL for Icinga2 alerts
`notifications:slack_webhook_url`	Slack incoming webhook URL for Icinga2 alerts

pillar/monitoring/init.sls (defaults)

monitoring:
  icinga_api_user: root
  ido_db_name: icingadb
  ido_db_user: icingadb
  web_db_name: icingaweb
  director_db_name: icingaweb
  director_db_user: icingadb

  grafana:
    admin_user: admin

  prometheus:
    retention: 30d

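Passwords for IDO, Director, the Icinga2 API, Grafana admin, and both webhook URLs are stored in the private encrypted pillar with no defaults. The templates on this page read them with pillar.get and no default, which returns an empty string for a missing key unless pillar_raise_on_missing is enabled, so a missing secret can render as an empty value instead of failing. Confirm they are set before applying.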
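A pre-flight check for the secrets named on this page can be sketched in Python. `pillar_get` mimics the colon-delimited lookup of `salt['pillar.get']` on a plain dict; the Director and Grafana password keys are not named in this document, so they are left out of `REQUIRED_KEYS` and would need to be added.

```python
REQUIRED_KEYS = (
    "monitoring:icinga_api_password",
    "monitoring:ido_db_password",
    "notifications:discord_webhook_url",
    "notifications:slack_webhook_url",
)


def pillar_get(pillar: dict, key: str, default=""):
    """Colon-delimited lookup over nested dicts, returning default when a level is missing."""
    node = pillar
    for part in key.split(":"):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def missing_secrets(pillar: dict, keys=REQUIRED_KEYS) -> list[str]:
    """Return the keys that are absent or empty in the given pillar data."""
    return [key for key in keys if not pillar_get(pillar, key)]
```

One way to feed it is the per-minion dictionary from `salt 'monitoring*' pillar.items --out=json`; an empty result means all listed secrets are present.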

Apply commands

Full monitoring stack

Apply everything assigned to the monitoring server (the monitoring state plus all sub-states via top.sls):

salt 'monitoring*' state.apply

Monitoring init only

Apply just Icinga2, Icingaweb2, MariaDB, and Nginx:

salt 'monitoring*' state.apply monitoring

Prometheus only

Re-generate prometheus.yml and all file_sd JSON target files, then reload Prometheus:

salt 'monitoring*' state.apply monitoring.prometheus

Grafana only

Update grafana.ini and the provisioned datasource, then restart Grafana:

salt 'monitoring*' state.apply monitoring.grafana

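After adding a new host to the dns_hosts pillar, run salt 'monitoring*' state.apply monitoring.prometheus to regenerate the file_sd JSON files. Prometheus detects the changed files and starts scraping the new targets without a restart; the 5-minute refresh_interval only bounds how long a missed file event can go unnoticed. --web.enable-lifecycle is not involved in this: it enables the /-/reload HTTP endpoint used to reload prometheus.yml itself. To also create the Icinga2 host and service objects for the new host, apply the monitoring state, which renders salt-hosts.conf.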
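To confirm that a new host was picked up, you can read the Prometheus targets API. This Python sketch assumes Prometheus listens on its default `http://localhost:9090` and relies on each active target's `labels` carrying the `job` and the `instance` label set by the file_sd files.

```python
import json
import urllib.request


def active_instances(targets_json: str, job: str) -> dict[str, str]:
    """Map instance label -> health ('up', 'down', 'unknown') for one job."""
    payload = json.loads(targets_json)
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        if labels.get("job") == job:
            result[labels.get("instance", "")] = target.get("health", "unknown")
    return result


def fetch_targets(base_url: str = "http://localhost:9090") -> str:
    """Return the raw JSON body of GET /api/v1/targets."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return resp.read().decode()
```

For example, `active_instances(fetch_targets(), "node")` should list the new host's short name shortly after the state run; a health of `down` usually means the exporter on that host is not running or is firewalled.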
