Monitoring
All of our servers are monitored using Prometheus. Exporters on the monitored machines have a firewall rule that allows connections from monitoring.archlinux.org on the specific exporter port. To access our monitoring system, go to https://monitoring.archlinux and log in with your Arch Linux SSO credentials.
Adding a new host to monitoring
- Add $host to node_exporters in hosts
- Roll out the exporter on the host:
ansible-playbook playbooks/host.yml -t prometheus_exporters
- Roll out the changes on the monitoring host:
ansible-playbook playbooks/monitoring.archlinux.org.yml -t prometheus
System
For general system performance monitoring, prometheus-node-exporter is used in combination with a textfile collector for Arch Linux specific and btrfs metrics. A systemd service and timer, 'prometheus-arch-textcollector', writes the number of out-of-date packages and security updates. For btrfs monitoring, btrfs device stats is executed on all btrfs devices on the system and all error stats are recorded. Running the prometheus_exporters role automatically adds the node-exporter, the arch textcollector and the btrfs textcollector.
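As a rough illustration of how output from btrfs device stats could be turned into textfile collector metrics, here is a minimal Python sketch. It is not the collector shipped by the role: the metric name btrfs_device_errors, its labels and the function name are assumptions made for this example.

```python
import re

# Matches lines such as "[/dev/sda1].write_io_errs    0" printed by `btrfs device stats`.
STAT_LINE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<stat>\w+)\s+(?P<value>\d+)$")


def btrfs_stats_to_metrics(output: str) -> str:
    """Convert `btrfs device stats` output into Prometheus text format.

    The metric name and labels are hypothetical, chosen for this sketch.
    """
    lines = [
        "# HELP btrfs_device_errors Error counters reported by btrfs device stats.",
        "# TYPE btrfs_device_errors gauge",
    ]
    for raw in output.splitlines():
        match = STAT_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        lines.append(
            'btrfs_device_errors{{device="{device}",type="{stat}"}} {value}'.format(
                **match.groupdict()
            )
        )
    return "\n".join(lines) + "\n"
```

The returned text would then be written to a .prom file in the directory that node-exporter's textfile collector reads.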
memcached
prometheus-memcached-exporter is used for monitoring. Adding memcached monitoring to a host is as simple as:
- Add the host to the memcached group
- Add memcached_socket to the host_vars of the machine with the location of the memcached socket
- Roll out the exporter on the host:
ansible-playbook playbooks/host.yml -t prometheus_exporters
Borg
For monitoring our borg backups, prometheus-node-exporter's textfile collector feature is used. The textfile is written by a systemd service called prometheus-borg-textcollector. Borg's last backup time is recorded for our Hetzner and rsync.net backups. Adding monitoring to a system is as simple as:
- Add the host to the borg_clients group
- Roll out the exporter on the host:
ansible-playbook playbooks/host.yml -t prometheus_exporters
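To make the textfile approach concrete, the following Python sketch renders a last-backup timestamp as a gauge and writes it atomically, so that node-exporter never reads a half-written file. It is an illustration only: the metric name borg_last_backup_timestamp_seconds, the repository label and both function names are assumptions, and how the real service obtains the backup time is not shown here.

```python
import os
import tempfile
from datetime import datetime, timezone


def render_backup_metric(last_backup: datetime, repository: str) -> str:
    """Render the last backup time as a Unix-timestamp gauge (hypothetical metric name)."""
    timestamp = int(last_backup.timestamp())
    return (
        "# HELP borg_last_backup_timestamp_seconds Time of the last borg backup.\n"
        "# TYPE borg_last_backup_timestamp_seconds gauge\n"
        f'borg_last_backup_timestamp_seconds{{repository="{repository}"}} {timestamp}\n'
    )


def write_textfile(path: str, content: str) -> None:
    """Write to a temporary file in the same directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file with mode 0600
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
```

With a metric of this shape, an alert can compare the recorded timestamp against the current time to detect backups that have stopped running.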
rebuilderd
The rebuilderd instance Arch Linux hosts is monitored using prometheus-node-exporter's textfile collector feature, which periodically collects data using prometheus-rebuilderd-textcollector.timer. The 'rebuilderd-textcollector.sh' script collects the queue length and the number of working rebuilders to detect whether the rebuilderd queue keeps growing or the rebuilderd workers have stopped working. The 'rebuilderd-status-textcollector.py' script collects the number of good, bad and unknown packages per repository to keep track of reproducible builds progress. Adding monitoring for rebuilderd:
- Add the rebuilderd instance to the rebuilderd group
- Roll out the exporter on the host:
ansible-playbook playbooks/host.yml -t prometheus_exporters
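The per-repository status counts described above could be rendered along the following lines. This Python sketch is not the actual rebuilderd-status-textcollector.py: the metric name rebuilderd_packages, the input shape (pairs of repository and status) and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def render_status_metrics(packages: Iterable[Tuple[str, str]]) -> str:
    """Count packages per (repository, status) pair and render one gauge sample each.

    `packages` is a hypothetical input: (repository, status) tuples where
    status is one of "good", "bad" or "unknown".
    """
    counts = Counter(packages)
    lines = [
        "# HELP rebuilderd_packages Number of packages per repository and status.",
        "# TYPE rebuilderd_packages gauge",
    ]
    for (repo, status), count in sorted(counts.items()):
        lines.append(
            f'rebuilderd_packages{{repository="{repo}",status="{status}"}} {count}'
        )
    return "\n".join(lines) + "\n"
```

Note that this sketch only emits samples for combinations that occur in the input; a production collector may prefer to emit an explicit zero for each status.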
MySQL
For monitoring MySQL, prometheus-mysqld-exporter is used, configured with a separate user for obtaining MySQL statistics.
- Add the host to the mysql_servers group
- Roll out the exporter on the host:
ansible-playbook playbooks/host.yml -t prometheus_exporters
Keycloak
For monitoring Keycloak, keycloak-metrics-spi is used, which exports basic Keycloak user events such as logins, errors and registration errors. The exporter is automatically configured when running the keycloak role. The Prometheus endpoint is protected with basic auth configured in the role, and the endpoint is hardcoded in our Prometheus configuration.
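When debugging a basic-auth protected metrics endpoint by hand, a request can be built as in the Python sketch below. The URL and credentials are placeholders, not values from our configuration, and the function name is made up for this example.

```python
import base64
import urllib.request


def build_scrape_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP basic auth header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Usage (placeholder URL and credentials):
# request = build_scrape_request("https://keycloak.example/metrics", "user", "secret")
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```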