- 23 May, 2021 1 commit
-
-
Closes: #332
Signed-off-by: Leonidas Spyropoulos <artafinde@gmail.com>
-
- 14 May, 2021 1 commit
-
-
Kristian Klausen authored
-
- 13 May, 2021 2 commits
-
-
Kristian Klausen authored
-
Co-authored-by: Kristian Klausen <kristian@klausen.dk>
-
- 11 Apr, 2021 1 commit
-
-
Jelle van der Waa authored
We want pacman/arch-audit notifications to be grouped, as otherwise we'll be spammed with ~X emails for every host. Closes: #191
-
- 07 Apr, 2021 1 commit
-
-
Jelle van der Waa authored
Reintroduce the arch-audit rule, as arch-audit no longer reports false positives from [testing]. Relax the high CPU alert, as our mediawiki instance is perfectly fine running at 85% CPU for some time, and relax our "disk will fill within X" alert, as our borg backups generate enough data in a short time to trigger the 4 hour alarm.
-
- 27 Mar, 2021 1 commit
-
-
Jelle van der Waa authored
Our qemu TCG builds generate enough CPU usage over time that they should be excluded from warnings.
-
- 01 Mar, 2021 4 commits
-
-
Jelle van der Waa authored
Our dedicated servers are fairly slow when rebooting and are then unavailable for 5 minutes, which means a ServiceDown notification is sent for a normal reboot.
-
Jelle van der Waa authored
The value of the expr is not really useful as of now, but if we show the value of probe_ssl_earliest_cert_expiry it should show the date when the cert expires.
-
Jelle van der Waa authored
The $value of a Prometheus alert is the result of the expression, so it will be the number of seconds since the last backup and not the last backup date.
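To illustrate the distinction, here is a minimal Python sketch, not the alert rule itself. It assumes the expression yields an age in seconds; the helper names `last_backup_time` and `humanize_age` are hypothetical and only show how such a value relates to an absolute date.

```python
from datetime import datetime, timedelta, timezone


def last_backup_time(seconds_since_backup: float, now: datetime) -> datetime:
    """Turn an age in seconds (what $value holds here) into an absolute timestamp."""
    return now - timedelta(seconds=seconds_since_backup)


def humanize_age(seconds: float) -> str:
    """Render an age in seconds as whole days and hours for an alert description."""
    days, remainder = divmod(int(seconds), 86400)
    hours = remainder // 3600
    return f"{days}d {hours}h"


if __name__ == "__main__":
    # Hypothetical evaluation time and value; 129600 seconds is 36 hours.
    now = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
    print(humanize_age(129600))
    print(last_backup_time(129600, now).isoformat())
```

Reporting the age as a duration keeps the alert text consistent with what the expression actually computes.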
-
Jelle van der Waa authored
-
- 27 Feb, 2021 1 commit
-
-
Sven-Hendrik Haase authored
gemini takes a long time to run backups and it would sometimes produce false positives for not having backed up for some time. The higher threshold should help with those false positives.
-
- 14 Feb, 2021 1 commit
-
-
Kristian Klausen authored
yaml: truthy value should be one of [false, true] (truthy)
yaml: wrong indentation: expected 4 but found 2 (indentation)
yaml: too few spaces before comment (comments)
yaml: missing starting space in comment (comments)
yaml: too many blank lines (1 > 0) (empty-lines)
yaml: too many spaces after colon (colons)
yaml: comment not indented like content (comments-indentation)
yaml: no new line character at the end of file (new-line-at-end-of-file)
load-failure: Failed to load or parse file
parser-error: couldn't resolve module/action 'hosts'. This often indicates a misspelling, missing collection, or incorrect module path.
-
- 31 Jan, 2021 1 commit
-
-
- 26 Jan, 2021 1 commit
-
-
Jelle van der Waa authored
Collect btrfs errors for prometheus using the btrfs command from btrfs-progs, which since 5.10 supports JSON output for device stats. In the future, the collected errors will trigger an alert when they reach a certain threshold.
-
- 03 Jan, 2021 2 commits
-
-
- 18 Dec, 2020 1 commit
-
-
Jelle van der Waa authored
Rebuilderd-workers are expected to have a high cpu load. Closes: #240
-
- 16 Dec, 2020 1 commit
-
-
Frederik Schwan authored
-
- 29 Oct, 2020 1 commit
-
-
Jelle van der Waa authored
Closes: #166
-
- 06 Oct, 2020 1 commit
-
-
Jelle van der Waa authored
Start monitoring prometheus to keep track of the database growth rate, and retain data for a longer period in prometheus.
-
- 20 Sep, 2020 1 commit
-
-
- 18 Sep, 2020 1 commit
-
-
3 days is a bit too late. Certbot renews the certificate 30 days before expiry, so 25 days should be safe and shouldn't cause any "false positives" due to transient errors.
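The threshold reasoning can be sketched in Python as follows. This is an illustration under stated assumptions, not the rule from the repository: `days_until_expiry` and `should_alert` are hypothetical helpers, and the expiry is assumed to be a Unix timestamp like the one probe_ssl_earliest_cert_expiry reports.

```python
DAY = 86400  # seconds in a day


def days_until_expiry(cert_expiry_ts: float, now_ts: float) -> float:
    """Days remaining before the certificate expires (negative once expired)."""
    return (cert_expiry_ts - now_ts) / DAY


def should_alert(cert_expiry_ts: float, now_ts: float, threshold_days: float = 25) -> bool:
    """Fire when fewer than threshold_days remain before expiry."""
    return days_until_expiry(cert_expiry_ts, now_ts) < threshold_days


if __name__ == "__main__":
    now_ts = 1_600_000_000  # hypothetical evaluation time
    # A renewal that is a few days late (26 days left) stays quiet at 25 days.
    print(should_alert(now_ts + 26 * DAY, now_ts))
    # A certificate that was never renewed (24 days left) triggers the alert.
    print(should_alert(now_ts + 24 * DAY, now_ts))
```

With renewal expected at 30 days before expiry, a 25 day threshold leaves five days of slack for transient renewal failures before anyone is notified.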
-
- 06 Sep, 2020 2 commits
-
-
Jelle van der Waa authored
Record the rebuilderd queue length in prometheus so we can generate an alert when the queue length keeps rising, as this could be an indication that the rebuilders have builds which are stuck.
-
Jelle van der Waa authored
Run the blackbox exporter on monitoring.archlinux.org to monitor the HTTP status of the public services we provide on other machines. Also adds an alert for when a certificate is about to expire within 3 days.
-
- 31 Aug, 2020 1 commit
-
-
Jelle van der Waa authored
Introduce a new monitoring server with prometheus and alertmanager for monitoring all our boxes.
-