  1. 28 Aug, 2021 1 commit
  2. 01 Aug, 2021 1 commit
  3. 18 Jul, 2021 2 commits
    • Split storage box monitoring into new text collector · c844d0cb
      Evangelos Foutras authored
      This was previously monitored as part of the borg text collector, but
      now that it only runs after each backup (instead of hourly) the stats
      from monitoring.archlinux.org do not remain accurate for long. Switch
      back to hourly checks of the storage box's disk usage by adding a new
      text collector just for this purpose.
    • Run borg-textcollector after each backup completes · 68def695
      Evangelos Foutras authored
      Instead of gathering borg statistics every hour or so, run the text
      collector script only once after each borg-backup service finishes.
      Also split the borg text collector script into two similar scripts,
      where each one gathers borg statistics for its respective borg host.
  4. 17 Jul, 2021 1 commit
    • Use RandomizedDelaySec=30min in Borg TextCollector · 3aa4d49f
      Evangelos Foutras authored
      This is an attempt to be kind to our Borg hosts in cases where
      prometheus-borg-textcollector.timer is restarted on all hosts: it
      avoids having all machines query the Borg hosts within the same minute.
      The only downside is that the timers will trigger every 75-ish minutes
      instead of exactly every hour, but this should not be a problem.
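The "75-ish minutes" figure is consistent with a timer that re-arms one hour after its previous run and then adds a uniformly distributed delay between 0 and 30 minutes, which averages 15 minutes. The following is a minimal sketch of that arithmetic only; the one-hour base interval is an assumption, since the timer's other directives are not shown in this log, and the function names are hypothetical.

```python
import random

BASE_INTERVAL_MIN = 60      # assumed base interval between runs (not shown in the log)
MAX_RANDOM_DELAY_MIN = 30   # from RandomizedDelaySec=30min


def expected_interval_min(base, max_delay):
    """Mean gap between runs when a uniform delay in [0, max_delay] is added."""
    return base + max_delay / 2


def simulated_interval_min(base, max_delay, runs=100_000, seed=0):
    """Estimate the same mean by drawing one random delay per run."""
    rng = random.Random(seed)
    total = sum(base + rng.uniform(0, max_delay) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    print(expected_interval_min(BASE_INTERVAL_MIN, MAX_RANDOM_DELAY_MIN))
    print(round(simulated_interval_min(BASE_INTERVAL_MIN, MAX_RANDOM_DELAY_MIN), 1))
```

Individual gaps would range from 60 to 90 minutes under these assumptions; only the average lands near 75.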
  5. 12 Jul, 2021 1 commit
  6. 13 May, 2021 1 commit
  7. 07 Mar, 2021 1 commit
    • Adjust prometheus textcollector condition · 9f9378b2
      Jelle van der Waa authored
      Currently our textcollector can sometimes fail with "Failed to
      create/acquire the lock /home/backup/$server/lock.exclusive (timeout)."
      Instead of checking for a borg lock file, check whether our backup
      snapshot dir exists, which the backup script creates and removes. This
      should give fewer false positives than our current method.
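The idea of using the snapshot directory as an "is a backup running?" signal can be sketched as below. This only illustrates the check: the commit does not show how the condition is actually implemented, and the function names and paths are hypothetical.

```python
import tempfile
from pathlib import Path


def backup_in_progress(snapshot_dir):
    """The backup script creates this directory and removes it when done."""
    return Path(snapshot_dir).is_dir()


def should_collect(snapshot_dir):
    """Skip collecting borg statistics while a backup is still running."""
    return not backup_in_progress(snapshot_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "backup-snapshot"  # hypothetical location
        print(should_collect(snapshot))  # directory absent: safe to collect
        snapshot.mkdir()
        print(should_collect(snapshot))  # directory present: backup running
```

Unlike probing the repository lock, this check never touches borg itself, so it cannot hit a lock timeout.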
  8. 01 Mar, 2021 1 commit
  9. 11 Feb, 2021 1 commit
    • Add correct After targets for prometheus_exporters · 042ff9cf
      Jelle van der Waa authored
      arch-audit hung because it was started before there was a working
      internet connection. To work around this issue, add proper Wants/After
      dependencies on network-online.target, and let the rebuilderd
      textcollector start only after rebuilderd.service is "online".
  10. 26 Jan, 2021 1 commit
    • Add a btrfs prometheus exporter · 8ea35153
      Jelle van der Waa authored
      Collect btrfs error counts for Prometheus using the btrfs command from
      btrfs-progs, which since 5.10 supports JSON output for device stats.
      The collected errors will in the future trigger an alert when they
      reach a certain threshold.
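As an illustration of turning JSON device stats into the Prometheus text format, here is a minimal sketch. The JSON layout (a `device-stats` list with string-valued error counters), the metric name `btrfs_device_errors`, and the label names are all assumptions made for the example and are not taken from the commit; the sample document is invented.

```python
import json

# Error counters assumed to appear in each device entry.
ERROR_KEYS = (
    "write_io_errs",
    "read_io_errs",
    "flush_io_errs",
    "corruption_errs",
    "generation_errs",
)


def device_stats_to_metrics(raw_json, metric="btrfs_device_errors"):
    """Render one text-format sample per device and error type."""
    stats = json.loads(raw_json).get("device-stats", [])
    lines = [f"# TYPE {metric} gauge"]
    for dev in stats:
        for key in ERROR_KEYS:
            if key in dev:
                labels = f'device="{dev["device"]}",type="{key}"'
                lines.append(f"{metric}{{{labels}}} {int(dev[key])}")
    return "\n".join(lines) + "\n"


# Invented sample input in the assumed layout.
SAMPLE = json.dumps({
    "device-stats": [
        {"device": "/dev/sda1", "devid": "1", "write_io_errs": "0",
         "read_io_errs": "0", "flush_io_errs": "0",
         "corruption_errs": "2", "generation_errs": "0"},
    ]
})

if __name__ == "__main__":
    print(device_stats_to_metrics(SAMPLE), end="")
```

An alert rule could then compare these samples against whatever threshold is eventually chosen.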
  11. 14 Dec, 2020 1 commit
    • Add archive specific monitoring · 4658d36d
      Jelle van der Waa authored
      To monitor our archive mirrors and the archive size itself a new
      textcollector has been added. This will allow us to monitor the archive
      growth and the sync rate to mirrors.
  12. 06 Oct, 2020 1 commit
    • Add rebuilderd_results Prometheus metric · 7abc2500
      Jelle van der Waa authored
      To monitor whether reproducible builds are going in the right direction,
      record the good/bad/unknown metrics from rebuilderd with a Prometheus
      textcollector so that a Grafana dashboard can display a long-term trend.
      A Python script is required to handle data collection, as obtaining the
      status with jq/bash is non-trivial and cannot easily collect suites and
      statuses dynamically.
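Collecting suites and statuses dynamically amounts to counting whatever (suite, status) pairs show up in the data rather than hard-coding them. A minimal sketch of that counting step follows; the input shape (a list of dicts with `suite` and `status` keys), the label names, and the sample values are assumptions for illustration, and only the metric name `rebuilderd_results` comes from the commit title.

```python
from collections import Counter


def results_to_metrics(packages, metric="rebuilderd_results"):
    """Count packages per (suite, status) pair and render the text format."""
    counts = Counter((pkg["suite"], pkg["status"]) for pkg in packages)
    lines = [f"# TYPE {metric} gauge"]
    for (suite, status), total in sorted(counts.items()):
        lines.append(f'{metric}{{suite="{suite}",status="{status}"}} {total}')
    return "\n".join(lines) + "\n"


# Invented sample data in the assumed shape.
SAMPLE_PACKAGES = [
    {"name": "pkg-a", "suite": "core", "status": "GOOD"},
    {"name": "pkg-b", "suite": "core", "status": "BAD"},
    {"name": "pkg-c", "suite": "core", "status": "GOOD"},
    {"name": "pkg-d", "suite": "extra", "status": "UNKNOWN"},
]

if __name__ == "__main__":
    print(results_to_metrics(SAMPLE_PACKAGES), end="")
```

Because the keys are discovered from the data, a new suite or status value would appear as a new labelled series without any change to the script.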
      Closes: #146
  13. 21 Sep, 2020 1 commit
  14. 12 Sep, 2020 1 commit
  15. 06 Sep, 2020 2 commits
    • Add rebuilderd build queue length textcollector · cd4b2844
      Jelle van der Waa authored
      Record the rebuilderd queue length in Prometheus so we can generate an
      alert when the queue length keeps rising, as this could be an
      indication that the rebuilders have builds which are stuck.
    • Introduce prometheus exporters role for collection · 23564b29
      Jelle van der Waa authored
      Add a new role called prometheus_exporters which should be run on every
      machine we have and starts different collectors depending on what group
      the machine is in. Currently supported are the gitlab runner exporter,
      rebuilder textcollector, mysqld-exporter, borg textcollector and a
      node/arch exporter. The arch exporter monitors the security status and
      exposes a gauge of out-of-date pacman packages.