Commits · f9927e8d366b470b412119a75d04cf05c40601ea · Arch Linux / infrastructure

Jan 12, 2025

debuginfod: fix systemd 257 compatibility · f9927e8d

Since systemd 257, DynamicUser sets PrivateTmp=disconnected, making
debuginfod unable to read from or write to /var/tmp/ properly, which
hampers debuginfod's functioning.

Verified

f9927e8d

debuginfod: fix reloading of debuginfod service · e31a4e2c
Jelle van der Waa authored 2 months ago

You can't `systemctl reload debuginfod` after installing the systemd
unit; we need a daemon-reload.

Verified

e31a4e2c

Remove obsolete dev.archlinux.org subdomain · ab567991

Kristian Klausen authored 2 months ago

This was apparently hosted on the long gone "apollo" server[1], and when
archweb was migrated to a dedicated cloud VM, it was changed to a
redirect to the main site (archlinux.org)[2][3].

It may have made sense at the time, but now, four years later, there is
no reason to keep this around.

I guess dev.archlinux.org was something similar to what pkgbuild.com is
today ("Public HTML server" for staff), but only for developers.

[1] f6c3af0e ("Merge branch 'apollo_decomission' into 'master'")
[2] 824fb084 ("tf-stage1/archlinux: Change DNS records for the archweb migration and also increase the machine size")
[3] 9800d023 ("roles/archweb: Create domain redirects for the domains that point to specific archweb sub urls.")

Verified

ab567991

Jan 11, 2025
- archweb: update to latest release · ff113f77
  Jelle van der Waa authored 2 months ago
  
  Verified
  
  ff113f77
Jan 05, 2025

prometheus: Fix syntax issue in node rules · d3d0180b

Christian Heusel authored 2 months ago


Somehow these changes were not directly applied even though the role
reloads the prometheus config.

Fixes: 10475a62 ("prometheus: Alert if a build hosts is OOM for 12h")
Signed-off-by: Christian Heusel <christian@heusel.eu>

Verified

d3d0180b

Jan 04, 2025
- dbscripts: Remove obsolete repo helpers · d324d5c9
  Kristian Klausen authored 2 months ago
  
  Fixes: 4159a61f ("dbscripts: switch to Git packaging")
  Verified
  
  d324d5c9
- prometheus: Alert if a build hosts is OOM for 12h · 10475a62
  Christian Heusel authored 2 months ago
  
  Signed-off-by: Christian Heusel <christian@heusel.eu>
  Verified
  
  10475a62
Jan 03, 2025

prometheus: Disable the OOM alert for build hosts · 0794f65c

Christian Heusel authored 2 months ago

There is not much value in knowing when one of our build hosts has no
more memory left as all of them have plenty of swap available.
Additionally these rules trigger quite often even for short spikes.

Signed-off-by: Christian Heusel <christian@heusel.eu>

Verified

0794f65c

Dec 30, 2024
- firewalld: rebase firewalld.conf to firewalld 2.3.0-1 · de26c277
  Evangelos Foutras authored 3 months ago
  
  Verified
  
  de26c277
Dec 27, 2024

dovecot: Adapt mediator mailbox for new mediators · 6023dcd7

Christian Heusel authored 2 months ago

Link: https://lists.archlinux.org/archives/list/staff@lists.archlinux.org/message/3JNJQJ7O3LBIXST2EFBCAKM2HCHXJZUM/


Signed-off-by: Christian Heusel <christian@heusel.eu>

Verified

6023dcd7

archwiki: Update to 1.42.4-2 · 5a771aaf
Christian Heusel authored 3 months ago

Signed-off-by: Christian Heusel <christian@heusel.eu>

Verified

5a771aaf

Dec 25, 2024

postfix: Remove obsolete "Remove old files" task · d054afa9

Kristian Klausen authored 3 months ago

This was added more than 7 years ago[1] and has not been relevant for a
long time.

[1] d32ce421 ("postfix: Remove compat_maps")

Verified

d054afa9

postfix: Restrict authenticated senders to their own address(es) · 42d8aef2

Kristian Klausen authored 3 years ago

Please see the reject_authenticated_sender_login_mismatch option[1] for
more details.

For now, service accounts are not restricted in any way; this should be
improved in the future.

[1] https://www.postfix.org/postconf.5.html#reject_authenticated_sender_login_mismatch

Fix #365

Verified

42d8aef2

postfix: Clean up main.cf thoroughly · 92cf7ac9

Kristian Klausen authored 3 years ago

This removes unnecessary parameters, mostly for one of these reasons:
the value is already the default value, the default value is good
enough, or the parameter is not used in our case.

A bit of reordering/"tidying" was also done.

Verified

92cf7ac9

postfix: Don't disable IPv6 for gmail.com · 23dc9a0e

Kristian Klausen authored 3 years ago

I think it was added to improve the mail reputation (avoid being
filtered as spam), but a lot has changed since it was added (5+ years
ago), so let's remove it.

Verified

23dc9a0e

postfix: Remove empty files · ae493c50
Kristian Klausen authored 3 years ago

Verified

ae493c50

Dec 23, 2024

Improve time robustness by switching to chrony, trustworthy time sources and NTS · 6d8afe73

Kristian Klausen authored 7 months ago

From chrony FAQ[1]:
"1.2. Should I prefer chrony over timesyncd if I do not need to run a
server?

Generally, yes.

systemd-timesyncd is a very simple NTP client included in the systemd
suite. It lacks almost all features of chrony and other advanced client
implementations listed on the comparison page. One of its main
limitations is that it cannot poll multiple servers at the same time and
detect servers having incorrect time (falsetickers in the NTP
terminology). It should be used only with trusted reliable servers,
ideally in local network.

Using timesyncd with pool.ntp.org is problematic. The pool is very
robust as a whole, but the individual servers run by volunteers cannot
be relied on. Occasionally, servers drift away or make a step to distant
past or future due to misconfiguration, problematic implementation, and
other bugs (e.g. in firmware of a GPS receiver). The pool monitoring
system detects such servers and quickly removes them from the pool DNS,
but clients like timesyncd cannot recover from that. They follow the
server as long as it claims to be synchronised. They need to be
restarted in order to get a new address from the pool DNS.

Note that the complexity of NTP and clock synchronisation is on the
client side. The amount of code in chrony specific to NTP server is very
small and it is disabled by default. If it was removed, it would not
significantly reduce the amount of memory or storage needed."

This commit fixes the issue by switching to a proper NTP client
(chrony) and to trustworthy time sources from Netnod and
Physikalisch-Technische Bundesanstalt, which distribute the official
time for Sweden[2] and Germany[3] respectively. Finally, NTS is used to
protect against MITM attacks.

Since most of our servers are in Germany or Finland (close to Sweden),
it makes sense to use these time sources as a low round-trip delay[4] is
preferred for NTP. For the few servers[5] we have outside Europe, the
root delay[4] will be higher than desired, but with the current use-case
for these servers, it should not be a problem.

[1] https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server
[2] https://www.netnod.se/swedish-distributed-time-service
[3] https://www.ptb.de/cms/en/ptb/fachabteilungen/abt4/fb-44/ag-442/dissemination-of-legal-time.html
[4] https://blog.meinbergglobal.com/2021/02/25/the-root-of-all-timing-understanding-root-delay-and-root-dispersion-in-ntp/
[5] {america,asia,sydney}.mirror.pkgbuild.com
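To illustrate why a low round-trip delay is preferred, here is a minimal Python sketch of the standard NTP offset and delay calculation from the four timestamps of one client/server exchange. The class name and the sample timestamps are hypothetical, and the sketch covers a single client/server hop only, not the full root delay accumulated along the path to the reference clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NtpExchange:
    """The four timestamps (in seconds) of one NTP client/server exchange."""

    t1: float  # client transmit time, read from the client clock
    t2: float  # server receive time, read from the server clock
    t3: float  # server transmit time, read from the server clock
    t4: float  # client receive time, read from the client clock

    @property
    def delay(self) -> float:
        """Round-trip network delay, excluding server processing time."""
        return (self.t4 - self.t1) - (self.t3 - self.t2)

    @property
    def offset(self) -> float:
        """Estimated offset of the server clock relative to the client clock."""
        return ((self.t2 - self.t1) + (self.t3 - self.t4)) / 2

    @property
    def max_offset_error(self) -> float:
        """Worst-case error of the offset estimate if the path is asymmetric."""
        return self.delay / 2


# Hypothetical exchanges: the client clock is 0.5 s behind in both cases.
nearby = NtpExchange(t1=10.0, t2=10.51, t3=10.511, t4=10.021)   # ~10 ms each way
distant = NtpExchange(t1=10.0, t2=10.65, t3=10.651, t4=10.301)  # ~150 ms each way

for name, exchange in (("nearby", nearby), ("distant", distant)):
    print(name, round(exchange.offset, 3), round(exchange.max_offset_error, 3))
```

Both exchanges yield the same offset estimate, but the distant server leaves a much larger uncertainty window, because any asymmetry between the outbound and return path can hide inside the delay.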

Verified

6d8afe73

Migrate 'with_X' to 'loop' · 701c1d01
Robin Candau authored 3 months ago and Robin Candau committed 3 months ago

701c1d01
Use 'Start and enable ' in all corresponding systemd_service: tasks · a5f41049
Robin Candau authored 3 months ago and Robin Candau committed 3 months ago

a5f41049
Rename systemd module to systemd_service · 934db48d
Robin Candau authored 3 months ago and Robin Candau committed 3 months ago

See https://github.com/ansible/ansible/pull/77644

934db48d

Dec 22, 2024

grafana: Add simple dashboard for Nginx cache and HTTP/TLS version stats · 99977f2f

Kristian Klausen authored 3 months ago

It has three panels showing "Cache hit ratio", "HTTP version" and "TLS
version".

The metrics are generated with Loki recording rules (see the previous
three commits).

Verified

99977f2f

loki/nginx: Add recording rule for tracking upstream cache status[1] · a0ef2cb7

Kristian Klausen authored 7 months ago

This enables us to calculate the cache hit ratio, which may help
determine whether more caching would be beneficial.

Please note that this only counts requests for which caching is enabled
(i.e. {fastcgi,proxy}_cache is configured). For statically served files,
for example, cache_status will be "".

[1] http://nginx.org/en/docs/http/ngx_http_upstream_module.html#var_upstream_cache_status
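As a rough illustration of the calculation described above, the following Python sketch derives a hit ratio from a list of `$upstream_cache_status` values while skipping requests where caching is not enabled (empty status). The function name is hypothetical, and treating only `HIT` as a hit is an assumption; the actual Loki recording rule may count statuses differently.

```python
from collections import Counter
from typing import Iterable, Optional


def cache_hit_ratio(statuses: Iterable[str]) -> Optional[float]:
    """Return the share of cache-enabled requests that were served as HIT.

    Requests with an empty status (caching not enabled for that location)
    are ignored. Returns None when no cache-enabled request was seen.
    """
    counts = Counter(status for status in statuses if status)
    total = sum(counts.values())
    if total == 0:
        return None
    # Assumption: only "HIT" counts as a hit; MISS, EXPIRED, BYPASS etc. do not.
    return counts["HIT"] / total


# Hypothetical sample: two static-file requests ("") are excluded.
sample = ["HIT", "MISS", "", "HIT", "EXPIRED", ""]
print(cache_hit_ratio(sample))
```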

Verified

a0ef2cb7

loki/nginx: Add recording rule for tracking HTTP/TLS version and cipher · a08f7960
Kristian Klausen authored 7 months ago

Mainly because we are curious. The data may also be used to decide if we
want to drop older versions of TLS.

Verified

a08f7960

loki/prometheus: Add plumbing for using loki recording rules[1] · c9e9b3c6

Kristian Klausen authored 7 months ago

The plan is to use this for creating metrics from the nginx log lines
(e.g. requests per second).

[1] https://grafana.com/docs/loki/latest/alert/#recording-rules

Verified

c9e9b3c6

prometheus: Fix "cli configuration" changes not taking effect automatically · 12fbdc54
Kristian Klausen authored 3 months ago

Fixes: bd19c007 ("Add configuration to retain prometheus data for 1 year")

Verified

12fbdc54

Dec 16, 2024

mariadb: Add switch for innodb_snapshot_isolation · 79d069df

Christian Heusel authored 3 months ago

With the 11.6.2 release mariadb has made snapshot isolation a default
setting which recently caused issues on the AUR and the Forums, where
the respective database engines would fail due to the table changing
since the last time they read it:

    PHP Fatal error:  Uncaught mysqli_sql_exception: Record has changed since last read in table 'fluxbb_online' in /srv/http/fluxbb/include/dblayer/mysqli_innodb.php:79
    Stack trace:
    #0 /srv/http/fluxbb/include/dblayer/mysqli_innodb.php(79): mysqli_query()
    #1 /srv/http/fluxbb/include/functions.php(485): DBLayer->query()
    #2 /srv/http/fluxbb/include/common.php(162): update_users_online()
    #3 /srv/http/fluxbb/viewtopic.php(10): require('...')
    #4 {main}

We therefore introduce a config switch to restore the old behavior and
apply the changed setting on the two services.

Link: https://mariadb.com/kb/en/mariadb-11-6-2-release-notes/#innodb
Link: https://bbs.archlinux.org/viewtopic.php?id=301802
Link: aurweb#525
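The failure mode can be pictured with a toy model in Python. This is not how InnoDB is implemented; the classes, the commit counter and the immediate-commit behaviour are all simplifying assumptions made for illustration. It only shows the idea: with the switch on, a transaction refuses to write a row that another transaction committed after its snapshot was taken.

```python
class RecordChangedError(Exception):
    """Toy stand-in for the "Record has changed since last read" error."""


class ToyTable:
    """Maps a key to (value, number of the commit that last wrote it)."""

    def __init__(self):
        self.commit_counter = 0
        self.rows = {}

    def write_committed(self, key, value):
        self.commit_counter += 1
        self.rows[key] = (value, self.commit_counter)


class ToyTransaction:
    """Simplified transaction whose snapshot is taken at creation time."""

    def __init__(self, table, snapshot_isolation):
        self.table = table
        self.snapshot_isolation = snapshot_isolation
        self.snapshot = table.commit_counter

    def update(self, key, value):
        _, last_commit = self.table.rows.get(key, (None, 0))
        # With the switch on, a row committed after our snapshot is a conflict.
        if self.snapshot_isolation and last_commit > self.snapshot:
            raise RecordChangedError(f"record {key!r} changed since last read")
        # Simplification: the write is committed immediately.
        self.table.write_committed(key, value)


def concurrent_update(snapshot_isolation):
    """Run the same interleaving with the switch on or off."""
    table = ToyTable()
    table.write_committed("online", 1)
    tx = ToyTransaction(table, snapshot_isolation)
    table.write_committed("online", 2)  # another session commits in between
    tx.update("online", 3)
    return table.rows["online"][0]
```

In this model, `concurrent_update(False)` lets the later write win silently, while `concurrent_update(True)` raises, which mirrors the difference between the old behaviour and the new default.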


Signed-off-by: Christian Heusel <christian@heusel.eu>

Verified

79d069df

Dec 15, 2024

gitlab: Fix bot-token-extender script not extending project membership · 70901d06

Kristian Klausen authored 3 months ago

The project membership must also be extended; otherwise the user is
simply deleted when the membership expires (defeating the purpose of
extending the access tokens).

Fixes: 639101e6 ("gitlab: Add ruby script for continuous extending of bot tokens")

Verified

70901d06

Add alert for Fastly cost · ec6296bf

Kristian Klausen authored 1 year ago

If the cost exceeds $0, it indicates that we have run out of credit
and/or are doing something wrong; in either case we want to be alerted.

Verified

ec6296bf

Remove the WG private keys from the vault and store them only on the servers · 27553ab3

Kristian Klausen authored 3 months ago

With the support for network.wireguard.* credentials[1] in systemd
v256[2], we can now easily avoid storing the credentials centrally in
our ansible vault, which is preferable as it makes the private keys less
exposed. It may also make fine-grained access easier in the future[3] as
there is no longer a vault file for each server.

All the keys have been rotated and the new private keys are only stored
on the servers.

[1] https://github.com/systemd/systemd/pull/30826
[2] https://github.com/systemd/systemd/releases/tag/v256
[3] #64

Verified

27553ab3

gitlab_runner: Boot VMs in UEFI mode · f756d51f

Kristian Klausen authored 4 months ago

There is no technical reason for this at the moment, but UEFI is the de
facto firmware for x86-64, so let's be modern.

Verified

f756d51f

gitlab_runner: Allow discard requests from VMs to be passed to the filesystem · 732a92a4
Kristian Klausen authored 4 months ago

This should not change anything as the VMs are short-lived (15 minutes
at the most), so it is just added for good measure.

Verified

732a92a4

gitlab_runner: Enable free-page-reporting for VMs · 87a54893

Kristian Klausen authored 4 months ago

From the kernel patch series[1]:
"This series provides an asynchronous means of reporting free guest
pages to a hypervisor so that the memory associated with those pages can
be dropped and reused by other processes and/or guests on the host.
Using this it is possible to avoid unnecessary I/O to disk and greatly
improve performance in the case of memory overcommit on the host."[1]

The runner hosts may be memory overcommitted if there are too many
running VMs and containers at the same time, which this should help to
avoid.

[1] https://lore.kernel.org/linux-mm/20200211224416.29318.44077.stgit@localhost.localdomain/

Verified

87a54893

gitlab_runner: Switch to new libvirt-executor image[1] from arch-boxes · fefa51a5

Kristian Klausen authored 4 months ago

It makes more sense to build the image in arch-boxes than to build it on
each runner, especially considering that arch-boxes already has all the
necessary infrastructure, so we can avoid maintaining similar code in
two repositories and avoid running losetup, mount, arch-chroot etc. (as
root) on the runners.

The arch-boxes MR[1] has a little more context.

[1] arch-boxes!200

Verified

fefa51a5

Revert "gitlab_runner: Initial the keyring in the base image for faster boot" · 8e22de79

Kristian Klausen authored 4 months ago

This reverts commit 466230e4.

This has been fixed in pacman[1], so it is no longer unreasonably slow.
Some quick testing at runner1 indicates that this only saves five
seconds at best, so IMO it is not worth the complexity to continue doing
this.

This revert does not revert the timeout back to 60 seconds, but keeps it
at 30 seconds.

[1] pacman/pacman!16

Verified

8e22de79

gitlab_runner: Inject the SSH public key at boot rather than burning it into the VM image · 3406ca5a

Kristian Klausen authored 4 months ago

This means that there is no need to make runner-specific changes to the
image, so in theory the image could be built centrally (e.g. in the
arch-boxes project[1]) and then distributed to the runner hosts.

This change also makes the SSH keys ephemeral.

[1] https://gitlab.archlinux.org/archlinux/arch-boxes

Verified

3406ca5a

gitlab_runner: Remove tight coupling to libvirt filesystem pool · cc6195f3

Kristian Klausen authored 4 months ago

All libvirt volume management is now handled through virsh instead of
direct file system access. As a volume cannot be uploaded in an atomic
way, the current active volume is now tracked in a file on disk.

This may allow us to run the script with fewer privileges and use polkit
for libvirt access control[1].

[1] https://libvirt.org/aclpolkit.html
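One common way to track an "active" item in a file, so that readers never observe a half-written state, is to write a temporary file and rename it over the state file. The Python sketch below shows that pattern. The function names, file name and volume names are hypothetical, and the actual runner script may track the active volume differently.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def set_active_volume(state_file: Path, volume_name: str) -> None:
    """Atomically record volume_name as the active volume."""
    # Create the temporary file next to the state file so that the final
    # rename stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=state_file.parent, prefix=state_file.name + "."
    )
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(volume_name + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step: readers see either the
        # old name or the new one, never a partial write.
        os.replace(tmp_name, state_file)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def get_active_volume(state_file: Path) -> Optional[str]:
    """Return the recorded active volume, or None if none is recorded."""
    try:
        return state_file.read_text().strip() or None
    except FileNotFoundError:
        return None
```

With this pattern, a new volume would only be recorded as active after its upload through virsh has finished, so an interrupted upload leaves the previous volume in use.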

Verified

cc6195f3

gitlab_runner: Fix broken "Running on ..." in the libvirt-executor · 001300ff

Kristian Klausen authored 4 months ago

The prepare stage runs "echo "Running on $(hostname)...""[1]. As the
hostname command is provided by inetutils, which is not installed, this
fails with "bash: line 7: hostname: command not found" and prints
"Running on ...".

Fix it by "monkey patching" it to use "hostnamectl hostname" and inject
the hostname with SMBIOS[2][3]. Injecting creds with SMBIOS may also be
useful in the future, e.g. for injecting an ephemeral SSH public key.

[1] https://gitlab.com/gitlab-org/gitlab-runner/-/blob/v17.5.2/shells/bash.go?ref_type=tags#L452-L456
[2] https://systemd.io/CREDENTIALS/
[3] https://github.com/systemd/systemd/pull/30814
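For illustration, the SMBIOS route described in the systemd credentials documentation[2] passes credentials as SMBIOS type 11 (OEM) strings with an `io.systemd.credential:` or `io.systemd.credential.binary:` prefix. The Python sketch below builds such a string. The helper name and the sample hostname are hypothetical, and the credential name `system.hostname` is an assumption about what the referenced PR[3] introduces; how the string ends up in the libvirt domain definition is not shown.

```python
import base64


def smbios_credential(name: str, value: str) -> str:
    """Build an SMBIOS OEM string that systemd imports as a credential.

    The Base64 ("binary") form is used so that the value may contain
    arbitrary characters.
    """
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"io.systemd.credential.binary:{name}={encoded}"


# Hypothetical usage: name the VM before it boots.
oem_string = smbios_credential("system.hostname", "runner-vm-1")
print(oem_string)
```

The same mechanism could carry other per-boot data, such as the ephemeral SSH public key mentioned above.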

Verified

001300ff

gitlab_runner: Reduce libvirt-executor noise in the job log · 2a1a488a

Kristian Klausen authored 4 months ago

This removes 13 instances of [1] and 1 instance of the IP address from
the job log.

The latter was fixed by no longer waiting for SSH in the "run" stage,
which is unnecessary as we wait for SSH in the "prepare" stage.

[1] Warning: Permanently added '192.168.122.xxx' (ED25519) to the list of known hosts.

Verified

2a1a488a

gitlab_runner: Move VM memory to a variable instead of hardcoding it twice · a7e949b6

Kristian Klausen authored 4 months ago

Updating it in both places was forgotten once[1], so avoid that issue in
the future by moving it to a variable.

[1] c370c9d0 ("gitlab_runner: Update concurreny math to reflect the new VM size")

Verified

a7e949b6

gitlab_runner: Fix incorrect permissions for the domain_template.xml file · 923550e4
Kristian Klausen authored 4 months ago

Verified

923550e4
