- Jan 12, 2025
-
-
Jelle van der Waa authored
Since systemd 257, DynamicUser sets PrivateTmp=disconnected, which leaves debuginfod unable to read from or write to /var/tmp/ properly and hampers its functioning.
-
Jelle van der Waa authored
You can't `systemctl reload debuginfod` right after installing the systemd unit; we need a daemon-reload first.
-
Kristian Klausen authored
This was apparently hosted on the long-gone "apollo" server[1], and when archweb was migrated to a dedicated cloud VM, it was changed to a redirect to the main site (archlinux.org)[2][3]. It may have made sense at the time, but now, four years later, there is no reason to keep this around. I guess dev.archlinux.org was something similar to what pkgbuild.com is today ("Public HTML server" for staff), but only for developers. [1] f6c3af0e ("Merge branch 'apollo_decomission' into 'master'") [2] 824fb084 ("tf-stage1/archlinux: Change DNS records for the archweb migration and also increase the machine size") [3] 9800d023 ("roles/archweb: Create domain redirects for the domains that point to specific archweb sub urls.")
-
- Jan 11, 2025
-
-
Jelle van der Waa authored
-
- Jan 05, 2025
-
-
Christian Heusel authored
Somehow these changes were not directly applied even though the role reloads the prometheus config. Fixes: 10475a62 ("prometheus: Alert if a build hosts is OOM for 12h")
Signed-off-by: Christian Heusel <christian@heusel.eu>
-
- Jan 04, 2025
-
-
Kristian Klausen authored
Fixes: 4159a61f ("dbscripts: switch to Git packaging")
-
Christian Heusel authored
Signed-off-by: Christian Heusel <christian@heusel.eu>
-
- Jan 03, 2025
-
-
Christian Heusel authored
There is not much value in knowing when one of our build hosts has no more memory left, as all of them have plenty of swap available. Additionally, these rules trigger quite often, even for short spikes.
Signed-off-by: Christian Heusel <christian@heusel.eu>
-
- Dec 30, 2024
-
-
Evangelos Foutras authored
-
- Dec 27, 2024
-
-
Christian Heusel authored
Signed-off-by: Christian Heusel <christian@heusel.eu>
- Dec 25, 2024
-
-
Kristian Klausen authored
This was added more than 7 years ago[1] and has not been relevant for a long time. [1] d32ce421 ("postfix: Remove compat_maps")
-
Kristian Klausen authored
Please see the reject_authenticated_sender_login_mismatch option[1] for more details. For now, service accounts are not restricted in any way; this should be improved in the future. [1] https://www.postfix.org/postconf.5.html#reject_authenticated_sender_login_mismatch Fix #365
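For illustration only, the check that this restriction performs for authenticated clients can be sketched as follows. This is a simplified model, not Postfix's implementation; the function name, the map contents and the example.org addresses are all hypothetical.

```python
def sender_allowed(sasl_login, mail_from, sender_login_maps):
    """Simplified model of the sender/login ownership check.

    sasl_login is the authenticated login name, or None for unauthenticated
    clients. sender_login_maps (hypothetical data) maps a MAIL FROM address
    to the set of logins that own it.
    """
    if sasl_login is None:
        # The "authenticated" variant only applies to logged-in clients.
        return True
    owners = sender_login_maps.get(mail_from.lower(), set())
    # Reject when the login does not own the MAIL FROM address.
    return sasl_login in owners


# Hypothetical ownership map, for illustration only.
SENDER_LOGIN_MAPS = {
    "alice@example.org": {"alice"},
    "team@example.org": {"alice", "bob"},
}
```

With this map, "bob" may send as team@example.org but not as alice@example.org.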
-
Kristian Klausen authored
This removes unnecessary parameters, mostly for one of these reasons: the value is already the default value, the default value is good enough, or the parameter is not used in our case. A bit of reordering/"tidying" was also done.
-
Kristian Klausen authored
I think it was added to improve the mail reputation (avoid being filtered as spam), but a lot has changed since it was added (5+ years ago), so let's remove it.
-
Kristian Klausen authored
-
- Dec 23, 2024
-
-
Kristian Klausen authored
From chrony FAQ[1]:

"1.2. Should I prefer chrony over timesyncd if I do not need to run a server?

Generally, yes. systemd-timesyncd is a very simple NTP client included in the systemd suite. It lacks almost all features of chrony and other advanced client implementations listed on the comparison page. One of its main limitations is that it cannot poll multiple servers at the same time and detect servers having incorrect time (falsetickers in the NTP terminology). It should be used only with trusted reliable servers, ideally in local network.

Using timesyncd with pool.ntp.org is problematic. The pool is very robust as a whole, but the individual servers run by volunteers cannot be relied on. Occasionally, servers drift away or make a step to distant past or future due to misconfiguration, problematic implementation, and other bugs (e.g. in firmware of a GPS receiver). The pool monitoring system detects such servers and quickly removes them from the pool DNS, but clients like timesyncd cannot recover from that. They follow the server as long as it claims to be synchronised. They need to be restarted in order to get a new address from the pool DNS.

Note that the complexity of NTP and clock synchronisation is on the client side. The amount of code in chrony specific to NTP server is very small and it is disabled by default. If it was removed, it would not significantly reduce the amount of memory or storage needed."

This commit fixes the issue by switching to a proper NTP client (chrony) and to trustworthy time sources from Netnod and Physikalisch-Technische Bundesanstalt, which distribute the official time for Sweden[2] and Germany[3] respectively. Finally, NTS is used to protect against MITM attacks.

Since most of our servers are in Germany or Finland (close to Sweden), it makes sense to use these time sources, as a low round-trip delay[4] is preferred for NTP. For the few servers[5] we have outside Europe, the root delay[4] will be higher than desired, but with the current use-case for these servers, it should not be a problem.

[1] https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server
[2] https://www.netnod.se/swedish-distributed-time-service
[3] https://www.ptb.de/cms/en/ptb/fachabteilungen/abt4/fb-44/ag-442/dissemination-of-legal-time.html
[4] https://blog.meinbergglobal.com/2021/02/25/the-root-of-all-timing-understanding-root-delay-and-root-dispersion-in-ntp/
[5] {america,asia,sydney}.mirror.pkgbuild.com
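To illustrate why polling several sources matters, here is a toy sketch of falseticker detection. It is not chrony's actual selection algorithm; it only keeps the largest group of sources whose measured offsets agree within a tolerance. The function name, the tolerance and the server names are assumptions made for this example.

```python
def split_falsetickers(offsets, tolerance):
    """Toy majority check over measured clock offsets.

    offsets maps a (hypothetical) server name to its measured offset in
    seconds. Returns (truechimers, falsetickers) as sorted lists of names.
    """
    best = []
    for ref in offsets.values():
        # Collect every source that agrees with this reference offset.
        group = [s for s, off in offsets.items() if abs(off - ref) <= tolerance]
        if len(group) > len(best):
            best = group
    # Without a strict majority, no source can be trusted.
    if len(best) * 2 <= len(offsets):
        return [], sorted(offsets)
    return sorted(best), sorted(set(offsets) - set(best))


# Hypothetical measurements: one source has stepped an hour into the future.
measured = {"ntp-a": 0.001, "ntp-b": -0.002, "ntp-c": 3600.0}
trusted, rejected = split_falsetickers(measured, tolerance=0.01)
```

With a single source, as with a client that follows one server at a time, the bad offset cannot be told apart from a good one: `split_falsetickers({"ntp-c": 3600.0}, 0.01)` accepts it.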
-
- Dec 22, 2024
-
-
Kristian Klausen authored
It has three panels showing "Cache hit ratio", "HTTP version" and "TLS version". The metrics are generated with Loki recording rules (see the previous three commits).
-
Kristian Klausen authored
This enables us to calculate the cache hit ratio, which may help determine whether more caching would be beneficial. Please note that this only counts requests for which caching is enabled (i.e. {fastcgi,proxy}_cache is configured); for statically served files, for example, cache_status[1] will be "". [1] http://nginx.org/en/docs/http/ngx_http_upstream_module.html#var_upstream_cache_status
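As a rough sketch of the calculation (the real metric is produced by Loki recording rules, not by this code), the ratio could be computed from logged cache_status values like this. Treating only "HIT" as a hit is an assumption made for the example, and the function name is hypothetical.

```python
def cache_hit_ratio(statuses):
    """Return hits / cacheable requests, or None if nothing was cacheable.

    statuses is an iterable of cache_status values taken from log lines.
    Empty values mean caching was not enabled for that request, so they
    are left out of the denominator.
    """
    counted = [s for s in statuses if s]
    if not counted:
        return None
    # Assumption: only "HIT" counts as a cache hit.
    hits = sum(1 for s in counted if s == "HIT")
    return hits / len(counted)


sample = ["HIT", "MISS", "", "HIT", "EXPIRED"]
ratio = cache_hit_ratio(sample)
```

Here the empty entry is ignored, so two hits out of four cacheable requests give a ratio of 0.5.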
-
Kristian Klausen authored
Mainly because we are curious. The data may also be used to decide if we want to drop older versions of TLS.
-
Kristian Klausen authored
The plan is to use this for creating metrics from the nginx log lines (e.g. requests per second). [1] https://grafana.com/docs/loki/latest/alert/#recording-rules
-
Kristian Klausen authored
Fixes: bd19c007 ("Add configuration to retain prometheus data for 1 year")
-
- Dec 16, 2024
-
-
Christian Heusel authored
With the 11.6.2 release mariadb has made snapshot isolation a default setting which recently caused issues on the AUR and the Forums, where the respective database engines would fail due to the table changing since the last time they read it:

    PHP Fatal error: Uncaught mysqli_sql_exception: Record has changed since last read in table 'fluxbb_online' in /srv/http/fluxbb/include/dblayer/mysqli_innodb.php:79
    Stack trace:
    #0 /srv/http/fluxbb/include/dblayer/mysqli_innodb.php(79): mysqli_query()
    #1 /srv/http/fluxbb/include/functions.php(485): DBLayer->query()
    #2 /srv/http/fluxbb/include/common.php(162): update_users_online()
    #3 /srv/http/fluxbb/viewtopic.php(10): require('...')
    #4 {main}

We therefore introduce a config switch to restore the old behavior and apply the changed setting on the two services.

Link: https://mariadb.com/kb/en/mariadb-11-6-2-release-notes/#innodb
Link: https://bbs.archlinux.org/viewtopic.php?id=301802
Link: aurweb#525
Signed-off-by: Christian Heusel <christian@heusel.eu>
-
- Dec 15, 2024
-
-
Kristian Klausen authored
The project membership must also be extended; otherwise the user is simply deleted when the membership expires (defeating the purpose of extending the access tokens). Fixes: 639101e6 ("gitlab: Add ruby script for continuous extending of bot tokens")
-
Kristian Klausen authored
If the cost exceeds $0, it indicates that we have run out of credit and/or are doing something wrong; in either case we want to be alerted.
-
Kristian Klausen authored
With the support for network.wireguard.* credentials[1] in systemd v256[2], we can now easily avoid storing the credentials centrally in our ansible vault, which is preferable as it makes the private keys less exposed. It may also make fine-grained access easier in the future[3] as there is no longer a vault file for each server. All the keys have been rotated and the new private keys are only stored on the servers. [1] https://github.com/systemd/systemd/pull/30826 [2] https://github.com/systemd/systemd/releases/tag/v256 [3] #64
-
Kristian Klausen authored
There is no technical reason for this at the moment, but UEFI is the de facto firmware for x86-64, so let's be modern.
-
Kristian Klausen authored
This should not change anything as the VMs are short-lived (15 minutes at the most), so it is just added for good measure.
-
Kristian Klausen authored
From the kernel patch series[1]: "This series provides an asynchronous means of reporting free guest pages to a hypervisor so that the memory associated with those pages can be dropped and reused by other processes and/or guests on the host. Using this it is possible to avoid unnecessary I/O to disk and greatly improve performance in the case of memory overcommit on the host."[1] The runner hosts may become memory-overcommitted if too many VMs and containers are running at the same time; reporting free guest pages to the hypervisor should help to avoid that. [1] https://lore.kernel.org/linux-mm/20200211224416.29318.44077.stgit@localhost.localdomain/
-
Kristian Klausen authored
It makes more sense to build the image in arch-boxes than to build it on each runner, especially considering that arch-boxes already has all the necessary infrastructure, so we can avoid maintaining similar code in two repositories and avoid running losetup, mount, arch-chroot etc. (as root) on the runners. The arch-boxes MR[1] has a little more context. [1] arch-boxes!200
-
Kristian Klausen authored
This reverts commit 466230e4. This has been fixed in pacman[1], so it is no longer unreasonably slow. Some quick testing at runner1 indicates that this only saves five seconds at best, so IMO it is not worth the complexity to continue doing this. This revert does not revert the timeout back to 60 seconds, but keeps it at 30 seconds. [1] pacman/pacman!16
-
Kristian Klausen authored
This means that there is no need to make runner-specific changes to the image, so in theory the image could be built centrally (e.g. in the arch-boxes project[1]) and then distributed to the runner hosts. This change also makes the SSH keys ephemeral. [1] https://gitlab.archlinux.org/archlinux/arch-boxes
-
Kristian Klausen authored
All libvirt volume management is now handled through virsh instead of direct file system access. As a volume cannot be uploaded in an atomic way, the currently active volume is now tracked in a file on disk. This may allow us to run the script with fewer privileges and use polkit for libvirt access control[1]. [1] https://libvirt.org/aclpolkit.html
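A minimal sketch of the "track the active volume in a file" idea, assuming the pointer file is only switched after an upload has fully completed. The function names and file layout are hypothetical and are not taken from the actual script; the virsh upload itself is left out.

```python
import os
import tempfile


def set_active_volume(state_file, volume_name):
    """Atomically record which volume is active.

    The name is written to a temporary file in the same directory and then
    renamed over state_file, so readers see either the old or the new name,
    never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(state_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(volume_name + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, state_file)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        os.unlink(tmp_path)
        raise


def get_active_volume(state_file):
    """Return the recorded volume name, or None if none has been set."""
    try:
        with open(state_file) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None
```

A caller would upload the new volume first and call `set_active_volume` only once the upload has succeeded, so a failed or partial upload never becomes the active volume.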
-
Kristian Klausen authored
The prepare stage runs `echo "Running on $(hostname)..."`[1], which results in "bash: line 7: hostname: command not found" and the output "Running on ...", because the hostname command is provided by inetutils, which is not installed. Fix it by "monkey patching" the script to use "hostnamectl hostname" and by injecting the hostname with SMBIOS[2][3]. Injecting creds with SMBIOS may also be useful in the future, e.g. for injecting an ephemeral SSH public key. [1] https://gitlab.com/gitlab-org/gitlab-runner/-/blob/v17.5.2/shells/bash.go?ref_type=tags#L452-L456 [2] https://systemd.io/CREDENTIALS/ [3] https://github.com/systemd/systemd/pull/30814
-
Kristian Klausen authored
This removes 13 instances of [1] and 1 instance of the IP address from the job log. The latter was fixed by no longer waiting for SSH in the "run" stage, which is unnecessary as we wait for SSH in the "prepare" stage. [1] Warning: Permanently added '192.168.122.xxx' (ED25519) to the list of known hosts.
-
Kristian Klausen authored
It was once forgotten[1] to update it in both places, so avoid that issue in the future by moving it to a variable. [1] c370c9d0 ("gitlab_runner: Update concurreny math to reflect the new VM size")
-
Kristian Klausen authored
-