# Arch Infrastructure

This repository contains the complete collection of ansible playbooks and roles for the Arch Linux infrastructure.

## Table of contents
[[_TOC_]]

## Requirements

Install these packages:
  - terraform
  - python-click
  - python-jmespath
  - moreutils (for playbooks/tasks/reencrypt-vault-key.yml)
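On Arch Linux, these can be installed with pacman in one go (package names as listed above):

```shell
# install the required tools; --needed skips packages that are already present
pacman -S --needed terraform python-click python-jmespath moreutils
```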

### Instructions

All systems are set up the same way. For the first-time setup in the Hetzner rescue system,
run the provisioning script: `ansible-playbook playbooks/tasks/install-arch.yml -l $host`.
The provisioning script configures a sane basic system with sshd. By design, it is NOT idempotent.
After the provisioning script has run, it is safe to reboot.

Once in the new system, run the regular playbook: `HCLOUD_TOKEN=$(misc/get_key.py misc/vault_hetzner.yml hetzner_cloud_api_key) ansible-playbook playbooks/$hostname.yml`.
This playbook is the one regularly used for administering the server and is entirely idempotent.

When adding a new machine you should also deploy our SSH known_hosts file and update the SSH hostkeys file in this git repo.
For this you can simply run the `playbooks/tasks/sync-ssh-hostkeys.yml` playbook and commit the changes it makes to this git repository.
It will also deploy any new SSH host keys to all our machines.
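A typical run of that task followed by committing the result might look like this (a sketch; the commit message is a placeholder):

```shell
# refresh known_hosts and hostkeys, then commit whatever changed
ansible-playbook playbooks/tasks/sync-ssh-hostkeys.yml
git add .
git commit -m "Update SSH known_hosts"
```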

#### Note about GPG keys

The `root_access.yml` file contains the `root_gpgkeys` variable that determines the users that have access to the vault, as well as the borg backup keys.
All the keys should be in the local user's GPG keyring and at **minimum** be locally signed with `--lsign-key`. This is necessary for running either the reencrypt-vault-key
or the fetch-borg-keys tasks.
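Locally signing a key can be done like this (the key ID is a placeholder):

```shell
# locally sign a key you have verified; 0xDEADBEEF is a placeholder key ID
gpg --lsign-key 0xDEADBEEF
```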

#### Note about Ansible dynamic inventories

We use a dynamic inventory script in order to automatically get information for
all servers directly from hcloud. No manual steps are required to make this
work, but keep in mind to NOT add hcloud servers to `hosts`!
They'll be available automatically.
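To inspect what the dynamic inventory resolves to, you can dump it (a sketch; the token retrieval mirrors the playbook invocation above):

```shell
# print the resolved inventory, including hcloud servers, as JSON
HCLOUD_TOKEN=$(misc/get_key.py misc/vault_hetzner.yml hetzner_cloud_api_key) ansible-inventory --list
```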

#### Note about packer

We use packer to build snapshots on hcloud to use as server base images.
In order to use this, you need to install packer and then run

    packer build -var $(misc/get_key.py misc/vault_hetzner.yml hetzner_cloud_api_key --format env) packer/archlinux.json

This will take some time after which a new snapshot will have been created on the primary hcloud archlinux project.

#### Note about terraform

We use terraform in two ways:

1. To provision a part of the infrastructure on hcloud (and possibly other service providers in the future)
2. To declaratively configure applications

For both of these, we have set up a separate terraform script. The reason for that is that sadly terraform can't have
providers depend on other providers, so we can't declaratively state that we want to configure software on a server which
itself needs to be provisioned first. Therefore, we use a two-stage process. Generally speaking, scenario 1 is configured in
`tf-stage1` and scenario 2 in `tf-stage2`. Maybe in the future we can have a single terraform script for everything,
but for the time being, this is what we're stuck with.
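In practice the two stages are applied in order (a sketch; both directories must already have been initialised with `terraform init`):

```shell
# stage 1 provisions the servers, stage 2 configures applications on them
(cd tf-stage1 && terraform apply)
(cd tf-stage2 && terraform apply)
```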

The very first time you run terraform on your system, you'll have to init it:

    cd tf-stage1  # and also tf-stage2
    terraform init -backend-config="conn_str=postgres://terraform:$(../misc/get_key.py ../group_vars/all/vault_terraform.yml vault_terraform_db_password)@state.archlinux.org?sslmode=verify-full"

After making changes to the infrastructure in `tf-stage1/archlinux.tf`, run

    terraform plan

This will show you planned changes between the current infrastructure and the desired infrastructure.
You can then run

    terraform apply

to actually apply your changes.

The same applies to changed application configuration in which case you'd run
it inside of `tf-stage2` instead of `tf-stage1`.

We store terraform state on a special server that is the only hcloud server NOT
managed by terraform so that we do not run into a chicken-and-egg problem. The
state server is assumed to just exist, so in the unlikely case where we have to
entirely redo this infrastructure, the state server would have to be set up
manually.

#### SMTP Configuration

All hosts should be relaying email through our primary MX host (currently `mail.archlinux.org`). See [docs/email.md](./docs/email.md) for full details.
### Putting a service in maintenance mode

Most web services with an nginx configuration can be put into maintenance mode by running the playbook with a maintenance variable:

    ansible-playbook -e maintenance=true playbooks/<playbook.yml>

This also works with a tag:

    ansible-playbook -t <tag> -e maintenance=true playbooks/<playbook.yml>

As long as you pass the maintenance variable to the playbook run, the web service will stay in maintenance mode. As soon as you stop
passing it on the command line and run the playbook again, the regular nginx configuration should resume and the service should accept
requests by the end of the run.

Passing maintenance=false will also prevent the regular nginx configuration from resuming, but will not put the service into maintenance
mode.

Keep in mind that passing the maintenance variable to the whole playbook, without any tag, will put all the web services that support
maintenance mode into maintenance mode. Use tags to affect only the services you want.

Documentation on how to add the maintenance mode to a web service is inside [docs/maintenance.md](./docs/maintenance.md).

### Finding servers requiring security updates

Arch-audit can be used to find servers in need of updates for security issues.

    ansible all -a "arch-audit -u"

#### Updating servers

The following steps should be used to update our managed servers:

  * pacman -Syu
  * sync
  * checkservices
  * reboot
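Run on a single host over SSH, the steps above look roughly like this (a sketch; the hostname is a placeholder, and `checkservices` is the helper referenced in the list):

```shell
# hypothetical manual update session on one server
ssh root@example.archlinux.org
pacman -Syu     # update all packages
sync            # flush filesystem buffers before the reboot
checkservices   # restart services still running outdated binaries or libraries
reboot
```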

##### Semi-automated server updates (experimental)

For updating a lot of servers in a more unattended manner, the following
playbook can be used:

    ansible-playbook playbooks/tasks/upgrade-servers.yml [-l SUBSET]

It runs `pacman -Syu` on the targeted hosts in batches and then reboots them.
If any server fails to reboot successfully, the rolling update stops and
further batches are cancelled. To display the packages updated on each host,
you can pass the `--diff` option to ansible-playbook.

Using this update method, `.pacnew` files are left unmerged which is OK for
most configuration files that are managed by Ansible. However, care must be
taken with updates that require manual intervention (e.g. major PostgreSQL
releases).

## Servers

This section has been moved to [docs/servers.md](docs/servers.md).

## Ansible repo workflows

### Replace vault password and change vaulted passwords

  - Generate a new key and save it as ./new-vault-pw: `pwgen -s 64 1 > new-vault-pw`
  - `for i in $(ag ANSIBLE_VAULT -l); do ansible-vault rekey --new-vault-password-file new-vault-pw $i; done`
  - Change the key in misc/vault-password.gpg
  - `rm new-vault-pw`
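The "change the key" step can be sketched as re-encrypting the new password to the root GPG keys (an assumption; the recipient and exact options depend on how `misc/vault-password.gpg` is actually maintained):

```shell
# hypothetical: encrypt the new vault password in place of the old one;
# 0xDEADBEEF stands in for the real recipient key IDs
gpg --encrypt --recipient 0xDEADBEEF --output misc/vault-password.gpg new-vault-pw
```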

### Re-encrypting the vault after adding or removing a new GPG key

  - Make sure you have all the GPG keys **at least** locally signed
  - Run the `playbooks/tasks/reencrypt-vault-key.yml` playbook and make sure it does not have **any** failed task
  - Test that the vault is working by running ansible-vault view on any encrypted vault file
  - Commit and push your changes

### Fetching the borg keys for local storage

  - Make sure you have all the GPG keys **at least** locally signed
  - Run the `playbooks/tasks/fetch-borg-keys.yml` playbook
  - Make sure the playbook runs successfully and check the keys under the borg-keys directory

## Backup documentation

We use BorgBackup for all of our backup needs. We have a primary backup storage as well as an
additional offsite backup.

See [docs/backups.md](./docs/backups.md) for detailed backup information.

## Updating Gitlab

Our Gitlab installation uses [Omnibus](https://docs.gitlab.com/omnibus/) to run Gitlab on Docker. Updating Gitlab is as simple as running the ansible gitlab playbook:

    ansible-playbook playbooks/gitlab.archlinux.org.yml --diff -t gitlab

To view the current Gitlab version, visit [this url](https://gitlab.archlinux.org/help/).

## One-shots

A bunch of once-only admin task scripts can be found in `one-shots/`.
We try to minimize the amount of manual one-shot admin work we have to do, but sometimes such scripts are necessary for certain migrations.