Cloud-Init Mangles 'locale.gen' File
SHORT DESCRIPTION
The official Arch Linux ISO currently uses cloud-init to configure the machine's locale on cloud providers. Unfortunately, cloud-init's 'locale' module seems to mangle the /etc/locale.gen file: it deletes every line in it, leaving only a single line, for the en_US.UTF-8 locale. This makes it much more difficult to find out which locales are available and to enable them, especially for setups where ALL locales are needed.
FULL DESCRIPTION
I set up Arch Linux VPSs for web developers. Because our web developers often deploy multilingual sites in a (very) wide variety of languages, by default we simply enable ALL available UTF-8 locales on every server we deploy. Our default Ansible script usually goes through something like this:
- launch a new Arch Linux VPS
- open /etc/locale.gen, uncomment all lines for UTF-8 locales, and save the file
- run locale-gen
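The "uncomment all UTF-8 lines" step can be sketched in Python as follows. This is a hypothetical helper (enable_utf8_locales is not part of any tool), not our actual Ansible task. It assumes the stock Arch layout, where each entry is a line like "#de_DE.UTF-8 UTF-8" with no space after the "#", while the example lines in the header comment have spaces after the "#" and are therefore left alone.

```python
import re

# A commented-out UTF-8 entry such as "#de_DE.UTF-8 UTF-8".
# Assumption: real entries have no whitespace right after "#", whereas the
# header's example lines ("#  en_US.UTF-8 UTF-8") do, so they are skipped.
# [ \t]* (not \s*) keeps the match from swallowing the line's newline.
_COMMENTED_UTF8 = re.compile(r"^#(\S+ UTF-8)[ \t]*$", re.MULTILINE)


def enable_utf8_locales(text: str) -> str:
    """Return locale.gen contents with every UTF-8 entry uncommented.

    Non-UTF-8 entries, header comments and already-enabled lines are kept
    unchanged, so running it twice gives the same result.
    """
    return _COMMENTED_UTF8.sub(r"\1", text)
```

The result would then be written back to /etc/locale.gen (as root) before running locale-gen.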
We never had an issue in 2 years of running the script on Linode. Recently, however, we started testing Vultr, which apparently uses the official Arch Linux ISO as the base for its image. This image has cloud-init installed and preconfigured. During first boot, cloud-init uses its 'locale' module to set the virtual machine's locale to en_US. That wouldn't be an issue, except that when it does so, it seems to DELETE, rather than comment out, all locales that are not needed. As a result, we end up with a locale.gen file that contains a single line, for 'en_US'.
Again, this may not be a big issue for a user who only ever wants one or a few locales enabled on the machine; typing those in by hand is easy enough. In our case, however, it becomes a chore to "recreate" the original 'locale.gen' file, listing all of the hundreds of currently available locales.
WORKAROUNDS
The 'locale.gen' file seems to be generated when glibc is installed. If we reinstall glibc while the file already exists, it is skipped. So we first need to delete the mangled locale.gen file and then reinstall glibc to get a 'fresh' copy of 'locale.gen'.
That, however, is a hack rather than a solution, and it is not reliable. We have tried this approach on 3 different Vultr VPSs, and on 2 of them, for some unknown reason, the locale.gen file became corrupted after a couple of reboots. Perhaps cloud-init, or some other tool, is still trying to manage it automatically and is overwriting the manually reinstalled version, along with the generated locales. On those 2 machines, when the 'locale.gen' file became corrupted, the machines also reverted to having a single locale available (en_US.UTF-8); the hundreds of other locales that had previously been generated seemed to have been "forgotten".
POSSIBLE SOLUTIONS
Report Upstream
It might be worthwhile to report this issue upstream, to the team that maintains cloud-init's 'locale' module. They may not have come across it before, since other distros, especially Ubuntu, handle locale management very differently from Arch.
Work around the 'locale' module
It might also be worth considering not using cloud-init's 'locale' module in the initial configuration of Arch Linux's official ISO until it stops mangling locale.gen. If that is not possible, it may be worthwhile to keep a backup of the original locale.gen file, perhaps by using cloud-init's 'bootcmd' module, before 'locale' mangles it. That would make things easier for those of us who need the full contents of the original 'locale.gen' file. Something like the following may work (I am not that familiar with cloud-init):
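```yaml
#cloud-config
bootcmd:
  # cloud-init-per args: frequency, a unique marker name, then the command
  - [cloud-init-per, once, backup-locale-gen, cp, /etc/locale.gen, /etc/locale.gen.bak]
# then, later on in the config:
locale: en_US
```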
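If a backup such as /etc/locale.gen.bak from the bootcmd above exists, a later provisioning step (for example a task in the Ansible script) could restore it whenever the live file looks mangled. The sketch below is hypothetical: count_entries, restore_if_mangled and the min_entries threshold are illustrative names and choices. It assumes a mangled file holds at most one locale entry, as described in this report, and that header comment lines have whitespace right after the "#".

```python
import re
import shutil
from pathlib import Path

# A locale entry, commented or not: "#de_DE.UTF-8 UTF-8" or "en_US.UTF-8 UTF-8".
# Lines with whitespace right after "#" (header/example comments) don't match.
_ENTRY = re.compile(r"^#?[^\s#]\S* \S+[ \t]*$", re.MULTILINE)


def count_entries(text: str) -> int:
    """Count locale entry lines, enabled or commented out."""
    return len(_ENTRY.findall(text))


def restore_if_mangled(current: Path, backup: Path, min_entries: int = 2) -> bool:
    """Copy backup over current if current has fewer than min_entries entries.

    Returns True if the file was restored, False if it was left alone.
    """
    if not backup.exists():
        return False  # nothing to restore from
    text = current.read_text() if current.exists() else ""
    if count_entries(text) >= min_entries:
        return False  # looks intact; don't touch it
    shutil.copyfile(backup, current)
    return True
```

After a restore, the needed entries would still have to be uncommented and locale-gen run as usual.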