
Add RFC for upstream package source handling

Open David Runge requested to merge dvzrv/rfcs:rfc/upstream-package-sources into master
---
title: 0046 Upstream Package Sources
---
# Upstream package sources
- Date proposed: 2024-11-14
- RFC MR: https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/46
## Summary
Improve the security of Arch Linux distribution packages by relying on transparent and, if possible, cryptographically verifiable upstream sources by default.
Provide guidelines and best practices for distribution package maintainers in a document covering various source types and technologies for digital signatures.
Communicate the common goal of transparent and secure package delivery for package maintainers as well as upstream project maintainers.
## Motivation
Arch Linux currently does not have a clear guideline on the best practices of handling upstream sources.
Other distributions, such as Debian and Fedora, provide specific guidelines around some of the technicalities of source tracking and source verification (see e.g. [Upstream source location: `debian/watch`] and [Source file verification]).
However, these generally do not appear to distinguish between the degrees of transparency of different source types, nor do they explicitly state which type to prefer.
Trust path handling when verifying sources using cryptographic signatures is not specifically considered either.
As shown in [ASA-202403-1], *custom* upstream source tarballs can be used as an additional location to obfuscate malicious code.
This RFC provides a definition of package source handling for Arch Linux, by establishing the use of more [transparent] sources.
It is meant to serve both as a definition of our expectations towards upstreams and our package maintainers, and as a basis from which others can derive more specific recommendations and processes around the handling of upstream sources.
## Specification
The upstream sources to build system packages from are the first step of a trust relationship from upstream project to end-user.
The following sections outline the various concepts revolving around source transparency, verification and trust.
### Transparency
According to the ["Triangle of Secure Code Delivery"], transparency is considered one of the pillars of secure package delivery.
For any distribution, this property extends to the source code from which its packages are built.
Transparent sources allow for easy audit of individual changes via version control systems.
More specifically, they also make it possible to relate (auto-generated) release artifacts to the version controlled files they originate from.
Non-transparent sources, on the other hand, may for example offer no access to their version control systems, only add larger code changes as a single change to public-facing version control systems, and/or provide custom source tarballs (sometimes containing artifacts from unknown sources).
These types of sources make it very hard or even impossible for downstreams to audit changes, apply patches or reliably build a project.
Services such as [whatsrc.org] offer an overview of which sources the tracked distributions use for packaging purposes.
While the chosen sources may not always be the most transparent ones available, the service is valuable in that it allows comparing the upstream sources chosen by different distributions.
### Checksum verification
Releases of upstream sources, as well as individual VCS objects can be locked using a checksum.
By comparing the locked checksums of upstream sources whenever they are reused, one can ensure that they have not been altered in the meantime.
When newly downloaded sources no longer match the locked checksums, this could be an indicator of a [supply chain attack], but may also have other causes:
* Recreation of a tag
* Use of a `.git_archival.txt` setup that leads to unreproducible sources (as promoted by [setuptools_scm] for a long time)
It is the package maintainer's responsibility to contact upstream whenever such an event occurs and to gather data on what exactly happened.
Only afterwards should the locked checksum be updated in a commit which clearly states the results of the investigation.
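
As a minimal sketch of such a comparison, the following Python snippet recomputes the SHA-256 digest of a downloaded source and compares it against the locked value.
The function names are hypothetical and chosen for illustration only; in practice `makepkg` performs this check based on the checksum arrays in the `PKGBUILD`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_locked_checksum(path: Path, locked: str) -> bool:
    """Compare a freshly computed digest against the locked one."""
    return sha256_of(path) == locked.lower()
```

A mismatch only signals that the source changed; it does not say why, which is why the investigation described above is still required before the locked value is updated.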
### Digital signatures
When source artifacts (e.g. source tarballs or VCS objects) are digitally signed, this allows for the authentication of an identity tied to the signer.
Here, the signer does not necessarily correlate with the creator of the source artifact.
Digital signatures offer an additional integrity check, which can also be used to verify that a source artifact has not been tampered with.
The signer's certificate usually implies a digital identity, which may relate to a person.
However, signing systems without a (meaningful) identity also exist and may be shared by many users (an example of this is GitHub's automated OpenPGP signing).
### Reproducibility
The [reproducible builds] project focuses on verifying that *"given the same source code, build environment and build instructions, any party can recreate bit-by-bit identical copies of all specified artifacts"*.
This effort *does not* concern itself with the evaluation of whether a source can be considered benign, but rather with creating *"an independently-verifiable path from source to binary code"*.
By extension, reproducibility is always a desired outcome for a distribution, but it cannot guarantee that binary code is not malicious.
### Source types
In the following sections the various types of sources are explained and weighed against one another in the context of verification, transparency, authentication and trust.
#### VCS objects
Version control systems ([VCS]) offer access to the state of a source repository via various objects (e.g. a git repository may be accessed using branches, specific commits or tags).
VCS objects can be referred to explicitly and can be made to never change (e.g. protected branches or tags on a source forge).
They are also very transparent and sources can be checksummed by builtin methods such as [git-archive].
They may however be a bit less convenient to work with for package maintainers.
For instance, in cases where a package is pinning a specific commit hash instead of a tag name, it is required to generate the `pkgver` using a custom `pkgver()` function.
The use of git submodules [requires additional `sources` entries and a `prepare()` function] to update them.
Some git repositories such as that of the Linux kernel may be very large and take very long to check out.
#### Auto-generated source tarballs
Assuming that the platform they are generated on is trusted, auto-generated source tarballs are very transparent, can easily be checksummed and are straightforward to work with for package maintainers.
They are also immutable, unless the forge or VCS alters the way the tarball is generated (e.g. [changes to git-archive] leading to [GitHub changing their way of generating tarballs]).
However, these tarballs are generated by a remote system (once or on demand) that also has to be trusted to some degree, and [git-archive] (which is used to create such auto-generated tarballs) does not include [git submodules].
#### Custom source tarballs
Custom source tarballs may provide complete sources in the case where [git submodules] are used and can provide artifacts that can not be provided otherwise (e.g. due to legal reasons).
On the other hand, custom source tarballs are created on unknown systems, potentially by unknown users, and are far less transparent and auditable, as their content cannot be guaranteed to only contain artifacts from the version controlled repository.
They also incentivize adding custom data, e.g. model data, or prebuilt autotools configure scripts.
#### Patches
Patches used in package sources generally follow the same rules as any other sources.
Using pull / merge requests as a patch (e.g. `fix.patch::https://patch-diff.githubusercontent.com/raw/archlinux/contrib/pull/82.patch`) is prone to side effects, especially if the changes have not been merged yet: if new commits are added to the pull / merge request, the checksum of the applied patch changes.
Apart from the reproducibility issue that this causes, it is also non-transparent by nature as the contents of the patch cannot be guaranteed.
It is therefore discouraged to use pull / merge requests as patches.
Remote patches referencing specific VCS objects (e.g. `fix.patch::https://github.com/archlinux/contrib/commit/1195588debc264d1baf76753b33ee09abff9ef08.patch`) may be used, but must rely on fully expanded identifiers (e.g. git commit hashes) and must be verified using a checksum.
However, to prevent issues with remote content, it is advisable to rely on local files instead by adding the patches in question to the package source repository.
This guarantees full control over their contents and avoids potentially unexpected changes.
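
The distinction between these kinds of patch sources can be sketched as a small classifier.
This is an illustration under stated assumptions: the function name and the returned labels are hypothetical, and the URL patterns only cover the GitHub-style and GitLab-style paths shown above, not every forge.

```python
import re

# A fully expanded git object name (40 hex chars for SHA-1, 64 for SHA-256)
# followed by a ".patch" or ".diff" suffix.
_PINNED_COMMIT = re.compile(
    r"/commit/(?:[0-9a-f]{40}|[0-9a-f]{64})\.(?:patch|diff)$"
)
# Pull / merge request URLs, whose content may change over time.
_CHANGE_REQUEST = re.compile(
    r"/(?:pull|merge_requests)/\d+(?:\.(?:patch|diff))?$"
)


def classify_patch_source(entry: str) -> str:
    """Classify a source entry as local, pinned-commit, change-request or unknown-remote."""
    # Source entries may carry a "name::" prefix that renames the download.
    location = entry.split("::", 1)[-1]
    if "://" not in location:
        return "local"
    if _PINNED_COMMIT.search(location):
        return "pinned-commit"
    if _CHANGE_REQUEST.search(location):
        return "change-request"
    return "unknown-remote"
```

Applied to the two example entries above, the pull request URL is classified as `change-request` and the commit URL as `pinned-commit`, while a plain `fix.patch` kept in the package source repository is `local`.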
### Trust path
When upstream projects sign release artifacts or VCS objects (e.g. git tags), they make a statement about their identity in relation to the project.
A trust path for signed package sources exists between releases, if the authors of the respective signature can be cryptographically verified against the signature authors of previous releases.
Following a correct order matters:
If a project's releases are issued and signed by person `A`, then person `A` is the one to establish the trust path towards any new person, e.g. person `B`. This principle also applies to different cryptographic key material of the same person `A` when transitioning from key material `A1` to a new key `A2`.
This effectively enlarges the set of trusted people; going forward, person `B` is also able to extend or diminish the set (e.g. by adding person `C` or removing person `A`), and so forth.
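
The ordering rule can be sketched as a small model in Python.
It assumes that each statement has already been cryptographically verified as coming from its issuer; the `Statement` type and the `trusted_signers` function are hypothetical names used for illustration and do not correspond to any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    issuer: str   # identity that signed the statement
    action: str   # "add" or "remove"
    subject: str  # identity being added to or removed from the set


def trusted_signers(initial: str, statements: list[Statement]) -> set[str]:
    """Replay statements in order, starting from the initial release signer."""
    trusted = {initial}
    for statement in statements:
        if statement.issuer not in trusted:
            # Statements issued from outside the current set are ignored:
            # they cannot extend the trust path.
            continue
        if statement.action == "add":
            trusted.add(statement.subject)
        elif statement.action == "remove":
            trusted.discard(statement.subject)
    return trusted
```

Replaying `A` adds `B`, `B` adds `C`, `B` removes `A` yields the set `{B, C}`, whereas the same introduction of `C` by `B` is ignored if it is processed before `A` has introduced `B`.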
Guaranteeing that users of an upstream project can easily verify the *intentional* relationship between members of said project allows for a trust path between upstream project and user.
A trust path between project and user cannot be used to verify that sources provided by an authenticated upstream are not malicious.
However, it provides authentication for identities related to a project and makes it possible to verify whether a specific identity is supposed to have provided and signed sources.
A trust path between releases can only be established and maintained by the upstream project.
Depending on the technology used, this may require different approaches.
The trust path between upstream projects and Arch Linux has historically been maintained via pinning of OpenPGP fingerprints in PKGBUILDs (using the `validpgpkeys` array).
If the OpenPGP fingerprint of the issuer of the source artifact signature changes between releases, the package maintainer is expected to establish that a trust path between the new and the old release exists, and otherwise to contact upstream to establish one (more on the topic in the [subsection on OpenPGP]).
As this approach is not technology agnostic, since pacman 6.1 a dedicated `verify()` function can be used for verification using other technologies (e.g. Signify, SSH, or Sigstore).
#### OpenPGP
[OpenPGP] offers multiple ways of maintaining a trust path between certificates.
The builtin mechanism is [authentication and delegation in third-party signatures], which makes it possible to cryptographically verify a statement that one certificate issues about another.
However, it is also possible to authenticate a certificate by other means, such as signed additions to a file in a version controlled repository.
All mechanisms require that upstreams publicly publish up-to-date OpenPGP certificates, so that downstreams can make use of them.
Ideally, certificates are published on [OpenPGP keyservers] that expose [identity certifications], but they may also be provided elsewhere (e.g. in a version control repository, on a website, or in the context of a user account on a source forge).
##### PGPKI
When relying on the builtin mechanisms that OpenPGP has to offer, upstreams can establish trust between several certificates by cross-signing [User IDs].
Downstreams are then in turn able to authenticate a given certificate by verifying the third-party signatures issued for them.
Attempts at automating this authentication process in the context of version control have been made e.g. by the [sequoia-git] project.
##### File based authentication
Upstreams may maintain a plaintext file in their respective source repository, which contains the [OpenPGP fingerprints] or ASCII-armored versions of the OpenPGP certificates that are allowed to sign releases.
Historically, files such as `MAINTAINERS` or the central `README` have been used for this purpose in some projects.
To allow downstreams to establish a trust path between any mentioned OpenPGP fingerprints or OpenPGP certificates, it is *essential* that additions to such a file are made in *signed commits* using a previously authenticated OpenPGP certificate.
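
A minimal sketch of this rule in Python follows, under several assumptions: fingerprints are written as contiguous hex strings (40 characters for OpenPGP v4, 64 for v6), the signature on the commit has already been verified elsewhere, and `commit_signer` is the fingerprint of the certificate that issued it.
The function names are hypothetical, and other 40-character hex strings in the file (e.g. git commit hashes) would also be picked up by this simple pattern.

```python
import re

# Contiguous hex runs of exactly 40 or 64 characters.
_HEX_FPR = re.compile(
    r"(?<![0-9A-Fa-f])(?:[0-9A-Fa-f]{64}|[0-9A-Fa-f]{40})(?![0-9A-Fa-f])"
)


def listed_fingerprints(text: str) -> set[str]:
    """Return all fingerprint-like strings in the file, normalized to uppercase."""
    return {match.upper() for match in _HEX_FPR.findall(text)}


def additions_are_authenticated(old_text: str, new_text: str, commit_signer: str) -> bool:
    """Check that new fingerprints were added by an already listed signer."""
    previously_listed = listed_fingerprints(old_text)
    added = listed_fingerprints(new_text) - previously_listed
    # Without additions there is nothing to authenticate; otherwise the
    # signer must have been listed before this change.
    return not added or commit_signer.upper() in previously_listed
```

A commit that adds a new fingerprint and is signed by that same new fingerprint fails the check, which is exactly the case a downstream cannot authenticate.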
#### Signify
The signify project offers a simple asymmetric signing solution.
Issuers of signatures publish their own public key alongside their signed source artifacts.
Signatures are created over a strong checksum (e.g. SHA-256) and both checksum and hashed raw signature are part of the resulting signature file.
Signify does not concern itself with authentication of identities.
As such, the verification of a trust path can only be established via signatures on artifacts that convey the semantics of a key change (e.g. a signed message stating that someone is now using another public key).
#### Minisign
The [minisign][minisign upstream] project offers a simple PKI-based tool built on [ed25519] that allows creating and verifying signatures for messages.
Minisign does not concern itself with authentication of identities.
As such, the verification of a trust path can only be established via signatures on artifacts that convey the semantics of a key change (e.g. a signed message stating that someone is now using another public key).
#### SSH
Using [git-config], a `gpg.ssh.allowedSignersFile` (e.g. [.git_allowed_signers]) can be set, which defines the SSH public keys considered as trusted for signing commits and tags, while all SSH public keys found in a file specified by `gpg.ssh.revocationFile` are considered not trusted.
The trust path between the various upstream SSH signing keys has to be evaluated manually to ensure that a given signature for a release is trustworthy.
#### Sigstore
The sigstore project provides infrastructure and tooling to use ephemeral signing keys for signing artifacts.
The on-demand key creation is tied to a specific identity which is authenticated by an identity provider.
Signatures describe their own context and are logged in a [transparency log].
Although offline proofs may be used for signature verification, they appear to be less common than the authentication based on large-scale identity providers.
The latter may render verification of a trust path less relevant or impossible in the traditional sense (see [attestation] for a discussion on future work).
### Trust as function of transparency
Arch Linux as distributor of upstream projects effectively functions as a trust anchor for its users.
With the help of archlinux-keyring the distribution offers [OpenPGP delegations] for end-users to individual package maintainers.
By cryptographically signing packages, Arch Linux indicates that what is being distributed has been created by one of its package maintainers.
Metadata in each package encodes in what context, with which specific sources, other packages and what build script a given package has been built.
Whether this encoded data holds true is tested regularly using reproducible builds and represents a large part of Arch Linux's transparency promise to its users.
A package's transparency towards the user does not start at build time though.
It is each package maintainer's responsibility to choose the most transparent sources for a given upstream project.
If the upstream source offers a [digital signature] for the source, the package maintainer is expected to evaluate whether a [trust path] between releases exists.
Source verification using digital signatures should only be used if such a trust path can be established and is maintained in a conscientious manner by upstream.
Here it is essential to evaluate whether the signing is done by actual members of the project using their specific key material, or whether an unsafe system (e.g. unguarded key in CI, automatic signatures using GitHub's OpenPGP key) is being used.
There are a few scenarios that break reproducibility and/or an established trust path, in which a package must not be upgraded or rebuilt and the package maintainer must contact upstream for clarification:
* A new source release is created that cannot be verified using a trust path previously established between releases of the upstream project
* A project release is removed or recreated (i.e. package sources for a release have changed)
In all of the above cases, the affected package is not to be updated until the reason for the failing verification has been identified.
As a distribution that fundamentally acts as trust anchor for all of its users, Arch Linux is under the obligation to use the most transparent and trustworthy sources available, while ensuring that problems are communicated towards upstreams in a timely manner.
Here, however, Arch Linux has to rely heavily on the cooperation and processes of upstreams to arrive at more transparent and reproducible sources.
### Attestation
Currently, the validation of a trust path between releases is mostly a manual or poorly defined process.
While some technologies allow some form of out-of-band verification (e.g. [OpenPGP]), others enable workflows around signed artifacts conveying a change in signer identity (e.g. [signify], [minisign], [ssh]).
New approaches, such as [sigstore], may in part or entirely rely on out-of-band infrastructure and require more dedicated tooling for validation.
As such, the automatic, generic and unified validation of trust paths is out of scope for this RFC.
Future work should evaluate the feasibility of integrating tooling such as [in-toto], to more generically allow the verification of artifacts.
This would eventually make it possible to gatekeep the release of software packages for some upstreams based on thresholds and to more clearly encode the semantics for Arch Linux's supply chain security.
### Conclusion
Custom-made tarballs are non-transparent by nature, which degrades their trustworthiness.
Using more transparent sources for our packages, such as auto-generated tarballs or VCS objects (e.g. when [git submodules] are used), allows for better auditability and higher trustworthiness.
As such, package maintainers are advised to strive for using the most transparent sources possible.
When adding new packages, this implies that if no sufficiently transparent sources are available, these should be requested from upstreams.
If cryptographic signatures for those sources are available and have a trust path guarantee, they should be used.
In cases where the trust path for cryptographic signatures is not clearly communicated by upstreams, the trust path guarantee must be clarified before making use of cryptographic signatures for upstream releases.
Revising package sources for existing packages is valuable, too.
If a package relies on non-transparent sources (e.g. custom made tarball), package maintainers are advised to evaluate switching to a more transparent source (e.g. VCS object or auto-generated tarball).
If a source is digitally signed, package maintainers are advised to verify that a trust path is in fact established and is maintained going forward.
Otherwise, upstreams should be contacted about this.
In an ideal world, upstreams would provide both transparent and cryptographically signed sources.
If a choice has to be made between less transparent, but cryptographically signed sources or more transparent but cryptographically unsigned sources, the general recommendation is for package maintainers to use the latter.
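
This recommendation can be sketched as an ordering in which transparency always outweighs the presence of a signature.
The `SourceCandidate` type and its fields are hypothetical and only model the two properties discussed in this RFC; judging whether a source is transparent or has an established trust path remains a manual evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCandidate:
    name: str                     # free-form label, for illustration only
    transparent: bool             # VCS object or auto-generated tarball
    signed_with_trust_path: bool  # signature with an established trust path


def preferred_source(candidates: list[SourceCandidate]) -> SourceCandidate:
    """Pick the candidate to package from, preferring transparency first."""
    # Tuples compare element by element, so a signature only breaks ties
    # between candidates that are equally transparent.
    return max(candidates, key=lambda c: (c.transparent, c.signed_with_trust_path))
```

Given a signed custom tarball and an unsigned auto-generated tarball, the sketch selects the latter; a signed and transparent source, where available, is preferred over both.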
Going forward, the above advice should be documented in more detail (e.g. including copy/paste ready upstream recommendations) in distribution documentation as outlined in [RFC0021].
In the future this topic should be revisited to adapt our expectations and guidelines to emerging changes in technology.
## Drawbacks
With some upstreams (e.g. those relying on autotools setups, or git submodules), working with transparent sources (i.e. non-custom source artifacts, or git submodules) may be less convenient for the package maintainer as they require additional steps or preparation, which increases the complexity of the build process.
In some cases setups may be very complex (either technically speaking or because of a lack of information from upstream's side) and require clarification from upstreams.
In the case of trying to establish a trust path, the communication with upstream about Arch Linux's expectation and upstream's handling of digital signatures may require time and effort.
For some projects such as the Linux kernel it may not be feasible to opt for transparent sources (i.e. building from git), as the overhead per build is too immense (requiring several gigabytes of cloned data, or increasing the build time unreasonably).
## Alternatives Considered
In cases where upstreams are not (and do not want to be) compliant with the points laid out in this RFC, an alternative could be to clone or copy upstream sources on our side and base the related packages on those cloned sources (like Debian does), which would offer more flexibility and control in source handling.
This is not desirable, as it implies custom source handling on Arch Linux's side, which requires manual processes in packaging and infrastructure.
[ASA-202403-1]: https://security.archlinux.org/ASA-202403-1
[whatsrc.org]: https://whatsrc.org/
[supply chain attack]: https://en.wikipedia.org/wiki/Supply_chain_attack
[setuptools_scm]: https://github.com/pypa/setuptools_scm/issues/806
[reproducible builds]: https://reproducible-builds.org/
[VCS]: https://en.wikipedia.org/wiki/Version_control
[git-archive]: https://man.archlinux.org/man/git-archive.1
[changes to git-archive]: https://github.com/git/git/commit/4f4be00d302bc52d0d9d5a3d4738bb525066c710
[GitHub changing their way of generating tarballs]: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
[subsection on OpenPGP]: #openpgp
[OpenPGP]: https://www.rfc-editor.org/rfc/rfc9580
[transparency log]: https://transparency.dev/
[OpenPGP delegations]: https://openpgp.dev/book/glossary.html#term-Delegation
[git submodules]: https://man.archlinux.org/man/git-submodule.1
[.git_allowed_signers]: https://github.com/openssh/openssh-portable/blob/master/.git_allowed_signers
[git-config]: https://man.archlinux.org/man/git-config.1
[transparent]: #transparency
["Triangle of Secure Code Delivery"]: https://defuse.ca/triangle-of-secure-code-delivery.htm
[digital signature]: #digital-signatures
[trust path]: #trust-path
[authentication and delegation in third-party signatures]: https://openpgp.dev/book/signing_components.html#authentication-and-delegation-in-third-party-signatures
[User IDs]: https://openpgp.dev/book/certificates.html#user-ids-in-openpgp-certificates
[OpenPGP keyservers]: https://wiki.archlinux.org/title/OpenPGP#Keyserver
[identity certifications]: https://openpgp.dev/book/certificates.html#third-party-identity-certifications
[OpenPGP fingerprints]: https://openpgp.dev/book/glossary.html#term-OpenPGP-Fingerprint
[sequoia-git]: https://gitlab.com/sequoia-pgp/sequoia-git
[requires additional `sources` entries and a `prepare()` function]: https://wiki.archlinux.org/title/VCS_package_guidelines#Git_submodules
[RFC0021]: https://rfc.archlinux.page/0021-create-a-distro-developer-manual/
[ed25519]: https://ed25519.cr.yp.to/
[minisign upstream]: https://jedisct1.github.io/minisign/
[attestation]: #attestation
[OpenPGP]: #openpgp
[signify]: #signify
[minisign]: #minisign
[ssh]: #ssh
[sigstore]: #sigstore
[in-toto]: https://in-toto.io/
[Source file verification]: https://docs.fedoraproject.org/en-US/packaging-guidelines/#_source_file_verification
[Upstream source location: `debian/watch`]: https://www.debian.org/doc/debian-policy/ch-source.html#upstream-source-location-debian-watch