Provide a x86_64_v3 microarchitecture level port

Merged Allan McRae requested to merge (removed):optimisation into master
================================================
Provide a x86-64-v3 microarchitecture level port
================================================
- Date proposed: 2021-03-02
- RFC MR: https://gitlab.archlinux.org/archlinux/rfcs/-/merge_requests/0002

Summary
-------

Provide a second Arch Linux port using ``-march=x86-64-v3`` in the build flags.

Motivation
----------

Arch used to pride itself on providing optimised binaries out of the box.
However, the days when our i686 port showed improvements over other
distributions are long behind us.

Recently, AMD, Intel, Red Hat, and SUSE collaborated to define three
x86-64 microarchitecture levels on top of the x86-64 baseline. The three
levels group together CPU features roughly based on hardware release
dates.

The first of these microarchitecture levels, x86-64-v2, assumes the
following on top of the baseline x86-64 instructions:
``CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3``.
This raises the processor feature requirement to roughly Intel Nehalem,
and supports any x86-64 processor made in the last decade.

The x86-64-v3 level requires the following instruction sets on top of
x86-64-v2:
``AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, XSAVE``.
That is close to a Haswell processor, but excludes some recent low-end
Intel CPUs that omit AVX support.

Finally, x86-64-v4 requires:
``AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL``.

These microarchitecture levels became available in GCC version 11
(unreleased) and LLVM version 12 (unreleased), and are supported in
glibc-2.33 and binutils-2.36.

You can see which levels are supported by your CPU by running
``/lib/ld-linux-x86-64.so.2 --help``:

::

    Subdirectories of glibc-hwcaps directories, in priority order:
      x86-64-v4
      x86-64-v3 (supported, searched)
      x86-64-v2 (supported, searched)

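
The loader output above can also be read by a script. The following is a minimal sketch, not part of the proposal: the function name ``highest_supported_level`` is hypothetical, and it assumes that the loader lists levels in priority order (highest first) and marks usable ones with ``(supported``, as in the sample output. Falling back to plain ``x86-64`` when no level is marked is likewise an assumption.

```shell
# Print the highest x86-64 level that the dynamic loader marks as supported.
# Reads `ld.so --help`-style text on stdin. Levels are assumed to be listed
# highest first, so the first line marked "(supported" wins.
highest_supported_level() {
    awk '$1 ~ /^x86-64-v[0-9]+$/ && /\(supported/ { print $1; found = 1; exit }
         END { if (!found) print "x86-64" }  # assumption: baseline fallback'
}

# Usage (assumes glibc 2.33 or later at the path used above):
#   /lib/ld-linux-x86-64.so.2 --help | highest_supported_level
```

On the machine that produced the sample output, this would print ``x86-64-v3``.
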
RHEL9 will use x86-64-v2 as its baseline.

This RFC proposes adding an x86_64_v3 port to Arch Linux. Assuming
SSE4 and AVX2 (among others) when compiling will provide greater
out-of-the-box performance in Arch Linux. There are also implications
for battery life on laptops.

Benchmarks
----------

It is difficult to benchmark an entire system, and the workloads that
benefit most often have built-in CPU detection and already use optimised
code paths. Also, the relevant GCC and LLVM releases are not yet available.
To approximate x86-64-v3 with current GCC and LLVM, we can compile
packages using:

::

    CFLAGS="$CFLAGS -mcx16 -msahf -mpopcnt -msse3 -msse4.1 -msse4.2 -mssse3 \
        -mavx -mavx2 -mbmi -mbmi2 -mf16c -mfma -mlzcnt -mmovbe -mxsave"
    CXXFLAGS="$CFLAGS"

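
Binaries built with these flags may fail with an illegal-instruction error on CPUs that lack any of the features, so it is worth checking the test machine first. A minimal sketch, assuming Linux and the flag names used in ``/proc/cpuinfo`` (where SSE3 appears as ``pni``, LZCNT as ``abm``, and LAHF/SAHF as ``lahf_lm``); the function name ``missing_v3_flags`` is hypothetical:

```shell
# Print the x86-64-v3 feature flags that are missing from a space-separated
# flag list (as found on the "flags" line of /proc/cpuinfo).
# Prints an empty line when every required flag is present.
missing_v3_flags() {
    have=" $1 "
    missing=""
    for flag in cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3 \
                avx avx2 bmi1 bmi2 f16c fma abm movbe xsave; do
        case "$have" in
            *" $flag "*) ;;                     # flag present
            *) missing="$missing $flag" ;;      # flag absent
        esac
    done
    echo "${missing# }"
}

# Usage on Linux:
#   missing_v3_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```
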
Some benchmarks were performed by rebuilding packages with and without the
above CFLAGS additions against repositories from 2021-03-12:

**firefox-86.0.1-1**: benchmarking on Basemark Web 3.0
(https://web.basemark.com/) seven times (alternating installs) gave a
median score of 514.68 for the baseline (v1) build and 565.42 for the v3
build, representing a 9.9% improvement. Note that only firefox itself was
rebuilt, and none of its dependencies, so this represents a lower bound.

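
The quoted improvement follows directly from the two median scores. As a quick check, the helper below (a hypothetical name, ``pct_improvement``) computes the relative gain of a new score over a baseline:

```shell
# Relative improvement of a new score over a baseline, in percent,
# rounded to one decimal place.
pct_improvement() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Firefox medians quoted above: baseline 514.68, v3 build 565.42.
pct_improvement 514.68 565.42   # prints 9.9
```
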
**openssl-1.1.1.j-1**: benchmarking using ``openssl speed rsa`` showed
improvements in the range of 3.4% to 5.1% for signing and verifying with
keys of different sizes.

Benchmarks posted on the arch-general mailing list [1] show a median
performance benefit of around 10% from ``-march=haswell`` (roughly
x86-64-v3).

[1] https://lists.archlinux.org/pipermail/arch-general/2021-March/048739.html

Specification
-------------

We will provide a second port where the distributed makepkg.conf
includes the following:

::

    CARCH="x86_64_v3"
    CHOST="x86_64-pc-linux-gnu"
    CFLAGS="-march=x86-64-v3 -mtune=generic ...
    CXXFLAGS="$CFLAGS"


Alternatives Considered
-----------------------

Moving the baseline to x86-64-v2 was discussed, but the gains were not
considered enough to justify removing support for hardware without
SSE4.2.

Providing all four architectures would require a lot of resources in
terms of build time and mirror space. Providing only x86-64 and x86-64-v3
is a trade-off between more optimised binaries for newer hardware (while
not requiring the absolute latest) and the additional build time and
storage associated with providing multiple architectures.


Drawbacks
---------

Providing a second architecture would increase our repo size by
approximately 66% (~32GB).

Building two architectures will take additional packager time unless
automated.

Some developers may not have the hardware to debug issues found only in
x86-64-v3 packages. Such issues are likely to be very rare.


Unresolved Questions
--------------------

When ``Architecture = auto`` is set in pacman.conf, pacman uses
uname to detect the architecture. As this "port" is more of an
optimised rebuild than an architecture change, uname will report
x86_64. We could patch pacman to use x86_64_v3 instead, but that may
not be the correct solution.

It would be preferable if pacman on x86_64_v3 could still install
packages from x86_64, particularly for non-Arch repositories that
may not want to build for both architectures. This would also allow a
gradual transition to x86_64_v3, with [core] rebuilt first, followed
by the other repos one at a time. Your friendly pacman developers may be
willing to add the ability to specify multiple architectures in
pacman.conf.
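
To make the multiple-architecture idea concrete, here is one possible acceptance rule, written as a sketch only. It does not describe pacman's behaviour: the function name ``arch_accepted`` and the space-separated list format are assumptions made for illustration. The one borrowed fact is that packages with architecture ``any`` are architecture-independent.

```shell
# Hypothetical rule: succeed if a package's architecture appears in the
# space-separated list of architectures the system accepts.
# Packages marked "any" are architecture-independent and always accepted.
arch_accepted() {
    pkg_arch=$1
    accepted=" $2 "
    if [ "$pkg_arch" = "any" ]; then
        return 0
    fi
    case "$accepted" in
        *" $pkg_arch "*) return 0 ;;
        *) return 1 ;;
    esac
}

# A v3 system that also accepts baseline packages:
#   arch_accepted x86_64 "x86_64_v3 x86_64"   # succeeds
# A v3-only system rejects them:
#   arch_accepted x86_64 "x86_64_v3"          # fails
```

Under such a rule, a third-party repository that builds only for x86_64 would remain installable on an x86_64_v3 system.
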