
Provide a x86_64_v3 microarchitecture level port

Merged Allan McRae requested to merge (removed):optimisation into master

Signed-off-by: Allan McRae allan@archlinux.org

Edited by Allan McRae

Activity

  • How long has Intel been supporting these architectures?

    EDIT: 2 decades. Got it. I was confused when the a-d-p email only mentioned AMD :)

    Edited by Morten Linderud
  • Author Contributor

    The flags are roughly equivalent to those of Nehalem, which was released in November 2008 (12 years ago).

  • Jelle van der Waa approved this merge request

  • Allan McRae added 1 commit

    • 3034941f - Use x86-64-v2 microarchitecture level

  • Allan McRae resolved all threads

  • Do we have any link that summarizes which processors support this? Would be nice to have an easy to view list. Other than that, I'm fine with this.

    • Resolved by Allan McRae

      Hi, I find that this RFC fails to provide adequate context on this subject, which I was expecting given that this subject was already discussed on a-d-p.

      There are a few points I want to clarify.

      1. These instructions are usually very workload-specific. While GCC can, and does, use them to optimize code, the performance gains are usually not that big. There are several reasons for this, but I am not going to get into them here. The applications that do benefit from such instructions will usually write custom kernels, and most of them will be included in 2).

      2. These instructions are already widely used in our repos. By far the most common way to use them is to have multiple code paths using different instructions and to select which one to use at runtime. This is what glibc, blender, gnuradio, etc. do. We even have a package, libvolk, that provides reusable handwritten SIMD kernels for vector operations in this way (it is used by gnuradio, for example).

      Implementing opt-in at runtime is fairly easy given GCC's target_clones function attribute. See https://lwn.net/Articles/691932/.
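
      A minimal sketch of what that looks like (hypothetical function, not taken from any package mentioned here): GCC emits one clone per listed target plus a "default" fallback, and an ifunc resolver picks the best clone for the running CPU at load time.

      ```c
      /* Sketch of GCC's target_clones attribute (x86-64, GCC 6+ or a
       * recent Clang). The compiler generates an AVX2 clone, an SSE4.2
       * clone, and a baseline fallback; the dynamic loader selects one. */
      #include <stdio.h>

      __attribute__((target_clones("avx2", "sse4.2", "default")))
      int dot(const int *a, const int *b, int n)
      {
          int s = 0;
          for (int i = 0; i < n; i++)
              s += a[i] * b[i];
          return s;
      }

      int main(void)
      {
          int a[] = {1, 2, 3, 4};
          int b[] = {5, 6, 7, 8};
          printf("%d\n", dot(a, b, 4)); /* same result on every CPU */
          return 0;
      }
      ```

      The result is identical regardless of which clone runs; only the instructions used differ, which is exactly why this pattern lets a baseline distribution still exploit newer ISA extensions.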

      TLDR: Yes, these ISA extensions provide better performance in certain workloads, but most of the software that can take advantage of them already does so by opting in at runtime, so this would likely not result in any significant performance gain.

      Assuming SSE4 (and others) while compiling will provide greater out-of-the-box performance in Arch Linux.

      Yes, they would probably provide better performance, but there's a big asterisk there, as I explained above. I am very skeptical that the performance boost will actually be significant for anything other than maybe a handful of packages.

      For this reason, I would really like to see benchmarks on this.

      Alternatives Considered

      There was a big discussion on a-d-p in which several proposals were discarded; those are not represented here.

      This is not something that will likely affect software in the wild broadly, but rather only specific software. I feel like a better option is to come up with a solution that lets maintainers provide alternative builds of certain packages for CPUs that support selected ISA extensions. That would also work for AVX, AVX2, AVX512, etc., which I feel would be a much better solution.

      We actually already do that: we provide alternative packages. AFAIK we only have 5 (base) packages:

      • tensorflow
        • tensorflow-opt
        • tensorflow-opt-cuda
      • python-tensorflow
        • python-tensorflow-opt
        • python-tensorflow-opt-cuda
      • python-pytorch
        • python-pytorch-opt
        • python-pytorch-opt-cuda
      • liquid-dsp
        • liquid-dsp-sse4.1
      • srslte
        • srslte-avx2

      Of these, only one would be affected by this, liquid-dsp-sse4.1; all the others use ISA extensions that would remain unsupported, and we would continue to provide them in this sub-optimal format.


      I am not too worried about desktop systems, but please don't forget that many users run arch on servers and in other applications that are likely to run on outdated hardware. I know a few users who will be affected by this, some personally. I have also received a few SIGILL bug reports in the past, so I would be careful here.

      I think SSE2 is okay, ugh, maybe SSE3. But SSE4.1 and SSE4.2 are too recent IMO, and given the low impact it would likely have on the bulk of the software, I find it very very hard to justify.

      What I would do here, given the interest, is require SSE2 for all packages and, like I said above, come up with a way to let maintainers provide alternative packages. I think non-SSE2 hardware is old enough to be reasonably dropped, and all the newer hardware would get optimized builds of the packages where it actually makes sense.

      I do not think this course of action, raising the baseline for everything, is optimal. It fails users with older hardware, by kicking them out of our distribution, and it fails users with newer hardware, by not providing support for ISA extensions that would make a difference for them.

      As it stands, this proposal aims to essentially drop support for older CPUs in order to provide an unspecified increase in performance on an unspecified part of our packages, while doing nothing to actually solve our ISA extension problem.

      I do not believe this is the correct path forward, at least as it is proposed, but of course this is only my opinion.

      I hope my comments here make the other team members aware of the actual impact of this RFC and the current state of ISA extensions in arch.

      Thanks.

      Edited by Filipe Laíns
  • I asked for some feedback on twitter and got the following.

    There was some talk about this probably impacting low-cost KVM hosting. Including this reply:

    Two of four x86_64 computers I still have are affected; Harpertown and Wolfdale Xeons.

    All AMD machines before Bulldozer are affected, which make up a significant portion of the low-end host market.

    I also got this, which solidifies my position on SSE2 and makes me more comfortable with SSE3.

    if arch isn't supporting x86-32 anymore, then all amd64 CPUs have SSE2 and the only one without is the original athlon 64. prescott P4 is effectively the baseline then.

    (OK, there is one exception, and it's the bizarre Knight's Corner cores - they even lack SSE2 though...)

    Edited by Filipe Laíns
  • This would kill my desktop as well. I agree with @ffy00 that SSE4.1 and SSE4.2 are too recent.

    • Resolved by Allan McRae

      I think the Motivation section could be expanded to include why this change is being made and what the benefits are. I only see a description of what -v2 is, who defined it, and what libs support it, but not why this change is even being considered or what the benefits are.

      Is it implied that it will optimize the general product? :)

      (Also got a bit worried because I'm running gcc 10 so this change feels uber fresh, is it?)

      Edit: Older servers that people just keep running might be affected, and those can be quite an investment for some people if they use them as private servers and saved up for them over time :)

      Edited by Anton Hvornum
    • Resolved by Allan McRae

      Not a developer, just throwing in my 2 cents as I was curious about the different levels.

      If anyone's interested in finding out exactly what uarch level their CPU supports, there are a few bash scripts on this StackExchange question that will do the appropriate wizardry.

      Intel's Ark website doesn't explicitly state supported instructions, but thankfully Wikipedia has the supported instructions on $PROCESSOR_FAMILY pages (e.g. Ivy Bridge).
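
      As a rough illustration (not one of the scripts linked above), GCC's __builtin_cpu_supports builtin can query the individual feature flags that make up a level. This sketch checks the SSE/POPCNT subset of x86-64-v2; note the real level definition also includes a few more features (e.g. CMPXCHG16B), so this is only approximate.

      ```c
      /* Approximate x86-64-v2 check using GCC's CPU feature builtins.
       * Only covers the SSE3/SSSE3/SSE4/POPCNT subset of the level. */
      #include <stdio.h>

      int main(void)
      {
          __builtin_cpu_init();
          /* sse2 is part of the x86-64 baseline, so it is always "yes" */
          printf("sse2: %s\n", __builtin_cpu_supports("sse2") ? "yes" : "no");

          const char *v2_feats[] = { "sse3", "ssse3", "sse4.1", "sse4.2", "popcnt" };
          int all = 1;
          for (unsigned i = 0; i < sizeof v2_feats / sizeof v2_feats[0]; i++) {
              int ok = __builtin_cpu_supports(v2_feats[i]);
              printf("%s: %s\n", v2_feats[i], ok ? "yes" : "no");
              all = all && ok;
          }
          printf("x86-64-v2 (approx.): %s\n", all ? "yes" : "no");
          return 0;
      }
      ```

      On any x86-64 machine the first line will be "sse2: yes"; the remaining lines show whether the CPU clears the (approximate) -v2 bar being discussed in this RFC.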

    • Resolved by Allan McRae

      I think adding all optimized architectures (-v2, -v3, -v4) on top of our current baseline would be the correct way forward. It would allow users to choose the ideal optimization level and it would allow me to stop doing my hack with -opt suffix packages.

      Since this isn't in the scope of this RFC I'll vote No for the time being.

  • Allan McRae added 1 commit

    • 9f8c8bce - Provide a x86_64_v3 microarchitecture level port

  • Allan McRae changed title from Use x86-64-v2 microarchitecture level to Provide a x86_64_v3 microarchitecture level port

  • Allan McRae added 1 commit

    • e2b8d944 - Provide a x86_64_v3 microarchitecture level port

  • Allan McRae added 1 commit

    • 0c229cb4 - Provide a x86_64_v3 microarchitecture level port

  • Allan McRae added 1 commit

    • ccc19c7a - Provide a x86_64_v3 microarchitecture level port

  • Eli Schwartz