Fix 16-bit floats in Tensile.
Backport a rocBLAS patch that fixes the build flags for Tensile, so that half-precision floating-point operations no longer produce garbage data.
The backported patch removes the -mf16c build flag from several targets, including Tensile. This is apparently required for rocm-llvm versions higher than rocm-6.1.0.
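One way to check that the patch applied as intended is to search the patched source tree for any remaining mention of the flag. The following is only a sketch: the helper name `find_mf16c` and the choice of file patterns (`CMakeLists.txt`, `*.cmake`) are assumptions for illustration, not part of the rocBLAS patch or this PKGBUILD, and `--include` relies on GNU grep.

```shell
# Hypothetical helper (name and file patterns are assumptions):
# print every CMake file under the given directory that still mentions -mf16c.
# Exit status is 0 if at least one file matches, 1 if none do.
find_mf16c() {
    # -r: recurse, -l: print only file names, -F: treat pattern as a fixed string.
    # "--" ends option parsing so that "-mf16c" is read as the pattern, not a flag.
    grep -rlF --include='CMakeLists.txt' --include='*.cmake' -- '-mf16c' "${1:-.}"
}
```

Run against the extracted and patched source directory, the function prints nothing and returns a non-zero status when no CMake file references the flag any more. Flags set elsewhere (for example in Python build scripts) would not be caught by these patterns.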
This fixes running several Llama implementations like ollama and llama.cpp.
See #2 (closed)
I have tested this PKGBUILD only with gfx1030 enabled in the build targets to get quicker results, but I can confirm that it resolves the issue in llama.cpp as well as ollama for me. The ollama package does not need a rebuild. Also, building the rocblas source at 6.2.2 with the backported patch resulted in a working rocblas-bench execution, which was my test during bisection.