Half-precision floating-point broken in Tensile when compiled with rocm-llvm 6.2.0
Description:
Running ollama or llama.cpp on ROCm 6.2 results in the same CUBLAS_STATUS_INTERNAL_ERROR error:
rocBLAS error: Tensile solution found, but exception thrown for { a_type: "f16_r", b_type: "f16_r", c_type: "f16_r", d_type: "f16_r", compute_type: "f16_r", transA: 'T', transB: 'N', M: 128, N: 2, K: 32, alpha: 1, row_stride_a: 1, col_stride_a: 4096, row_stride_b: 1, col_stride_b: 32, row_stride_c: 1, col_stride_c: 128, row_stride_d: 1, col_stride_d: 128, beta: 0, batch_count: 40, strided_batch: true, stride_a: 524288, stride_b: 64, stride_c: 256, stride_d: 256, atomics_mode: atomics_allowed }
Alpha value 7.21875 doesn't match that set in problem: 1
ggml/src/ggml-cuda.cu:70: ROCm errorROCm error: CUBLAS_STATUS_INTERNAL_ERROR
current device: 0, in function ggml_cuda_mul_mat_batched_cublas at ggml/src/ggml-cuda.cu:1839
hipblasGemmStridedBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, (const char *) src0_f16, HIPBLAS_R_16F, nb01/nb00, nb02/nb00, (const char *) src1_f16, HIPBLAS_R_16F, nb11/nb10, nb12/nb10, beta, ( char *) dst_t, cu_data_type, ne01, nb2/nb0, ne12*ne13, cu_compute_type, HIPBLAS_GEMM_DEFAULT)
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
When running with AMD_LOG_LEVEL=4, llama.cpp produces multiple hipErrorNotFound errors, whereas ollama only produces a hipErrorNotReady error (see logs).
The error is independent of the model being run.
Tested on gfx1030 / RX 6900 XT.
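The failing call can in principle be isolated from ollama/llama.cpp by issuing the same strided-batched f16 GEMM through hipBLAS directly, using the shapes, strides and compute type from the rocBLAS error above. The following is a hypothetical sketch, not part of this report: it assumes the pre-V2 hipBLAS datatype enums that llama.cpp itself uses on ROCm 6.2, and a build line along the lines of hipcc repro_f16_gemm.cpp -lhipblas -o repro_f16_gemm.

// repro_f16_gemm.cpp -- hypothetical standalone reproducer (not from the report).
// It replays the failing hipblasGemmStridedBatchedEx call with the dimensions,
// strides and f16 compute type shown in the rocBLAS error above.
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#include <hipblas/hipblas.h>   // include path assumed for ROCm 6.x
#include <cstdio>
#include <cstdlib>
#include <vector>

#define HIP_CHECK(call) do { hipError_t e_ = (call); if (e_ != hipSuccess) { \
    fprintf(stderr, "HIP error %s at line %d\n", hipGetErrorString(e_), __LINE__); exit(1); } } while (0)
#define HIPBLAS_CHECK(call) do { hipblasStatus_t s_ = (call); if (s_ != HIPBLAS_STATUS_SUCCESS) { \
    fprintf(stderr, "hipBLAS status %d at line %d\n", (int)s_, __LINE__); exit(1); } } while (0)

int main() {
    // Problem description copied from the failing Tensile solution above.
    const int M = 128, N = 2, K = 32, batch = 40;
    const int lda = 4096, ldb = 32, ldc = 128;
    const hipblasStride strideA = 524288, strideB = 64, strideC = 256;

    // Host buffers sized so that every batch fits within its stride.
    std::vector<__half> hA((size_t)strideA * batch, __float2half(0.01f));
    std::vector<__half> hB((size_t)strideB * batch, __float2half(0.01f));

    __half *dA = nullptr, *dB = nullptr, *dC = nullptr;
    HIP_CHECK(hipMalloc((void **)&dA, hA.size() * sizeof(__half)));
    HIP_CHECK(hipMalloc((void **)&dB, hB.size() * sizeof(__half)));
    HIP_CHECK(hipMalloc((void **)&dC, (size_t)strideC * batch * sizeof(__half)));
    HIP_CHECK(hipMemcpy(dA, hA.data(), hA.size() * sizeof(__half), hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(dB, hB.data(), hB.size() * sizeof(__half), hipMemcpyHostToDevice));

    hipblasHandle_t handle;
    HIPBLAS_CHECK(hipblasCreate(&handle));

    // alpha = 1, beta = 0 as half, matching compute_type f16_r in the error log.
    const __half alpha = __float2half(1.0f);
    const __half beta  = __float2half(0.0f);

    HIPBLAS_CHECK(hipblasGemmStridedBatchedEx(
        handle, HIPBLAS_OP_T, HIPBLAS_OP_N, M, N, K,
        &alpha,
        dA, HIPBLAS_R_16F, lda, strideA,
        dB, HIPBLAS_R_16F, ldb, strideB,
        &beta,
        dC, HIPBLAS_R_16F, ldc, strideC,
        batch, HIPBLAS_R_16F, HIPBLAS_GEMM_DEFAULT));

    HIP_CHECK(hipDeviceSynchronize());
    printf("GEMM completed without error\n");

    HIPBLAS_CHECK(hipblasDestroy(handle));
    HIP_CHECK(hipFree(dA));
    HIP_CHECK(hipFree(dB));
    HIP_CHECK(hipFree(dC));
    return 0;
}

If the rocBLAS/Tensile build is the culprit, this should surface the same Tensile exception outside the GGML stack; on an unaffected build it should only print the success line.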
Additional info:
package version(s)
extra/rocblas 6.2.2-1
extra/ollama-rocm 0.3.12-5
config and/or log files:
ollama_serve_amd_log_level_3_llama3.log
llama_cli_amd_log_level_4_llama2.log
link to upstream bug report, if any:
https://github.com/ollama/ollama/issues/7564
https://github.com/ggerganov/llama.cpp/issues/10234
https://github.com/ollama/ollama/issues/6857 (Gentoo issue with same error)
Steps to reproduce:
The easiest way to reproduce is to run ollama from the extra repository:
pacman -S ollama
ollama serve
In a different shell:
ollama run llama3.2
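Independent of ollama, it can help to confirm which rocBLAS build is actually being loaded (extra/rocblas 6.2.2-1 here). The small check below is a hypothetical helper, not part of the report; the include path and build line (hipcc check_rocblas_version.cpp -lrocblas) are assumptions for a standard ROCm 6.x install.

// check_rocblas_version.cpp -- hypothetical helper (not from the report):
// prints the rocBLAS version string so the Tensile build in use can be confirmed.
#include <rocblas/rocblas.h>   // include path assumed for ROCm 6.x
#include <cstdio>

int main() {
    size_t len = 0;
    // Query the required buffer size, then fetch the version string itself.
    if (rocblas_get_version_string_size(&len) != rocblas_status_success) return 1;
    char buf[256];
    if (len > sizeof(buf)) len = sizeof(buf);
    if (rocblas_get_version_string(buf, len) != rocblas_status_success) return 1;
    printf("rocBLAS version: %s\n", buf);
    return 0;
}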