
Half-precision floating-point broken in Tensile when compiled with rocm-llvm 6.2.0

Description:

Running ollama or llama.cpp on ROCm 6.2 fails with the same CUBLAS_STATUS_INTERNAL_ERROR:

```
rocBLAS error: Tensile solution found, but exception thrown for { a_type: "f16_r", b_type: "f16_r", c_type: "f16_r", d_type: "f16_r", compute_type: "f16_r", transA: 'T', transB: 'N', M: 128, N: 2, K: 32, alpha: 1, row_stride_a: 1, col_stride_a: 4096, row_stride_b: 1, col_stride_b: 32, row_stride_c: 1, col_stride_c: 128, row_stride_d: 1, col_stride_d: 128, beta: 0, batch_count: 40, strided_batch: true, stride_a: 524288, stride_b: 64, stride_c: 256, stride_d: 256, atomics_mode: atomics_allowed }
Alpha value 7.21875 doesn't match that set in problem: 1
ggml/src/ggml-cuda.cu:70: ROCm errorROCm error: CUBLAS_STATUS_INTERNAL_ERROR
  current device: 0, in function ggml_cuda_mul_mat_batched_cublas at ggml/src/ggml-cuda.cu:1839
  hipblasGemmStridedBatchedEx(ctx.cublas_handle(), HIPBLAS_OP_T, HIPBLAS_OP_N, ne01, ne11, ne10, alpha, (const char *) src0_f16, HIPBLAS_R_16F, nb01/nb00, nb02/nb00, (const char *) src1_f16, HIPBLAS_R_16F, nb11/nb10, nb12/nb10, beta, ( char *) dst_t, cu_data_type, ne01, nb2/nb0, ne12*ne13, cu_compute_type, HIPBLAS_GEMM_DEFAULT)

ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
```
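To take llama.cpp out of the loop, the failing call can be replayed in isolation. The following is an untested, hypothetical sketch, not a confirmed reproducer: the GEMM shape, leading dimensions, strides, and f16 types are copied from the rocBLAS trace above, the HIP_CHECK/HIPBLAS_CHECK macros are ad-hoc helpers, and the buffers are just zero-filled device memory.

```cpp
// Hypothetical standalone reproducer (untested sketch): replays the failing
// hipblasGemmStridedBatchedEx call with the parameters from the trace above.
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#include <hipblas/hipblas.h>
#include <cstdio>
#include <cstdlib>

#define HIP_CHECK(x) do { hipError_t e_ = (x); if (e_ != hipSuccess) { \
    fprintf(stderr, "HIP error %d at line %d\n", (int)e_, __LINE__); exit(1); } } while (0)
#define HIPBLAS_CHECK(x) do { hipblasStatus_t s_ = (x); if (s_ != HIPBLAS_STATUS_SUCCESS) { \
    fprintf(stderr, "hipBLAS error %d at line %d\n", (int)s_, __LINE__); exit(1); } } while (0)

int main() {
    // Shape from the trace: M=128, N=2, K=32, 40 strided batches, all f16.
    const int M = 128, N = 2, K = 32, batch = 40;
    const int lda = 4096, ldb = 32, ldc = 128;
    const hipblasStride sA = 524288, sB = 64, sC = 256;

    __half *dA, *dB, *dC;
    HIP_CHECK(hipMalloc(&dA, sizeof(__half) * sA * batch));
    HIP_CHECK(hipMalloc(&dB, sizeof(__half) * sB * batch));
    HIP_CHECK(hipMalloc(&dC, sizeof(__half) * sC * batch));
    HIP_CHECK(hipMemset(dA, 0, sizeof(__half) * sA * batch));
    HIP_CHECK(hipMemset(dB, 0, sizeof(__half) * sB * batch));

    // compute_type is f16_r, so alpha/beta are passed as f16; the bug shows up
    // as the kernel seeing a garbage alpha (7.21875 instead of 1).
    __half alpha = __float2half(1.0f);
    __half beta  = __float2half(0.0f);

    hipblasHandle_t handle;
    HIPBLAS_CHECK(hipblasCreate(&handle));
    HIPBLAS_CHECK(hipblasGemmStridedBatchedEx(handle,
        HIPBLAS_OP_T, HIPBLAS_OP_N, M, N, K,
        &alpha,
        dA, HIPBLAS_R_16F, lda, sA,
        dB, HIPBLAS_R_16F, ldb, sB,
        &beta,
        dC, HIPBLAS_R_16F, ldc, sC,
        batch, HIPBLAS_R_16F, HIPBLAS_GEMM_DEFAULT));
    HIP_CHECK(hipDeviceSynchronize());
    printf("GEMM completed without error\n");

    HIPBLAS_CHECK(hipblasDestroy(handle));
    HIP_CHECK(hipFree(dA)); HIP_CHECK(hipFree(dB)); HIP_CHECK(hipFree(dC));
    return 0;
}
```

Built with something like `hipcc repro.cpp -lhipblas`, this would be expected to throw the same "Alpha value ... doesn't match" exception on an affected ROCm 6.2 install, and to print the success line on a working stack.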

When run with AMD_LOG_LEVEL=4, llama.cpp produces multiple hipErrorNotFound errors, whereas ollama produces only a hipErrorNotReady error (see logs).

The error is independent of the model being run.

Tested on gfx1030 / RX 6900 XT.
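As a sanity check that the runtime is really targeting gfx1030, the selected device and gfx name can be printed with a short HIP program (a minimal sketch; it assumes only the gcnArchName field that ROCm's hipDeviceProp_t provides):

```cpp
// Print each HIP device's name and gfx target to confirm gfx1030 is in use.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    if (hipGetDeviceCount(&count) != hipSuccess || count == 0) {
        fprintf(stderr, "no HIP devices found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        hipDeviceProp_t prop;
        if (hipGetDeviceProperties(&prop, i) == hipSuccess)
            printf("device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
    }
    return 0;
}
```

`rocminfo` reports the same gfx target.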

Additional info:

package version(s)

extra/rocblas 6.2.2-1
extra/ollama-rocm 0.3.12-5

config and/or log files:

ollama_serve_amd_log_level_3_llama3.log

llama_cli_amd_log_level_4_llama2.log

link to upstream bug report, if any:

ollama-rocm#3 (closed)

https://github.com/ollama/ollama/issues/7564

https://github.com/ggerganov/llama.cpp/issues/10234

https://github.com/ollama/ollama/issues/6857 (Gentoo issue with the same error)

Steps to reproduce:

The easiest way to reproduce is to run ollama from the extra repository.

  1. pacman -S ollama
  2. ollama serve
  3. In a different shell: ollama run llama3.2