Arch Linux package missing CUDA v11 runners
Description:
After a recent update, attempting to run any previously functional model now leads to errors, e.g.:
ollama run codellama "Hello"
#> Error: POST predict: Post "http://127.0.0.1:41711/completion": EOF
Recommendation from upstream:
This may be a build issue. The CUDA variant for your device is v11, but there's no v11 runner in your package. Looking at the ollama-cuda file list, there's only a cuda_v12 runner. I suggest filing a ticket with the Arch ollama-cuda maintainers.
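To double-check that diagnosis locally, the runners shipped in the installed package can be compared against the GPU the driver reports. A minimal sketch using standard pacman and NVIDIA tooling (the runner directory is the one that appears in the logs below):

# List the runner binaries shipped by the Arch package
pacman -Ql ollama-cuda | grep -F '/usr/lib/ollama/runners/'
# Identify the GPU; on recent drivers the compute capability is also queryable
nvidia-smi -L
nvidia-smi --query-gpu=name,compute_cap --format=csv

If only cuda_v12_* runners are listed while the GPU needs the v11 build, that matches the CUDA error in the logs below.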
Additional info:
- Initially installed version (worked): ollama-cuda-0.3.10-1-x86_64
- Broken on or before: 0.5.1-2
- Installation method: sudo pacman -S ollama-cuda
- Linux flavor: Arch (I use Arch BTW™)
- GitHub: ollama/ollama issue #8089
Logs (snippet)
sudo journalctl -u ollama --no-pager | grep -C3 -i -- error | sed "s/$HOSTNAME/<HOST>/g"
Dec 13 08:41:19 <HOST> ollama[6310]: time=2024-12-13T08:41:19.590-08:00 level=INFO source=server.go:594 msg="llama runner started in 1.01 seconds"
Dec 13 08:41:19 <HOST> ollama[6310]: [GIN] 2024/12/13 - 08:41:19 | 200 | 1.129576145s | 127.0.0.1 | POST "/api/generate"
Dec 13 08:41:23 <HOST> ollama[6310]: ggml_cuda_compute_forward: ADD failed
Dec 13 08:41:23 <HOST> ollama[6310]: CUDA error: no kernel image is available for execution on the device
Dec 13 08:41:23 <HOST> ollama[6310]: current device: 0, in function ggml_cuda_compute_forward at llama/ggml-cuda.cu:2403
Dec 13 08:41:23 <HOST> ollama[6310]: err
Dec 13 08:41:23 <HOST> ollama[6310]: llama/ggml-cuda.cu:132: CUDA error
Dec 13 08:41:23 <HOST> ollama[6310]: ptrace: Operation not permitted.
Dec 13 08:41:23 <HOST> ollama[6310]: No stack.
Dec 13 08:41:23 <HOST> ollama[6310]: The program is not being run.
--
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/lib/ollama/runners/cuda_v12_avx/ollama_llama_server runner --model /var/lib/ollama/blobs/sha256-1cecc26325a197571a1961bfacf64dc6e35e0f05faf57d3c6941a982e1eb2e1d --ctx-size 2048 --batch-size 512 --n-gpu-layers 25 --threads 4 --parallel 1 --port 40811"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.998-08:00 level=INFO source=runner.go:946 msg="starting go runner"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.999-08:00 level=INFO source=runner.go:947 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=4
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.999-08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:40811"
--
Dec 13 08:42:52 <HOST> ollama[6310]: time=2024-12-13T08:42:52.414-08:00 level=INFO source=server.go:594 msg="llama runner started in 35.45 seconds"
Dec 13 08:42:52 <HOST> ollama[6310]: [GIN] 2024/12/13 - 08:42:52 | 200 | 40.77210645s | 127.0.0.1 | POST "/api/generate"
Dec 13 08:43:07 <HOST> ollama[6310]: ggml_cuda_compute_forward: ADD failed
Dec 13 08:43:07 <HOST> ollama[6310]: CUDA error: no kernel image is available for execution on the device
Dec 13 08:43:07 <HOST> ollama[6310]: current device: 0, in function ggml_cuda_compute_forward at llama/ggml-cuda.cu:2403
Dec 13 08:43:07 <HOST> ollama[6310]: err
Dec 13 08:43:07 <HOST> ollama[6310]: llama/ggml-cuda.cu:132: CUDA error
Dec 13 08:43:07 <HOST> ollama[6310]: ptrace: Operation not permitted.
Dec 13 08:43:07 <HOST> ollama[6310]: No stack.
Dec 13 08:43:07 <HOST> ollama[6310]: The program is not being run.
Steps to downgrade:
Arch pacman archives for ollama-cuda are available via:
https://archive.archlinux.org/packages/o/ollama-cuda/
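To see which versions the archive actually offers before picking a downgrade target, the directory index can be scraped with curl and grep. A rough sketch, assuming the archive serves a plain HTML listing of the package files:

# Print every ollama-cuda package filename the archive lists, oldest to newest
curl -fsSL https://archive.archlinux.org/packages/o/ollama-cuda/ \
  | grep -oE 'ollama-cuda-[0-9][^"]*x86_64\.pkg\.tar\.zst' \
  | sort -uV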
- Make a directory path for saving logs
mkdir -vp "${HOME}/Documents/logs/pacman/downgrade"
- Set some Bash variables
_version='0.3.9-2' _url="https://archive.archlinux.org/packages/o/ollama-cuda/ollama-cuda-${_version}-x86_64.pkg.tar.zst"
Note: as of 2024-12-13 09:58 -0800, incrementally downgrading to version 0.3.9-2 finally led to joy!
- Downgrade package to the target _version
script -ac "sudo pacman -U ${_url}" "${HOME}/Documents/logs/pacman/downgrade/ollama-cuda-${_version}.script"
- Restart service
sudo systemctl restart ollama.service
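Optionally, confirm the service came back up and check which runner the downgraded build selects (plain systemd/journalctl checks, nothing specific to this bug):

systemctl status ollama.service --no-pager
sudo journalctl -u ollama --no-pager -n 40 | grep -i 'starting llama server'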
- Test if things work now
ollama run codellama "Hello" #> Error: POST predict: Post "http://127.0.0.1:37775/completion": EOF
If things still do not work, update the _version and _url variables and try again from the downgrade step onward.
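When several rounds are needed, that retry can be scripted. A rough sketch of the loop; the version list is only an example (the versions mentioned above, newest first; fill in intermediate releases from the archive listing) and it assumes ollama exits non-zero when the EOF error occurs:

for _version in 0.5.1-2 0.3.10-1 0.3.9-2; do
  _url="https://archive.archlinux.org/packages/o/ollama-cuda/ollama-cuda-${_version}-x86_64.pkg.tar.zst"
  sudo pacman -U --noconfirm "${_url}" || continue
  sudo systemctl restart ollama.service
  sleep 5   # give the service a moment to come back up
  if ollama run codellama "Hello"; then
    echo "ollama-cuda ${_version} works"
    break
  fi
done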