Arch Linux package missing CUDA v11 runners

Description:

After a recent update, attempting to run any previously functional model now fails with errors such as:

ollama run codellama "Hello"
#> Error: POST predict: Post "http://127.0.0.1:41711/completion": EOF

Recommendation from upstream:

@rick-github

This may be a build issue. The cuda variant of your device is v11 but there's no v11 runner in your package. Looking at the ollama-cuda file list there's only a cuda_v12 runner. I suggest filing a ticket with the Arch ollama-cuda maintainers.
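To double-check this locally, the runner variants shipped by the installed package can be listed with pacman, and the GPU's compute capability queried with nvidia-smi (the compute_cap query field needs a reasonably recent driver). If only cuda_v12 runners show up but the card needs a v11 build, that matches the "no kernel image is available" error in the logs below.

# Which runner variants does the installed package ship?
pacman -Ql ollama-cuda | grep -F '/usr/lib/ollama/runners/'
ls /usr/lib/ollama/runners/

# GPU compute capability (the compute_cap field needs a reasonably recent NVIDIA driver)
nvidia-smi --query-gpu=name,compute_cap --format=csv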

Additional info:

  • Initially installed version (worked) ollama-cuda-0.3.10-1-x86_64
  • Broken on or before 0.5.1-2
  • Installation method: sudo pacman -S ollama-cuda
  • Linux flavor: Arch (I use Arch BTW™)
  • GitHub -- ollama/ollama -- Issue 8089

Logs (snippet)

sudo journalctl -u ollama --no-pager | grep -C3 -i -- error | sed "s/$HOSTNAME/<HOST>/g"

Dec 13 08:41:19 <HOST> ollama[6310]: time=2024-12-13T08:41:19.590-08:00 level=INFO source=server.go:594 msg="llama runner started in 1.01 seconds"
Dec 13 08:41:19 <HOST> ollama[6310]: [GIN] 2024/12/13 - 08:41:19 | 200 |  1.129576145s |       127.0.0.1 | POST     "/api/generate"
Dec 13 08:41:23 <HOST> ollama[6310]: ggml_cuda_compute_forward: ADD failed
Dec 13 08:41:23 <HOST> ollama[6310]: CUDA error: no kernel image is available for execution on the device
Dec 13 08:41:23 <HOST> ollama[6310]:   current device: 0, in function ggml_cuda_compute_forward at llama/ggml-cuda.cu:2403
Dec 13 08:41:23 <HOST> ollama[6310]:   err
Dec 13 08:41:23 <HOST> ollama[6310]: llama/ggml-cuda.cu:132: CUDA error
Dec 13 08:41:23 <HOST> ollama[6310]: ptrace: Operation not permitted.
Dec 13 08:41:23 <HOST> ollama[6310]: No stack.
Dec 13 08:41:23 <HOST> ollama[6310]: The program is not being run.
--
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=server.go:376 msg="starting llama server" cmd="/usr/lib/ollama/runners/cuda_v12_avx/ollama_llama_server runner --model /var/lib/ollama/blobs/sha256-1cecc26325a197571a1961bfacf64dc6e35e0f05faf57d3c6941a982e1eb2e1d --ctx-size 2048 --batch-size 512 --n-gpu-layers 25 --threads 4 --parallel 1 --port 40811"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=sched.go:449 msg="loaded runners" count=1
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=server.go:555 msg="waiting for llama runner to start responding"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.962-08:00 level=INFO source=server.go:589 msg="waiting for server to become available" status="llm server error"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.998-08:00 level=INFO source=runner.go:946 msg="starting go runner"
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.999-08:00 level=INFO source=runner.go:947 msg=system info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | cgo(gcc)" threads=4
Dec 13 08:42:16 <HOST> ollama[6310]: time=2024-12-13T08:42:16.999-08:00 level=INFO source=.:0 msg="Server listening on 127.0.0.1:40811"
--
Dec 13 08:42:52 <HOST> ollama[6310]: time=2024-12-13T08:42:52.414-08:00 level=INFO source=server.go:594 msg="llama runner started in 35.45 seconds"
Dec 13 08:42:52 <HOST> ollama[6310]: [GIN] 2024/12/13 - 08:42:52 | 200 |  40.77210645s |       127.0.0.1 | POST     "/api/generate"
Dec 13 08:43:07 <HOST> ollama[6310]: ggml_cuda_compute_forward: ADD failed
Dec 13 08:43:07 <HOST> ollama[6310]: CUDA error: no kernel image is available for execution on the device
Dec 13 08:43:07 <HOST> ollama[6310]:   current device: 0, in function ggml_cuda_compute_forward at llama/ggml-cuda.cu:2403
Dec 13 08:43:07 <HOST> ollama[6310]:   err
Dec 13 08:43:07 <HOST> ollama[6310]: llama/ggml-cuda.cu:132: CUDA error
Dec 13 08:43:07 <HOST> ollama[6310]: ptrace: Operation not permitted.
Dec 13 08:43:07 <HOST> ollama[6310]: No stack.
Dec 13 08:43:07 <HOST> ollama[6310]: The program is not being run.

Steps to downgrade:

Archived Arch pacman packages for ollama-cuda are available at:

https://archive.archlinux.org/packages/o/ollama-cuda/
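
To see which versions are actually in the archive before picking a target, something like the following works (the grep pattern assumes the archive's usual file naming):

curl -fsSL https://archive.archlinux.org/packages/o/ollama-cuda/ \
  | grep -oE 'ollama-cuda-[0-9][^"<]*-x86_64\.pkg\.tar\.zst' \
  | sort -uV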
  1. Make a directory path for saving logs
    mkdir -vp "${HOME}/Documents/logs/pacman/downgrade"
  2. Set some Bash variables
    _version='0.3.9-2'
    _url="https://archive.archlinux.org/packages/o/ollama-cuda/ollama-cuda-${_version}-x86_64.pkg.tar.zst"

    Note: as of 2024-12-13 09:58 -0800, incrementally downgrading to version 0.3.9-2 finally led to a working setup!

  3. Downgrade package to target _version
    script -ac "sudo pacman -U ${_url}" "${HOME}/Documents/logs/pacman/downgrade/ollama-cuda-${_version}.script"
  4. Restart service
    sudo systemctl restart ollama.service
  5. Test if things work now
    ollama run codellama "Hello"
    #> Error: POST predict: Post "http://127.0.0.1:37775/completion": EOF

If it still fails, update the _version and _url variables (step 2) and repeat steps 3 through 5; a scripted sketch of this loop is below.
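A rough scripted version of that retry loop (the version list is only illustrative; take real versions from the archive listing above, and note this assumes ollama exits non-zero when the request fails):

for _version in 0.3.10-1 0.3.9-2; do  # illustrative candidates; substitute real archive versions
    _url="https://archive.archlinux.org/packages/o/ollama-cuda/ollama-cuda-${_version}-x86_64.pkg.tar.zst"
    sudo pacman -U --noconfirm "${_url}"
    sudo systemctl restart ollama.service
    # Assumes `ollama run` returns a non-zero exit code when the runner crashes
    if ollama run codellama "Hello"; then
        echo "Working version: ${_version}"
        break
    fi
done

Once a working version is found, adding IgnorePkg = ollama-cuda to /etc/pacman.conf keeps pacman from pulling the broken build back in on the next -Syu.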
