
Assertion `err == hipSuccess' failed when using tensorflow-rocm

Description:

The Python interpreter crashes when listing GPU devices in TensorFlow.

$ python                                                          
Python 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
python: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1152: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.
[1]    855921 IOT instruction (core dumped)  python

Additional info:

I found the same error reported against python-pytorch; it was resolved there by disabling assertions in the old hip-runtime-amd package.

  • package version(s):
    • python: 3.10.0
    • hip-runtime-amd: 6.2.4-1
    • tensorflow-rocm: 2.16.1
    • rocm-core: 6.2.4-2
    • rocm-hip-runtime: 6.2.2-1
  • hardware: AMD Radeon RX 6700

Steps to reproduce:

I installed tensorflow-rocm in a virtual Python environment created by micromamba. I expect this can be reproduced in virtual Python environments created by conda or other tools.

  1. Install a compatible TensorFlow build in the virtual environment.
pip install tensorflow-rocm==2.16.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.2
  2. Check whether TensorFlow can find the graphics card.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
  3. Observe the error.
python: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1152: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.
[1]    858647 IOT instruction (core dumped)  python
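
Because the assertion aborts the whole interpreter, it may be easier to reproduce the steps above from a child process, so the calling shell or script survives and the terminating signal can be recorded. The following is only a minimal sketch: the helper names `run_probe` and `describe` are hypothetical, and it assumes a POSIX system where Python's `subprocess` reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys

# The probe mirrors the two interpreter lines from the report.
PROBE = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"


def run_probe(code=PROBE, timeout=300):
    """Run `code` in a child interpreter; return (returncode, stdout, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe(returncode):
    """Translate a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    rc, out, err = run_probe()
    print(describe(rc))
    print(out.strip())
    # The HIP assertion message, if any, is written to stderr.
    print(err.strip(), file=sys.stderr)
```

A failed C `assert` calls `abort()`, which raises SIGABRT. On Linux, SIGIOT is a historical alias for the same signal, which is consistent with the `IOT instruction (core dumped)` message shown above.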
Edited by Syize Liu