Assertion `err == hipSuccess' failed when using tensorflow-rocm
Description:
The Python interpreter crashes when trying to list GPU devices in TensorFlow.
$ python
Python 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
python: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1152: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.
[1] 855921 IOT instruction (core dumped) python
Additional info:
I found the same error reported against python-pytorch, where it was solved by disabling assertions in the old package hip-runtime-amd.
- package version(s):
- python: 3.10.0
- hip-runtime-amd: 6.2.4-1
- tensorflow-rocm: 2.16.1
- rocm-core: 6.2.4-2
- rocm-hip-runtime: 6.2.2-1
- hardware: AMD Radeon RX 6700
Steps to reproduce:
I installed tensorflow-rocm in a virtual Python environment created by micromamba. I suppose this can be reproduced in other virtual Python environments created by conda or similar tools.
- Install a compatible tensorflow-rocm build in the virtual environment.
pip install tensorflow-rocm==2.16.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.2
- Check whether TensorFlow can find the graphics card
>>> import tensorflow as tf
>>> tf.config.list_physical_devices('GPU')
- Get the error
python: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1152: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.
[1] 858647 IOT instruction (core dumped) python
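Because the failed assertion aborts the whole interpreter, the reproduction steps above can be wrapped in a child process so that the calling session survives and the crash signal is recorded. The following is a minimal sketch, not part of the original report: the helper names `run_probe` and `describe` are hypothetical, and it assumes a POSIX system where a process killed by a signal is reported with a negative return code. The shell's "IOT instruction" message corresponds to SIGABRT, which is what a failed `assert` raises.

```python
import signal
import subprocess
import sys

# The same two statements used in the reproduction steps, as a one-liner.
PROBE = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"


def run_probe(code, timeout=120):
    """Run `code` in a fresh interpreter and return (returncode, stdout, stderr).

    `run_probe` is a hypothetical helper; it only isolates the crash
    in a child process and does not work around the underlying bug.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, a negative return code means "terminated by signal -returncode".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    rc, out, err = run_probe(PROBE)
    print(describe(rc))
    print(out, end="")
    # Show only the tail of stderr, where the assertion message would appear.
    print(err[-2000:], end="", file=sys.stderr)
```

On an affected system the script would be expected to report that the child was killed by a signal rather than taking the parent shell's Python session down with it; on a working system it prints the detected GPU list.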
Edited by Syize Liu