[opt-rocm/rocm] Matrix multiplication causes segfault
Description:
Matrix multiplication on the GPU causes a segfault. This is very likely a packaging bug: PyTorch 2.8 installed in a venv with the following command does not exhibit the crash.
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.4
Additional info:
- package version(s): python-pytorch-opt-rocm 2.8.0-2 (also affects python-pytorch-rocm)
- hardware: AMD Radeon RX 7900 XTX
- config and/or log files:
[opdesktop:40146:0:40146] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x28)
==== backtrace (tid: 40146) ====
0 0x000000000004dd71 ucs_rcache_distribution_get_num_bins() ???:0
1 0x000000000004df3d ucs_rcache_distribution_get_num_bins() ???:0
2 0x000000000003e540 __sigaction() ???:0
3 0x00000000003c4640 hiprtcLinkDestroy() ???:0
4 0x00000000003f7bc2 hiprtcLinkDestroy() ???:0
5 0x00000000000bbc12 hipGetDeviceProperties() ???:0
6 0x00000000000bc909 hipGetDeviceProperties() ???:0
7 0x00000000000571b2 hipGetCmdName() ???:0
8 0x0000000000057356 hipGetCmdName() ???:0
9 0x00000000002d0ac6 __gnu_f2h_ieee() ???:0
10 0x000000000028beaf hipRegisterTracerCallback() ???:0
11 0x000000000139c64a rocblas_zswap_strided_batched_64() ???:0
12 0x000000000139ddb4 rocblas_zswap_strided_batched_64() ???:0
13 0x000000000094eb6b rocblas_initialize() ???:0
14 0x000000000095f36a rocblas_internal_tensile_is_initialized() ???:0
15 0x000000000054ff01 rocblas_internal_gemm_template<float>() ???:0
16 0x000000000053440b rocblas_sgemm() ???:0
17 0x00000000000d98ab hipblasSgemm() ???:0
18 0x0000000001f0f065 at::cuda::blas::getrfBatched<c10::complex<float> >() ???:0
19 0x0000000001ff1895 at::native::_bmm_dtype_cuda() ???:0
20 0x0000000001ff364b at::native::structured_mm_out_cuda::impl() ???:0
21 0x0000000002354c40 at::cuda::bmm() ???:0
22 0x0000000002354d04 at::cuda::mm() ???:0
23 0x00000000025d662a at::_ops::mm::redispatch() ???:0
24 0x00000000051deca9 std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_append<std::optional<bool>&>() ???:0
25 0x00000000051df307 std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_realloc_append<std::optional<bool>&>() ???:0
26 0x0000000002639ecf at::_ops::mm::call() ???:0
27 0x00000000018c1ecb at::native::linalg_pinv_out() ???:0
28 0x00000000018c2574 at::native::matmul() ???:0
29 0x0000000002c81ad5 at::compositeimplicitautogradnestedtensor::reshape() ???:0
30 0x000000000278fb6f at::_ops::matmul::call() ???:0
31 0x0000000000463476 std::vector<bool, std::allocator<bool> >::_M_fill_insert() ???:0
32 0x000000000046352e std::vector<bool, std::allocator<bool> >::_M_fill_insert() ???:0
33 0x000000000019c3e3 _PyObject_GetMethod() ???:0
34 0x000000000020e50a PyType_GetModule() ???:0
35 0x00000000000ea954 PyNumber_InPlacePower() ???:0
36 0x000000000019f448 PyNumber_Add() ???:0
37 0x00000000002ee25b PyNumber_MatrixMultiply() ???:0
38 0x00000000001739a2 _PyEval_EvalFrameDefault() ???:0
39 0x000000000024e2c9 PyEval_EvalCode() ???:0
40 0x0000000000269083 PyMem_RawCalloc() ???:0
41 0x000000000017631e _PyEval_EvalFrameDefault() ???:0
42 0x000000000024e2c9 PyEval_EvalCode() ???:0
43 0x0000000000269083 PyMem_RawCalloc() ???:0
44 0x000000000018d976 _PyFunction_SetVersion() ???:0
45 0x00000000001645bd PyObject_Vectorcall() ???:0
46 0x000000000017697a _PyEval_EvalFrameDefault() ???:0
47 0x00000000002847b0 PyUnicode_AsUTF8String() ???:0
48 0x0000000000092634 ???() /usr/lib/libpython3.13.so.1.0:0
49 0x000000000023bbeb Py_BytesMain() ???:0
50 0x0000000000027675 __libc_init_first() ???:0
51 0x0000000000027729 __libc_start_main() ???:0
52 0x0000000000001045 _start() ???:0
=================================
[1] 40146 segmentation fault (core dumped) python
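To make the comparison between the packaged build and the pip/venv build easier, the version details above could be collected with a small script run in each environment. This is only a sketch: `collect_env_info` is a hypothetical helper name, and it assumes `torch.version.hip` is populated on ROCm builds (it is read defensively with `getattr`).

```python
import importlib.util
import platform


def collect_env_info():
    """Gather interpreter and PyTorch build details without touching rocBLAS."""
    info = {"python": platform.python_version(), "torch": None}
    # Skip the torch queries entirely if torch is not importable here.
    if importlib.util.find_spec("torch") is None:
        return info
    import torch

    info["torch"] = torch.__version__
    # Assumption: ROCm builds expose the HIP version here; None otherwise.
    info["hip"] = getattr(torch.version, "hip", None)
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Running it once with the system interpreter and once inside the venv gives two outputs that can be diffed directly.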
Steps to reproduce:
- Run the following Python script:
import torch
x = torch.randn(1000, 1000, device="cuda")
y = torch.randn(1000, 1000, device="cuda")
z = x @ y
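The script above can also be run in a child interpreter so that the crash is reported as an exit status instead of taking down the calling shell session. The following is a minimal sketch, not part of the original report: `run_isolated` and `REPRO` are hypothetical names, and the added `torch.cuda.synchronize()` call is an assumption to make sure the GPU work has finished before the child exits.

```python
import signal
import subprocess
import sys

# Same steps as the reproduction script, plus a synchronize (assumption).
REPRO = """
import torch
x = torch.randn(1000, 1000, device="cuda")
y = torch.randn(1000, 1000, device="cuda")
z = x @ y
torch.cuda.synchronize()
print("ok")
"""


def run_isolated(code, python=sys.executable):
    """Run code in a separate interpreter; return (returncode, description)."""
    proc = subprocess.run([python, "-c", code], capture_output=True, text=True)
    rc = proc.returncode
    if rc < 0:
        # On POSIX a negative return code means the child died from a signal.
        desc = f"killed by {signal.Signals(-rc).name}"
    elif rc == 0:
        desc = "exited normally"
    else:
        desc = f"exited with status {rc}"
    return rc, desc


if __name__ == "__main__":
    print(run_isolated(REPRO))
```

Passing the venv's interpreter path as `python` runs the same steps against the pip-installed build, which gives a side-by-side result for the two installations. A segfault would be expected to show up as `killed by SIGSEGV`, though a signal handler installed by a loaded library may change how the child terminates.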