Blender 4.0.2-10 crashes when using HIP Cycles (rx 6800, rocm 5.6.1)
The crash does not occur with blender 4.0.2-9 using the same file, the same version of rocm, and the same configuration.
The error in the terminal is:
Memory access fault by GPU node-1 (Agent handle: 0x7092c2daa300) on address 0x708fde7ac000. Reason: Page not present or supervisor privilege.
IOT instruction (core dumped) blender
With regard to rocm 5.6.1: all newer versions (5.7.1 and 6.0) have caused problems and crashes for users, whereas rocm 5.6.1 is stable across multiple applications (tested with pytorch and blender up to 4.0.2-9).
See below recent forum post on the matter:
set Device to GPU Compute (make sure to select HIP back end using the AMD GPU in the preferences)
go to 'Sampling' options (within the 'Render' panel)
check Noise Threshold check box in Sampling-> Viewport
check Denoise checkbox in Sampling-> Viewport
check Noise Threshold check box in Sampling-> Render
check Denoise checkbox in Sampling-> Render
Change 'Viewport Shading' to 'Rendered'
if the segfault does not trigger immediately, try switching between 'Viewport Shading' 'Rendered' and 'Material Preview'
The segfault triggers for me after switching between 'Viewport Shading' 'Rendered' and 'Material Preview' a couple of times
The issue triggers readily for me on a number of other personal files that I am not in a position to share. The file classroom.blend reproduces the issue as described in the steps above. Other files from Blender Files may do the same (but not the standard start-up cube).
BTW, this error message is all over the web, so it could easily be an upstream problem due to the recent ROCm 6 upgrade, and not an Arch packaging issue.
Memory access fault by GPU node-1 (Agent handle: 0x7fff2c9ce600) on address 0x7ffc057ac000. Reason: Page not present or supervisor privilege.
Thread 103 "blender" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff1f800000 (LWP 20840)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007fffe38ab393 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007fffe385a6c8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007fffe38424b8 in __GI_abort () at abort.c:79
#4 0x00007fff1fa2132a in ??? () at /opt/rocm/lib/libhsa-runtime64.so.1
#5 0x00007fff1fa745d2 in ??? () at /opt/rocm/lib/libhsa-runtime64.so.1
#6 0x00007fff1fa257cc in ??? () at /opt/rocm/lib/libhsa-runtime64.so.1
#7 0x00007fffe38a955a in start_thread (arg=<optimized out>) at pthread_create.c:447
#8 0x00007fffe3926a3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb)
Note that installing rocm 6.0 as currently in arch repos is not an option for many AMD GPU users that want to use GPGPU capabilities.
It may not be accepted as a bug report, but please see what is being reported in the forums:
Thanks for the trace. But I think you're going to have to experiment with some environment variables to try and get some better debug output from HIP. See for example here and here. Time to bring in @tpkessler methinks.
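For anyone following along, here is a minimal sketch of how more verbose HIP output could be captured. It assumes the `AMD_LOG_LEVEL` variable from the HIP runtime's logging facility (4 being the most verbose level), which may or may not be among the variables meant in the links above. The helper name `run_with_hip_logging` and the log file name are made up for illustration.

```shell
# Hypothetical helper: run a command with verbose HIP runtime logging
# and save everything written to stderr in a log file.
# Assumption: the HIP runtime honours AMD_LOG_LEVEL (0 = off ... 4 = debug).
run_with_hip_logging() {
    log_file=$1
    shift
    # env sets the variable only for this one command
    env AMD_LOG_LEVEL=4 "$@" 2> "$log_file"
}
```

Something like `run_with_hip_logging hip-debug.log blender classroom.blend` would then leave the runtime messages, including the memory access fault line, in `hip-debug.log` for attaching to the report.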
Blender 4.0.2 downloaded from blender.org works absolutely fine
I tested using the same methodology as in my original bug report, same input file (classroom) and following same steps.
I switched between 'Viewport Shading' 'Rendered' and 'Material Preview' at least 10 times, no crash.
Edit -- again, I am using rocm 5.6.1 as provided by the Arch repos a few months ago. I had to roll back from 5.7.1 because that version was not working with blender and pytorch.
In the forum thread I referenced in previous post, many users have flagged that rocm 6.0 is still not working properly (ppl with rx 5700 and rx 6600 from a quick scan).
1 person has reported that switching from ROCM to opencl-rusticl-mesa has solved issues with darktable (but rusticl is not supported by blender or pytorch -- AFAIK)
No, I use my computer for work, and I need something that I know works with pytorch -- I have held off on upgrading a number of packages due to the status of rocm since 5.7.1
In the forum linked there are enough ppl flagging the issue, also with rocm 6.0
4.0.2-10 was a rebuild against ROCm 6 which appears to be the culprit. I read the entire thread you linked and there doesn't appear to be anyone pointing out what the result of ROCm 6 with Blender from blender.org is. Maybe @tpkessler can do a quick test? I don't have hardware to test this with.
Well, please don't be surprised if the ticket is closed with "@reporter uncooperative". Honestly, this is how Arch works. If you report a bug, you are required to do what is asked of you. How else will it get fixed?
Additionally, I didn't cotton on that you were in a "partial updates" situation which is completely unsupported! Ughh.
To put this differently: we'll not go back to an old upstream version and work with that as rolling back is not an option. We will have to get this fixed on ROCm 6. We need more eyes on this and some people who might be able to figure out what's wrong with the current packages.
On my end I did what I felt was right -- reporting the issue and providing debug info as requested, for software that has no hard dependency on rocm6 and whose upstream version works fine -- in other words, it is an Arch package issue
I note that, compared with the output from @tpkessler, there are no CUDA variables set -- I have no NVIDIA hardware or related packages installed on my machine
HIP version : 5.6.31062-

== hipconfig
HIP_PATH : /opt/rocm
ROCM_PATH : /opt/rocm
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm/include -I/opt/rocm/llvm/bin/../lib/clang/16.0.0

== hip-clang
HIP_CLANG_PATH : /opt/rocm/llvm/bin
clang version 16.0.0
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/llvm/bin
LLVM (http://llvm.org/):
  LLVM version 16.0.0git
  Optimized build with assertions.
  Default target: x86_64-pc-linux-gnu
  Host CPU: znver3
  Registered Targets:
    amdgcn - AMD GCN GPUs
    nvptx - NVIDIA PTX 32-bit
    nvptx64 - NVIDIA PTX 64-bit
    r600 - AMD GPUs HD2XXX-HD6XXX
    x86 - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : -isystem "/opt/rocm/bin/include" -O3
hip-clang-ldflags : -O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc

=== Environment Variables
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/apache-spark/bin:/opt/apache-spark/sbin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/opt/rocm/bin
egrep: warning: egrep is obsolescent; using grep -E
HIP_ROCCLR_HOME=/opt/rocm/bin

== Linux Kernel
Hostname : XXXXX
Linux gemini 6.7.4-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Mon, 05 Feb 2024 22:07:37 +0000 x86_64 GNU/Linux
LSB Version: n/a
Distributor ID: Arch
Description: Arch Linux
Release: rolling
Codename: n/a
Taking a third look at @tpkessler's hipconfig --full output, I note that a bunch of NVIDIA-related env variables are set:
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/rocm/bin:/opt/cuda/bin:/opt/cuda/nsight_compute:/opt/cuda/nsight_systems/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/rustup/bin
egrep: warning: egrep is obsolescent; using grep -E
CUDA_PATH=/opt/cuda
and the following string seems to indicate that the hardware is a Lenovo 'ideapad':
== Linux Kernel
Hostname : Can't exec "hostname": No such file or directory at /opt/rocm/bin//hipconfig.pl line 211.
Linux ideapad 6.7.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 05 Feb 2024 22:07:49 +0000 x86_64 GNU/Linux
LSB Version: n/a
I wonder if there is some interaction with the NVIDIA hardware possibly installed in his machine.
To the best of my knowledge, Lenovo IdeaPads are offered with NVIDIA RTX dGPUs, with no AMD dGPU option (AFAIK)
I tested it on my laptop with an integrated GPU, gfx1103 (no dGPU). As explained above, your blender crashes because the newest version is compiled against the most recent ROCm release, 6.0.0. Since this is a major release (from 5.7.1 to 6.0.0), it includes breaking API changes, so it's not surprising that your setup breaks. If you decide to use an older version of ROCm, you also have to roll back all packages that use it. In your case this means: install an older (working) version of blender and add it to IgnorePkg in /etc/pacman.conf, or locally recompile blender with your old ROCm version. Blender's support for ROCm 5+ doesn't mean that a given build runs with any ROCm version; it means that a build linked against one particular ROCm version will work with that version. As we build against ROCm 6 but you have ROCm 5.6.1 installed, blender will break.
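For reference, pinning the package as suggested means a line such as `IgnorePkg = blender` under `[options]` in /etc/pacman.conf. Below is a rough sketch of a check that the entry is actually in place. `is_ignored` is a hypothetical helper name, and it only matches literal package names, not the glob patterns that IgnorePkg also accepts.

```shell
# Hypothetical helper: succeed if PKG is listed literally on an active
# (uncommented) IgnorePkg line of the given pacman configuration file.
# Usage: is_ignored PKG [CONF]   (CONF defaults to /etc/pacman.conf)
is_ignored() {
    awk -v pkg="$1" '
        /^[ \t]*IgnorePkg[ \t]*=/ {
            sub(/^[^=]*=/, "")      # drop "IgnorePkg =", keep the package list
            for (i = 1; i <= NF; i++)
                if ($i == pkg) found = 1
        }
        END { exit !found }
    ' "${2:-/etc/pacman.conf}"
}
```

For example, `is_ignored blender && echo "blender is held back"`. Keep in mind that holding a package back this way is exactly the partial-upgrade situation discussed in this thread.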
So, you are saying that Arch is artificially imposing an external library minimum requirement (rocm 6.0) to a software (blender) that has no such minimum requirement (blender states rocm >= 5.3) and its upstream version (blender 4.0.2) works fine (tested) with rocm 5.6.1 as was available in Arch repos.
ok, I take note of this.
However, consider that the above essentially makes blender depend (unnecessarily) on whatever state the rocm stack happens to be in. To date, rocm in Arch has been problematic (with pytorch, blender, darktable, etc.), and many end users have flagged the issues both on the old Arch bug report portal and in the forums.
If the desire of the maintainer is to close this bug report, please go ahead.
On my end I will block the upgrade to blender 4.0.2-10
We never supported partial upgrades in Arch because they are unsupportable. As such, we always have the implicit need for users to upgrade to the newest available versions of everything. Consider also that Arch is entirely volunteer run and we all do this in our spare time. If you need commercial level support for deprecated versions of some software, it might be better to look into an enterprise distro.
I understand your point and I appreciate all the effort from all volunteers (including end users providing feedback in their own spare time)
However there is a requirement to provide software packages in Arch official repos that are tested and stable -- otherwise everyone here is wasting his/her own time
Unfortunately, rocm as packaged in Arch is demonstrably not stable. Maybe including it in the official repos is not a good idea. And it is not a good idea to artificially impose dependencies on specific versions of external libraries on software that has no such dependencies
We do have testing but for specialized groups of packages we sometimes have no testers. The good news is that you can help us with that and become a tester. If you apply, you get to review ROCm packages before they get released to stable.
Gladly. Hopefully you can now appreciate why partial upgrades are unsupportable. Another point worth a mention: it ruins backtraces via our debuginfod server i.e. the traces become ineffective due to "holes" in the output.
PMs - I'm very sorry for allowing this "partial upgrade" report through thus wasting your time!