Added 2023-11-21 21:43:49 UTC - Commented by loqs (loqs)
I'll disable WITH_CYCLES_OSL again but it's a mystery to me why this currently segfaults for radeon people. On Intel and NVIDIA, it works just fine.
Edit:
Upstream OSL states "It does not work currently" with respect to LLVM 16 [1]. Possibly ask upstream Mesa what the difference is between the drivers?
Are you also looking into FS#80306? So perhaps both could be fixed at once.
The reason it only fails on radeon is that the radeon driver links with LLVM16, and the others do not. So, OSL is linking LLVM15, and the driver is linking LLVM16. Something is breaking because two different versions of LLVM are being linked concurrently and calling into each other, presumably because symbol versioning is disabled or broken.
You can also see in the callstack that it's starting off in a LLVM16 symbol, and then ends up inside a LLVM15 symbol. Once again, unsure what is actually causing this.
It's perhaps more clear with a fully symbolicated callstack. For some reason FindNodeOrInsertPos (frame 7) from llvm-16, used by the GPU driver, is calling NodeEquals (frame 6) from llvm-15, linked in from OpenShaderLanguage. And due to some ABI change that then crashes later.
The mangled symbols in question are:
_ZN4llvm10FoldingSetINS_6SDNodeEE10NodeEqualsEPKNS_14FoldingSetBaseEPNS3_4NodeERKNS_16FoldingSetNodeIDEjRS8_
which is called indirectly from
_ZN4llvm14FoldingSetBase19FindNodeOrInsertPosERKNS_16FoldingSetNodeIDERPvRKNS0_14FoldingSetInfoE
#0 llvm::FoldingSetNodeID::AddPointer(void const*) () at /usr/src/debug/llvm15/llvm-15.0.7.src/include/llvm/ADT/FoldingSet.h:337
#1 AddNodeIDOperands () at /usr/src/debug/llvm15/llvm-15.0.7.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:616
#2 AddNodeIDNode() () at /usr/src/debug/llvm15/llvm-15.0.7.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:868
#3 0x00007fffccda7b98 in llvm::SDNode::Profile(llvm::FoldingSetNodeID&) const () at /usr/src/debug/llvm15/llvm-15.0.7.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:10818
#4 llvm::DefaultFoldingSetTrait<llvm::SDNode>::Profile(llvm::SDNode&, llvm::FoldingSetNodeID&) () at /usr/src/debug/llvm15/llvm-15.0.7.src/include/llvm/ADT/FoldingSet.h:236
#5 llvm::DefaultFoldingSetTrait<llvm::SDNode>::Equals(llvm::SDNode&, llvm::FoldingSetNodeID const&, unsigned int, llvm::FoldingSetNodeID&) () at /usr/src/debug/llvm15/llvm-15.0.7.src/include/llvm/ADT/FoldingSet.h:404
#6 llvm::FoldingSet<llvm::SDNode>::NodeEquals(llvm::FoldingSetBase const*, llvm::FoldingSetBase::Node*, llvm::FoldingSetNodeID const&, unsigned int, llvm::FoldingSetNodeID&) () at /usr/src/debug/llvm15/llvm-15.0.7.src/include/llvm/ADT/FoldingSet.h:538
#7 0x00007fffe67278e6 in llvm::FoldingSetBase::FindNodeOrInsertPos(llvm::FoldingSetNodeID const&, void*&, llvm::FoldingSetBase::FoldingSetInfo const&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/Support/FoldingSet.cpp:288
#8 0x00007fffe729b1b6 in llvm::FoldingSetImpl<llvm::FoldingSet<llvm::SDNode>, llvm::SDNode>::FindNodeOrInsertPos(llvm::FoldingSetNodeID const&, void*&) () at /usr/src/debug/llvm/llvm-16.0.6.src/include/llvm/ADT/FoldingSet.h:490
#9 llvm::SelectionDAG::FindNodeOrInsertPos(llvm::FoldingSetNodeID const&, llvm::SDLoc const&, void*&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1343
#10 0x00007fffe72e79cd in llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc const&, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>, llvm::SDNodeFlags) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:9411
#11 0x00007fffe940e816 in llvm::SelectionDAG::getCopyFromReg(llvm::SDValue, llvm::SDLoc const&, unsigned int, llvm::EVT) () at /usr/src/debug/llvm/llvm-16.0.6.src/include/llvm/CodeGen/SelectionDAG.h:798
#12 llvm::SITargetLowering::LowerFormalArguments(llvm::SDValue, unsigned int, bool, llvm::SmallVectorImpl<llvm::ISD::InputArg> const&, llvm::SDLoc const&, llvm::SelectionDAG&, llvm::SmallVectorImpl<llvm::SDValue>&) const () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/Target/AMDGPU/SIISelLowering.cpp:2557
#13 0x00007fffe728f453 in llvm::SelectionDAGISel::LowerArguments(llvm::Function const&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:10697
#14 0x00007fffe730632c in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1398
#15 0x00007fffe73079b6 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:480
#16 0x00007fffe6cea945 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/CodeGen/MachineFunctionPass.cpp:91
#17 0x00007fffe69ab989 in llvm::FPPassManager::runOnFunction(llvm::Function&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/IR/LegacyPassManager.cpp:1430
#18 0x00007fffe822aefa in RunPassOnSCC () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/Analysis/CallGraphSCCPass.cpp:179
#19 RunAllPassesOnSCC () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/Analysis/CallGraphSCCPass.cpp:469
#20 runOnModule() () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/Analysis/CallGraphSCCPass.cpp:534
#21 0x00007fffe69ac6ac in runOnModule () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/IR/LegacyPassManager.cpp:1545
#22 llvm::legacy::PassManagerImpl::run(llvm::Module&) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/IR/LegacyPassManager.cpp:535
#23 0x00007fff84a2f380 in ac_compile_module_to_elf () at ../mesa-23.2.1/src/amd/llvm/ac_llvm_helper.cpp:270
#24 si_compile_llvm() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_shader_llvm.c:83
#25 0x00007fff84a30009 in si_get_shader_part() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_shader.c:2882
#26 0x00007fff84a4fffc in si_shader_select_ps_parts () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_shader.c:3160
#27 si_create_shader_variant() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_shader.c:3293
#28 0x00007fff84a5c2cf in si_build_shader_variant() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_shaders.cpp:2520
#29 0x00007fff84a67c4d in si_shader_select_with_key<false, si_shader_key_ps> () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_shaders.cpp:2863
#30 si_shader_select() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_shaders.cpp:2883
#31 0x00007fff84eeec55 in si_update_shaders<(amd_gfx_level)13, (si_has_tess)0, (si_has_gs)0, (si_has_ngg)1>() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_draw.cpp:257
#32 0x00007fff84ef8cf3 in si_draw<(amd_gfx_level)13, (si_has_tess)0, (si_has_gs)0, (si_has_ngg)1, (si_is_draw_vertex_state)0, (si_has_pairs)0, (util_popcnt)0> () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_draw.cpp:2565
#33 si_draw_vbo<(amd_gfx_level)13, (si_has_tess)0, (si_has_gs)0, (si_has_ngg)1, (si_has_pairs)0>() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_draw.cpp:2714
#34 0x00007fff84dad593 in si_draw_rectangle() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_state_draw.cpp:2780
#35 0x00007fff84dc7239 in util_blitter_custom_color() () at ../mesa-23.2.1/src/gallium/auxiliary/util/u_blitter.c:2771
#36 0x00007fff84aa92dc in si_blit_decompress_color() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_blit.c:504
#37 0x00007fff84ab6fe2 in si_flush_resource () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_blit.c:1332
#38 si_flush_resource() () at ../mesa-23.2.1/src/gallium/drivers/radeonsi/si_blit.c:1323
#39 0x00007fff84809487 in tc_call_flush_resource() () at ../mesa-23.2.1/src/gallium/auxiliary/util/u_threaded_context.c:4390
#40 0x00007fff847e8219 in batch_execute () at ../mesa-23.2.1/src/gallium/auxiliary/util/u_threaded_context.c:394
#41 tc_batch_execute() () at ../mesa-23.2.1/src/gallium/auxiliary/util/u_threaded_context.c:445
#42 0x00007fff8567f72a in _tc_sync.isra.0 () at ../mesa-23.2.1/src/gallium/auxiliary/util/u_threaded_context.c:680
#43 0x00007fff8480b018 in tc_flush() () at ../mesa-23.2.1/src/gallium/auxiliary/util/u_threaded_context.c:3587
#44 0x00007fff8438753f in st_flush () at ../mesa-23.2.1/src/mesa/state_tracker/st_cb_flush.c:63
#45 st_context_flush() () at ../mesa-23.2.1/src/mesa/state_tracker/st_manager.c:822
#46 0x00007fff842be454 in dri_flush() () at ../mesa-23.2.1/src/gallium/frontends/dri/dri_drawable.c:537
#47 0x00007fff8d9def58 in dri2_flush_drawable_for_swapbuffers () at ../mesa-23.2.1/src/egl/drivers/dri2/egl_dri2.c:1867
#48 dri2_wl_swap_buffers_with_damage () at ../mesa-23.2.1/src/egl/drivers/dri2/platform_wayland.c:1612
#49 0x00007fff8d9cf328 in dri2_swap_buffers () at ../mesa-23.2.1/src/egl/drivers/dri2/egl_dri2.c:1881
#50 0x00007fff8d9c706d in eglSwapBuffers () at ../mesa-23.2.1/src/egl/main/eglapi.c:1433
#51 0x0000555557eb4744 in GHOST_ContextEGL::initializeDrawingContext() ()
#52 0x0000555557eb2f68 in GHOST_WindowWayland::newDrawingContext(GHOST_TDrawingContextType) ()
#53 0x0000555557e93759 in GHOST_Window::setDrawingContextType(GHOST_TDrawingContextType) ()
#54 0x0000555557eb119b in GHOST_WindowWayland::GHOST_WindowWayland(GHOST_SystemWayland*, char const*, int, int, unsigned int, unsigned int, GHOST_TWindowState, GHOST_IWindow const*, GHOST_TDrawingContextType, bool, bool, bool) ()
#55 0x0000555557eae3d4 in GHOST_SystemWayland::createWindow(char const*, int, int, unsigned int, unsigned int, GHOST_TWindowState, GHOST_GPUSettings, bool, bool, GHOST_IWindow const*) ()
#56 0x0000555557e8fca4 in GHOST_CreateWindow ()
#57 0x000055555656a7d2 in wm_window_ghostwindow_ensure(wmWindowManager*, wmWindow*, bool) ()
#58 0x000055555656aa1d in wm_window_ghostwindows_ensure(wmWindowManager*) ()
#59 0x0000555556534454 in WM_check(bContext*) ()
#60 0x000055555654a2de in wm_homefile_read_ex(bContext*, wmHomeFileRead_Params const*, ReportList*, wmFileReadPost_Params**) ()
#61 0x000055555654f9f4 in WM_init(bContext*, int, char const**) ()
#62 0x0000555555de1681 in main ()
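The hand-off between the two LLVM builds can also be located mechanically from the debug-info paths in a trace like the one above. The following is only a minimal sketch: the function name `llvm_version_switches` and the abbreviated sample frames (arguments replaced by `...`) are illustrative assumptions, and it relies on the source paths containing `llvm-<version>.src` as they do here.

```python
import re

# Matches the LLVM source directory embedded in gdb's debug-info paths,
# e.g. ".../llvm-15.0.7.src/..." -> "15.0.7".
LLVM_SRC = re.compile(r"/llvm-(\d+\.\d+\.\d+)\.src/")
# Matches the frame number at the start of a gdb backtrace line.
FRAME = re.compile(r"^#(\d+)\s")


def llvm_version_switches(backtrace):
    """Return (callee_frame, callee_version, caller_frame, caller_version)
    for each pair of consecutive LLVM frames built from different releases."""
    frames = []
    for line in backtrace.splitlines():
        num = FRAME.match(line)
        ver = LLVM_SRC.search(line)
        if num and ver:  # skip frames without an LLVM source path
            frames.append((int(num.group(1)), ver.group(1)))
    return [
        (a_num, a_ver, b_num, b_ver)
        for (a_num, a_ver), (b_num, b_ver) in zip(frames, frames[1:])
        if a_ver != b_ver
    ]


# Frames 5 to 8 of the trace, with argument lists abbreviated for readability.
SAMPLE = """\
#5 llvm::DefaultFoldingSetTrait<llvm::SDNode>::Equals(...) () at /usr/src/debug/llvm15/llvm-15.0.7.src/include/llvm/ADT/FoldingSet.h:404
#6 llvm::FoldingSet<llvm::SDNode>::NodeEquals(...) () at /usr/src/debug/llvm15/llvm-15.0.7.src/include/llvm/ADT/FoldingSet.h:538
#7 0x00007fffe67278e6 in llvm::FoldingSetBase::FindNodeOrInsertPos(...) () at /usr/src/debug/llvm/llvm-16.0.6.src/lib/Support/FoldingSet.cpp:288
#8 0x00007fffe729b1b6 in llvm::FoldingSetImpl<...>::FindNodeOrInsertPos(...) () at /usr/src/debug/llvm/llvm-16.0.6.src/include/llvm/ADT/FoldingSet.h:490
"""

for callee, callee_ver, caller, caller_ver in llvm_version_switches(SAMPLE):
    print(f"frame {caller} (LLVM {caller_ver}) calls frame {callee} (LLVM {callee_ver})")
```

Frames whose source path wraps onto a following line are simply skipped, so the sketch compares consecutive frames that carry an LLVM path rather than strictly adjacent frame numbers.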
Added 2023-11-22 14:06:41 UTC - Commented by c (grinness)
Hi,
for information, I have tested blender-4.0.1-linux-x64.tar.xz downloaded from blender.org:
blender starts without segfaulting, however attempting to enable HIP backend for cycles (I am on an RX 6800) freezes blender itself, leading to a 'defunct' process that also hangs the system reboot/shut down process
I tested latest blender 3.6.5 from my pacman cache (blender 3.6.5-2, also downgrading openvdb to 10.1.0-1) and I experience the same issue:
blender starts without segfaulting, however attempting to enable HIP backend for cycles freezes blender itself, leading to a 'defunct' process that also hangs the system reboot/shut down process
I am on the latest rocm+hip stack (5.7.1).
The above did not happen with the older rocm+hip stack (5.6.1)
In case anyone NEEDS to render something with their AMD GPU, using the proprietary AMDGPU PRO driver works.
https://wiki.archlinux.org/title/AMDGPU_PRO
Launch blender with progl blender
Toolybird, isn't this perhaps the same as the ROCm crash for pytorch?
All in all, I think I probably shouldn't change the Blender package for this as it appears to be an artifact of a bug in ROCm or Mesa with Radeon specifically.
The reason so many people are dumping unsymbolicated crash dumps here is because Blender itself dumps this useless (to Arch) crash dump in the /tmp/ folder on every crash.
Also, I think the reason this is a problem for Radeon users is because ROCm is linked against LLVM 15 to avoid some sort of crash bug with newer versions. I'm guessing the latest 17 would break even more things for both Mesa and ROCm.
All in all, I think I probably shouldn't change the Blender package for this as it appears to be an artifact of a bug in ROCm or Mesa with Radeon specifically.
I can understand the reasoning here, and yet it means people with AMD gpus (and mesa drivers) won't be able to use the one available blender package in the official arch repos, right?
At the very least this will prompt repeated bug reports and forum threads.
Would it be acceptable to create a package blender-amd with OSL disabled? Alternatively print a note in post install of the blender package to point out the issue and that users with AMD cards should use an alternative package from the AUR?
It's probably worth re-upping the mesa bugs for this issue, because this really shouldn't be happening even when blender is using its own LLVM version. This is why the symbols are versioned. There might be some complexity that's causing it to fail which could be fixed by the mesa side. Also because it keeps happening every time llvm versions get bumped. :')
All the drivers provided by the mesa package link with LLVM16?
Yeah I guess so, but there's something seriously broken for it to jump from LLVM 16's library into LLVM 15's library. Maybe a symbol isn't versioned properly, or something very cursed is happening with object vtables. I don't know. Regardless the presence of two different LLVM versions is a prerequisite for the problem.
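Since two LLVM versions in one process is the prerequisite, one way to confirm it on a running Blender is to look at the libraries mapped into the process. This is a rough sketch only: the helper names are made up for illustration, it assumes the shared objects follow a `libLLVM-<major>` naming scheme, and it is Linux-specific because it reads `/proc/<pid>/maps`.

```python
import re
from pathlib import Path

# Assumption: LLVM shared objects are named like libLLVM-16.so or libLLVM-15.so.
# Group 1 is the full path, group 2 the major version.
LIBLLVM = re.compile(r"(/\S*libLLVM-(\d+)\S*)")


def loaded_llvm_versions(maps_text):
    """Map each LLVM major version to the set of library paths found in the maps text."""
    found = {}
    for line in maps_text.splitlines():
        m = LIBLLVM.search(line)
        if m:
            found.setdefault(m.group(2), set()).add(m.group(1))
    return found


def check_pid(pid):
    """Read /proc/<pid>/maps for a live process (needs permission to read it)."""
    return loaded_llvm_versions(Path(f"/proc/{pid}/maps").read_text())
```

Calling `check_pid` with the PID of a running Blender and getting more than one key back would indicate that more than one LLVM major version is mapped at the same time.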
Added 2023-11-23 14:13:02 UTC - Commented by loqs (loqs)
Does updating openshadinglanguage to the latest git commit (see attached source pkg), which removes the use of LLVM15, help? There has been no soname bump but that could be because there has been no release yet.
Edit:
I was mistaken, there has been an soname bump so you would need to rebuild blender.
If you start blender using progl (aur/amdgpu-pro-oglp I guess) it works just fine. Seems to be an issue between llvm-15 used by intel libraries and llvm-16 used by the free driver stuff?
Added 2023-11-24 22:31:25 UTC - Commented by loqs (loqs)
Also, sadly, loqs' package broke again due to libOpenImageDenoise upgrading.
Rebuilt for openimagedenoise 2.1.0; see also the attached PKGBUILD.diff, which shows the changes I made so you can do it locally if required for future updates. The pkgrel is 5.2 as 5.1 was an unsuccessful test using openshadinglanguage-git, which is compatible with clang 16 but not compatible with blender 17:4.0.1.
https://drive.google.com/file/d/1u9hjrnpzkZmZJ2L3_WoLaOzsyk2uGYxx/view?usp=sharing
Is disabling OSL for Cycles worth it for AMD users? I suppose it might be. It's kind of sad, I wish we could enable the full feature set for all users. Do we even have confirmation that it might be compatible if we can build everything with the same version of LLVM? I could disable OSL and put a notice into the package but I'd like to ideally also have some kind of criterion that tells us when we might be able to enable it again.
Why in the world is the linker binding LLVM-15 symbols to LLVM16? What the heck. This is the output of LD_DEBUG=bindings blender 2>&1 | rg "LLVM-15" | rg "LLVM-16". I kinda assume there's something broken in the Arch PKGBUILDs but I don't know what it is.
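A slightly more structured take on the `rg` filter above is to parse each binding line and keep only those where the source and target files are different libLLVM majors. This is a sketch under assumptions: the function name is hypothetical, the regular expression follows the `binding file A [n] to B [n]: normal symbol ...` line shape that glibc prints for LD_DEBUG=bindings, and the sample lines are invented placeholders, not output captured from Blender.

```python
import re

# glibc prints one line per resolved symbol with LD_DEBUG=bindings, roughly:
#   <pid>: binding file <from> [0] to <to> [0]: normal symbol `<name>' [<version>]
BINDING = re.compile(
    r"binding file (?P<src>\S+) \[\d+\] to (?P<dst>\S+) \[\d+\]: "
    r"\w+ symbol `(?P<sym>[^']+)'"
)
LLVM_MAJOR = re.compile(r"libLLVM-(\d+)")


def cross_version_bindings(log_text):
    """Yield (src, dst, symbol) where both files are libLLVM but of different majors."""
    for line in log_text.splitlines():
        m = BINDING.search(line)
        if not m:
            continue
        src_ver = LLVM_MAJOR.search(m["src"])
        dst_ver = LLVM_MAJOR.search(m["dst"])
        if src_ver and dst_ver and src_ver.group(1) != dst_ver.group(1):
            yield m["src"], m["dst"], m["sym"]


# Invented placeholder lines in the glibc format; paths and symbols are hypothetical.
SAMPLE = (
    "  1234:\tbinding file /usr/lib/libLLVM-16.so [0] to /usr/lib/libLLVM-15.so [0]: "
    "normal symbol `example_symbol'\n"
    "  1234:\tbinding file /usr/lib/libLLVM-16.so [0] to /usr/lib/libc.so.6 [0]: "
    "normal symbol `memcpy' [GLIBC_2.14]\n"
)

for src, dst, sym in cross_version_bindings(SAMPLE):
    print(f"{sym}: {src} -> {dst}")
```

Unlike the two chained `rg` calls, this ignores lines where one of the version strings only appears in the symbol name or version tag, which should cut down on false positives.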
I think we might make this work by grabbing a local static build of llvm15 in the openshadinglanguage package and then using that instead of the dynamic version that we currently use. It should be fairly straightforward but I don't currently have the time to test this. Could someone whip up a package and check this out? That should remove the llvm15 symbols.
Alternatively, try to backport the LLVM 16 PR and make this work with LLVM 16 directly but that's more work though it would certainly be preferable.
However, I do get a whole system crash when trying to enable HIP. Once in a blue moon it just crashes the display driver and not the entire system, and I can extract the following dmesg: