When updating to 6.9 or greater, as soon as the graphical session starts, the system freezes. This applies to linux, linux-zen and through testing, also the latest LTS. However, when building linux-git from the AUR, 6.9.1 works fine and doesn't show any of the issues from the Arch kernel. I was able to apply the Arch kernel patches to the linux-git package and it also worked fine.
Additionally, when using the Arch kernel 6.9 or above, if I remove one of my monitors from the equation and boot single monitor, the issue is gone. I can power on the monitor after the graphical session starts and use it, however, resuming from sleep results in the same crash.
If needed, amdgpu issues can be reported upstream at https://gitlab.freedesktop.org/drm/amd. But if it's already fixed on mainline.. probably no point. The chances are a fix will filter through to the stable tree before too long..
If that works for you, the config is to blame; if it does not, we'll need to find the commit that caused the issue.
Also, did you check whether the 6.9 (mainline release) kernel has this issue for you? You can check either via aur/linux-mainline (has a repo with precompiled stuff) or via the repo package:
Freezes my main tty on boot at [ OK ] Finished Virtual Console Setup., but switching to another one allows me to enter. Hyprland still crashes, unable to properly check SDDM.
Should I just build 6.9 myself then and see if it crashes as well to get the bad starting point for bisecting? Seems like Arch might not be the culprit.
commit 34241dc665cf21bc628f1fea2249adb10010dfc0
drm/amd/display: reenable windowed mpo odm support on dcn32 and dcn321
Well, that is where I ended up, which makes sense as the issue is clearly drm related. I'm curious to learn if there's a way for me to build 6.9 without that specific commit to test it?
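One general way to build a tree without a single commit is to revert it with git before building. Below is a minimal sketch on a throwaway repository; the repo, file names and commit messages are all made up for illustration, and how the revert gets wired into a kernel PKGBUILD is not shown here.

```sh
# Toy demonstration of "git revert" on a throwaway repository.
# Everything here (repo, files, messages) is hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "revert-demo@example.invalid"
git config user.name "Revert Demo"

echo "base" > a.txt
git add a.txt
git commit -q -m "base"

# The commit we later want to build without.
echo "suspect feature" > feature.txt
git add feature.txt
git commit -q -m "suspect commit"
suspect=$(git rev-parse HEAD)

# An unrelated later change that must be kept.
echo "more" >> a.txt
git add a.txt
git commit -q -m "later unrelated change"

# Create a new commit that undoes only the suspect commit.
git revert --no-edit "$suspect"

git log --oneline
```

In a real kernel tree the revert can fail with conflicts if later commits depend on the reverted one, in which case it has to be resolved by hand.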
I certainly wasn't going to be able to figure out that wizardry, so thanks. That worked and I can confirm that 6.9.1-arch works with that commit removed, albeit with a nasty stutter that completely freezes everything for a moment every maybe 5 seconds? This is all anecdotal, but I experienced that on 6.8.x as well, however, it was considerably less noticeable and not nearly as impactful. linux-git didn't seem to be as bad, but I'll retry that one as well.
So, with all of that being said, since linux-git doesn't have the issue and you all have helped me confirm that arch isn't the issue, should I just wait it out then? Or would it be worthwhile taking it to the drm gitlab and creating an issue there for it?
edit - just built linux-git again from the AUR and it's flawless. 6.8.9-arch was good, 6.9.x with the problem commit removed works, but poorly and linux-git is the superior one out of all of them.
I'm an idiot and had already cleared out the folder I was bisecting in before I got that message. I've just been avoiding going through it all over. I'll start the process again today.
It's okay. You guys do more than enough. I used the linux-git aur package this time around and it was insanely faster than doing it all manually.
So I finished up and ran the commands mentioned above to try and isolate the fixed commit and received this -
❯ git bisect start --term-new=fixed --term-old=unfixed
Updating files: 100% (25015/25015), done.
Previous HEAD position was 340383c734f8 drm/amd/display: Remove pixle rate limit for subvp
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
status: waiting for both good and bad commits
❯ git bisect fixed
status: waiting for good commit(s), bad commit known
❯ git bisect unfixed v6.9
Bisecting: 6677 revisions left to test after this (roughly 13 steps)
[db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel
So does this mean I need to do another round of bisecting to get to the fixed commit? Never mind... I actually read the link above and realized I'm in a new fixed/unfixed series. Had no idea this was a thing. Pretty cool.
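For anyone else who had not seen the fixed/unfixed terms before, here is a minimal sketch of the same idea on a throwaway repository. The repo, file names and the "fix" in commit 8 are made up for illustration. It uses git bisect run to automate the marking, which treats exit code 0 as the old term (unfixed) and a non-zero exit code as the new term (fixed).

```sh
# Toy demonstration of bisecting for a FIX using custom terms.
# Everything here (repo, files, the "fix") is hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# 12 commits; the problem exists from the start and is fixed in commit 8.
for i in $(seq 1 12); do
    if [ "$i" -ge 8 ]; then echo "works" > status; else echo "fails" > status; fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 8 ]; then expected_fix=$(git rev-parse HEAD); fi
done

git bisect start --term-new=fixed --term-old=unfixed
git bisect fixed HEAD                                       # newest commit works
git bisect unfixed "$(git rev-list --max-parents=0 HEAD)"   # first commit is broken

# grep exits 0 while "fails" is still present (old/unfixed)
# and 1 once it is gone (new/fixed).
git bisect run grep -q fails status

first_fixed=$(git rev-parse refs/bisect/fixed)
git bisect reset
echo "first fixed commit: $(git log -1 --format=%s "$first_fixed")"
```

With a kernel there is no one-line test command, so each step is marked by hand with git bisect fixed or git bisect unfixed after booting the build.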
I got about 3 or 4 revisions in and hit a wall. It errored out, so I figured I'd try make clean to reset things a bit, and it didn't work. Ultimately, that led to me running make oldconfig because it was complaining about a missing .config, but even after doing that, I'm still hitting a wall -
Warning: 'make modules_install' requires /doesnt/exist. Please install it.
This is probably in the kmod package.
==> Tidying install...
  -> Removing libtool files...
  -> Purging unwanted files...
  -> Removing static library files...
  -> Copying source files needed for debug symbols...
  -> Compressing man and info pages...
==> Checking for packaging issues...
==> Creating package "linux-git"...
  -> Generating .PKGINFO file...
  -> Generating .BUILDINFO file...
  -> Adding install file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Starting package_linux-git-headers()...
Installing build files...
install: cannot stat 'tools/bpf/resolve_btfids/resolve_btfids': No such file or directory
==> ERROR: A failure occurred in package_linux-git-headers(). Aborting...
@gromit that is indeed what it was, so thanks for that. I was able to get through the rest of it today and here's the result. I'll bring all of these findings over to the issue on the drm gitlab page. Thanks for the help, everyone!
ab6a0edb7ded060e84dc1a24e3936c86c3d048b9 is the first fixed commit
commit ab6a0edb7ded060e84dc1a24e3936c86c3d048b9
Author: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Date:   Mon Apr 22 07:43:55 2024 -0600

    Revert "drm/amd/display: Add fallback configuration when set DRR"

    This reverts commit d76c0a23b557c6ebb3fac32548100d76a1e0ce23.

    This change must be reverted since it caused soft hangs when changing
    the refresh rate to 122 & 144Hz when using a 7000 series GPU.

    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Reported-by: Mark Broadworth <Mark.Broadworth@amd.com>
    Cc: Daniel Wheeler <daniel.wheeler@amd.com>
    Cc: Harry Wentland <harry.wentland@amd.com>
    Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/display/dc/optc/dcn32/dcn32_optc.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)
Can confirm, same is happening on my system. NVidia, Hyprland.
Downgrading to
linux 6.8.9.arch1-2
linux-headers 6.8.9.arch1-2
Fixes the issue. Software rendering still works, SDDM loads fine. Hyprland doesn't seem to be responsible for the issue, as GPU rendering broke specifically after the kernel update.
Hey, @ptr1337 ! I tried to pass it as a kernel param for my efistub, didn't seem to change anything for 6.8.9.arch1-2, but passing it for 6.9.1.arch1-1 seems to break even sddm, as the system isn't even able to launch it, being stuck in an infinite loop.
Wlroots-hyprland doesn't launch either, still can't find the GPU.
[CRITICAL] m_sWLRRenderer was NULL! This usually means wlroots could not find a GPU or enountered some issues.
[CRITICAL] Critical error thrown: wlr_gles2_renderer_create_with_drm_fd() failed!
Hey, @loqs! I don't have any AMD GPUs I could use to test, but my brother's rx560 (laptop) seems to work fine on the latest kernel release.
I'll keep you updated in case his system starts failing to reach the graphical interface.
I faced the same issue on 6.9.1.hardened1-2 but for me everything works fine on 6.9.1.arch1-2.
I'm using nvidia-open-dkms 550.78-5 which won't compile on 6.8.9.hardened.* (mkinitcpio can't find nvidia module) but works fine on 6.8.9.arch1-2.
So based on that I assume that newer versions of nvidia-open-dkms conflict with some parameters in the config.
As your issue can only be reproduced on linux-hardened, could you please open a new issue against that package? It would be great if you could try building linux-hardened with the config from linux and vice-versa and include the results of those tests on the new issue.
That would be wonderful, @gromit. Do I just mark as "good" the last commit that was working and mark the first that isn't working as "bad"?
Then, after the bisection is started, it will scroll through all the commits starting from the one that comes right after the one that was working, while automatically checking out?
Also... Do I just git clone Torvalds' repo and scroll through commits, or should I clone Arch Linux's Linux repo?
Sorry if this sounds silly of me, Git is still really confusing to me...
You keep building and testing the revision it throws at you
You tell if the current build is good or bad
That just continues until the culprit is found. The cool idea behind this is that it uses binary search on the commit range, so you do not have to test e.g. the 15678 revisions between linux 6.8 and linux 6.9 but only like 13 example commits.
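To make the steps above concrete, here is a minimal sketch of the same loop on a throwaway repository. The repo, file names and the "regression" in commit 11 are made up for illustration. Since the toy test is just a command, git bisect run does the good/bad marking automatically: exit code 0 means good, a non-zero exit code means bad.

```sh
# Toy demonstration of git bisect on a throwaway repository.
# Everything here (repo, files, the "regression") is hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# 16 commits; commit 11 introduces the "regression".
for i in $(seq 1 16); do
    if [ "$i" -ge 11 ]; then echo "fails" > status; else echo "works" > status; fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 11 ]; then expected_bad=$(git rev-parse HEAD); fi
done

git bisect start
git bisect bad HEAD                                      # newest commit is broken
git bisect good "$(git rev-list --max-parents=0 HEAD)"   # first commit was fine

# Stand-in for "build, boot and test": grep exits 0 (good) while the
# file still says "works" and 1 (bad) once it does not.
git bisect run grep -q works status

first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset
echo "first bad commit: $(git log -1 --format=%s "$first_bad")"
```

For a kernel the test step is manual: build and install the package for the checked-out revision, boot it, then run git bisect good or git bisect bad depending on what you see.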
I can also just provide you the kernels to test (like I did in #56 (comment 187650))
No, you install the package that was built from the commit in question
If we do it together you'll just get links to prebuilt packages and I'll also do the bisection in my local repo, so you'll just have to report whether the issue still exists with the package I gave you
That... Would be wonderful! Thank you so much for this.
So the latest working tag is 6.8.9.arch1-2, then the tag that comes right after, 6.9.arch1-1, already has the issue... So it must be caused by a commit in-between the two tags
Uh, that's indeed a bit small. Do you still have fallback images activated?
See my comments in the other thread for that: #56 (comment 187671)
I think generally it's a good idea to still have another kernel lying around because, after all, it's a mainline kernel, so some other issues might arise with it during the bisection (usually nothing major). But if you choose to go the route of uninstalling the 'linux' package, maybe first create yourself a USB stick from which you can do a possible rescue.
Thanks, @gromit! I've managed to install the mainline kernel by removing the fallback images and disabling them in configs. I'm using efistub, should I just create a EFI entry for vmlinuz-linux-mainline?
Alrighty, I've booted to the mainline kernel successfully! Unfortunately, WLRoots renderer is still unable to find my GPU, even though it is there... Even SDDM managed to pick it up!
Just in case, I'm referring to the https://pkgbuild.com/~gromit/linux-56/linux-mainline-v6.8.r8073.g480e035-1-x86_64.pkg.tar.zst kernel.
So... This one commit doesn't work... I suppose it is expected?
Alrighty, this one is working as intended! 6.8.0-1-mainline-05202-g9187210eee7d #1 SMP PREEMPT_DYNAMIC. Everything works flawlessly; WLRoots sees GPU, SDDM launches!
$ 9040d0297a476a4cea468663741177a79c19626b
git bisect good
9040d0297a476a4cea468663741177a79c19626b is the first bad commit
commit 9040d0297a476a4cea468663741177a79c19626b
Author: Thomas Zimmermann <tzimmermann@suse.de>
Date:   Mon Feb 12 10:06:12 2024 +0100

    fbdev/efifb: Remove PM for parent device

    The EFI device has the correct parent device set. This allows Linux
    to handle the power management internally. Hence, remove the manual
    PM management for the parent device from efifb.

    Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
    Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240212090736.11464-5-tzimmermann@suse.de
Bisect log:
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect good e8f897f4afef0031fe618a8e94127a0934896aba
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
# good: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 9187210eee7d87eea37b45ea93454a88681894a4
# bad: [119b225f01e4d3ce974cd3b4d982c76a380c796d] Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect bad 119b225f01e4d3ce974cd3b4d982c76a380c796d
# bad: [9ac4beb7578a88baa4f7e6a59eeb5be79d7b011a] Merge tag 'drm-misc-next-2024-02-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect bad 9ac4beb7578a88baa4f7e6a59eeb5be79d7b011a
# good: [d5597444032b2f5c8624918fb5b29be5bba78a3c] drm/amdgpu: Fix HDP flush for VFs on nbio v7.9
git bisect good d5597444032b2f5c8624918fb5b29be5bba78a3c
# good: [0d966d59d1e58df8555a3e6760a6eb3956b3d0ef] drm: bridge: simple-bridge: use drm_bridge_edid_read()
git bisect good 0d966d59d1e58df8555a3e6760a6eb3956b3d0ef
# good: [f1ee98cff3d86271491b08315fcdfa4c3f097e1e] drm/i915/fbdev: Fix smem_start for LMEMBAR stolen objects
git bisect good f1ee98cff3d86271491b08315fcdfa4c3f097e1e
# good: [247f2ee4498cfcaf18b3c3486dffd2302d56fc17] drm/xe: Fix a missing argument to drm_err_printer
git bisect good 247f2ee4498cfcaf18b3c3486dffd2302d56fc17
# skip: [eb927f01dfb6309c8a184593c2c0618c4000c481] drm/i915/gt: Restart the heartbeat timer when forcing a pulse
git bisect skip eb927f01dfb6309c8a184593c2c0618c4000c481
# good: [29f3067a236ac55f245ea8f23712a0d240cf1f30] drm/i915/alpm: Calculate ALPM Entry check
git bisect good 29f3067a236ac55f245ea8f23712a0d240cf1f30
# good: [75fa9b7e375e35739663cde0252d31e586c6314a] video: Add helpers for decoding screen_info
git bisect good 75fa9b7e375e35739663cde0252d31e586c6314a
# skip: [d2435a8e3d683adb9143b9ad3c416ac3a4ca9688] drm/i915: Add flex arrays to struct i915_syncmap
git bisect skip d2435a8e3d683adb9143b9ad3c416ac3a4ca9688
# skip: [a797099562267ebb281acd59750f1a8dbba36eef] drm/i915/huc: Allow for very slow HuC loading
git bisect skip a797099562267ebb281acd59750f1a8dbba36eef
# skip: [86ceaaaec59707b06216a15b3852867fa2f1574e] drm/i915/gem: Atomically invalidate userptr on mmu-notifier
git bisect skip 86ceaaaec59707b06216a15b3852867fa2f1574e
# good: [6f167a3673463c2b1733ff04fada65346bbc772b] Merge tag 'drm-intel-gt-next-2024-02-15' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect good 6f167a3673463c2b1733ff04fada65346bbc772b
# bad: [784e27f2811884ab78edc713a4ef0d4deca9b668] fbdev/efifb: Do not track parent device status
git bisect bad 784e27f2811884ab78edc713a4ef0d4deca9b668
# bad: [9040d0297a476a4cea468663741177a79c19626b] fbdev/efifb: Remove PM for parent device
git bisect bad 9040d0297a476a4cea468663741177a79c19626b
# good: [036105e3a776b6fc2fe0d262896a23ff2cc2e6b1] video: Provide screen_info_get_pci_dev() to find screen_info's PCI device
git bisect good 036105e3a776b6fc2fe0d262896a23ff2cc2e6b1
# good: [9eac534db0013aff9b9124985dab114600df9081] firmware/sysfb: Set firmware-framebuffer parent device
git bisect good 9eac534db0013aff9b9124985dab114600df9081
# first bad commit: [9040d0297a476a4cea468663741177a79c19626b] fbdev/efifb: Remove PM for parent device
The kernel developers do not class causing breakage to the nvidia out of tree modules as a regression. So if enabling nvidia mode setting resolves the issue that is probably the easiest fix.
Well, I assume this is the primary cause of the issue... I can't thank you enough for helping with this stuff, I don't think I would be able to figure this stuff out on my own...
Just in case, the settings I pass to the kernel are quiet loglevel=3 nvidia_drm.modeset=1 ibt=off vt.global_cursor_default=0 splash... But the issue still exists even without nvidia_drm.modeset=1 and/or ibt=off
The modprobe.d/nvidia.conf:
options nvidia-drm modeset=1
Here's the journalctl from linux-mainline-v6.8.rc3.r271.g9eac534-1: journalctl.log
@loqs Huh... This worked! 6.8.0-rc3-1-mainline-00271-g9eac534db001 launched without issues (except that TTYs are flashing to an odd grey color on switch, for some reason) and the GPU is seen by all graphical applications.
I'll test on 6.9.arch1-1 and report the results though.
I had tested the nvidia_drm.fbdev=1 setting already, but this time I also had ibt=off passed with it, maybe that could be it? I'll update when I've tested this.
UPDATE: the release 6.9.arch1-1 doesn't work with the same settings as on linux-mainline-v6.8.rc3.r271.g9eac534-1... The dmesg and journalctl: dmesg, journalctl.log
If the nvidia_drm options are being passed later in a config file, so their absence is expected, my only suggestion would be to work backwards through the bad bisection points to try and find where nvidia_drm.modeset=1 nvidia_drm.fbdev=1 stops fixing the issue.
@loqs To be entirely honest, I wouldn't call this a fix... I mean, yes, it does mitigate the issue I had for the bisected commit; however being forced to use experimental flags like nvidia_drm.fbdev=1 doesn't really sound like a good idea.
The kernel has changed something, this issue started to appear; the bisection found the first commit that has this issue. What's the point of doing another bisection, just to find when this issue stopped being mitigated by the experimental flag?
Why should it matter when/whether an additional experimental flag "fixes" it?
Can you reproduce the issue with an untainted kernel?
With an untainted kernel as in "Without NVIDIA graphics drivers"? If so, then no, since the issue has to do with some DEs that need proper graphics drivers to even function, which means that testing them with an untainted kernel (i.e. without NVIDIA graphics drivers) is essentially impossible, unless you're using a software renderer (which also wouldn't really help since the issue is with GPU firmware handling, or something like that).
UPDATE! Just updated to 6.9.5 and it works! No clue what happened or what changed... But it just works!
I'll test 6.9.4 too, just in case.
EDIT: Just tested 6.9.4, it works too...
Now I realize I wasted a LOT of other people's time. I just assumed that this issue would be closed once the problem was fixed, which, in hindsight, is a weird assumption to make, I admit.
Since this is technically a different issue, I just want to note that the original issue that I am experiencing is still present in 6.9.5. There's an issue created on the DRM page at https://gitlab.freedesktop.org/drm/amd/-/issues/3405
Issue is unrelated.
I assumed that was the issue when I started this. That issue was created after I started this one and the journal output and description of the issue is almost identical. However, the fixed commit that I landed on has already been implemented in 6.9.5 for sure, but I still have the issue on that kernel.
I saved every kernel I built this time though so maybe I can go back through the process and double check things.
The linked journal report is not available. Can you please post the full system journal, or at least all kernel messages from a boot with the issue? I am interested in how many simple drm devices are created, which seems to be the cause of https://bbs.archlinux.org/viewtopic.php?id=295923.
I know you mentioned that you're unable to check the journal output because of the crash, but can you load into a session with a kernel that works and run journalctl -rb -n, where n is the number of boots since you booted the failing kernel (for example, journalctl -rb -1 for the previous boot)? That will show you the final moments before the crash happened, and we can hopefully compare. 6.10 has been stable for me and ultimately, it was never an issue with the arch kernel like I initially suspected.
@u-2at As I said in #53 (comment 191977), it crashes too early, so in journalctl I only see the boots on the LTS kernel since I was never able to boot on vanilla and zen kernels.
I don't know if there is an issue upstream for this one. The one that I thought covered this issue turned out to be something different as it's still ongoing, but unless you use a docking station, it's likely not your issue.