Suspend/resume: NULL pointer dereference when suspending/hibernating
Description:
My machine occasionally hangs on suspend/hibernate with the 6.7-arch3 kernel. Interestingly, I saw similar messages before (on 6.6 and mainline 6.7), but without hangs. When it hangs, I can still switch between VTs, but I can no longer access the machine and have to hard-reset it.
Additional info:
neofetch output:
OS: Arch Linux x86_64
Host: ROG Flow X16 GV601VI_GV601VI 1.0
Kernel: 6.7.0-arch3-1
Uptime: 3 hours, 53 mins
Packages: 2741 (pacman), 10 (flatpak)
Shell: bash 5.2.21
Resolution: 2560x1600
DE: Hyprland
WM: sway
Theme: Breeze [GTK2/3]
Icons: Papirus [GTK2/3]
Terminal: emacs
CPU: 13th Gen Intel i9-13900H (20) @ 1.924
GPU: Intel Raptor Lake-P [Iris Xe Graphics
GPU: NVIDIA GeForce RTX 4070 Max-Q / Mobil
Memory: 9202MiB / 31722MiB
Steps to reproduce:
- Suspend or hibernate machine
Sample from journalctl:
Jan 16 12:38:36 xox kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Jan 16 12:38:36 xox kernel: #PF: supervisor read access in kernel mode
Jan 16 12:38:36 xox kernel: #PF: error_code(0x0000) - not-present page
Jan 16 12:38:36 xox kernel: PGD 0 P4D 0
Jan 16 12:38:36 xox kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jan 16 12:38:36 xox kernel: CPU: 3 PID: 21942 Comm: kworker/u40:23 Tainted: P OE 6.7.0-arch3-1 #1 29ada86f174bb9983ea57568622d66509982ed7e
Jan 16 12:38:36 xox kernel: Hardware name: ASUSTeK COMPUTER INC. ROG Flow X16 GV601VI_GV601VI/GV601VI, BIOS GV601VI.313 11/14/2023
Jan 16 12:38:36 xox kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn
Jan 16 12:38:36 xox kernel:
Jan 16 12:38:36 xox kernel: RIP: 0010:sdhci_pci_remove+0x12/0x60 [sdhci_pci]
Jan 16 12:38:36 xox kernel: Code: 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 53 48 8b af 38 01 00 00 <80> 7d 10 00 75 25 8b 45 20 85 c0 7e 17 31 db 48 63 c3 83 c3 01 48
Jan 16 12:38:36 xox kernel: RSP: 0000:ffffbc7e8813fcd8 EFLAGS: 00010286
Jan 16 12:38:36 xox kernel: RAX: ffffffffc5075d10 RBX: ffff9fce8334a0c0 RCX: ffffffffac43d381
Jan 16 12:38:36 xox kernel: RDX: ffffffffab74ac48 RSI: 0000000000000202 RDI: ffff9fce8334a000
Jan 16 12:38:36 xox kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000000008020001f
Jan 16 12:38:36 xox kernel: R10: 0000000000000000 R11: ffffbc7e8813fc90 R12: ffffffffc5080440
Jan 16 12:38:36 xox kernel: R13: ffff9fce8334a140 R14: 0000000000000080 R15: ffff9fce83238b60
Jan 16 12:38:36 xox kernel: FS: 0000000000000000(0000) GS:ffff9fd5eb2c0000(0000) knlGS:0000000000000000
Jan 16 12:38:36 xox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 12:38:36 xox kernel: CR2: 0000000000000010 CR3: 0000000122820000 CR4: 0000000000f50ef0
Jan 16 12:38:36 xox kernel: PKRU: 55555554
Jan 16 12:38:36 xox kernel: Call Trace:
Jan 16 12:38:36 xox kernel: <TASK>
Jan 16 12:38:36 xox kernel: ? __die+0x23/0x70
Jan 16 12:38:36 xox kernel: ? page_fault_oops+0x171/0x4e0
Jan 16 12:38:36 xox kernel: ? exc_page_fault+0x7f/0x180
Jan 16 12:38:36 xox kernel: ? asm_exc_page_fault+0x26/0x30
Jan 16 12:38:36 xox kernel: ? __pfx_sdhci_pci_remove+0x10/0x10 [sdhci_pci 905214c66dc0866b11be2c648815e71e2d3bf5dc]
Jan 16 12:38:36 xox kernel: ? rpm_resume+0x2e8/0x7b0
Jan 16 12:38:36 xox kernel: ? sdhci_pci_remove+0x12/0x60 [sdhci_pci 905214c66dc0866b11be2c648815e71e2d3bf5dc]
Jan 16 12:38:36 xox kernel: pci_device_remove+0x37/0xa0
Jan 16 12:38:36 xox kernel: device_release_driver_internal+0x19f/0x200
Jan 16 12:38:36 xox kernel: pci_stop_bus_device+0x6c/0x90
Jan 16 12:38:36 xox kernel: pci_stop_and_remove_bus_device+0x12/0x20
Jan 16 12:38:36 xox kernel: trim_stale_devices+0x13d/0x1a0
Jan 16 12:38:36 xox kernel: acpiphp_check_bridge.part.0+0xfd/0x150
Jan 16 12:38:36 xox kernel: acpiphp_hotplug_notify+0xc6/0x270
Jan 16 12:38:36 xox kernel: ? __pfx_acpiphp_hotplug_notify+0x10/0x10
Jan 16 12:38:36 xox kernel: acpi_device_hotplug+0xc5/0x570
Jan 16 12:38:36 xox kernel: acpi_hotplug_work_fn+0x1e/0x30
Jan 16 12:38:36 xox kernel: process_one_work+0x171/0x340
Jan 16 12:38:36 xox kernel: worker_thread+0x27b/0x3a0
Jan 16 12:38:36 xox kernel: ? __pfx_worker_thread+0x10/0x10
Jan 16 12:38:36 xox kernel: kthread+0xe5/0x120
Jan 16 12:38:36 xox kernel: ? __pfx_kthread+0x10/0x10
Jan 16 12:38:36 xox kernel: ret_from_fork+0x31/0x50
Jan 16 12:38:36 xox kernel: ? __pfx_kthread+0x10/0x10
Jan 16 12:38:36 xox kernel: ret_from_fork_asm+0x1b/0x30
Jan 16 12:38:36 xox kernel: </TASK>
Jan 16 12:38:36 xox kernel: Modules linked in: ccm rfcomm snd_seq_dummy snd_hrtimer cmac algif_hash algif_skcipher af_alg snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_codec_hdmi bnep hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_custom hid_sensor_hub intel_uncore_frequency intel_ishtp_hid intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi uvcvideo irqbypass snd_hda_codec btusb snd_hda_scodec_cs35l41_spi videobuf2_vmalloc iwlmvm snd_hda_scodec_cs35l41_i2c uvc btrtl videobuf2_memops processor_thermal_device_pci
Jan 16 12:38:36 xox kernel: snd_hda_scodec_cs35l41 snd_hda_core btintel processor_thermal_device videobuf2_v4l2 rapl snd_seq processor_thermal_wt_hint snd_hda_cs_dsp_ctls snd_hwdep btbcm snd_seq_device coretemp mac80211 hid_multitouch 8250_dw spi_pxa2xx_platform snd_pcm videodev cs_dsp processor_thermal_rfim btmtk sdhci_pci dw_dmac iTCO_wdt intel_cstate libarc4 iwlwifi cqhci intel_pmc_bxt snd_timer processor_thermal_rapl pmt_telemetry nvidia_drm(POE) intel_rapl_msr mei_hdcp snd_soc_cs35l41_lib mei_pxp ucsi_acpi videobuf2_common spi_nor intel_rapl_common iTCO_vendor_support sdhci pmt_class intel_uncore intel_lpss_pci pcspkr typec_ucsi mei_me processor_thermal_wt_req snd bluetooth cfg80211 wmi_bmof mc intel_ish_ipc nvidia_modeset(POE) i2c_i801 mtd nvidia_wmi_ec_backlight intel_lpss mmc_core typec i2c_hid_acpi processor_thermal_power_floor thunderbolt mei soundcore ecdh_generic idma64 i2c_smbus intel_ishtp intel_vsec processor_thermal_mbox roles serial_multi_instantiate mousedev joydev i2c_hid int3403_thermal int3400_thermal
Jan 16 12:38:36 xox kernel: int340x_thermal_zone acpi_pad acpi_thermal_rel acpi_tad soc_button_array intel_hid mac_hid usbip_host usbip_core nvidia_uvm(POE) nvidia(POE) sg crypto_user acpi_call(OE) loop fuse nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 i2c_algo_bit aesni_intel drm_buddy nvme ttm crypto_simd intel_gtt nvme_core cryptd spi_intel_pci drm_display_helper xhci_pci nvme_auth spi_intel xhci_pci_renesas cec hid_asus asus_nb_wmi serio_raw atkbd asus_wmi libps2 ledtrig_audio vivaldi_fmap sparse_keymap i8042 platform_profile usbhid rfkill serio video wmi usb_storage
Jan 16 12:38:36 xox kernel: CR2: 0000000000000010
Jan 16 12:38:36 xox kernel: ---[ end trace 0000000000000000 ]---
Jan 16 12:38:36 xox kernel: RIP: 0010:sdhci_pci_remove+0x12/0x60 [sdhci_pci]
Jan 16 12:38:36 xox kernel: Code: 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 53 48 8b af 38 01 00 00 <80> 7d 10 00 75 25 8b 45 20 85 c0 7e 17 31 db 48 63 c3 83 c3 01 48
Jan 16 12:38:36 xox kernel: RSP: 0000:ffffbc7e8813fcd8 EFLAGS: 00010286
Jan 16 12:38:36 xox kernel: RAX: ffffffffc5075d10 RBX: ffff9fce8334a0c0 RCX: ffffffffac43d381
Jan 16 12:38:36 xox kernel: RDX: ffffffffab74ac48 RSI: 0000000000000202 RDI: ffff9fce8334a000
Jan 16 12:38:36 xox kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000000008020001f
Jan 16 12:38:36 xox kernel: R10: 0000000000000000 R11: ffffbc7e8813fc90 R12: ffffffffc5080440
Jan 16 12:38:36 xox kernel: R13: ffff9fce8334a140 R14: 0000000000000080 R15: ffff9fce83238b60
Jan 16 12:38:36 xox kernel: FS: 0000000000000000(0000) GS:ffff9fd5eb2c0000(0000) knlGS:0000000000000000
Jan 16 12:38:36 xox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 16 12:38:36 xox kernel: CR2: 0000000000000010 CR3: 0000000122820000 CR4: 0000000000f50ef0
Jan 16 12:38:36 xox kernel: PKRU: 55555554
Jan 16 12:38:36 xox kernel: note: kworker/u40:23[21942] exited with irqs disabled
Jan 16 12:38:36 xox kernel: OOM killer enabled.
Jan 16 12:38:36 xox kernel: Restarting tasks ... done.
Jan 16 12:38:36 xox kernel: PM: hibernation: hibernation exit
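For triage, the key facts in the oops above can be pulled out mechanically. The following is a minimal sketch, not part of any kernel tooling: the helper name `summarize_oops` and its regular expressions are hypothetical. It extracts the fault address, the faulting symbol, and the registers that were zero. In this trace the marked bytes `<80> 7d 10 00` in the Code: line decode to a byte compare at offset 0x10 from RBP, and RBP is 0, which matches the reported fault address 0x10. Which C-level pointer in sdhci_pci_remove that corresponds to is not something the log alone shows.

```python
import re

# Three lines copied verbatim from the journal sample above.
SAMPLE = """\
Jan 16 12:38:36 xox kernel: BUG: kernel NULL pointer dereference, address: 0000000000000010
Jan 16 12:38:36 xox kernel: RIP: 0010:sdhci_pci_remove+0x12/0x60 [sdhci_pci]
Jan 16 12:38:36 xox kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000000008020001f
"""


def summarize_oops(text):
    """Hypothetical helper: pull the key triage facts out of oops text."""
    addr = re.search(r"NULL pointer dereference, address: ([0-9a-f]+)", text)
    rip = re.search(
        r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?",
        text,
    )
    # General-purpose registers are printed as "NAME: <16 hex digits>".
    regs = dict(re.findall(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})", text))
    return {
        "address": int(addr.group(1), 16) if addr else None,
        "symbol": rip.group(1) if rip else None,
        "offset": int(rip.group(2), 16) if rip else None,
        "size": int(rip.group(3), 16) if rip else None,
        "module": rip.group(4) if rip else None,
        # Zero-valued registers are candidates for the NULL base pointer.
        "zero_regs": [name for name, value in regs.items() if int(value, 16) == 0],
    }


if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

A small fault address combined with a zero base register usually points at a field read through a NULL struct pointer, so the symbol and module reported here are the first place to look.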
Probably related?
There are messages like these in the journal:
Jan 16 18:17:04 xox kernel: pci 0000:32:00.0: not ready 1023ms after resume; waiting
Jan 16 18:17:05 xox kernel: pci 0000:32:00.0: not ready 2047ms after resume; waiting
Jan 16 18:17:07 xox kernel: pci 0000:32:00.0: not ready 4095ms after resume; waiting
Jan 16 18:17:11 xox kernel: pci 0000:32:00.0: not ready 8191ms after resume; waiting
...
Jan 16 18:18:11 xox kernel: pci 0000:32:00.0: not ready 65535ms after resume; giving up
Jan 16 18:18:11 xox kernel: sdhci-pci 0000:32:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jan 16 18:18:11 xox kernel: sdhci-pci 0000:32:00.0: SDHCI controller found [17a0:9755] (rev 0)
Jan 16 18:18:11 xox kernel: sdhci-pci 0000:32:00.0: Driver probe function unexpectedly returned 134
lspci lists 32:00.0 as:
32:00.0 SD Host controller: Genesys Logic, Inc GL9755 SD Host Controller
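The "not ready" values logged above (1023, 2047, 4095, 8191, and finally 65535 ms) look like a wait loop whose sleep doubles each round, with the reported number being the total time slept so far. The sketch below is only a model of that pattern, not the kernel's code: the function name and the two parameters (a 60000 ms timeout and a 1000 ms reporting threshold) are assumptions chosen because they reproduce the logged values. The intermediate values it computes are what the doubling would predict, not lines taken from the journal.

```python
def not_ready_messages(timeout_ms=60000, report_after_ms=1000):
    """Model a readiness wait that sleeps 1, 2, 4, ... ms between checks.

    Both parameters are assumed values, picked to match the logged sequence.
    Returns (elapsed_ms, action) tuples for each message the model would emit.
    """
    delay = 1
    messages = []
    while True:
        # Sleeps so far total 1 + 2 + ... + delay/2 = delay - 1 ms.
        if delay > timeout_ms:
            messages.append((delay - 1, "giving up"))
            return messages
        if delay > report_after_ms:
            messages.append((delay - 1, "waiting"))
        delay *= 2


if __name__ == "__main__":
    for elapsed_ms, action in not_ready_messages():
        print(f"not ready {elapsed_ms}ms after resume; {action}")
```

Under this model the device at 32:00.0 never responded during the whole wait, which fits the later "Unable to change power state from D3cold to D0, device inaccessible" message.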
What now?
I hope to get some pointers on how to better triage this problem. I did not find a similar bug report upstream, here, or anywhere else.
Edited by Sascha Lüdecke