linux 6.7.arch1-1 crashes on boot with a null pointer deref (maybe in ath11k)
Description:
linux 6.7.arch1-1 fails to boot on my ThinkPad P14s, crashing with a null pointer dereference.
Successful last boot
journalctl -o short-precise -k
shows nothing abnormal with 6.6.10 (in other words, the crash shown below does not happen with 6.6.10, which I'm running while posting this issue).
Unsuccessful previous boot
journalctl -o short-precise -k -b -1
shows with 6.7:
Jan 10 05:52:54.068052 p kernel: r8169 0000:01:00.0 enp1s0f0: Link is Down
Jan 10 05:52:54.144711 p kernel: BUG: kernel NULL pointer dereference, address: 00000000000000a0
Jan 10 05:52:54.144768 p kernel: #PF: supervisor write access in kernel mode
Jan 10 05:52:54.144929 p kernel: #PF: error_code(0x0002) - not-present page
Jan 10 05:52:54.144962 p kernel: PGD 0 P4D 0
Jan 10 05:52:54.144978 p kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Jan 10 05:52:54.145106 p kernel: CPU: 2 PID: 1684 Comm: NetworkManager Not tainted 6.7.0-arch1-1 #1 b188f6053bf453d8a64ad4bed6c0c629aca48c4a
Jan 10 05:52:54.145113 p kernel: Hardware name: LENOVO 21K5001HUS/21K5001HUS, BIOS R2FET36W (1.16 ) 10/24/2023
Jan 10 05:52:54.145118 p kernel: RIP: 0010:down_write+0x20/0x60
Jan 10 05:52:54.145123 p kernel: Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb 2e 2e 2e 31 c0 65 ff 05 ff 5b a6 44 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 27 65 48 8b 04 25 c0 39 03 00 48 89 43 08 65 ff
Jan 10 05:52:54.145132 p kernel: RSP: 0018:ffffaa8947b7b558 EFLAGS: 00010246
Jan 10 05:52:54.145141 p kernel: RAX: 0000000000000000 RBX: 00000000000000a0 RCX: ffffff8100000000
Jan 10 05:52:54.145147 p kernel: RDX: 0000000000000001 RSI: 0000000000000064 RDI: 00000000000000a0
Jan 10 05:52:54.145153 p kernel: RBP: ffffaa8947b7b5b8 R08: 0000000000000000 R09: 0000000000000000
Jan 10 05:52:54.145161 p kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9286ccb20000
Jan 10 05:52:54.145169 p kernel: R13: 0000000000000000 R14: ffff92869e81e068 R15: 0000000000000001
Jan 10 05:52:54.145177 p kernel: FS: 00007f135a317200(0000) GS:ffff928d01e80000(0000) knlGS:0000000000000000
Jan 10 05:52:54.145302 p kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 05:52:54.145314 p kernel: CR2: 00000000000000a0 CR3: 000000011e8e6000 CR4: 0000000000f50ef0
Jan 10 05:52:54.145322 p kernel: PKRU: 55555554
Jan 10 05:52:54.145332 p kernel: Call Trace:
Jan 10 05:52:54.145338 p kernel: <TASK>
Jan 10 05:52:54.145348 p kernel: ? __die+0x23/0x70
Jan 10 05:52:54.145354 p kernel: ? page_fault_oops+0x171/0x4e0
Jan 10 05:52:54.145365 p kernel: ? finish_task_switch.isra.0+0x94/0x2f0
Jan 10 05:52:54.145370 p kernel: ? exc_page_fault+0x7f/0x180
Jan 10 05:52:54.145377 p kernel: ? asm_exc_page_fault+0x26/0x30
Jan 10 05:52:54.145383 p kernel: ? down_write+0x20/0x60
Jan 10 05:52:54.145503 p kernel: simple_recursive_removal+0xef/0x280
Jan 10 05:52:54.145513 p kernel: ? __pfx_remove_one+0x10/0x10
Jan 10 05:52:54.145520 p kernel: ? idr_for_each+0xb1/0xf0
Jan 10 05:52:54.145525 p kernel: debugfs_remove+0x44/0x70
Jan 10 05:52:54.145530 p kernel: ath11k_debugfs_remove_interface+0x1e/0x30 [ath11k d09803d0e916e45419c9137cf68e1808fa256383]
Jan 10 05:52:54.145539 p kernel: ath11k_mac_op_remove_interface+0x18a/0x2b0 [ath11k d09803d0e916e45419c9137cf68e1808fa256383]
Jan 10 05:52:54.145544 p kernel: drv_remove_interface+0x70/0x160 [mac80211 3a3a97ed759296675796493f60dbe2dd7261333a]
Jan 10 05:52:54.145552 p kernel: ieee80211_do_stop+0x537/0x800 [mac80211 3a3a97ed759296675796493f60dbe2dd7261333a]
Jan 10 05:52:54.145559 p kernel: ieee80211_stop+0x58/0x180 [mac80211 3a3a97ed759296675796493f60dbe2dd7261333a]
Jan 10 05:52:54.145566 p kernel: __dev_close_many+0x9b/0x110
Jan 10 05:52:54.145578 p kernel: __dev_change_flags+0x1a6/0x240
Jan 10 05:52:54.145583 p kernel: dev_change_flags+0x26/0x70
Jan 10 05:52:54.145713 p kernel: do_setlink+0x39c/0x12d0
Jan 10 05:52:54.145724 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145729 p kernel: ? __mod_lruvec_page_state+0x105/0x130
Jan 10 05:52:54.145734 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145739 p kernel: ? mod_lruvec_page_state.constprop.0+0x1c/0x30
Jan 10 05:52:54.145749 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145754 p kernel: ? __kmalloc_large_node+0xa5/0x130
Jan 10 05:52:54.145761 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145766 p kernel: ? __nla_validate_parse+0x61/0xcf0
Jan 10 05:52:54.145771 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145776 p kernel: ? __kmalloc_node_track_caller+0xc4/0x150
Jan 10 05:52:54.145782 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145786 p kernel: ? security_sock_rcv_skb+0x35/0x50
Jan 10 05:52:54.145909 p kernel: __rtnl_newlink+0x651/0xa10
Jan 10 05:52:54.145918 p kernel: ? __kmem_cache_alloc_node+0x1a0/0x2e0
Jan 10 05:52:54.145925 p kernel: ? rtnl_newlink+0x2e/0x70
Jan 10 05:52:54.145931 p kernel: rtnl_newlink+0x47/0x70
Jan 10 05:52:54.145936 p kernel: rtnetlink_rcv_msg+0x14f/0x3c0
Jan 10 05:52:54.145942 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145948 p kernel: ? _copy_to_iter+0x8b/0x630
Jan 10 05:52:54.145953 p kernel: ? __pfx_genl_done+0x10/0x10
Jan 10 05:52:54.145961 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.145965 p kernel: ? __pfx_rtnetlink_rcv_msg+0x10/0x10
Jan 10 05:52:54.145970 p kernel: netlink_rcv_skb+0x58/0x110
Jan 10 05:52:54.145980 p kernel: netlink_unicast+0x1a3/0x290
Jan 10 05:52:54.145985 p kernel: netlink_sendmsg+0x254/0x4d0
Jan 10 05:52:54.146107 p kernel: ____sys_sendmsg+0x396/0x3d0
Jan 10 05:52:54.146114 p kernel: ? copy_msghdr_from_user+0x7d/0xc0
Jan 10 05:52:54.146122 p kernel: ___sys_sendmsg+0x9a/0xe0
Jan 10 05:52:54.146127 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.146132 p kernel: __sys_sendmsg+0x7a/0xd0
Jan 10 05:52:54.146140 p kernel: do_syscall_64+0x61/0xe0
Jan 10 05:52:54.146146 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.146151 p kernel: ? syscall_exit_to_user_mode+0x2b/0x40
Jan 10 05:52:54.146159 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.146163 p kernel: ? do_syscall_64+0x70/0xe0
Jan 10 05:52:54.146169 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.146173 p kernel: ? do_user_addr_fault+0x304/0x670
Jan 10 05:52:54.146178 p kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Jan 10 05:52:54.146183 p kernel: ? exc_page_fault+0x7f/0x180
Jan 10 05:52:54.146190 p kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Jan 10 05:52:54.146313 p kernel: RIP: 0033:0x7f135b28db3d
Jan 10 05:52:54.146321 p kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 4a 62 f7 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 9e 62 f7 ff 48
Jan 10 05:52:54.146329 p kernel: RSP: 002b:00007ffd8ba0ddf0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Jan 10 05:52:54.146334 p kernel: RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007f135b28db3d
Jan 10 05:52:54.146340 p kernel: RDX: 0000000000000000 RSI: 00007ffd8ba0de30 RDI: 000000000000000d
Jan 10 05:52:54.146345 p kernel: RBP: 00007ffd8ba0e200 R08: 0000000000000000 R09: 0000000000000000
Jan 10 05:52:54.146353 p kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00005611fb00e3c0
Jan 10 05:52:54.146360 p kernel: R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
Jan 10 05:52:54.146365 p kernel: </TASK>
Jan 10 05:52:54.146370 p kernel: Modules linked in: qrtr_mhi intel_rapl_msr intel_rapl_common amdgpu(+) snd_soc_dmic snd_ps_pdm_dma snd_soc_ps_mach snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp qrtr snd_ctl_led snd_sof snd_hda_codec_realtek snd_sof_utils ath11k_pci snd_hda_codec_generic edac_mce_amd snd_soc_core snd_hda_codec_hdmi ath11k amdxcp snd_compress drm_exec ac97_bus kvm_amd gpu_sched uvcvideo qmi_helpers snd_pcm_dmaengine snd_hda_intel drm_buddy btusb i2c_algo_bit snd_intel_dspcfg snd_pci_ps videobuf2_vmalloc btrtl mac80211 btintel drm_suballoc_helper snd_rpl_pci_acp6x uvc snd_intel_sdw_acpi kvm snd_usb_audio snd_acp_pci videobuf2_memops drm_ttm_helper btbcm hid_multitouch snd_hda_codec libarc4 videobuf2_v4l2 ttm snd_acp_legacy_common btmtk snd_usbmidi_lib think_lmi(+) irqbypass snd_ump wmi_bmof firmware_attributes_class thinkpad_acpi snd_pci_acp6x drm_display_helper snd_hda_core bluetooth snd_rawmidi r8169 snd_pci_acp5x videodev cfg80211
Jan 10 05:52:54.155883 p kernel: ledtrig_audio snd_hwdep snd_seq_device snd_rn_pci_acp3x cec vfat platform_profile realtek snd_acp_config ucsi_acpi videobuf2_common snd_soc_acpi sp5100_tco snd_pcm typec_ucsi video ecdh_generic mdio_devres rapl pcspkr fat psmouse typec thunderbolt mc k10temp snd_timer rfkill i2c_piix4 mousedev snd joydev libphy mhi snd_pci_acp3x soundcore roles i2c_hid_acpi wmi i2c_hid amd_pmc mac_hid pkcs8_key_parser i2c_dev crypto_user loop fuse nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 serio_raw sha256_ssse3 atkbd sha1_ssse3 libps2 nvme aesni_intel vivaldi_fmap nvme_core crypto_simd xhci_pci cryptd i8042 ccp xhci_pci_renesas nvme_auth serio
Jan 10 05:52:54.155907 p kernel: CR2: 00000000000000a0
Jan 10 05:52:54.156028 p kernel: ---[ end trace 0000000000000000 ]---
Jan 10 05:52:54.156034 p kernel: RIP: 0010:down_write+0x20/0x60
Jan 10 05:52:54.156040 p kernel: Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb 2e 2e 2e 31 c0 65 ff 05 ff 5b a6 44 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 27 65 48 8b 04 25 c0 39 03 00 48 89 43 08 65 ff
Jan 10 05:52:54.156046 p kernel: RSP: 0018:ffffaa8947b7b558 EFLAGS: 00010246
Jan 10 05:52:54.156059 p kernel: RAX: 0000000000000000 RBX: 00000000000000a0 RCX: ffffff8100000000
Jan 10 05:52:54.156065 p kernel: RDX: 0000000000000001 RSI: 0000000000000064 RDI: 00000000000000a0
Jan 10 05:52:54.156070 p kernel: RBP: ffffaa8947b7b5b8 R08: 0000000000000000 R09: 0000000000000000
Jan 10 05:52:54.156075 p kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9286ccb20000
Jan 10 05:52:54.156080 p kernel: R13: 0000000000000000 R14: ffff92869e81e068 R15: 0000000000000001
Jan 10 05:52:54.156085 p kernel: FS: 00007f135a317200(0000) GS:ffff928d01e80000(0000) knlGS:0000000000000000
Jan 10 05:52:54.156090 p kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 05:52:54.156096 p kernel: CR2: 00000000000000a0 CR3: 000000011e8e6000 CR4: 0000000000f50ef0
Jan 10 05:52:54.156101 p kernel: PKRU: 55555554
Jan 10 05:52:54.156107 p kernel: note: NetworkManager[1684] exited with irqs disabled
Jan 10 05:52:54.156229 p kernel: note: NetworkManager[1684] exited with preempt_count 1
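The "maybe in ath11k" guess in the title comes from reading the call trace and ignoring the frames marked with `?`, which the kernel prints for return addresses it found on the stack but could not confirm. A minimal sketch of that filtering, assuming the log format shown above; `reliable_frames` is a hypothetical helper name, not an existing tool:

```shell
# Print only the call-trace frames that are not marked "?" (unreliable),
# reading journalctl-style kernel log lines from stdin.
reliable_frames() {
    # Keep the span from "Call Trace:" to "</TASK>", drop the delimiter lines,
    # strip the timestamp/host/"kernel:" prefix, then drop "?" frames.
    sed -n '/Call Trace:/,/<\/TASK>/p' |
        grep -v -e 'Call Trace:' -e '<TASK>' -e '</TASK>' |
        sed 's/^.*kernel:[[:space:]]*//' |
        grep -v '^?'
}

# Usage (needs a persistent journal that still holds the failed boot):
#   journalctl -o short-precise -k -b -1 | reliable_frames
```

Run against the trace above, the first frames printed are `simple_recursive_removal`, `debugfs_remove` and `ath11k_debugfs_remove_interface`, which is why ath11k looks like the likely culprit.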
Additional info:
- Feel free to ask for the full 6.6.10 / 6.7 logs. They're big, so I didn't attach them.
- ThinkPad P14s with up-to-date BIOS
- Arch Linux fully up to date as of Jan 10 (core, extra, multilib)
- Problem reproducible with both -vanilla and -zen
- Testing linux-6.7 with a partial upgrade of only the kernel package (.zst). I know partial upgrades are unsupported, but:
  - The kernel is generally an island that is safe to test in a partial upgrade
  - I've looked at the current contents of Core-Testing and am confident nothing there is required by linux-6.7: [ unrelated iana-etc upgrade, unrelated libseccomp upgrade ]
  - Do shout at me if I'm clueless and wrong
- Link to upstream bug report: none yet, I don't know whether the issue is Arch-specific or upstream
Steps to reproduce:
- Partial upgrade to linux 6.7
- Boot
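After these two steps, one way to confirm that the crash reproduced is to boot back into a working kernel and check the failed boot's kernel log for the oops signature quoted in this report. A minimal sketch, assuming the journal is persistent; `has_null_deref` is a hypothetical helper name:

```shell
# Succeed if the kernel log on stdin contains the NULL-dereference oops line.
has_null_deref() {
    grep -q 'BUG: kernel NULL pointer dereference'
}

# Usage, from the working kernel, against the previous (failed) boot:
#   journalctl -k -b -1 | has_null_deref && echo "crash reproduced"
```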