Possible kernel SOFT LOCKUP in Rust binder

Description:

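The kernel watchdog reports a soft lockup: kswapd0 is stuck on CPU#0 (26s, then 52s), spinning in native_queued_spin_lock_slowpath while walking a list_lru during slab shrinking. The call trace goes through rust_helper_list_lru_walk and references rust_shrink_free_page_wrap, which suggests the Rust binder shrinker is involved.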

Additional info:

  • package version(s): 6.18.0.arch1+ (the log below was captured on 6.18.1-zen1-2-zen)
  • config and/or log files:
12 19 23:21:37 home.server kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kswapd0:117]
12 19 23:21:37 home.server kernel: CPU#0 Utilization every 4000ms during lockup:
12 19 23:21:37 home.server kernel:         #1: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:21:37 home.server kernel:         #2: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:21:37 home.server kernel:         #3: 100% system,          0% softirq,          1% hardirq,          0% idle
12 19 23:21:37 home.server kernel:         #4: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:21:37 home.server kernel:         #5: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:21:37 home.server kernel: Modules linked in: mousedev snd_usb_audio snd_usbmidi_lib snd_ump snd_rawmidi mc sch_ingress tcp_diag af_key udp_diag inet_diag xfrm_user xfrm_algo xfrm_interface xfrm6_tunnel tunnel4 tunnel6 xt_policy xt_bpf xt_mark xt_state xt_conntrack xt_u32 ipt_REJECT nf_reject_ipv4 xt_NFLOG nfnetlink_log xt_connmark xt_TCPMSS xt_tcpudp iptable_nat iptable_mangle iptable_raw vsock_loopback vmw_vsock_virtio_transport_common iptable_filter vmw_vsock_vmci_transport vsock vmw_vmci veth overlay nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bridge stp llc cmac algif_hash algif_skcipher af_alg bnep nct6775 nct6775_core hwmon_vid nf_tables vfat fat snd_hda_codec_intelhdmi snd_hda_codec_alc662 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_intel snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink
12 19 23:21:37 home.server kernel:  snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils xe(OE) snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi soundwire_bus intel_uncore_frequency snd_soc_sdca intel_uncore_frequency_common crc8 intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_soc_avs coretemp snd_soc_hda_codec snd_hda_ext_core intel_sriov_compat(OE) btusb drm_gpuvm snd_hda_codec kvm_intel btmtk gpu_sched btrtl drm_exec snd_hda_core btbcm drm_suballoc_helper drm_ttm_helper snd_intel_dspcfg btintel snd_intel_sdw_acpi processor_thermal_device_pci kvm snd_hwdep processor_thermal_device bluetooth processor_thermal_wt_hint drm_buddy snd_soc_core platform_temperature_control ttm irqbypass processor_thermal_soc_slider i2c_algo_bit snd_compress polyval_clmulni ghash_clmulni_intel platform_profile ac97_bus drm_display_helper aesni_intel iTCO_wdt processor_thermal_rfim snd_pcm_dmaengine rapl intel_pmc_bxt intel_rapl_msr
12 19 23:21:37 home.server kernel:  processor_thermal_rapl mei_pxp ee1004 mei_hdcp snd_pcm spi_nor intel_cstate iTCO_vendor_support cec intel_rapl_common wmi_bmof i2c_i801 snd_timer processor_thermal_wt_req intel_pmc_core mei_me processor_thermal_power_floor mtd intel_gtt snd i2c_smbus intel_uncore pmt_telemetry processor_thermal_mbox intel_oc_wdt pcspkr video mei pmt_discovery int340x_thermal_zone i2c_mux soundcore pmt_class wmi intel_pmc_ssram_telemetry int3400_thermal intel_vsec acpi_pad pinctrl_alderlake acpi_tad acpi_thermal_rel mac_hid tcp_bbr sch_fq_pie sch_pie pkcs8_key_parser i2c_dev sg crypto_user ntsync loop dm_mod nfnetlink ip_tables x_tables tun mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 r8169 nvme realtek mdio_devres mac80211 nvme_core libphy nvme_keyring intel_lpss_pci spi_intel_pci libarc4 nvme_auth mdio_bus intel_lpss spi_intel hkdf idma64 cfg80211 rfkill netconsole
12 19 23:21:37 home.server kernel: CPU: 0 UID: 0 PID: 117 Comm: kswapd0 Tainted: G     U     OE       6.18.1-zen1-2-zen #1 PREEMPT(full)  0804075ae640cd5f63df55652e7e041dda90991d
12 19 23:21:37 home.server kernel: Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
12 19 23:21:37 home.server kernel: Hardware name: Maxsun Default string/MS-Terminator B760M D4, BIOS H7.7G 05/14/2025
12 19 23:21:37 home.server kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x64/0x2e0
12 19 23:21:37 home.server kernel: Code: ba 29 08 0f 92 c2 8b 01 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 0f 87 1e 02 00 00 85 c0 74 10 0f b6 01 84 c0 74 09 f3 90 <0f> b6 01 84 c0 75 f7 b8 01 00 00 00 66 89 01 65 48 ff 05 cd 8f ed
12 19 23:21:37 home.server kernel: RSP: 0018:ffffd0994086fa30 EFLAGS: 00000202
12 19 23:21:37 home.server kernel: RAX: 0000000000000001 RBX: ffffd0994086fb10 RCX: ffff8cdf02a04bd8
12 19 23:21:37 home.server kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8cdf02a04bd8
12 19 23:21:37 home.server kernel: RBP: ffffffff95d16f38 R08: 0000000000000000 R09: 0000000000000000
12 19 23:21:37 home.server kernel: R10: ffff8cfe00438340 R11: ffff8cfe7f7d6000 R12: ffff8cdfba1f4000
12 19 23:21:37 home.server kernel: R13: ffffffff93914c70 R14: ffff8cdf02a04bc0 R15: ffff8cdf02a04bc0
12 19 23:21:37 home.server kernel: FS:  0000000000000000(0000) GS:ffff8cfe6a8b1000(0000) knlGS:0000000000000000
12 19 23:21:37 home.server kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
12 19 23:21:37 home.server kernel: CR2: 00007f24addf0000 CR3: 0000000e17e24005 CR4: 0000000000f72ef0
12 19 23:21:37 home.server kernel: PKRU: 55555554
12 19 23:21:37 home.server kernel: Call Trace:
12 19 23:21:37 home.server kernel:  <TASK>
12 19 23:21:37 home.server kernel:  _raw_spin_lock+0x29/0x30
12 19 23:21:37 home.server kernel:  __list_lru_walk_one.constprop.0+0x94/0x1d0
12 19 23:21:37 home.server kernel:  ? __pfx_rust_shrink_free_page_wrap+0x10/0x10
12 19 23:21:37 home.server kernel:  ? __pfx_rust_shrink_free_page_wrap+0x10/0x10
12 19 23:21:37 home.server kernel:  list_lru_walk_node+0x46/0x1f0
12 19 23:21:37 home.server kernel:  ? __pfx_rust_shrink_free_page_wrap+0x10/0x10
12 19 23:21:37 home.server kernel:  rust_helper_list_lru_walk+0x9d/0xe0
12 19 23:21:37 home.server kernel:  do_shrink_slab+0x140/0x350
12 19 23:21:37 home.server kernel:  shrink_slab+0xd7/0x3e0
12 19 23:21:37 home.server kernel:  shrink_one+0xfe/0x1d0
12 19 23:21:37 home.server kernel:  shrink_node+0xb4a/0xd60
12 19 23:21:37 home.server kernel:  ? pgdat_balanced+0x83/0x140
12 19 23:21:37 home.server kernel:  kswapd+0x870/0x1100
12 19 23:21:37 home.server kernel:  ? __switch_to+0x103/0x3f0
12 19 23:21:37 home.server kernel:  ? __pfx_kswapd+0x10/0x10
12 19 23:21:37 home.server kernel:  kthread+0xfc/0x240
12 19 23:21:37 home.server kernel:  ? __pfx_kthread+0x10/0x10
12 19 23:21:37 home.server kernel:  ret_from_fork+0x1c2/0x1f0
12 19 23:21:37 home.server kernel:  ? __pfx_kthread+0x10/0x10
12 19 23:21:37 home.server kernel:  ret_from_fork_asm+0x1a/0x30
12 19 23:21:37 home.server kernel:  </TASK>
12 19 23:22:05 home.server kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [kswapd0:117]
12 19 23:22:05 home.server kernel: CPU#0 Utilization every 4000ms during lockup:
12 19 23:22:05 home.server kernel:         #1: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:22:05 home.server kernel:         #2: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:22:05 home.server kernel:         #3: 100% system,          0% softirq,          1% hardirq,          0% idle
12 19 23:22:05 home.server kernel:         #4: 100% system,          0% softirq,          0% hardirq,          0% idle
12 19 23:22:05 home.server kernel:         #5: 100% system,          0% softirq,          0% hardirq,          0% idle

rust_helper_list_lru_walk, seen in the call trace, appears to be a C helper used by the Rust binder implementation (it is defined in rust/helpers/binder.c): https://github.com/gregkh/linux/blob/9448598b22c50c8a5bb77a9103e2d49f134c9578/rust/helpers/binder.c#L15
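
For anyone who wants to check whether their own logs show the same signature, here is a minimal Python sketch that scans kernel log lines for soft lockup reports and collects the call-trace symbols that follow each one. The function names (`summarize_lockups`, `involves`) and the regular expressions are my own and only assume the journal line format shown above; this is not an official tool, and other log formats may need adjustments.

```python
import re

# Matches the watchdog line, e.g.
# "watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kswapd0:117]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches a call-trace frame such as " _raw_spin_lock+0x29/0x30".
# Frames prefixed with "? " are unreliable stack entries and are captured
# separately so they can be skipped.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<guess>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_lockups(lines):
    """Return one dict per soft lockup report, with its reliable trace frames."""
    reports = []
    current = None
    for line in lines:
        lockup = LOCKUP_RE.search(line)
        if lockup:
            current = {
                "cpu": int(lockup["cpu"]),
                "secs": int(lockup["secs"]),
                "comm": lockup["comm"],
                "pid": int(lockup["pid"]),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Only keep frames that belong to a report and are not "?" guesses.
        if frame and current is not None and not frame["guess"]:
            current["frames"].append(frame["sym"])
    return reports


def involves(report, symbol):
    """True if the given symbol appears in the report's call trace."""
    return symbol in report["frames"]
```

Passing the lines of a saved log to `summarize_lockups` and then calling `involves(report, "rust_helper_list_lru_walk")` on each result shows whether a given lockup went through the binder helper. On the excerpt above, the first report (26s) includes that symbol; the second report (52s) is cut off before its call trace, so it has no frames.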

More reports of this issue can be found on the Arch Linux forums: https://bbs.archlinux.org/viewtopic.php?id=311223

Steps to reproduce:

  1. Install Waydroid
  2. Start and run the Android container
  3. Stop the Android container
  4. Run a CPU-intensive workload