This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git.
Pull mirroring updated.
- Dec 22, 2023
Luis Claudio R. Goncalves authored
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
- Dec 18, 2023
Wang Yong authored
The LTP test triggers the following BUG report on the 5.10 kernel:

  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 796, name: cat
  Preemption disabled at:
  [<ffffffe40f433980>] do_debug_exception+0x60/0x180
  CPU: 3 PID: 796 Comm: cat Not tainted 5.10.59-rt52-KERNEL_VERSION #38
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x198
   show_stack+0x20/0x30
   dump_stack+0xf0/0x13c
   ___might_sleep+0x140/0x178
   rt_spin_lock+0x30/0x90
   force_sig_info_to_task+0x30/0xe0
   force_sig_fault_to_task+0x54/0x78
   force_sig_fault+0x1c/0x28
   arm64_force_sig_fault+0x48/0x78
   send_user_sigtrap+0x4c/0x80
   brk_handler+0x3c/0x68
   do_debug_exception+0xac/0x180
   el0_dbg+0x34/0x58
   el0_sync_handler+0x50/0xb8
   el0_sync+0x180/0x1c0

This was fixed by commit 0c34700d ("arm64: signal: Use ARCH_RT_DELAYS_SIGNAL_SEND.") in later kernel versions; this patch adapts that change to 5.10. The 5.10 tree does not have the arm64 signal.h header, so add it in order to define ARCH_RT_DELAYS_SIGNAL_SEND.

Link: https://lore.kernel.org/r/202309121514283793475@zte.com.cn
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
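As a rough illustration only (the guard name and exact includes in the real backport may differ), the added arm64 header amounts to something like this; with the macro defined, the signal code can defer the actual send to a context where taking sleeping locks is fine:

    /* arch/arm64/include/asm/signal.h -- sketch of the backported header */
    #ifndef __ARM64_ASM_SIGNAL_H
    #define __ARM64_ASM_SIGNAL_H

    #include <uapi/asm/signal.h>

    #if defined(CONFIG_PREEMPT_RT)
    /*
     * Tell the generic signal code it may delay the signal send instead of
     * taking sleeping locks from atomic paths such as the debug-exception
     * handler shown in the splat above.
     */
    #define ARCH_RT_DELAYS_SIGNAL_SEND
    #endif

    #endif /* __ARM64_ASM_SIGNAL_H */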
-
Luis Claudio R. Goncalves authored
This reverts commit 32232bcd.

The support for deferred printing was removed in v5.10-rc1-rt1 by commit 9153e3c5 ("printk: remove deferred printing") because:

  Since printing occurs either atomically or from the printing kthread,
  there is no need for any deferring or tracking possible recursion paths.
  Remove all printk context tracking.

Fixes: 32232bcd ("printk: declare printk_deferred_{enter,safe}() in include/linux/printk.h")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
Luis Claudio R. Goncalves authored
This reverts commit a992c387.

The support for deferred printing was removed in v5.10-rc1-rt1 by commit 9153e3c5 ("printk: remove deferred printing") because:

  Since printing occurs either atomically or from the printing kthread,
  there is no need for any deferring or tracking possible recursion paths.
  Remove all printk context tracking.

Also, disabling interrupts in __build_all_zonelists() should produce warnings once that code path is hit.

Fixes: a992c387 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
Steffen Dirkwinkel authored
Without this we get system hangs within a couple of days. It's also reproducible in minutes with "stress-ng --exec 20". Example error in dmesg:

  INFO: task stress-ng:163916 blocked for more than 120 seconds.
        Not tainted 5.10.168-rt83 #2
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:stress-ng state:D stack: 0 pid:163916 ppid: 72833 flags:0x00004000
  Call Trace:
   __schedule+0x2bd/0x940
   preempt_schedule_lock+0x23/0x50
   rt_spin_lock_slowlock_locked+0x117/0x2c0
   rt_spin_lock_slowlock+0x51/0x80
   rt_write_lock+0x1e/0x1c0
   do_exit+0x3ac/0xb20
   do_group_exit+0x39/0xb0
   get_signal+0x145/0x960
   ? wake_up_new_task+0x21f/0x3c0
   arch_do_signal_or_restart+0xf1/0x830
   ? __x64_sys_futex+0x146/0x1d0
   exit_to_user_mode_prepare+0x116/0x1a0
   syscall_exit_to_user_mode+0x28/0x190
   entry_SYSCALL_64_after_hwframe+0x61/0xc6
  RIP: 0033:0x7f738d9074a7
  RSP: 002b:00007ffdafda3cb0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
  RAX: fffffffffffffe00 RBX: 00000000000000ca RCX: 00007f738d9074a7
  RDX: 0000000000028051 RSI: 0000000000000000 RDI: 00007f738be949d0
  RBP: 00007ffdafda3d88 R08: 0000000000000000 R09: 00007f738be94700
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000028051
  R13: 00007f738be949d0 R14: 00007ffdafda51e0 R15: 00007f738be94700

Fixes: 1ba44dcf ("Merge tag 'v5.10.162' into v5.10-rt")
Acked-by: Joe Korty <joe.korty@concurrent-rt.com>
Signed-off-by: Steffen Dirkwinkel <s.dirkwinkel@beckhoff.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
John Ogness authored
The ttynull driver does not provide an implementation for the write() callback. This leads to a NULL pointer dereference in the related printing kthread, which assumes it can call that callback.

Do not create kthreads for consoles that do not implement the write() callback. Also, for pr_flush(), ignore consoles that do not implement write() or write_atomic(), since there is no way those consoles can flush their output.

Link: https://lore.kernel.org/lkml/1831554214.546921.1676479103702.JavaMail.zimbra@hale.at
Reported-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
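A minimal sketch of the checks this implies (member names follow struct console; write_atomic is the RT-patch extension, and the exact call sites in the real tree differ):

    /* kthread path: the printing kthread calls con->write(), so it needs one. */
    if (!con->write)
            return;         /* do not create a printing kthread for this console */

    /* pr_flush(): a console with neither callback can never make progress. */
    if (!con->write && !con->write_atomic)
            continue;       /* ignore it when waiting for output to drain */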
-
Salvatore Bonaccorso authored
As in mainline commit 870d1675 ("arm64: make _TIF_WORK_MASK bits contiguous"), the bits of _TIF_WORK_MASK need to be contiguous in order to use the mask as an immediate argument to an AND instruction in entry.S. Shuffle these bits down by one, keeping the existing contiguity after TIF_NEED_RESCHED_LAZY is inserted by the preempt-rt patch series.

Omitting this change results in a build failure like:

  arch/arm64/kernel/entry.S: Assembler messages:
  arch/arm64/kernel/entry.S:763: Error: immediate out of range at operand 3 -- `and x2,x19,#((1<<1)|(1<<0)|(1<<2)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<13)|(1<<7))'

Reported-by: Vignesh Raghavendra <vigneshr@ti.com>
Reported-by: Pavel Machek <pavel@denx.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/40de655e-26f3-aa7b-f1ec-6877396a9f1e@ti.com/
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
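The constraint can be illustrated with a hypothetical bit layout (not the actual arm64 numbering): the point is only that every bit referenced by _TIF_WORK_MASK forms one contiguous run so entry.S can encode the mask as an AND immediate.

    #define TIF_SIGPENDING          0
    #define TIF_NEED_RESCHED        1
    #define TIF_NEED_RESCHED_LAZY   2       /* kept inside the run, not at bit 13 */
    #define TIF_NOTIFY_RESUME       3
    #define TIF_FOREIGN_FPSTATE     4

    #define _TIF_WORK_MASK          (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
                                     _TIF_NEED_RESCHED_LAZY | _TIF_NOTIFY_RESUME | \
                                     _TIF_FOREIGN_FPSTATE)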
-
Anand Je Saipureddy authored
In kernel/trace/trace_events_trigger.c, the call to __trace_stack() in stacktrace_trigger() no longer matches the function's prototype. Since commit edbaaa13 ("tracing: Merge irqflags + preempt counter, add RT bits"), the irqflags (flags) and the preemption counter (preempt_count()) are folded into a single context value that should be evaluated early by tracing_gen_ctx(). This patch replaces the separate irqflags and preemption counter arguments with tracing_gen_ctx().

Fixes: 5e8446e3 ("tracing: Dump stacktrace trigger to the corresponding instance")
Link: https://lore.kernel.org/r/20220723064943.16532-1-s.anandje1@gmail.com
Signed-off-by: Anand Je Saipureddy <s.anandje1@gmail.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
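The call-site change in stacktrace_trigger() is essentially the following sketch (the "before" line is the one shown in the build error of the follow-up entry below):

    /* Before: separate irqflags and preemption-count arguments. */
    __trace_stack(file->tr, flags, STACK_SKIP, preempt_count());

    /* After: both are folded into one trace_ctx word, matching
     * void __trace_stack(struct trace_array *tr, unsigned int trace_ctx, int skip);
     */
    __trace_stack(file->tr, tracing_gen_ctx(), STACK_SKIP);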
-
Yajun Deng authored
We can use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL() for ww_mutex_lock_interruptible() and ww_mutex_lock(). That matches ww_mutex_unlock() and is also good for third-party kernel modules.

Link: https://lore.kernel.org/r/20220803062430.1307312-1-yajun.deng@linux.dev
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
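The change itself is just the export macro on the two affected symbols:

    EXPORT_SYMBOL(ww_mutex_lock_interruptible);     /* was EXPORT_SYMBOL_GPL() */
    EXPORT_SYMBOL(ww_mutex_lock);                   /* was EXPORT_SYMBOL_GPL() */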
-
Luis Claudio R. Goncalves authored
Fix the build error below while keeping the current PREEMPT_RT code:

  kernel/trace/trace_events_trigger.c: In function ‘stacktrace_trigger’:
  kernel/trace/trace_events_trigger.c:1227:3: error: too many arguments to function ‘__trace_stack’
     __trace_stack(file->tr, flags, STACK_SKIP, preempt_count());
     ^~~~~~~~~~~~~
  In file included from kernel/trace/trace_events_trigger.c:15:
  kernel/trace/trace.h:826:6: note: declared here
   void __trace_stack(struct trace_array *tr, unsigned int trace_ctx, int skip);
        ^~~~~~~~~~~~~

Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
Xie Yongji authored
commit 4b374986 upstream.

We should defer eventfd_signal() to the workqueue when eventfd_signal_allowed() returns false, not when it returns true.

Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
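The intended pattern, as a generic sketch (the deferral work item here is hypothetical; the affected driver code in the real commit differs):

    if (eventfd_signal_allowed())
            eventfd_signal(ctx, 1);         /* safe to signal from this context */
    else
            queue_work(system_wq, &deferred_signal_work);   /* defer to workqueue */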
-
Sebastian Andrzej Siewior authored
This aligns the patch ("stop_machine: Add function and caller debug info") with commit a8b62fd0 ("stop_machine: Add function and caller debug info") that was merged upstream and is slightly different.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Thomas Gleixner authored
Upstream commit b542e383

The recursion protection for eventfd_signal() is based on a per CPU variable and relies on the !RT semantics of spin_lock_irqsave() for protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither disables preemption nor interrupts, which allows the spin lock held section to be preempted. If the preempting task invokes eventfd_signal() as well, then the recursion warning triggers.

Paolo suggested to protect the per CPU variable with a local lock, but that's heavyweight and actually not necessary. The goal of this protection is to prevent the task stack from overflowing, which can be achieved with a per task recursion protection as well.

Replace the per CPU variable with a per task bit similar to other recursion protection bits like task_struct::in_page_owner. This works on both !RT and RT kernels and removes as a side effect the extra per CPU storage.

No functional change for !RT kernels.

Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
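A sketch of the per-task bit, modelled on the description above (the field name in_eventfd_signal and the simplified function body are assumptions; the real code may differ in detail):

    /* task_struct gains a one-bit flag next to the other recursion bits: */
    unsigned                        in_eventfd_signal:1;

    static inline bool eventfd_signal_allowed(void)
    {
            return !current->in_eventfd_signal;
    }

    __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
    {
            unsigned long flags;

            /* Recursion from the wakeup path of a previous signal? Bail out. */
            if (WARN_ON_ONCE(current->in_eventfd_signal))
                    return 0;

            spin_lock_irqsave(&ctx->wqh.lock, flags);
            current->in_eventfd_signal = 1;
            if (ULLONG_MAX - ctx->count < n)
                    n = ULLONG_MAX - ctx->count;
            ctx->count += n;
            if (waitqueue_active(&ctx->wqh))
                    wake_up_locked_poll(&ctx->wqh, EPOLLIN);
            current->in_eventfd_signal = 0;
            spin_unlock_irqrestore(&ctx->wqh.lock, flags);

            return n;
    }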
-
Sebastian Andrzej Siewior authored
On PREEMPT_RT most items are processed as LAZY via softirq context. Avoid spin-waiting for them, because irq_work_sync() could run at a higher priority and never allow the irq-work to complete. On PREEMPT_RT, additionally wait for !IRQ_WORK_HARD_IRQ irq_work items.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006111852.1514359-5-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
The irq_work callback is invoked in hard IRQ context. By default all callbacks are scheduled for invocation right away (if supported by the architecture), except for the ones marked IRQ_WORK_LAZY, which are delayed until the next timer tick.

While looking over the callbacks, some of them may acquire locks (spinlock_t, rwlock_t) which are transformed into sleeping locks on PREEMPT_RT and must not be acquired in hard IRQ context. Changing the locks into locks which could be acquired in this context will lead to other problems such as increased latencies if everything in the chain has IRQ-off locks. This will not solve all the issues, as one callback has been noticed which invokes kref_put() whose release callback invokes kfree(), and this can not be invoked in hardirq context.

Some callbacks are required to be invoked in hardirq context even on PREEMPT_RT to work properly. This includes for instance the NO_HZ callback, which needs to be able to observe the idle context.

The callbacks which are required to run in hardirq have already been marked. Use this information to split the callbacks onto the two lists on PREEMPT_RT:

- lazy_list
  Work items which are not marked with IRQ_WORK_HARD_IRQ will be added to this list. Callbacks on this list will be invoked from a per-CPU thread. The handler here may acquire sleeping locks such as spinlock_t and invoke kfree().

- raised_list
  Work items which are marked with IRQ_WORK_HARD_IRQ will be added to this list. They will be invoked in hardirq context and must not acquire any sleeping locks.

The wake up of the per-CPU thread occurs from irq_work handler/hardirq context. The thread runs with lowest RT priority to ensure it runs before any SCHED_OTHER tasks do.

[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a hard and soft variant. Collected fixes over time from Steven Rostedt and Mike Galbraith. Move to per-CPU threads instead of softirq as suggested by PeterZ.]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211007092646.uhshe3ut2wkrcfzv@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
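A sketch of the resulting enqueue decision on PREEMPT_RT (list names follow the commit message; the flag/field layout and the thread wake-up helper wake_irq_workd() are assumptions, and the !RT IRQ_WORK_LAZY handling is omitted for brevity):

    static void __irq_work_queue_local(struct irq_work *work)
    {
            bool hard = atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ;

            if (IS_ENABLED(CONFIG_PREEMPT_RT) && !hard) {
                    /* Deferred to the per-CPU irq_work thread: the callback may
                     * take sleeping locks and call kfree() there.
                     */
                    if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
                            wake_irq_workd();       /* helper name assumed */
                    return;
            }

            /* Hard items (and the !RT default) run from hardirq right away. */
            if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
                    arch_irq_work_raise();
    }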
-
Sebastian Andrzej Siewior authored
Queueing irq_work triggers an interrupt instantly if supported by the architecture; otherwise the work is processed on the next timer tick. In the worst case irq_work_sync() could spin for up to a jiffy.

irq_work_sync() is usually used in tear-down context which is fully preemptible. Based on review, irq_work_sync() is invoked from preemptible context and there is one waiter at a time. This qualifies it to use rcuwait for synchronisation.

Let irq_work_sync() synchronize with rcuwait if the architecture processes irqwork via the timer tick.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006111852.1514359-3-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
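A sketch of irq_work_sync() along these lines, assuming an rcuwait member (here called irqwait) in struct irq_work and an irq_work_is_busy() helper:

    void irq_work_sync(struct irq_work *work)
    {
            lockdep_assert_irqs_enabled();
            might_sleep();

            if (!arch_irq_work_has_interrupt()) {
                    /* The work runs from the timer tick; sleep instead of
                     * spinning for up to a jiffy. Only one preemptible waiter
                     * at a time needs to be supported.
                     */
                    rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
                                       TASK_UNINTERRUPTIBLE);
                    return;
            }

            while (irq_work_is_busy(work))
                    cpu_relax();
    }

The entry above this one then extends the sleeping path to !IRQ_WORK_HARD_IRQ items on PREEMPT_RT, where they are completed from the per-CPU thread rather than from hardirq context.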
-
Sebastian Andrzej Siewior authored
Disabling interrupts and invoking the irq_work function directly breaks on PREEMPT_RT. PREEMPT_RT does not invoke all irq_work from hardirq context because some of the users have spinlock_t locking in the callback function. These locks are then turned into sleeping locks which can not be acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked in the regular context. In general there is "no" delay between enqueuing the callback and its invocation because the interrupt is raised right away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback directly.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
might_sleep_no_state_check() serves the same purpose as might_sleep(), except that it is used before sleeping locks are acquired and therefore does not check task_struct::state, because that state is preserved. The state is preserved in the locking slow path, so we must not schedule at the beginning of the locking function, because the state would be lost rather than preserved at that point.

Remove might_resched() from might_sleep_no_state_check() to avoid losing the state before it is preserved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
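A sketch of the distinction, with macro bodies modelled on the might_sleep() family (the exact definitions in the RT tree are assumptions; the change is dropping the might_resched() call from the no-state-check variant):

    /* Checks task state and may voluntarily reschedule. */
    # define might_sleep() \
            do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)

    /* Used right before taking a sleeping lock: the lock slow path saves and
     * restores task_struct::state, so neither check the state here nor
     * reschedule voluntarily (that would clobber the not-yet-saved state).
     */
    # define might_sleep_no_state_check() \
            do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)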
-
Sebastian Andrzej Siewior authored
This is an update of the original patch, removing a put_cpu_var() that was overlooked in the initial patch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
In the commit mentioned below, fscache was converted from slow-work to workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed() did not use a per-CPU workqueue: they chose between two global waitqueues depending on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was always the same waitqueue. I can't find out how it is ensured that a waiter on a certain CPU is woken up by the other side. My guess is that the timeout in schedule_timeout() ensures that it does not wait forever (or a random wake up).

fscache_object_sleep_till_congested() must be invoked from preemptible context in order for schedule() to work. In this case this_cpu_ptr() should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is bound to one CPU.

wake_up() wakes only one waiter and I'm not sure if it is guaranteed that only one waiter exists.

Replace the per-CPU waitqueue with one global waitqueue.

Fixes: 8b8edefa ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <gregor.beck@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
TRANSPARENT_HUGEPAGE: There are potential non-deterministic delays for an RT thread if a critical memory region is not THP-aligned and a non-RT buffer is located in the same hugepage-aligned region. It's also possible for an unrelated thread to migrate pages belonging to an RT task, incurring unexpected page faults due to memory defragmentation even if khugepaged is disabled. Regular HUGEPAGEs are not affected by this and can still be used.

NUMA_BALANCING: There is a non-deterministic delay to mark PTEs PROT_NONE to gather NUMA fault samples, increased page faults of regions even if mlocked, and non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description.]

Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
preempt_enable_no_resched() should point to preempt_enable() on PREEMPT_RT so that nobody plays preempt tricks and enables preemption without checking the need-resched flag. This was misplaced in v3.14.0-rt1 and remained unnoticed until now.

Point preempt_enable_no_resched() at preempt_enable() on RT.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
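The mapping it establishes is roughly the following preprocessor conditional (a sketch; the surrounding preempt.h context is omitted):

    #ifdef CONFIG_PREEMPT_RT
    /* On RT there is no "enable preemption but skip the resched check". */
    # define preempt_enable_no_resched()    preempt_enable()
    #else
    # define preempt_enable_no_resched()    sched_preempt_enable_no_resched()
    #endif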
-
Sebastian Andrzej Siewior authored
With PREEMPT_RT enabled all hrtimer callbacks are invoked in softirq context unless they are explicitly marked HRTIMER_MODE_HARD. During boot, kthread_bind() is used for the creation of per-CPU threads and then hangs in wait_task_inactive() if ksoftirqd is not yet up and running. The hang disappeared since commit 26c7295b ("kthread: Do not preempt current task if it is going to call schedule()"), but enabling function trace on boot reliably leads to the freeze-on-boot behaviour again.

The timer in wait_task_inactive() cannot be used directly by a user interface to abuse it and create a mass wake-up of several tasks at the same time, which would lead to long sections with disabled interrupts. Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.

Switch the timer to HRTIMER_MODE_REL_HARD.

Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
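The change in wait_task_inactive() is essentially the timer mode; a sketch of the affected snippet (surrounding loop omitted):

    ktime_t to = NSEC_PER_SEC / HZ;

    set_current_state(TASK_UNINTERRUPTIBLE);
    /* _HARD: expire from hardirq context even on PREEMPT_RT, so early boot
     * does not depend on ksoftirqd already running.
     */
    schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);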
-
Sebastian Andrzej Siewior authored
push_rt_task() attempts to move the currently running task away if the next runnable task has migration disabled and is therefore pinned on the current CPU. The current task is retrieved via get_push_task(), which only checks for nr_cpus_allowed == 1 but does not check whether the task has migration disabled and therefore cannot be moved either. The consequence is a pointless invocation of the migration thread, which correctly observes that the task cannot be moved.

Return NULL if the task has migration disabled and cannot be moved to another CPU.

Cc: stable-rt@vger.kernel.org
Fixes: a7c81556 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
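A sketch of get_push_task() with the added check, closely following the description above (surrounding details simplified and not guaranteed to match the tree exactly):

    static inline struct task_struct *get_push_task(struct rq *rq)
    {
            struct task_struct *p = rq->curr;

            lockdep_assert_held(&rq->lock);

            if (rq->push_busy)
                    return NULL;

            if (p->nr_cpus_allowed == 1)
                    return NULL;

            /* New: a migration-disabled task cannot be pushed away either. */
            if (is_migration_disabled(p))
                    return NULL;

            rq->push_busy = true;
            return get_task_struct(p);
    }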
-
Mike Galbraith authored
local_lock_t becoming a synonym of spinlock_t had consequences for the RT mods to zsmalloc, which were taking a mutex while holding a local_lock, inspiring a lockdep "BUG: Invalid wait context" gripe. Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
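A sketch of the data-structure side of that conversion (member names as in the RT zsmalloc patch are assumed; the lock/unlock helpers then become plain spin_lock()/spin_unlock()):

    struct zsmalloc_handle {
            unsigned long addr;
            spinlock_t lock;        /* was a mutex; a spinlock_t nests fine
                                     * under a local_lock, so lockdep's wait
                                     * context check is satisfied */
    };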
-
Andrew Halaney authored
There's no chance of sleeping here; the reader is giving up the lock and possibly waking up the writer who is waiting on it.

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Chao Qin authored
[ Upstream commit 83e9288d9c4295d1195e9d780fcbc42c72ba4a83 ]

There is a msleep() in pr_flush(). If WARN() is called in the early boot stage, such as from an early_initcall, pr_flush() will run into msleep() while the process scheduler is not ready yet, and the system will sleep forever.

Before system_state is SYSTEM_RUNNING, make sure pr_flush() does NOT sleep.

Fixes: c0b395bd0fe3 ("printk: add pr_flush()")
Signed-off-by: Chao Qin <chao.qin@intel.com>
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20210719022649.3444072-1-chao.qin@intel.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
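A hedged sketch of the guard being described (the real pr_flush() in the RT printk code differs in detail; the helper below is purely illustrative):

    static bool pr_flush_may_sleep(void)
    {
            /* msleep() requires a working scheduler; during early boot
             * (system_state < SYSTEM_RUNNING) pr_flush() must not sleep,
             * otherwise a WARN() from an early_initcall never returns.
             */
            return preemptible() && system_state >= SYSTEM_RUNNING;
    }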
-
Valentin Schneider authored
commit 475ea6c6 upstream.

Will reported that the 'XXX __migrate_task() can fail' in migration_cpu_stop() can happen, and it *is* sort of a big deal. Looking at it some more, one will note there is a glaring hole in the deferred CPU selection (w/ CONFIG_CPUSET=n, so that the affinity mask passed via taskset doesn't get AND'd with cpu_online_mask):

  $ taskset -pc 0-2 $PID
  # offline CPUs 3-4
  $ taskset -pc 3-5 $PID
    `\
      $PID may stay on 0-2 due to the cpumask_any_distribute() picking an
      offline CPU and __migrate_task() refusing to do anything due to
      cpu_is_allowed().

set_cpus_allowed_ptr() goes to some length to pick a dest_cpu that matches the right constraints vs affinity and the online/active state of the CPUs. Reuse that instead of discarding it in the affine_move_task() case.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210526205751.842360-2-valentin.schneider@arm.com
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 50caf9c1 upstream.

Now that we have set_affinity_pending::stop_pending to indicate if a stopper is in progress, and we have the guarantee that if that stopper exists it will (eventually) complete our @pending, we can simplify the refcount scheme by no longer counting the stopper thread.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 9e81889c upstream.

Consider:

  sched_setaffinity(p, X);
  sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and issue stop_one_cpu_nowait(pending); and the second one will read p->migration_pending and _also_ issue stop_one_cpu_nowait(pending), the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending to indicate if a stopper is in progress.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 3f1bc119 upstream.

When the purpose of migration_cpu_stop() is to migrate the task to 'any' valid CPU, don't migrate the task when it's already running on a valid CPU.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 58b1a450 upstream.

The SCA_MIGRATE_ENABLE and task_running() cases are almost identical; collapse them to avoid further duplication.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit c20cf065 upstream.

When affine_move_task() issues a migration_cpu_stop(), the purpose of that function is to complete that @pending, not any random other p->migration_pending that might have gotten installed since.

This realization much simplifies migration_cpu_stop() and allows further necessary steps to fix all this as it provides the guarantee that @pending's stopper will complete @pending (and not some random other @pending).

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 8a6edb52 upstream.

When affine_move_task(p) is called on a running task @p, which is not otherwise already changing affinity, we'll first set p->migration_pending and then do:

  stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg);

This then gets us to migration_cpu_stop() running on the CPU that was previously running our victim task @p. If we find that our task is no longer on that runqueue (this can happen because of a concurrent migration due to load-balance etc.), then we'll end up at the:

  } else if (dest_cpu < 1 || pending) {

branch. Which we'll take because we set pending earlier. Here we first check if the task @p has already satisfied the affinity constraints, if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop() onto the CPU that is now hosting our task @p:

  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
                      &pending->arg, &pending->stop_work);

Except, we've never initialized pending->arg, which will be all 0s.

This then results in running migration_cpu_stop() on the next CPU with arg->p == NULL, which gives the by now obvious result of fireworks.

The cure is to change affine_move_task() to always use pending->arg; furthermore we can use the exact same pattern as the SCA_MIGRATE_ENABLE case, since we'll block on the pending->done completion anyway, so there is no point in adding yet another completion in stop_one_cpu().

This then gives a clear distinction between the two migration_cpu_stop() use cases:

- sched_exec() / migrate_task_to() : arg->pending == NULL
- affine_move_task()               : arg->pending != NULL

And we can have it ignore p->migration_pending when !arg->pending. Any stop work from sched_exec() / migrate_task_to() is in addition to stop works from affine_move_task(), which will be sufficient to issue the completion.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Ahmed S. Darwish authored
A sequence counter write section must be serialized or its internal state can get corrupted. A plain seqcount_t does not contain the information of which lock must be held to guarantee write side serialization.

For xfrm_state_hash_generation, use seqcount_spinlock_t instead of plain seqcount_t. This allows associating the spinlock used for write serialization with the sequence counter. It thus enables lockdep to verify that the write serialization lock is indeed held before entering the sequence counter write section.

If lockdep is disabled, this lock association is compiled out and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
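A sketch of what the associated-lock form looks like for this counter (the per-netns field names used here are assumptions):

    /* Initialization: tie the counter to its write-serializing spinlock. */
    seqcount_spinlock_init(&net->xfrm.xfrm_state_hash_generation,
                           &net->xfrm.xfrm_state_lock);

    /* Write side: with lockdep enabled, write_seqcount_begin() now verifies
     * that xfrm_state_lock is actually held.
     */
    spin_lock_bh(&net->xfrm.xfrm_state_lock);
    write_seqcount_begin(&net->xfrm.xfrm_state_hash_generation);
    /* ... resize/rehash the state hash tables ... */
    write_seqcount_end(&net->xfrm.xfrm_state_hash_generation);
    spin_unlock_bh(&net->xfrm.xfrm_state_lock);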
-
Thomas Gleixner authored
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Clark Williams authored
Add a /sys/kernel entry to indicate that the kernel is a realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate whether it is a PREEMPT_RT kernel a few thousand times, and parsing uname output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
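A sketch of such a sysfs attribute, under the assumption that it lives in kernel/ksysfs.c, only exists on PREEMPT_RT, and always reads 1:

    #ifdef CONFIG_PREEMPT_RT
    static ssize_t realtime_show(struct kobject *kobj,
                                 struct kobj_attribute *attr, char *buf)
    {
            /* The file only exists on RT kernels, so the value is fixed. */
            return sprintf(buf, "%d\n", 1);
    }
    KERNEL_ATTR_RO(realtime);
    #endif

Userspace (for example a udev rule) can then simply test for the presence of /sys/kernel/realtime instead of parsing uname output.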
-
Ingo Molnar authored
Creates long latencies for no value.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Matt Fleming authored
The way user struct reference counting works changed significantly with commit fda31c50 ("signal: avoid double atomic counter increments for user accounting"). Now user structs are only freed once the last pending signal is dequeued. Make sigqueue_free_current() follow this new convention to avoid freeing the user struct multiple times and triggering this warning:

  refcount_t: underflow; use-after-free.
  WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
  Call Trace:
   refcount_dec_and_lock_irqsave+0x16/0x60
   free_uid+0x31/0xa0
   __dequeue_signal+0x17c/0x190
   dequeue_signal+0x5a/0x1b0
   do_sigtimedwait+0x208/0x250
   __x64_sys_rt_sigtimedwait+0x6f/0xd0
   do_syscall_64+0x72/0x200
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Thomas Gleixner authored
To avoid allocation, allow rt tasks to cache one sigqueue struct in task struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
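A sketch of the caching idea (the task_struct member name sigqueue_cache and the helper names are assumptions, not the patch's exact code):

    /* One cached entry per task; cmpxchg() keeps the fast path lockless. */
    static struct sigqueue *get_task_cache(struct task_struct *t)
    {
            struct sigqueue *q = t->sigqueue_cache;

            if (q && cmpxchg(&t->sigqueue_cache, q, NULL) == q)
                    return q;       /* reuse the cached sigqueue */
            return NULL;            /* nothing cached (or we raced): allocate */
    }

    static int put_task_cache(struct task_struct *t, struct sigqueue *q)
    {
            /* Keep at most one entry; report failure if the slot is taken. */
            return cmpxchg(&t->sigqueue_cache, NULL, q) != NULL;
    }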
-