This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git.
Pull mirroring updated.
- Dec 22, 2023
Luis Claudio R. Goncalves authored
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
- Dec 18, 2023
Wang Yong authored
The LTP test triggers the following BUG report on the 5.10 kernel:

  BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:969
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 796, name: cat
  Preemption disabled at:
  [<ffffffe40f433980>] do_debug_exception+0x60/0x180
  CPU: 3 PID: 796 Comm: cat Not tainted 5.10.59-rt52-KERNEL_VERSION #38
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   dump_backtrace+0x0/0x198
   show_stack+0x20/0x30
   dump_stack+0xf0/0x13c
   ___might_sleep+0x140/0x178
   rt_spin_lock+0x30/0x90
   force_sig_info_to_task+0x30/0xe0
   force_sig_fault_to_task+0x54/0x78
   force_sig_fault+0x1c/0x28
   arm64_force_sig_fault+0x48/0x78
   send_user_sigtrap+0x4c/0x80
   brk_handler+0x3c/0x68
   do_debug_exception+0xac/0x180
   el0_dbg+0x34/0x58
   el0_sync_handler+0x50/0xb8
   el0_sync+0x180/0x1c0

This was fixed by commit 0c34700d ("arm64: signal: Use ARCH_RT_DELAYS_SIGNAL_SEND.") in later kernel versions; this patch adapts that change to 5.10. The 5.10 tree does not have the arm64 signal.h header, so add it in order to define ARCH_RT_DELAYS_SIGNAL_SEND.

Link: https://lore.kernel.org/r/202309121514283793475@zte.com.cn
Signed-off-by: Wang Yong <wang.yong12@zte.com.cn>
Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
Cc: Yang Yang <yang.yang29@zte.com.cn>
Cc: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
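As a rough illustration only (the guard name and exact includes in the real backport may differ), the added arm64 header amounts to something like this; with the macro defined, the signal code can defer the actual send to a context where taking sleeping locks is fine:

    /* arch/arm64/include/asm/signal.h -- sketch of the backported header */
    #ifndef __ARM64_ASM_SIGNAL_H
    #define __ARM64_ASM_SIGNAL_H

    #include <uapi/asm/signal.h>

    #if defined(CONFIG_PREEMPT_RT)
    /*
     * Tell the generic signal code it may delay the signal send instead of
     * taking sleeping locks from atomic paths such as the debug-exception
     * handler shown in the splat above.
     */
    #define ARCH_RT_DELAYS_SIGNAL_SEND
    #endif

    #endif /* __ARM64_ASM_SIGNAL_H */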
-
Luis Claudio R. Goncalves authored
This reverts commit 32232bcd.

The support for deferred printing was removed in v5.10-rc1-rt1 by commit 9153e3c5 ("printk: remove deferred printing") because:

  Since printing occurs either atomically or from the printing kthread,
  there is no need for any deferring or tracking possible recursion paths.
  Remove all printk context tracking.

Fixes: 32232bcd ("printk: declare printk_deferred_{enter,safe}() in include/linux/printk.h")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
Luis Claudio R. Goncalves authored
This reverts commit a992c387.

The support for deferred printing was removed in v5.10-rc1-rt1 by commit 9153e3c5 ("printk: remove deferred printing") because:

  Since printing occurs either atomically or from the printing kthread,
  there is no need for any deferring or tracking possible recursion paths.
  Remove all printk context tracking.

Also, disabling interrupts in __build_all_zonelists() should produce warnings once that code path is hit.

Fixes: a992c387 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
Steffen Dirkwinkel authored
Without this we get system hangs within a couple of days. It's also reproducible in minutes with "stress-ng --exec 20". Example error in dmesg:

  INFO: task stress-ng:163916 blocked for more than 120 seconds.
        Not tainted 5.10.168-rt83 #2
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:stress-ng state:D stack: 0 pid:163916 ppid: 72833 flags:0x00004000
  Call Trace:
   __schedule+0x2bd/0x940
   preempt_schedule_lock+0x23/0x50
   rt_spin_lock_slowlock_locked+0x117/0x2c0
   rt_spin_lock_slowlock+0x51/0x80
   rt_write_lock+0x1e/0x1c0
   do_exit+0x3ac/0xb20
   do_group_exit+0x39/0xb0
   get_signal+0x145/0x960
   ? wake_up_new_task+0x21f/0x3c0
   arch_do_signal_or_restart+0xf1/0x830
   ? __x64_sys_futex+0x146/0x1d0
   exit_to_user_mode_prepare+0x116/0x1a0
   syscall_exit_to_user_mode+0x28/0x190
   entry_SYSCALL_64_after_hwframe+0x61/0xc6
  RIP: 0033:0x7f738d9074a7
  RSP: 002b:00007ffdafda3cb0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
  RAX: fffffffffffffe00 RBX: 00000000000000ca RCX: 00007f738d9074a7
  RDX: 0000000000028051 RSI: 0000000000000000 RDI: 00007f738be949d0
  RBP: 00007ffdafda3d88 R08: 0000000000000000 R09: 00007f738be94700
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000028051
  R13: 00007f738be949d0 R14: 00007ffdafda51e0 R15: 00007f738be94700

Fixes: 1ba44dcf ("Merge tag 'v5.10.162' into v5.10-rt")
Acked-by: Joe Korty <joe.korty@concurrent-rt.com>
Signed-off-by: Steffen Dirkwinkel <s.dirkwinkel@beckhoff.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
John Ogness authored
The ttynull driver does not provide an implementation for the write() callback. This leads to a NULL pointer dereference in the related printing kthread, which assumes it can call that callback.

Do not create kthreads for consoles that do not implement the write() callback. Also, for pr_flush(), ignore consoles that do not implement write() or write_atomic(), since there is no way those consoles can flush their output.

Link: https://lore.kernel.org/lkml/1831554214.546921.1676479103702.JavaMail.zimbra@hale.at
Reported-by: Michael Thalmeier <michael.thalmeier@hale.at>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
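A minimal sketch of the checks this implies (member names follow struct console; write_atomic is the RT-patch extension, and the exact call sites in the real tree differ):

    /* kthread path: the printing kthread calls con->write(), so it needs one. */
    if (!con->write)
            return;         /* do not create a printing kthread for this console */

    /* pr_flush(): a console with neither callback can never make progress. */
    if (!con->write && !con->write_atomic)
            continue;       /* ignore it when waiting for output to drain */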
-
Salvatore Bonaccorso authored
As in mainline commit 870d1675 ("arm64: make _TIF_WORK_MASK bits contiguous"), the bits of _TIF_WORK_MASK need to be contiguous in order to use the mask as an immediate argument to an AND instruction in entry.S. Shuffle these bits down by one, keeping the existing contiguity after TIF_NEED_RESCHED_LAZY is inserted by the preempt-rt patch series.

Omitting this change results in a build failure like:

  arch/arm64/kernel/entry.S: Assembler messages:
  arch/arm64/kernel/entry.S:763: Error: immediate out of range at operand 3 -- `and x2,x19,#((1<<1)|(1<<0)|(1<<2)|(1<<3)|(1<<4)|(1<<5)|(1<<6)|(1<<13)|(1<<7))'

Reported-by: Vignesh Raghavendra <vigneshr@ti.com>
Reported-by: Pavel Machek <pavel@denx.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/40de655e-26f3-aa7b-f1ec-6877396a9f1e@ti.com/
Signed-off-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
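The constraint can be illustrated with a hypothetical bit layout (not the actual arm64 numbering): the point is only that every bit referenced by _TIF_WORK_MASK forms one contiguous run so entry.S can encode the mask as an AND immediate.

    #define TIF_SIGPENDING          0
    #define TIF_NEED_RESCHED        1
    #define TIF_NEED_RESCHED_LAZY   2       /* kept inside the run, not at bit 13 */
    #define TIF_NOTIFY_RESUME       3
    #define TIF_FOREIGN_FPSTATE     4

    #define _TIF_WORK_MASK          (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
                                     _TIF_NEED_RESCHED_LAZY | _TIF_NOTIFY_RESUME | \
                                     _TIF_FOREIGN_FPSTATE)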
-
Anand Je Saipureddy authored
In kernel/trace/trace_events_trigger.c, the call to __trace_stack() in stacktrace_trigger() no longer matches the function's prototype. Since commit edbaaa13 ("tracing: Merge irqflags + preempt counter, add RT bits"), the irqflags (flags) and the preemption counter (preempt_count()) are folded into a single context value that should be evaluated early by tracing_gen_ctx(). This patch replaces the separate irqflags and preemption counter arguments with tracing_gen_ctx().

Fixes: 5e8446e3 ("tracing: Dump stacktrace trigger to the corresponding instance")
Link: https://lore.kernel.org/r/20220723064943.16532-1-s.anandje1@gmail.com
Signed-off-by: Anand Je Saipureddy <s.anandje1@gmail.com>
Reviewed-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
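The call-site change in stacktrace_trigger() is essentially the following sketch (the "before" line is the one shown in the build error of the follow-up entry below):

    /* Before: separate irqflags and preemption-count arguments. */
    __trace_stack(file->tr, flags, STACK_SKIP, preempt_count());

    /* After: both are folded into one trace_ctx word, matching
     * void __trace_stack(struct trace_array *tr, unsigned int trace_ctx, int skip);
     */
    __trace_stack(file->tr, tracing_gen_ctx(), STACK_SKIP);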
-
Yajun Deng authored
We can use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL() for ww_mutex_lock_interruptible() and ww_mutex_lock(). That matches ww_mutex_unlock() and is also good for third-party kernel modules.

Link: https://lore.kernel.org/r/20220803062430.1307312-1-yajun.deng@linux.dev
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
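The change itself is just the export macro on the two affected symbols:

    EXPORT_SYMBOL(ww_mutex_lock_interruptible);     /* was EXPORT_SYMBOL_GPL() */
    EXPORT_SYMBOL(ww_mutex_lock);                   /* was EXPORT_SYMBOL_GPL() */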
-
Luis Claudio R. Goncalves authored
Fix the build error below while keeping the current PREEMPT_RT code:

  kernel/trace/trace_events_trigger.c: In function ‘stacktrace_trigger’:
  kernel/trace/trace_events_trigger.c:1227:3: error: too many arguments to function ‘__trace_stack’
     __trace_stack(file->tr, flags, STACK_SKIP, preempt_count());
     ^~~~~~~~~~~~~
  In file included from kernel/trace/trace_events_trigger.c:15:
  kernel/trace/trace.h:826:6: note: declared here
   void __trace_stack(struct trace_array *tr, unsigned int trace_ctx, int skip);
        ^~~~~~~~~~~~~

Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
-
Xie Yongji authored
commit 4b374986 upstream.

We should defer eventfd_signal() to the workqueue when eventfd_signal_allowed() returns false, not when it returns true.

Fixes: b542e383 ("eventfd: Make signal recursion protection a task bit")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210913111928.98-1-xieyongji@bytedance.com
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
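The intended pattern, as a generic sketch (the deferral work item here is hypothetical; the affected driver code in the real commit differs):

    if (eventfd_signal_allowed())
            eventfd_signal(ctx, 1);         /* safe to signal from this context */
    else
            queue_work(system_wq, &deferred_signal_work);   /* defer to workqueue */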
-
Sebastian Andrzej Siewior authored
This aligns the patch ("stop_machine: Add function and caller debug info") with commit a8b62fd0 ("stop_machine: Add function and caller debug info") that was merged upstream and is slightly different.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Thomas Gleixner authored
Upstream commit b542e383

The recursion protection for eventfd_signal() is based on a per CPU variable and relies on the !RT semantics of spin_lock_irqsave() for protecting this per CPU variable. On RT kernels spin_lock_irqsave() neither disables preemption nor interrupts, which allows the spin lock held section to be preempted. If the preempting task invokes eventfd_signal() as well, then the recursion warning triggers.

Paolo suggested to protect the per CPU variable with a local lock, but that's heavyweight and actually not necessary. The goal of this protection is to prevent the task stack from overflowing, which can be achieved with a per task recursion protection as well.

Replace the per CPU variable with a per task bit similar to other recursion protection bits like task_struct::in_page_owner. This works on both !RT and RT kernels and removes as a side effect the extra per CPU storage.

No functional change for !RT kernels.

Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/87wnp9idso.ffs@tglx
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
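A sketch of the per-task bit, modelled on the description above (the field name in_eventfd_signal and the simplified function body are assumptions; the real code may differ in detail):

    /* task_struct gains a one-bit flag next to the other recursion bits: */
    unsigned                        in_eventfd_signal:1;

    static inline bool eventfd_signal_allowed(void)
    {
            return !current->in_eventfd_signal;
    }

    __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
    {
            unsigned long flags;

            /* Recursion from the wakeup path of a previous signal? Bail out. */
            if (WARN_ON_ONCE(current->in_eventfd_signal))
                    return 0;

            spin_lock_irqsave(&ctx->wqh.lock, flags);
            current->in_eventfd_signal = 1;
            if (ULLONG_MAX - ctx->count < n)
                    n = ULLONG_MAX - ctx->count;
            ctx->count += n;
            if (waitqueue_active(&ctx->wqh))
                    wake_up_locked_poll(&ctx->wqh, EPOLLIN);
            current->in_eventfd_signal = 0;
            spin_unlock_irqrestore(&ctx->wqh.lock, flags);

            return n;
    }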
-
Sebastian Andrzej Siewior authored
On PREEMPT_RT most items are processed as LAZY via softirq context. Avoid spin-waiting for them, because irq_work_sync() could run at a higher priority and never allow the irq-work to complete. On PREEMPT_RT, additionally wait for !IRQ_WORK_HARD_IRQ irq_work items.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006111852.1514359-5-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
The irq_work callback is invoked in hard IRQ context. By default all callbacks are scheduled for invocation right away (if supported by the architecture), except for the ones marked IRQ_WORK_LAZY, which are delayed until the next timer tick.

While looking over the callbacks, some of them may acquire locks (spinlock_t, rwlock_t) which are transformed into sleeping locks on PREEMPT_RT and must not be acquired in hard IRQ context. Changing the locks into locks which could be acquired in this context will lead to other problems such as increased latencies if everything in the chain has IRQ-off locks. This will not solve all the issues, as one callback has been noticed which invokes kref_put() whose release callback invokes kfree(), and this can not be invoked in hardirq context.

Some callbacks are required to be invoked in hardirq context even on PREEMPT_RT to work properly. This includes for instance the NO_HZ callback, which needs to be able to observe the idle context.

The callbacks which are required to run in hardirq have already been marked. Use this information to split the callbacks onto the two lists on PREEMPT_RT:

- lazy_list
  Work items which are not marked with IRQ_WORK_HARD_IRQ will be added to this list. Callbacks on this list will be invoked from a per-CPU thread. The handler here may acquire sleeping locks such as spinlock_t and invoke kfree().

- raised_list
  Work items which are marked with IRQ_WORK_HARD_IRQ will be added to this list. They will be invoked in hardirq context and must not acquire any sleeping locks.

The wake up of the per-CPU thread occurs from irq_work handler/hardirq context. The thread runs with lowest RT priority to ensure it runs before any SCHED_OTHER tasks do.

[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a hard and soft variant. Collected fixes over time from Steven Rostedt and Mike Galbraith. Move to per-CPU threads instead of softirq as suggested by PeterZ.]

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211007092646.uhshe3ut2wkrcfzv@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
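A sketch of the resulting enqueue decision on PREEMPT_RT (list names follow the commit message; the flag/field layout and the thread wake-up helper wake_irq_workd() are assumptions, and the !RT IRQ_WORK_LAZY handling is omitted for brevity):

    static void __irq_work_queue_local(struct irq_work *work)
    {
            bool hard = atomic_read(&work->flags) & IRQ_WORK_HARD_IRQ;

            if (IS_ENABLED(CONFIG_PREEMPT_RT) && !hard) {
                    /* Deferred to the per-CPU irq_work thread: the callback may
                     * take sleeping locks and call kfree() there.
                     */
                    if (llist_add(&work->llnode, this_cpu_ptr(&lazy_list)))
                            wake_irq_workd();       /* helper name assumed */
                    return;
            }

            /* Hard items (and the !RT default) run from hardirq right away. */
            if (llist_add(&work->llnode, this_cpu_ptr(&raised_list)))
                    arch_irq_work_raise();
    }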
-
Sebastian Andrzej Siewior authored
Queueing irq_work triggers an interrupt instantly if supported by the architecture; otherwise the work is processed on the next timer tick. In the worst case irq_work_sync() could spin for up to a jiffy.

irq_work_sync() is usually used in tear-down context which is fully preemptible. Based on review, irq_work_sync() is invoked from preemptible context and there is one waiter at a time. This qualifies it to use rcuwait for synchronisation.

Let irq_work_sync() synchronize with rcuwait if the architecture processes irqwork via the timer tick.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211006111852.1514359-3-bigeasy@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
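A sketch of irq_work_sync() along these lines, assuming an rcuwait member (here called irqwait) in struct irq_work and an irq_work_is_busy() helper:

    void irq_work_sync(struct irq_work *work)
    {
            lockdep_assert_irqs_enabled();
            might_sleep();

            if (!arch_irq_work_has_interrupt()) {
                    /* The work runs from the timer tick; sleep instead of
                     * spinning for up to a jiffy. Only one preemptible waiter
                     * at a time needs to be supported.
                     */
                    rcuwait_wait_event(&work->irqwait, !irq_work_is_busy(work),
                                       TASK_UNINTERRUPTIBLE);
                    return;
            }

            while (irq_work_is_busy(work))
                    cpu_relax();
    }

The entry above this one then extends the sleeping path to !IRQ_WORK_HARD_IRQ items on PREEMPT_RT, where they are completed from the per-CPU thread rather than from hardirq context.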
-
Sebastian Andrzej Siewior authored
Disabling interrupts and invoking the irq_work function directly breaks on PREEMPT_RT. PREEMPT_RT does not invoke all irq_work from hardirq context because some of the users have spinlock_t locking in the callback function. These locks are then turned into sleeping locks which can not be acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked in the regular context. In general there is "no" delay between enqueuing the callback and its invocation because the interrupt is raised right away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback directly.

Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
might_sleep_no_state_check() serves the same purpose as might_sleep(), except that it is used before sleeping locks are acquired and therefore does not check task_struct::state, because that state is preserved. The state is preserved in the locking slow path, so we must not schedule at the beginning of the locking function, because the state would be lost rather than preserved at that point.

Remove might_resched() from might_sleep_no_state_check() to avoid losing the state before it is preserved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
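A sketch of the distinction, with macro bodies modelled on the might_sleep() family (the exact definitions in the RT tree are assumptions; the change is dropping the might_resched() call from the no-state-check variant):

    /* Checks task state and may voluntarily reschedule. */
    # define might_sleep() \
            do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)

    /* Used right before taking a sleeping lock: the lock slow path saves and
     * restores task_struct::state, so neither check the state here nor
     * reschedule voluntarily (that would clobber the not-yet-saved state).
     */
    # define might_sleep_no_state_check() \
            do { ___might_sleep(__FILE__, __LINE__, 0); } while (0)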
-
Sebastian Andrzej Siewior authored
This is an update of the original patch, removing a put_cpu_var() that was overlooked in the initial patch.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
In the commit mentioned below, fscache was converted from slow-work to workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed() did not use a per-CPU workqueue: they chose between two global waitqueues depending on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was always the same waitqueue. I can't find out how it is ensured that a waiter on a certain CPU is woken up by the other side. My guess is that the timeout in schedule_timeout() ensures that it does not wait forever (or a random wake up).

fscache_object_sleep_till_congested() must be invoked from preemptible context in order for schedule() to work. In this case this_cpu_ptr() should complain with CONFIG_DEBUG_PREEMPT enabled, unless the thread is bound to one CPU.

wake_up() wakes only one waiter and I'm not sure if it is guaranteed that only one waiter exists.

Replace the per-CPU waitqueue with one global waitqueue.

Fixes: 8b8edefa ("fscache: convert object to use workqueue instead of slow-work")
Reported-by: Gregor Beck <gregor.beck@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
TRANSPARENT_HUGEPAGE: There are potential non-deterministic delays for an RT thread if a critical memory region is not THP-aligned and a non-RT buffer is located in the same hugepage-aligned region. It's also possible for an unrelated thread to migrate pages belonging to an RT task, incurring unexpected page faults due to memory defragmentation even if khugepaged is disabled. Regular HUGEPAGEs are not affected by this and can still be used.

NUMA_BALANCING: There is a non-deterministic delay to mark PTEs PROT_NONE to gather NUMA fault samples, increased page faults of regions even if mlocked, and non-deterministic delays when migrating pages.

[Mel Gorman worded 99% of the commit description.]

Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/
Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/
Cc: stable-rt@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
preempt_enable_no_resched() should point to preempt_enable() on PREEMPT_RT so that nobody plays preempt tricks and enables preemption without checking the need-resched flag. This was misplaced in v3.14.0-rt1 and remained unnoticed until now.

Point preempt_enable_no_resched() at preempt_enable() on RT.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
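The mapping it establishes is roughly the following preprocessor conditional (a sketch; the surrounding preempt.h context is omitted):

    #ifdef CONFIG_PREEMPT_RT
    /* On RT there is no "enable preemption but skip the resched check". */
    # define preempt_enable_no_resched()    preempt_enable()
    #else
    # define preempt_enable_no_resched()    sched_preempt_enable_no_resched()
    #endif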
-
Sebastian Andrzej Siewior authored
With PREEMPT_RT enabled all hrtimer callbacks are invoked in softirq context unless they are explicitly marked HRTIMER_MODE_HARD. During boot, kthread_bind() is used for the creation of per-CPU threads and then hangs in wait_task_inactive() if ksoftirqd is not yet up and running. The hang disappeared since commit 26c7295b ("kthread: Do not preempt current task if it is going to call schedule()"), but enabling function trace on boot reliably leads to the freeze-on-boot behaviour again.

The timer in wait_task_inactive() cannot be used directly by a user interface to abuse it and create a mass wake-up of several tasks at the same time, which would lead to long sections with disabled interrupts. Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD.

Switch the timer to HRTIMER_MODE_REL_HARD.

Cc: stable-rt@vger.kernel.org
Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
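The change in wait_task_inactive() is essentially the timer mode; a sketch of the affected snippet (surrounding loop omitted):

    ktime_t to = NSEC_PER_SEC / HZ;

    set_current_state(TASK_UNINTERRUPTIBLE);
    /* _HARD: expire from hardirq context even on PREEMPT_RT, so early boot
     * does not depend on ksoftirqd already running.
     */
    schedule_hrtimeout(&to, HRTIMER_MODE_REL_HARD);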
-
Sebastian Andrzej Siewior authored
push_rt_task() attempts to move the currently running task away if the next runnable task has migration disabled and is therefore pinned on the current CPU. The current task is retrieved via get_push_task(), which only checks for nr_cpus_allowed == 1 but does not check whether the task has migration disabled and therefore cannot be moved either. The consequence is a pointless invocation of the migration thread, which correctly observes that the task cannot be moved.

Return NULL if the task has migration disabled and cannot be moved to another CPU.

Cc: stable-rt@vger.kernel.org
Fixes: a7c81556 ("sched: Fix migrate_disable() vs rt/dl balancing")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210826133738.yiotqbtdaxzjsnfj@linutronix.de
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
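A sketch of get_push_task() with the added check, closely following the description above (surrounding details simplified and not guaranteed to match the tree exactly):

    static inline struct task_struct *get_push_task(struct rq *rq)
    {
            struct task_struct *p = rq->curr;

            lockdep_assert_held(&rq->lock);

            if (rq->push_busy)
                    return NULL;

            if (p->nr_cpus_allowed == 1)
                    return NULL;

            /* New: a migration-disabled task cannot be pushed away either. */
            if (is_migration_disabled(p))
                    return NULL;

            rq->push_busy = true;
            return get_task_struct(p);
    }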
-
Mike Galbraith authored
local_lock_t becoming a synonym of spinlock_t had consequences for the RT mods to zsmalloc, which were taking a mutex while holding a local_lock, inspiring a lockdep "BUG: Invalid wait context" gripe. Converting zsmalloc_handle.lock to a spinlock_t restored lockdep silence.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
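A sketch of the data-structure side of that conversion (member names as in the RT zsmalloc patch are assumed; the lock/unlock helpers then become plain spin_lock()/spin_unlock()):

    struct zsmalloc_handle {
            unsigned long addr;
            spinlock_t lock;        /* was a mutex; a spinlock_t nests fine
                                     * under a local_lock, so lockdep's wait
                                     * context check is satisfied */
    };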
-
Andrew Halaney authored
There's no chance of sleeping here; the reader is giving up the lock and possibly waking up the writer who is waiting on it.

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Chao Qin authored
[ Upstream commit 83e9288d9c4295d1195e9d780fcbc42c72ba4a83 ]

There is a msleep() in pr_flush(). If WARN() is called in the early boot stage, such as from an early_initcall, pr_flush() will run into msleep() while the process scheduler is not ready yet, and the system will sleep forever.

Before system_state is SYSTEM_RUNNING, make sure pr_flush() does NOT sleep.

Fixes: c0b395bd0fe3 ("printk: add pr_flush()")
Signed-off-by: Chao Qin <chao.qin@intel.com>
Signed-off-by: Lili Li <lili.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20210719022649.3444072-1-chao.qin@intel.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
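A hedged sketch of the guard being described (the real pr_flush() in the RT printk code differs in detail; the helper below is purely illustrative):

    static bool pr_flush_may_sleep(void)
    {
            /* msleep() requires a working scheduler; during early boot
             * (system_state < SYSTEM_RUNNING) pr_flush() must not sleep,
             * otherwise a WARN() from an early_initcall never returns.
             */
            return preemptible() && system_state >= SYSTEM_RUNNING;
    }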
-
Valentin Schneider authored
commit 475ea6c6 upstream.

Will reported that the 'XXX __migrate_task() can fail' in migration_cpu_stop() can happen, and it *is* sort of a big deal. Looking at it some more, one will note there is a glaring hole in the deferred CPU selection (w/ CONFIG_CPUSET=n, so that the affinity mask passed via taskset doesn't get AND'd with cpu_online_mask):

  $ taskset -pc 0-2 $PID
  # offline CPUs 3-4
  $ taskset -pc 3-5 $PID
    `\
      $PID may stay on 0-2 due to the cpumask_any_distribute() picking an
      offline CPU and __migrate_task() refusing to do anything due to
      cpu_is_allowed().

set_cpus_allowed_ptr() goes to some length to pick a dest_cpu that matches the right constraints vs affinity and the online/active state of the CPUs. Reuse that instead of discarding it in the affine_move_task() case.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210526205751.842360-2-valentin.schneider@arm.com
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 50caf9c1 upstream.

Now that we have set_affinity_pending::stop_pending to indicate if a stopper is in progress, and we have the guarantee that if that stopper exists it will (eventually) complete our @pending, we can simplify the refcount scheme by no longer counting the stopper thread.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 9e81889c upstream.

Consider:

  sched_setaffinity(p, X);
  sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and issue stop_one_cpu_nowait(pending); and the second one will read p->migration_pending and _also_ issue stop_one_cpu_nowait(pending), the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending to indicate if a stopper is in progress.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 3f1bc119 upstream.

When the purpose of migration_cpu_stop() is to migrate the task to 'any' valid CPU, don't migrate the task when it's already running on a valid CPU.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 58b1a450 upstream.

The SCA_MIGRATE_ENABLE and task_running() cases are almost identical; collapse them to avoid further duplication.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit c20cf065 upstream.

When affine_move_task() issues a migration_cpu_stop(), the purpose of that function is to complete that @pending, not any random other p->migration_pending that might have gotten installed since.

This realization much simplifies migration_cpu_stop() and allows further necessary steps to fix all this as it provides the guarantee that @pending's stopper will complete @pending (and not some random other @pending).

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Peter Zijlstra authored
commit 8a6edb52 upstream.

When affine_move_task(p) is called on a running task @p, which is not otherwise already changing affinity, we'll first set p->migration_pending and then do:

  stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg);

This then gets us to migration_cpu_stop() running on the CPU that was previously running our victim task @p. If we find that our task is no longer on that runqueue (this can happen because of a concurrent migration due to load-balance etc.), then we'll end up at the:

  } else if (dest_cpu < 1 || pending) {

branch. Which we'll take because we set pending earlier. Here we first check if the task @p has already satisfied the affinity constraints, if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop() onto the CPU that is now hosting our task @p:

  stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
                      &pending->arg, &pending->stop_work);

Except, we've never initialized pending->arg, which will be all 0s.

This then results in running migration_cpu_stop() on the next CPU with arg->p == NULL, which gives the by now obvious result of fireworks.

The cure is to change affine_move_task() to always use pending->arg; furthermore we can use the exact same pattern as the SCA_MIGRATE_ENABLE case, since we'll block on the pending->done completion anyway, so there is no point in adding yet another completion in stop_one_cpu().

This then gives a clear distinction between the two migration_cpu_stop() use cases:

- sched_exec() / migrate_task_to() : arg->pending == NULL
- affine_move_task()               : arg->pending != NULL

And we can have it ignore p->migration_pending when !arg->pending. Any stop work from sched_exec() / migrate_task_to() is in addition to stop works from affine_move_task(), which will be sufficient to issue the completion.

Fixes: 6d337eab ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Ahmed S. Darwish authored
A sequence counter write section must be serialized or its internal state can get corrupted. A plain seqcount_t does not contain the information of which lock must be held to guarantee write side serialization.

For xfrm_state_hash_generation, use seqcount_spinlock_t instead of plain seqcount_t. This allows associating the spinlock used for write serialization with the sequence counter. It thus enables lockdep to verify that the write serialization lock is indeed held before entering the sequence counter write section.

If lockdep is disabled, this lock association is compiled out and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
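A sketch of what the associated-lock form looks like for this counter (the per-netns field names used here are assumptions):

    /* Initialization: tie the counter to its write-serializing spinlock. */
    seqcount_spinlock_init(&net->xfrm.xfrm_state_hash_generation,
                           &net->xfrm.xfrm_state_lock);

    /* Write side: with lockdep enabled, write_seqcount_begin() now verifies
     * that xfrm_state_lock is actually held.
     */
    spin_lock_bh(&net->xfrm.xfrm_state_lock);
    write_seqcount_begin(&net->xfrm.xfrm_state_hash_generation);
    /* ... resize/rehash the state hash tables ... */
    write_seqcount_end(&net->xfrm.xfrm_state_hash_generation);
    spin_unlock_bh(&net->xfrm.xfrm_state_lock);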
-
Thomas Gleixner authored
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Clark Williams authored
Add a /sys/kernel entry to indicate that the kernel is a realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate whether it is a PREEMPT_RT kernel a few thousand times, and parsing uname output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
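A sketch of such a sysfs attribute, under the assumption that it lives in kernel/ksysfs.c, only exists on PREEMPT_RT, and always reads 1:

    #ifdef CONFIG_PREEMPT_RT
    static ssize_t realtime_show(struct kobject *kobj,
                                 struct kobj_attribute *attr, char *buf)
    {
            /* The file only exists on RT kernels, so the value is fixed. */
            return sprintf(buf, "%d\n", 1);
    }
    KERNEL_ATTR_RO(realtime);
    #endif

Userspace (for example a udev rule) can then simply test for the presence of /sys/kernel/realtime instead of parsing uname output.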
-
Ingo Molnar authored
Creates long latencies for no value.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Matt Fleming authored
The way user struct reference counting works changed significantly with commit fda31c50 ("signal: avoid double atomic counter increments for user accounting"). Now user structs are only freed once the last pending signal is dequeued. Make sigqueue_free_current() follow this new convention to avoid freeing the user struct multiple times and triggering this warning:

  refcount_t: underflow; use-after-free.
  WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
  Call Trace:
   refcount_dec_and_lock_irqsave+0x16/0x60
   free_uid+0x31/0xa0
   __dequeue_signal+0x17c/0x190
   dequeue_signal+0x5a/0x1b0
   do_sigtimedwait+0x208/0x250
   __x64_sys_rt_sigtimedwait+0x6f/0xd0
   do_syscall_64+0x72/0x200
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Thomas Gleixner authored
To avoid allocation, allow rt tasks to cache one sigqueue struct in task struct.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
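A sketch of the caching idea (the task_struct member name sigqueue_cache and the helper names are assumptions, not the patch's exact code):

    /* One cached entry per task; cmpxchg() keeps the fast path lockless. */
    static struct sigqueue *get_task_cache(struct task_struct *t)
    {
            struct sigqueue *q = t->sigqueue_cache;

            if (q && cmpxchg(&t->sigqueue_cache, q, NULL) == q)
                    return q;       /* reuse the cached sigqueue */
            return NULL;            /* nothing cached (or we raced): allocate */
    }

    static int put_task_cache(struct task_struct *t, struct sigqueue *q)
    {
            /* Keep at most one entry; report failure if the slot is taken. */
            return cmpxchg(&t->sigqueue_cache, NULL, q) != NULL;
    }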
-