This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git.
Pull mirroring updated.
- Sep 23, 2024
-
Daniel Wagner authored
Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Daniel Wagner authored
This reverts commit 56e89498. The tree already contains the migrate_disable/enable() helpers, so this stable backport conflicts with the existing definition (the compiler complains about a conflicting definition). We therefore don't need the backported functions and can avoid the conflict by simply dropping the backport. Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Brennan Lamoreaux (VMware) authored
Upstream commit d8bb65ab ("workqueue: Use rcuwait for wq_manager_wait") replaced the waitqueue with rcuwait in the workqueue code. This change involved removing the acquisition of pool->lock in put_unbound_pool(), as it also adds the function wq_manager_inactive() which acquires this same lock and is called one line later as a parameter to rcu_wait_event(). However, the backport of this commit in the PREEMPT_RT patchset 4.19.255-rt114 (patch 347) missed the removal of the acquisition of pool->lock in put_unbound_pool(). This leads to a deadlock due to recursive locking of pool->lock, as shown below in lockdep: [ 252.083713] WARNING: possible recursive locking detected [ 252.083718] 4.19.269-3.ph3-rt #1-photon Not tainted [ 252.083721] -------------------------------------------- [ 252.083733] kworker/2:0/33 is trying to acquire lock: [ 252.083747] 000000000b7b1ceb (&pool->lock/1){....}, at: put_unbound_pool+0x10d/0x260 [ 252.083857] but task is already holding lock: [ 252.083860] 000000000b7b1ceb (&pool->lock/1){....}, at: put_unbound_pool+0xbd/0x260 [ 252.083876] other info that might help us debug this: [ 252.083897] Possible unsafe locking scenario: [ 252.083900] CPU0 [ 252.083903] ---- [ 252.083904] lock(&pool->lock/1); [ 252.083911] lock(&pool->lock/1); [ 252.083919] *** DEADLOCK *** [ 252.083921] May be due to missing lock nesting notation Fix this deadlock by removing the pool->lock acquisition in put_unbound_pool(). Signed-off-by:
Brennan Lamoreaux (VMware) <brennanlamoreaux@gmail.com> Cc: Daniel Wagner <wagi@monom.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Reviewed-by:
Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Link: https://lore.kernel.org/r/20230228224938.88035-1-brennanlamoreaux@gmail.com Signed-off-by:
Daniel Wagner <wagi@monom.org>
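
For illustration only, a minimal userspace analogue of the reported self-deadlock (all names are hypothetical stand-ins for pool->lock and wq_manager_inactive()): a condition helper re-acquires a non-recursive mutex its caller already holds, so the second lock never returns.

  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

  /* stand-in for wq_manager_inactive(): takes pool_lock on its own */
  static int manager_inactive(void)
  {
          pthread_mutex_lock(&pool_lock);
          int inactive = 1;
          pthread_mutex_unlock(&pool_lock);
          return inactive;
  }

  int main(void)
  {
          /* the leftover acquisition the backport failed to remove */
          pthread_mutex_lock(&pool_lock);

          /* second acquisition of the same non-recursive lock: blocks
           * forever, mirroring the recursive-locking splat above */
          if (manager_inactive())
                  puts("never reached");

          pthread_mutex_unlock(&pool_lock);
          return 0;
  }

Compile with -pthread; running it simply hangs, which is the recursive-locking scenario lockdep reports.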
-
Ben Hutchings authored
This reverts commit 0d796a9e. After merging stable release 4.19.266 into the -rt branch, an x86 build will fail with the following error:

 .../include/linux/percpu-defs.h:49:34: error: 'PER_CPU_BASE_SECTION' undeclared here (not in a function); did you mean 'PER_CPU_FIRST_SECTION'?

This is due to an #include loop:

 <asm/percpu.h> -> <linux/irqflags.h> -> <asm/irqflags.h> -> <asm/nospec-branch.h> -> <asm/percpu.h>

which appears after the merge because:
- The reverted commit added <asm/percpu.h> -> <linux/irqflags.h>
- 4.19.266 added <asm/nospec-branch.h> -> <asm/percpu.h>

Neither upstream nor any other maintained stable-rt branch has this include, and my build succeeded without it. Revert it here as well. Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Link: https://lore.kernel.org/r/Y5O/aVw/zHKqmpu7@decadent.org.uk Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Sebastian Andrzej Siewior authored
Upstream commit c725dafc PREEMPT_RT does not spin and wait until a running timer completes its callback but instead it blocks on a sleeping lock to prevent a livelock in the case that the task waiting for the callback completion preempted the callback. This cannot be done for timers flagged with TIMER_IRQSAFE. These timers can be canceled from an interrupt disabled context even on RT kernels. The expiry callback of such timers is invoked with interrupts disabled so there is no need to use the expiry lock mechanism because obviously the callback cannot be preempted even on RT kernels. Do not use the timer_base::expiry_lock mechanism when waiting for a running callback to complete if the timer is flagged with TIMER_IRQSAFE. Also add a lockdep assertion for RT kernels to validate that the expiry lock mechanism is always invoked in preemptible context. [ bigeasy: Dropping that lockdep_assert_preemption_enabled() check in backport ] Reported-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201103190937.hga67rqhvknki3tp@linutronix.de Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Thomas Gleixner authored
Upstream commit bb7262b2

syzbot reported KCSAN data races vs. timer_base::timer_running being set to NULL without holding base::lock in expire_timers(). This looks innocent and most reads are clearly not problematic, but Frederic identified an issue which is:

 int data = 0;

 void timer_func(struct timer_list *t)
 {
    data = 1;
 }

 CPU 0                                            CPU 1
 ------------------------------                   --------------------------
 base = lock_timer_base(timer, &flags);           raw_spin_unlock(&base->lock);
 if (base->running_timer != timer)                call_timer_fn(timer, fn, baseclk);
   ret = detach_if_pending(timer, base, true);    base->running_timer = NULL;
 raw_spin_unlock_irqrestore(&base->lock, flags);  raw_spin_lock(&base->lock);

 x = data;

If the timer has previously executed on CPU 1, then CPU 0 can observe base->running_timer == NULL and return, assuming the timer has completed, but seeing the callback's effect (x == 1 above) is not guaranteed on all architectures. The comment for del_timer_sync() makes that guarantee. Moving the assignment under base->lock prevents this.

For non-RT kernels it's performance-wise completely irrelevant whether the store happens before or after taking the lock. For an RT kernel moving the store under the lock requires an extra unlock/lock pair in the case that there is a waiter for the timer, but that's not the end of the world. Reported-by:
<syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com> Reported-by:
<syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com> Fixes: 030dcdd1 ("timers: Prepare support for PREEMPT_RT") Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de Cc: stable@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Sebastian Andrzej Siewior authored
Upstream commit d8bb65ab

The workqueue code has its internal spinlock (pool::lock) and also implicit spinlock usage in the wq_manager waitqueue. These spinlocks are converted to 'sleeping' spinlocks on an RT kernel.

Workqueue functions can be invoked from contexts which are truly atomic even on a PREEMPT_RT enabled kernel. Taking sleeping locks from such contexts is forbidden. pool::lock can be converted to a raw spinlock as the lock held times are short. But the workqueue manager waitqueue is handled inside of pool::lock held regions which again violates the lock nesting rules of raw and regular spinlocks.

The manager waitqueue has no special requirements like custom wakeup callbacks or mass wakeups. While it does not use exclusive wait mode explicitly, there is no strict requirement to queue the waiters in a particular order as there is only one waiter at a time. This allows replacing the waitqueue with rcuwait, which solves the locking problem because rcuwait relies on existing locking. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> [wagi: Updated context as v4.19-rt was using swait] Signed-off-by:
Daniel Wagner <wagi@monom.org>
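
A kernel-style sketch of the resulting pattern, reconstructed from memory of the upstream change (names such as manager_wait and wq_manager_inactive() are assumed, and the fragment is not buildable standalone):

  /* the manager wait switches from a waitqueue to rcuwait */
  static struct rcuwait manager_wait = __RCUWAIT_INITIALIZER(manager_wait);

  /* waiter (put_unbound_pool()) -- the condition helper takes pool->lock
   * itself, which is why the old explicit pool->lock acquisition around
   * the wait had to go (see the deadlock fix above): */
  rcuwait_wait_event(&manager_wait, wq_manager_inactive(pool),
                     TASK_UNINTERRUPTIBLE);

  /* waker (manage_workers()): */
  pool->flags &= ~POOL_MANAGER_ACTIVE;
  rcuwait_wake_up(&manager_wait);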
-
Daniel Wagner authored
This is an all in one commit backporting updates for rcuwait:
- 03f4b48e ("rcuwait: Annotate task_struct with __rcu")
- 191a43be ("rcuwait: Introduce rcuwait_active()")
- 5c21f7b3 ("rcuwait: Introduce prepare_to and finish_rcuwait")
- 80fbaf1c ("rcuwait: Add @state argument to rcuwait_wait_event()")
- 9d9a6ebf ("rcuwait: Let rcuwait_wake_up() return whether or not a task was awoken")
- 58d4292b ("rcu: Uninline multi-use function: finish_rcuwait()")
Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Sebastian Andrzej Siewior authored
Upstream commit c725dafc PREEMPT_RT does not spin and wait until a running timer completes its callback but instead it blocks on a sleeping lock to prevent a livelock in the case that the task waiting for the callback completion preempted the callback. This cannot be done for timers flagged with TIMER_IRQSAFE. These timers can be canceled from an interrupt disabled context even on RT kernels. The expiry callback of such timers is invoked with interrupts disabled so there is no need to use the expiry lock mechanism because obviously the callback cannot be preempted even on RT kernels. Do not use the timer_base::expiry_lock mechanism when waiting for a running callback to complete if the timer is flagged with TIMER_IRQSAFE. Also add a lockdep assertion for RT kernels to validate that the expiry lock mechanism is always invoked in preemptible context. Reported-by:
Mike Galbraith <efault@gmx.de> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201103190937.hga67rqhvknki3tp@linutronix.de [bigeasy: The logic in v4.19 is slightly different but the outcome is the same as we must not sleep while waiting for the irqsafe timer to complete. The IRQSAFE timer can not be preempted. The "lockdep annotation" is not available and has been replaced with might_sleep()] Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Daniel Wagner <wagi@monom.org>
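
A rough sketch of the resulting wait logic (kernel-style, illustrative only; the expiry-lock handling is elided and nothing here is the literal 4.19-rt hunk):

  if (timer->flags & TIMER_IRQSAFE) {
          /* IRQSAFE callbacks run with interrupts disabled and cannot be
           * preempted, so briefly spinning until the callback finishes is
           * fine even on RT. */
          cpu_relax();
  } else {
          /* The v4.19 backport uses might_sleep() where upstream later
           * added a lockdep assertion, then blocks on
           * timer_base::expiry_lock until the callback has completed. */
          might_sleep();
          /* ... take and drop timer_base::expiry_lock ... */
  }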
-
Sebastian Andrzej Siewior authored
This reverts the PREEMPT_RT related changes to workqueue. It reverts the usage of local_locks() and cpu_chill(). This is a preparation to pull in the PREEMPT_RT related changes which were merged upstream. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> [wagi: 827b6f69 ("workqueue: rework") already reverted most of the changes, except the missing update in put_pwq_unlocked.] Signed-off-by:
Daniel Wagner <wagi@monom.org>
-
Sebastian Andrzej Siewior authored
The original code was using INIT_LOCAL_LOCK() and I tried to sneak around it and forgot that this code also needs to compile on !RT platforms. Provide INIT_LOCAL_LOCK() to initialize properly on RT and do nothing on !RT. Let random.c use it, as it is the only user so far and does not compile on !RT otherwise. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/all/YzcEIU17EIZ7ZIF5@linutronix.de/ Signed-off-by:
Daniel Wagner <wagi@monom.org>
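
A sketch of the shape such a macro can take (the member name and the exact 4.19-rt definition are assumptions):

  /* initialize the underlying lock on RT, expand to an empty
   * initializer on !RT where no lock exists */
  #ifdef CONFIG_PREEMPT_RT_FULL
  # define INIT_LOCAL_LOCK(lvar)  { .lock = __SPIN_LOCK_UNLOCKED((lvar).lock) }
  #else
  # define INIT_LOCAL_LOCK(lvar)  { }
  #endif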
-
Sebastian Andrzej Siewior authored
As part of the backports the random code lost its local_lock_t type and the whole operation became a local_irq_{disable|enable}() simply because the older kernel did not provide those primitives. RT as of v4.9 has a slightly different variant of local_locks. Replace the local_irq_*() operations with matching local_lock_irq*() operations which were there as part of commit 77760fd7 ("random: remove batched entropy locking") Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/all/20220819092446.980320-2-bigeasy@linutronix.de/ Signed-off-by:
Daniel Wagner <dwagner@suse.de>
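
An illustrative before/after of that replacement (variable and lock names are placeholders, not the exact random.c hunk):

  static DEFINE_LOCAL_IRQ_LOCK(batched_entropy_u64_lock);

  /* before: plain IRQ disabling, which provides no lock on RT */
  local_irq_save(flags);
  batch = raw_cpu_ptr(&batched_entropy_u64);
  /* ... extract entropy from the per-CPU batch ... */
  local_irq_restore(flags);

  /* after: the v4.19-rt local_lock variant -- IRQs off on !RT, a
   * per-CPU sleeping spinlock on RT */
  local_lock_irqsave(batched_entropy_u64_lock, flags);
  batch = raw_cpu_ptr(&batched_entropy_u64);
  /* ... extract entropy from the per-CPU batch ... */
  local_unlock_irqrestore(batched_entropy_u64_lock, flags);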
-
Sebastian Andrzej Siewior authored
The irq_settings_no_softirq_call() related handling got lost in the process; here are the missing bits. Reported-by:
Martin Kaistra <martin.kaistra@linutronix.de> Fixes: b0cf5c23 ("Merge tag 'v4.19.183' into linux-4.19.y-rt") Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Clark Williams <williams@redhat.com>
-
Sebastian Andrzej Siewior authored
The patch "net: move xmit_recursion to per-task variable on -RT" lost a few hunks during its rebase. Add the `xmit_lock_owner' accessor/wrapper. Reported-by:
Salvatore Bonaccorso <carnil@debian.org> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Clark Williams authored
While doing some 4.19-rt cleanup work, I stumbled across the fact that parts of two backported patches were dependent on CONFIG_PREEMPT_RT, rather than the CONFIG_PREEMPT_RT_FULL used in 4.19 and earlier RT series. The commits in the linux-stable-rt v4.19-rt branch are:

 dad4c6a3 mm: slub: Don't resize the location tracking cache on PREEMPT_RT
 e626b6f8 net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT

Discussing this at the Stable RT maintainers meeting, Steven Rostedt suggested that we automagically select CONFIG_PREEMPT_RT if CONFIG_PREEMPT_RT_FULL is on, giving us a safety net for any subsequently backported patches. Here's my first cut at that patch. I suspect we'll need a similar patch for stable RT kernels < 4.19. Suggested-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Clark Williams <williams@redhat.com>
-
Gregor Beck authored
The original patch, 60266060 ("fscache: initialize cookie hash table raw spinlocks"), subtracted 1 from the shift and so still left some spinlocks uninitialized. This fixes that. [zanussi: Added changelog text] Signed-off-by:
Gregor Beck <gregor.beck@gmail.com> Fixes: 60266060 ("fscache: initialize cookie hash table raw spinlocks") Signed-off-by:
Tom Zanussi <zanussi@kernel.org> (cherry picked from commit 2cdede91) Signed-off-by:
Clark Williams <williams@redhat.com>
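
The shape of the off-by-one and its fix, as a sketch with placeholder names (not the literal fscache hunk):

  /* buggy: the shift minus one covers only half of the 2^shift buckets */
  for (i = 0; i < (1 << (cookie_hash_shift - 1)); i++)
          raw_spin_lock_init(&cookie_hash_locks[i]);

  /* fixed: every bucket lock gets initialized */
  for (i = 0; i < (1 << cookie_hash_shift); i++)
          raw_spin_lock_init(&cookie_hash_locks[i]);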
-
Andrew Halaney authored
There's no chance of sleeping here, the reader is giving up the lock and possibly waking up the writer who is waiting on it. Reported-by:
Chunyu Hu <chuhu@redhat.com> Signed-off-by:
Andrew Halaney <ahalaney@redhat.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> (cherry picked from commit b2ed0a43) Signed-off-by:
Clark Williams <williams@redhat.com>
-
Sebastian Andrzej Siewior authored
The stable tree backported a patch which adds __down_read_interruptible() for the generic rwsem implementation. Add RT's version of __down_read_interruptible(). Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Sebastian Andrzej Siewior authored
The location tracking cache has a size of a page and is resized if its current size is too small. This allocation happens with disabled interrupts and can't happen on PREEMPT_RT. Should one page be too small, then we have to allocate more at the beginning. The only downside is that fewer callers will be visible. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> (cherry picked from commit 87bd0bf3) Signed-off-by:
Clark Williams <williams@redhat.com>
-
Oleg Nesterov authored
[ Upstream commit 0fdc9197 ] The patch "ptrace: fix ptrace vs tasklist_lock race" changed ptrace_freeze_traced() to take task->saved_state into account, but ptrace_unfreeze_traced() has the same problem and needs a similar fix: it should check/update both ->state and ->saved_state. Reported-by:
Luis Claudio R. Goncalves <lgoncalv@redhat.com> Fixes: "ptrace: fix ptrace vs tasklist_lock race" Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
-
Sebastian Andrzej Siewior authored
[ Upstream commit 74858f0d ] The callers expect disabled preemption/interrupts while invoking __mod_memcg_lruvec_state(). This works in mainline because a lock of some kind is acquired. Use preempt_disable_rt() where per-CPU variables are accessed and a stable pointer is expected. This is also done in __mod_zone_page_state() for the same reason. Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Tom Zanussi <zanussi@kernel.org> Conflicts: mm/memcontrol.c
-
Davidlohr Bueso authored
A crash was seen in xfrm when running ltp's 'tcp4_ipsec06' stresser on v4.x based RT kernels. ipcomp_compress() will serialize access to the ipcomp_scratches percpu buffer by disabling BH and preventing a softirq from coming in and running ipcomp_decompress(), which is never called from process context. This of course won't work on RT and the buffer can get corrupted; there have been similar issues in the past with such assumptions, e.g. ebf255ed ("net: add back the missing serialization in ip_send_unicast_reply()"). Similarly, this patch addresses the issue with local locks, allowing RT to have a per-CPU spinlock and do the correct serialization. Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
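
A hedged sketch of that local-lock serialization (the lock name is illustrative and the old 4.19-rt locallock API is assumed):

  /* On !RT this adds nothing beyond the existing BH disabling; on RT it
   * becomes a per-CPU spinlock, so a preempting ipcomp_decompress() cannot
   * corrupt the scratch buffer that ipcomp_compress() is using. */
  static DEFINE_LOCAL_IRQ_LOCK(ipcomp_scratches_lock);

  local_lock(ipcomp_scratches_lock);
  scratch = *this_cpu_ptr(ipcomp_scratches);
  /* ... (de)compress via the per-CPU scratch buffer ... */
  local_unlock(ipcomp_scratches_lock);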
-
Ahmed S. Darwish authored
[ Upstream commit 6554eac9 ] Commit bf7afb29 ("phy: improve safety of fixed-phy MII register reading") protected the fixed PHY status with a sequence counter. Two years later, commit d2b97793 ("net: phy: fixed-phy: remove fixed_phy_update_state()") removed the sequence counter's write side critical section -- neutralizing its read side retry loop. Remove the unused seqcount. Signed-off-by:
Ahmed S. Darwish <a.darwish@linutronix.de> Reviewed-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net> (cherry picked from v5.8-rc1 commit 79cbb6bc) Signed-off-by:
Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
-
Sebastian Andrzej Siewior authored
[ Upstream commit e6da0edc ] There was a lockdep warning which led to commit fad003b6 ("Bluetooth: Fix inconsistent lock state with RFCOMM"). Lockdep noticed that `sk->sk_lock.slock' was acquired without disabling the softirq while the lock was also used in softirq context. Unfortunately the solution back then was to disable interrupts before acquiring the lock, which made lockdep happy but was more than needed: it would have been enough to simply disable the softirq. Disabling interrupts before acquiring a spinlock_t is not allowed on PREEMPT_RT because these locks are converted to 'sleeping' spinlocks. Use spin_lock_bh() in order to acquire the `sk_lock.slock'. Reported-by:
Luis Claudio R. Goncalves <lclaudio@uudg.org> Reported-by: kbuild test robot <lkp@intel.com> [missing unlock] Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
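
An illustrative before/after of the locking change (a sketch following the commit description, not the exact hunk):

  /* before: interrupts disabled around a spinlock_t -- forbidden on RT */
  spin_lock_irqsave(&sk->sk_lock.slock, flags);
  /* ... */
  spin_unlock_irqrestore(&sk->sk_lock.slock, flags);

  /* after: disabling bottom halves is sufficient and RT-compatible */
  spin_lock_bh(&sk->sk_lock.slock);
  /* ... */
  spin_unlock_bh(&sk->sk_lock.slock);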
-
Matt Fleming authored
[ Upstream commit 9567db2e ] The way user struct reference counting works changed significantly with commit fda31c50 ("signal: avoid double atomic counter increments for user accounting"). Now user structs are only freed once the last pending signal is dequeued. Make sigqueue_free_current() follow this new convention to avoid freeing the user struct multiple times and triggering this warning:

 refcount_t: underflow; use-after-free.
 WARNING: CPU: 0 PID: 6794 at lib/refcount.c:288 refcount_dec_not_one+0x45/0x50
 Call Trace:
  refcount_dec_and_lock_irqsave+0x16/0x60
  free_uid+0x31/0xa0
  __dequeue_signal+0x17c/0x190
  dequeue_signal+0x5a/0x1b0
  do_sigtimedwait+0x208/0x250
  __x64_sys_rt_sigtimedwait+0x6f/0xd0
  do_syscall_64+0x72/0x200
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by:
Matt Fleming <matt@codeblueprint.co.uk> Reported-by:
Daniel Wagner <wagi@monom.org> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
-
Tom Zanussi authored
commit 62d0a2a3 (tasklet: Address a race resulting in double-enqueue) addresses a problem that can result in a tasklet being enqueued on two cpus at the same time by combining the RUN flag with a new CHAINED flag, and relies on the combination to be present in order to zero it out, which can never happen on (!SMP and !PREEMPT_RT_FULL) because the RUN flag is SMP/PREEMPT_RT_FULL-only. So make sure the above commit is only applied for the SMP || PREEMPT_RT_FULL case. Fixes: 62d0a2a3 ("tasklet: Address a race resulting in double-enqueue") Signed-off-by:
Tom Zanussi <zanussi@kernel.org> Reported-by:
Ramon Fried <rfried.dev@gmail.com> Tested-By:
Ramon Fried <rfried.dev@gmail.com>
-
Kevin Hao authored
[ Upstream commit 23a2c31b ] After commit f0b23110 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions"), when free_slab() is invoked with IRQs disabled, the empty slubs are moved to a per-CPU list and are freed later once IRQs are enabled. But in the current code there is a check to see if there really is a cpu slub on a specific cpu before flushing the delayed empty slubs, and this may cause a reference to an already released kmem_cache in a scenario like below:

    cpu 0                           cpu 1
  kmem_cache_destroy()
    flush_all()
      --->IPI                     flush_cpu_slab()
                                    flush_slab()
                                      deactivate_slab()
                                        discard_slab()
                                          free_slab()
                                      c->page = NULL;
      for_each_online_cpu(cpu)
        if (!has_cpu_slab(1, s))
          continue
        this skips flushing the delayed
        empty slub released by cpu1
    kmem_cache_free(kmem_cache, s)

                                  kmalloc()
                                    __slab_alloc()
                                      free_delayed()
                                        __free_slab()
                                          reference to released kmem_cache

Fixes: f0b23110 ("mm/SLUB: delay giving back empty slubs to IRQ enabled regions") Signed-off-by:
Kevin Hao <haokexin@gmail.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
-
Sebastian Andrzej Siewior authored
[ Upstream commit 279f90dd ] Include the swait.h header so it compiles even if not all patches are applied. Reported-by:
kbuild test robot <lkp@intel.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Tom Zanussi <zanussi@kernel.org> Conflicts: fs/proc/base.c
-
Rasmus Villemoes authored
Commit "hrtimer: Add a missing bracket and hide `migration_base' on !SMP", which is 47b6de0b in 5.2-rt and 40aae570 in 4.19-rt, inadvertently changed the logic from base != &migration_base to base == &migration_base. On !CONFIG_SMP, the effect was to effectively always elide this lock/unlock pair (since is_migration_base() is unconditionally false), which for me consistently causes lockups during reboot, and reportedly also often causes a hang during boot. Adding this logical negation (or, what is effectively the same thing on !CONFIG_SMP, reverting the above commit as well as "hrtimer: Prevent using hrtimer_grab_expiry_lock() on migration_base") fixes that lockup. Fixes: 40aae570 (hrtimer: Add a missing bracket and hide `migration_base' on !SMP) # 4.19-rt Fixes: 47b6de0b (hrtimer: Add a missing bracket and hide `migration_base' on !SMP) # 5.2-rt Signed-off-by:
Rasmus Villemoes <rasmus.villemoes@prevas.dk> Reviewed-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
-
Zhang Xiao authored
The kernel bugzilla has the following race condition reported:

 CPU0                      CPU1                      CPU2
 ------------------------------------------------
 test_set SCHED
                           test_set RUN
                           if SCHED
                           add_list
                           raise
                           clear RUN
 <softirq>
 test_set RUN
 test_clear SCHED
 ->func
                                                     test_set SCHED
 tasklet_try_unlock ->0
 test_clear SCHED
                                                     test_set SCHED
 ->func
 tasklet_try_unlock ->1
                           test_set RUN
                           if SCHED
                           add list
                           raise
                           clear RUN
                                                     test_set RUN
                                                     if SCHED
                                                     add list
                                                     raise
                                                     clear RUN

As a result the tasklet is enqueued on both CPUs and run on both CPUs. Due to the nature of the list used here, it is possible that further (different) tasklets, which are enqueued after this double-enqueued tasklet, are scheduled on CPU2 but invoked on CPU1. It is also possible that these tasklets won't be invoked at all, because during the second enqueue process the t->next pointer is set to NULL - dropping everything from the list.

This race will trigger one or two of the WARN_ON() in tasklet_action_common(). The problem is that the tasklet may be invoked multiple times and clear SCHED bit on each invocation. This makes it possible to enqueue the very same tasklet on different CPUs.

Current RT-devel is using the upstream implementation which does not re-run tasklets if they have SCHED set again and so it does not clear the SCHED bit multiple times on a single invocation.

Introduce the CHAINED flag. The tasklet will only be enqueued if the CHAINED flag has been set successfully. If it is possible to exchange the flags (CHAINED | RUN) -> 0 then the tasklet won't be re-run. Otherwise the possible SCHED flag is removed and the tasklet is re-run again.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61451 Not-signed-off-by:
Zhang Xiao <xiao.zhang@windriver.com> [bigeasy: patch description] Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Tom Zanussi <zanussi@kernel.org>
-
Steven Rostedt (VMware) authored
When CONFIG_PREEMPT_RT_FULL is not set, some of the checks for using lazy_list are not properly made as IRQ_WORK_LAZY is not checked. There are two locations that need this update, so a use_lazy_list() helper function is added and used in both locations. Link: https://lore.kernel.org/r/20200321230028.GA22058@duo.ucw.cz Reported-by:
Pavel Machek <pavel@denx.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
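
A sketch of such a helper (the exact condition in the patch may differ slightly):

  /* decide once whether a work item goes on the per-CPU lazy list */
  static inline bool use_lazy_list(struct irq_work *work)
  {
          return (IS_ENABLED(CONFIG_PREEMPT_RT_FULL) &&
                  !(work->flags & IRQ_WORK_HARD_IRQ)) ||
                 (work->flags & IRQ_WORK_LAZY);
  }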
-
Tiejun Chen authored
Fails to build with CONFIG_UBSAN=y

 lib/ubsan.c: In function '__ubsan_handle_vla_bound_not_positive':
 lib/ubsan.c:348:2: error: too many arguments to function 'ubsan_prologue'
   ubsan_prologue(&data->location, &flags);
   ^~~~~~~~~~~~~~
 lib/ubsan.c:146:13: note: declared here
  static void ubsan_prologue(struct source_location *location)
              ^~~~~~~~~~~~~~
 lib/ubsan.c:353:2: error: too many arguments to function 'ubsan_epilogue'
   ubsan_epilogue(&flags);
   ^~~~~~~~~~~~~~
 lib/ubsan.c:155:13: note: declared here
  static void ubsan_epilogue(void)
              ^~~~~~~~~~~~~~

Signed-off-by:
Tiejun Chen <tiejunc@vmware.com> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
[ Upstream commit dd430bf5 ] The migrate_disable counter should not exceed 255 so it is enough to store it in an 8bit field. With this change we can move the `preempt_lazy_count' member into the gap so the whole struct shrinks by 4 bytes to 12 bytes in total. Remove the `padding' field, it is not needed. Update the tracing fields in trace_define_common_fields() (it was missing the preempt_lazy_count field). Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
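
A sketch of the resulting record layout (field order and types are illustrative; the sizes follow the commit message):

  struct trace_entry {
          unsigned short  type;
          unsigned char   flags;
          unsigned char   preempt_count;
          int             pid;
          unsigned char   migrate_disable;      /* counter never exceeds 255 */
          unsigned char   preempt_lazy_count;   /* moved into the gap, padding dropped */
  };                                            /* 12 bytes total instead of 16 */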
-
Sebastian Andrzej Siewior authored
[ Upstream commit b901491e ] vmw_fifo_ping_host() disables preemption around a test and a register write via vmw_write(). The write function acquires a spinlock_t typed lock which is not allowed in a preempt_disable()ed section on PREEMPT_RT. This has been reported in the bugzilla. It has been explained by Thomas Hellstrom that this preempt_disable()ed section is not required for correctness. Remove the preempt_disable() section. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206591 Link: https://lkml.kernel.org/r/0b5e1c65d89951de993deab06d1d197b40fd67aa.camel@vmware.com Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
[ Upstream commit e693075a ] Include the header for `current' macro so that CONFIG_KERNEL_HEADER_TEST=y passes. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Matt Fleming authored
[ Upstream commit 071a1d6a ] The comment about local_lock_irqsave() mentions just the counters, and css_put_many()'s callback just invokes a worker, so it is safe to move the unlock function after memcg_check_events() so that css_put_many() can be invoked without the lock acquired. Cc: Daniel Wagner <wagi@monom.org> Signed-off-by:
Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> [bigeasy: rewrote the patch description] Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-
Scott Wood authored
[ Upstream commit b8162e61 ] We can rely on preempt_enable() to schedule. Besides simplifying the code, this potentially allows sequences such as the following to be permitted:

 migrate_disable();
 preempt_disable();
 migrate_enable();
 preempt_enable();

Suggested-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Scott Wood <swood@redhat.com> Reviewed-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Scott Wood authored
[ Upstream commit 2dcd94b4 ] Commit e6c287b1 ("sched: migrate_enable: Use stop_one_cpu_nowait()") adds a busy wait to deal with an edge case where the migrated thread can resume running on another CPU before the stopper has consumed cpu_stop_work. However, this is done with preemption disabled and can potentially lead to deadlock. While it is not guaranteed that the cpu_stop_work will be consumed before the migrating thread resumes and exits the stack frame, it is guaranteed that nothing other than the stopper can run on the old cpu between the migrating thread scheduling out and the cpu_stop_work being consumed. Thus, we can store cpu_stop_work in per-cpu data without it being reused too early. Fixes: e6c287b1 ("sched: migrate_enable: Use stop_one_cpu_nowait()") Suggested-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Scott Wood <swood@redhat.com> Reviewed-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-
Sebastian Andrzej Siewior authored
[ Upstream commit dc952a56 ] On RT write_seqcount_begin() disables preemption, which leads to a warning in add_wait_queue() while the spinlock_t is acquired. The waitqueue can't be converted to swait_queue because userfaultfd_wake_function() is used as a custom wake function. Use a seqlock instead of a seqcount to avoid the preempt_disable() section during add_wait_queue(). Cc: stable-rt@vger.kernel.org Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
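
An illustrative write-side comparison (a sketch; refile_seq follows fs/userfaultfd.c, but the hunk shown is not the literal patch):

  /* before: on RT write_seqcount_begin() disables preemption, which then
   * trips over the sleeping spinlock taken inside add_wait_queue() */
  write_seqcount_begin(&ctx->refile_seq);
  /* ... requeue the fault ... */
  write_seqcount_end(&ctx->refile_seq);

  /* after: a seqlock serializes writers with its own spinlock and keeps
   * preemption enabled on RT */
  write_seqlock(&ctx->refile_seq);
  /* ... requeue the fault ... */
  write_sequnlock(&ctx->refile_seq);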
-
Sebastian Andrzej Siewior authored
[ Upstream commit 140d7f54 ] If a user task changes the CPU affinity mask of a running task it will dispatch a migration request if the current CPU is no longer allowed. This might happen shortly before a task enters a migrate_disable() section. Upon leaving the migrate_disable() section, the task will notice that the current CPU is no longer allowed and will dispatch its own migration request to move it off the current CPU. While invoking __schedule() the first migration request will be processed and the task returns on the "new" CPU with "arg.done = 0". Its own migration request will be processed shortly after and will result in memory corruption if the stack memory used for the request was reused for something else in the meantime. Spin until the migration request has been processed if it was accepted. Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
-