This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git.
  1. Aug 29, 2022
    • Clark Williams's avatar
      sysfs: Add /sys/kernel/realtime entry · 518bacec
      Clark Williams authored
      
      Add a /sys/kernel entry to indicate that the kernel is a
      realtime kernel.
      
      Clark says that he needs this for udev rules: udev needs to evaluate
      whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
      output is too slow or so.
      
      Are there better solutions? Should it exist and return 0 on !-rt?
      
      Signed-off-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      518bacec
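
      A minimal sketch of what such a /sys/kernel/realtime entry could look
      like in kernel/ksysfs.c, assuming the usual kobj_attribute pattern (this
      is a sketch, not necessarily the exact patch; whether the file should
      return 0 on !RT or not exist at all is the open question above):

          #include <linux/init.h>
          #include <linux/kernel.h>
          #include <linux/kobject.h>
          #include <linux/sysfs.h>

          static ssize_t realtime_show(struct kobject *kobj,
                                       struct kobj_attribute *attr, char *buf)
          {
                  /* 1 on a PREEMPT_RT kernel, 0 otherwise */
                  return sprintf(buf, "%d\n", IS_ENABLED(CONFIG_PREEMPT_RT) ? 1 : 0);
          }
          static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

          static int __init realtime_sysfs_init(void)
          {
                  /* kernel_kobj is /sys/kernel, so this creates /sys/kernel/realtime */
                  return sysfs_create_file(kernel_kobj, &realtime_attr.attr);
          }
          late_initcall(realtime_sysfs_init);

      A udev rule can then test the file's content directly instead of parsing
      uname output on every event.
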
    • Sebastian Andrzej Siewior's avatar
      POWERPC: Allow to enable RT · bc37f67e
      Sebastian Andrzej Siewior authored
      
      Allow selecting RT.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      bc37f67e
    • Sebastian Andrzej Siewior's avatar
      powerpc/stackprotector: work around stack-guard init from atomic · 5642a355
      Sebastian Andrzej Siewior authored
      
      This is invoked from the secondary CPU in atomic context. On x86 we use
      the TSC instead. On Power we XOR it against mftb(), so let's use the stack
      address as the initial value.
      
      Cc: stable-rt@vger.kernel.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      5642a355
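
      A hedged sketch of the idea for arch/powerpc/include/asm/stackprotector.h
      (helper names such as get_random_canary() and the exact #ifdef placement
      are assumptions; only the "stack address as initial value on RT" part is
      the point being illustrated):

          static __always_inline void boot_init_stack_canary(void)
          {
                  unsigned long canary;

          #ifndef CONFIG_PREEMPT_RT
                  canary = get_random_canary();     /* problematic from the atomic
                                                       secondary-CPU bringup path */
          #else
                  canary = (unsigned long)&canary;  /* stack address as initial value */
          #endif
                  canary ^= mftb();                 /* timebase, as before */
                  canary ^= LINUX_VERSION_CODE;

                  current->stack_canary = canary;
          }
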
    • Bogdan Purcareata's avatar
      powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT · 8fed5560
      Bogdan Purcareata authored
      
      While converting the openpic emulation code to use a raw_spinlock_t enables
      guests to run on RT, there's still a performance issue. For interrupts sent in
      directed delivery mode with a multiple-CPU mask, the emulated openpic will loop
      through all of the VCPUs, and for each VCPU it calls IRQ_check, which will loop
      through all the pending interrupts for that VCPU. This is done while holding the
      raw_lock, meaning that for all this time interrupts and preemption are
      disabled on the host Linux. A malicious user app can max out both of these
      numbers and cause a DoS.
      
      This temporary fix is sent for two reasons. The first is so that users who want
      to use the in-kernel MPIC emulation are aware of the potential latencies, thus
      making sure that the hardware MPIC and their usage scenario do not involve
      interrupts sent in directed delivery mode, and that the number of possible pending
      interrupts is kept small. Secondly, this should incentivize the development of a
      proper openpic emulation that would be better suited for RT.
      
      Acked-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      8fed5560
    • Sebastian Andrzej Siewior's avatar
      powerpc/pseries/iommu: Use a locallock instead of local_irq_save() · c44d9b81
      Sebastian Andrzej Siewior authored
      
      The locallock protects the per-CPU variable tce_page. The function
      attempts to allocate memory while tce_page is protected (by disabling
      interrupts).
      
      Use local_irq_save() instead of local_irq_disable().
      
      Cc: stable-rt@vger.kernel.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      c44d9b81
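
      A rough sketch of the locallock pattern described above, using the
      mainline local_lock API (the lock name and the surrounding function are
      illustrative, not the literal patch):

          #include <linux/local_lock.h>
          #include <linux/percpu.h>
          #include <linux/types.h>

          static DEFINE_PER_CPU(__be64 *, tce_page);
          static DEFINE_PER_CPU(local_lock_t, tce_page_lock) =
                  INIT_LOCAL_LOCK(tce_page_lock);

          static void build_tces_sketch(void)
          {
                  unsigned long flags;
                  __be64 *tcep;

                  /* a per-CPU lock on PREEMPT_RT (preemptible, so allocations in
                   * the protected section are fine), plain IRQ-off on !RT */
                  local_lock_irqsave(&tce_page_lock, flags);
                  tcep = __this_cpu_read(tce_page);
                  /* ... allocate tce_page if needed and fill the TCE table ... */
                  local_unlock_irqrestore(&tce_page_lock, flags);
          }
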
    • Sebastian Andrzej Siewior's avatar
      powerpc: traps: Use PREEMPT_RT · 7ab753cc
      Sebastian Andrzej Siewior authored
      
      Add PREEMPT_RT to the backtrace if enabled.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      7ab753cc
    • Sebastian Andrzej Siewior's avatar
      ARM64: Allow to enable RT · 4c972dfd
      Sebastian Andrzej Siewior authored
      
      Allow selecting RT.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      4c972dfd
    • Sebastian Andrzej Siewior's avatar
      ARM: Allow to enable RT · fb827876
      Sebastian Andrzej Siewior authored
      
      Allow selecting RT.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      fb827876
    • Thomas Gleixner's avatar
      tty/serial/pl011: Make the locking work on RT · 0f0d848b
      Thomas Gleixner authored
      
      The lock is a sleeping lock and local_irq_save() is not the optimisation
      we are looking for. Redo it to make it work on -RT and non-RT.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      0f0d848b
    • Thomas Gleixner's avatar
      tty/serial/omap: Make the locking RT aware · a07cc75b
      Thomas Gleixner authored
      
      The lock is a sleeping lock and local_irq_save() is not the
      optimisation we are looking for. Redo it to make it work on -RT and
      non-RT.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      a07cc75b
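
      Both serial console fixes above follow the same pattern; a simplified
      sketch of a console write callback (not the exact pl011/omap driver
      code):

          #include <linux/printk.h>
          #include <linux/serial_core.h>
          #include <linux/spinlock.h>

          static void serial_console_write_sketch(struct uart_port *port,
                                                  const char *s, unsigned int count)
          {
                  unsigned long flags;
                  int locked = 1;

                  /* before: local_irq_save(flags) + spin_lock(&port->lock), which
                   * breaks when the lock is a sleeping lock on -RT */
                  if (port->sysrq)
                          locked = 0;
                  else if (oops_in_progress)
                          locked = spin_trylock_irqsave(&port->lock, flags);
                  else
                          spin_lock_irqsave(&port->lock, flags);

                  /* ... emit the characters ... */

                  if (locked)
                          spin_unlock_irqrestore(&port->lock, flags);
          }
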
    • Yadi.hu's avatar
      ARM: enable irq in translation/section permission fault handlers · 87fcc844
      Yadi.hu authored
      
      Probably happens on all ARM, with
      CONFIG_PREEMPT_RT
      CONFIG_DEBUG_ATOMIC_SLEEP
      
      This simple program....
      
      int main() {
         *((char*)0xc0001000) = 0;
      };
      
      [ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
      [ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
      [ 512.743217] INFO: lockdep is turned off.
      [ 512.743360] irq event stamp: 0
      [ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
      [ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
      [ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
      [ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
      [ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
      [ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
      [ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
      [ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
      [ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
      [ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
      [ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
      [ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
      [ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
      [ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
      [ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
      
      0xc0000000 belongs to the kernel address space; a user task cannot be
      allowed to access it. For the above condition, the correct result is that the
      test case should receive a "segmentation fault" and exit, not produce the splat above.
      
      The root cause is commit 02fe2845 ("avoid enabling interrupts in
      prefetch/data abort handlers"): it deletes the irq enable block in the data
      abort assembly code and moves it into the page/breakpoint/alignment fault
      handlers instead, but does not enable irqs in the translation/section
      permission fault handlers. ARM disables irqs when it enters exception/
      interrupt mode, so if the kernel doesn't enable them, they remain disabled
      during a translation/section permission fault.
      
      We see the above splat because do_force_sig_info is still called with
      IRQs off, and that code eventually does a:
      
              spin_lock_irqsave(&t->sighand->siglock, flags);
      
      As this is architecture independent code, and we've not seen any other
      need for another arch to have the siglock converted to a raw lock, we can
      conclude that we should enable irqs for the ARM translation/section
      permission exceptions.
      
      
      Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      87fcc844
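
      A sketch of the resulting handler shape (based on the description above;
      the real code lives in arch/arm/mm/fault.c, and the translation fault
      handler gets the same treatment):

          static int
          do_sect_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
          {
                  /* re-enable IRQs if the interrupted context had them enabled, so
                   * that the signal delivery path may take sighand->siglock (a
                   * sleeping lock on PREEMPT_RT) */
                  if (interrupts_enabled(regs))
                          local_irq_enable();

                  do_bad_area(addr, fsr, regs);
                  return 0;
          }
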
    • Thomas Gleixner's avatar
      arm: Disable jump-label on PREEMPT_RT. · 0f5fc742
      Thomas Gleixner authored
      
      jump-labels are used to efficiently switch between two possible code
      paths. To achieve this, stop_machine() is used to keep the CPU in a
      known state while the opcode is modified. The usage of stop_machine()
      here leads to large latency spikes which can be observed on PREEMPT_RT.
      
      Jump labels may change the target during runtime and are not restricted
      to the debug or "configuration/setup" part of a PREEMPT_RT system where
      high latencies could be defined as acceptable.
      
      Disable jump-label support on a PREEMPT_RT system.
      
      [bigeasy: Patch description.]
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/20220613182447.112191-2-bigeasy@linutronix.de
      0f5fc742
    • Anders Roxell's avatar
      arch/arm64: Add lazy preempt support · 084d4ab1
      Anders Roxell authored
      
      arm64 is missing support for PREEMPT_RT. The main feature which is
      lacking is support for lazy preemption. The arch-specific entry code,
      thread information structure definitions, and associated data tables
      have to be extended to provide this support. Then the Kconfig file has
      to be extended to indicate the support is available, and also to
      indicate that support for full RT preemption is now available.
      
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      084d4ab1
    • Thomas Gleixner's avatar
      powerpc: Add support for lazy preemption · 05e4861a
      Thomas Gleixner authored
      
      Implement the powerpc pieces for lazy preempt.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      05e4861a
    • Thomas Gleixner's avatar
      arm: Add support for lazy preemption · 47408959
      Thomas Gleixner authored
      
      Implement the arm pieces for lazy preempt.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      47408959
    • Thomas Gleixner's avatar
      entry: Fix the preempt lazy fallout · fdc638c2
      Thomas Gleixner authored
      
      Common code needs common defines....
      
      Fixes: f2f9e496 ("x86: Support for lazy preemption")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      fdc638c2
    • Thomas Gleixner's avatar
      x86: Support for lazy preemption · 32f21ed2
      Thomas Gleixner authored
      
      Implement the x86 pieces for lazy preempt.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      32f21ed2
    • Sebastian Andrzej Siewior's avatar
      x86/entry: Use should_resched() in idtentry_exit_cond_resched() · b0ec1bab
      Sebastian Andrzej Siewior authored
      
      On x86 the TIF_NEED_RESCHED bit is folded into the preemption counter.
      By using should_resched(0) instead of need_resched(), the same check can
      be performed against the same variable as the preempt_count() test which
      was issued just before.
      
      Use should_resched(0) instead of need_resched().
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      b0ec1bab
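
      Roughly, the change amounts to the following (a sketch of the pattern,
      not the literal idtentry_exit_cond_resched() diff):

          #include <linux/preempt.h>

          static void exit_cond_resched_sketch(void)
          {
                  if (!preempt_count()) {
                          /* before: if (need_resched()) -- reads TIF_NEED_RESCHED.
                           * after: should_resched(0) checks the preemption counter,
                           * into which x86 folds the need-resched bit, so the same
                           * variable as the preempt_count() test above is used. */
                          if (should_resched(0))
                                  preempt_schedule_irq();
                  }
          }
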
    • Thomas Gleixner's avatar
      sched: Add support for lazy preemption · 2d1febd7
      Thomas Gleixner authored
      
      It has become an obsession to mitigate the determinism vs. throughput
      loss of RT. Looking at the mainline semantics of preemption points
      gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
      tasks. One major issue is the wakeup of tasks which are right away
      preempting the waking task while the waking task holds a lock on which
      the woken task will block right after having preempted the waker. In
      mainline this is prevented due to the implicit preemption disable of
      spin/rw_lock held regions. On RT this is not possible due to the fully
      preemptible nature of sleeping spinlocks.
      
      Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
      is really not a correctness issue. RT folks are concerned about
      SCHED_FIFO/RR tasks preemption and not about the purely fairness
      driven SCHED_OTHER preemption latencies.
      
      So I introduced a lazy preemption mechanism which only applies to
      SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
      existing preempt_count each task now sports a preempt_lazy_count
      which is manipulated on lock acquisition and release. This is slightly
      incorrect as for laziness reasons I coupled this to
      migrate_disable/enable so some other mechanisms get the same treatment
      (e.g. get_cpu_light).
      
      Now on the scheduler side instead of setting NEED_RESCHED this sets
      NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
      therefore allows the waking task to exit the lock-held region before
      the woken task preempts it. That also works better for cross-CPU wakeups
      as the other side can stay in the adaptive spinning loop.
      
      For RT class preemption there is no change. This simply sets
      NEED_RESCHED and forgoes the lazy preemption counter.
      
      Initial tests do not expose any observable latency increase, but
      history shows that I've been proven wrong before :)
      
      The lazy preemption mode is on by default, but with
      CONFIG_SCHED_DEBUG enabled it can be disabled via:
      
       # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
      
      and reenabled via
      
       # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
      
      The test results so far are very machine and workload dependent, but
      there is a clear trend that it enhances the non RT workload
      performance.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      2d1febd7
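
      A hypothetical sketch of the wakeup-time decision described above
      (TIF_NEED_RESCHED_LAZY exists in the RT tree, but this function, its name
      and its exact placement in kernel/sched/core.c are illustrative only):

          static void resched_curr_maybe_lazy(struct rq *rq)
          {
                  struct task_struct *curr = rq->curr;

                  /* SCHED_OTHER preempting SCHED_OTHER: defer via the lazy flag so
                   * the waking task can leave its lock-held region first */
                  if (sched_feat(PREEMPT_LAZY) && curr->policy == SCHED_NORMAL) {
                          set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
                          return;
                  }

                  /* RT class preemption: behave exactly as before */
                  resched_curr(rq);
          }
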
    • Sebastian Andrzej Siewior's avatar
      Revert "drm/i915: Depend on !PREEMPT_RT." · 7951c15a
      Sebastian Andrzej Siewior authored
      
      Once the known issues are addressed, it should be safe to enable the
      driver.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      7951c15a
    • Sebastian Andrzej Siewior's avatar
      drm/i915: Drop the irqs_disabled() check · 82361798
      Sebastian Andrzej Siewior authored
      
      The !irqs_disabled() check triggers on PREEMPT_RT even with
      i915_sched_engine::lock acquired. The reason is the lock is transformed
      into a sleeping lock on PREEMPT_RT and does not disable interrupts.
      
      There is no need to check for disabled interrupts. The lockdep
      annotation below already checks if the lock has been acquired by the
      caller and will yell if the interrupts are not disabled.
      
      Remove the !irqs_disabled() check.
      
      Reported-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      82361798
    • Sebastian Andrzej Siewior's avatar
      drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock() · 1815c275
      Sebastian Andrzej Siewior authored
      
      execlists_dequeue() is invoked from a function which uses
      local_irq_disable() to disable interrupts so the spin_lock() behaves
      like spin_lock_irq().
      This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
      the same as spin_lock_irq().
      
      execlists_dequeue_irq() and execlists_dequeue() each have only one
      caller. If intel_engine_cs::active::lock is acquired and released with the
      _irq suffix then it behaves almost as if execlists_dequeue() were
      invoked with disabled interrupts. The difference is the last part of the
      function, which is then invoked with enabled interrupts.
      I can't tell if this makes a difference. From looking at it, it might
      work to move the last unlock to the end of the function as I didn't find
      anything that would acquire the lock again.
      
      Reported-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      1815c275
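
      Schematically, the change boils down to the following (a generic sketch,
      not the i915 functions themselves):

          #include <linux/spinlock.h>

          /* before: hand-rolled IRQ disable around a plain spin_lock(); on
           * PREEMPT_RT the spinlock is a sleeping lock, so this is not
           * equivalent to spin_lock_irq() */
          static void dequeue_old(spinlock_t *lock)
          {
                  local_irq_disable();
                  spin_lock(lock);
                  /* ... process the submission queue ... */
                  spin_unlock(lock);
                  local_irq_enable();
          }

          /* after: let the lock primitive manage the interrupt state */
          static void dequeue_new(spinlock_t *lock)
          {
                  spin_lock_irq(lock);
                  /* ... process the submission queue ... */
                  spin_unlock_irq(lock);
          }
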
    • Sebastian Andrzej Siewior's avatar
      drm/i915/gt: Queue and wait for the irq_work item. · ebeec035
      Sebastian Andrzej Siewior authored
      
      Disabling interrupts and invoking the irq_work function directly breaks
      on PREEMPT_RT.
      PREEMPT_RT does not invoke all irq_work from hardirq context because
      some of the users have spinlock_t locking in the callback function.
      These locks are then turned into sleeping locks which cannot be
      acquired with disabled interrupts.
      
      Using irq_work_queue() has the benefit that the irqwork will be invoked
      in the regular context. In general there is "no" delay between enqueuing
      the callback and its invocation because the interrupt is raised right
      away on architectures which support it (which includes x86).
      
      Use irq_work_queue() + irq_work_sync() instead of invoking the callback
      directly.
      
      Reported-by: Clark Williams <williams@redhat.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      ebeec035
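
      The pattern, as a sketch (the work item and callback are placeholders,
      not the actual i915 names):

          #include <linux/irq_work.h>

          static void heartbeat_work_fn(struct irq_work *wrk)
          {
                  /* may take spinlock_t locks, which are sleeping locks on
                   * PREEMPT_RT, so it must not run with interrupts disabled */
          }

          static DEFINE_IRQ_WORK(heartbeat_work, heartbeat_work_fn);

          static void trigger_and_wait(void)
          {
                  /* before (breaks on PREEMPT_RT):
                   *      local_irq_disable();
                   *      heartbeat_work_fn(&heartbeat_work);
                   *      local_irq_enable();
                   */
                  irq_work_queue(&heartbeat_work); /* raised right away on x86 */
                  irq_work_sync(&heartbeat_work);  /* wait for the callback */
          }
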
    • Sebastian Andrzej Siewior's avatar
      drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE · f80c7f87
      Sebastian Andrzej Siewior authored
      
      The order of the header files is important. If this header file is
      included after tracepoint.h was included then the NOTRACE here becomes a
      nop. Currently this happens for two .c files which use the tracepoints
      behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f80c7f87
    • Sebastian Andrzej Siewior's avatar
      drm/i915: Disable tracing points on PREEMPT_RT · dfbb4baa
      Sebastian Andrzej Siewior authored
      
      Luca Abeni reported this:
      | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
      | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
      | Call Trace:
      |  rt_spin_lock+0x3f/0x50
      |  gen6_read32+0x45/0x1d0 [i915]
      |  g4x_get_vblank_counter+0x36/0x40 [i915]
      |  trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
      
      The tracing events, trace_i915_pipe_update_start() among others, use
      functions which acquire spinlock_t locks that are transformed into
      sleeping locks on PREEMPT_RT. A few trace points use
      intel_get_crtc_scanline(), others use ->get_vblank_counter(), which also
      might acquire sleeping locks on PREEMPT_RT.
      At the time the arguments are evaluated within a trace point, preemption
      is disabled and so the locks must not be acquired on PREEMPT_RT.
      
      Based on this I don't see any other way than to disable the trace points
      on PREEMPT_RT.
      
      Reported-by: Luca Abeni <lucabe72@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      dfbb4baa
    • Sebastian Andrzej Siewior's avatar
      drm/i915: Don't check for atomic context on PREEMPT_RT · ebed9177
      Sebastian Andrzej Siewior authored
      
      The !in_atomic() check in _wait_for_atomic() triggers on PREEMPT_RT
      because the uncore::lock is a spinlock_t and does not disable
      preemption or interrupts.
      
      Changing the uncore::lock to a raw_spinlock_t doubles the worst case
      latency on an otherwise idle testbox during testing. Therefore I'm
      currently unsure about changing this.
      
      Link: https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfyf7g@linutronix.de/
      
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      ebed9177
    • Mike Galbraith's avatar
      drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates · 44083240
      Mike Galbraith authored
      
      Commit
         8d7849db ("drm/i915: Make sprite updates atomic")
      
      started disabling interrupts across atomic updates. This breaks on PREEMPT_RT
      because within this section the code attempts to acquire spinlock_t locks which
      are sleeping locks on PREEMPT_RT.
      
      According to the comment the interrupts are disabled to avoid random delays and
      are not required for protection or synchronisation.
      If this needs to happen with disabled interrupts on PREEMPT_RT, and the
      whole section is restricted to register access, then all sleeping locks
      need to be acquired before interrupts are disabled and some functions
      may be moved after enabling interrupts again.
      This includes:
      - prepare_to_wait() + finish_wait() due to its wake queue.
      - drm_crtc_vblank_put() -> vblank_disable_fn() drm_device::vbl_lock.
      - skl_pfit_enable(), intel_update_plane(), vlv_atomic_update_fifo() and
        maybe others due to intel_uncore::lock
      - drm_crtc_arm_vblank_event() due to drm_device::event_lock and
        drm_device::vblank_time_lock.
      
      Don't disable interrupts on PREEMPT_RT during atomic updates.
      
      [bigeasy: drop local locks, commit message]
      
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      44083240
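
      The resulting shape, sketched (function name and section contents are
      abbreviated, not the actual i915 update path):

          #include <linux/irqflags.h>
          #include <linux/kernel.h>

          static void pipe_update_sketch(void)
          {
                  /* the IRQ-off section only keeps the update short; it is not
                   * needed for correctness, and on PREEMPT_RT the section takes
                   * sleeping spinlock_t locks, so skip it there */
                  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                          local_irq_disable();

                  /* ... vblank evasion and plane register writes ... */

                  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                          local_irq_enable();
          }
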
    • Mike Galbraith's avatar
      drm/i915: Use preempt_disable/enable_rt() where recommended · 86566f4c
      Mike Galbraith authored
      
      Mario Kleiner suggests in commit
        ad3543ed ("drm/intel: Push get_scanout_position() timestamping into kms driver.")
      
      spots where preemption should be disabled on PREEMPT_RT. The
      difference is that on PREEMPT_RT the intel_uncore::lock disables neither
      preemption nor interrupts and so the region remains preemptible.
      
      The area covers only register reads and writes. The part that worries me
      is:
      - __intel_get_crtc_scanline() the worst case is 100us if no match is
        found.
      
      - intel_crtc_scanlines_since_frame_timestamp() not sure how long this
        may take in the worst case.
      
      It was in the RT queue for a while and nobody complained.
      Disable preemption on PREEMPT_RT during timestamping.
      
      [bigeasy: patch description.]
      
      Cc: Mario Kleiner <mario.kleiner.de@gmail.com>
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      86566f4c
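
      The helpers referred to in the subject are, roughly, defined along these
      lines in the RT tree (a sketch; the exact definitions may differ):

          #ifdef CONFIG_PREEMPT_RT
          # define preempt_disable_rt()   preempt_disable()
          # define preempt_enable_rt()    preempt_enable()
          #else
          # define preempt_disable_rt()   do { } while (0)
          # define preempt_enable_rt()    do { } while (0)
          #endif

      The timestamping section is then bracketed with
      preempt_disable_rt()/preempt_enable_rt(), which is a no-op on !RT where
      the uncore spinlock already disables preemption.
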
    • John Ogness's avatar
      printk: avoid preempt_disable() for PREEMPT_RT · 8ea35f13
      John Ogness authored
      
      During non-normal operation, printk() calls will attempt to
      write the messages directly to the consoles. This involves
      using console_trylock() to acquire @console_sem.
      
      Preemption is disabled while directly printing to the consoles
      in order to ensure that the printing task is not scheduled away
      while holding @console_sem, thus blocking all other printers
      and causing delays in printing.
      
      Commit fd5f7cde ("printk: Never set console_may_schedule in
      console_trylock()") specifically reverted a previous attempt at
      allowing preemption while printing.
      
      However, on PREEMPT_RT systems, disabling preemption while
      printing is not allowed because console drivers typically
      acquire a spin lock (which under PREEMPT_RT is an rtmutex).
      Since direct printing is only used during early boot and
      non-panic dumps, the risks of delayed print output for these
      scenarios will be accepted under PREEMPT_RT.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      8ea35f13
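
      Schematically, the direct-print path changes along these lines (a sketch,
      not the actual printk code):

          #include <linux/console.h>
          #include <linux/kernel.h>
          #include <linux/preempt.h>

          static void print_direct_sketch(void)
          {
                  /* on PREEMPT_RT console drivers take spinlock_t (rtmutex) locks,
                   * so the printing task must remain preemptible */
                  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                          preempt_disable();

                  if (console_trylock())
                          console_unlock();       /* prints the pending records */

                  if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                          preempt_enable();
          }
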
    • John Ogness's avatar
      serial: 8250: implement write_atomic · 337fa844
      John Ogness authored
      
      Implement a non-sleeping NMI-safe write_atomic() console function in
      order to support atomic console printing during a panic.
      
      Transmitting data requires disabling interrupts. Since write_atomic()
      can be called from any context, it may be called while another CPU
      is executing in console code. In order to maintain the correct state
      of the IER register, use the global cpu_sync to synchronize all
      access to the IER register. This synchronization is only necessary
      for serial ports that are being used as consoles.
      
      The global cpu_sync is also used to synchronize between the write()
      and write_atomic() callbacks. write() synchronizes per character,
      write_atomic() synchronizes per line.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      337fa844
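
      A condensed sketch of the synchronization described above, using the
      mainline printk_cpu_sync helpers (the body of the real 8250 callback is
      omitted):

          #include <linux/printk.h>
          #include <linux/serial_8250.h>

          static void write_atomic_sketch(struct uart_8250_port *up,
                                          const char *s, unsigned int count)
          {
                  unsigned long flags;

                  /* the global cpu_sync serializes IER manipulation against any
                   * other CPU that may be in console code, from any context
                   * including NMI */
                  printk_cpu_sync_get_irqsave(flags);

                  /* ... save IER, mask interrupts, emit the line, restore IER ... */

                  printk_cpu_sync_put_irqrestore(flags);
          }
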
    • John Ogness's avatar
      printk: add infrastructure for atomic consoles · e24930fd
      John Ogness authored
      
      Many times it is not possible to see the console output on
      panic because printing threads cannot be scheduled and/or the
      console is already taken, and forcibly overtaking/busting the
      locks does not provide the hoped-for results.
      
      Introduce a new infrastructure to support "atomic consoles".
      A new optional callback in struct console, write_atomic(), is
      available for consoles to provide an implementation for writing
      console messages. The implementation must be NMI safe if it
      can run on an architecture where NMIs exist.
      
      Console drivers implementing the write_atomic() callback must
      also select CONFIG_HAVE_ATOMIC_CONSOLE in order to enable the
      atomic console code within the printk subsystem.
      
      If atomic consoles are available, panic() will flush the kernel
      log only to the atomic consoles (before busting spinlocks).
      Afterwards, panic() will continue as before, which includes
      attempting to flush the other (non-atomic) consoles.
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      e24930fd
    • Sebastian Andrzej Siewior's avatar
      printk: Bring back the RT bits. · 9684feaa
      Sebastian Andrzej Siewior authored
      
      This is a revert of the commits:
      | 07a22b61 Revert "printk: add functions to prefer direct printing"
      | 5831788a Revert "printk: add kthread console printers"
      | 2d9ef940 Revert "printk: extend console_lock for per-console locking"
      | 007eeab7 Revert "printk: remove @console_locked"
      | 05c96b37 Revert "printk: Block console kthreads when direct printing will be required"
      | 20fb0c82 Revert "printk: Wait for the global console lock when the system is going down"
      
      which is needed for the atomic consoles which are used on PREEMPT_RT.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      9684feaa
    • Sebastian Andrzej Siewior's avatar
      locking/lockdep: Remove lockdep_init_map_crosslock. · 81e7c331
      Sebastian Andrzej Siewior authored
      
      The cross-release bits have been removed, lockdep_init_map_crosslock() is
      a leftover.
      
      Remove lockdep_init_map_crosslock.
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Link: https://lore.kernel.org/r/20220311164457.46461-1-bigeasy@linutronix.de
      Link: https://lore.kernel.org/r/YqITgY+2aPITu96z@linutronix.de
      81e7c331
    • Mike Galbraith's avatar
      zram: Replace bit spinlocks with spinlock_t for PREEMPT_RT. · 61f7b0f8
      Mike Galbraith authored
      
      The bit spinlock disables preemption. On PREEMPT_RT, with preemption
      disabled, it is not allowed to acquire other, sleeping locks, which includes
      invoking zs_free().
      
      Use a spinlock_t on PREEMPT_RT for locking and set/clear ZRAM_LOCK after the
      lock has been acquired/dropped.
      
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/YqIbMuHCPiQk+Ac2@linutronix.de
      61f7b0f8
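
      A sketch of the two locking variants (the slot layout and bit number here
      are illustrative; the real change is to the slot lock helpers in
      drivers/block/zram/zram_drv.c):

          #include <linux/bit_spinlock.h>
          #include <linux/bitops.h>
          #include <linux/spinlock.h>

          struct zram_slot_sketch {
                  unsigned long flags;            /* ZRAM_LOCK lives in here */
          #ifdef CONFIG_PREEMPT_RT
                  spinlock_t lock;                /* added for RT */
          #endif
          };

          enum { ZRAM_LOCK_BIT = 24 };            /* illustrative bit number */

          static void zram_slot_lock_sketch(struct zram_slot_sketch *slot)
          {
          #ifdef CONFIG_PREEMPT_RT
                  /* spinlock_t stays preemptible on RT, so sleeping locks (e.g.
                   * inside zs_free()) may still be taken; the flag is mirrored */
                  spin_lock(&slot->lock);
                  __set_bit(ZRAM_LOCK_BIT, &slot->flags);
          #else
                  /* bit spinlock: no extra storage, but disables preemption */
                  bit_spin_lock(ZRAM_LOCK_BIT, &slot->flags);
          #endif
          }
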
    • Haris Okanovic's avatar
      tpm_tis: fix stall after iowrite*()s · 09c75ddd
      Haris Okanovic authored
      
      ioread8() operations to TPM MMIO addresses can stall the CPU when
      immediately following a sequence of iowrite*()s to the same region.
      
      For example, cyclictest measures ~400us latency spikes when a non-RT
      usermode application communicates with an SPI-based TPM chip (Intel Atom
      E3940 system, PREEMPT_RT kernel). The spikes are caused by a
      stalling ioread8() operation following a sequence of 30+ iowrite8()s to
      the same address. I believe this happens because the write sequence is
      buffered (in the CPU or somewhere along the bus), and gets flushed on the
      first LOAD instruction (ioread*()) that follows.
      
      The enclosed change appears to fix this issue: read the TPM chip's
      access register (status code) after every iowrite*() operation to
      amortize the cost of flushing data to chip across multiple instructions.
      
      Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      09c75ddd
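
      The shape of the workaround, as a sketch (the read-back register and its
      offset follow the tpm_tis_core layout, but the integration into the
      driver's write helpers may differ in detail):

          #include <linux/io.h>

          /* access register offset for locality l (see tpm_tis_core.h) */
          #define TPM_ACCESS(l)   (0x0000 | ((l) << 12))

          /* read back a harmless register to force out buffered MMIO writes */
          static inline void tpm_tis_flush(void __iomem *iobase)
          {
                  ioread8(iobase + TPM_ACCESS(0));
          }

          /* replacement for a plain iowrite8(): write, then flush immediately,
           * amortizing the flush per write instead of stalling the next ioread8() */
          static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
          {
                  iowrite8(b, iobase + addr);
                  tpm_tis_flush(iobase);
          }
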
    • Frederic Weisbecker's avatar
      tick: Fix timer storm since introduction of timersd · 0fff0ee1
      Frederic Weisbecker authored
      
      If timers are pending while the tick is reprogrammed on nohz_mode, the
      next expiry is not armed to fire now; it is delayed one jiffy forward
      instead so as not to raise an inextinguishable timer storm with such a
      scenario:
      
      1) IRQ triggers and queues a timer
      2) ksoftirqd() is woken up
      3) IRQ tail: timer is reprogrammed to fire now
      4) IRQ exit
      5) TIMER interrupt
      6) goto 3)
      
      ...all that until we finally reach ksoftirqd.
      
      Unfortunately we are checking the wrong softirq vector bitmask since the
      timersd kthread was split from ksoftirqd. Timers now have their own
      vector state field that must be checked separately. As a result, the
      old timer storm is back. This shows up early in boot with extremely long
      initcalls:
      
      	[  333.004807] initcall dquot_init+0x0/0x111 returned 0 after 323822879 usecs
      
      and the cause is uncovered with the right trace events showing just
      10 microseconds between ticks (~100 000 Hz):
      
      |swapper/-1 1dn.h111 60818582us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415486608
      |swapper/-1 1dn.h111 60818592us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415496082
      |swapper/-1 1dn.h111 60818601us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415505550
      
      Fix this by checking the right timer vector state from the nohz code.
      
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lkml.kernel.org/r/20220405010752.1347437-2-frederic@kernel.org
      0fff0ee1
    • Frederic Weisbecker's avatar
      rcutorture: Also force sched priority to timersd on boosting test. · c2c66eb8
      Frederic Weisbecker authored
      
      ksoftirqd is statically boosted to the priority level right above the
      one of rcu_torture_boost() so that timers, which torture readers rely on,
      get a chance to run while rcu_torture_boost() is polling.
      
      However timer processing got split from ksoftirqd into its own kthread
      (timersd) that isn't boosted. It has the same SCHED_FIFO low prio as
      rcu_torture_boost() and therefore timers can't preempt it and may
      starve.
      
      The issue can be triggered in practice on v5.17.1-rt17 using:
      
      	./kvm.sh --allcpus --configs TREE04 --duration 10m --kconfig "CONFIG_EXPERT=y CONFIG_PREEMPT_RT=y"
      
      Fix this by statically boosting timersd, just like it is done for
      ksoftirqd in commit
         ea6d962e ("rcutorture: Judge RCU priority boosting on grace periods, not callbacks")
      
      Suggested-by: Mel Gorman <mgorman@suse.de>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lkml.kernel.org/r/20220405010752.1347437-1-frederic@kernel.org
      
      
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      c2c66eb8
    • Sebastian Andrzej Siewior's avatar
      softirq: Use a dedicated thread for timer wakeups. · 61581707
      Sebastian Andrzej Siewior authored
      
      A timer/hrtimer softirq is raised in IRQ context. With threaded
      interrupts enabled or on PREEMPT_RT this leads to waking ksoftirqd
      for the processing of the softirq.
      Once the ksoftirqd is marked as pending (or is running) it will collect
      all raised softirqs. This in turn means that a softirq which would have
      been processed at the end of the threaded interrupt, which runs at an
      elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
      priority and competes with every regular task for CPU resources.
      This introduces long delays on heavily loaded systems and is not desired,
      especially if the system is not overloaded by the softirqs.
      
      Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
      timers thread and let it run at the lowest SCHED_FIFO priority.
      RT tasks are woken up from hardirq context so only timer_list timers
      and hrtimers for "regular" tasks are processed here. The higher priority
      ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
      
      Using a dedicated variable to store the pending softirq bits ensures
      that the timers are not accidentally picked up by ksoftirqd or
      other threaded interrupts.
      They shouldn't be picked up by ksoftirqd since it runs at a lower priority.
      However if the timer bits are ORed while a threaded interrupt is
      running, then the timer softirq would be performed at the higher priority.
      The new timer thread will block on the softirq lock before it starts
      softirq work. This "race window" isn't closed because while the timer thread
      is performing the softirq it can get PI-boosted via the softirq lock by
      a random force-threaded thread.
      The timer thread can pick up pending softirqs from ksoftirqd but only
      if the softirq load is high. It is not desired that the picked-up
      softirqs are processed at SCHED_FIFO priority under high softirq load,
      but this can already happen via a PI-boost by a force-threaded interrupt.
      
      Reported-by: kernel test robot <lkp@intel.com> [ static timer_threads ]
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      61581707
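
      A heavily simplified, hypothetical sketch of the mechanism (identifier
      names are illustrative; the real implementation lives in kernel/softirq.c
      of the RT tree):

          #include <linux/bits.h>
          #include <linux/percpu.h>
          #include <linux/sched.h>

          static DEFINE_PER_CPU(unsigned long, pending_timer_softirq);
          static DEFINE_PER_CPU(struct task_struct *, timersd); /* SCHED_FIFO prio 1 */

          /* called from hardirq context instead of raise_softirq_irqoff() for the
           * TIMER_SOFTIRQ / HRTIMER_SOFTIRQ vectors */
          static void raise_timer_softirq_sketch(unsigned int nr)
          {
                  /* dedicated pending word, so ksoftirqd never picks this work up */
                  __this_cpu_or(pending_timer_softirq, BIT(nr));
                  wake_up_process(__this_cpu_read(timersd));
          }
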