This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git.
  1. 07 Sep, 2021 1 commit
  2. 12 May, 2021 3 commits
  3. 21 Mar, 2021 1 commit
    • sched: Fix various typos · 3b03706f
      Ingo Molnar authored

      Fix ~42 single-word typos in scheduler code comments.
      
      We have accumulated a few fun ones over the years. :-)
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: linux-kernel@vger.kernel.org
      3b03706f
  4. 06 Mar, 2021 1 commit
  5. 17 Feb, 2021 1 commit
  6. 24 Nov, 2020 1 commit
  7. 10 Nov, 2020 1 commit
  8. 29 Oct, 2020 2 commits
  9. 25 Oct, 2020 1 commit
  10. 26 Aug, 2020 2 commits
  11. 25 Jun, 2020 3 commits
  12. 15 Jun, 2020 1 commit
  13. 28 May, 2020 2 commits
  14. 17 Jan, 2020 1 commit
  15. 20 Nov, 2019 2 commits
  16. 11 Nov, 2019 5 commits
    • cpuidle: Use nanoseconds as the unit of time · c1d51f68
      Rafael J. Wysocki authored

      Currently, the cpuidle subsystem uses microseconds as the unit of
      time which (among other things) causes the idle loop to incur some
      integer division overhead for no clear benefit.
      
      In order to allow cpuidle to measure time in nanoseconds, add two
      new fields, exit_latency_ns and target_residency_ns, to represent the
      exit latency and target residency of an idle state in nanoseconds,
      respectively, to struct cpuidle_state and initialize them with the
      help of the corresponding values in microseconds provided by drivers.
      Additionally, change cpuidle_governor_latency_req() to return the
      idle state exit latency constraint in nanoseconds.
      
      Also measure idle state residency (last_residency_ns in struct
      cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds
      and update the cpuidle core and governors accordingly.
      
      However, the menu governor still computes typical intervals in
      microseconds to avoid integer overflows.
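
      As a minimal standalone sketch of the conversion performed at
      registration time (the struct is reduced to the fields named above,
      the driver plumbing is elided, and the helper name here is
      hypothetical):

      	#include <stdint.h>

      	typedef int64_t s64;
      	#define NSEC_PER_USEC 1000LL

      	/* Reduced view of struct cpuidle_state after this change: the
      	 * microsecond fields supplied by drivers remain, and nanosecond
      	 * counterparts are derived from them once. */
      	struct cpuidle_state {
      		unsigned int exit_latency;	/* us, from the driver */
      		unsigned int target_residency;	/* us, from the driver */
      		s64 exit_latency_ns;		/* ns, derived */
      		s64 target_residency_ns;	/* ns, derived */
      	};

      	/* Derive the ns fields up front so the idle loop no longer pays
      	 * for an integer division on every state lookup. */
      	static void cpuidle_state_init_ns(struct cpuidle_state *s)
      	{
      		s->exit_latency_ns = (s64)s->exit_latency * NSEC_PER_USEC;
      		s->target_residency_ns =
      				(s64)s->target_residency * NSEC_PER_USEC;
      	}
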
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Doug Smythies <dsmythies@telus.net>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      c1d51f68
    • sched/core: Further clarify sched_class::set_next_task() · a0e813f2
      Peter Zijlstra authored

      It turns out there really is something special about the first
      set_next_task() invocation. Specifically, the 'change' pattern really
      should not cause balance callbacks.
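
      A condensed sketch of the resulting contract (stub types; the
      bool-first parameter matches the interface this change settles on,
      while the body and the helper name are illustrative only):

      	#include <stdbool.h>

      	struct rq;		/* stub for the kernel's runqueue */
      	struct task_struct;	/* stub */

      	/* Hypothetical stand-in for rt_queue_push_tasks() and friends,
      	 * which queue balance callbacks. */
      	extern void queue_balance_callbacks(struct rq *rq);

      	/* Only a genuinely first invocation (after pick_next_task())
      	 * may queue balance callbacks; the restore half of the 'change'
      	 * pattern passes first == false and must not. */
      	static inline void set_next_task_sketch(struct rq *rq,
      						struct task_struct *p,
      						bool first)
      	{
      		(void)p;	/* per-invocation bookkeeping elided */

      		if (!first)
      			return;	/* 'change' pattern: no callbacks */

      		queue_balance_callbacks(rq);
      	}
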
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: dietmar.eggemann@arm.com
      Cc: juri.lelli@redhat.com
      Cc: ktkhai@virtuozzo.com
      Cc: mgorman@suse.de
      Cc: qais.yousef@arm.com
      Cc: qperret@google.com
      Cc: rostedt@goodmis.org
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Fixes: f95d4eae ("sched/{rt,deadline}: Fix set_next_task vs pick_next_task")
      Link: https://lkml.kernel.org/r/20191108131909.775434698@infradead.org
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a0e813f2
    • sched/core: Simplify sched_class::pick_next_task() · 98c2f700
      Peter Zijlstra authored

      Now that the indirect class call never uses the last two arguments of
      pick_next_task(), remove them.
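
      As a sketch of the interface change (stub types; the before/after
      names are illustrative only):

      	struct rq;
      	struct task_struct;
      	struct rq_flags;

      	/* Before: every class's method carried @prev and @rf, even
      	 * though only the fair-class fast path ever used them. */
      	struct task_struct *pick_next_task_before(struct rq *rq,
      						  struct task_struct *prev,
      						  struct rq_flags *rf);

      	/* After: the indirect class call carries no dead arguments;
      	 * the fair fast path keeps its own extended entry point. */
      	struct task_struct *pick_next_task_after(struct rq *rq);
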
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: dietmar.eggemann@arm.com
      Cc: juri.lelli@redhat.com
      Cc: ktkhai@virtuozzo.com
      Cc: mgorman@suse.de
      Cc: qais.yousef@arm.com
      Cc: qperret@google.com
      Cc: rostedt@goodmis.org
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Link: https://lkml.kernel.org/r/20191108131909.660595546@infradead.org
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      98c2f700
    • sched/core: Optimize pick_next_task() · 5d7d6056
      Peter Zijlstra authored

      Ever since we moved the sched_class definitions into their own files,
      the constant expression {fair,idle}_sched_class.pick_next_task() is
      not in fact a compile-time constant anymore and results in an indirect
      call (barring LTO).
      
      Fix that by exposing pick_next_task_{fair,idle}() directly, this gets
      rid of the indirect call (and RETPOLINE) on the fast path.
      
      Also remove the unlikely() from the idle case, it is in fact /the/ way
      we select idle -- and that is a very common thing to do.
      
      Performance for will-it-scale/sched_yield improves by 2% (as reported
      by 0-day).
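
      A condensed sketch of the resulting fast path (simplified from
      kernel/sched/core.c; the class checks and RETRY_TASK handling are
      reduced):

      	/* Fast path: if every runnable task is in the fair class, call
      	 * the now-exported pickers directly instead of going through
      	 * the sched_class indirection (and a retpoline). */
      	if (rq->nr_running == rq->cfs.h_nr_running) {
      		p = pick_next_task_fair(rq, prev, rf);

      		/* No unlikely() here: picking idle is /the/ way we go
      		 * idle, and a very common thing to do. */
      		if (!p) {
      			put_prev_task(rq, prev);
      			p = pick_next_task_idle(rq);
      		}
      		return p;
      	}
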
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: dietmar.eggemann@arm.com
      Cc: juri.lelli@redhat.com
      Cc: ktkhai@virtuozzo.com
      Cc: mgorman@suse.de
      Cc: qais.yousef@arm.com
      Cc: qperret@google.com
      Cc: rostedt@goodmis.org
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Link: https://lkml.kernel.org/r/20191108131909.603037345@infradead.org
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5d7d6056
    • sched/core: Make pick_next_task_idle() more consistent · f488e105
      Peter Zijlstra authored

      Only pick_next_task_fair() needs the @prev and @rf arguments; these
      are required to implement the cpu-cgroup optimization. None of the
      other pick_next_task() methods need them. Make pick_next_task_idle()
      more consistent.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: dietmar.eggemann@arm.com
      Cc: juri.lelli@redhat.com
      Cc: ktkhai@virtuozzo.com
      Cc: mgorman@suse.de
      Cc: qais.yousef@arm.com
      Cc: qperret@google.com
      Cc: rostedt@goodmis.org
      Cc: valentin.schneider@arm.com
      Cc: vincent.guittot@linaro.org
      Link: https://lkml.kernel.org/r/20191108131909.545730862@infradead.org
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f488e105
  17. 08 Nov, 2019 1 commit
    • sched: Fix pick_next_task() vs 'change' pattern race · 6e2df058
      Peter Zijlstra authored
      Commit 67692435 ("sched: Rework pick_next_task() slow-path")
      inadvertently introduced a race because it changed a previously
      unexplored dependency between dropping the rq->lock and
      sched_class::put_prev_task().
      
      The comments about dropping rq->lock, in for example
      newidle_balance(), only mention the task being current and ->on_cpu
      being set. But when we look at the 'change' pattern (in for example
      sched_setnuma()):
      
      	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
      	running = task_current(rq, p); /* rq->curr == p */
      
      	if (queued)
      		dequeue_task(...);
      	if (running)
      		put_prev_task(...);
      
      	/* change task properties */
      
      	if (queued)
      		enqueue_task(...);
      	if (running)
      		set_next_task(...);
      
      It becomes obvious that if we do this after put_prev_task() has
      already been called on @p, things go sideways. This is exactly what
      the commit in question allows to happen when it does:
      
      	prev->sched_class->put_prev_task(rq, prev, rf);
      	if (!rq->nr_running)
      		newidle_balance(rq, rf);
      
      The newidle_balance() call will drop rq->lock after we've called
      put_prev_task() and that allows the above 'change' pattern to
      interleave and mess up the state.
      
      Furthermore, it turns out we lost the RT-pull when we put the last DL
      task.
      
      Fix both problems by extracting the balancing from put_prev_task() and
      doing a multi-class balance() pass before put_prev_task().
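
      A condensed sketch of the fixed ordering (simplified from the patch;
      the real loop walks the class range down to the idle class):

      	/* Run the multi-class balance pass, which may drop and retake
      	 * rq->lock, *before* put_prev_task(), so the 'change' pattern
      	 * can never observe a half put-away task and a departing DL
      	 * task still triggers the RT pull. */
      	for (class = prev->sched_class; class; class = class->next) {
      		if (class->balance(rq, prev, rf))
      			break;
      	}

      	put_prev_task(rq, prev);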
      
      Fixes: 67692435 ("sched: Rework pick_next_task() slow-path")
      Reported-by: Quentin Perret <qperret@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Quentin Perret <qperret@google.com>
      Tested-by: Valentin Schneider <valentin.schneider@arm.com>
      6e2df058
  18. 24 Sep, 2019 1 commit
  19. 03 Sep, 2019 1 commit
  20. 12 Aug, 2019 1 commit
    • idle: Prevent late-arriving interrupts from disrupting offline · e78a7614
      Peter Zijlstra authored

      Scheduling-clock interrupts can arrive late in the CPU-offline process,
      after idle entry and the subsequent call to cpuhp_report_idle_dead().
      Once execution passes the call to rcu_report_dead(), RCU is ignoring
      the CPU, which results in lockdep complaints when the interrupt handler
      uses RCU:
      
      ------------------------------------------------------------------------
      
      =============================
      WARNING: suspicious RCU usage
      5.2.0-rc1+ #681 Not tainted
      -----------------------------
      kernel/sched/fair.c:9542 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      RCU used illegally from offline CPU!
      rcu_scheduler_active = 2, debug_locks = 1
      no locks held by swapper/5/0.
      
      stack backtrace:
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.2.0-rc1+ #681
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       dump_stack+0x5e/0x8b
       trigger_load_balance+0xa8/0x390
       ? tick_sched_do_timer+0x60/0x60
       update_process_times+0x3b/0x50
       tick_sched_handle+0x2f/0x40
       tick_sched_timer+0x32/0x70
       __hrtimer_run_queues+0xd3/0x3b0
       hrtimer_interrupt+0x11d/0x270
       ? sched_clock_local+0xc/0x74
       smp_apic_timer_interrupt+0x79/0x200
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      RIP: 0010:delay_tsc+0x22/0x50
      Code: ff 0f 1f 80 00 00 00 00 65 44 8b 05 18 a7 11 48 0f ae e8 0f 31 48 89 d6 48 c1 e6 20 48 09 c6 eb 0e f3 90 65 8b 05 fe a6 11 48 <41> 39 c0 75 18 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d0 48 29
      RSP: 0000:ffff8f92c0157ed0 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff13
      RAX: 0000000000000005 RBX: ffff8c861f356400 RCX: ffff8f92c0157e64
      RDX: 000000321214c8cc RSI: 00000032120daa7f RDI: 0000000000260f15
      RBP: 0000000000000005 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff8c861ee18000 R15: ffff8c861ee18000
       cpuhp_report_idle_dead+0x31/0x60
       do_idle+0x1d5/0x200
       ? _raw_spin_unlock_irqrestore+0x2d/0x40
       cpu_startup_entry+0x14/0x20
       start_secondary+0x151/0x170
       secondary_startup_64+0xa4/0xb0
      
      ------------------------------------------------------------------------
      
      This happens rarely, but can be forced to happen more often by
      placing delays in cpuhp_report_idle_dead() following the call to
      rcu_report_dead().  With this in place, the following rcutorture
      scenario reproduces the problem within a few minutes:
      
      tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --duration 5 --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --configs "TREE04"
      
      This commit uses the crude but effective expedient of moving the disabling
      of interrupts within the idle loop to precede the cpu_is_offline()
      check.  It also invokes tick_nohz_idle_stop_tick() instead of
      tick_nohz_idle_stop_tick_protected() to shut off the scheduling-clock
      interrupt.
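
      The reordered fragment looks roughly like this (condensed from
      do_idle() in kernel/sched/idle.c):

      	while (!need_resched()) {
      		rmb();

      		/* IRQs are now disabled *before* the offline check,
      		 * closing the window in which a late scheduling-clock
      		 * interrupt could run after rcu_report_dead(). */
      		local_irq_disable();

      		if (cpu_is_offline(cpu)) {
      			/* plain variant: IRQs are already off */
      			tick_nohz_idle_stop_tick();
      			cpuhp_report_idle_dead();
      			arch_cpu_idle_dead();
      		}

      		/* ... normal idle entry continues ... */
      	}
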
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      [ paulmck: Revert tick_nohz_idle_stop_tick_protected() removal, new callers. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
      e78a7614
  21. 08 Aug, 2019 3 commits
  22. 21 May, 2019 1 commit
  23. 22 Oct, 2018 1 commit
    • x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry() · 977e4be5
      Christophe Leroy authored
      The following commit:
      
        d7880812 ("idle: Add the stack canary init to cpu_startup_entry()")
      
      ... added an x86 specific boot_init_stack_canary() call to the generic
      cpu_startup_entry() as a temporary hack, with the intention to remove
      the #ifdef CONFIG_X86 later.
      
      More than 5 years later let's finally realize that plan! :-)
      
      While implementing stack protector support for PowerPC, we found
      that calling boot_init_stack_canary() is also needed on PowerPC,
      which uses a per-task (TLS) stack canary like x86.
      
      However, calling boot_init_stack_canary() would break architectures
      using a global stack canary (ARM, SH, MIPS and XTENSA).
      
      Instead of modifying the #ifdef CONFIG_X86 to an even messier:
      
         #if defined(CONFIG_X86) || defined(CONFIG_PPC)
      
      PowerPC implemented the call to boot_init_stack_canary() in the function
      calling cpu_startup_entry().
      
      Let's try the same cleanup on the x86 side as well.
      
      On x86 we have two functions calling cpu_startup_entry():
      
       - start_secondary()
       - cpu_bringup_and_idle()
      
      start_secondary() already calls boot_init_stack_canary(), so it is
      covered; this patch adds the matching boot_init_stack_canary() call
      in cpu_bringup_and_idle().
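
      Sketched, the Xen PV entry point ends up as (condensed; the function
      lives in the x86 Xen SMP code):

      	asmlinkage __visible void cpu_bringup_and_idle(void)
      	{
      		cpu_bringup();
      		/* Previously done inside cpu_startup_entry() under
      		 * #ifdef CONFIG_X86; now each x86 caller does it. */
      		boot_init_stack_canary();
      		cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
      	}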
      
      I.e. now x86 catches up to the rest of the world and the ugly init
      sequence in init/main.c can be removed from cpu_startup_entry().
      
      As a final benefit we can also remove the <linux/stackprotector.h>
      dependency from <linux/sched.h>.
      
      [ mingo: Improved the changelog a bit, added language explaining x86 borkage and sched.h change. ]
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20181020072649.5B59310483E@pc16082vm.idsi0.si.c-s.fr
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      977e4be5
  24. 20 Aug, 2018 1 commit
  25. 09 Apr, 2018 1 commit
    • sched: idle: Select idle state before stopping the tick · 554c8aa8
      Rafael J. Wysocki authored
      In order to address the issue with short idle duration predictions
      by the idle governor after the scheduler tick has been stopped,
      reorder the code in cpuidle_idle_call() so that the governor idle
      state selection runs before tick_nohz_idle_stop_tick() and use the
      "nohz" hint returned by cpuidle_select() to decide whether or not
      to stop the tick.
      
      This isn't straightforward, because menu_select() invokes
      tick_nohz_get_sleep_length() to get the time to the next timer
      event and the number returned by the latter comes from
      __tick_nohz_idle_stop_tick().  Fortunately, however, it is possible
      to compute that number without actually stopping the tick and with
      the help of the existing code.
      
      Namely, tick_nohz_get_sleep_length() can be made to call
      tick_nohz_next_event(), introduced earlier, to get the time to the
      next non-highres timer event.  If that happens, tick_nohz_next_event()
      need not be called by __tick_nohz_idle_stop_tick() again.
      
      If it turns out that the scheduler tick cannot be stopped going
      forward or the next timer event is too close for the tick to be
      stopped, tick_nohz_get_sleep_length() can simply return the time to
      the next event currently programmed into the corresponding clock
      event device.
      
      In addition to knowing the return value of tick_nohz_next_event(),
      however, tick_nohz_get_sleep_length() needs to know the time to the
      next highres timer event, but with the scheduler tick timer excluded,
      which can be computed with the help of hrtimer_get_next_event().
      
      The minimum of that number and the tick_nohz_next_event() return
      value is the total time to the next timer event with the assumption
      that the tick will be stopped.  It can be returned to the idle
      governor which can use it for predicting idle duration (under the
      assumption that the tick will be stopped) and deciding whether or
      not it makes sense to stop the tick before putting the CPU into the
      selected idle state.
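
      Condensed, the resulting computation reads as follows (a fragment
      close to the tick-sched code this change introduces; the can-stop
      check and error paths are elided, and the hrtimer-side query shown
      is the hrtimer_next_event_without() helper from the same series):

      	ktime_t tick_nohz_get_sleep_length(void)
      	{
      		struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
      		int cpu = smp_processor_id();
      		ktime_t now = ts->idle_entrytime;
      		ktime_t next_event;

      		/* Time to the next non-highres timer event, computed
      		 * without actually stopping the tick; the real stop
      		 * path reuses this value instead of recomputing it. */
      		next_event = tick_nohz_next_event(ts, cpu);

      		/* Fold in the next highres timer with the tick timer
      		 * itself excluded, since the tick is assumed stopped. */
      		next_event = min_t(u64, next_event,
      				   hrtimer_next_event_without(&ts->sched_timer));

      		return ktime_sub(next_event, now);
      	}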
      
      With the above, the sleep_length field in struct tick_sched is not
      necessary any more, so drop it.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227
      
      Reported-by: Doug Smythies <dsmythies@telus.net>
      Reported-by: Thomas Ilsche <thomas.ilsche@tu-dresden.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
      554c8aa8
  26. 06 Apr, 2018 1 commit
    • cpuidle: Return nohz hint from cpuidle_select() · 45f1ff59
      Rafael J. Wysocki authored

      Add a new pointer argument to cpuidle_select() and to the ->select
      cpuidle governor callback through which the governor can return a
      boolean indicating whether or not the tick should be stopped before
      entering the selected state.
      
      Make the ladder governor ignore that pointer (to preserve its
      current behavior) and make the menu governor return 'false' through
      it, as sketched below, if:
       (1) the idle exit latency is constrained at 0, or
       (2) the selected state is a polling one, or
       (3) the expected idle period duration is within the tick period
           range.
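
      A condensed sketch of the resulting ->select() contract (simplified
      from the menu governor; the prediction logic and the PM-QoS-derived
      latency_req computation are elided):

      	static int menu_select(struct cpuidle_driver *drv,
      			       struct cpuidle_device *dev,
      			       bool *stop_tick)
      	{
      		unsigned int latency_req;	/* from PM QoS, elided */
      		unsigned int expected_interval;	/* predicted idle us */
      		int idx;

      		/* ... normal prediction and state selection ... */

      		*stop_tick = true;
      		if (latency_req == 0 ||				       /* (1) */
      		    (drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || /* (2) */
      		    expected_interval < TICK_USEC)		       /* (3) */
      			*stop_tick = false;	/* keep the tick running */

      		return idx;
      	}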
      
      In addition to that, the correction factor computations in the menu
      governor need to take into account the possibility that the tick may
      not be stopped, to avoid artificially small correction factor
      values.  To that end, add a mechanism to record tick wakeups, as
      suggested by Peter Zijlstra, and use it to modify the menu_update()
      behavior when tick wakeup occurs.  Namely, if the CPU is woken up by
      the tick and the return value of tick_nohz_get_sleep_length() is not
      within the tick boundary, the predicted idle duration is likely too
      short, so make menu_update() try to compensate for that by updating
      the governor statistics as though the CPU was idle for a long time.
      
      Since the value returned through the new argument pointer of
      cpuidle_select() is not used by its caller yet, this change by
      itself is not expected to alter the functionality of the code.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      45f1ff59