    smpboot: Add common code for notification from dying CPU · 8038dad7
    Paul E. McKenney authored
    RCU ignores offlined CPUs, so they cannot safely run RCU read-side code.
    (They -can- use SRCU, but not RCU.)  This means that any use of RCU
    during or after the call to arch_cpu_idle_dead() is unsafe.  Unfortunately,
    commit 2ed53c0d added a complete() call, which will contain RCU
    read-side critical sections if there is a task waiting to be awakened.
    
    Which, as it turns out, there almost never is.  In my qemu/KVM testing,
    the to-be-awakened task is not yet asleep more than 99.5% of the time.
    In current mainline, failure is even harder to reproduce, requiring a
    virtualized environment that delays the outgoing CPU by at least three
    jiffies between the time it exits its stop_machine() task at CPU_DYING
    time and the time it calls arch_cpu_idle_dead() from the idle loop.
    However, this problem really can occur, especially in virtualized
    environments, and therefore really does need to be fixed.
    
    This suggests moving back to the polling loop, but using a much shorter
    wait, with gentle exponential backoff instead of the old 100-millisecond
    wait.  Most of the time, the loop will exit without waiting at all,
    and almost all of the remaining uses will wait only five microseconds.
    If the outgoing CPU is preempted, the loop will wait one jiffy, then
    multiply each subsequent wait by 11/10, rounding up.  As before, there
    is a five-second timeout.
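
    The backoff schedule described above can be sketched in userspace C.
    This is only an illustrative model, not the kernel code: the names
    next_sleep_jiffies() and simulate_wait_death() are hypothetical, time
    is counted in whole jiffies, and the initial five-microsecond delay is
    modeled as costing zero jiffies.

```c
#include <stdbool.h>

/* Integer division that rounds up, written out for userspace. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: grow the wait by 11/10, rounding up. */
static int next_sleep_jiffies(int sleep_jf)
{
	return DIV_ROUND_UP(sleep_jf * 11, 10);
}

/*
 * Hypothetical model of the polling loop.  The outgoing CPU is assumed
 * to be dead once dead_after_jf jiffies have elapsed.  Returns true if
 * death was observed, false on timeout.  *slept_jf receives the total
 * number of jiffies spent sleeping.
 */
static bool simulate_wait_death(int dead_after_jf, int timeout_jf,
				int *slept_jf)
{
	int jf_left = timeout_jf;
	int sleep_jf = 1;	/* first real sleep is one jiffy */
	int elapsed = 0;

	/* Common case: the CPU is already dead, so do not wait at all. */
	if (dead_after_jf <= 0) {
		*slept_jf = 0;
		return true;
	}

	/* A short microsecond-scale delay would go here (modeled as free). */

	/* Otherwise sleep for increasingly long intervals. */
	while (elapsed < dead_after_jf) {
		elapsed += sleep_jf;
		jf_left -= sleep_jf;
		if (jf_left <= 0)
			break;	/* overall timeout exhausted */
		sleep_jf = next_sleep_jiffies(sleep_jf);
	}

	*slept_jf = elapsed;
	return elapsed >= dead_after_jf;
}
```

    Because of the round-up, the wait initially grows by one jiffy per
    iteration (1, 2, 3, ...) and only later grows geometrically, which is
    what keeps the backoff gentle for short preemptions.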
    
    This commit therefore provides common-code infrastructure to do the
    dying-to-surviving CPU handoff in a safe manner.  This code also
    provides an indication at CPU-online of whether the CPU to be onlined
    previously timed out on offline.  The new cpu_check_up_prepare() function
    returns -EBUSY if this CPU previously took more than five seconds to
    go offline, or -EAGAIN if it has not yet managed to go offline.  The
    rationale for -EAGAIN is that it might still be preempted, so an additional
    wait might well find it correctly offlined.  Architecture-specific code
    can decide how to handle these conditions.  Systems in which CPUs take
    themselves completely offline might respond to an -EBUSY return as if
    it were a zero (success) return.  Systems in which the surviving CPU must
    take some action might take it at this time, or might simply mark the
    other CPU as unusable.
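
    One way architecture-specific code might act on these return values
    is sketched below in userspace C.  The enum up_decision and the
    function arch_up_policy() are hypothetical names made up for this
    illustration, not part of this commit or of any kernel API.  The
    sketch only encodes the choices described above, and assumes that an
    architecture would retry later on -EAGAIN.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes for an architecture's CPU-online path. */
enum up_decision {
	UP_PROCEED,		/* go ahead and bring the CPU online */
	UP_RETRY_LATER,		/* CPU may still be on its way offline */
	UP_MARK_UNUSABLE,	/* give up on this CPU */
};

/*
 * Hypothetical policy: check_ret is the value returned by the
 * offline-state check (0, -EBUSY, or -EAGAIN).  cpu_self_offlines is
 * true on systems where a CPU takes itself completely offline.
 */
static enum up_decision arch_up_policy(int check_ret, bool cpu_self_offlines)
{
	switch (check_ret) {
	case 0:
		return UP_PROCEED;
	case -EBUSY:
		/* The previous offline timed out. */
		return cpu_self_offlines ? UP_PROCEED : UP_MARK_UNUSABLE;
	case -EAGAIN:
		/* Possibly just preempted; another wait may succeed. */
		return UP_RETRY_LATER;
	default:
		/* Unknown result: be conservative. */
		return UP_MARK_UNUSABLE;
	}
}
```

    A system in which the surviving CPU must take some action could
    perform that action in the -EBUSY branch instead of marking the CPU
    unusable.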
    
    Note that architectures that take the easy way out and simply pass the
    -EBUSY and -EAGAIN upwards will change the sysfs API.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: <linux-api@vger.kernel.org>
    Cc: <linux-arch@vger.kernel.org>
    [ paulmck: Fixed state machine for architectures that don't check earlier
      CPU-hotplug results as suggested by James Hogan. ]