    kernel, oom: fix potential pgd_lock deadlock from __mmdrop · 7283094e
    Michal Hocko authored
    Lockdep complains that __mmdrop is not safe to call from softirq context:
    
      =================================
      [ INFO: inconsistent lock state ]
      4.6.0-oomfortification2-00011-geeb3eadeab96-dirty #949 Tainted: G        W
      ---------------------------------
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
       (pgd_lock){+.?...}, at: pgd_free+0x19/0x6b
      {SOFTIRQ-ON-W} state was registered at:
         __lock_acquire+0xa06/0x196e
         lock_acquire+0x139/0x1e1
         _raw_spin_lock+0x32/0x41
         __change_page_attr_set_clr+0x2a5/0xacd
         change_page_attr_set_clr+0x16f/0x32c
         set_memory_nx+0x37/0x3a
         free_init_pages+0x9e/0xc7
         alternative_instructions+0xa2/0xb3
         check_bugs+0xe/0x2d
         start_kernel+0x3ce/0x3ea
         x86_64_start_reservations+0x2a/0x2c
         x86_64_start_kernel+0x17a/0x18d
      irq event stamp: 105916
      hardirqs last  enabled at (105916): free_hot_cold_page+0x37e/0x390
      hardirqs last disabled at (105915): free_hot_cold_page+0x2c1/0x390
      softirqs last  enabled at (105878): _local_bh_enable+0x42/0x44
      softirqs last disabled at (105879): irq_exit+0x6f/0xd1
    
      other info that might help us debug this:
       Possible unsafe locking scenario:
    
             CPU0
             ----
        lock(pgd_lock);
        <Interrupt>
          lock(pgd_lock);
    
       *** DEADLOCK ***
    
      1 lock held by swapper/1/0:
       #0:  (rcu_callback){......}, at: rcu_process_callbacks+0x390/0x800
    
      stack backtrace:
      CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.6.0-oomfortification2-00011-geeb3eadeab96-dirty #949
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
      Call Trace:
       <IRQ>
        print_usage_bug.part.25+0x259/0x268
        mark_lock+0x381/0x567
        __lock_acquire+0x993/0x196e
        lock_acquire+0x139/0x1e1
        _raw_spin_lock+0x32/0x41
        pgd_free+0x19/0x6b
        __mmdrop+0x25/0xb9
        __put_task_struct+0x103/0x11e
        delayed_put_task_struct+0x157/0x15e
        rcu_process_callbacks+0x660/0x800
        __do_softirq+0x1ec/0x4d5
        irq_exit+0x6f/0xd1
        smp_apic_timer_interrupt+0x42/0x4d
        apic_timer_interrupt+0x8e/0xa0
       <EOI>
        arch_cpu_idle+0xf/0x11
        default_idle_call+0x32/0x34
        cpu_startup_entry+0x20c/0x399
        start_secondary+0xfe/0x101
    
    Moreover, commit a79e53d8 ("x86/mm: Fix pgd_lock deadlock") was
    explicit that pgd_lock must not be taken from irq context.  This
    means that the __mmdrop called from free_signal_struct has to be
    postponed to user context.  We already have a similar mechanism for
    mmput_async, so we can use it here as well.  This is safe because
    mm_count is pinned by mm_users.
    
    This fixes a bug introduced by "oom: keep mm of the killed task available".
    
    Link: http://lkml.kernel.org/r/1472119394-11342-5-git-send-email-mhocko@kernel.org
    
    
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Vladimir Davydov <vdavydov@parallels.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>