  1. Jun 28, 2011
    • memcg: fix direct softlimit reclaim to be called in limit path · ac34a1a3
      KAMEZAWA Hiroyuki authored
      
      Commit d149e3b2 ("memcg: add the soft_limit reclaim in global direct
      reclaim") adds a softlimit hook to shrink_zones().  By this, soft limit
      is called as
      
         try_to_free_pages()
             do_try_to_free_pages()
                 shrink_zones()
                     mem_cgroup_soft_limit_reclaim()
      
      With this, direct reclaim is now aware of memcg softlimit hints.
      
      But the memory cgroup's "limit" path can also end up calling the
      softlimit shrinker:
      
         try_to_free_mem_cgroup_pages()
             do_try_to_free_pages()
                 shrink_zones()
                     mem_cgroup_soft_limit_reclaim()
      
      This causes a global reclaim whenever a memcg hits its limit.
      
      This is a bug: soft_limit_reclaim() should only be called when
      scanning_global_lru(sc) == true.
      
      The commit also adds a variable "total_scanned" for counting
      softlimit-scanned pages, but it is not really a "total".  This patch
      removes the variable and updates sc->nr_scanned instead.  This
      affects shrink_slab()'s scan condition, but since softlimit reclaim
      scans the global LRU, the change makes sense.
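      
      Roughly, the fix has this shape (a sketch, not the verbatim diff;
      the nr_soft_scanned out-parameter stands in for the removed
      "total_scanned" counter):
      
         if (scanning_global_lru(sc)) {
                 unsigned long nr_soft_scanned = 0;
      
                 sc->nr_reclaimed += mem_cgroup_soft_limit_reclaim(zone,
                                         sc->order, sc->gfp_mask,
                                         &nr_soft_scanned);
                 sc->nr_scanned += nr_soft_scanned;
         }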
      
      TODO: avoid too much scanning of a zone when softlimit did enough work.
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Ying Han <yinghan@google.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix assertion mapping->nrpages == 0 in end_writeback() · 08142579
      Jan Kara authored
      
      Under heavy memory and filesystem load, users observe the assertion
      mapping->nrpages == 0 in end_writeback() trigger.  This can be caused by
      page reclaim reclaiming the last page from a mapping in the following
      race:
      
      	CPU0				CPU1
        ...
        shrink_page_list()
          __remove_mapping()
            __delete_from_page_cache()
              radix_tree_delete()
      					evict_inode()
      					  truncate_inode_pages()
      					    truncate_inode_pages_range()
      					      pagevec_lookup() - finds nothing
      					  end_writeback()
      					    mapping->nrpages != 0 -> BUG
              page->mapping = NULL
              mapping->nrpages--
      
      Fix the problem by doing a reliable check of mapping->nrpages under
      mapping->tree_lock in end_writeback().
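      
      A sketch of that check (nrpages is only decremented with
      tree_lock held, so this closes the window shown above):
      
         void end_writeback(struct inode *inode)
         {
                 ...
                 spin_lock_irq(&inode->i_data.tree_lock);
                 BUG_ON(inode->i_data.nrpages);
                 spin_unlock_irq(&inode->i_data.tree_lock);
                 ...
         }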
      
      Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out
      by Miklos Szeredi <mszeredi@suse.de>.
      
      Cc: Jay <jinshan.xiong@whamcloud.com>
      Cc: Miklos Szeredi <mszeredi@suse.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory-failure.c: fix spinlock vs mutex order · 9b679320
      Peter Zijlstra authored
      
      We cannot take a mutex while holding a spinlock, so flip the order and
      fix the locking documentation.
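      
      In other words (a generic sketch of the ordering rule, not the
      exact hunk):
      
         /* wrong: mutex_lock() may sleep while the spinlock is held */
         spin_lock(&lock);
         mutex_lock(&mutex);
      
         /* right: take the sleeping lock first, then the spinlock */
         mutex_lock(&mutex);
         spin_lock(&lock);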
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: add shmem_read_mapping_page_gfp · d9d90e5e
      Hugh Dickins authored
      
      Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
      is unsuited to tmpfs, because it inserts a page into pagecache before
      calling the filesystem's ->readpage: tmpfs may have pages in swapcache
      which only it knows how to locate and switch to filecache.
      
      At present tmpfs provides a ->readpage method, and copes with this by
      copying pages; but soon we can simplify it by removing its ->readpage.
      Provide shmem_read_mapping_page_gfp() now, ready for that transition.
      
      Export shmem_read_mapping_page_gfp() and add it to the list in
      shmem_fs.h, with shmem_read_mapping_page() inline for the common
      mapping_gfp case.
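      
      The inline is essentially this (sketch):
      
         static inline struct page *shmem_read_mapping_page(
                         struct address_space *mapping, pgoff_t index)
         {
                 return shmem_read_mapping_page_gfp(mapping, index,
                                         mapping_gfp_mask(mapping));
         }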
      
      (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the
      read_mapping_page functions use the mapping's ->readpage, and the
      read_cache_page functions use the supplied filler, so I think
      read_cache_page_gfp was slightly misnamed.)
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tmpfs: take control of its truncate_range · 94c1e62d
      Hugh Dickins authored
      
      2.6.35's new truncate convention gave tmpfs the opportunity to control
      its file truncation, no longer enforced from outside by vmtruncate().
      We shall want to build upon that, to handle pagecache and swap together.
      
      Slightly redefine the ->truncate_range interface: let it now be called
      between the unmap_mapping_range()s, with the filesystem responsible for
      doing the truncate_inode_pages_range() from it - just as the filesystem
      is nowadays responsible for doing that from its ->setattr.
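      
      The calling convention is then roughly (a sketch; argument
      details elided):
      
         unmap_mapping_range(mapping, holebegin, holelen, 0);
         inode->i_op->truncate_range(inode, lstart, lend);
                 /* fs does truncate_inode_pages_range() itself */
         unmap_mapping_range(mapping, holebegin, holelen, 0);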
      
      Let's rename shmem_notify_change() to shmem_setattr().  Instead of
      calling the generic truncate_setsize(), bring that code in so we can
      call shmem_truncate_range() - which will later be updated to perform its
      own variant of truncate_inode_pages_range().
      
      Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
      now that the COW's unmap_mapping_range() comes after ->truncate_range,
      there is no need to call it a third time.
      
      Export shmem_truncate_range() and add it to the list in shmem_fs.h, so
      that i915_gem_object_truncate() can call it explicitly in future; get
      this patch in first, then update drm/i915 once this is available (until
      then, i915 will just be doing the truncate_inode_pages() twice).
      
      Though ->truncate_range was introduced five years ago, no other
      filesystem implements it, and its only other user is
      madvise(,,MADV_REMOVE): we expect to convert that to
      fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly, whereupon
      ->truncate_range can be removed from inode_operations -
      shmem_truncate_range() will help i915 across that transition too.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move shmem prototypes to shmem_fs.h · 072441e2
      Hugh Dickins authored
      
      Before adding any more global entry points into shmem.c, gather such
      prototypes into shmem_fs.h.  Remove mm's own declarations from swap.h,
      but for now leave the ones in mm.h: because shmem_file_setup() and
      shmem_zero_setup() are called from various places, and we should not
      force other subsystems to update immediately.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move vmtruncate_range to truncate.c · 5b8ba101
      Hugh Dickins authored
      
      You would expect to find vmtruncate_range() next to vmtruncate() in
      mm/truncate.c: move it there.
      
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Jun 23, 2011
  3. Jun 18, 2011
    • mm: avoid anon_vma_chain allocation under anon_vma lock · dd34739c
      Linus Torvalds authored
      
      Hugh Dickins points out that lockdep (correctly) spots a potential
      deadlock on the anon_vma lock, because we now do a GFP_KERNEL allocation
      of anon_vma_chain while doing anon_vma_clone().  The problem is that
      page reclaim will want to take the anon_vma lock of any anonymous pages
      that it will try to reclaim.
      
      So re-organize the code in anon_vma_clone() slightly: first do just a
      GFP_NOWAIT allocation, which will usually work fine.  But if that fails,
      let's just drop the lock and re-do the allocation, now with GFP_KERNEL.
      
      End result: not only do we avoid the locking problem, this also ends up
      getting better concurrency in case the allocation does need to block.
      Tim Chen reports that with all these anon_vma locking tweaks, we're now
      almost back up to the spinlock performance.
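      
      The allocation dance is then roughly (sketch; error unwinding
      omitted, helpers from the lock_anon_vma_root() commit assumed):
      
         avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
         if (unlikely(!avc)) {
                 unlock_anon_vma_root(root);     /* drop the lock */
                 root = NULL;
                 avc = anon_vma_chain_alloc(GFP_KERNEL); /* may block */
         }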
      
      Reported-and-tested-by: Hugh Dickins <hughd@google.com>
      Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: avoid repeated anon_vma lock/unlock sequences in unlink_anon_vmas() · eee2acba
      Peter Zijlstra authored
      
      This matches the anon_vma_clone() case, and uses the same lock helper
      functions.  Because of the need to potentially release the anon_vma's,
      it's a bit more complex, though.
      
      We traverse the 'vma->anon_vma_chain' in two phases: the first loop gets
      the anon_vma lock (with the helper function that only takes the lock
      once for the whole loop), and removes any entries that don't need any
      more processing.
      
      The second phase just traverses the remaining list entries (without
      holding the anon_vma lock), and does any actual freeing of the
      anon_vma's that is required.
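      
      As a sketch (refcount and degree bookkeeping omitted):
      
         /* phase 1: unlink under the root lock, taken only once */
         list_for_each_entry(avc, &vma->anon_vma_chain, same_vma) {
                 root = lock_anon_vma_root(root, avc->anon_vma);
                 list_del(&avc->same_anon_vma);
         }
         unlock_anon_vma_root(root);
      
         /* phase 2: free without holding the anon_vma lock */
         list_for_each_entry_safe(avc, next, &vma->anon_vma_chain,
                                  same_vma) {
                 put_anon_vma(avc->anon_vma);
                 list_del(&avc->same_vma);
                 anon_vma_chain_free(avc);
         }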
      
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Tested-by: Hugh Dickins <hughd@google.com>
      Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: avoid repeated anon_vma lock/unlock sequences in anon_vma_clone() · bb4aa396
      Linus Torvalds authored
      
      In anon_vma_clone() we traverse the vma->anon_vma_chain of the source
      vma, locking the anon_vma for each entry.
      
      But they are all going to have the same root entry, which means that
      we're locking and unlocking the same lock over and over again.  Which is
      expensive in locked operations, but can get _really_ expensive when that
      root entry sees any kind of lock contention.
      
      In fact, Tim Chen reports a big performance regression due to this: when
      we switched to use a mutex instead of a spinlock, the contention case
      gets much worse.
      
      So to alleviate this all, this commit creates a small helper function
      (lock_anon_vma_root()) that can be used to take the lock just once
      rather than taking and releasing it over and over again.
      
      We still have the same "take the lock and release" it behavior in the
      exit path (in unlink_anon_vmas()), but that one is a bit harder to fix
      since we're actually freeing the anon_vma entries as we go, and that
      will touch the lock too.
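      
      The helper looks roughly like this (sketch, assuming the
      anon_vma root mutex):
      
         static inline struct anon_vma *
         lock_anon_vma_root(struct anon_vma *root, struct anon_vma *anon_vma)
         {
                 struct anon_vma *new_root = anon_vma->root;
      
                 if (new_root != root) {
                         if (WARN_ON_ONCE(root))
                                 mutex_unlock(&root->mutex);
                         root = new_root;
                         mutex_lock(&root->mutex);
                 }
                 return root;
         }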
      
      Reported-and-tested-by: Tim Chen <tim.c.chen@linux.intel.com>
      Tested-by: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Jun 16, 2011
  5. Jun 06, 2011
  6. Jun 03, 2011
    • more conservative S_NOSEC handling · 9e1f1de0
      Al Viro authored
      
      Caching "we have already removed suid/caps" was overenthusiastic as merged.
      On network filesystems we might have had suid/caps set on another
      client, silently picked up by this client on revalidate, all of that
      *without* clearing the S_NOSEC flag.
      
      AFAICS, the only reasonably sane way to deal with that is
      	* new superblock flag; unless set, S_NOSEC is not going to be set.
      	* local block filesystems set it in their ->mount() (more accurately,
      mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
      local block ones clear it)
      	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
      it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
      inode attribute changes are picked from other clients.
      
      It's not an earth-shattering hole (anybody that can set suid on another client
      will almost certainly be able to write to the file before doing that anyway),
      but it's a bug that needs fixing.
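      
      The effect on the caching site is roughly (sketch):
      
         /* only cache "nothing to strip" where the fs opted in */
         if (inode->i_sb->s_flags & MS_NOSEC)
                 inode->i_flags |= S_NOSEC;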
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • SLAB: Record actual last user of freed objects. · a947eb95
      Suleiman Souhlal authored
      
      Currently, when using CONFIG_DEBUG_SLAB, we record kfree() or
      kmem_cache_free() as the last user of freed objects, which is not
      very useful.  Change it to record the caller of those functions
      instead.
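      
      A sketch of the idea, using gcc's __builtin_return_address():
      
         void kfree(const void *objp)
         {
                 ...
                 /* record our caller, not kfree() itself */
                 __cache_free(c, (void *)objp,
                              __builtin_return_address(0));
                 ...
         }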
      
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Suleiman Souhlal <suleiman@google.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
    • slub: always align cpu_slab to honor cmpxchg_double requirement · d4d84fef
      Chris Metcalf authored
      
      On an architecture without CMPXCHG_LOCAL but with DEBUG_VM enabled,
      the VM_BUG_ON() in __pcpu_double_call_return_bool() will cause an early
      panic during boot unless we always align cpu_slab properly.
      
      In principle we could remove the alignment-testing VM_BUG_ON() for
      architectures that don't have CMPXCHG_LOCAL, but leaving it in means
      that new code will tend not to break x86 even if it is introduced
      on another platform, and it's low cost to require alignment.
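      
      The allocation then becomes, roughly (sketch):
      
         /* cmpxchg_double needs two-machine-word alignment */
         s->cpu_slab = __alloc_percpu(sizeof(struct kmem_cache_cpu),
                                      2 * sizeof(void *));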
      
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
  7. Jun 01, 2011
  8. May 29, 2011