This project is mirrored from https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git.
  1. 02 Nov, 2009 1 commit
  2. 03 Oct, 2009 2 commits
  3. 01 Oct, 2009 1 commit
  4. 29 Sep, 2009 2 commits
  5. 28 Sep, 2009 4 commits
    • ext4: async direct IO for holes and fallocate support · 8d5d02e6
      Mingming Cao authored
      
      
      For async direct I/O that covers holes or fallocated space, the end_io
      callback function now queues the conversion work on a workqueue but
      does not flush the work right away, as doing so might take too long.
      
      But when fsync() is called after all the data has completed, the user
      expects the metadata to be updated as well before fsync() returns.
      
      Thus we need to flush the conversion work when fsync() is called.
      This patch keeps track of a list of completed async direct I/Os that
      have conversion work queued on the workqueue.  When fsync() is called,
      it goes through the list and does the conversion.
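A minimal userspace sketch of the bookkeeping described above, assuming one singly linked list of pending conversions per inode. All names here (struct pending_io, struct model_inode, queue_conversion(), flush_conversions()) are hypothetical, not ext4 functions, and the extent conversion itself is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record for one completed async DIO whose
 * uninitialized-to-initialized conversion is still pending. */
struct pending_io {
    unsigned long long lblk;   /* first logical block written */
    unsigned int len;          /* number of blocks written */
    struct pending_io *next;
};

struct model_inode {
    struct pending_io *pending;  /* completed DIOs awaiting conversion */
    unsigned int converted;      /* blocks converted so far */
};

/* end_io path: remember the completed I/O instead of converting inline. */
static bool queue_conversion(struct model_inode *inode,
                             unsigned long long lblk, unsigned int len)
{
    struct pending_io *io = malloc(sizeof(*io));
    if (!io)
        return false;
    io->lblk = lblk;
    io->len = len;
    io->next = inode->pending;
    inode->pending = io;
    return true;
}

/* fsync path: drain the list, doing every pending conversion. */
static void flush_conversions(struct model_inode *inode)
{
    while (inode->pending) {
        struct pending_io *io = inode->pending;
        inode->pending = io->next;
        inode->converted += io->len;  /* stand-in for converting the extent */
        free(io);
    }
}
```

Because each conversion is independent in this model, the list can be kept in LIFO order; fsync() only needs the list to be empty before it returns.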
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    • ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O · 4c0425ff
      Mingming Cao authored
      
      
      Currently the DIO VFS code passes create = 0 when writing to the
      middle of a file.  It does this to avoid block allocation for holes,
      so as not to expose stale data when there is a parallel buffered read
      (which does not hold the i_mutex lock).  For this reason, direct I/O
      writes into holes fall back to buffered I/O.
      
      Since preallocated extents are treated as holes when doing a
      get_block() lookup (the buffer is not mapped), direct I/O over
      fallocated space also falls back to buffered I/O.  Thus ext4 silently
      falls back to buffered I/O in both of the cases above, which is
      undesirable.
      
      To fix this, this patch creates uninitialized extents when a direct
      I/O write goes into holes in sparse files, and registers an end_io
      callback which converts the uninitialized extent to an initialized
      extent after the I/O is completed.
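To illustrate the idea, here is a hypothetical userspace model (struct extent, end_io_fn, dio_write_hole() and read_visible() are illustrative names, not ext4 APIs): the hole is allocated as an uninitialized extent so that a racing read sees zeroes rather than stale disk contents, and the completion callback flips the extent to initialized. In the real code the callback runs asynchronously after the I/O completes; here it is called inline for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent with an "uninitialized" (unwritten) flag. */
struct extent {
    unsigned long long lblk;
    unsigned int len;
    bool uninit;
};

/* Completion callback type, loosely modeled on the end_io idea. */
typedef void (*end_io_fn)(struct extent *ex);

static void convert_to_initialized(struct extent *ex)
{
    ex->uninit = false;   /* data is on disk now; expose it to readers */
}

/* What a concurrent buffered read would see for a block of this extent. */
static int read_visible(const struct extent *ex, int on_disk_value)
{
    return ex->uninit ? 0 : on_disk_value;   /* uninit reads as zeroes */
}

/* Hypothetical DIO write into a hole: allocate uninit, then run end_io. */
static void dio_write_hole(struct extent *ex, unsigned long long lblk,
                           unsigned int len, end_io_fn end_io)
{
    ex->lblk = lblk;
    ex->len = len;
    ex->uninit = true;    /* racing readers see zeroes, never stale data */
    /* ... the I/O would be submitted here and complete later ... */
    end_io(ex);           /* invoked once the I/O has completed */
}
```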
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Split uninitialized extents for direct I/O · 0031462b
      Mingming Cao authored
      
      
      When writing into an uninitialized extent via direct I/O, and the
      direct I/O doesn't exactly cover the uninitialized extent, split the
      extent into uninitialized and initialized extents before submitting
      the I/O.  This avoids needing to deal with an ENOSPC error in the
      end_io callback that gets used for direct I/O.
      
      When the I/O is complete, the written extent will be marked as
      initialized.
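A hypothetical sketch of the split, assuming the write range lies entirely inside one uninitialized extent. In this model the written piece stays uninitialized until completion, matching the last sentence above, and all pieces are created up front so that completion only has to flip a flag. split_for_write() is an illustrative name, not the ext4 function.

```c
#include <assert.h>
#include <stdbool.h>

struct extent {
    unsigned long long lblk;  /* first logical block */
    unsigned int len;         /* length in blocks */
    bool uninit;
};

/* Split uninit extent `ex` around the write [wlblk, wlblk + wlen) so the
 * written range becomes its own extent. Fills out[] with up to three
 * extents and returns how many. Doing this before submission means no
 * extent-tree allocation (and no ENOSPC) is needed at completion time. */
static int split_for_write(const struct extent *ex,
                           unsigned long long wlblk, unsigned int wlen,
                           struct extent out[3])
{
    unsigned long long ex_end = ex->lblk + ex->len;
    unsigned long long w_end = wlblk + wlen;
    int n = 0;

    assert(ex->uninit && wlblk >= ex->lblk && w_end <= ex_end);

    if (wlblk > ex->lblk)   /* untouched head stays uninit */
        out[n++] = (struct extent){ ex->lblk, (unsigned int)(wlblk - ex->lblk), true };
    out[n++] = (struct extent){ wlblk, wlen, true };   /* range being written */
    if (w_end < ex_end)     /* untouched tail stays uninit */
        out[n++] = (struct extent){ w_end, (unsigned int)(ex_end - w_end), true };
    return n;
}
```

For example, a 3-block write at block 3 into a 10-block extent starting at block 0 yields [0,3), [3,6) and [6,10); at completion only the middle piece would be marked initialized.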
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: release reserved quota when block reservation for delalloc retry · 9f0ccfd8
      Mingming Cao authored
      
      
      ext4_da_reserve_space() can reserve quota blocks multiple times if
      ext4_claim_free_blocks() fails and we retry the allocation.  We should
      release the quota reservation before retrying.
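The fixed retry loop can be sketched as follows. This is a userspace model under stated assumptions: the counters in struct model_fs, and the flush_budget that stands in for "retrying after writeback frees space", are hypothetical, not the ext4 or quota APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the quota and free-block pools. */
struct model_fs {
    long quota_reserved;   /* blocks reserved against the user's quota */
    long free_blocks;      /* blocks available to claim */
    int flush_budget;      /* how many retries can still free space */
};

static bool claim_free_blocks(struct model_fs *fs, long n)
{
    if (fs->free_blocks < n)
        return false;
    fs->free_blocks -= n;
    return true;
}

/* The quota reservation taken at the top of each attempt is released
 * before retrying, so it is never counted more than once. */
static bool da_reserve_space(struct model_fs *fs, long n)
{
    for (;;) {
        fs->quota_reserved += n;           /* reserve quota */
        if (claim_free_blocks(fs, n))
            return true;
        fs->quota_reserved -= n;           /* the fix: undo before retrying */
        if (fs->flush_budget-- <= 0)
            return false;                  /* give up (ENOSPC) */
        fs->free_blocks += n;              /* pretend writeback freed space */
    }
}
```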
      
      Bug found by Jan Kara.
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  6. 29 Sep, 2009 1 commit
    • ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b
      Theodore Ts'o authored
      
      
      Work around problems in the writeback code to force out writebacks in
      larger chunks than just 4 MB, which is just too small.  This also
      works around limitations in the ext4 block allocator, which can't
      allocate more than 2048 blocks at a time.  So we need to defeat the
      round-robin characteristics of the writeback code and try to write
      out as many blocks as possible from one inode before allowing the
      writeback code to move on to another inode.  We add a new
      per-filesystem tunable, max_writeback_mb_bump, which caps this to a
      default of 128 MB per inode.
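A rough sketch of the bump computation, assuming 4 KiB pages and that the goal is to raise nr_to_write toward the inode's dirty page count, capped at max_writeback_mb_bump megabytes. bump_nr_to_write() and its parameters are hypothetical; only the tunable's name and its 128 MB default come from the commit message.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Raise the per-inode writeback quota toward the number of dirty pages,
 * capped at max_writeback_mb_bump MB. Returns the new nr_to_write and
 * stores the amount added in *bump. */
static long bump_nr_to_write(long nr_to_write, long dirty_pages,
                             unsigned int max_writeback_mb_bump, long *bump)
{
    long max_pages = (long)max_writeback_mb_bump << (20 - MODEL_PAGE_SHIFT);
    long desired = dirty_pages < max_pages ? dirty_pages : max_pages;

    *bump = 0;
    if (nr_to_write < desired) {
        *bump = desired - nr_to_write;
        nr_to_write = desired;
    }
    return nr_to_write;
}
```

With the 128 MB default and 4 KiB pages the cap is 128 << 8 = 32768 pages, compared with the 1024 pages that a 4 MB chunk represents.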
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  7. 26 Sep, 2009 1 commit
  8. 21 Sep, 2009 1 commit
  9. 17 Sep, 2009 1 commit
    • ext4: Fix the alloc on close after a truncate hueristic · 5534fb5b
      Theodore Ts'o authored
      
      
      In an attempt to avoid doing an unneeded flush after opening a
      (previously non-existent) file with O_CREAT|O_TRUNC, the code only
      triggered the heuristic if ei->disksize was non-zero.  It turns out
      that the VFS doesn't call ->truncate() if the file doesn't exist, and
      ei->disksize is always zero even if the file previously existed.  So
      remove the test, since it isn't necessary and in fact disabled the
      heuristic.
      
      Thanks to Clemens Eisserer for reporting that he was seeing problems
      with files written using kwrite and eclipse after sudden crashes
      caused by a buggy Intel video driver.
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. 16 Sep, 2009 1 commit
  11. 17 Sep, 2009 1 commit
    • ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags · 1b9c12f4
      Theodore Ts'o authored
      
      
      EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
      and the hex value assigned to it collides with FS_DIRECTIO_FL (which
      is also stored in i_flags).  There's no reason for the
      EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
      i_state instead.
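A small model of the collision, assuming FS_DIRECTIO_FL has the value 0x00100000 (the commit only says the two values collide; the MODEL_* macros and struct model_inode_info are hypothetical). It shows how clearing the in-memory migrate bit in i_flags also wipes a user-visible flag, and how keeping it in a separate i_state word avoids that.

```c
#include <assert.h>

#define MODEL_FS_DIRECTIO_FL    0x00100000u            /* assumed value */
#define MODEL_OLD_EXT_MIGRATE   MODEL_FS_DIRECTIO_FL   /* the collision */
#define MODEL_STATE_EXT_MIGRATE 0x1u                   /* in-memory only */

struct model_inode_info {
    unsigned int i_flags;   /* persistent, user-visible flags */
    unsigned int i_state;   /* in-memory-only state bits */
};

/* Old scheme: ending a migration clears the bit in i_flags, which also
 * wipes a user-set FS_DIRECTIO_FL because the values are identical. */
static void old_end_migrate(struct model_inode_info *ei)
{
    ei->i_flags &= ~MODEL_OLD_EXT_MIGRATE;
}

/* New scheme: the in-memory flag lives in i_state; i_flags is untouched. */
static void new_start_migrate(struct model_inode_info *ei)
{
    ei->i_state |= MODEL_STATE_EXT_MIGRATE;
}

static void new_end_migrate(struct model_inode_info *ei)
{
    ei->i_state &= ~MODEL_STATE_EXT_MIGRATE;
}
```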
      
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  12. 16 Sep, 2009 2 commits
    • ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Eric Sandeen authored
      
      
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, "ext4 bitmap allocator
      must limit blocks to < 2^32".
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high.
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original goal % UINT_MAX
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
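The two failure modes and the goal heuristic can be modeled in a few lines. This is a hypothetical userspace sketch: store_in_indirect_slot(), indirect_goal() and checked_store() are illustrative names, assert() stands in for BUG_ON(), and unsigned int is assumed to be 32 bits so that UINT_MAX is 2^32 - 1.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Indirect-mapped files keep block numbers in 32-bit slots, so storing a
 * larger block number silently drops the high bits (the corruption). */
static uint32_t store_in_indirect_slot(uint64_t block)
{
    return (uint32_t)block;
}

/* Goal heuristic from the commit message: original goal % UINT_MAX. */
static uint64_t indirect_goal(uint64_t goal)
{
    return goal % UINT_MAX;
}

/* BUG_ON-style guard, modeled with assert(). */
static uint32_t checked_store(uint64_t block)
{
    assert(block < ((uint64_t)1 << 32));
    return (uint32_t)block;
}
```

For a goal of 2^32 + 5, the 32-bit slot would silently record block 5, whereas the goal heuristic yields (2^32 + 5) % (2^32 - 1) = 6, which fits.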
      
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen authored
      
      
      Enable removal of corrupted pages through truncation
      for a number of file systems: ext*, xfs, gfs2, ocfs2, ntfs.
      These should cover most server needs.
      
      I chose the set of migration-aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler currently does not take
      i_mutex before calling the truncate function.  Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  13. 10 Sep, 2009 1 commit
    • ext4: Make non-journal fsync work properly · 91ac6f43
      Frank Mayhar authored
      
      
      Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
      mode:  If we're not using a journal, ext4_write_inode() now calls
      ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
      with a new "do_sync" parameter.  If that parameter is nonzero _and_ we're
      not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
      instead of ext4_handle_dirty_metadata().
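The new branch can be sketched as below. This is a hypothetical model: struct model_sb and struct model_buf record which path was taken instead of doing I/O, and the comments name the kernel calls that each branch stands in for, per the description above. The no-journal, no-sync case is assumed here to just mark the buffer dirty.

```c
#include <assert.h>
#include <stdbool.h>

struct model_sb {
    bool has_journal;
};

/* Records which path handled the inode's buffer. */
struct model_buf {
    bool dirty;       /* marked dirty, written back later */
    bool on_disk;     /* written out synchronously */
    bool journaled;   /* handed to the journal */
};

static void do_update_inode(const struct model_sb *sb, struct model_buf *bh,
                            int do_sync)
{
    if (!sb->has_journal && do_sync) {
        bh->on_disk = true;      /* stand-in for sync_dirty_buffer() */
    } else if (!sb->has_journal) {
        bh->dirty = true;        /* no journal: buffer is only marked dirty */
    } else {
        bh->journaled = true;    /* stand-in for ext4_handle_dirty_metadata() */
    }
}
```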
      
      This problem was found in power-fail testing, which measured how many
      files and blocks were lost after a power failure when using fsync()
      and when not using fsync().  It turned out that using fsync() was
      actually worse than not doing so, possibly because it increased the
      likelihood that the inodes would remain unflushed and would therefore
      be lost at the power failure.
      
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  14. 08 Sep, 2009 1 commit
  15. 10 Sep, 2009 1 commit
  16. 01 Sep, 2009 1 commit
  17. 31 Aug, 2009 1 commit
    • ext4: Restore wbc->range_start in ext4_da_writepages() · de89de6e
      Theodore Ts'o authored
      To solve a lock inversion problem, we implement part of the
      range_cyclic algorithm in ext4_da_writepages().  (See commit 2acf2c26
      for more details.)
      
      As part of that change, wbc->range_start was modified by ext4's
      writepages function, which confuses its callers since they don't
      expect the filesystem to modify it.  The simplest fix is to save and
      restore wbc->range_start in ext4_da_writepages().
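The save-and-restore pattern, sketched with a hypothetical two-field stand-in for struct writeback_control (model_da_writepages() and resume_from are illustrative names):

```c
#include <assert.h>

/* Hypothetical subset of struct writeback_control. */
struct model_wbc {
    long long range_start;
    long nr_to_write;
};

/* Moves range_start internally (as a partial range_cyclic implementation
 * would) but hands the caller back its original value. */
static int model_da_writepages(struct model_wbc *wbc, long long resume_from)
{
    long long range_start = wbc->range_start;   /* save caller's value */

    wbc->range_start = resume_from;             /* internal use only */
    /* ... write pages starting at wbc->range_start ... */
    wbc->nr_to_write = 0;

    wbc->range_start = range_start;             /* restore before returning */
    return 0;
}
```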
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  18. 18 Aug, 2009 1 commit
    • ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef
      Jan Kara authored
      
      
      During truncate we are sometimes forced to start a new transaction, as
      the number of blocks to be journaled is both quite large and hard to
      predict.  So far we restarted a transaction while holding i_data_sem,
      and that violates lock ordering because i_data_sem ranks below a
      transaction start (and it can lead to a real deadlock with
      ext4_get_blocks() mapping blocks in some page while having a
      transaction open).
      
      We fix the problem by dropping i_data_sem before restarting the
      transaction and reacquiring it afterwards.  It's slightly subtle why
      this works:
      
      1) By the time ext4_truncate() is called, all the page cache for the
      truncated part of the file has been dropped, so get_block() should not
      be called on it (we only have to invalidate the extent cache after we
      reacquire i_data_sem, because some extent from the non-truncated part
      could also extend into the part we are going to truncate).
      
      2) Writes, migration, and defragmentation hold i_mutex, so they are
      blocked for the entire duration of the truncate.
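The lock-ordering rule and the fix can be expressed as assertions in a hypothetical userspace model. The boolean fields stand in for the real rw-semaphore and journal handle; the comments name the kernel operations each step loosely corresponds to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock-state model, not the ext4 code. */
struct model_inode {
    bool i_data_sem_held;
    bool extent_cache_valid;
    int transactions_started;
};

/* Ordering rule: a transaction start ranks above i_data_sem, so
 * i_data_sem must not be held while (re)starting a transaction. */
static void start_transaction(struct model_inode *inode)
{
    assert(!inode->i_data_sem_held);
    inode->transactions_started++;
}

/* The fix: drop i_data_sem, restart, reacquire, then invalidate the
 * extent cache, since a cached extent may reach into the truncated part. */
static void truncate_restart_transaction(struct model_inode *inode)
{
    inode->i_data_sem_held = false;     /* up_write(&ei->i_data_sem) */
    start_transaction(inode);           /* journal restart */
    inode->i_data_sem_held = true;      /* down_write(&ei->i_data_sem) */
    inode->extent_cache_valid = false;  /* invalidate the extent cache */
}
```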
      
      This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  19. 11 Aug, 2009 1 commit
  20. 13 Jul, 2009 1 commit
    • ext4: Fix buffer head reference leak in no-journal mode · e6b5d301
      Curt Wohlgemuth authored
      
      
      We found a problem with buffer head reference leaks when using an ext4
      partition without a journal.  In particular, calls to ext4_forget()
      would not do a brelse() on the input buffer head, which prevents the
      pages they belong to from being reclaimed.
      
      Further investigation showed that all places where ext4_journal_forget() and
      ext4_journal_revoke() are called are subject to the same problem.  The patch
      below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
      release of the buffer head when the journal handle isn't valid.
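A hypothetical reference-count model of the no-journal branch, before and after the fix (struct model_bh and the forget_nojournal_* functions are illustrative, not ext4 or JBD2 APIs). A buffer head whose count never returns to zero keeps its page pinned.

```c
#include <assert.h>

/* Hypothetical refcounted buffer head. */
struct model_bh {
    int b_count;
};

static void model_brelse(struct model_bh *bh)
{
    assert(bh->b_count > 0);
    bh->b_count--;
}

/* Before the fix: with no valid journal handle, nothing dropped the
 * caller's reference, so it leaked. */
static void forget_nojournal_old(struct model_bh *bh)
{
    (void)bh;
}

/* After the fix: the no-journal branch releases the reference itself. */
static void forget_nojournal_fixed(struct model_bh *bh)
{
    model_brelse(bh);
}
```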
      
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  21. 17 Jul, 2009 1 commit
    • ext4: More buffer head reference leaks · 6487a9d3
      Curt Wohlgemuth authored
      
      
      After the patch I posted last week regarding buffer head ref leaks in
      no-journal mode, I looked at all the code that uses buffer heads and
      searched for more potential leaks.
      
      The patch below fixes the issues I found; these can occur even when a
      journal is present.
      
      The change to inode.c fixes a double release if
      ext4_journal_get_create_access() fails.
      
      The changes to namei.c are more complicated.  add_dirent_to_buf() will
      release the input buffer head EXCEPT when it returns -ENOSPC.  Some
      callers of this routine don't always do the brelse() when -ENOSPC is
      returned.  Unfortunately, putting this fix into ext4_add_entry()
      required capturing the return values of make_indexed_dir() and
      add_dirent_to_buf().
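The ownership contract described above can be sketched like this; model_add_dirent_to_buf() and model_add_entry() are hypothetical stand-ins that keep only the reference-counting behavior.

```c
#include <assert.h>
#include <errno.h>

struct model_bh {
    int b_count;
};

static void model_brelse(struct model_bh *bh)
{
    bh->b_count--;
}

/* Releases bh on every path EXCEPT -ENOSPC, where the caller still
 * owns the reference. */
static int model_add_dirent_to_buf(struct model_bh *bh, int space_left)
{
    if (space_left <= 0)
        return -ENOSPC;        /* bh NOT released */
    model_brelse(bh);          /* success: bh released */
    return 0;
}

/* A caller that honors the contract. */
static int model_add_entry(struct model_bh *bh, int space_left)
{
    int retval = model_add_dirent_to_buf(bh, space_left);
    if (retval != -ENOSPC)
        return retval;         /* bh already released by the callee */
    model_brelse(bh);          /* -ENOSPC: caller must release it */
    return retval;             /* a real caller would try another block */
}
```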
      
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  22. 24 Jun, 2009 1 commit
  23. 15 Jun, 2009 1 commit
    • ext4: Don't update ctime for non-extent-mapped inodes · 41591750
      Theodore Ts'o authored
      
      
      The VFS handles updating ctime, so we don't need to update the inode's
      ctime in ext4_splice_branch() when updating the direct or indirect
      blocks.  This was harmless when we did this in ext3, but in ext4,
      thanks to delayed allocation, updating the ctime in
      ext4_splice_branch() can cause the ctime to mysteriously jump when
      the blocks are finally allocated.
      
      Thanks to Björn Steinbrink for pointing out this problem on the git
      mailing list.
      
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  24. 14 Jun, 2009 4 commits
  25. 13 Jun, 2009 2 commits
  26. 09 Jun, 2009 1 commit
  27. 05 Jun, 2009 2 commits
  28. 09 Jun, 2009 1 commit
    • ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle() · 03f5d8bc
      Jan Kara authored
      
      
      Get rid of the EXTEND_DISKSIZE flag of ext4_get_blocks_handle().  It
      seems to be a relic from the old days, and setting disksize in this
      function does not make much sense.  It was set only by ext4_getblk().
      Since the parameter has an effect only if create == 1, it is easy to
      check by grepping through the sources that the three callers which
      end up calling ext4_getblk() with create == 1 (ext4_append,
      ext4_quota_write, ext4_mkdir) do the right thing and set disksize
      themselves.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  29. 04 Jun, 2009 1 commit