  1. 28 Oct, 2010 8 commits
    • ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*() · 5dabfc78
      Theodore Ts'o authored
      This is a cleanup to avoid namespace leaks out of fs/ext4
      Signed-off-by: "Theodore Ts'o"
    • ext4: Add batched discard support for ext4 · 7360d173
      Lukas Czerner authored
      Walk through allocation groups and trim all free extents. It can be
      invoked through the FITRIM ioctl on the file system. The main idea is to
      provide a way to trim the whole file system if needed, since some SSDs
      may suffer from performance loss after the whole device has been filled
      (which does not mean that the fs is full!).
      It searches for free extents in the allocation groups specified by the
      byte range start -> start+len. When a free extent is within this range,
      its blocks are marked as used and then trimmed. Afterwards these blocks
      are marked as free in the per-group bitmap.
      Since fstrim is a long operation, it is good to be able to
      interrupt it with a signal. This was added by Dmitry Monakhov.
      Thanks, Dmitry.
      Signed-off-by: Lukas Czerner
      Signed-off-by: Dmitry Monakhov
      Reviewed-by: Jan Kara
      Reviewed-by: Dmitry Monakhov
      Signed-off-by: "Theodore Ts'o"
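The batched discard commit above exposes trimming through the FITRIM ioctl. A minimal user-space sketch of such a call follows. It is not part of the patch. It assumes the `struct fstrim_range` layout (three 64-bit fields: start, len, minlen) and the `_IOWR('X', 121, ...)` request encoding used on x86 and ARM; `pack_fstrim_range` and `trim_filesystem` are hypothetical helper names.

```python
import fcntl
import os
import struct

# Assumption: struct fstrim_range is three u64 fields (start, len, minlen) and
# FITRIM is _IOWR('X', 121, struct fstrim_range), using the ioctl bit layout
# found on x86 and ARM (direction bits at 30, size at 16, type at 8).
FSTRIM_RANGE_FMT = "=QQQ"
FITRIM = (3 << 30) | (struct.calcsize(FSTRIM_RANGE_FMT) << 16) | (ord("X") << 8) | 121


def pack_fstrim_range(start, length, minlen=0):
    """Pack the byte range to trim (start -> start+len) for the ioctl."""
    return struct.pack(FSTRIM_RANGE_FMT, start, length, minlen)


def trim_filesystem(mountpoint, start=0, length=2**64 - 1, minlen=0):
    """Ask the file system mounted at `mountpoint` to trim its free extents.

    Returns the number of bytes the kernel reports as trimmed. This needs
    sufficient privileges and a file system that implements FITRIM;
    otherwise the ioctl raises OSError.
    """
    buf = bytearray(pack_fstrim_range(start, length, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITRIM, buf)  # the kernel writes the trimmed length back
    finally:
        os.close(fd)
    _, trimmed, _ = struct.unpack(FSTRIM_RANGE_FMT, bytes(buf))
    return trimmed
```

Calling `trim_filesystem("/mnt/test")` with the default arguments would request a trim of the whole file system; a narrower `start`/`length` pair restricts the walk to the allocation groups covering that byte range.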
    • ext4: use bio layer instead of buffer layer in mpage_da_submit_io · bd2d0210
      Theodore Ts'o authored
      Call the block I/O layer directly instead of going through the buffer
      layer.  This should give us much better performance and scalability,
      as well as lowering our CPU utilization when doing buffered writeback.
      Signed-off-by: "Theodore Ts'o"
    • ext4: remove unused ext4_sb_info members · 640e9396
      Eric Sandeen authored
      Not that these take up a lot of room, but the structure is long enough
      as it is, and there's no need to confuse people with these various
      undocumented & unused structure members...
      Signed-off-by: Eric Sandeen
      Signed-off-by: "Theodore Ts'o"
    • ext4: improve llseek error handling for overly large seek offsets · e0d10bfa
      Toshiyuki Okajima authored
      The llseek system call should return EINVAL if passed a seek offset
      at which a write would result in an error.  What this maximum offset
      should be depends on whether or not the huge_file file system feature
      is set, and whether or not the file is extent based.
      If the file does not have the "EXT4_EXTENTS_FL" flag, the maximum size which
      can be written (write system call) is different from the maximum size which
      can be sought (lseek system call).
      For example, the following 2 cases demonstrate the differences
      between the maximum size which can be written, versus the seek offset
      allowed by the llseek system call:
      #1: mkfs.ext3 <dev>; mount -t ext4 <dev>
      #2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>
      Table. The maximum file size (in bytes) which we can write or seek,
             for each file system feature setting and file flag setting
      | case | operation | !EXT4_EXTENTS_FL | EXT4_EXTENTS_FL |
      | #1   | write     |    2194719883264 |  -------------- |
      | #1   | seek      |    2199023251456 |  -------------- |
      | #2   | write     |    4402345721856 |  17592186044415 |
      | #2   | seek      |   17592186044415 |  17592186044415 |
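The figures in the table can be reproduced with a little arithmetic. The sketch below assumes 4 KiB blocks (blkbits = 12), a 64-bit VFS size limit, and limit formulas modelled on the ext4 superblock setup code as understood here; the function names are hypothetical and this is not the kernel implementation.

```python
MAX_LFS_FILESIZE = 2**63 - 1  # assumption: VFS limit on a 64-bit machine
NDIR_BLOCKS = 12              # direct block pointers in a block-mapped inode


def max_extent_size(blkbits, huge_file):
    """Upper bound for extent-mapped files (the value kept in sb->s_maxbytes)."""
    upper = MAX_LFS_FILESIZE
    if not huge_file:
        # i_blocks is a 32-bit count of 512-byte sectors, rounded down to blocks
        upper = (((1 << 32) - 1) >> (blkbits - 9)) << blkbits
    res = ((1 << 32) << blkbits) - 1  # 32-bit logical block numbers
    return min(res, upper)


def max_bitmap_size(blkbits, huge_file):
    """Upper bound for block-mapped files (the value kept in s_bitmap_maxbytes)."""
    ptrs = 1 << (blkbits - 2)  # 4-byte block pointers per indirect block
    if huge_file:
        upper = (1 << 48) - 1
    else:
        upper = ((1 << 32) - 1) >> (blkbits - 9)
    # blocks consumed by the indirect, double-indirect and triple-indirect trees
    meta = 1 + (1 + ptrs) + (1 + ptrs + ptrs**2)
    upper = (upper - meta) << blkbits
    # data blocks addressable through direct and indirect pointers
    res = (NDIR_BLOCKS + ptrs + ptrs**2 + ptrs**3) << blkbits
    return min(res, upper, MAX_LFS_FILESIZE)
```

With `blkbits=12`, `max_bitmap_size` gives the "write" column for files without EXT4_EXTENTS_FL, and `max_extent_size` gives the seek limit that generic_file_llseek applies to every file.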
      The differences exist because ext4 has 2 maxbytes values: sb->s_maxbytes
      (= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped
      maxbytes).  However, generic_file_llseek(), which is the llseek of
      ext4_file_operations, uses only the extent-mapped maxbytes.
      Therefore we create an ext4 llseek function which uses both maxbytes values.
      The new function is derived from generic_file_llseek().
      If the file flag "EXT4_EXTENTS_FL" is not set, the function uses
      EXT4_SB(inode->i_sb)->s_bitmap_maxbytes instead of inode->i_sb->s_maxbytes.
      Signed-off-by: Toshiyuki Okajima
      Signed-off-by: "Theodore Ts'o"
      Cc: Andreas Dilger
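The limit selection described in the llseek commit can be sketched as follows. This is an illustration only: `seek_limit` and `check_seek` are hypothetical names, the flag value is assumed from the ext4 headers, and the real function handles the seek origins of generic_file_llseek(), which are omitted here.

```python
import errno

EXT4_EXTENTS_FL = 0x00080000  # assumption: "inode uses extents" flag value


def seek_limit(inode_flags, s_maxbytes, s_bitmap_maxbytes):
    """Pick the maxbytes value that applies to this file."""
    if inode_flags & EXT4_EXTENTS_FL:
        return s_maxbytes         # extent-mapped file
    return s_bitmap_maxbytes      # block-mapped file


def check_seek(offset, inode_flags, s_maxbytes, s_bitmap_maxbytes):
    """Return the offset if it is seekable, else raise OSError(EINVAL)."""
    limit = seek_limit(inode_flags, s_maxbytes, s_bitmap_maxbytes)
    if offset < 0 or offset > limit:
        raise OSError(errno.EINVAL, "seek offset out of range")
    return offset
```

Using the case #2 figures, a block-mapped file would accept offsets up to 4402345721856 and reject anything larger, while an extent-mapped file on the same file system would accept offsets up to 17592186044415.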
    • ext4: add interface to advertise ext4 features in sysfs · 857ac889
      Lukas Czerner authored
      User space should have the opportunity to check which features ext4
      supports in each particular copy. This adds an easy interface by creating a new
      "features" directory in /sys/fs/ext4/. In that directory, files
      advertising feature names can be created.
      Add lazy_itable_init to the feature list.
      Signed-off-by: Lukas Czerner
      Signed-off-by: "Theodore Ts'o"
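From user space, the feature directory introduced by the sysfs commit could be queried along these lines. This is a small sketch that assumes sysfs is mounted at /sys and that each advertised feature appears as one file named after the feature; the helper names are hypothetical.

```python
import os


def ext4_features(features_dir="/sys/fs/ext4/features"):
    """Return the advertised feature names, or [] if the directory is absent."""
    try:
        return sorted(os.listdir(features_dir))
    except FileNotFoundError:
        return []


def supports(feature, features_dir="/sys/fs/ext4/features"):
    """True if the running kernel's ext4 advertises the named feature."""
    return feature in ext4_features(features_dir)
```

A tool such as a mkfs wrapper could call `supports("lazy_itable_init")` before deciding whether to skip zeroing the inode tables.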
    • ext4: add support for lazy inode table initialization · bfff6873
      Lukas Czerner authored
      When the lazy_itable_init extended option is passed to mke2fs, it
      considerably speeds up filesystem creation because inode tables are
      not zeroed out.  The fact that parts of the inode table are
      uninitialized is not a problem so long as the block group descriptors,
      which contain information regarding how much of the inode table has
      been initialized, have not been corrupted.  However, if the block group
      checksums are not valid, e2fsck must scan the entire inode table, and
      the old, uninitialized data could potentially cause e2fsck to
      report false problems.
      Hence, it is important for the inode tables to be initialized as soon
      as possible.  This commit adds this feature so that mke2fs can safely
      use the lazy inode table initialization feature to speed up formatting
      file systems.
      This is done via a new kernel thread called ext4lazyinit, which is
      created on demand and destroyed when it is no longer needed.  There
      is only one thread for all ext4 filesystems in the system. When the
      first filesystem with the init_itable mount option is mounted, the
      ext4lazyinit thread is created; the filesystem can then register its
      request in the request list.
      This thread then walks through the list of requests, picking up
      scheduled requests and invoking ext4_init_inode_table(). The next schedule
      time for a request is computed by multiplying the time it took to
      zero out the last inode table by the wait multiplier, which can be set with
      the (init_itable=n) mount option (default is 10).  We are doing
      this so we do not take the whole I/O bandwidth. When the thread is no
      longer necessary (request list is empty) it frees the appropriate
      structures and exits (and can be created later by another
      request).
      We do not disturb regular inode allocations in any way; they simply do not
      care whether the inode table is or is not zeroed. But when zeroing, we
      have to skip used inodes, obviously. Also, we should prevent new inode
      allocations from the group while zeroing is under way. For that we
      take the alloc_sem lock for writing in ext4_init_inode_table() and for
      reading in ext4_claim_inode(), so when we are unlucky and the allocator hits
      the group which is currently being zeroed, it just has to wait.
      This can be suppressed using the mount option no_init_itable.
      Signed-off-by: Lukas Czerner
      Signed-off-by: "Theodore Ts'o"
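The back-off rule from the lazy initialization commit (wait for the last zeroing time multiplied by the wait multiplier) can be illustrated with a toy model. The function names and the simulation are hypothetical; only the multiplication rule and the default multiplier of 10 come from the commit message.

```python
def next_schedule_time(now, zeroing_time, wait_multiplier=10):
    """Earliest time at which the next inode table may be zeroed."""
    return now + zeroing_time * wait_multiplier


def simulate(zeroing_times, wait_multiplier=10):
    """Return (time spent zeroing, total elapsed time) for a list of groups."""
    clock = 0.0
    busy = 0.0
    for t in zeroing_times:
        clock += t  # zero one group's inode table
        busy += t
        # then stay idle until the computed schedule time
        clock = next_schedule_time(clock, t, wait_multiplier)
    return busy, clock
```

With the default multiplier the thread is busy for one part in eleven of the elapsed time in this model, which is how the scheme avoids taking the whole I/O bandwidth.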
    • ext4: use dedicated slab caches for group_info structures · fb1813f4
      Curt Wohlgemuth authored
      ext4_group_info structures are currently allocated with kmalloc().
      With a typical 4K block size, these are 136 bytes each -- meaning
      they'll each consume a 256-byte slab object.  On a system with many
      large ext4 partitions, that's a lot of wasted kernel slab space.
      (E.g., a single 1TB partition will have about 8000 block groups, using
      about 2MB of slab, of which nearly 1MB is wasted.)
      This patch creates an array of slab pointers, populated as needed --
      depending on the superblock block size -- and uses these slabs to
      allocate the group info objects.
      Google-Bug-Id: 2980809
      Signed-off-by: Curt Wohlgemuth
      Signed-off-by: "Theodore Ts'o"
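The slab figures quoted in the group_info commit follow from simple arithmetic. The sketch below takes the 136-byte object size and 256-byte slab object size from the commit message, and assumes that one block group spans 8 * block_size blocks (one bitmap block) and that "1TB" means 2**40 bytes; `slab_usage` is a hypothetical helper.

```python
def slab_usage(partition_bytes, block_size=4096, obj_size=136, slab_size=256):
    """Return (block groups, slab bytes used, slab bytes wasted)."""
    blocks_per_group = 8 * block_size            # one bitmap block, 8 bits per byte
    group_bytes = blocks_per_group * block_size  # 128 MiB with 4 KiB blocks
    groups = -(-partition_bytes // group_bytes)  # round up to whole groups
    used = groups * slab_size
    wasted = groups * (slab_size - obj_size)
    return groups, used, wasted
```

For a 2**40-byte partition this gives 8192 block groups, 2 MiB of slab, and 983040 bytes (just under 1 MiB) lost to rounding each object up to 256 bytes.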
  2. 09 Aug, 2010 1 commit
  3. 05 Aug, 2010 1 commit
    • ext4: re-inline ext4_rec_len_(to|from)_disk functions · 0cfc9255
      Eric Sandeen authored
      Commit 3d0518f4, "ext4: New rec_len encoding for very large
      blocksizes", made several changes to this path, but from
      a perf perspective, un-inlining ext4_rec_len_from_disk() seems
      most significant.  This function is called from ext4_check_dir_entry(),
      which on a file-creation workload is called extremely often.
      I tested this with bonnie:
      # bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32
      (this does 200 iterations) and got this for the file creations:
      ext4 stock:   Average =  21206.8 files/s
      ext4 inlined: Average =  22346.7 files/s  (+5%)
      Signed-off-by: Eric Sandeen
      Signed-off-by: "Theodore Ts'o"
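The relative gain quoted for the bonnie++ run can be checked directly from the two averages in the commit message; `percent_gain` is a hypothetical helper.

```python
def percent_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


stock = 21206.8    # files/s, ext4 stock
inlined = 22346.7  # files/s, ext4 with the functions re-inlined
gain = percent_gain(stock, inlined)
```

The result is roughly 5.4%, consistent with the rounded "+5%" in the message.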
  4. 02 Aug, 2010 1 commit
  5. 27 Jul, 2010 7 commits
  6. 29 Jun, 2010 3 commits
  7. 14 Jun, 2010 1 commit
  8. 12 Jun, 2010 1 commit
    • ext4: Clean up s_dirt handling · a0375156
      Theodore Ts'o authored
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: "Theodore Ts'o"
  9. 28 May, 2010 1 commit
  10. 17 May, 2010 6 commits
  11. 16 May, 2010 2 commits
    • ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() · e35fd660
      Theodore Ts'o authored
      Jack up ext4_get_blocks() and add a new function, ext4_map_blocks(),
      which uses a much smaller structure, struct ext4_map_blocks, which is
      20 bytes, as opposed to a struct buffer_head, which is nearly 5 times
      bigger on an x86_64 machine.  By switching things to use
      ext4_map_blocks(), we can save stack space, since we can avoid
      allocating a struct buffer_head on the stack.
      Signed-off-by: "Theodore Ts'o"
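The 20-byte figure for struct ext4_map_blocks can be illustrated by summing its fields. The field list below (a 64-bit physical block, a 32-bit logical block, a length and a flags word) is an assumption based on the ext4 headers rather than something stated in the commit message, and the format string deliberately ignores any tail padding a C compiler might add.

```python
import struct

# Assumed fields of struct ext4_map_blocks:
#   m_pblk  (u64)  physical block number
#   m_lblk  (u32)  logical block number
#   m_len   (u32)  number of blocks mapped
#   m_flags (u32)  mapping flags
MAP_BLOCKS_FMT = "=QIII"  # "=" means standard sizes with no padding


def map_blocks_size():
    """Sum of the field sizes of the assumed structure layout."""
    return struct.calcsize(MAP_BLOCKS_FMT)


def pack_map_blocks(pblk, lblk, length, flags=0):
    """Pack one mapping request or result into the assumed layout."""
    return struct.pack(MAP_BLOCKS_FMT, pblk, lblk, length, flags)
```

Under these assumptions the fields add up to 20 bytes, matching the size given in the commit message.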
    • ext4: check for a good block group before loading buddy pages · 8a57d9d6
      Curt Wohlgemuth authored
      This adds a new field in ext4_group_info to cache the largest available
      block range in a block group, and doesn't load the buddy pages until *after*
      we've done a sanity check on the block group.
      With large allocation requests (e.g., fallocate(), 8MiB) and relatively full
      partitions, it's easy to have no block groups with a block extent large
      enough to satisfy the input request length.  This currently causes the loop
      during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages
      for EVERY block group.  That can be a lot of pages.  The patch below allows
      us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we
      have to check again after we lock the block group).
      Addresses-Google-Bug: #2578108
      Addresses-Google-Bug: #2704453
      Signed-off-by: Curt Wohlgemuth
      Signed-off-by: "Theodore Ts'o"
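The idea in the buddy-page commit, checking cheap cached data before doing the expensive load, can be sketched as below. All names are hypothetical (`Group`, `largest_free_order`, `find_group`), locking is reduced to a comment, and the real allocator's criteria are more involved than this single comparison.

```python
class Group:
    """Cached per-group summary, standing in for ext4_group_info."""

    def __init__(self, free_blocks, largest_free_order):
        self.free_blocks = free_blocks
        # order of the largest free power-of-two extent, or -1 if none
        self.largest_free_order = largest_free_order


def good_group(group, order):
    """Cheap check that needs no buddy pages."""
    return group.free_blocks > 0 and group.largest_free_order >= order


def find_group(groups, request_blocks, load_buddy):
    """Return the index of the first group that can satisfy the request."""
    order = (request_blocks - 1).bit_length()  # smallest order with 2**order >= request
    for index, group in enumerate(groups):
        if not good_group(group, order):
            continue            # skip without touching the buddy pages
        load_buddy(index)       # expensive step, done only for candidates
        # the real code locks the group here and checks again
        if good_group(group, order):
            return index
    return None
```

For an 8 MiB request with 4 KiB blocks (2048 blocks, order 11), only groups whose cached order is at least 11 ever have their buddy pages loaded.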
  12. 05 Mar, 2010 1 commit
  13. 04 Mar, 2010 1 commit
  14. 02 Mar, 2010 1 commit
  15. 04 Mar, 2010 1 commit
    • ext4: use ext4_get_block_write in buffer write · 744692dc
      Jiaying Zhang authored
      Allocate uninitialized extent before ext4 buffer write and
      convert the extent to initialized after io completes.
      The purpose is to make sure an extent can only be marked
      initialized after it has been written with new data so
      we can safely drop the i_mutex lock in ext4 DIO read without
      exposing stale data. This helps to improve multi-threaded DIO
      read performance on high-speed disks.
      Skip the nobh and data=journal mount cases to make things simple for now.
      Signed-off-by: Jiaying Zhang
      Signed-off-by: "Theodore Ts'o"
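The ordering guarantee in the buffer-write commit (an extent is marked initialized only after its data has reached the disk) can be modelled with a toy class. This is a conceptual sketch with hypothetical names, not the kernel's data structures; the stale bytes stand in for whatever a previous owner left in the blocks.

```python
class Extent:
    """Toy model of an extent that starts out uninitialized."""

    def __init__(self, length):
        self.initialized = False
        self.disk = bytearray(b"\xaa" * length)  # stale data from a previous owner

    def write(self, data):
        """Data lands on disk while the extent is still marked uninitialized."""
        self.disk[:len(data)] = data

    def complete_io(self):
        """Called once the write has finished: now the data may be exposed."""
        self.initialized = True

    def read(self):
        """Readers see zeros until the extent has been converted."""
        if not self.initialized:
            return bytes(len(self.disk))
        return bytes(self.disk)
```

Because a reader never sees the contents of an uninitialized extent, a concurrent read in this model returns either zeros or the new data, never the stale bytes, which is the property that lets the DIO read path drop i_mutex.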
  16. 02 Mar, 2010 1 commit
  17. 24 Feb, 2010 1 commit
  18. 17 Feb, 2010 1 commit
    • percpu: add __percpu sparse annotations to fs · 003cb608
      Tejun Heo authored
      Add __percpu sparse annotations to fs.
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      Signed-off-by: Tejun Heo
      Cc: "Theodore Ts'o"
      Cc: Trond Myklebust
      Cc: Alex Elder
      Cc: Christoph Hellwig
      Cc: Alexander Viro
  19. 15 Feb, 2010 1 commit