Skip to content
  • Jeff Moyer's avatar
    ext4: fix race between sync and completed io work · 491caa43
    Jeff Moyer authored
    The following command line will leave the aio-stress process unkillable
    on an ext4 file system (in my case, mounted on /mnt/test):
    
    aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2
    
    This is using the aio-stress program from the xfstests test suite.
    That particular command line tells aio-stress to do random writes to
    20 files from 20 threads (one thread per file).  The files are NOT
    preallocated, so you will get writes to random offsets within the
    file, thus creating holes and extending i_size.  It also opens the
    file with O_DIRECT and O_SYNC.
    
    On to the problem.  When an I/O requires unwritten extent conversion,
    it is queued onto the completed_io_list for the ext4 inode.  Two code
    paths will pull work items from this list.  The first is the
    ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
    which is called via the fsync path (and O_SYNC handling, as well).
    There are two issues I've found in these code paths.  First, if the
    fsync path beats the work routine to a particular I/O, the work
    routine will free the io_end structure!  It does not take into account
    the fact that the io_end may still be in use by the fsync path.  I've
    fixed this issue by adding yet another IO_END flag, indicating that
    the io_end is being processed by the fsync path.
    
    The second problem is that the work routine will make an assignment to
    io->flag outside of the lock.  I have witnessed this result in a hang
    at umount.  Moving the flag setting inside the lock resolved that
    problem.
    
    The problem was introduced by commit b82e384c
    
     ("ext4: optimize
    locking for end_io extent conversion"), which first appeared in 3.2.
    As such, the fix should be backported to that release (probably along
    with the unwritten extent conversion race fix).
    
    Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
    Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
    CC: stable@kernel.org
    491caa43