    scsi: core: Fix hang of freezing queue between blocking and running device · 02c6dcd5
    Li Jinlin authored
    We found a hang. The steps to reproduce it are as follows:
    
      1. Block the device via scsi_device_set_state()
    
      2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
    
      3. echo none > /sys/block/sda/queue/scheduler
    
      4. echo "running" >/sys/block/sda/device/state
    
    Steps 3 and 4 should both complete once step 4 sets the device running,
    but instead both hang.
    
      CPU#0               CPU#1                CPU#2
      ---------------     ----------------     ----------------
                                               Step 1: blocking device
    
                                               Step 2: dd xxxx
                                                      ^^^^^^ get request
                                                             q_usage_counter++
    
                          Step 3: switching scheduler
                          elv_iosched_store
                            elevator_switch
                              blk_mq_freeze_queue
                                blk_freeze_queue
                                  > blk_freeze_queue_start
                                    ^^^^^^ mq_freeze_depth++
    
                                  > blk_mq_run_hw_queues
                                    ^^^^^^ can't run queue when dev blocked
    
                                  > blk_mq_freeze_queue_wait
                                    ^^^^^^ Hang here!!!
                                           wait q_usage_counter==0
    
      Step 4: running device
      store_state_field
        scsi_rescan_device
          scsi_attach_vpd
            scsi_vpd_inquiry
              __scsi_execute
                blk_get_request
                  blk_mq_alloc_request
                    blk_queue_enter
                    ^^^^^^ Hang here!!!
                           wait mq_freeze_depth==0
    
        blk_mq_run_hw_queues
        ^^^^^^ dispatch IO, q_usage_counter will reduce to zero
    
                                blk_mq_unfreeze_queue
                                ^^^^^ mq_freeze_depth--
    
    To fix this, run the queue before rescanning the device when the device
    state changes to SDEV_RUNNING.
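
    The ordering problem can be illustrated with a small user-space model.
    This is only a sketch, not kernel code: struct toy_queue and all toy_*
    helpers are hypothetical names invented for illustration, and the two
    integer fields merely stand in for q_usage_counter and mq_freeze_depth
    from the trace above. It assumes one in-flight request and one pending
    freeze, as in the reproduction steps.

```c
#include <stdbool.h>

/* Toy model of the state involved in the hang (hypothetical, not kernel code). */
struct toy_queue {
	int q_usage_counter;  /* requests holding a reference on the queue */
	int mq_freeze_depth;  /* > 0 while a freeze is in progress */
	bool dev_blocked;     /* device is in a blocked state */
};

/* State after steps 1-3: device blocked, one request queued, freeze pending. */
static struct toy_queue toy_setup(void)
{
	struct toy_queue q = {
		.q_usage_counter = 1,
		.mq_freeze_depth = 1,
		.dev_blocked = true,
	};
	return q;
}

/* Dispatch queued requests; nothing is dispatched while the device is blocked. */
static void toy_run_hw_queues(struct toy_queue *q)
{
	if (!q->dev_blocked)
		q->q_usage_counter = 0;
}

/* The freezing task may finish only once no request holds a reference. */
static bool toy_freeze_wait_done(const struct toy_queue *q)
{
	return q->q_usage_counter == 0;
}

/* A new request may enter the queue only while it is not frozen. */
static bool toy_queue_enter_ok(const struct toy_queue *q)
{
	return q->mq_freeze_depth == 0;
}

/*
 * Model of step 4. Returns true if the rescan can allocate its request,
 * false if it would block forever waiting for the freeze to end.
 */
static bool toy_set_running(struct toy_queue *q, bool run_queue_first)
{
	q->dev_blocked = false;

	if (run_queue_first) {
		/* Fixed ordering: drain pending I/O so the freezer can finish. */
		toy_run_hw_queues(q);
		if (toy_freeze_wait_done(q) && q->mq_freeze_depth > 0)
			q->mq_freeze_depth--;  /* freezer completes and unfreezes */
	}

	/* The rescan needs a new request, which requires entering the queue. */
	if (!toy_queue_enter_ok(q))
		return false;

	toy_run_hw_queues(q);
	return true;
}
```

    In this model, toy_set_running(&q, false) on a queue from toy_setup()
    reports that the rescan would block, because the freeze never ends while
    the queued request is undispatched. With run_queue_first set to true, the
    request is dispatched first, the freeze completes, and the rescan can
    proceed.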
    
    Link: https://lore.kernel.org/r/20210824025921.3277629-1-lijinlin3@huawei.com
    Fixes: f0f82e24 ("scsi: core: Fix capacity set to zero after offlinining device")
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
    Signed-off-by: Qiu Laibin <qiulaibin@huawei.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>