diff mbox series

scsi: Do not rescan devices with a suspended queue

Message ID 20231004085803.130722-1-dlemoal@kernel.org
State New
Headers show
Series scsi: Do not rescan devices with a suspended queue | expand

Commit Message

Damien Le Moal Oct. 4, 2023, 8:58 a.m. UTC
Commit ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
modified scsi_rescan_device() to avoid attempting rescanning a suspended
device. However, the modification added a check to verify that a SCSI
device is in the running state without checking if the device request
queue (in the case of block device) is also running, thus allowing the
exectuion of internal requests. Without checking the device request
queue, commit ff48b37802e5 fix is incomplete and deadlocks on resume can
still happen. Use blk_queue_pm_only() to check if the device request
queue allows executing commands in addition to checking the SCSI device
state.

Reported-by: Petr Tesarik <petr@tesarici.cz>
Fixes: ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
Cc: stable@vger.kernel.org
Tested-by: Petr Tesarik <petr@tesarici.cz>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
---
 drivers/scsi/scsi_scan.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Petr Tesařík Oct. 9, 2023, 6:17 a.m. UTC | #1
Hi, (adding James)

On Wed,  4 Oct 2023 17:58:03 +0900
Damien Le Moal <dlemoal@kernel.org> wrote:

> Commit ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
> modified scsi_rescan_device() to avoid attempting rescanning a suspended
> device. However, the modification added a check to verify that a SCSI
> device is in the running state without checking if the device request
> queue (in the case of block device) is also running, thus allowing the
> exectuion of internal requests. Without checking the device request
> queue, commit ff48b37802e5 fix is incomplete and deadlocks on resume can
> still happen. Use blk_queue_pm_only() to check if the device request
> queue allows executing commands in addition to checking the SCSI device
> state.

FTR this fix is still needed for rc5. Is there any chance it goes into
fixes before v6.6 is final?

Petr T

> Reported-by: Petr Tesarik <petr@tesarici.cz>
> Fixes: ff48b37802e5 ("scsi: Do not attempt to rescan suspended
> devices") Cc: stable@vger.kernel.org
> Tested-by: Petr Tesarik <petr@tesarici.cz>
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  drivers/scsi/scsi_scan.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 3db4d31a03a1..b05a55f498a2 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1627,12 +1627,13 @@ int scsi_rescan_device(struct scsi_device
> *sdev) device_lock(dev);
>  
>  	/*
> -	 * Bail out if the device is not running. Otherwise, the
> rescan may
> -	 * block waiting for commands to be executed, with us
> holding the
> -	 * device lock. This can result in a potential deadlock in
> the power
> -	 * management core code when system resume is on-going.
> +	 * Bail out if the device or its queue are not running.
> Otherwise,
> +	 * the rescan may block waiting for commands to be executed,
> with us
> +	 * holding the device lock. This can result in a potential
> deadlock
> +	 * in the power management core code when system resume is
> on-going. */
> -	if (sdev->sdev_state != SDEV_RUNNING) {
> +	if (sdev->sdev_state != SDEV_RUNNING ||
> +	    blk_queue_pm_only(sdev->request_queue)) {
>  		ret = -EWOULDBLOCK;
>  		goto unlock;
>  	}
Damien Le Moal Oct. 9, 2023, 10:36 p.m. UTC | #2
On 10/9/23 15:17, Petr Tesařík wrote:
> Hi, (adding James)
> 
> On Wed,  4 Oct 2023 17:58:03 +0900
> Damien Le Moal <dlemoal@kernel.org> wrote:
> 
>> Commit ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
>> modified scsi_rescan_device() to avoid attempting rescanning a suspended
>> device. However, the modification added a check to verify that a SCSI
>> device is in the running state without checking if the device request
>> queue (in the case of block device) is also running, thus allowing the
>> exectuion of internal requests. Without checking the device request
>> queue, commit ff48b37802e5 fix is incomplete and deadlocks on resume can
>> still happen. Use blk_queue_pm_only() to check if the device request
>> queue allows executing commands in addition to checking the SCSI device
>> state.
> 
> FTR this fix is still needed for rc5. Is there any chance it goes into
> fixes before v6.6 is final?

The patch is on the scsi list, not for libata. Martin will likely apply it this
week.

Martin ? Please confirm !
Martin K. Petersen Oct. 10, 2023, 1:43 a.m. UTC | #3
Damien,

>> FTR this fix is still needed for rc5. Is there any chance it goes
>> into fixes before v6.6 is final?
>
> The patch is on the scsi list, not for libata. Martin will likely
> apply it this week.

Patch looks good to me but I can't apply since the PM rework series went
through your tree. My fixes branch is based on -rc1. So I'd prefer for
you to take it.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal Oct. 10, 2023, 2:23 a.m. UTC | #4
On 10/10/23 10:43, Martin K. Petersen wrote:
> 
> Damien,
> 
>>> FTR this fix is still needed for rc5. Is there any chance it goes
>>> into fixes before v6.6 is final?
>>
>> The patch is on the scsi list, not for libata. Martin will likely
>> apply it this week.
> 
> Patch looks good to me but I can't apply since the PM rework series went
> through your tree. My fixes branch is based on -rc1. So I'd prefer for
> you to take it.
> 
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

OK. Thanks !
diff mbox series

Patch

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 3db4d31a03a1..b05a55f498a2 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1627,12 +1627,13 @@  int scsi_rescan_device(struct scsi_device *sdev)
 	device_lock(dev);
 
 	/*
-	 * Bail out if the device is not running. Otherwise, the rescan may
-	 * block waiting for commands to be executed, with us holding the
-	 * device lock. This can result in a potential deadlock in the power
-	 * management core code when system resume is on-going.
+	 * Bail out if the device or its queue are not running. Otherwise,
+	 * the rescan may block waiting for commands to be executed, with us
+	 * holding the device lock. This can result in a potential deadlock
+	 * in the power management core code when system resume is on-going.
 	 */
-	if (sdev->sdev_state != SDEV_RUNNING) {
+	if (sdev->sdev_state != SDEV_RUNNING ||
+	    blk_queue_pm_only(sdev->request_queue)) {
 		ret = -EWOULDBLOCK;
 		goto unlock;
 	}