diff mbox series

scsi: qla2xxx avoid a panic due to BUG() if a WRITE_SAME command is sent to a device that has no protection

Message ID 77f405a048b07e4451b7d7adaeba7ce4a00b7efb.camel@redhat.com
State New
Headers show
Series scsi: qla2xxx avoid a panic due to BUG() if a WRITE_SAME command is sent to a device that has no protection | expand

Commit Message

Laurence Oberman June 28, 2023, 11:34 a.m. UTC
In the current code, If a device does not have protection, qla2xx will
land up defaulting to a BUG() and will panic the system when
sg_write_same is sent.This is because SCSI_PROT_NORMAL is matched and
falls through to the BUG() call.
The write_same command to a device without protection is not handled
safely.

Fix this by making two changes:
Set the bundling variable also to 0 for SCSI_PROT_NORMAL
Modify the switch statement to match on SCSI_PROT_NORMAL and handle it
appropriately removing the call to BUG()

Supersedes prior suggested patch.
 
Suggested-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Laurence Oberman <loberman@redhat.com>
---
 drivers/scsi/qla2xxx/qla_iocb.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

 	/* Allocate CRC context from global pool */
@@ -1443,6 +1444,9 @@ qla24xx_build_scsi_crc_2_iocbs(srb_t *sp, struct
cmd_type_crc_2 *cmd_pkt,
 	dif_bytes = (data_bytes / blk_size) * 8;
 
 	switch (scsi_get_prot_op(GET_CMD_SP(sp))) {
+	case SCSI_PROT_NORMAL:
+		total_bytes = data_bytes;
+		break;
 	case SCSI_PROT_READ_INSERT:
 	case SCSI_PROT_WRITE_STRIP:
 		total_bytes = data_bytes;

Comments

Himanshu Madhani June 28, 2023, 4:20 p.m. UTC | #1
Hi Laurence,

> On Jun 28, 2023, at 4:34 AM, Laurence Oberman <loberman@redhat.com> wrote:
> 
> In the current code, If a device does not have protection, qla2xx will
> land up defaulting to a BUG() and will panic the system when
> sg_write_same is sent.This is because SCSI_PROT_NORMAL is matched and
> falls through to the BUG() call.
> The write_same command to a device without protection is not handled
> safely.
> 
> Fix this by making two changes:
> Set the bundling variable also to 0 for SCSI_PROT_NORMAL
> Modify the switch statement to match on SCSI_PROT_NORMAL and handle it
> appropriately removing the call to BUG()
> 

This should go to stable kernel as well. 

Cc: stable@vger.kernel.org <mailto:stable@vger.kernel.org>

> Supersedes prior suggested patch.
> 
> Suggested-by: David Jeffery <djeffery@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Signed-off-by: Laurence Oberman <loberman@redhat.com>
> ---
> drivers/scsi/qla2xxx/qla_iocb.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_iocb.c
> b/drivers/scsi/qla2xxx/qla_iocb.c
> index b9b3e6f80ea9..82a5d195e401 100644
> --- a/drivers/scsi/qla2xxx/qla_iocb.c
> +++ b/drivers/scsi/qla2xxx/qla_iocb.c
> @@ -1381,7 +1381,8 @@ qla24xx_build_scsi_crc_2_iocbs(srb_t *sp, struct
> cmd_type_crc_2 *cmd_pkt,
> if ((scsi_get_prot_op(cmd) == SCSI_PROT_READ_INSERT) ||
>    (scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_STRIP) ||
>    (scsi_get_prot_op(cmd) == SCSI_PROT_READ_STRIP) ||
> -    (scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_INSERT))
> +    (scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_INSERT) ||
> +    (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL))
> bundling = 0;
> 
> /* Allocate CRC context from global pool */
> @@ -1443,6 +1444,9 @@ qla24xx_build_scsi_crc_2_iocbs(srb_t *sp, struct
> cmd_type_crc_2 *cmd_pkt,
> dif_bytes = (data_bytes / blk_size) * 8;
> 
> switch (scsi_get_prot_op(GET_CMD_SP(sp))) {
> + case SCSI_PROT_NORMAL:
> + total_bytes = data_bytes;
> + break;
> case SCSI_PROT_READ_INSERT:
> case SCSI_PROT_WRITE_STRIP:
> total_bytes = data_bytes;
> -- 
> 2.39.3

The change itself looks good. 

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com <mailto:himanshu.madhani@oracle.com>>
Martin K. Petersen July 6, 2023, 2:35 a.m. UTC | #2
Laurence,

> In the current code, If a device does not have protection, qla2xx will
> land up defaulting to a BUG() and will panic the system when
> sg_write_same is sent.This is because SCSI_PROT_NORMAL is matched and
> falls through to the BUG() call. The write_same command to a device
> without protection is not handled safely.

I would like to understand why the driver PI code path was chosen for a
SCSI_PROT_NORMAL cmnd. That doesn't seem right.

> +	case SCSI_PROT_NORMAL:
> +		total_bytes = data_bytes;
> +		break;
>  	case SCSI_PROT_READ_INSERT:
>  	case SCSI_PROT_WRITE_STRIP:
>  		total_bytes = data_bytes;

All this transfer size wrangling in the driver should be removed and
replaced with a call to scsi_transfer_length() which takes the PI size
into account.
Laurence Oberman July 6, 2023, 1:41 p.m. UTC | #3
On Wed, 2023-07-05 at 22:35 -0400, Martin K. Petersen wrote:
> 
> Laurence,
> 
> > In the current code, If a device does not have protection, qla2xx
> > will
> > land up defaulting to a BUG() and will panic the system when
> > sg_write_same is sent.This is because SCSI_PROT_NORMAL is matched
> > and
> > falls through to the BUG() call. The write_same command to a device
> > without protection is not handled safely.
> 
> I would like to understand why the driver PI code path was chosen for
> a
> SCSI_PROT_NORMAL cmnd. That doesn't seem right.
> 
> > +       case SCSI_PROT_NORMAL:
> > +               total_bytes = data_bytes;
> > +               break;
> >         case SCSI_PROT_READ_INSERT:
> >         case SCSI_PROT_WRITE_STRIP:
> >                 total_bytes = data_bytes;
> 
> All this transfer size wrangling in the driver should be removed and
> replaced with a call to scsi_transfer_length() which takes the PI
> size
> into account.


Martin, good questions

I am waiting on Marvell to decide what to do about all this.

The patch was focused only on avoiding the nasty effect of what you
mention happens in that code path, causing customers to have system
panics.

The patch is to avoid the panic at this point with minimal impact to
prior code functionality.

Regards
Laurence



>
Laurence Oberman July 11, 2023, 7:22 p.m. UTC | #4
On Thu, 2023-07-06 at 09:41 -0400, Laurence Oberman wrote:
> On Wed, 2023-07-05 at 22:35 -0400, Martin K. Petersen wrote:
> > 
> > Laurence,
> > 
> > > In the current code, If a device does not have protection, qla2xx
> > > will
> > > land up defaulting to a BUG() and will panic the system when
> > > sg_write_same is sent.This is because SCSI_PROT_NORMAL is matched
> > > and
> > > falls through to the BUG() call. The write_same command to a
> > > device
> > > without protection is not handled safely.
> > 
> > I would like to understand why the driver PI code path was chosen
> > for
> > a
> > SCSI_PROT_NORMAL cmnd. That doesn't seem right.
> > 
> > > +       case SCSI_PROT_NORMAL:
> > > +               total_bytes = data_bytes;
> > > +               break;
> > >         case SCSI_PROT_READ_INSERT:
> > >         case SCSI_PROT_WRITE_STRIP:
> > >                 total_bytes = data_bytes;
> > 
> > All this transfer size wrangling in the driver should be removed
> > and
> > replaced with a call to scsi_transfer_length() which takes the PI
> > size
> > into account.
> 
> 
> Martin, good questions
> 
> I am waiting on Marvell to decide what to do about all this.
> 
> The patch was focused only on avoiding the nasty effect of what you
> mention happens in that code path, causing customers to have system
> panics.
> 
> The patch is to avoid the panic at this point with minimal impact to
> prior code functionality.
> 
> Regards
> Laurence
> 
> 
> 
> > 
> 

Hello Nilesh and Marvell

Any chance to get comments/eyes on this please.
Given its causing system crashes we need to decide how best to deal
with it.

regards
Laurence
Quinn Tran July 12, 2023, 12:34 a.m. UTC | #5
Hello Nilesh and Marvell

Any chance to get comments/eyes on this please.
Given its causing system crashes we need to decide how best to deal with it.

QT:  Laurence,
In understanding the severity,  Does end customer uses sg_write_same as the mechanism to move data?
Other than the sg_write_same utility, how common is end customer uses 32Byte CDB?   It seems like upper layer doesn't  have support for 32Bytes CDB at this time.

The code path you're modifying is for the T10-PI disk.  This disk is "non-T10-PI" where it may create some confusion for next reader n Martin on why we've wander down this code path.

Will queue up a patch that plug this hole.
Laurence Oberman July 12, 2023, 1:25 p.m. UTC | #6
On Wed, 2023-07-12 at 00:34 +0000, Quinn Tran wrote:
> Hello Nilesh and Marvell
> 
> Any chance to get comments/eyes on this please.
> Given its causing system crashes we need to decide how best to deal
> with it.
> 
> QT:  Laurence,
> In understanding the severity,  Does end customer uses sg_write_same
> as the mechanism to move data?
> Other than the sg_write_same utility, how common is end customer uses
> 32Byte CDB?   It seems like upper layer doesn't  have support for
> 32Bytes CDB at this time.
> 
> The code path you're modifying is for the T10-PI disk.  This disk is
> "non-T10-PI" where it may create some confusion for next reader n
> Martin on why we've wander down this code path.
> 
> Will queue up a patch that plug this hole.
> 
> 
> 

OK, Thank you
In this case the customer was specifically using sg_write_same. I am
not sure if it was part of a script or some other use case.
They were of the opinion it was severe enough of an issue to warrant
fixing so they logged a case with us.
Thanks for looking into this.

Regards
Laurence
Laurence Oberman July 26, 2023, 12:40 p.m. UTC | #7
On Wed, 2023-07-12 at 09:25 -0400, Laurence Oberman wrote:
> On Wed, 2023-07-12 at 00:34 +0000, Quinn Tran wrote:
> > Hello Nilesh and Marvell
> > 
> > Any chance to get comments/eyes on this please.
> > Given its causing system crashes we need to decide how best to deal
> > with it.
> > 
> > QT:  Laurence,
> > In understanding the severity,  Does end customer uses
> > sg_write_same
> > as the mechanism to move data?
> > Other than the sg_write_same utility, how common is end customer
> > uses
> > 32Byte CDB?   It seems like upper layer doesn't  have support for
> > 32Bytes CDB at this time.
> > 
> > The code path you're modifying is for the T10-PI disk.  This disk
> > is
> > "non-T10-PI" where it may create some confusion for next reader n
> > Martin on why we've wander down this code path.
> > 
> > Will queue up a patch that plug this hole.
> > 
> > 
> > 
> 
> OK, Thank you
> In this case the customer was specifically using sg_write_same. I am
> not sure if it was part of a script or some other use case.
> They were of the opinion it was severe enough of an issue to warrant
> fixing so they logged a case with us.
> Thanks for looking into this.
> 
> Regards
> Laurence
Hello QT

Did I miss an update to this. Was another fix sent.
We need to deal with this at Red Hat as soon as possible please.

Regards
Laurence
Laurence Oberman July 31, 2023, 1:02 p.m. UTC | #8
On Wed, 2023-07-26 at 08:40 -0400, Laurence Oberman wrote:
> On Wed, 2023-07-12 at 09:25 -0400, Laurence Oberman wrote:
> > On Wed, 2023-07-12 at 00:34 +0000, Quinn Tran wrote:
> > > Hello Nilesh and Marvell
> > > 
> > > Any chance to get comments/eyes on this please.
> > > Given its causing system crashes we need to decide how best to
> > > deal
> > > with it.
> > > 
> > > QT:  Laurence,
> > > In understanding the severity,  Does end customer uses
> > > sg_write_same
> > > as the mechanism to move data?
> > > Other than the sg_write_same utility, how common is end customer
> > > uses
> > > 32Byte CDB?   It seems like upper layer doesn't  have support for
> > > 32Bytes CDB at this time.
> > > 
> > > The code path you're modifying is for the T10-PI disk.  This disk
> > > is
> > > "non-T10-PI" where it may create some confusion for next reader n
> > > Martin on why we've wander down this code path.
> > > 
> > > Will queue up a patch that plug this hole.
> > > 
> > > 
> > > 
> > 
> > OK, Thank you
> > In this case the customer was specifically using sg_write_same. I
> > am
> > not sure if it was part of a script or some other use case.
> > They were of the opinion it was severe enough of an issue to
> > warrant
> > fixing so they logged a case with us.
> > Thanks for looking into this.
> > 
> > Regards
> > Laurence
> Hello QT
> 
> Did I miss an update to this. Was another fix sent.
> We need to deal with this at Red Hat as soon as possible please.
> 
> Regards
> Laurence

QT 
WStillw aiting on you guys and we need to get thjis fixed,
Please send the fix upstream that you were thinking of today.
Regards
Laurence
diff mbox series

Patch

diff --git a/drivers/scsi/qla2xxx/qla_iocb.c
b/drivers/scsi/qla2xxx/qla_iocb.c
index b9b3e6f80ea9..82a5d195e401 100644
--- a/drivers/scsi/qla2xxx/qla_iocb.c
+++ b/drivers/scsi/qla2xxx/qla_iocb.c
@@ -1381,7 +1381,8 @@  qla24xx_build_scsi_crc_2_iocbs(srb_t *sp, struct
cmd_type_crc_2 *cmd_pkt,
 	if ((scsi_get_prot_op(cmd) == SCSI_PROT_READ_INSERT) ||
 	    (scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_STRIP) ||
 	    (scsi_get_prot_op(cmd) == SCSI_PROT_READ_STRIP) ||
-	    (scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_INSERT))
+	    (scsi_get_prot_op(cmd) == SCSI_PROT_WRITE_INSERT) ||
+	    (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL))
 		bundling = 0;