scsi: scsi_debug: Fix scp is NULL errors
authorDouglas Gilbert <dgilbert@interlog.com>
Thu, 13 Aug 2020 15:57:38 +0000 (11:57 -0400)
committerGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Thu, 3 Sep 2020 09:29:35 +0000 (11:29 +0200)
commit018360efe867a9e3282009c5f55ab21907443e92
tree93c65958af391b3801d383cd76c67b0891d02833
parent56f13789a05ad60a584c99efa0075eae30a81c77
scsi: scsi_debug: Fix scp is NULL errors

[ Upstream commit 223f91b48079227f914657f07d2d686f7b60aa26 ]

John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were
mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The
problem was tracked down to a missing critical section on a "short circuit"
path. Namely, the time to process the current command so far has already
exceeded the requested command duration (i.e. the number of nanoseconds in
the ndelay parameter).

The random=1 parameter setting was pivotal in finding this error.  The
failure scenario involved first taking that "short circuit" path (due to a
very short command duration) and then taking the more likely
hrtimer_start() path (due to a longer command duration). With random=1 each
command's duration is taken from the uniformly distributed [0..ndelay)
interval.  The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).

Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.com
Reported-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
drivers/scsi/scsi_debug.c