trim: add support for multiple ranges

The NVMe specification allows multiple ranges for the dataset management commands. Currently the block ioctl only allows a single range for trim, but multiple ranges can be specified through the nvme character device. Add an option num_range to send multiple ranges per trim request, which only works if the data direction is solely trim, i.e. trim or randtrim. Add FIO_MULTI_RANGE_TRIM as an ioengine flag to restrict the usage of this new option.

For multi-range trim requests this modifies the way IO buffers are used. The buffer length will depend on the number of trim ranges, and the buffer itself will contain the start and length of each range entry.

This increases the fio server version (FIO_SERVER_VER) to 103.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Link: https://lore.kernel.org/r/20240215151812.138370-2-ankit.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
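As a rough illustration of the buffer layout change described above (the struct and function here are hypothetical sketches, not fio's actual implementation), a multi-range trim payload packs one start/length entry per range, so the required buffer length scales with num_range:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical range entry: each trim range carries a start and a length,
 * loosely mirroring an NVMe DSM range entry. */
struct trim_range {
	uint64_t start;
	uint64_t len;
};

/* Pack num_range entries into the io_u buffer. The buffer length now
 * depends on the number of ranges, not on a single block size. */
static size_t pack_trim_ranges(void *buf, const struct trim_range *ranges,
			       unsigned int num_range)
{
	size_t bytes = num_range * sizeof(*ranges);

	memcpy(buf, ranges, bytes);
	return bytes;
}
```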
verify: fix loops option behavior of read-verify workloads

Commit 191d6634e8a6 ("verify: fix bytes_done accounting of experimental verify") introduced td->bytes_verified to separate the verified bytes from the read bytes in td->bytes_done[]. This fixed the issue with the experimental verify feature. However, it caused another issue: when the verify workload only reads and does not write, the read bytes in td->bytes_done[] are no longer updated and stay zero. This zero value is returned from do_io() to thread_main() in the bytes_done array. If the read bytes are zero, thread_main() marks the job to terminate, which makes the loops option ignored. For example, the job below should do 8k of reads, but it does only 4k:

  [global]
  filename=/tmp/fio.test
  size=4k
  verify=md5

  [write]
  rw=write
  do_verify=0

  [read]
  stonewall=1
  rw=read
  loops=2
  do_verify=1

To make the loops option work together with read-verify workloads, modify io_u_update_bytes_done(). After updating td->bytes_verified, check whether the workload writes. If it does not, do not return from io_u_update_bytes_done() and update td->bytes_done[] for DDIR_READ in the following code.

Fixes: 191d6634e8a6 ("verify: fix bytes_done accounting of experimental verify")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20240214122008.4123286-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
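The accounting decision can be sketched roughly like this (a simplified model with stand-in types, not fio's actual io_u_update_bytes_done()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { DDIR_READ = 0, DDIR_WRITE, DDIR_NR };

/* Minimal stand-in for the relevant thread_data fields. */
struct td_sim {
	uint64_t bytes_done[DDIR_NR];
	uint64_t bytes_verified;
	bool does_write;
};

/* Account a completed verify read: credit bytes_verified, and return
 * early only when the workload also writes. A read-only verify job must
 * still credit bytes_done[DDIR_READ], otherwise do_io() reports zero
 * bytes and thread_main() terminates the job, ignoring loops=. */
static void account_verify_read(struct td_sim *td, uint64_t bytes)
{
	td->bytes_verified += bytes;
	if (td->does_write)
		return;
	td->bytes_done[DDIR_READ] += bytes;
}
```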
Fix assert failure on timeout during rate_ddir() call

Add DDIR_TIMEOUT to enum fio_ddir and have rate_ddir() return it when fio times out. set_io_u_file() will then break out of its loop directly and fill_io_u() won't be called; previously this path caused the assert in rate_ddir() to fail because td->rwmix_ddir was DDIR_INVAL.

Signed-off-by: QingSong Zhu <zhuqingsong.0909@bytedance.com>
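A simplified model of the fix (the enum subset and helper are illustrative, not fio's exact code): the timeout path returns a sentinel before the assert on the mixed-workload direction is ever reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of fio's data directions, with the new sentinel. */
enum fio_ddir {
	DDIR_READ = 0,
	DDIR_WRITE,
	DDIR_TRIM,
	DDIR_INVAL,
	DDIR_TIMEOUT,	/* returned on timeout instead of tripping the assert */
};

static enum fio_ddir rate_ddir_sim(bool timed_out, enum fio_ddir rwmix_ddir)
{
	/* On timeout, bail out with the sentinel so the caller can break
	 * out of its loop; rwmix_ddir may legitimately be DDIR_INVAL then. */
	if (timed_out)
		return DDIR_TIMEOUT;

	assert(rwmix_ddir != DDIR_INVAL);
	return rwmix_ddir;
}
```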
zbd: print max_active_zones limit error message

When zoned block devices have a max_active_zones limit and write operations exceed that limit, the Linux block sub-system reports EOVERFLOW. However, the strerror() string for EOVERFLOW does not mention max_active_zones, which confuses users. To avoid the confusion, print an additional error message to indicate the max_active_zones limit. For this purpose, add a hook function zbd_log_err() and call it from __io_u_log_error().

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230719105756.553146-4-shinichiro.kawasaki@wdc.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
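The gist of the hook can be sketched like this (the hint text and helper signature are illustrative, not fio's actual zbd_log_err()):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Return an extra hint for a confusing errno on zoned devices; the
 * generic strerror(EOVERFLOW) text says nothing about zones, so point
 * failing writes at the max_active_zones limit. */
static const char *zbd_err_hint(int err, bool is_write)
{
	if (err == EOVERFLOW && is_write)
		return "Exceeded max_active_zones limit?";
	return NULL;
}
```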
zbd: fix write zone accounting of trim workload

Commit e3be810bf0fd ("zbd: Support zone reset by trim") added trim support for zonemode=zbd by introducing the function zbd_do_io_u_trim(), which calls zbd_reset_zone(). However, it did not call zbd_write_zone_put() for the trim target zone, so the trim operation resulted in wrong accounting of write zones. To fix the issue, call zbd_write_zone_put() from zbd_reset_zone(). To cover the cases that reset zones without a zbd_write_zone_put() call, prepare another function __zbd_reset_zone(). While at it, simplify zbd_reset_zones() by calling the modified zbd_reset_zone().

Of note is that the const qualifier of the td argument of do_io_u_trim() is removed, since zbd_write_zone_put() requires modifying that argument.

Fixes: e3be810bf0fd ("zbd: Support zone reset by trim")
Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
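The split between the two reset helpers can be modeled roughly as follows (a simplified sketch; the real zone bookkeeping is more involved):

```c
#include <assert.h>
#include <stdbool.h>

struct zone_sim {
	bool open_for_write;		/* counted against the write zone limit */
	unsigned long long wp;		/* write pointer */
};

/* Reset without touching write zone accounting, for callers that handle
 * the accounting themselves (stand-in for __zbd_reset_zone()). */
static void reset_zone_raw(struct zone_sim *z)
{
	z->wp = 0;
}

/* Reset and also release the zone from write zone accounting, which the
 * trim path previously forgot to do (stand-in for zbd_reset_zone() plus
 * zbd_write_zone_put()). */
static void reset_zone(struct zone_sim *z, unsigned int *write_zones)
{
	reset_zone_raw(z);
	if (z->open_for_write) {
		z->open_for_write = false;
		(*write_zones)--;
	}
}
```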
fio: add fdp support for io_uring_cmd nvme engine

Add support for NVMe TP4146 Flexible Data Placement, allowing placement identifiers in write commands. The user can enable this with the new "fdp=1" parameter for fio's io_uring_cmd ioengine. By default, fio jobs will cycle through all the namespace's available placement identifiers for write commands. The user can limit which placement identifiers are used with an additional parameter, "fdp_pli=<list,>", which can be used to separate write-intensive jobs from less intensive ones.

Setting up your namespace for FDP is outside the scope of fio, so this assumes the namespace is already properly configured for the mode.

Link: https://lore.kernel.org/fio/CAKi7+wfX-eaUD5pky5cJ824uCzsQ4sPYMZdp3AuCUZOA1TQrYw@mail.gmail.com/T/#m056018eb07229bed00d4e589f9760b2a2aa009fc
Based-on-a-patch-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[Vincent: fold in sfree fix from Ankit]
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
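A minimal job using these options might look like the following (the character-device path is a placeholder, and the fdp_pli values assume those placement identifiers exist on an FDP-configured namespace):

```ini
[global]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=randwrite
bs=4k

[fdp-write]
fdp=1
fdp_pli=0,1
```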
Fix "verify bad_hdr rand_seed" for requeued I/Os

On configurations that can cause I/Os to be internally requeued from FIO_Q_BUSY, such as '--iodepth_batch_complete_max', with verify enabled, the subsequent verification of the data fails with a bad verify rand_seed because the pattern for the I/O is generated twice for the same I/O, causing the seed to be out of sync when the verify is later performed. The seed is generated twice because do_io() handles the I/O twice: first when it originates the I/O, and again when it later gets the same I/O back from get_io_u() after it is pulled from the requeue list, which is where the first submission landed due to the workload reaching '--iodepth_batch_complete_max'.

The fix is for do_io() to track when it has generated the verify pattern for an I/O via a new io_u flag IO_U_F_PATTERN_DONE, avoiding a second call to populate_verify_io_u() when that flag is detected.

Link: https://github.com/axboe/fio/issues/1526
Signed-off-by: Adam Horshack (horshack@live.com)
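The flag check can be sketched as follows (the bit position and counter are illustrative; the counter merely stands in for calls to populate_verify_io_u()):

```c
#include <assert.h>

#define IO_U_F_PATTERN_DONE (1U << 7)	/* illustrative bit position */

struct io_u_sim {
	unsigned int flags;
	unsigned int pattern_fills;	/* counts pattern generations */
};

/* Generate the verify pattern at most once per io_u, even when do_io()
 * sees the same io_u a second time after a requeue from FIO_Q_BUSY. */
static void maybe_fill_pattern(struct io_u_sim *io_u)
{
	if (io_u->flags & IO_U_F_PATTERN_DONE)
		return;
	io_u->pattern_fills++;		/* stands in for populate_verify_io_u() */
	io_u->flags |= IO_U_F_PATTERN_DONE;
}
```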
Improve IOPS by 50% by avoiding clock sampling when rate options are not used

Profiling revealed thread_main() spending 50% of its time in calls to utime_since_now() from rate_ddir(). This call is only necessary if the user specified a rate option for the job. A conditional was added to avoid the call if !should_check_rate().

See this link for details and profiling data: https://github.com/axboe/fio/issues/1501#issuecomment-1418327049

Signed-off-by: Adam Horshack (horshack@live.com)
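The shape of the change is simply a guard around the clock sample (a sketch with stand-in types, not the exact fio code):

```c
#include <assert.h>
#include <stdbool.h>

struct rate_sim {
	bool has_rate_opts;		/* any rate-style option set on the job */
	unsigned int clock_samples;	/* counts utime_since_now() calls */
};

/* Only sample the clock when a rate limit can actually apply; jobs
 * without rate options skip the expensive call entirely. */
static void rate_ddir_step(struct rate_sim *td)
{
	if (!td->has_rate_opts)	/* stands in for !should_check_rate() */
		return;
	td->clock_samples++;	/* stands in for utime_since_now() */
	/* ... rate-limiting logic would run here ... */
}
```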
verify: fix bytes_done accounting of experimental verify

Commit 55312f9f5572 ("Add ->bytes_done[] to struct thread_data") moved bytes_done[] from the stack to struct thread_data. However, this unified the two bytes_done[] arrays on the do_io() and do_verify() stacks into a single td->bytes_done[]. This caused a wrong condition check in do_verify() in the experimental verify path, since td->bytes_done[] holds values for do_io(), not for do_verify(). This caused an unexpected loop break in do_verify() and skipped verify reads when the experimental_verify=1 option is specified. To fix this, add bytes_verified to struct thread_data for do_verify(), in the same manner as bytes_done[] for do_io(). Introduce a helper function io_u_update_bytes_done() to factor out the code shared by bytes_done[] and bytes_verified.

Fixes: 55312f9f5572 ("Add ->bytes_done[] to struct thread_data")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
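The separation can be sketched as follows (a simplified model; the real helper's signature and the flag that identifies a verify read differ):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { DDIR_READ = 0, DDIR_WRITE, DDIR_NR };

struct td_sim {
	uint64_t bytes_done[DDIR_NR];	/* do_io() progress */
	uint64_t bytes_verified;	/* do_verify() progress, now separate */
};

/* Factored-out accounting helper: verify reads are credited to
 * bytes_verified, so do_verify()'s loop condition no longer reads
 * leftover do_io() values out of bytes_done[]. */
static void io_u_update_bytes_done_sim(struct td_sim *td, int ddir,
				       uint64_t bytes, bool is_verify_read)
{
	if (is_verify_read) {
		td->bytes_verified += bytes;
		return;
	}
	td->bytes_done[ddir] += bytes;
}
```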
randtrimwrite: fix corner case with variable block sizes

When we have variable block sizes, it is possible to finish a trim + write pair and then have the next (smaller) trim operation have a different start offset but the same end offset as the previous pair of trim and write operations. This would fool fio into believing that it had already completed a trim + write pair when it had actually completed only the trim. Resolve this problem by comparing start offsets instead of end offsets.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
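The corner case and the fix can be illustrated numerically (a toy model of the two checks, not fio's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct op_sim {
	uint64_t off;
	uint64_t len;
};

/* Old check: a shorter follow-up trim can end exactly where the previous
 * completed pair ended, falsely signalling "pair complete". */
static bool pair_done_by_end(struct op_sim trim, struct op_sim write)
{
	return trim.off + trim.len == write.off + write.len;
}

/* Fixed check: compare start offsets instead. */
static bool pair_done_by_start(struct op_sim trim, struct op_sim write)
{
	return trim.off == write.off;
}
```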
randtrimwrite: fix offsets for corner case

For randtrimwrite, we decide whether to issue a trim or a write based on whether the end offsets for the most recent trim and write commands match. If they don't match, that means we just issued a new trim and the next operation should be a write. If they *do* match, that means we just completed a trim + write pair and the next command should be a trim.

This works fine for sequential workloads, but for random workloads it is possible to complete a trim + write pair and then have the randomly generated offset for the next trim command match the previous offset. If that happens, we need to alter the offset for the last write operation in order to ensure that we issue a write operation the next time through. It feels dirty to change the meaning of last_pos[DDIR_WRITE] in this way, but hopefully the long comment in the code will be sufficient warning.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
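The offset collision and the workaround can be modeled as a toy state machine (field names and the exact perturbation are illustrative, not fio's code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pair state: the next op is a trim when the recorded trim and write
 * positions match (pair complete), otherwise the write half is pending. */
struct rtw_sim {
	uint64_t last_trim_off;
	uint64_t last_write_off;
};

static bool next_is_trim(const struct rtw_sim *s)
{
	return s->last_trim_off == s->last_write_off;
}

/* Issue a trim at a randomly chosen offset. If it collides with the
 * offset of the pair just completed, nudge the recorded write position
 * so the state machine still sees a pending write next time through. */
static void issue_trim(struct rtw_sim *s, uint64_t off)
{
	if (off == s->last_write_off)
		s->last_write_off = off + 1;	/* illustrative nudge */
	s->last_trim_off = off;
}
```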
randtrimwrite: write at same offset as trim

We need to do a little bit more to make sure that the writes land on the offsets that were trimmed. We only have a single random seed for offsets, so we need to just use the offset from trim commands when issuing writes.

When we have variable block sizes, we need to make sure that the trim and write commands are the same size. When the randommap is enabled, we have to let it adjust the block size for trim commands to make sure that the trim command does not touch any blocks that have already been touched. For the sizes of write commands, just use the size of the trim command.

Fixes: c16dc793a3c45780f67ce65244b6e91323dee014 ("Add randtrimwrite data direction")
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
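The pairing rule reduces to reusing the trim's offset and (possibly randommap-adjusted) length for the write half, which can be sketched with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

struct io_sim {
	uint64_t off;
	uint64_t len;
};

/* Derive the write half of a trim + write pair from the trim that
 * preceded it, instead of drawing a fresh random offset and size; the
 * trim's length may already have been clipped by the randommap. */
static struct io_sim write_from_trim(struct io_sim trim)
{
	struct io_sim write = { trim.off, trim.len };
	return write;
}
```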