trim: add support for multiple ranges

The NVMe specification allows multiple ranges for dataset management
commands. Currently the block ioctl only allows a single range for trim;
however, multiple ranges can be specified using the nvme character
device.

Add an option num_range to send multiple ranges per trim request. This
only works if the data direction is solely trim, i.e. trim or randtrim.
Add FIO_MULTI_RANGE_TRIM as the ioengine flag to restrict the usage of
this new option.

For multi-range trim requests this modifies the way IO buffers are used.
The buffer length will depend on the number of trim ranges, and the
actual buffer will contain the start and length of each range entry.

This increases fio server version (FIO_SERVER_VER) to 103.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Link: https://lore.kernel.org/r/20240215151812.138370-2-ankit.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
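For illustration only, a minimal Python sketch of a buffer that holds the start and length of each range entry. The entry layout (two little-endian 64-bit fields) and the helper name pack_trim_ranges are assumptions made for this example, not taken from the fio source.

```python
import struct

# Hypothetical entry layout: two unsigned 64-bit little-endian fields,
# (start, length), giving 16 bytes per range entry.
RANGE_ENTRY = struct.Struct("<QQ")


def pack_trim_ranges(ranges):
    """Pack (start, length) pairs into one contiguous buffer."""
    # The buffer length depends only on the number of ranges.
    buf = bytearray(RANGE_ENTRY.size * len(ranges))
    for index, (start, length) in enumerate(ranges):
        RANGE_ENTRY.pack_into(buf, index * RANGE_ENTRY.size, start, length)
    return bytes(buf)


ranges = [(0, 4096), (8192, 4096), (65536, 16384)]
buf = pack_trim_ranges(ranges)
```

With three ranges the buffer is three entries long, regardless of how many bytes each range covers on the device.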
stat: log out both average and max over the window

Add option log_window_value, an alias of log_max_value, which reports
the average, the max, or both values. Retain backward compatibility by
allowing =0 and =1 to select the avg and max values respectively. There
is no change to existing log formats when reporting only the average or
only the max value.

Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Link: https://lore.kernel.org/r/20240125110124.55137-2-ankit.kumar@samsung.com
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
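A rough Python sketch of the selection logic described above, for illustration. The function name window_values and the mode strings "avg", "max" and "both" are assumptions for this example; only the 0 and 1 compatibility values come from the message itself.

```python
def window_values(samples, mode="both"):
    """Return the value(s) to log for one averaging window."""
    if not samples:
        raise ValueError("window contains no samples")
    # Backward compatibility: 0 selects the average, 1 selects the max.
    mode = {"0": "avg", "1": "max"}.get(str(mode), str(mode))
    average = sum(samples) / len(samples)
    peak = max(samples)
    if mode == "avg":
        return (average,)
    if mode == "max":
        return (peak,)
    if mode == "both":
        return (average, peak)
    raise ValueError(f"unknown mode: {mode!r}")


latencies = [100, 300, 200]
both = window_values(latencies)         # average and max together
legacy_avg = window_values(latencies, 0)  # old-style =0
legacy_max = window_values(latencies, 1)  # old-style =1
```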
fio: Introduce new constant thinkcycles option

The thinkcycles parameter allows setting a number of cycles to spin
between requests, to model real-world applications more realistically.

The thinktime parameter family can be used to model an application
processing the data, so that real-world applications are modeled more
closely. Unfortunately the think time is currently specified as a
constant time and is therefore affected by CPU frequency settings or
task migration to a CPU with a different capacity. The new thinkcycles
parameter closes that gap and allows specifying a constant number of
cycles instead, such that CPU capacity is taken into account.

Signed-off-by: Christian Loehle <christian.loehle@arm.com>
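To make the difference concrete, a small worked example in Python. It assumes an idealized CPU that retires exactly one spin cycle per clock tick; the function name and the frequencies are made up for illustration.

```python
def spin_time_us(cycles, freq_hz):
    """Wall-clock time in microseconds to spin for a fixed cycle count."""
    return cycles * 1_000_000 / freq_hz


THINK_CYCLES = 1_000_000  # hypothetical fixed amount of work per request

# The same number of cycles takes twice as long at half the frequency,
# whereas a constant think time would stay the same on both CPUs.
at_2ghz = spin_time_us(THINK_CYCLES, 2_000_000_000)
at_1ghz = spin_time_us(THINK_CYCLES, 1_000_000_000)
```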
Make log_unix_epoch an official alias of log_alternate_epoch

log_alternate_epoch was introduced along with
log_alternate_epoch_clock_id, and generalized the idea of
log_unix_epoch. Both options had the same effect. So we make
log_unix_epoch an official alias of log_alternate_epoch, instead of
maintaining both redundant options.

Signed-off-by: Nick Neumann <nick@pcpartpicker.com>
Record job start time to fix time pain points

Add a new key in the json per-job output, job_start, that records the
job start time obtained via a call to clock_gettime using the clock_id
specified by the new job_start_clock_id option. This allows times of fio
jobs and log entries to be compared/ordered against each other and
against other system events recorded against the same clock_id.

Add a note to the documentation for group_reporting about how there are
several per-job values for which only the first job's value is recorded
in the json output format when group_reporting is enabled.

Fixes #1544

Signed-off-by: Nick Neumann <nick@pcpartpicker.com>
Add basic error checking to parsing nr from rw=randrw:<nr>, etc

Previously this was parsed by just doing atoi(). This returns 0 or has
undefined behavior in error cases. Silently getting a 0 for nr is not
great.

In fact, 0 (or less) should likely not be allowed for nr; while the code
handles it, the effective result is that the randomness is gone and all
I/O becomes sequential. It makes sense to prohibit 0 as an nr value in
the random case.

We leverage str_to_decimal to do our parsing instead of atoi. It isn't
perfect, but it is a lot more resilient than atoi, and it is used in
other similar places. We can then return an error when parsing fails,
when the parsed numeric value is outside of the range that can be stored
in the unsigned int used for nr, and when nr is 0.

Fixes #1622

Signed-off-by: Nick Neumann <nick@pcpartpicker.com>
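The checks described above can be sketched in Python as follows. This is an illustration of the validation rules only, not a port of str_to_decimal; the function names are hypothetical, a 32-bit unsigned int is assumed, and for simplicity 0 is rejected for every mode rather than only for random ones.

```python
UINT_MAX = 2**32 - 1  # assumes nr is stored in a 32-bit unsigned int


def parse_nr(text):
    """Parse the <nr> suffix, rejecting garbage, 0 and out-of-range values."""
    try:
        value = int(text, 10)
    except ValueError:
        raise ValueError(f"invalid nr value: {text!r}") from None
    if value <= 0:
        raise ValueError("nr must be greater than 0")
    if value > UINT_MAX:
        raise ValueError("nr does not fit in an unsigned int")
    return value


def parse_rw(option):
    """Split 'mode[:nr]' into (mode, nr); nr is None when absent."""
    mode, separator, nr_text = option.partition(":")
    return mode, (parse_nr(nr_text) if separator else None)


parsed = parse_rw("randrw:8")
```

An atoi-style parser would have turned "randrw:abc" into nr = 0 without complaint; here every failure mode surfaces as an error.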
Revert "correctly free thread_data options at the topmost parent process"

This reverts commit 913028e97ceedcf2cf1ec6ec32228b3c50e7337c.

This commit is causing the static analyzers to freak out, and also
crashes on Windows. Revert it for now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
correctly free thread_data options at the topmost parent process

For non-threaded mode: thread_data::eo is a pointer stored in memory
shared between the topmost fio parent process and its children, but the
memory it points to was allocated by means of 'malloc', so each child
and the parent process itself must free its own copy. Let the fio parent
process set the pointer to NULL as soon as it frees its copy of 'eo'.

For threaded mode we leave the behavior as it has always been.

Also, we do not need to check td->io_ops in order to free td->eo in
fio_options_free().

Signed-off-by: Denis Pronin <dannftk@yandex.ru>
cmdprio: Add support for per I/O priority hint

Introduce the new option cmdprio_hint to allow specifying I/O priority
hints per IO with the io_uring and libaio IO engines.

A third acceptable format for the cmdprio_bssplit option is also
introduced to allow specifying an I/O hint in addition to a priority
class and level.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230721110510.44772-6-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
options: add priohint option

Introduce the new option priohint to allow users to specify an I/O
priority hint applying to all IOs issued by a job.

This increases fio server version (FIO_SERVER_VER) to 101.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230721110510.44772-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
os-linux: add initial support for IO priority hints

Add initial support for Linux to allow specifying a hint for any
priority value. With this change, a priority value becomes the
combination of a priority class, a priority level and a hint. The
generic os.h ioprio manipulation macros, as well as the os-dragonfly.h
ioprio manipulation macros, are modified to ignore this hint. For all
other OSes that do not support priority classes, priority hints are
ignored and always equal to 0.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230721110510.44772-4-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
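A hedged Python sketch of how a class, a level and a hint can be combined into one priority value. The bit layout (3 level bits, 10 hint bits, class from bit 13 up) is modeled on the Linux uapi ioprio.h header as best recalled and should be treated as an assumption; the function names are hypothetical and are not the fio macros.

```python
# Assumed bit layout: | class (bits 13-15) | hint (bits 3-12) | level (bits 0-2) |
LEVEL_BITS = 3
HINT_BITS = 10
CLASS_BITS = 3
HINT_SHIFT = LEVEL_BITS
CLASS_SHIFT = LEVEL_BITS + HINT_BITS


def ioprio_value(prio_class, level, hint=0):
    """Combine class, level and hint into a single priority value."""
    if not 0 <= prio_class < (1 << CLASS_BITS):
        raise ValueError("priority class out of range")
    if not 0 <= level < (1 << LEVEL_BITS):
        raise ValueError("priority level out of range")
    if not 0 <= hint < (1 << HINT_BITS):
        raise ValueError("priority hint out of range")
    return (prio_class << CLASS_SHIFT) | (hint << HINT_SHIFT) | level


def ioprio_split(value):
    """Recover (class, level, hint) from a combined priority value."""
    prio_class = value >> CLASS_SHIFT
    hint = (value >> HINT_SHIFT) & ((1 << HINT_BITS) - 1)
    level = value & ((1 << LEVEL_BITS) - 1)
    return prio_class, level, hint


value = ioprio_value(prio_class=1, level=4, hint=1)
```

With hint left at 0 the value is the same as a class/level-only encoding, which matches the behavior described for platforms that ignore hints.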
zbd: rename 'open zones' to 'write zones'

Current fio code for zonemode=zbd uses the term 'open zone' to mean the
zones that fio jobs write to. Before fio starts writing to a zone, it
calls zbd_open_zone(). When fio completes writing to a zone, it calls
zbd_close_zone(). This wording is good for zoned block devices with a
max_open_zones limit, such as ZBC and ZAC devices. These devices use the
same word 'open' to express the zone condition in which the device
assigns resources for data writes to zones. However, the word 'open'
becomes confusing when supporting zoned block devices which have a
max_active_zones limit, such as ZNS devices. These devices have both
'open' and 'active' keywords to mean two different kinds of resources on
the device. This 'active' status does not fit with the 'open zone'
wording in the fio code. Also, the word 'open' zone in fio code does not
always match the 'open' condition of zones on the device (e.g. when the
--ignore_zone_limits option is specified).

To avoid the confusion, stop using the term 'open zone' in the fio code.
Instead, use the term 'write zone' to mean that the zone is the write
target. When fio starts a write to a zone, it adds the zone to the
write_zones array. When fio completes writing to a zone, it removes the
zone from the write_zones array. For this purpose, rename struct fields,
functions and a macro:

  ZBD_MAX_OPEN_ZONES -> ZBD_MAX_WRITE_ZONES
  struct fio_zone_info:
    open -> write
  struct thread_data:
    num_open_zones -> num_write_zones
  struct zoned_block_device_info:
    max_open_zones -> max_write_zones
    num_open_zones -> num_write_zones
    open_zones[] -> write_zones[]
  zbd_open_zone() -> zbd_write_zone_get()
  zbd_close_zone() -> zbd_write_zone_put()
  zbd_convert_to_open_zone() -> zbd_convert_to_write_zone()

To match up these changes, rename local variables and goto labels. Also
rephrase code comments.

Of note is that this rename is only for the fio code. The fio options
max_open_zones and job_max_open_zones are not renamed, so as not to
confuse users.

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
init: clean up random seed options

- make allrandrepeat a synonym of randrepeat. allrandrepeat is
  superfluous because the seeds set by randrepeat already encompass
  random number generators beyond the one used for random offsets.
- allow randseed to override [all]randrepeat: this is what the
  documentation implies but was not previously the case.

This is a breaking change for users relying on the values of fio's
default random seeds.

Link: https://github.com/axboe/fio/pull/1546
Fixes: https://github.com/axboe/fio/issues/1502
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
fio: add support for POSIX_FADV_NOREUSE

As of Linux kernel commit 17e810229cb3 ("mm: support
POSIX_FADV_NOREUSE"), POSIX_FADV_NOREUSE hints at the LRU algorithm to
ignore accesses to mapped files with this flag. Previously, it was a
no-op. Add it in fio as an fadvise_hint option to test the new behavior.

Signed-off-by: Yuanchu Xie <yuanchu@google.com>
Link: https://lore.kernel.org/r/20230331183703.3145788-1-yuanchu@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
fio: steadystate: allow for custom check interval

Allow for a steady state check interval other than 1s with a new
--ss_interval parameter.

Steady state is reached when the steady state condition (like slope) is
true when comparing the values in the last window (set with --ss_dur).
The actual values for this comparison are currently calculated over a 1s
interval during the window. This is especially problematic for slow
random devices, where the values do not converge at such a fine
granularity.

Letting the user set the interval solves this problem, although it
requires them to figure out an appropriate value themselves.

--ss=iops:5% --ss_dur=120s should reproduce this for many (slower)
devices. Then adding something like --ss_interval=20s may let it
converge.

Signed-off-by: Christian Loehle <cloehle@posteo.de>
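A toy Python model of why a coarser interval can let the check converge. It assumes the criterion is "every sample in the window lies within the limit of the window mean" and that a longer interval simply averages the per-second values; both are assumptions for illustration, and the IOPS numbers are made up.

```python
def within_limit_of_mean(samples, limit_pct):
    """True when every sample deviates from the mean by at most limit_pct."""
    mean = sum(samples) / len(samples)
    return all(abs(s - mean) <= mean * limit_pct / 100 for s in samples)


def resample(per_second, interval_s):
    """Average consecutive per-second values into interval-sized samples."""
    return [
        sum(per_second[i:i + interval_s]) / interval_s
        for i in range(0, len(per_second) - interval_s + 1, interval_s)
    ]


# Hypothetical slow device: IOPS alternates between 50 and 150 each second
# over a 120 s window (as with --ss_dur=120s).
per_second = [50 if second % 2 == 0 else 150 for second in range(120)]

steady_at_1s = within_limit_of_mean(per_second, 5)
coarse = resample(per_second, 20)  # as with --ss_interval=20s
steady_at_20s = within_limit_of_mean(coarse, 5)
```

The per-second samples swing 50% around the mean and never satisfy a 5% limit, while the six 20 s samples are all equal to the mean.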
fio: add fdp support for io_uring_cmd nvme engine

Add support for NVMe TP4146 Flexible Data Placement, allowing placement
identifiers in write commands. The user can enable this with the new
"fdp=1" parameter for fio's io_uring_cmd ioengine. By default, the fio
jobs will cycle through all the namespace's available placement
identifiers for write commands. The user can limit which placement
identifiers can be used with an additional parameter, "fdp_pli=<list,>",
which can be used to separate write intensive jobs from less intensive
ones.

Setting up your namespace for FDP is outside the scope of 'fio', so this
assumes the namespace is already properly configured for the mode.

Link: https://lore.kernel.org/fio/CAKi7+wfX-eaUD5pky5cJ824uCzsQ4sPYMZdp3AuCUZOA1TQrYw@mail.gmail.com/T/#m056018eb07229bed00d4e589f9760b2a2aa009fc
Based-on-a-patch-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[Vincent: fold in sfree fix from Ankit]
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
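The default cycling behavior and the optional restriction can be pictured with a short Python sketch. The function name and the treatment of the restriction list as a plain set of identifiers are assumptions for illustration; the engine's actual selection code may differ.

```python
from itertools import cycle, islice


def placement_id_cycle(available, allowed=None):
    """Cycle over placement identifiers, optionally limited to a subset."""
    ids = [pid for pid in available if allowed is None or pid in allowed]
    if not ids:
        raise ValueError("no usable placement identifiers")
    return cycle(ids)


namespace_ids = [0, 1, 2, 3]  # hypothetical identifiers of one namespace

# Default: successive writes rotate through every available identifier.
default_order = list(islice(placement_id_cycle(namespace_ids), 6))

# Restricted: a job limited to identifiers 1 and 3 only.
limited_order = list(islice(placement_id_cycle(namespace_ids, {1, 3}), 4))
```

Giving a write intensive job one subset and a light job another keeps their data on separate placement identifiers.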