path: root/io_u.c
2018-10-24  target: fixes [latency-probe]  (Jens Axboe)
1) Use parent for should_account(), if we have a parent
2) Only sum step stats if src has them, to prevent overwriting destination stats in the parent
3) Pretty up the normal output a bit
Signed-off-by: Jens Axboe <>
2018-10-24  Add support for latency probing over an interval of load  (Jens Axboe)
Provide a way to easily run a latency probe on the device. You define a job with peak parameters, and then probe settings for generating iops/latency numbers based on that workload. The latter looks something like this:

iodepth_mode=stepped:10-130/10,5/10

which has the format of:

low_percentage-high_percentage/step,ramp_time/run_time

The above would probe from 10% of peak performance to 130%, in steps of 10%. For each step, it would run a 5 second ramp, then do 10 seconds of testing. For percentages <= 100%, fio will limit the IOPS. For percentages above that, it'll ramp up the queue depth. For each section run, it'll look at the avg completion latency associated with that queue depth / iops setting. Has normal output (which sucks) and json output. Still experimenting, not final form yet.
Signed-off-by: Jens Axboe <>
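The stepped schedule above can be sketched as follows. `expand_steps()` is a hypothetical helper for illustration, not fio's actual parser; ramp/run times would be handled per step:

```c
#include <assert.h>

/* Expand a stepped probe spec (low%-high%/step) into the list of load
 * percentages that would be tested. Hypothetical helper, not fio's
 * actual parser. Returns the number of steps written to out[]. */
static int expand_steps(int low, int high, int step, int *out, int max)
{
	int n = 0;

	for (int pct = low; pct <= high && n < max; pct += step)
		out[n++] = pct;
	return n;
}
```

For the 10-130/10 spec above, this yields 13 steps, 10% through 130%.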
2018-10-24  io_u: move trim error notification out-of-line  (Jens Axboe)
Also kill dead code, we never touch BLOCK_STATE_WRITTEN as it was guarded by a check for ddir == DDIR_TRIM. The latter should probably be double checked... Signed-off-by: Jens Axboe <>
2018-08-25  Make td_io_u_lock/unlock() explicit  (Jens Axboe)
Hopefully this will make coverity a little happier; it currently thinks they are unbalanced.
Signed-off-by: Jens Axboe <>
2018-08-24  Add support for zoned block devices  (Bart Van Assche)
This patch adds support for zoned block devices as follows:
- After the file size has been determined, check whether the file name refers to a block device. If so, check whether that block device is a host-managed block device. If that is the case, read the zone information using the BLKREPORTZONE ioctl. That ioctl is supported by the Linux kernel since kernel version v4.10.
- After all command-line options have been processed and all job files have been read, verify whether these refer to a zoned block device and also whether the specified options are compatible with a zoned block device. Complain if that is not the case.
- After each get_next_block() call, verify whether the block is appropriate for a zoned block device. When writing data to a sequential zone, adjust the write offset to the zone write pointer. When reading from a sequential zone, avoid reading past the write pointer.
- After I/O submission, update the variable that represents the write pointer.
- When writing data with data verification enabled, reset a zone before writing any data into it. Otherwise, reset a zone before issuing a write if that zone is full.
- Translate trim into zone resets, since zoned block devices do not have to support any of the SCSI commands that are used by the kernel to implement the discard ioctl (UNMAP / WRITE SAME).
This work started from a patch from Masato Suzuki <>. Some of the ideas in this patch come from Phillip Chen <>.
Signed-off-by: Bart Van Assche <>
Signed-off-by: Jens Axboe <>
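The write-pointer rules described above can be sketched with a simplified per-zone model. This is illustrative only; fio's real code reads the zone layout via the BLKREPORTZONE ioctl and tracks per-zone state, and the struct and function names here are not fio's:

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of a sequential zone: [start, start+len), with writes
 * only allowed at the write pointer wp. */
struct zone {
	uint64_t start;
	uint64_t wp;
	uint64_t len;
};

/* Writes to a sequential zone are adjusted to land at the write pointer. */
static uint64_t zbd_adjust_write_offset(const struct zone *z, uint64_t offset)
{
	(void) offset;		/* requested offset is overridden */
	return z->wp;
}

/* Reads must not go past the write pointer; returns the capped length. */
static uint64_t zbd_cap_read_len(const struct zone *z, uint64_t offset,
				 uint64_t len)
{
	if (offset >= z->wp)
		return 0;
	if (offset + len > z->wp)
		return z->wp - offset;
	return len;
}
```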
2018-08-24  Add two assert statements in mark_random_map()  (Bart Van Assche)
Add two assert statements that verify whether mark_random_map() is used correctly. Signed-off-by: Bart Van Assche <> Signed-off-by: Jens Axboe <>
2018-08-24  Pass offset and buffer length explicitly to mark_random_map()  (Bart Van Assche)
This patch does not change any functionality. The changes introduced by this patch will be used by the zoned block device code. Signed-off-by: Bart Van Assche <> Signed-off-by: Jens Axboe <>
2018-08-24  Introduce the io_u.post_submit callback function pointer  (Bart Van Assche)
This patch does not change any functionality. The code introduced by this patch will be used by the zoned block device code. Signed-off-by: Bart Van Assche <> Signed-off-by: Jens Axboe <>
2018-08-24  Add the zonemode job option  (Bart Van Assche)
Fio's zone support makes fio perform I/O inside a zone before it skips to the next zone. That behavior is the opposite of the behavior needed for zoned block devices, namely to consider all zones when performing random I/O. Hence introduce a new job option that allows users to choose between fio's traditional zone mode and the behavior needed for zoned block devices. This patch makes fio behave identically with --zonemode=none and --zonemode=zbd. A later patch will implement new behavior for --zonemode=zbd. Signed-off-by: Bart Van Assche <> Signed-off-by: Jens Axboe <>
2018-07-23  Add support for >= 4G block sizes  (Jeff Furlong)
For trims, it's useful to be able to support block sizes larger than (or equal to) 4GB. This extends the block size support for that. Change from Jeff, various little fixups from me.
Signed-off-by: Jens Axboe <>
2018-07-10  io_u: ensure we generate the full length of block sizes  (Jens Axboe)
Since we round down, we can miss the last entry. This ensures that if we do:

bsrange=4k-16k

we actually get an even split of 4k, 8k, 12k, and 16k IOs.
Signed-off-by: Jens Axboe <>
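The rounding issue can be sketched as below, assuming block sizes at minbs granularity. This is an illustrative sketch, not fio's actual generator: scaling a uniform fraction by the number of steps (rather than by the raw byte range and rounding down) keeps the top size reachable:

```c
#include <assert.h>

/* Map a uniform fraction r in [0, 1) to a block size in [minbs, maxbs]
 * at minbs granularity. Illustrative sketch, not fio's actual code. */
static unsigned long pick_bs(double r, unsigned long minbs,
			     unsigned long maxbs)
{
	unsigned long nsteps = maxbs / minbs;	/* 4 steps for 4k..16k */
	unsigned long idx = (unsigned long) (r * nsteps);

	if (idx >= nsteps)			/* guard r == 1.0 rounding */
		idx = nsteps - 1;
	return (idx + 1) * minbs;
}
```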
2018-07-10  io_u: fix negative offset due to wrap  (Jens Axboe)
If we do wrap, the math is off and we end up wrapping a 64-bit value. Instead reset to the initial offset. Reported-by: Bart Van Assche <> Fixes: 4c8be5b1569f ("Fix bug with zone and zone skipping and io_limit") Fixes: 224b3093cc21 ("Fix zoning issue with seq-io and randommap issue") Signed-off-by: Jens Axboe <>
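The fix described above can be sketched as follows; the function name is illustrative, not fio's exact code:

```c
#include <assert.h>
#include <stdint.h>

/* If subtracting the backwards step would wrap the unsigned 64-bit
 * offset below zero, reset to the initial file offset instead of
 * letting it wrap to a huge value. Illustrative sketch. */
static uint64_t next_offset(uint64_t cur, uint64_t step_back,
			    uint64_t initial_off)
{
	if (cur < step_back)
		return initial_off;	/* would wrap; reset instead */
	return cur - step_back;
}
```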
2018-06-12  rand: cleanup rand_between() and helpers  (Jens Axboe)
Make the 32/64-bit helper just return a random number up to a certain value, and let the generic helper handle the range part. Signed-off-by: Jens Axboe <>
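The layering the commit describes can be sketched like this. The raw random input is a stand-in for fio's PRNG output, and the names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Bit-width helper: produce a value in [0, max] from a raw random
 * input (assumes max < UINT64_MAX so max + 1 doesn't wrap). */
static uint64_t rand_upto(uint64_t raw, uint64_t max)
{
	return raw % (max + 1);
}

/* Generic helper: the range handling lives in one place, layered on
 * top of the width-specific helper. */
static uint64_t rand_between(uint64_t raw, uint64_t lo, uint64_t hi)
{
	return lo + rand_upto(raw, hi - lo);
}
```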
2018-05-31  io_u: ensure to invalidate cache on time_based random reads  (Jens Axboe)
We need to do this with the file reset and retrieval of a new offset, not if it fails.
Fixes: 0bcf41cdc22df ("io_u: re-invalidate cache when looping around without file open/close")
Signed-off-by: Jens Axboe <>
2018-04-18  Change return type of td_io_commit() into void  (Bart Van Assche)
Since td_io_commit() always returns 0, change its return type from int into void. This patch does not change any functionality. Signed-off-by: Bart Van Assche <>
2018-04-17  Deprecate verifysort and verifysort_nr  (Jens Axboe)
It was an optimization to read back verifies in a sorted order, for rotational storage. But I don't think the option makes much sense, and I've never heard of anyone using it. Mark it as deprecated, and always verify in the same order that IO was written. Signed-off-by: Jens Axboe <>
2018-04-04  Only populate the write buffer if necessary  (Bart Van Assche)
This patch moves the populate_verify_io_u() call from inside get_io_u() into all its callers except do_dry_run() and thereby skips write buffer population for dry runs. This patch does not change the behavior of fio but is necessary because the ZBC patch will insert code in fill_io_u() after the get_io_u() call that may modify the write offset. Signed-off-by: Bart Van Assche <>
2018-04-04  Rename TD_F_VER_NONE into TD_F_DO_VERIFY  (Bart Van Assche)
Rename TD_F_VER_NONE into TD_F_DO_VERIFY to make it clear that this flag means that data verification has to be performed. See also commit d72be5454c8c ("Cache layout improvements"). Signed-off-by: Bart Van Assche <>
2018-03-21  Refactor #includes and headers  (Sitsofe Wheeler)
- Try and remove unneeded #include lines
- Try and add #include lines that would allow the files to be built in a more standalone manner
Signed-off-by: Sitsofe Wheeler <>
2018-03-16  Signal td->free_cond with the associated mutex held  (Bart Van Assche)
Calling pthread_cond_signal() or pthread_cond_broadcast() without holding the associated mutex can lead to missed wakeups. Hence ensure that td->io_u_lock is held around pthread_cond_signal(&td->free_cond) calls. A quote from the POSIX spec: "The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal()."
Signed-off-by: Bart Van Assche <>
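The locking pattern the commit enforces can be sketched as below. The struct and function names are illustrative, not fio's actual code:

```c
#include <assert.h>
#include <pthread.h>

/* Signal free_cond only while holding the associated mutex, so a
 * waiter cannot miss the wakeup between checking its predicate and
 * blocking in pthread_cond_wait(). Illustrative sketch. */
struct io_u_pool {
	pthread_mutex_t io_u_lock;
	pthread_cond_t free_cond;
	int free_count;
};

static void put_io_u_locked(struct io_u_pool *p)
{
	pthread_mutex_lock(&p->io_u_lock);
	p->free_count++;
	pthread_cond_signal(&p->free_cond);	/* mutex held: no lost wakeup */
	pthread_mutex_unlock(&p->io_u_lock);
}
```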
2018-03-16  Make sure that assert() expressions do not have side effects  (Bart Van Assche)
Assert statements are compiled out if NDEBUG is defined. Hence make sure that the expressions passed to assert do not have side effects. Signed-off-by: Bart Van Assche <>
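The safe pattern can be sketched as follows; `do_work()` and `checked_work()` are illustrative names:

```c
#include <assert.h>

static int calls;

static int do_work(void)
{
	calls++;
	return 0;
}

/* With NDEBUG defined, assert() compiles to nothing, so e.g.
 * assert(do_work() == 0) would silently drop the call. Keep the side
 * effect outside the assert. Illustrative sketch. */
static void checked_work(void)
{
	int ret = do_work();	/* side effect always runs */

	assert(ret == 0);	/* check may be compiled out */
	(void) ret;		/* avoid unused warning under NDEBUG */
}
```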
2018-03-13  io_u: only rewind file position if it's non-zero  (Jens Axboe)
This:

bs=8k
rw=read:-4k

is supposed to read 0..8k, then add 8k and subtract 4k, ending up with a next read at 4k..12k, and so forth. But we rewind too quickly, and the first IO fails as being out-of-bounds.
Fixes: c22825bb537af ("Fix backwards reads with --size smaller than the file size")
Signed-off-by: Jens Axboe <>
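The intended offset progression can be sketched as below. This is an illustrative model, not fio's exact code:

```c
#include <assert.h>
#include <stdint.h>

/* With bs=8k and rw=read:-4k, each IO advances by bs and then steps
 * back 4k, giving reads at 0..8k, 4k..12k, 8k..16k, and so on. Only
 * rewind to zero when the subtraction would actually go out of
 * bounds. Illustrative sketch. */
static uint64_t next_backwards_offset(uint64_t cur, uint64_t bs,
				      uint64_t step_back)
{
	uint64_t next = cur + bs;

	if (next < step_back)
		return 0;	/* genuinely out of bounds: rewind */
	return next - step_back;
}
```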
2018-03-09  io_u: 'is_random' can be a boolean  (Jens Axboe)
Signed-off-by: Jens Axboe <>
2018-03-09  io_u: kill get_next_{offset,buflen} wrappers  (Jens Axboe)
After the previous commit, they are just wrappers. Kill them. Signed-off-by: Jens Axboe <>
2018-03-09  Remove prof_io_ops.fill_io_u_off(), .fill_io_u_size() and .get_next_file()  (Bart Van Assche)
Remove the struct prof_io_ops methods for which no implementations have been defined. This patch does not change any functionality. Signed-off-by: Bart Van Assche <> Signed-off-by: Jens Axboe <>
2018-03-01  Fix overflow of counters incremented on each I/O operation  (Alexander Larin)
- In the thread_stat struct: uint32_t counters are updated to uint64_t.
- In the io_u_plat_entry struct: unsigned int counters are updated to uint64_t.
It fixes overflow of these counters.
Signed-off-by: Alexander Larin <>
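The overflow being fixed is plain 32-bit wraparound, which can be demonstrated directly:

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit per-IO counter wraps after ~4.29 billion operations; at
 * 1M IOPS that is a little over an hour of runtime. Widening to
 * 64 bits removes the wrap in practice. */
static int counter_wraps_at_32_bits(void)
{
	uint32_t c32 = UINT32_MAX;
	uint64_t c64 = UINT32_MAX;

	c32++;			/* wraps to 0 */
	c64++;			/* keeps counting */
	return c32 == 0 && c64 == (uint64_t) UINT32_MAX + 1;
}
```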
2018-02-12  io_u: convert zoned bug warning to fio_did_warn()  (Jens Axboe)
Signed-off-by: Jens Axboe <>
2018-01-24  Switch last_was_sync and terminate to bool and pack better  (Jens Axboe)
Signed-off-by: Jens Axboe <>
2018-01-24  Add support for logging fsync (and friends) latencies  (Jens Axboe)
Signed-off-by: Jens Axboe <>
2018-01-12  Merge branch 'fio-issue-450'  (Jens Axboe)
2017-12-29  debug: make debug=io readable with multiple threads  (Robert Elliott)
When multiple threads are active, debug=io prints are unreadable as the output is interleaved. Multiple log_info_buf() calls are used to construct each line, but there is no mutual exclusion covering multiple calls. Change the dprint call tree to construct the entire line before passing it to log_info_buf(), rather than make several calls.

Other nits:
* print the thread ID rather than the process ID
* change offset and length from decimal to hex
* separate offset, length, ddir, and file with , rather than / since the filename on the same line likely has / of its own
* change "fill_io_u" to "fill" to match the others
* change "io complete" to "complete" to match the others
* change "->prep()=%d" to "prep: io_u %p: ret=%d" to resemble the others
* change offset/buflen in an error print to better resemble the normal prints
* add "file=" prefix for the filename
* check the calloc() return values inside the valist_to_buf functions

Old:
fill_io_u: io_u 0x7feeac010b80: off=720896/len=65536/ddir=1 io 50692io 50692//dev/dax0.0io 50692fill_io_u: io_u 0x7fee98010b80: off=196608/len=65536/ddir=1io 50692io 50692 io 50692->prep(0x7fef10010b80)=0 //dev/dax0.0//dev/dax1.0io 50692io 50692io 50692io 50692->prep(0x7feeec010b80)=0 io 50692prep: io_u 0x7feec4010b80: off=1966080/len=65536/ddir=1io 50692io complete: io_u 0x7feedc010b80: off=393216/len=65536/ddir=1io 50692io 50692prep: io_u 0x7feef4010b80: off=720896/len=65536/ddir=1io 50692io 50692 io 50692io 50692prep: io_u 0x7feef0010b80: off=851968/len=65536/ddir=1//dev/dax0.0io 50692//dev/dax0.0 //dev/dax0.0 io 50692io 50692//dev/dax0.0//dev/dax0.0io 50692

New:
io 71400 queue: io_u 0x7fd0f0010b80: off=0x2f0000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71395 fill: io_u 0x7fd0fc010b80: off=0x80000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71395 prep: io_u 0x7fd0fc010b80: off=0x80000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71395 prep: io_u 0x7fd0fc010b80: ret=0
io 71395 queue: io_u 0x7fd0fc010b80: off=0x80000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71430 complete: io_u 0x7fd05c010b80: off=0x180000,len=0x10000,ddir=1,file=/dev/dax1.0
io 71400 complete: io_u 0x7fd0f0010b80: off=0x2f0000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71430 fill: io_u 0x7fd05c010b80: off=0x190000,len=0x10000,ddir=1,file=/dev/dax1.0
io 71400 fill: io_u 0x7fd0f0010b80: off=0x300000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71400 prep: io_u 0x7fd0f0010b80: off=0x300000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71400 prep: io_u 0x7fd0f0010b80: ret=0
io 71400 queue: io_u 0x7fd0f0010b80: off=0x300000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71430 prep: io_u 0x7fd05c010b80: off=0x190000,len=0x10000,ddir=1,file=/dev/dax1.0
io 71395 complete: io_u 0x7fd0fc010b80: off=0x80000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71430 prep: io_u 0x7fd05c010b80: ret=0
io 71430 queue: io_u 0x7fd05c010b80: off=0x190000,len=0x10000,ddir=1,file=/dev/dax1.0
io 71421 complete: io_u 0x7fd090010b80: off=0x320000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71419 complete: io_u 0x7fd098010b80: off=0x320000,len=0x10000,ddir=1,file=/dev/dax0.0
io 71404 complete: io_u 0x7fd0d0010b80: off=0x0,len=0x10000,ddir=1,file=/dev/dax0.0

Signed-off-by: Jens Axboe <>
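The core of the fix, formatting the whole line before a single logger call, can be sketched as below. The function name is illustrative, not fio's API:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Format the entire debug line into one buffer before handing it to
 * the logger, so concurrent threads cannot interleave fragments of a
 * line. Field layout mirrors the "New" output above. Illustrative
 * sketch, not fio's actual code. */
static int format_io_line(char *buf, size_t len, unsigned int tid,
			  const char *op, void *io_u, uint64_t off,
			  uint64_t blen, int ddir, const char *file)
{
	return snprintf(buf, len,
			"io %u %s: io_u %p: off=0x%llx,len=0x%llx,ddir=%d,file=%s",
			tid, op, io_u,
			(unsigned long long) off, (unsigned long long) blen,
			ddir, file);
}
```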
2017-12-06  io_u: rate cleanup and spelling error  (Jens Axboe)
Fixes: 50a8ce86 ("Implement new Rate Control") Signed-off-by: Jens Axboe <>
2017-12-06  Add option to ignore thinktime for rated IO  (Jens Axboe)
By default, fio will ignore thinktime when calculating the next time to issue an IO, if rated IO is specified. This leads to fio entering a catch-up type of mode after doing the specified sleep. For some workloads, that may not be useful. If someone asks for a specific amount of IOPS and sets a thinktime, they may want to exclude the sleep time.
Fixes:
Signed-off-by: Jens Axboe <>
2017-11-30  io_u: don't account io issue blocks for verify backlog  (Jens Axboe)
We don't account the bytes, we should not account the blocks either. Fixes: ae2fafc8 ("verify: verify bytes should not add to this_io_bytes") Signed-off-by: Jens Axboe <>
2017-11-30  io_u: speed up small_content_scramble()  (Jens Axboe)
This is a hot path for write workloads, since we don't want to send the same buffers to the device again and again. The idea is to defeat basic dedupe/compression by slightly modifying the buffer for each write. small_content_scramble() does this by filling in the io_u offset into a random spot in each 512b chunk of an io buffer, and filling in the start time (sec,nsec) at the end of each 512b chunk. With this change, we still do those two things, but we generate a random cacheline within each 512b chunk, and fill the offset at the beginning of the cacheline, and the time at the end of it. This means that instead of potentially dirtying 2 cachelines for each 512b chunk in an IO buffer, we dirty just 1. The results should still be random enough that small_content_scramble() fulfils the promise to defeat basic dedupe and compression, but it is lighter to run.
Signed-off-by: Jens Axboe <>
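The single-cacheline scheme can be sketched per 512b chunk as below. This is a sketch of the idea, not fio's exact buffer layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK		512
#define CACHELINE	64

/* Per 512b chunk: pick one cacheline, write the IO offset at its
 * start and the nsec timestamp at its end, so only one cacheline per
 * chunk is dirtied. Illustrative sketch, not fio's actual code. */
static void scramble_chunk(char *chunk, uint64_t offset, uint64_t nsec,
			   unsigned int rand_line)
{
	char *line = chunk + (rand_line % (CHUNK / CACHELINE)) * CACHELINE;

	memcpy(line, &offset, sizeof(offset));			/* start */
	memcpy(line + CACHELINE - sizeof(nsec), &nsec, sizeof(nsec)); /* end */
}
```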
2017-11-30  io_u: cleanup check_get_trim()  (Jens Axboe)
Signed-off-by: Jens Axboe <>
2017-11-30  io_u: tweak small content buffer scramble  (Jens Axboe)
We currently generate a 'random' offset in a 512b chunk to fill in the offset. Since we don't want the later time scramble to overwrite it, we check and adjust for that. Instead just ensure that we generate a random offset in the first half of the 512b chunk, then we know we never overlap. Signed-off-by: Jens Axboe <>
2017-11-30  io_u: use nsec value for buffer scramble  (Jens Axboe)
Just use the nanosecond value directly, it's pointless to shift it down and lose 10 bits' worth of scrambling data.
Fixes: d5d3795c ("io_u: don't do expensive int divide for buffer scramble")
Signed-off-by: Jens Axboe <>
2017-11-29  Change latency targets to be in nsec values internally  (Jens Axboe)
Since all of our timekeeping is in nsec now, it's easier to convert these at init time and not have to do it at runtime. Signed-off-by: Jens Axboe <>
2017-11-29  io_u: do nsec -> usec conversion in one spot in account_io_completion()  (Jens Axboe)
Should not matter for runtime, but it's cleaner. What we should really do is convert the internal values to nsec, so we don't have to do this conversion. Signed-off-by: Jens Axboe <>
2017-11-29  io_u: don't do expensive int divide for buffer scramble  (Jens Axboe)
We don't need the conversion from nsec to usec to be exact, so just shift by 10 instead. Fixes: 8b6a404cd ("nanosecond: initial commit changing timeval to timespec") Signed-off-by: Jens Axboe <>
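The shift trick is simple enough to show directly; dividing by 1024 instead of 1000 is roughly 2.3% low, which is acceptable here:

```c
#include <assert.h>
#include <stdint.h>

/* Approximate nsec -> usec with a shift: >> 10 divides by 1024
 * instead of 1000, avoiding a 64-bit integer divide in a hot path
 * where exactness doesn't matter. */
static uint64_t nsec_to_usec_approx(uint64_t nsec)
{
	return nsec >> 10;
}
```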
2017-11-29  io_u: cleanup and simplify __get_next_rand_offset_zoned_abs()  (Jens Axboe)
We can drop various variables, it's easier to read this way too. Signed-off-by: Jens Axboe <>
2017-11-29  Add support for absolute random zones  (Jens Axboe)
We currently support random_distribution=zoned, which allows the user to specify a percentage of accesses to a zone, defined as a percentage of the file/device size. This commit adds support for zoned_abs, which works exactly like zoned, except you give the zone size in an absolute value.
Signed-off-by: Jens Axboe <>
2017-11-01  io_u: reset file to initial offset  (Jens Axboe)
Don't assume that initial offset is 0, we should use the set f->file_offset when resetting. Fixes: 17373ce2f38a ("io_u: wrap to beginning when end-of-file is reached for time_based") Signed-off-by: Jens Axboe <>
2017-11-01  io_u: wrap to beginning when end-of-file is reached for time_based  (Jens Axboe)
The logic around using io_size isn't correct. Signed-off-by: Jens Axboe <>
2017-10-26  io_u: re-invalidate cache when looping around without file open/close  (Jens Axboe)
If we're doing buffered IO and we end up wrapping around for a time based run, then ensure that we re-invalidate the kernel cache for the file. Reported-by: Paolo Valente <> Signed-off-by: Jens Axboe <>
2017-10-09  engines/filecreate: set FIO_NOSTATS flag  (Jens Axboe)
Before this change, we bundle the fake IO latencies with the file open latencies. That's not intended. Add a flag for IO engines to tell the core to ignore any IO latencies. Signed-off-by: Jens Axboe <>
2017-10-03  io_u: Converting usec from long to uint64_t  (Erwan Velu)
'rate_next_io_time' and usec_sleep() are manipulating uint64_t. It's more consistent to have usec expressed in the same stdint type.
Signed-off-by: Jens Axboe <>
2017-09-14  Fix zoning issue with seq-io and randommap issue  (gvkovai)
The zonerange < zonesize scenario was not handled correctly earlier. When zonesize > zonerange, IO must continue in the same zonerange of the size zonesize for seq-io. For random io, zonesize > zonerange leads to sequential io after the first zonerange size of io is done when 'norandommap' is not set. In this case, the map needs to be reset for every zonerange size of IO on a zone.

<seqzoneread.fio>
======
[global]
ioengine=libaio
direct=1
time_based
disk_util=0
continue_on_error=all
rate_process=poisson
write_iolog=offsetlog

[db-dss1]
bs=8K
filesize=524288M
zonesize=9M
zonerange=3M
zoneskip=1M
filename=/dev/sdb
rw=read
iodepth=1
rate_iops=100
======

sudo ./fio --runtime 120 --debug=file,io,blktrace --write_iops_log=/tmp/IOPS --write_lat_log=/tmp/LAT --status-interval=10 --output=/tmp/fio.out --output-format=json seqzoneread.fio

See the issue for more details and plots describing the issue and fix.
Fixes #450.
2017-09-12  io_u: fix trimming of mixed block size randommap  (Jens Axboe)
If you run something ala:

fio --name=test --filename=/dev/sda --rw=randread --size=1M --bs=128k,64k --ioengine=libaio --io_submit_mode=offload

where you have different block sizes for reads and writes, we can get into a situation where we have to trim the IO size for a read (128k bs), but we use the general minimum block size, which is 64k. That results in a case where we incorrectly reset the IO size to 0, resulting in an invalid IO.
Fixes:
Signed-off-by: Jens Axboe <>