When using per-priority statistics for workloads using multiple
different priority values, the statistics output displays the priority
class and value (level) for each set of statistics. However, this is
done using the Linux priority value encoding, that is, assuming that
the priority level is at most 7 (lower 3 bits). This is not the case
for all OSes: e.g., DragonFly allows IO priorities up to a value of 10.
Introduce the OS-dependent ioprio_class() and ioprio() macros to extract
these fields from an ioprio value according to the OS capabilities. A
generic definition (always returning 0) is added to os/os.h and used for
all OSes that do not define these macros.
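As a rough illustration, the Linux definitions might look like the
following sketch (assuming the usual 13-bit class shift and the 3-bit
level mask mentioned above; the generic fallback simply returns 0):
/* sketch: os-linux.h-style field extraction */
#define ioprio_class(ioprio)	((ioprio) >> 13)
#define ioprio(ioprio)		((ioprio) & 7)
/* sketch: generic fallback in os/os.h for OSes lacking the macros */
#ifndef ioprio_class
#define ioprio_class(ioprio)	0
#define ioprio(ioprio)		0
#endif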
The functions show_ddir_status() and add_ddir_status_json() are modified
to use these new macros to fix the per-priority statistics output. The
modification includes changes to the loops over the clat_prio array to
reduce indentation levels, making the code a little cleaner.
Fixes: 692dec0cfb4b ("stat: report clat stats on a per priority granularity")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
We do this in 4 different spots, put it in a helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
No point in duplicating this code, use the helper.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
ZBC and ZAC require that writes to sequential write required zones be
aligned to the physical block size. However, the t/zbd/test-zbd-support
script uses the logical block size as the minimum write size. When SMR
drives have a physical block size larger than the logical block size,
writes of the logical block size cause unaligned write command errors.
To fix this, use the correct value as the minimum write size. For zoned
block devices, introduce a helper function min_seq_write_size(), which
checks sysfs attributes and returns the correct size. Refer to the
zone_write_granularity attribute when it is available, which provides
the minimum write size regardless of the device type. If the attribute
is not available, refer to the physical_block_size attribute for SMR
devices, and the logical_block_size attribute for other devices. For SG
node devices, refer to the physical block size that the zbc_info command
reports.
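The script's helper is shell, but the selection logic amounts to roughly
the following C sketch (read_sysfs_uint() and is_smr_dev() are
hypothetical stand-ins, not the script's actual functions):
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one unsigned int sysfs attribute of a
 * block device; returns 0 on success. */
static int read_sysfs_uint(const char *dev, const char *attr,
			   unsigned int *val)
{
	char path[256];
	FILE *f;
	int rc;

	snprintf(path, sizeof(path), "/sys/block/%s/%s", dev, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	rc = fscanf(f, "%u", val) == 1 ? 0 : -1;
	fclose(f);
	return rc;
}

/* Hypothetical helper: treat any zoned model other than "none" as SMR. */
static int is_smr_dev(const char *dev)
{
	char path[256], zoned[32];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", dev);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%31s", zoned) != 1)
		zoned[0] = '\0';
	fclose(f);
	return strcmp(zoned, "none") != 0;
}

/* Fallback order described above: zone_write_granularity if present,
 * else physical_block_size for SMR, else logical_block_size. */
static unsigned int min_seq_write_size(const char *dev)
{
	unsigned int size = 0;

	if (!read_sysfs_uint(dev, "queue/zone_write_granularity", &size) &&
	    size)
		return size;
	if (is_smr_dev(dev))
		read_sysfs_uint(dev, "queue/physical_block_size", &size);
	else
		read_sysfs_uint(dev, "queue/logical_block_size", &size);
	return size;
}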
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The test script t/zbd/test-zbd-support assumes that the logical block
size is the minimum unit for writes to sequential write required zones,
and uses a variable named 'logical_block_size' to hold the minimum
size. The assumption is true for ZNS devices but not for ZBC/ZAC
devices. Rename the variable from 'logical_block_size' to
'min_seq_write_size' so that it no longer implies the wrong assumption.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
https://github.com/horshack-dpreview/fio
* 'For_Each_Td_Private_Scope' of https://github.com/horshack-dpreview/fio:
Refactor for_each_td() to catch inappropriate td ptr reuse
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://github.com/horshack-dpreview/fio
* 'Fix_calc_thread_status_ramp_time_check' of https://github.com/horshack-dpreview/fio:
Fix --bandwidth-log segmentation fault when numjobs even multiple of 8
|
|
I recently introduced a bug caused by reusing a struct thread_data *td
after the end of a for_each_td() loop construct.
Link: https://github.com/axboe/fio/pull/1521#issuecomment-1448591102
To prevent others from making this same mistake, this commit refactors
for_each_td() so that both the struct thread_data * and the loop index
variable are placed inside their own scope for the loop. This will cause
any reference to those variables outside the for_each_td() to produce an
undeclared identifier error, provided the outer scope doesn't already
reuse those same variable names for other code within the routine (which
is fine because the scopes are separate).
Because C/C++ doesn't let you declare two different variable types
within the scope of a for() loop initializer, creating a scope for both
struct thread_data * and the loop index required explicitly declaring a
scope with a curly brace. This means for_each_td() includes an opening
curly brace to create the scope, which means all uses of for_each_td()
must now end with an invocation of a new macro named end_for_each()
to emit an ending curly brace to match the scope brace created by
for_each_td():
for_each_td(td) {
	while (td->runstate < TD_EXITED)
		sleep(1);
} end_for_each();
The alternative is to end every for_each_td() construct with an inline
curly brace, which is off-putting since the implementation of an extra
opening curly brace is abstracted in for_each_td():
for_each_td(td) {
	while (td->runstate < TD_EXITED)
		sleep(1);
}}
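For reference, a minimal sketch of how such a macro pair could be
shaped (the index variable name and the threads/thread_number globals
here are illustrative, not the exact fio implementation):
#define for_each_td(td)					\
	{						\
		int __td_i;				\
		struct thread_data *(td);		\
		for (__td_i = 0, (td) = &threads[0];	\
		     __td_i < (int) thread_number;	\
		     __td_i++, (td)++)

#define end_for_each()	}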
Most fio logic only declares "struct thread_data *td" and "int i" for use in
for_each_td(), which means those declarations will now cause -Wunused-variable
warnings since they're not used outside the scope of the refactored
for_each_td(). Those declarations have been removed.
Implementing this change caught a latent bug in eta.c::calc_thread_status()
that accesses the ending value of struct thread_data *td after the end
of for_each_td(), now manifesting as a compile error, so working as
designed :)
Signed-off-by: Adam Horshack (horshack@live.com)
|
|
A segmentation fault occurs when aggregate bandwidth logging is enabled
(--bandwidth-log) and numjobs is an even multiple of 8. The fault
occurs because the logic uses the terminating value of struct
thread_data *td from the most recent for_each_td(). This bug was caught
by the refactoring of for_each_td().
Link: https://github.com/axboe/fio/issues/1534
Signed-off-by: Adam Horshack (horshack@live.com)
|
|
* 'fiologparser-fix' of https://github.com/patrakov/fio:
fix fiologparser.py to work with new logging format
|
|
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
On Windows fio.h includes some definitions needed by file.h.
fio.h actually includes file.h already but we can retain the file.h
include in fdp.c since it refers to some declarations that were added
there.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
https://github.com/horshack-dpreview/fio
* 'doc-Clarify_Runtime_Param' of https://github.com/horshack-dpreview/fio:
Clarify documentation for runtime parameter
|
|
I realize this is highly subjective but I think the description of the
runtime parameter could be made a bit more precise. I misinterpreted its
meaning after reading the doc and only learned of my mistake by trial and
error using fio. Either I'm just slow or the description could use
just a little more precision :)
Signed-off-by: Adam Horshack (horshack@live.com)
|
|
This reverts commit d5a47449ce79001ba233fe6d0499627d0438cb69.
The change to rate-submit.c is clearly bogus, as it's referencing
'td' outside of the actual 'td' loop. It's not valid at that
point.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I don't believe we can have a NULL ->io_ops here, but let's just
add an error check and make the static checkers happy as they don't
like the non-NULL check and then a later deref in the other branch.
Add missing braces while at it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support for NVMe TP4146 Flexible Data Placement, allowing placement
identifiers in write commands. The user can enable this with the new
"fdp=1" parameter for fio's io_uring_cmd ioengine. By default, fio
jobs will cycle through all the namespace's available placement
identifiers for write commands. The user can limit which placement
identifiers can be used with an additional parameter, "fdp_pli=<list,>",
which can be used to separate write-intensive jobs from less intensive
ones.
Setting up your namespace for FDP is outside the scope of 'fio', so this
assumes the namespace is already properly configured for the mode.
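For illustration, a minimal job sketch (the device path is illustrative,
and this assumes a namespace already set up for FDP):
[write-heavy]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=randwrite
fdp=1
fdp_pli=0,1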
Link: https://lore.kernel.org/fio/CAKi7+wfX-eaUD5pky5cJ824uCzsQ4sPYMZdp3AuCUZOA1TQrYw@mail.gmail.com/T/#m056018eb07229bed00d4e589f9760b2a2aa009fc
Based-on-a-patch-by: Ankit Kumar <ankit.kumar@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
[Vincent: fold in sfree fix from Ankit]
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
https://github.com/horshack-dpreview/fio
* 'Fix_Assert_TdIoQueue_Serialize_Overlap_Offload' of https://github.com/horshack-dpreview/fio:
ioengines.c:346: td_io_queue: Assertion `res == 0' failed
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://github.com/horshack-dpreview/fio
* 'Fix_Bad_Hdr_Rand_Seed_For_Requeued_IO' of https://github.com/horshack-dpreview/fio:
Fix "verify bad_hdr rand_seed" for requeued I/Os
|
|
* 'master' of https://github.com/Cuelive/fio:
blktrace: fix compilation error on the uos system
|
|
When compiling on uos, it fails with an undefined reference to 'major'. Fix
this by including the correct header for it.
Signed-off-by: liuyafei <liuyafei@uniontech.com>
|
|
On configurations that can cause I/Os to be internally requeued from
FIO_Q_BUSY, such as '--iodepth_batch_complete_max', with verify enabled,
the subsequent verification of the data fails with a bad verify
rand_seed because the pattern for the I/O is generated twice for the
same I/O, causing the seed to go out of sync when the verify is later
performed. The seed is generated twice because do_io() handles the I/O
twice: first when it originates the I/O, and again when it later gets
the same I/O back from get_io_u() after it is pulled from the requeue
list, which is where the first submission landed due to the workload
reaching '--iodepth_batch_complete_max'.
The fix is for do_io() to track when it has generated the verify pattern
for an I/O via a new io_u flag, 'IO_U_F_PATTERN_DONE', avoiding a second
call to populate_verify_io_u() when that flag is detected.
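In sketch form (hedged; the exact call site in do_io() may differ):
/* sketch: generate the verify pattern only once per io_u */
if (!(io_u->flags & IO_U_F_PATTERN_DONE)) {
	populate_verify_io_u(td, io_u);
	io_u_set(td, io_u, IO_U_F_PATTERN_DONE);
}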
Link: https://github.com/axboe/fio/issues/1526
Signed-off-by: Adam Horshack (horshack@live.com)
|
|
* 'master' of https://github.com/bvanassche/fio:
zbd: Make an error message more detailed
zbd: Report the zone capacity
io_u: Add a debug message in fill_io_u()
|
|
If zone data is invalid, report in detail why it is invalid.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
|
|
The zone capacity is important information. Hence report the zone
capacity if ZBD debugging is enabled.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
|
|
A debug message is logged before each 'return io_u_eof' statement in
fill_io_u() except one. Hence add a debug message in front of that one
return statement.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
|
|
The logging format updates documented in 1a953d97 were never propagated
to fiologparser.py, which since then has been failing with a ValueError
exception.
This commit explicitly limits fiologparser.py to only reading the first
2 columns in the log file, because they are the only columns used.
This is similar to issue #928.
Signed-off-by: Alexander Patrakov <patrakov@gmail.com>
|
|
'Verify_Bad_Hdr_Rand_Seed_Mult_Workload_Iterations_Non_Repeating_Seed' of https://github.com/horshack-dpreview/fio
* 'Verify_Bad_Hdr_Rand_Seed_Mult_Workload_Iterations_Non_Repeating_Seed' of https://github.com/horshack-dpreview/fio:
Bad header rand_seed with time_based or loops with randrepeat=0 verify
|
|
The commit removing pmemblk inadvertently removed the dev-dax and
libpmem ioengines as well.
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 04c1cdc4c108c6537681ab7c50daaed6d2fb4c93 ("pmemblk: remove pmemblk engine")
Fixes: https://github.com/axboe/fio/issues/1523
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
Verify fails with "bad header rand_seed" when multiple iterations of
do_io() execute (time_based=1 or loops>0), with verify enabled
and randrepeat=0.
The root cause is do_verify() resetting the verify seed back to the
job-init value. This works for verification of the first iteration of
do_io() but fails for subsequent iterations: the seed is left in its
post-do_io() state after the first do_verify(), which means different
rand values for the second iteration of do_io(), yet the second
iteration of do_verify() will again revert back to the job-init seed
value.
The fix is to revert the verify seed for randrepeat=0 back to its state
when do_io() last ran rather than to its job-init value. That allows
do_verify() to use the correct seed for each iteration while still
retaining a per-iteration unique verify seed.
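A hedged sketch of the idea (last_do_io_verify_state is an illustrative
field name; frand_copy() is fio's PRNG-state copy helper):
/* sketch: snapshot the verify PRNG state as do_io() finishes... */
if (!td->o.rand_repeatable)
	frand_copy(&td->last_do_io_verify_state, &td->verify_state);

/* ...and restore the snapshot, not the job-init seed, in do_verify() */
if (!td->o.rand_repeatable)
	frand_copy(&td->verify_state, &td->last_do_io_verify_state);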
Link: https://github.com/axboe/fio/issues/1517#issuecomment-1430282533
Signed-off-by: Adam Horshack (horshack@live.com)
|
|
Assertion in ioengines.c::td_io_queue() fails for pthread_mutex_unlock()
on overlap_check mutex when serialize_overlap=1, io_submit_mode=offload,
and verify=<any> are used together.
backend.c::fio_io_sync() invokes td_io_queue(), which expects the caller
to have ownership of the overlap_check mutex when serialize_overlap
and offloading are configured, as part of the overlap-check interlock
with IO_U_F_FLIGHT. The mutex is not acquired for this path because it's
not an I/O requiring an overlap check.
The fix is to refine the conditional that triggers td_io_queue() to
release the overlap_check mutex. Rather than using broad config options,
the conditional now uses a new io_u flag named IO_U_F_OVERLAP_LOCK, which
is only set for the offload worker thread path that acquires the mutex.
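In sketch form (hedged; the surrounding td_io_queue() logic is elided):
/* sketch: only drop the mutex when this io_u actually holds it */
if (io_u->flags & IO_U_F_OVERLAP_LOCK)
	pthread_mutex_unlock(&overlap_check);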
Link: https://github.com/axboe/fio/issues/1520
Signed-off-by: Adam Horshack (horshack@live.com)
|
|
The runtime for fio jobs used in conjunction with thinktime,
thinktime_iotime, and thinktime_spin was sometimes more than
what was specified. Add a fix so that fio doesn't spin or sleep
for any duration beyond the runtime.
For the first cycle fio starts by doing I/O for
thinktime + thinktime_iotime, when it should just be
thinktime_iotime. Add a fix for that as well.
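The runtime cap amounts to roughly the following sketch (think_us is an
illustrative name; utime_since_now() and td->o.timeout are fio's
elapsed-time helper and the runtime in microseconds):
/* sketch: never spin or sleep past the configured runtime */
uint64_t elapsed_us = utime_since_now(&td->epoch);
uint64_t left_us = td->o.timeout > elapsed_us ?
			td->o.timeout - elapsed_us : 0;

if (think_us > left_us)
	think_us = left_us;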
Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Link: https://lore.kernel.org/r/20230217070322.14163-2-ankit.kumar@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
* 'remove_pmemblk_engine' of github.com:osalyk/fio:
pmemblk: remove pmemblk engine
|
|
Fio has not been setting O_DIRECT, O_SYNC, O_DSYNC, and O_CREAT for
workloads that include trim commands. Fix this and actually set these
flags when requested for workloads that include trim commands.
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
This feature never went upstream on the Linux kernel side, so let's
just get rid of it.
The option is left in place for now, but we can deprecate it, or
probably even remove it, as it never had any effect.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add code to process trim commands when we are reading an iolog.
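For example, a version 2 iolog replaying a trim might look like this
(device path and sizes are illustrative):
fio version 2 iolog
/dev/sdb add
/dev/sdb open
/dev/sdb write 0 4096
/dev/sdb trim 0 4096
/dev/sdb close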
Fixes: https://github.com/axboe/fio/issues/769
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
No further support or maintenance of the libpmemblk library is planned.
https://pmem.io/blog/2022/11/update-on-pmdk-and-our-long-term-support-strategy/
https://github.com/pmem/pmdk/pull/5538
Signed-off-by: osalyk <oksana.salyk@intel.com>
|
|
github.com:horshack-dpreview/fio
* 'Read_Stats_Not_Reported_For_Timed_Backlog_Verifies' of github.com:horshack-dpreview/fio:
Read stats for backlog verifies not reported for time-expired workloads
|
|
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
Improve the documentation, describing how to use nbdkit with a local
file. Move the suggested test file to /var/tmp since /tmp might be a
tmpfs. Use indenting to make it easier to read.
Use ${uri} instead of ${unixsocket} since nbdkit 1.14 was released
nearly 4 years ago.
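The documented job then picks up the server location via the uri
variable, along these lines (a sketch; options besides ioengine and uri
are illustrative):
[nbd]
ioengine=nbd
uri=${uri}
rw=randrw
size=256m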
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The zone_reset_threshold option works for multiple jobs only when the
jobs have the same write range. Add three test cases to confirm that
the option works for multiple jobs as expected. The first test case
checks that different write ranges are reported as an error. The second
test case checks that multiple write jobs work when they have the same
write range. The third test case checks that a read job and a write job
work when they have different IO ranges.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The valid data bytes accounting field is initialized at file reset,
after each job has started. Each job locks zones to check the write
pointer positions of its write target zones. This can cause zone lock
contention with writes by other jobs.
To avoid the zone lock contention, move the initialization from file
reset to file setup, before job start. This allows accessing the write
pointers and the accounting field without locks. Remove the lock and
unlock code which is no longer required. Ensure the locks are not
required by checking the run state in struct thread_data. Also rename
the function zbd_set_vdb() to zbd_verify_and_set_vdb() to be consistent
with the other functions called in zbd_setup_files().
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The valid data bytes accounting is used for the zone_reset_threshold
option. This accounting usage has two issues. The first issue is
unexpected zone resets due to different IO ranges. The valid data bytes
accounting is done for all IO ranges per device and shared by all jobs.
On the other hand, the zone_reset_threshold option is defined as a
ratio of each job's IO range. When a job refers to the accounting
value, it includes writes to IO ranges outside the job's IO range. Zone
resets are then triggered earlier than expected.
The second issue is accounting value initialization. The initialization
of the accounting field is repeated for each job, so the value
initialized by the first job is overwritten by the other jobs. This
works as expected for a single job or multiple jobs with the same write
range. However, when multiple jobs have different write ranges, the
overwritten value is wrong for all but the last job.
To ensure that the accounting works as expected for the option, check
that the write ranges of all jobs are the same. If jobs have different
write ranges, report it as an error. Initialize the accounting field
only once, for the first job. Since all jobs have the same write range,
one-time initialization is enough. Update the man page to clarify this
limitation of the option.
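The threshold semantics amount to roughly the following sketch (hedged;
vdb and write_range_bytes are illustrative names for the shared counter
and the jobs' common write range):
/* sketch: request a zone reset once the shared valid-data-bytes
 * counter crosses the zone_reset_threshold ratio of the write range */
if (td->o.zrt.u.f > 0 &&
    (double) vdb >= td->o.zrt.u.f * (double) write_range_bytes)
	reset_zone = true;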
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The valid data bytes accounting is used only for the
zone_reset_threshold option. Avoid the accounting when the option is
not specified.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The zone_reset_threshold option uses the 'sectors with data'
accounting, so it was described as having 'logical blocks' as its unit.
However, the accounting was implemented with 'byte' as the unit. Fix
the description of the option.
Also, the zone_reset_threshold option works together with the
zone_reset_frequency option. Describe this relationship as well.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The 'sectors with data' accounting was designed to have 'sector' as its
unit, so related variables have the word 'sector' in their names, and
related code comments use the words 'sector' or 'logical blocks'.
However, it was actually implemented with 'byte' as the unit. Rename
the related variables and comments to indicate the byte unit. Also
replace the abbreviation swd with vdb.
Fixes: a7c2b6fc2959 ("Add support for resetting zones periodically")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
The 'sectors with data' accounting had been used for the CHECK_SWD
debug feature, which compared the expected and actual written data
sizes for zonemode=zbd. However, this feature has been disabled for a
while and is not actively used. Also, the 'sectors with data'
accounting has two issues. The first issue is wrong accounting for
multiple jobs with different write ranges. The second issue is job
start-up failure due to zone lock contention.
Avoid using the accounting by removing the CHECK_SWD feature and
related code. Also rename the function zbd_process_swd() to
zbd_set_swd() to clarify that it no longer works for CHECK_SWD.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|
|
To decide the first IO direction of a randrw workload, the function
zbd_adjust_ddir() refers to the zbd_info->sectors_with_data value,
which indicates the number of bytes written to the zoned block devices
being accessed. However, this accounting has two issues. The first
issue is wrong accounting for multiple jobs with different write
ranges. The second issue is job start-up failure due to zone lock
contention.
Avoid using zbd_info->sectors_with_data and simply refer to
file->last_start[DDIR_WRITE] instead. It is initialized with -1ULL for
each job. After the job completes any write operation, it holds a valid
offset. If it holds a valid offset, written data is expected and the
first IO direction can be a read.
Also remove zbd_info->sectors_with_data, which is no longer used. Keep
the field zbd_info->wp_sectors_with_data since it is still used for
zones with write pointers.
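In sketch form (hedged; f is the job's struct fio_file):
/* sketch: pick the first direction for randrw under zonemode=zbd */
if (f->last_start[DDIR_WRITE] != -1ULL)
	return DDIR_READ;	/* written data exists to read back */
return DDIR_WRITE;		/* nothing written yet, write first */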
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
|