fio.git
2 years ago stat: make add lat percentile functions inline
Niklas Cassel [Thu, 25 Nov 2021 13:20:32 +0000 (13:20 +0000)]
stat: make add lat percentile functions inline

Now that add_lat_percentile_prio_sample() has been simplified,
make both add lat percentile functions inline, just like add_stat_sample().

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-7-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: simplify add_lat_percentile_prio_sample()
Niklas Cassel [Thu, 25 Nov 2021 13:20:32 +0000 (13:20 +0000)]
stat: simplify add_lat_percentile_prio_sample()

add_lat_percentile_prio_sample() currently adds both a per priority sample
and a regular sample.

Since these two samples are completely unrelated, it is very confusing that
the add_lat_percentile_prio_sample() also adds a regular sample.

Remove the add_lat_percentile_sample() function call from
add_lat_percentile_prio_sample(), and let functions calling
add_lat_percentile_prio_sample() call add_lat_percentile_sample()
explicitly. This makes the flow in e.g. add_clat_sample() much easier to
follow.
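
A minimal sketch of the resulting call flow, assuming heavily simplified signatures (the real functions take more arguments, and the sample_counts struct is hypothetical):

```c
/* Hypothetical counters standing in for fio's percentile histograms. */
struct sample_counts {
	unsigned int regular;
	unsigned int prio[2];	/* [0] = low priority, [1] = high priority */
};

/* Adds only a regular sample. */
static inline void add_lat_percentile_sample(struct sample_counts *c)
{
	c->regular++;
}

/* Adds only a per priority sample; it no longer adds a regular sample. */
static inline void add_lat_percentile_prio_sample(struct sample_counts *c,
						  int high_prio)
{
	c->prio[high_prio ? 1 : 0]++;
}

/* Simplified caller: both samples are now requested explicitly. */
static void add_clat_sample(struct sample_counts *c, int lat_percentiles,
			    int high_prio)
{
	add_lat_percentile_sample(c);
	/* Per prio clat samples are skipped when lat_percentiles is set. */
	if (!lat_percentiles)
		add_lat_percentile_prio_sample(c, high_prio);
}
```

With this split, a reader of the caller can see at a glance which samples are recorded.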

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-6-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: rename add_lat_percentile_sample_noprio()
Niklas Cassel [Thu, 25 Nov 2021 13:20:31 +0000 (13:20 +0000)]
stat: rename add_lat_percentile_sample_noprio()

add_lat_percentile_sample_noprio() is the regular function to add a latency
percentile sample. It adds a regular sample (it doesn't add any per
priority sample). Therefore, it makes sense that this function has no
suffix, neither _noprio nor _prio.

Drop the _noprio suffix from add_lat_percentile_sample_noprio(), to make it
more obvious that this function should be used if you want to add a regular
percentile sample.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-5-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: rename add_lat_percentile_sample()
Niklas Cassel [Thu, 25 Nov 2021 13:20:31 +0000 (13:20 +0000)]
stat: rename add_lat_percentile_sample()

The name for add_lat_percentile_sample() is confusing, since the function
actually adds a per priority percentile sample (it also adds a regular
sample), yet it doesn't have prio as part of the function name.

Rename the function so that it is more obvious that this function should
be used if you want to add a prio percentile sample.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-4-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago stat: add comments describing the quirky behavior of clat prio samples
Niklas Cassel [Thu, 25 Nov 2021 13:20:30 +0000 (13:20 +0000)]
stat: add comments describing the quirky behavior of clat prio samples

Commit 56440e63ac17 ("fio: report percentiles for slat, clat, lat")
together with commit 38ec5c514104 ("stat: make priority summary statistics
consistent with percentiles") changed the behavior so that per prio stats track either
completion latency (clat) or total latency (lat), depending on the option
lat_percentiles.

It is not obvious why add_clat_sample() shouldn't add a high/low clat prio
sample when option lat_percentiles is set, especially considering that
option lat_percentiles is usually used for controlling if total latency
percentiles should be displayed or not.

Add comments to describe why add_clat_sample() has to care about option
lat_percentiles.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago docs: document quirky implementation of per priority stats reporting
Niklas Cassel [Thu, 25 Nov 2021 13:20:30 +0000 (13:20 +0000)]
docs: document quirky implementation of per priority stats reporting

Commit 56440e63ac17 ("fio: report percentiles for slat, clat, lat") changed
many things. One of the changes, from the commit message:
"- for the new cmdprio_percentage latencies, if lat_percentiles=1,
*total* latency percentiles will be tracked. Otherwise, *completion*
latency percentiles will be tracked."

In other words, the commit changed the per prio stats from always tracking
(and reporting) clat latency, to instead either track (and report) clat or
lat latency.

Considering that a certain latency type reports two things:
1) min/max/avg latency for the specific latency type
2) latency percentiles for the specific latency type

If disable_clat/disable_lat is used, neither 1) nor 2) will be reported.
If clat_percentiles/lat_percentiles is false, 2) will not be reported.

Therefore it is unintuitive that setting lat_percentiles=1, an option
usually used to enable/disable percentile reporting, also affects which
type of latency that will be tracked (and reported) for per prio stats.

The fact that the variables are named e.g. clat_prio_stat, regardless of
the type of latency being tracked does not help.

Anyway, let's document the way that the current implementation works,
so that a user can know how per priority stats are handled, without having
to read the source, since the commit that introduced this behavior forgot
to update the documentation.

Fixes: 56440e63ac17 ("fio: report percentiles for slat, clat, lat")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211125132020.109955-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio
Jens Axboe [Wed, 24 Nov 2021 17:27:20 +0000 (10:27 -0700)]
Merge branch 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio

* 'fix-parse-sync-file-range' of https://github.com/oleglatin/fio:
  parse: handle comma-separated options

2 years ago parse: handle comma-separated options
Oleg Latin [Wed, 24 Nov 2021 17:17:04 +0000 (20:17 +0300)]
parse: handle comma-separated options

The option parser does not properly handle the 'sync_file_range' option
with multiple flags.  This is because opt_len() only uses ':' as a
delimiter, so only the last flag in a comma-separated list has an effect.

This patch adds ',' as a delimiter.  All flags are correctly ORed now.
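
A minimal sketch of the idea, assuming hypothetical names (token_len(), parse_flags() and the FLAG_* values) rather than fio's actual parser code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
enum { FLAG_WAIT_BEFORE = 1, FLAG_WRITE = 2, FLAG_WAIT_AFTER = 4 };

/* Length of the token at the start of s, stopping at ':' or ','. */
static size_t token_len(const char *s)
{
	return strcspn(s, ":,");
}

/* OR together every recognised flag in a delimiter-separated list. */
static unsigned int parse_flags(const char *s)
{
	static const struct {
		const char *name;
		unsigned int val;
	} tbl[] = {
		{ "wait_before", FLAG_WAIT_BEFORE },
		{ "write", FLAG_WRITE },
		{ "wait_after", FLAG_WAIT_AFTER },
	};
	unsigned int flags = 0;

	while (*s) {
		size_t len = token_len(s);

		for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
			if (strlen(tbl[i].name) == len &&
			    !strncmp(s, tbl[i].name, len))
				flags |= tbl[i].val;
		}
		s += len;
		if (*s)
			s++;	/* skip the delimiter */
	}
	return flags;
}
```

If only ':' were treated as a delimiter, "wait_before,write" would be seen as a single token and would not match any individual flag name in this sketch.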

Fixes: https://github.com/axboe/fio/issues/1234
Signed-off-by: Oleg Latin <oleglatin@yandex-team.ru>
2 years ago t/dedupe: style fixups
Jens Axboe [Sun, 21 Nov 2021 13:51:11 +0000 (06:51 -0700)]
t/dedupe: style fixups

Some introduced by a recent patch, some old. Be more consistent with the
fio coding style.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/io_uring: fix 32-bit compile warnings
Jens Axboe [Sun, 21 Nov 2021 13:50:22 +0000 (06:50 -0700)]
t/io_uring: fix 32-bit compile warnings

We need to use a 64-bit cast for the shift of the user_data, and
fix the init of minv in the clat percentile calculation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'dedupe_and_compression' of https://github.com/bardavid/fio
Jens Axboe [Sun, 21 Nov 2021 13:43:05 +0000 (06:43 -0700)]
Merge branch 'dedupe_and_compression' of https://github.com/bardavid/fio

* 'dedupe_and_compression' of https://github.com/bardavid/fio:
  fio-dedup: adjusted the binary to support compression
  Mixed dedup and compression

2 years ago fio-dedup: adjusted the binary to support compression
Bar David [Wed, 10 Nov 2021 06:56:47 +0000 (08:56 +0200)]
fio-dedup: adjusted the binary to support compression

When given the -C 1 option, fio-dedup will first look for
dedupable data and then calculate compressible (unique)
data opportunity. The rationale is to measure the total
data reduction potential.

Signed-off-by: Bar David <bardavvid@gmail.com>
2 years ago Mixed dedup and compression
Bar David [Sun, 24 Oct 2021 10:59:50 +0000 (13:59 +0300)]
Mixed dedup and compression

Introducing support for dedupe and compression
on the same job. When used together, compression is
calculated from unique capacity. E.g. when using
dedupe_percentage=50 and buffer_compress_percentage=50,
then total reduction should be 75% - 50% would be deduped
while 50% of the remaining buffers would be compressed.
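
The arithmetic above can be checked with a small helper. total_reduction_pct() is a hypothetical name used only for this illustration:

```c
/*
 * Total data reduction, in percent, when compression is applied only to
 * the unique (non-deduped) part of the data.
 */
static double total_reduction_pct(double dedupe_pct, double compress_pct)
{
	double unique_pct = 100.0 - dedupe_pct;

	return dedupe_pct + unique_pct * compress_pct / 100.0;
}
```

For dedupe_percentage=50 and buffer_compress_percentage=50 this gives 50 + 50 * 50 / 100 = 75.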

Signed-off-by: Bar David <bardavvid@gmail.com>
2 years ago Sync io_uring header with the kernel
Jens Axboe [Sat, 20 Nov 2021 14:31:20 +0000 (07:31 -0700)]
Sync io_uring header with the kernel

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago io_uring: clamp CQ size to SQ size
Jens Axboe [Sat, 20 Nov 2021 14:27:57 +0000 (07:27 -0700)]
io_uring: clamp CQ size to SQ size

By default, io_uring uses twice as big a CQ ring as the SQ ring. That's
to help with cases where completions can come in unexpectedly. This is not
the case for storage IO, so just clamp the CQ size to save a bit of memory
on the CQEs and CQ ring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/io_uring: add -R option for random/sequential IO
Jens Axboe [Fri, 19 Nov 2021 17:44:15 +0000 (10:44 -0700)]
t/io_uring: add -R option for random/sequential IO

If -R1 is used, which is the default, then a random IO pattern is used.
If -R0 is used, then the IO will be sequential.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/io_uring: use internal random generator
Jens Axboe [Fri, 19 Nov 2021 17:40:20 +0000 (10:40 -0700)]
t/io_uring: use internal random generator

Instead of using lrand48_r, use the internal fio random number generator.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago fio: Introduce the log_entries option
Damien Le Moal [Thu, 18 Nov 2021 05:27:29 +0000 (14:27 +0900)]
fio: Introduce the log_entries option

When iops, latency, or bw logging options are used, fio will by default
log information for any I/O that completes. The initial number of I/O
log entries is 1024, as defined by DEF_LOG_ENTRIES. When all log
entries are used, new log entries are dynamically allocated by
get_new_log(). This dynamic log entry allocation can negatively impact
time-related statistics such as the I/O tail latencies (e.g. 99.9
percentile completion latency) as growing the logs causes a temporary
I/O stall (IO quiesce), which disturbs the workload steady state. The
effect of this is especially noticeable with workloads using IO
priorities: the tail latencies of high priority I/Os increase if the
IO log needs to be grown.

For example, running the following fio command on a SATA disk
supporting NCQ priority:

fio --name=prio-randread --filename=/dev/sdg \
    --random_generator=tausworthe64 --ioscheduler=none \
    --write_lat_log=randread.log --log_prio=1 --rw=randread --bs=128k \
    --ioengine=libaio --iodepth=32 --direct=1 --cmdprio_class=1 \
    --cmdprio_percentage=30 --runtime=900

(128KB random read workload at QD=32 and 30% of commands issued with a
high priority), with an initial number of log entries equal to the
default of 1024, depending on the machine memory state, the completion
latency statistics may show imprecise percentiles such as shown below.

high prio (30.75%) clat percentiles (msec):
 |  1.00th=[   14],  5.00th=[   17], 10.00th=[   20], 20.00th=[   23],
 | 30.00th=[   27], 40.00th=[   32], 50.00th=[   40], 60.00th=[   53],
 | 70.00th=[   71], 80.00th=[  104], 90.00th=[  169], 95.00th=[  243],
 | 99.00th=[  514], 99.50th=[  676], 99.90th=[ 1485], 99.95th=[ 1502],
 | 99.99th=[ 1552]
low prio (69.25%) clat percentiles (msec):
 |  1.00th=[   16],  5.00th=[   24], 10.00th=[   37], 20.00th=[   68],
 | 30.00th=[  105], 40.00th=[  146], 50.00th=[  199], 60.00th=[  255],
 | 70.00th=[  330], 80.00th=[  439], 90.00th=[  592], 95.00th=[  718],
 | 99.00th=[  885], 99.50th=[  986], 99.90th=[ 1469], 99.95th=[ 1536],
 | 99.99th=[ 1586]

All completion latency percentiles above the 99.90th percentile are
similar for the high and low priority commands, which is not consistent
with the drive's expected execution of prioritized read commands.

To solve this issue and get more precise latency statistics, this patch
introduces the new "log_entries" option to allow specifying a larger
initial number of IO log entries to avoid run-time allocation.
This option value defaults to DEF_LOG_ENTRIES and its maximum value is
MAX_LOG_ENTRIES to be consistent with get_new_log() allocation. Also
simplify get_new_log() by using calloc() instead of malloc(), thus
removing the need for the local variable new_size.

Adding the "--log_entries=65536" option to the previous command line
example, the completion latency results obtained are more stable:

high prio (30.72%) clat percentiles (msec):
 |  1.00th=[   15],  5.00th=[   17], 10.00th=[   19], 20.00th=[   22],
 | 30.00th=[   24], 40.00th=[   27], 50.00th=[   32], 60.00th=[   36],
 | 70.00th=[   46], 80.00th=[   57], 90.00th=[   81], 95.00th=[  105],
 | 99.00th=[  161], 99.50th=[  188], 99.90th=[  271], 99.95th=[  275],
 | 99.99th=[  363]
low prio (69.28%) clat percentiles (msec):
 |  1.00th=[   16],  5.00th=[   27], 10.00th=[   43], 20.00th=[   80],
 | 30.00th=[  123], 40.00th=[  176], 50.00th=[  236], 60.00th=[  313],
 | 70.00th=[  401], 80.00th=[  506], 90.00th=[  634], 95.00th=[  718],
 | 99.00th=[  844], 99.50th=[  885], 99.90th=[  953], 99.95th=[  995],
 | 99.99th=[ 1053]

All completion percentiles clearly now show shorter latencies for high
priority commands, as expected. The 99.99th percentile for low priority
commands is also improved compared to the previous case as the
measurements are not impacted by the log dynamic allocation.

Suggested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211118052729.132423-1-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Makefile: Fix android compilation
Gwendal Grignou [Wed, 17 Nov 2021 22:19:18 +0000 (14:19 -0800)]
Makefile: Fix android compilation

Include cmdprio for Android as well.
Without the patch, make for Android fails at link time:

engines/io_uring.c:824: error: undefined reference to 'fio_cmdprio_init'
engines/io_uring.c:456: error: undefined reference to 'fio_cmdprio_set_ioprio'
...

Fixes: e27b9ff0e ("cmdprio: move cmdprio function definitions to a new cmdprio.c file")

Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Link: https://lore.kernel.org/r/20211117221918.3050439-1-gwendal@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio
Jens Axboe [Fri, 12 Nov 2021 16:22:21 +0000 (09:22 -0700)]
Merge branch 'jf_readme_typo' of https://github.com/jfpanisset/fio

* 'jf_readme_typo' of https://github.com/jfpanisset/fio:
  Small typo fix

2 years ago libaio,io_uring: make it possible to cleanup cmdprio malloced data
Niklas Cassel [Fri, 12 Nov 2021 09:54:44 +0000 (09:54 +0000)]
libaio,io_uring: make it possible to cleanup cmdprio malloced data

The way that fio currently handles engine options:
options_free() will call free() only for options that have the type
FIO_OPT_STR_STORE. This means that any option that has a pointer in
either td->o or td->eo, which is not of type FIO_OPT_STR_STORE will
leak memory. This is true even for numjobs == 1.

When running with numjobs > 1, fio_options_mem_dupe() will memcpy
td->eo into the new td. Since off1 of the pointers in the first td
has already been set, the pointers in the new td will point to the
same data. (Regardless, options_free() will never try to free the
memory for either td.) Nor can we manually free the memory in
cleanup(), since the other td will still point to the same memory,
so this would lead to a double free.
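
The aliasing described above can be sketched in isolation. The engine_opts struct and both helpers are hypothetical stand-ins, not fio's actual option structures:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an engine options struct holding a pointer. */
struct engine_opts {
	unsigned int percentage;
	unsigned int *bssplit;	/* dynamically allocated */
};

/* Mimics duplicating options for a second job with a plain memcpy(). */
static void dupe_opts(struct engine_opts *dst, const struct engine_opts *src)
{
	memcpy(dst, src, sizeof(*dst));
}

/* Returns 1 when both copies alias the same allocation, -1 on failure. */
static int opts_share_allocation(void)
{
	struct engine_opts first = { .percentage = 30 };
	struct engine_opts second;
	int shared;

	first.bssplit = calloc(4, sizeof(*first.bssplit));
	if (!first.bssplit)
		return -1;

	dupe_opts(&second, &first);
	shared = (second.bssplit == first.bssplit);

	/* Freeing through 'second' as well would be a double free. */
	free(first.bssplit);
	return shared;
}
```

Because the copy is shallow, neither owner can safely free the allocation without knowing about the other.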

These memory leaks are reported by e.g. valgrind.

The most obvious way to solve this is to put dynamically allocated
memory in {ioring,libaio}_data instead of {ioring,libaio}_options.

This solves the problem since {ioring,libaio}_data is dynamically
allocated by each td during the ioengine init callback, and is freed
when the ioengine cleanup callback for that td is called.

The downside of this is that the parsing has to be done in
fio_cmdprio_init() instead of in the option .cb callback, since the
.cb callback is called before {ioring,libaio}_data is available.

This patch keeps the static cmdprio options in
{ioring,libaio}_options, but moves the dynamically allocated memory
needed by cmdprio to {ioring,libaio}_data.

No cmdprio related memory leaks are reported after this patch.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-9-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago cmdprio: add mode to make the logic easier to reason about
Niklas Cassel [Fri, 12 Nov 2021 09:54:43 +0000 (09:54 +0000)]
cmdprio: add mode to make the logic easier to reason about

Add a new field "mode", in order to know if we are determining IO
priorities according to cmdprio_percentage or to cmdprio_bssplit.

This makes the logic easier to reason about, and allows us to
remove the "use_cmdprio" variable from the ioengines themselves.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-8-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago libaio,io_uring: move common cmdprio_prep() code to cmdprio
Niklas Cassel [Fri, 12 Nov 2021 09:54:42 +0000 (09:54 +0000)]
libaio,io_uring: move common cmdprio_prep() code to cmdprio

Move common cmdprio_prep() code to cmdprio.c to avoid code duplication.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-7-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago libaio,io_uring: rename prio_prep() to include cmdprio in the name
Niklas Cassel [Fri, 12 Nov 2021 09:54:41 +0000 (09:54 +0000)]
libaio,io_uring: rename prio_prep() to include cmdprio in the name

The default priority (which is either 0 or the value set by the "prio" and
"prioclass" options) will now be used regardless of whether prio_prep() is
called or not. This is true for both libaio and io_uring.

The way to think about it is that prio_prep() is only called if
cmdprio_percentage/cmdprio_bssplit is used.

prio_prep() might then override the default priority, if the random
value happens to say that this I/O should use the cmdprio_value,
rather than the default priority.

Rename the prio_prep() functions to highlight that these functions
are now only called if cmdprio is used. (If only option
"prio"/"prioclass" is used, that is handled elsewhere.)

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-6-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago io_uring: set async IO priority to td->ioprio in fio_ioring_prep()
Niklas Cassel [Fri, 12 Nov 2021 09:54:41 +0000 (09:54 +0000)]
io_uring: set async IO priority to td->ioprio in fio_ioring_prep()

The default priority (which is either 0 or the value set by "prio" and
"prioclass" options) is now saved in td->ioprio.

The simplest thing is therefore to unconditionally set the async IO
priority to td->ioprio in fio_ioring_prep(), and let fio_ioring_prio_prep()
only handle the case where cmdprio_percentage/cmdprio_bssplit is enabled.

Therefore, fio_ioring_prio_prep() doesn't need to care if prio/prioclass
was enabled or not, we can simply think that fio_ioring_prio_prep()
might "override" the default priority, whatever the default priority may
be.
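
The "set the default first, then possibly override" flow can be sketched as follows. This is only an illustration: struct prio_cfg, pick_ioprio() and the caller-supplied rand_pct are hypothetical names, not fio's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the priority-related settings. */
struct prio_cfg {
	uint16_t default_ioprio;	/* stands in for td->ioprio */
	bool cmdprio_enabled;		/* cmdprio_percentage/bssplit in use */
	unsigned int cmdprio_percentage;
	uint16_t cmdprio_value;
};

/* rand_pct is a caller-supplied random value in the range [0, 100). */
static uint16_t pick_ioprio(const struct prio_cfg *cfg, unsigned int rand_pct)
{
	/* The default priority is applied unconditionally. */
	uint16_t ioprio = cfg->default_ioprio;

	/* The cmdprio step only overrides; it never sets the default. */
	if (cfg->cmdprio_enabled && rand_pct < cfg->cmdprio_percentage)
		ioprio = cfg->cmdprio_value;

	return ioprio;
}
```

The override branch does not need to know whether prio/prioclass was set, since it only replaces whatever default is already in place.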

Doing it this way also has the advantage that the prio_prep() function
in io_uring will now look identical to the prio_prep() function in
libaio.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-5-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago cmdprio: do not allocate memory for unused data direction
Niklas Cassel [Fri, 12 Nov 2021 09:54:40 +0000 (09:54 +0000)]
cmdprio: do not allocate memory for unused data direction

All cmdprio options only support data directions read and write.
However, each cmdprio option allocates memory for ddir trim as well,
even though nothing is ever written to this memory.

Change this so that we don't allocate memory for something which is
never used.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-4-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago cmdprio: move cmdprio function definitions to a new cmdprio.c file
Niklas Cassel [Fri, 12 Nov 2021 09:54:40 +0000 (09:54 +0000)]
cmdprio: move cmdprio function definitions to a new cmdprio.c file

Move cmdprio function definitions from the cmdprio.h header file to a new
cmdprio.c file, such that we can add new static functions to cmdprio.c.

A follow up patch will add new cmdprio functions which do not need to be
directly accessible by ioengines.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-3-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago docs: update cmdprio_percentage documentation
Niklas Cassel [Fri, 12 Nov 2021 09:54:39 +0000 (09:54 +0000)]
docs: update cmdprio_percentage documentation

Commit 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage
constraints") relaxed the cmdprio_percentage constraints such that
cmdprio_percentage and prioclass/prio could be used together.

However, it forgot to remove the mention of this constraint from
the docs. Update the docs to reflect the new behavior.

Fixes: 1437d6357429 ("libaio,io_uring: relax cmdprio_percentage constraints")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20211112095428.158300-2-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Small typo fix
Jean-Francois Panisset [Fri, 12 Nov 2021 03:56:30 +0000 (19:56 -0800)]
Small typo fix

Signed-off-by: Jean-Francois Panisset <panisset@gmail.com>
2 years ago stat: create a init_thread_stat_min_vals() helper
Niklas Cassel [Mon, 8 Nov 2021 13:12:09 +0000 (13:12 +0000)]
stat: create a init_thread_stat_min_vals() helper

Create a init_thread_stat_min_vals() helper so that we can remove
duplicated code.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211108131143.80158-1-Niklas.Cassel@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Mon, 25 Oct 2021 18:38:35 +0000 (12:38 -0600)]
Merge branch 'evelu-peak' of https://github.com/ErwanAliasr1/fio

* 'evelu-peak' of https://github.com/ErwanAliasr1/fio:
  t/one-core-peak: Don't report errors if missing NVME features
  t/io_uring: Fixing typo in help message
  t/one-core-peak: Reporting SElinux status

2 years ago t/one-core-peak: Don't report errors if missing NVME features
Erwan Velu [Sun, 17 Oct 2021 20:00:02 +0000 (22:00 +0200)]
t/one-core-peak: Don't report errors if missing NVME features

Some NVMe devices don't support some features, in which case an error
message is reported, as in the following example:
NVMe status: INVALID_FIELD: A reserved coded value or an unsupported value in a defined field(0x4002)
nvme2n1: Temp:26 C, Autonomous Power State Transition:, PowerState:0, Completion Queues:135, Submission Queues:135

This commit only reports features that are available:
nvme2n1: Completion Queues:135, Submission Queues:135, PowerState:0, Temp:27 C

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years ago t/io_uring: Fixing typo in help message
Erwan Velu [Sun, 17 Oct 2021 19:44:53 +0000 (21:44 +0200)]
t/io_uring: Fixing typo in help message

Commit a71ad043a3f4a introduced DMA pre-mapping support but made a
typo in the help message.

This option is enabled via -D, not -R.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years ago t/one-core-peak: Reporting SElinux status
Erwan Velu [Sun, 17 Oct 2021 19:18:40 +0000 (21:18 +0200)]
t/one-core-peak: Reporting SElinux status

SELinux can influence the overall performance.
Let's report its state.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years ago Merge branch 'master' of https://github.com/bvanassche/fio
Jens Axboe [Fri, 22 Oct 2021 16:19:04 +0000 (10:19 -0600)]
Merge branch 'master' of https://github.com/bvanassche/fio

* 'master' of https://github.com/bvanassche/fio:
  Android: Add io_uring support

2 years ago Android: Add io_uring support
Bart Van Assche [Thu, 21 Oct 2021 21:41:40 +0000 (14:41 -0700)]
Android: Add io_uring support

This patch has been tested on a recent Android phone. Compilation of this
patch has been verified as follows:

    NDK=/usr/lib/android-ndk
    export LIBS="-landroid"
    export UNAME=Android
    for ((i=23;i<=30;i++)); do
        echo "==== i = $i ===="
        export CC=$NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android${i}-clang
        [ -e "$CC" ] || continue
        ./configure && make -j$(nproc) fio || break
    done

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
2 years ago Merge branch 'patch-1' of https://github.com/sweettea/fio
Jens Axboe [Tue, 19 Oct 2021 22:09:21 +0000 (16:09 -0600)]
Merge branch 'patch-1' of https://github.com/sweettea/fio

* 'patch-1' of https://github.com/sweettea/fio:
  t/fuzz: Clean up generated dependency makefiles

2 years ago t/fuzz: Clean up generated dependency makefiles
Sweet Tea Dorminy [Tue, 19 Oct 2021 20:31:27 +0000 (16:31 -0400)]
t/fuzz: Clean up generated dependency makefiles

Currently, the 'clean' target cleans up the t/ directory, but not its
subdirectories. As t/fuzz contains c files, though, dependency makefiles
are created there and should be cleaned up.

Signed-off-by: Sweet Tea Dorminy <sweettea@dorminy.me>
2 years ago Merge branch 'fixes_1290' of https://github.com/rthardin/fio
Jens Axboe [Tue, 19 Oct 2021 01:29:46 +0000 (19:29 -0600)]
Merge branch 'fixes_1290' of https://github.com/rthardin/fio

* 'fixes_1290' of https://github.com/rthardin/fio:
  Use min_bs in rate_process=poisson

2 years ago Use min_bs in rate_process=poisson
Ryan Hardin [Mon, 18 Oct 2021 20:43:22 +0000 (16:43 -0400)]
Use min_bs in rate_process=poisson

This fixes an issue where IOPS targets were not met
when the `bs` parameter was not given explicitly, such
as when using `bssplit`.

Fixes #1290

Signed-off-by: Ryan Hardin <ryan.hardin@nutanix.com>
2 years ago run-fio-tests: make test runs more resilient
Vincent Fu [Tue, 21 Sep 2021 21:27:11 +0000 (21:27 +0000)]
run-fio-tests: make test runs more resilient

Catch exceptions that occur during test setup/running/evaluation. This
makes it more likely that the entire test suite can run to completion
even if some tests fail in an unexpected fashion.

In particular I have seen failures in FioJobTest_t0014() when the test
is run on a bare metal machine. Without this patch these failures make
the entire script grind to a halt.

Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
Link: https://lore.kernel.org/r/20210921212639.61319-1-vincent.fu@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/zbd: Add -w option to ensure no open zone before write tests
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:03 +0000 (15:09 +0900)]
t/zbd: Add -w option to ensure no open zone before write tests

The commit b34eb155e4a6 ("t/zbd: Reset all zones before test when max
open zones is specified") introduced -o max_open_zones option to the
script t/zbd/test-zbd-support. It passes max_open_zones value to fio and
resets all zones of the test target device before each test case run
with write operation. This zone reset by the script ensures that no zone
outside the IO range is in open status and that write operations do not
exceed the max_open_zones limit.

On the other hand, since commit d2f442bc0bd5 ("ioengines: add
get_max_open_zones zoned block device operation"), fio automatically
fetches the max_open_zones value. So it is no longer required to pass
the max_open_zones value from the script to fio. To simplify the script
usage, introduce -w option which does not require max_open_zones value.
This option just resets zones before test cases with write operation.

Of note is that fio itself resets the zones exceeding max_open_zones
limit since the commit 954217b90191 ("zbd: Initialize open zones list
referring zone status at fio start"), but it just resets zones within
the fio IO range. Still zone reset by the test script is required for
zones out of IO range. Zone reset out of IO range by fio is not
implemented since it may cause unexpected data erasure.

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-6-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/zbd: Align block size to zone capacity
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:02 +0000 (15:09 +0900)]
t/zbd: Align block size to zone capacity

Test cases #5, #6, #15 and #37 write data and read it back (or
write with the verify option for read back). When the test target zones
have a zone capacity unaligned to the block size, read requests cannot
cover all of the written data, and the test cases fail.

To avoid the failures, check the zone capacities of the zones and pick a
block size that aligns to them. Then use that block size for the
test cases.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-5-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago t/zbd: Do not use too large block size in test case #4
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:01 +0000 (15:09 +0900)]
t/zbd: Do not use too large block size in test case #4

Test case #4 specifies the zone size as the block size to read a zone. For
some devices, the zone size is very large (on the order of GB), so a single
pread64 system call cannot complete the request. This makes the test case
fail.

To avoid the failure, keep the block size adequate. If zone size is too
large, use logical_block_size * 256 as the block size.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-4-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years ago zbd: Fix type of local variable min_bs
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:09:00 +0000 (15:09 +0900)]
zbd: Fix type of local variable min_bs

In zbd.c, the thread option min_bs[] is read and stored in the local
variable min_bs. Elements of min_bs[] have type unsigned long long, but
the local variable min_bs has type uint32_t. When an element of min_bs[]
has a value larger than UINT32_MAX, it is truncated on assignment to min_bs.

To avoid the overflow, fix type of the local variable min_bs from
uint32_t to uint64_t. Use uint64_t rather than unsigned long long to be
more specific about data size and consistency in zbd.c. The variable is
passed to the helper function zbd_find_zone(), then fix the type of the
argument of the function also.
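
The truncation described above can be reproduced in isolation. The following is a minimal sketch, not the zbd.c code: store_min_bs_u32() and store_min_bs_u64() are hypothetical helpers that only mimic assigning an unsigned long long option value to a local variable of each type.

```c
#include <stdint.h>

/*
 * Hypothetical helper: mimics the old code, where the local variable
 * was uint32_t. Values above UINT32_MAX are reduced modulo 2^32.
 */
static uint64_t store_min_bs_u32(unsigned long long opt_min_bs)
{
	uint32_t min_bs = (uint32_t)opt_min_bs;

	return min_bs;
}

/*
 * Hypothetical helper: mimics the fixed code, where the local variable
 * is uint64_t and the full value is preserved.
 */
static uint64_t store_min_bs_u64(unsigned long long opt_min_bs)
{
	uint64_t min_bs = opt_min_bs;

	return min_bs;
}
```

With a value of 4 GiB plus 4096 bytes, the 32-bit variant keeps only the low 4096, while the 64-bit variant keeps the whole value.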

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agozbd: Remove cast to unsigned long long for printf
Shin'ichiro Kawasaki [Wed, 13 Oct 2021 06:08:59 +0000 (15:08 +0900)]
zbd: Remove cast to unsigned long long for printf

Many of the variables in zbd.c have type uint64_t. They are cast to
unsigned long long and printed with printf %llu format to handle
uint64_t types difference among architectures. This requires many
lengthy casts to unsigned long long.

To simplify the code, remove the casts to unsigned long long. Some of
the casts are simply unnecessary. To remove other casts, replace the
printf format %llu with PRIu64 so that uint64_t type difference among
architectures is handled accordingly.

Fio build pass of this change was confirmed with 32bit ARM cross
compiler and 64bit x86 compiler.
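
As an illustration of the format change, here is a minimal sketch that prints a uint64_t with PRIu64 and no cast. The helper name format_zone_start() and the message text are assumptions made for this example and are not taken from zbd.c.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a 64-bit offset without casting it.
 * PRIu64 expands to the conversion specifier that matches uint64_t on
 * the target, so the same source builds on 32-bit and 64-bit systems.
 * Returns the number of characters snprintf() would have written.
 */
static int format_zone_start(char *buf, size_t len, uint64_t start)
{
	return snprintf(buf, len, "zone start %" PRIu64, start);
}
```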

Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20211013060903.166543-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoengines/http.c: add fallthrough annotation to _curl_trace
Rebecca Cran [Sat, 16 Oct 2021 06:17:38 +0000 (00:17 -0600)]
engines/http.c: add fallthrough annotation to _curl_trace

To avoid the warning from clang "warning: unannotated fall-through
between switch labels [-Wimplicit-fallthrough]" swap the "fall through"
comment with the "fallthrough;" annotation from compiler.h.

Since the second "fall through" comment isn't really a new fall-through,
remove it.
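
For readers unfamiliar with the annotation, the sketch below shows the general pattern. The macro here is a local stand-in written for this example (the real definition lives in fio's compiler.h and may differ), and trace_flags() is a hypothetical function, not the _curl_trace code.

```c
/*
 * Assumption: local stand-in for the annotation. Falls back to an
 * empty statement on compilers without the attribute.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: level 2 also gets everything level 1 gets. */
static int trace_flags(int level)
{
	int flags = 0;

	switch (level) {
	case 2:
		flags |= 0x2;
		fallthrough;	/* intentional: continue into case 1 */
	case 1:
		flags |= 0x1;
		break;
	default:
		break;
	}
	return flags;
}
```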

Signed-off-by: Rebecca Cran <rebecca@bsdio.com>
Link: https://lore.kernel.org/r/20211016061738.76654-1-rebecca@bsdio.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: Fix the parameters calculation for multiple threads scenario
Pankaj Raghav [Fri, 15 Oct 2021 12:09:56 +0000 (14:09 +0200)]
t/io_uring: Fix the parameters calculation for multiple threads scenario

The this_done, this_call and this_reap parameter should be a summation of
the corresponding field from all the submitters.

Currently, we are adding the done, calls and reaps param of the last used
submitter nthread times.
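
A minimal sketch of the intended summation follows. The struct layouts and the sum_submitters() helper are reduced, hypothetical stand-ins for this example and do not reproduce the t/io_uring.c data structures.

```c
/* Hypothetical, reduced stand-in for the per-thread submitter state. */
struct submitter {
	unsigned long done;
	unsigned long calls;
	unsigned long reaps;
};

struct totals {
	unsigned long done;
	unsigned long calls;
	unsigned long reaps;
};

static struct totals sum_submitters(const struct submitter *s, int nthreads)
{
	struct totals t = { 0, 0, 0 };
	int i;

	for (i = 0; i < nthreads; i++) {
		/*
		 * Index each submitter in turn. Reusing a pointer to the
		 * last submitter would add its counters nthreads times.
		 */
		t.done += s[i].done;
		t.calls += s[i].calls;
		t.reaps += s[i].reaps;
	}
	return t;
}
```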

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoMerge branch 'evelu-typo' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Thu, 14 Oct 2021 21:01:30 +0000 (15:01 -0600)]
Merge branch 'evelu-typo' of https://github.com/ErwanAliasr1/fio

* 'evelu-typo' of https://github.com/ErwanAliasr1/fio:
  t/io_uring: Fixing typo

2 years agot/io_uring: Fixing typo
Erwan Velu [Thu, 14 Oct 2021 20:38:36 +0000 (22:38 +0200)]
t/io_uring: Fixing typo

s/Maxiumum/Maximum/g

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/io_uring: include a maximum IOPS seen when exiting
Jens Axboe [Thu, 14 Oct 2021 13:56:56 +0000 (07:56 -0600)]
t/io_uring: include a maximum IOPS seen when exiting

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: don't append 'K' to IOPS if we don't divide by 1000
Jens Axboe [Wed, 13 Oct 2021 12:17:44 +0000 (06:17 -0600)]
t/io_uring: don't append 'K' to IOPS if we don't divide by 1000

Impressive two errors in that silly change.

Reported-by: Erwan Velu <erwanaliasr1@gmail.com>
Fixes: dc10c23ab9a7 ("t/io_uring: show IOPS in increments of 1000 IOPS if necessary")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: update for new DMA map buffers API
Jens Axboe [Wed, 13 Oct 2021 00:41:14 +0000 (18:41 -0600)]
t/io_uring: update for new DMA map buffers API

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: add test support for pre mapping DMA buffers
Jens Axboe [Tue, 12 Oct 2021 20:25:23 +0000 (14:25 -0600)]
t/io_uring: add test support for pre mapping DMA buffers

This is in no shape or form the final evolution or API of this, but
easier to stuff it in here for testing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: fix silly identical branch error
Jens Axboe [Tue, 12 Oct 2021 20:09:33 +0000 (14:09 -0600)]
t/io_uring: fix silly identical branch error

The previous change inadvertently added the / 1000 to both branches, it
should of course only be done on the first one.

Fixes: dc10c23ab9a7 ("t/io_uring: show IOPS in increments of 1000 IOPS if necessary")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoMerge branch 'evelu-onecore' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Tue, 12 Oct 2021 19:50:54 +0000 (13:50 -0600)]
Merge branch 'evelu-onecore' of https://github.com/ErwanAliasr1/fio

* 'evelu-onecore' of https://github.com/ErwanAliasr1/fio:
  t/one-core-peak: Improving check_sysblock_value error handling

2 years agot/io_uring: show IOPS in increments of 1000 IOPS if necessary
Jens Axboe [Tue, 12 Oct 2021 19:48:45 +0000 (13:48 -0600)]
t/io_uring: show IOPS in increments of 1000 IOPS if necessary

It's a bit hard to read the millions of IOPS, so if we're above 100K
IOPS, scale by 1000 and add a K instead. This is easier to read:

IOPS=7235K, BW=3532MiB/s, IOS/call=31/31, inflight=(78 114)
IOPS=7218K, BW=3524MiB/s, IOS/call=32/32, inflight=(79 105)
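
The scaling rule can be sketched as follows. This is an illustration only, assuming a strict "greater than 100000" threshold; format_iops() is a hypothetical helper and not the t/io_uring.c code. The division and the 'K' suffix belong to the same branch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes the IOPS field into buf and returns buf.
 * Above 100K IOPS the value is divided by 1000 and suffixed with 'K';
 * otherwise it is printed unscaled and without a suffix.
 */
static const char *format_iops(char *buf, size_t len, unsigned long iops)
{
	if (iops > 100000)
		snprintf(buf, len, "IOPS=%luK", iops / 1000);
	else
		snprintf(buf, len, "IOPS=%lu", iops);
	return buf;
}
```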

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/one-core-peak: Improving check_sysblock_value error handling
Erwan Velu [Tue, 12 Oct 2021 19:39:58 +0000 (21:39 +0200)]
t/one-core-peak: Improving check_sysblock_value error handling

The current code was reporting the following output:
cat: /sys/block/nvme0n1/queue/wbt_lat_usec: Argument invalide
nvme0n1: /sys/block/nvme0n1/queue/wbt_lat_usec set to 0.
Warning: nvme0n1: Cannot set 0 on /sys/block/nvme0n1/queue/wbt_lat_usec

This is problematic for several reasons:
- cat reports an error at reading wbt_lat_usec
- a message says it set wbt_lat_usec to 0
- a warning reports it cannot set wbt_lat_usec to 0

This commit:
- prevents the first error from being printed
- only reports that wbt_lat_usec is set to 0 if the write succeeds; otherwise it prints the Warning message.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agoMerge branch 'windows-res' of https://github.com/bjpaupor/fio
Jens Axboe [Tue, 12 Oct 2021 19:20:56 +0000 (13:20 -0600)]
Merge branch 'windows-res' of https://github.com/bjpaupor/fio

* 'windows-res' of https://github.com/bjpaupor/fio:
  Query Windows clock frequency and use reported max

2 years agoQuery Windows clock frequency and use reported max
Brandon Paupore [Tue, 12 Oct 2021 19:00:41 +0000 (14:00 -0500)]
Query Windows clock frequency and use reported max

Previously FIO used the Windows lower-bound clock frequency of 64 Hz for
its helper-thread. This caused IOPS/BW logs to have large drift between
timestamps when not using per-unit logging for those measurements.

Now query the current resolution and set to use the maximum for more
accurate timestamps. Note that the resolution is automatically restored
after FIO terminates.

Signed-off-by: Brandon Paupore <brandon.paupore@wdc.com>
2 years agoio_u: don't attempt to requeue for full residual
Jens Axboe [Mon, 11 Oct 2021 15:49:21 +0000 (09:49 -0600)]
io_u: don't attempt to requeue for full residual

If we get zero bytes transferred, then don't attempt to re-set the
io_u and requeue the IO. That's a fatal condition for this IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: fix latency stats for depth == 1
Jens Axboe [Sat, 9 Oct 2021 18:56:11 +0000 (12:56 -0600)]
t/io_uring: fix latency stats for depth == 1

Two issues here:

- Stat increment accounting was off-by-one, causing no stats added
  for depth == 1
- The stat batch count should be a minimum of 2, since it's really
  a mask.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoMerge branch 'evelu-ocp' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Thu, 7 Oct 2021 12:18:21 +0000 (06:18 -0600)]
Merge branch 'evelu-ocp' of https://github.com/ErwanAliasr1/fio

* 'evelu-ocp' of https://github.com/ErwanAliasr1/fio:
  t/io_uring: Add -r option to control the runtime
  t/one-core-peak: Reporting RETPOLINE & PAGE_TABLE_ISOLATION
  t/one-core-peak: Reporting kernel cmdline
  t/one-core-peak: Reporting BLK_WBT_MQ
  t/one-core-peak: Reporting BLK_CGROUP

2 years agot/io_uring: Add -r option to control the runtime
Erwan Velu [Wed, 6 Oct 2021 21:40:27 +0000 (23:40 +0200)]
t/io_uring: Add -r option to control the runtime

By default, the test runs until someone presses Ctrl-C.
This commit adds an option to define the expected runtime.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/one-core-peak: Reporting RETPOLINE & PAGE_TABLE_ISOLATION
Erwan Velu [Wed, 6 Oct 2021 21:42:29 +0000 (23:42 +0200)]
t/one-core-peak: Reporting RETPOLINE & PAGE_TABLE_ISOLATION

These settings can influence the max perf if enabled.
Let's report them.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/one-core-peak: Reporting kernel cmdline
Erwan Velu [Wed, 6 Oct 2021 21:25:49 +0000 (23:25 +0200)]
t/one-core-peak: Reporting kernel cmdline

The cmdline can contain many interesting options that were set and could
influence the final result.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/one-core-peak: Reporting BLK_WBT_MQ
Erwan Velu [Wed, 6 Oct 2021 21:02:15 +0000 (23:02 +0200)]
t/one-core-peak: Reporting BLK_WBT_MQ

If BLK_WBT_MQ is set, some ktime_get() call can be seen in the io path.
Let's report the value of this setting and disable it if present.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/one-core-peak: Reporting BLK_CGROUP
Erwan Velu [Wed, 6 Oct 2021 20:19:30 +0000 (22:19 +0200)]
t/one-core-peak: Reporting BLK_CGROUP

When BLK_CGROUP is enabled, it induces some rdtsc calls which reduce the
overall performance.

Let's report if this option is enabled.

The tool was reporting BLK_CGROUP_IOCOST which wasn't the right one.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/io_uring: get rid of old debug printfs
Jens Axboe [Tue, 5 Oct 2021 12:58:07 +0000 (06:58 -0600)]
t/io_uring: get rid of old debug printfs

We don't really care about the sq/cq ring pointers, that was something
I originally added as this test tool was the first one that I wrote to
bring up io_uring and help debug ring issues.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: print submitter id with tid on startup
Jens Axboe [Tue, 5 Oct 2021 12:38:41 +0000 (06:38 -0600)]
t/io_uring: print submitter id with tid on startup

Makes it easier to match up multiple threads with the stats.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: clean up aio wait loop
Jens Axboe [Mon, 4 Oct 2021 23:04:04 +0000 (17:04 -0600)]
t/io_uring: clean up aio wait loop

No functional changes, just makes it easier to read and gets rid of
an indentation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: check for valid clock_index and finish state for stats
Jens Axboe [Mon, 4 Oct 2021 22:35:15 +0000 (16:35 -0600)]
t/io_uring: check for valid clock_index and finish state for stats

If the clock_index isn't non-zero, it's not valid and we should disregard
the sample. Ditto if an exit signal has been sent, we're done at that
point and aren't interested in the last samples.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: don't track IO latencies the first second of runtime
Jens Axboe [Mon, 4 Oct 2021 22:18:39 +0000 (16:18 -0600)]
t/io_uring: don't track IO latencies the first second of runtime

The most variation is usually seen at startup, so don't start tracking
latencies until we've done the first reporting run. Things should be
nice and stable at that point.

To make this cheaper on the fast path, clock_index is only valid if
it's non-zero. This makes checking for stats cheap in the reap path.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: don't print partial IOPS etc output if exit signal was received
Jens Axboe [Mon, 4 Oct 2021 22:16:01 +0000 (16:16 -0600)]
t/io_uring: don't print partial IOPS etc output if exit signal was received

The run always terminates with what looks like a much slower cycle than
the previous seconds. That's not really the case, it's just that the
sleep() got interrupted by the signal and we slept less than we thought
we did, yet we still account it as a full second.

Just make it cleaner and break if finish is set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: add support for legacy AIO
Jens Axboe [Mon, 4 Oct 2021 18:42:01 +0000 (12:42 -0600)]
t/io_uring: add support for legacy AIO

Just as a comparison point, not really interesting otherwise. It doesn't
support any of the advanced features, just basic IO.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: remove extra add_stat() call
Jens Axboe [Mon, 4 Oct 2021 18:33:40 +0000 (12:33 -0600)]
t/io_uring: remove extra add_stat() call

If we're batching the stat updates, it's incorrect to add the individual
stat. That would have skewed the percentiles and made -t1 run slower than it
otherwise would have.

Fixes: ab85494f8bf0 ("t/io_uring: batch stat updates")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoMerge branch 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Fri, 1 Oct 2021 19:55:52 +0000 (13:55 -0600)]
Merge branch 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio

* 'evelu-fixes2' of https://github.com/ErwanAliasr1/fio:
  t/one-core-peak: nvme-cli as optional tooling
  t/one-core-peak: Report numa as off if missing

2 years agot/one-core-peak: nvme-cli as optional tooling
Erwan Velu [Fri, 1 Oct 2021 19:43:07 +0000 (21:43 +0200)]
t/one-core-peak: nvme-cli as optional tooling

Not all systems have nvme-cli installed.

If present then let's print additional low-level info,
If not, let's ignore and continue.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/one-core-peak: Report numa as off if missing
Erwan Velu [Fri, 1 Oct 2021 19:37:29 +0000 (21:37 +0200)]
t/one-core-peak: Report numa as off if missing

Some systems don't have numa enabled,
if so don't report an error but report numa as off.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agoRefer td->loops instead of td->o.loops to fix loop count issue
Shin'ichiro Kawasaki [Fri, 1 Oct 2021 10:32:57 +0000 (19:32 +0900)]
Refer td->loops instead of td->o.loops to fix loop count issue

In the github issues #1093 and #1278, it was reported that the loops
option does not work as expected when do_verify=0 option is specified.
Per analysis by Sowmya Ravi, the cause was as follows:

1) keep_running() decrements td->o.loops at job repetition, then
   td->o.loops has zero value when the last loop is executed.
2) clear_io_state() is called at the beginning of the thread_main loop
   for each repetition for loops option.
3) clear_io_state() calls reset_io_counters() which resets
   td->nr_done_files to zero when td->o.loops is non-zero.
4) For the last loop of loops option, clear_io_state() call does not
   clear td->nr_done_files since td->o.loops is zero. This results in a
   setup error in do_io().

To fix the issue, modify reset_io_counters() to refer td->loops instead
of td->o.loops. td->o.loops is not a good reference since it is updated
in keep_running(). td->loops is not updated during fio run, and safe to
refer.
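
The sequence in steps 1) to 4) can be modelled with a small sketch. Everything below is a heavily simplified, hypothetical model written for this example: the structs keep only the fields named above, and count_stale() stands in for the repetition loop. It is not the fio code.

```c
/* Hypothetical, reduced stand-ins for fio's thread data. */
struct opts {
	unsigned int loops;	/* decremented while the job runs */
};

struct thread {
	struct opts o;
	unsigned int loops;	/* never modified during the run */
	unsigned int nr_done_files;
};

/* Models the old check, which looked at td->o.loops. */
static void reset_old(struct thread *td)
{
	if (td->o.loops)
		td->nr_done_files = 0;
}

/* Models the fixed check, which looks at td->loops. */
static void reset_new(struct thread *td)
{
	if (td->loops)
		td->nr_done_files = 0;
}

/*
 * Runs `loops` repetitions and returns how many of them started with
 * a stale, non-zero nr_done_files.
 */
static unsigned int count_stale(unsigned int loops, int use_fixed)
{
	struct thread td = { { loops }, loops, 0 };
	unsigned int stale = 0;

	while (td.o.loops) {
		td.o.loops--;		/* models keep_running() */
		if (use_fixed)
			reset_new(&td);
		else
			reset_old(&td);
		if (td.nr_done_files)
			stale++;
		td.nr_done_files = 1;	/* this repetition finished its files */
	}
	return stale;
}
```

In this model, three loops with the old check leave the last repetition with a stale counter, while the fixed check never does.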

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20211001103257.4130231-3-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoRevert "Fix for loop count issue when do_verify=0 (#1093)"
Shin'ichiro Kawasaki [Fri, 1 Oct 2021 10:32:56 +0000 (19:32 +0900)]
Revert "Fix for loop count issue when do_verify=0 (#1093)"

This reverts commit 499cded5f435a0a7c379b606eb3e903d7f43c360.

The commit enabled clear_io_state() call in the loop of thread_main()
after completion of IOs, regardless of verify option. This sets zero to
td->nr_done_files even when the IOs are sequential workload with holes.
Such IOs depend on td->nr_done_files to judge job completion in
__get_next_file(). With zero value in td->nr_done_files, the sequential
IOs do not complete as expected, which results in failure of a test case.

Revert the commit to avoid the failure. Regarding the loop count issue
with do_verify=0 option, another fix patch follows.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20211001103257.4130231-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: correct percentile ranking
Jens Axboe [Fri, 1 Oct 2021 17:11:53 +0000 (11:11 -0600)]
t/io_uring: correct percentile ranking

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agozbd: Fix unexpected job termination by open zone search failure
Shin'ichiro Kawasaki [Thu, 30 Sep 2021 00:02:36 +0000 (09:02 +0900)]
zbd: Fix unexpected job termination by open zone search failure

Test case #46 in t/zbd/test-zbd-support fails when it is repeated
hundreds of times on null_blk zoned devices. The test case uses libaio
IO engine to run 8 random write jobs on 4 sequential write required
zones. When all of the 4 zones get almost full but still open for
in-flight writes, the helper function zbd_convert_to_open_zone() fails
to get an opened zone for next write. This results in unexpected job
termination.

To avoid the unexpected job termination, retry the steps in
zbd_convert_to_open_zone(). Before retry, call io_u_quiesce() to ensure
that the in-flight writes get completed.

To prevent infinite loop by the retry, retry only when any IOs are
in-flight or in-flight IOs get completed. To check in-flight IO count of
all jobs, add a new helper function any_io_in_flight().

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Link: https://lore.kernel.org/r/20210930000236.4116945-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agot/io_uring: store TSC rate in local file
Jens Axboe [Thu, 30 Sep 2021 02:15:45 +0000 (20:15 -0600)]
t/io_uring: store TSC rate in local file

Doesn't change on a single machine, so let's just cache the value instead
of requiring it to be specified every time. If we specify the rate, the
local data is updated. If we don't specify it, we check the file, and use
the rate in there if it exists.
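
The caching rule can be sketched as below. This is only an illustration under stated assumptions: resolve_tsc_rate() is a hypothetical helper, the file path is supplied by the caller, and the one-number text format is invented for this example rather than taken from t/io_uring.c.

```c
#include <stdio.h>

/*
 * Hypothetical helper. If a rate is specified, store it in the file
 * at `path` and use it. Otherwise use the stored rate if the file
 * exists and parses. Returns 0 when no rate is known.
 */
static unsigned long resolve_tsc_rate(const char *path, unsigned long specified)
{
	unsigned long rate = 0;
	FILE *f;

	if (specified) {
		f = fopen(path, "w");
		if (f) {
			fprintf(f, "%lu\n", specified);
			fclose(f);
		}
		return specified;
	}

	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%lu", &rate) != 1)
		rate = 0;
	fclose(f);
	return rate;
}
```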

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoMerge branch 'patch-1' of https://github.com/ravisowmya/fio
Jens Axboe [Wed, 29 Sep 2021 17:38:58 +0000 (11:38 -0600)]
Merge branch 'patch-1' of https://github.com/ravisowmya/fio

* 'patch-1' of https://github.com/ravisowmya/fio:
  Fix for loop count issue when do_verify=0 (#1093)

2 years agoFix for loop count issue when do_verify=0 (#1093)
ravisowmya [Tue, 28 Sep 2021 19:09:38 +0000 (12:09 -0700)]
Fix for loop count issue when do_verify=0 (#1093)

'clear_io_state' is called twice and resets the nr_done_files.
'clear_io_state' resets the nr_done_files if loop>=1.
This API is called twice within thread_main and the second call is
skipped if do_verify=0. We rely on the first call for setup management.
So, for the very last loop, we would have skipped resetting
'nr_done_files' because loops=0, resulting in an IO error
in do_io, and we exit without performing any IOs. The fix invokes
the second call to clear_io_state.

Signed-off-by: Sowmya Ravi sowmyaravi.92@gmail.com
2 years agoMerge branch 'sigbreak' of https://github.com/bjpaupor/fio
Jens Axboe [Tue, 28 Sep 2021 19:28:18 +0000 (13:28 -0600)]
Merge branch 'sigbreak' of https://github.com/bjpaupor/fio

* 'sigbreak' of https://github.com/bjpaupor/fio:
  add signal handlers for Windows SIGBREAK

2 years agoadd signal handlers for Windows SIGBREAK
Brandon Paupore [Tue, 28 Sep 2021 17:12:15 +0000 (12:12 -0500)]
add signal handlers for Windows SIGBREAK

Signed-off-by: Brandon Paupore <brandon.paupore@wdc.com>
2 years agoMerge branch 'onecore' of https://github.com/ByteHamster/fio
Jens Axboe [Sun, 26 Sep 2021 22:32:32 +0000 (16:32 -0600)]
Merge branch 'onecore' of https://github.com/ByteHamster/fio

* 'onecore' of https://github.com/ByteHamster/fio:
  Pick core for running t/one-core-peak.sh

2 years agoMerge branch 'evelu-fio' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Sun, 26 Sep 2021 22:32:05 +0000 (16:32 -0600)]
Merge branch 'evelu-fio' of https://github.com/ErwanAliasr1/fio

* 'evelu-fio' of https://github.com/ErwanAliasr1/fio:
  one-core-peak: Reporting NVME features
  t/one-core-peak: Reporting kernel config
  one-core-peak.sh: Fixing bash

2 years agoone-core-peak: Reporting NVME features
Erwan Velu [Sun, 26 Sep 2021 20:26:27 +0000 (22:26 +0200)]
one-core-peak: Reporting NVME features

This commit gets some low-level features of NVME drives and reports them.
This includes temperature, apste, power state, and submission & completion queues.

A typical output looks like :
  nvme0n1: MODEL=Samsung SSD 970 EVO Plus 2TB FW=2B2QEXM7 serial=S59CNM0R417706B PCI=0000:01:00.0@8.0 GT/s PCIe IRQ=62 NUMA=0 CPUS=0-31
  nvme0n1: Temp:34 C, Autonomous Power State Transition: Enabled, PowerState:4, Completion Queues:32, Submission Queues:32

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agot/one-core-peak: Reporting kernel config
Erwan Velu [Sun, 26 Sep 2021 19:43:39 +0000 (21:43 +0200)]
t/one-core-peak: Reporting kernel config

This patch adds reporting of some items of the kernel config.

A typical output looks like :
system: KERNEL: 5.15.0-rc2+
system: KERNEL: CONFIG_BLK_CGROUP_IOCOST=y
system: KERNEL: CONFIG_HZ=1000

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agoPick core for running t/one-core-peak.sh
ByteHamster [Wed, 22 Sep 2021 14:30:35 +0000 (16:30 +0200)]
Pick core for running t/one-core-peak.sh

2 years agoone-core-peak.sh: Fixing bash
Erwan Velu [Sun, 26 Sep 2021 19:03:36 +0000 (21:03 +0200)]
one-core-peak.sh: Fixing bash

This commit fixes some warnings about the bash syntax.

Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
2 years agoMerge branch 'tsc' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Sun, 26 Sep 2021 15:58:05 +0000 (09:58 -0600)]
Merge branch 'tsc' of https://github.com/ErwanAliasr1/fio

* 'tsc' of https://github.com/ErwanAliasr1/fio:
  one-core-peak: Adding option to reporting latencies
  one-core-peak: Avoid reporting Unknown memory speed

2 years agoone-core-peak: Adding option to reporting latencies
Erwan Velu [Sat, 25 Sep 2021 21:51:24 +0000 (23:51 +0200)]
one-core-peak: Adding option to reporting latencies

Since commit 932131c944b10f2a03f4028318c454c98eca489f,
it is now possible to report the io_uring benchmark latencies.

This patch detects the current TSC value and enables the latency feature if requested.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
2 years agoone-core-peak: Avoid reporting Unknown memory speed
Erwan Velu [Sat, 25 Sep 2021 21:49:12 +0000 (23:49 +0200)]
one-core-peak: Avoid reporting Unknown memory speed

Some BIOSes report the configured memory speed as Unknown, making the report useless.
Adding a match on a real speed avoids this.

Before: system: MEMORY: Unknown
After:  system: MEMORY: 3466 MT/s

Signed-off-by: Erwan Velu <e.velu@criteo.com>
2 years agoMerge branch 'evelu-uring' of https://github.com/ErwanAliasr1/fio
Jens Axboe [Sat, 25 Sep 2021 20:56:14 +0000 (14:56 -0600)]
Merge branch 'evelu-uring' of https://github.com/ErwanAliasr1/fio

* 'evelu-uring' of https://github.com/ErwanAliasr1/fio:
  t/io_uring.c: Adding \n on help

2 years agot/io_uring.c: Adding \n on help
Erwan Velu [Sat, 25 Sep 2021 20:45:51 +0000 (22:45 +0200)]
t/io_uring.c: Adding \n on help

Without these \n, the new options were badly printed.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
2 years agot/io_uring: batch stat updates
Jens Axboe [Sat, 25 Sep 2021 20:38:10 +0000 (14:38 -0600)]
t/io_uring: batch stat updates

Track the last clock_index, and batch increments if at all possible.

Signed-off-by: Jens Axboe <axboe@kernel.dk>