logging: expand runstates eligible for logging Currently recording log entries for IOPS and BW logging with log_avg_msec enabled only happens when a thread has completed its ramp time and is in the TD_RUNNING or TD_VERIFYING run states. It may happen that a final bandwidth or IOPS log entry is missed when a job transitions from TD_RUNNING/TD_VERIFYING to TD_FINISHING before the helper thread has a chance to calculate a final log entry. This patch expands the run states where logging is permitted, allowing log entries to be recorded for jobs in the TD_FINISHING or TD_EXITED states. Each job cleans itself up and typically transitions quickly from TD_FINISHING to TD_EXITED. The main fio backend thread carries out the transition from TD_EXITED to TD_REAPED. The window during which a job is in the TD_FINISHING and TD_EXITED states is short, so measurements should still be reasonably accurate. I tested these patches with the following job: fio --name=test --ioengine=null --time_based --runtime=3s --filesize=1T \ --write_iops_log=test --write_bw_log=test --log_avg_msec=1000 \ && cat test_iops.1.log && cat test_bw.1.log Before this patch series 10/10 trials had missing log entries. With only the helper_thread change in the preceding patch 3/10 trials had missing log entries. With this entire patch series, only 1/10 trials had missing log entries. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
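The expanded eligibility test can be sketched as follows. This is a minimal illustration, not fio's actual code: the run-state names mirror fio's enum, but the abbreviated enum and the helper function `td_eligible_for_logging` are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Abbreviated run-state enum, names as in fio (illustrative ordering). */
enum td_runstate {
	TD_NOT_CREATED = 0,
	TD_CREATED,
	TD_INITIALIZED,
	TD_RAMP,
	TD_SETTING_UP,
	TD_RUNNING,
	TD_PRE_READING,
	TD_VERIFYING,
	TD_FINISHING,
	TD_EXITED,
	TD_REAPED,
};

/*
 * Before this patch only TD_RUNNING and TD_VERIFYING were eligible for
 * bw/iops logging. Including TD_FINISHING and TD_EXITED lets the helper
 * thread record a final entry before the backend reaps the job.
 */
static bool td_eligible_for_logging(enum td_runstate state)
{
	return state == TD_RUNNING || state == TD_VERIFYING ||
	       state == TD_FINISHING || state == TD_EXITED;
}
```
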
logging: record timestamp for each thread Instead of recording a timestamp once before iterating through all the threads to check if we should add a new bw or iops log measurement, record a new timestamp before checking each thread. We were already querying the time anyway for the mtime_since_now() call. We might as well have it available locally for more accurate logging. Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
stat: log out both average and max over the window Add the option log_window_value as an alias of log_max_value, which reports the average, the max, or both values over the window. Retain backward compatibility by allowing =0 and =1 to specify avg and max values respectively. There is no change to existing log formats when reporting only average or max values. Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com> Link: https://lore.kernel.org/r/20240125110124.55137-2-ankit.kumar@samsung.com Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
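A job-file sketch of the new option, assuming from the commit description that a keyword selects the both-values mode (the exact accepted keywords are not spelled out in the message, so `both` here is an assumption):

```ini
; log_window_value=0    -> average only (backward-compatible with log_max_value=0)
; log_window_value=1    -> max only     (backward-compatible with log_max_value=1)
; log_window_value=both -> assumed keyword to log both avg and max per window
[global]
write_bw_log=test
log_avg_msec=1000
log_window_value=both
```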
Make log_unix_epoch an official alias of log_alternate_epoch log_alternate_epoch was introduced along with log_alternate_epoch_clock_id, and generalized the idea of log_unix_epoch. Both options had the same effect. So we make log_unix_epoch an official alias of log_alternate_epoch, instead of maintaining both redundant options. Signed-off-by: Nick Neumann <nick@pcpartpicker.com>
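How an option alias resolves can be sketched with a simplified option table. The struct layout and `find_opt` lookup here are illustrative stand-ins, not fio's actual option-table code; only the two option names come from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified option entry: one canonical name plus one alias. */
struct opt {
	const char *name;
	const char *alias;
};

static const struct opt options[] = {
	{ .name = "log_alternate_epoch", .alias = "log_unix_epoch" },
	{ .name = NULL },
};

/* Both spellings resolve to the same option entry. */
static const struct opt *find_opt(const char *name)
{
	for (const struct opt *o = options; o->name; o++) {
		if (!strcmp(o->name, name) ||
		    (o->alias && !strcmp(o->alias, name)))
			return o;
	}
	return NULL;
}
```
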
Record job start time to fix time pain points Add a new key in the json per-job output, job_start, that records the job start time obtained via a call to clock_gettime using the clock_id specified by the new job_start_clock_id option. This allows times of fio jobs and log entries to be compared/ordered against each other and against other system events recorded against the same clock_id. Add a note to the documentation for group_reporting about how there are several per-job values for which only the first job's value is recorded in the json output format when group_reporting is enabled. Fixes #1544 Signed-off-by: Nick Neumann <nick@pcpartpicker.com>
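A sketch of where the new key lands in the per-job json output. The `job_start` field name comes from the commit; the job name, the timestamp value, and its unit are illustrative (the unit depends on the clock selected by `job_start_clock_id`):

```json
{
  "jobs": [
    {
      "jobname": "test",
      "job_start": 1712345678901234
    }
  ]
}
```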
stats: Add hint information to per priority level stats Modify the json and standard per-priority output stats to display the hint value together with the priority class and level. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20230721110510.44772-7-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
stat: add diskutil aggregated sectors to normal output Since we are now collecting sectors in the disk utilization data we should include them in the aggregated data as well. I tested this with an LVM mirror. I also tested this on an mdadm mirror but all the aggregated and slave data was zero. Fixes: 75cbc26d500fc5f7e36f6203c9b8e08b9c6f007c ("diskutil: Report how many sectors have been read and written") Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
stat: add new diskutil sectors to json output A recent commit added sectors read/written to the disk utilization data. Allow these counts to also appear in the JSON output. Fixes: 75cbc26d500fc5f7e36f6203c9b8e08b9c6f007c ("diskutil: Report how many sectors have been read and written") Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
fio: steadystate: allow for custom check interval Allow for a steady state check interval different from 1s with a new --ss_interval parameter. Steady state is reached when the steady state condition (like slope) is true when comparing the last windows (set with --ss_dur). The actual values for this comparison are currently calculated over a 1s interval during the window. This is especially problematic for slow random devices, where the values do not converge at such a fine granularity. Letting the user set this interval solves the problem, although it requires them to figure out an appropriate value themselves. --ss=iops:5% --ss_dur=120s should reproduce this for many (slower) devices. Then adding e.g. --ss_interval=20s may let it converge. Signed-off-by: Christian Loehle <cloehle@posteo.de>
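The reproduction and fix from the commit message, expressed as a job-file sketch (the device path and workload choices are placeholders; the option names are from the commit):

```ini
; On a slow random device, 1s sampling may never converge to the
; iops:5% condition; a 20s check interval smooths the samples.
[ss-test]
filename=/dev/sdX        ; placeholder device
rw=randread
ss=iops:5%
ss_dur=120s
ss_interval=20s          ; new option: evaluate over 20s windows, not 1s
```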
stat: Fix ioprio print When using per-priority statistics for workloads using multiple different priority values, the statistics output displays the priority class and value (level) for each set of statistics. However, this is done using Linux priority value encoding, that is, assuming that the priority level is at most 7 (lower 3 bits). That is not always the case for all OSes: e.g. DragonFly allows IO priorities up to a value of 10. Introduce the OS dependent ioprio_class() and ioprio() macros to extract the fields from an ioprio value according to the OS capabilities. A generic definition (always returning 0) for these macros in os/os.h is added and used for all OSes that do not define these macros. The functions show_ddir_status() and add_ddir_status_json() are modified to use these new macros to fix per priority statistics output. The modification includes changes to the loops over the clat_prio array to reduce indentation levels, making the code a little cleaner. Fixes: 692dec0cfb4b ("stat: report clat stats on a per priority granularity") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
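A Linux-flavored sketch of the two macros (class in the upper bits, level in the lower bits). The shift and mask values follow Linux's ioprio encoding but are illustrative here; per the commit, each OS supplies its own definitions and a generic fallback returning 0 covers the rest.

```c
#include <assert.h>

/* Sketch of the OS-dependent helpers using Linux-style field encoding. */
#ifndef ioprio_class
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_LEVEL_MASK	7
#define ioprio_class(ioprio)	((ioprio) >> IOPRIO_CLASS_SHIFT)
#define ioprio(ioprio)		((ioprio) & IOPRIO_LEVEL_MASK)
#endif
```
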
Refactor for_each_td() to catch inappropriate td ptr reuse I recently introduced a bug caused by reusing a struct thread_data *td after the end of a for_each_td() loop construct. Link: https://github.com/axboe/fio/pull/1521#issuecomment-1448591102 To prevent others from making this same mistake, this commit refactors for_each_td() so that both the struct thread_data * and the loop index variable are placed inside their own scope for the loop. This will cause any reference to those variables outside the for_each_td() to produce an undeclared identifier error, provided the outer scope doesn't already reuse those same variable names for other code within the routine (which is fine because the scopes are separate). Because C/C++ doesn't let you declare two different variable types within the scope of a for() loop initializer, creating a scope for both struct thread_data * and the loop index required explicitly declaring a scope with a curly brace. This means for_each_td() includes an opening curly brace to create the scope, which means all uses of for_each_td() must now end with an invocation of a new macro named end_for_each() to emit an ending curly brace to match the scope brace created by for_each_td(): for_each_td(td) { while (td->runstate < TD_EXITED) sleep(1); } end_for_each(); The alternative is to end every for_each_td() construct with an inline curly brace, which is off-putting since the implementation of an extra opening curly brace is abstracted in for_each_td(): for_each_td(td) { while (td->runstate < TD_EXITED) sleep(1); }} Most fio logic only declares "struct thread_data *td" and "int i" for use in for_each_td(), which means those declarations will now cause -Wunused-variable warnings since they're not used outside the scope of the refactored for_each_td(). Those declarations have been removed. 
Implementing this change caught a latent bug in eta.c::calc_thread_status() that accesses the ending value of struct thread_data *td after the end of for_each_td(), now manifesting as a compile error, so working as designed :) Signed-off-by: Adam Horshack (horshack@live.com)
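The scoping trick described above can be sketched with a simplified macro pair. fio's real `for_each_td()` walks shared-memory segments; this sketch walks a plain array, and the `threads`/`thread_number`/`count_running` names are invented for illustration.

```c
#include <assert.h>

struct thread_data {
	int runstate;
};

static struct thread_data threads[4];
static int thread_number = 4;

/*
 * Open a scope so both the index and the td pointer are confined to the
 * loop; any use of either after end_for_each() is a compile error.
 */
#define for_each_td(td)						\
{								\
	int __td_i;						\
	struct thread_data *(td);				\
	for (__td_i = 0, (td) = &threads[0];			\
	     __td_i < thread_number;				\
	     __td_i++, (td) = &threads[__td_i])

/* Emits the closing brace matching the scope opened by for_each_td(). */
#define end_for_each()	}

static int count_running(void)
{
	int n = 0;

	for_each_td(td) {
		if (td->runstate)
			n++;
	} end_for_each();

	return n;	/* referencing td here would not compile */
}
```
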
stat: fix segfault with fio option --bandwidth-log The log_params for aggregate read, write and trim only specify log type. As a result the io_log doesn't have the relevant thread_data structure. With fio option --bandwidth-log this results in segmentation fault. Add a check and use DEF_LOG_ENTRIES for such case. Fixes: 0a852a50 ("fio: Introduce the log_entries option") Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com> [vkf: added Fixes tag, lightly edited commit message] Signed-off-by: Vincent Fu <vincent.fu@samsung.com>
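The guard described above can be sketched as follows. The struct layouts, the helper name `initial_log_entries`, and the `DEF_LOG_ENTRIES` value are simplified stand-ins for illustration, not fio's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEF_LOG_ENTRIES	1024	/* illustrative default */

struct thread_options {
	unsigned int log_entries;
};

struct thread_data {
	struct thread_options o;
};

/*
 * The aggregate read/write/trim logs created for --bandwidth-log have no
 * owning thread, so fall back to DEF_LOG_ENTRIES rather than
 * dereferencing a NULL td.
 */
static unsigned int initial_log_entries(struct thread_data *td)
{
	if (td)
		return td->o.log_entries;
	return DEF_LOG_ENTRIES;
}
```
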
stat: handle finished jobs when using status-interval When printing job stats with status-interval, don't keep adding values to the total runtime if the jobs are already finished. This should fix the printing of the intermediate runtime/average BW etc. Signed-off-by: Kozlowski Mateusz <mateusz.kozlowski@intel.com>
stat: make free_clat_prio_stats() safe against NULL The sfree() in free_clat_prio_stats() itself handles NULL, so the function already handles a struct thread_stat without any per priority stats. (Per priority stats are disabled on threads/thread_stats that we know will never be able to contain more than a single priority.) However, if malloc() in e.g. gen_mixed_ddir_stats_from_ts() or __show_run_stats() failed to allocate memory, free_clat_prio_stats() will be supplied a NULL pointer. Fix free_clat_prio_stats() to handle a NULL pointer gracefully. Fixes: 4ad856497c0b ("stat: add a new function to allocate a clat_prio_stat array") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Link: https://lore.kernel.org/r/20220204001741.34419-1-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
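The NULL guard can be sketched as below. The struct layout and the fixed-size `clat_prio` array are simplified for illustration; the point is the early return, which lets error paths pass a thread_stat pointer that was never allocated.

```c
#include <assert.h>
#include <stdlib.h>

struct clat_prio_stat {
	int dummy;
};

struct thread_stat {
	struct clat_prio_stat *clat_prio[3];
};

/* Return early on NULL so callers on allocation-failure paths are safe. */
static void free_clat_prio_stats(struct thread_stat *ts)
{
	if (!ts)
		return;
	for (int i = 0; i < 3; i++) {
		free(ts->clat_prio[i]);	/* free(NULL) is already a no-op */
		ts->clat_prio[i] = NULL;
	}
}
```
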
stat: convert json output to a new per priority granularity format The JSON output will no longer contain high_prio/low_prio entries, but will instead include a new list "prios", which will include an object per prioclass/priolevel combination. Each of these objects will either have a "clat_ns" object or a "lat_ns" object, depending on which latency type was being tracked. This JSON structure should make it easy if the per priority stats were ever extended to be able to track multiple latency types at the same time, as each prioclass/priolevel object will then simply contain (e.g.) both a "clat_ns" and a "lat_ns" object. Convert the JSON output to this new per priority granularity format, and convert the tests to work with the new JSON output. Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220203192814.18552-16-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
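A sketch of the new structure, assuming plausible key names for the class/level fields (the commit confirms the "prios" list and the per-entry "clat_ns"/"lat_ns" objects; the field names and zeroed values below are illustrative):

```json
"prios": [
  {
    "prioclass": 1,
    "prio": 0,
    "clat_ns": { "min": 0, "max": 0, "mean": 0.0, "stddev": 0.0 }
  },
  {
    "prioclass": 2,
    "prio": 4,
    "clat_ns": { "min": 0, "max": 0, "mean": 0.0, "stddev": 0.0 }
  }
]
```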