X-Git-Url: https://git.kernel.dk/?p=fio.git;a=blobdiff_plain;f=fio.1;h=ed492682fde0a1a3a9f3498a80a46959879454b8;hp=883a31bd2646c9795da4ee567c8f9c31a4fd0a4d;hb=c5daece64fd56763f264a59965a547433d4da799;hpb=875e8d6fa4d443068eb1c48a29f5367e454d2a37 diff --git a/fio.1 b/fio.1 index 883a31bd..ed492682 100644 --- a/fio.1 +++ b/fio.1 @@ -20,6 +20,9 @@ file and memory debugging). `help' will list all available tracing options. .BI \-\-parse\-only Parse options only, don't start any I/O. .TP +.BI \-\-merge\-blktrace\-only +Merge blktraces only, don't start any I/O. +.TP .BI \-\-output \fR=\fPfilename Write output to \fIfilename\fR. .TP @@ -93,7 +96,10 @@ the value is interpreted in seconds. Force a full status dump of cumulative (from job start) values at \fItime\fR intervals. This option does *not* provide per-period measurements. So values such as bandwidth are running averages. When the time unit is omitted, -\fItime\fR is interpreted in seconds. +\fItime\fR is interpreted in seconds. Note that using this option with +`\-\-output-format=json' will yield output that technically isn't valid json, +since the output will be collated sets of valid json. It will need to be split +into valid sets of json after the run. .TP .BI \-\-section \fR=\fPname Only run specified section \fIname\fR in job file. Multiple sections can be specified. @@ -724,21 +730,79 @@ false. .BI unlink_each_loop \fR=\fPbool Unlink job files after each iteration or loop. Default: false. .TP -Fio supports strided data access. After having read \fBzonesize\fR bytes from an area that is \fBzonerange\fR bytes big, \fBzoneskip\fR bytes are skipped. +.BI zonemode \fR=\fPstr +Accepted values are: +.RS +.RS +.TP +.B none +The \fBzonerange\fR, \fBzonesize\fR and \fBzoneskip\fR parameters are ignored. +.TP +.B strided +I/O happens in a single zone until \fBzonesize\fR bytes have been transferred. +After that number of bytes has been transferred processing of the next zone +starts. 
+.TP +.B zbd +Zoned block device mode. I/O happens sequentially in each zone, even if random +I/O has been selected. Random I/O happens across all zones instead of being +restricted to a single zone. +.RE +.RE .TP .BI zonerange \fR=\fPint -Size of a single zone in which I/O occurs. +Size of a single zone. See also \fBzonesize\fR and \fBzoneskip\fR. .TP .BI zonesize \fR=\fPint -Number of bytes to transfer before skipping \fBzoneskip\fR bytes. If this -parameter is smaller than \fBzonerange\fR then only a fraction of each zone -with \fBzonerange\fR bytes will be accessed. If this parameter is larger than -\fBzonerange\fR then each zone will be accessed multiple times before skipping -to the next zone. +For \fBzonemode\fR=strided, this is the number of bytes to transfer before +skipping \fBzoneskip\fR bytes. If this parameter is smaller than +\fBzonerange\fR then only a fraction of each zone with \fBzonerange\fR bytes +will be accessed. If this parameter is larger than \fBzonerange\fR then each +zone will be accessed multiple times before skipping to the next zone. + +For \fBzonemode\fR=zbd, this is the size of a single zone. The \fBzonerange\fR +parameter is ignored in this mode. .TP .BI zoneskip \fR=\fPint -Skip the specified number of bytes after \fBzonesize\fR bytes of data have been -transferred. +For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR +bytes of data have been transferred. This parameter must be zero for +\fBzonemode\fR=zbd. + +.TP +.BI read_beyond_wp \fR=\fPbool +This parameter applies to \fBzonemode=zbd\fR only. + +Zoned block devices are block devices that consist of multiple zones. Each +zone has a type, e.g. conventional or sequential. A conventional zone can be +written at any offset that is a multiple of the block size. Sequential zones +must be written sequentially. The position at which a write must occur is +called the write pointer. A zoned block device can be either drive +managed, host managed or host aware. 
For host managed devices the host must +ensure that writes happen sequentially. Fio recognizes host managed devices +and serializes writes to sequential zones for these devices. + +If a read occurs in a sequential zone beyond the write pointer then the zoned +block device will complete the read without reading any data from the storage +medium. Since such reads lead to unrealistically high bandwidth and IOPS +numbers fio only reads beyond the write pointer if explicitly told to do +so. Default: false. +.TP +.BI max_open_zones \fR=\fPint +When running a random write test across an entire drive many more zones will be +open than in a typical application workload. This command line option +therefore allows limiting the number of open zones. The number of open zones is +defined as the number of zones to which write commands are issued. +.TP +.BI zone_reset_threshold \fR=\fPfloat +A number between zero and one that indicates the ratio of logical blocks with +data to the total number of logical blocks in the test above which zones +should be reset periodically. +.TP +.BI zone_reset_frequency \fR=\fPfloat +A number between zero and one that indicates how often a zone reset should be +issued if the zone reset threshold has been exceeded. A zone reset is +submitted after each (1 / zone_reset_frequency) write requests. This and the +previous parameter can be used to simulate garbage collection activity. .SS "I/O type" .TP @@ -1673,6 +1737,20 @@ done other than creating the file. Read and write using mmap I/O to a file on a filesystem mounted with DAX on a persistent memory device through the PMDK libpmem library. +.TP +.B ime_psync +Synchronous read and write using DDN's Infinite Memory Engine (IME). This +engine is very basic and issues calls to IME whenever an IO is queued. +.TP +.B ime_psyncv +Synchronous read and write using DDN's Infinite Memory Engine (IME). 
This +engine uses iovecs and will try to stack as many IOs as possible (if the IOs +are "contiguous" and the IO depth is not exceeded) before issuing a call to IME. +.TP +.B ime_aio +Asynchronous read and write using DDN's Infinite Memory Engine (IME). This +engine will try to stack as many IOs as possible by creating requests for IME. +FIO will then decide when to commit these requests. .SS "I/O engine specific parameters" In addition, there are some parameters which are only valid when a specific \fBioengine\fR is in use. These are used identically to normal parameters, @@ -1829,12 +1907,14 @@ Username for HTTP authentication. .BI (http)http_pass \fR=\fPstr Password for HTTP authentication. .TP -.BI (http)https \fR=\fPbool -Whether to use HTTPS instead of plain HTTP. Default is \fB0\fR. +.BI (http)https \fR=\fPstr +Whether to use HTTPS instead of plain HTTP. \fRon\fP enables HTTPS; +\fRinsecure\fP will enable HTTPS, but disable SSL peer verification (use +with caution!). Default is \fBoff\fR. .TP -.BI (http)http_s3 \fR=\fPbool -Include S3 specific HTTP headers such as authenticating requests with -AWS Signature Version 4. Default is \fB0\fR. +.BI (http)http_mode \fR=\fPstr +Which HTTP access mode to use: webdav, swift, or s3. Default is +\fBwebdav\fR. .TP .BI (http)http_s3_region \fR=\fPstr The S3 region/zone to include in the request. Default is \fBus-east-1\fR. @@ -1845,6 +1925,10 @@ The S3 secret key. .BI (http)http_s3_keyid \fR=\fPstr The S3 key/access id. .TP +.BI (http)http_swift_auth_token \fR=\fPstr +The Swift auth token. See the example configuration file on how to +retrieve this. +.TP .BI (http)http_verbose \fR=\fPint Enable verbose requests from libcurl. Useful for debugging. 1 turns on verbose logging from libcurl, 2 additionally enables HTTP IO tracing. @@ -1986,8 +2070,15 @@ changing data and the overlapping region has a non-zero size. 
Setting \fBserialize_overlap\fR tells fio to avoid provoking this behavior by explicitly serializing in-flight I/Os that have a non-zero overlap. Note that setting this option can reduce both performance and the \fBiodepth\fR achieved. -Additionally this option does not work when \fBio_submit_mode\fR is set to -offload. Default: false. +.RS +.P +This option only applies to I/Os issued for a single job except when it is +enabled along with \fBio_submit_mode\fR=offload. In offload mode, fio +will check for overlap among all I/Os submitted by offload jobs with \fBserialize_overlap\fR +enabled. +.P +Default: false. +.RE .TP .BI io_submit_mode \fR=\fPstr This option controls how fio submits the I/O to the I/O engine. The default @@ -2107,12 +2198,40 @@ to replay a workload captured by blktrace. See \fBblktrace\fR\|(8) for how to capture such logging data. For blktrace replay, the file needs to be turned into a blkparse binary data file first (`blkparse \-o /dev/null \-d file_for_fio.bin'). +You can specify a number of files by separating the names with a ':' character. +See the \fBfilename\fR option for information on how to escape ':' and '\' +characters within the file names. These files will be sequentially assigned to +job clones created by \fBnumjobs\fR. .TP .BI read_iolog_chunked \fR=\fPbool Determines how iolog is read. If false (default) entire \fBread_iolog\fR will be read at once. If selected true, input from iolog will be read gradually. Useful when iolog is very large, or it is generated. .TP +.BI merge_blktrace_file \fR=\fPstr +When specified, rather than replaying the logs passed to \fBread_iolog\fR, +the logs go through a merge phase which aggregates them into a single blktrace. +The resulting file is then passed on as the \fBread_iolog\fR parameter. The +intention here is to make the order of events consistent. This limits the +influence of the scheduler compared to replaying multiple blktraces via +concurrent jobs. 
+.TP +.BI merge_blktrace_scalars \fR=\fPfloat_list +This is a percentage based option that is index paired with the list of files +passed to \fBread_iolog\fR. When merging is performed, scale the time of each +event by the corresponding amount. For example, +`\-\-merge_blktrace_scalars="50:100"' runs the first trace in halftime and the +second trace in realtime. This knob is separately tunable from +\fBreplay_time_scale\fR which scales the trace during runtime and will not +change the output of the merge unlike this option. +.TP +.BI merge_blktrace_iters \fR=\fPfloat_list +This is a whole number option that is index paired with the list of files +passed to \fBread_iolog\fR. When merging is performed, run each trace for +the specified number of iterations. For example, +`\-\-merge_blktrace_iters="2:1"' runs the first trace for two iterations +and the second trace for one iteration. +.TP .BI replay_no_stall \fR=\fPbool When replaying I/O with \fBread_iolog\fR the default behavior is to attempt to respect the timestamps within the log and replay them with the @@ -2145,11 +2264,12 @@ Unfortunately this also breaks the strict time ordering between multiple device accesses. .TP .BI replay_align \fR=\fPint -Force alignment of I/O offsets and lengths in a trace to this power of 2 -value. +Force alignment of the byte offsets in a trace to this value. The value +must be a power of 2. .TP .BI replay_scale \fR=\fPint -Scale sector offsets down by this factor when replaying traces. +Scale byte offsets down by this factor when replaying traces. Should most +likely use \fBreplay_align\fR as well. .SS "Threads, processes and job synchronization" .TP .BI replay_skip \fR=\fPstr @@ -2559,6 +2679,12 @@ steady state assessment criteria. All assessments are carried out using only data from the rolling collection window. Threshold limits can be expressed as a fixed value or as a percentage of the mean in the collection window. 
.RS +.P +When using this feature, most jobs should include the \fBtime_based\fR +and \fBruntime\fR options or the \fBloops\fR option so that fio does not +stop running after it has covered the full size of the specified file(s) +or device(s). +.RS .RS .TP .B iops @@ -3446,6 +3572,45 @@ Write `length' bytes beginning from `offset'. Trim the given file from the given `offset' for `length' bytes. .RE .RE +.SH I/O REPLAY \- MERGING TRACES +Colocation is a common practice used to get the most out of a machine. +Knowing which workloads play nicely with each other and which ones don't is +a much harder task. While fio can replay workloads concurrently via multiple +jobs, it leaves some variability up to the scheduler making results harder to +reproduce. Merging is a way to make the order of events consistent. +.P +Merging is integrated into I/O replay and done when a \fBmerge_blktrace_file\fR +is specified. The list of files passed to \fBread_iolog\fR go through the merge +process and output a single file stored to the specified file. The output file is +passed on as if it were the only file passed to \fBread_iolog\fR. An example would +look like: +.RS +.P +$ fio \-\-read_iolog=":" \-\-merge_blktrace_file="" +.RE +.P +Creating only the merged file can be done by passing the command line argument +\fBmerge-blktrace-only\fR. +.P +Scaling traces can be done to see the relative impact of any particular trace +being slowed down or sped up. \fBmerge_blktrace_scalars\fR takes in a colon +separated list of percentage scalars. It is index paired with the files passed +to \fBread_iolog\fR. +.P +With scaling, it may be desirable to match the running time of all traces. +This can be done with \fBmerge_blktrace_iters\fR. It is index paired with +\fBread_iolog\fR just like \fBmerge_blktrace_scalars\fR. +.P +In an example, given two traces, A and B, each 60s long. 
If we want to see +the impact of trace A issuing IOs twice as fast and repeat trace A over the +runtime of trace B, the following can be done: +.RS +.P +$ fio \-\-read_iolog=":" \-\-merge_blktrace_file="" \-\-merge_blktrace_scalars="50:100" \-\-merge_blktrace_iters="2:1" +.RE +.P +This runs trace A at 2x the speed twice for approximately the same runtime as +a single run of trace B. .SH CPU IDLENESS PROFILING In some cases, we want to understand CPU overhead in a test. For example, we test patches for the specific goodness of whether they reduce CPU usage.
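
The arithmetic behind the trace-merging example in the "I/O REPLAY \- MERGING TRACES" hunk can be checked with a small sketch. This is not fio code: the helper names `parse_list` and `merged_runtime` are hypothetical, and the model assumes a scalar of N percent multiplies a trace's duration by N/100 and that each iteration repeats the scaled trace back to back, which is how the patch text describes `merge_blktrace_scalars` and `merge_blktrace_iters`.

```python
def parse_list(value):
    """Split a colon-separated option value such as "50:100" into floats."""
    return [float(item) for item in value.split(":")]


def merged_runtime(duration_s, scalar_pct, iters):
    """Approximate runtime of one trace after scaling and repetition.

    Assumption: a scalar of 50 halves the trace duration, and the scaled
    trace is replayed `iters` times in sequence.
    """
    return duration_s * (scalar_pct / 100.0) * iters


# Two hypothetical traces, each 60 seconds long, as in the patch text.
durations = {"A": 60.0, "B": 60.0}
scalars = parse_list("50:100")   # value given to merge_blktrace_scalars
iters = parse_list("2:1")        # value given to merge_blktrace_iters

runtimes = {
    name: merged_runtime(duration, scalar, count)
    for (name, duration), scalar, count in zip(durations.items(), scalars, iters)
}
print(runtimes)
```

Under these assumptions both traces come out at 60 seconds, matching the statement that trace A runs twice at 2x speed for roughly the runtime of a single pass of trace B.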