X-Git-Url: https://git.kernel.dk/?a=blobdiff_plain;f=fio.1;h=45ec8d43dcbf8172318f91d12c33005037386bd9;hb=4ef1562a013513fd0a0048cca4048f28d308a90f;hp=05896e61418e564461b98628c8a14462fe7c1a02;hpb=b2a432bfbb6d10e93f2c8f8092d6db672d45af0d;p=fio.git diff --git a/fio.1 b/fio.1 index 05896e61..45ec8d43 100644 --- a/fio.1 +++ b/fio.1 @@ -738,12 +738,13 @@ Accepted values are: .RS .TP .B none -The \fBzonerange\fR, \fBzonesize\fR and \fBzoneskip\fR parameters are ignored. +The \fBzonerange\fR, \fBzonesize\fR \fBzonecapacity\fR and \fBzoneskip\fR +parameters are ignored. .TP .B strided I/O happens in a single zone until \fBzonesize\fR bytes have been transferred. After that number of bytes has been transferred processing of the next zone -starts. +starts. The \fBzonecapacity\fR parameter is ignored. .TP .B zbd Zoned block device mode. I/O happens sequentially in each zone, even if random @@ -771,6 +772,14 @@ zoned block device, the specified \fBzonesize\fR must be 0 or equal to the device zone size. For a regular block device or file, the specified \fBzonesize\fR must be at least 512B. .TP +.BI zonecapacity \fR=\fPint +For \fBzonemode\fR=zbd, this defines the capacity of a single zone, which is +the accessible area starting from the zone start address. This parameter only +applies when using \fBzonemode\fR=zbd in combination with regular block devices. +If not specified it defaults to the zone size. If the target device is a zoned +block device, the zone capacity is obtained from the device information and this +option is ignored. +.TP .BI zoneskip \fR=\fPint For \fBzonemode\fR=strided, the number of bytes to skip after \fBzonesize\fR bytes of data have been transferred. @@ -804,7 +813,11 @@ so. Default: false. When running a random write test across an entire drive many more zones will be open than in a typical application workload. Hence this command line option that allows to limit the number of open zones. 
The number of open zones is -defined as the number of zones to which write commands are issued. +defined as the number of zones to which write commands are issued by all +threads/processes. +.TP +.BI job_max_open_zones \fR=\fPint +Limit on the number of simultaneously opened zones per single thread/process. .TP .BI zone_reset_threshold \fR=\fPfloat A number between zero and one that indicates the ratio of logical blocks with @@ -1449,9 +1462,31 @@ starting I/O if the platform and file type support it. Defaults to true. This will be ignored if \fBpre_read\fR is also specified for the same job. .TP -.BI sync \fR=\fPbool -Use synchronous I/O for buffered writes. For the majority of I/O engines, -this means using O_SYNC. Default: false. +.BI sync \fR=\fPstr +Whether, and what type, of synchronous I/O to use for writes. The allowed +values are: +.RS +.RS +.TP +.B none +Do not use synchronous IO, the default. +.TP +.B 0 +Same as \fBnone\fR. +.TP +.B sync +Use synchronous file IO. For the majority of I/O engines, +this means using O_SYNC. +.TP +.B 1 +Same as \fBsync\fR. +.TP +.B dsync +Use synchronous data IO. For the majority of I/O engines, +this means using O_DSYNC. +.PD +.RE +.RE .TP .BI iomem \fR=\fPstr "\fR,\fP mem" \fR=\fPstr Fio can use various types of memory as the I/O unit buffer. The allowed @@ -1548,7 +1583,8 @@ if \fBsize\fR is set to 20GiB and \fBio_size\fR is set to 5GiB, fio will perform I/O within the first 20GiB but exit when 5GiB have been done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB, and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within -the 0..20GiB region. +the 0..20GiB region. Value can be set as percentage: \fBio_size\fR=N%. +In this case \fBio_size\fR multiplies \fBsize\fR= value. .TP .BI filesize \fR=\fPirange(int) Individual file sizes. May be a range, in which case fio will select sizes @@ -1629,6 +1665,12 @@ I/O. Requires \fBfilename\fR option to specify either block or character devices. 
This engine supports trim operations. The sg engine includes engine specific options. .TP +.B libzbc +Synchronous I/O engine for SMR hard-disks using the \fBlibzbc\fR +library. The target can be either an sg character device or +a block device file. This engine supports the zonemode=zbd zone +operations. +.TP .B null Doesn't transfer any data, just pretends to. This is mainly used to exercise fio itself and for debugging/testing purposes. @@ -1655,11 +1697,6 @@ to get desired CPU usage, as the cpuload only loads a single CPU at the desired rate. A job never finishes unless there is at least one non-cpuio job. .TP -.B guasi -The GUASI I/O engine is the Generic Userspace Asynchronous Syscall -Interface approach to async I/O. See \fIhttp://www.xmailserver.org/guasi-lib.html\fR -for more info on GUASI. -.TP .B rdma The RDMA I/O engine supports both RDMA memory semantics (RDMA_WRITE/RDMA_READ) and channel semantics (Send/Recv) for the @@ -1789,6 +1826,13 @@ Read and write iscsi lun with libiscsi. .TP .B nbd Synchronous read and write a Network Block Device (NBD). +.TP +.B libcufile +I/O engine supporting libcufile synchronous access to nvidia-fs and a +GPUDirect Storage-supported filesystem. This engine performs +I/O without transferring buffers between user-space and the kernel, +unless \fBverify\fR is set or \fBcuda_io\fR is \fBposix\fR. \fBiomem\fR must +not be \fBcudamalloc\fR. This ioengine defines engine specific options. .SS "I/O engine specific parameters" In addition, there are some parameters which are only valid when a specific \fBioengine\fR is in use. These are used identically to normal parameters, @@ -1800,7 +1844,8 @@ Set the percentage of I/O that will be issued with higher priority by setting the priority bit. Non-read I/O is likely unaffected by ``cmdprio_percentage``. This option cannot be used with the `prio` or `prioclass` options. 
For this option to set the priority bit properly, NCQ priority must be supported and -enabled and `direct=1' option must be used. +enabled and `direct=1' option must be used. fio must also be run as the root +user. .TP .BI (io_uring)fixedbufs If fio is asked to do direct IO, then Linux will map pages for each IO call, and @@ -1846,6 +1891,22 @@ than normal. When hipri is set this determines the probability of a pvsync2 I/O being high priority. The default is 100%. .TP +.BI (pvsync2,libaio,io_uring)nowait +By default if a request cannot be executed immediately (e.g. resource starvation, +waiting on locks) it is queued and the initiating process will be blocked until +the required resource becomes free. +This option sets the RWF_NOWAIT flag (supported from the 4.14 Linux kernel) and +the call will return instantly with EAGAIN or a partial result rather than waiting. + +It is useful to also use \fBignore_error\fR=EAGAIN when using this option. +Note: glibc 2.27, 2.28 have a bug in syscall wrappers preadv2, pwritev2. +They return EOPNOTSUP instead of EAGAIN. + +For cached I/O, using this option usually means a request operates only with +cached data. Currently the RWF_NOWAIT flag is not supported for cached writes. +For direct I/O, requests will only succeed if cache invalidation isn't required, +file blocks are fully allocated and the disk request could be issued immediately. +.TP .BI (cpuio)cpuload \fR=\fPint Attempt to use the specified percentage of CPU cycles. This is a mandatory option when using cpuio I/O engine. @@ -2032,6 +2093,10 @@ on the client site it will be used in the rdma_resolve_add() function. This can be useful when multiple paths exist between the client and the server or in certain loopback configurations. .TP +.BI (filestat)stat_type \fR=\fPstr +Specify stat system call type to measure lookup/getattr performance. +Default is \fBstat\fR for \fBstat\fR\|(2). 
+.TP .BI (sg)readfua \fR=\fPbool With readfua option set to 1, read operations include the force unit access (fua) flag. Default: 0. @@ -2081,7 +2146,36 @@ Example URIs: \fInbd+unix:///?socket=/tmp/socket\fR .TP \fInbds://tlshost/exportname\fR - +.RE +.RE +.TP +.BI (libcufile)gpu_dev_ids\fR=\fPstr +Specify the GPU IDs to use with CUDA. This is a colon-separated list of int. +GPUs are assigned to workers roundrobin. Default is 0. +.TP +.BI (libcufile)cuda_io\fR=\fPstr +Specify the type of I/O to use with CUDA. This option +takes the following values: +.RS +.RS +.TP +.B cufile (default) +Use libcufile and nvidia-fs. This option performs I/O directly +between a GPUDirect Storage filesystem and GPU buffers, +avoiding use of a bounce buffer. If \fBverify\fR is set, +cudaMemcpy is used to copy verification data between RAM and GPU(s). +Verification data is copied from RAM to GPU before a write +and from GPU to RAM after a read. +\fBdirect\fR must be 1. +.TP +.BI posix +Use POSIX to perform I/O with a RAM buffer, and use +cudaMemcpy to transfer data between RAM and the GPU(s). +Data is copied from GPU to RAM before a write and copied +from RAM to GPU after a read. \fBverify\fR does not affect +the use of cudaMemcpy. +.RE +.RE .SS "I/O depth" .TP .BI iodepth \fR=\fPint @@ -2179,7 +2273,7 @@ has a bit of extra overhead, especially for lower queue depth I/O where it can increase latencies. The benefit is that fio can manage submission rates independently of the device completion rates. This avoids skewed latency reporting if I/O gets backed up on the device side (the coordinated omission -problem). +problem). Note that this option cannot reliably be used with async IO engines. .SS "I/O rate" .TP .BI thinktime \fR=\fPtime @@ -2265,6 +2359,11 @@ The percentage of I/Os that must fall within the criteria specified by defaults to 100.0, meaning that all I/Os must be equal or below to the value set by \fBlatency_target\fR. 
.TP +.BI latency_run \fR=\fPbool +Used with \fBlatency_target\fR. If false (default), fio will find the highest +queue depth that meets \fBlatency_target\fR and exit. If true, fio will continue +running and try to meet \fBlatency_target\fR by adjusting queue depth. +.TP .BI max_latency \fR=\fPtime If set, fio will exit the job with an ETIMEDOUT error if it exceeds this maximum latency. When the unit is omitted, the value is interpreted in @@ -2291,7 +2390,9 @@ replay, the file needs to be turned into a blkparse binary data file first You can specify a number of files by separating the names with a ':' character. See the \fBfilename\fR option for information on how to escape ':' characters within the file names. These files will be sequentially assigned to -job clones created by \fBnumjobs\fR. +job clones created by \fBnumjobs\fR. '-' is a reserved name, meaning read from +stdin, notably if \fBfilename\fR is set to '-' which means stdin as well, +then this flag can't be set to '-'. .TP .BI read_iolog_chunked \fR=\fPbool Determines how iolog is read. If false (default) entire \fBread_iolog\fR will @@ -2502,27 +2603,25 @@ The ID of the flow. If not specified, it defaults to being a global flow. See \fBflow\fR. .TP .BI flow \fR=\fPint -Weight in token-based flow control. If this value is used, then there is -a 'flow counter' which is used to regulate the proportion of activity between -two or more jobs. Fio attempts to keep this flow counter near zero. The -\fBflow\fR parameter stands for how much should be added or subtracted to the -flow counter on each iteration of the main I/O loop. That is, if one job has -`flow=8' and another job has `flow=\-1', then there will be a roughly 1:8 -ratio in how much one runs vs the other. -.TP -.BI flow_watermark \fR=\fPint -The maximum value that the absolute value of the flow counter is allowed to -reach before the job must wait for a lower value of the counter. +Weight in token-based flow control. 
If this value is used, +then fio regulates the activity between two or more jobs +sharing the same flow_id. +Fio attempts to keep each job activity proportional to other jobs' activities +in the same flow_id group, with respect to requested weight per job. +That is, if one job has `flow=3', another job has `flow=2' +and another with `flow=1`, then there will be a roughly 3:2:1 ratio +in how much one runs vs the others. .TP .BI flow_sleep \fR=\fPint -The period of time, in microseconds, to wait after the flow watermark has -been exceeded before retrying operations. +The period of time, in microseconds, to wait after the flow counter +has exceeded its proportion before retrying operations. .TP .BI stonewall "\fR,\fB wait_for_previous" Wait for preceding jobs in the job file to exit, before starting this one. Can be used to insert serialization points in the job file. A stone wall also implies starting a new reporting group, see -\fBgroup_reporting\fR. +\fBgroup_reporting\fR. Optionally you can use `stonewall=0` to disable or +`stonewall=1` to enable it. .TP .BI exitall By default, fio will continue running all other jobs when one job finishes. @@ -2530,15 +2629,27 @@ Sometimes this is not the desired action. Setting \fBexitall\fR will instead make fio terminate all jobs in the same group, as soon as one job of that group finishes. .TP -.BI exit_what +.BI exit_what \fR=\fPstr By default, fio will continue running all other jobs when one job finishes. -Sometimes this is not the desired action. Setting \fBexit_all\fR will instead +Sometimes this is not the desired action. Setting \fBexitall\fR will instead make fio terminate all jobs in the same group. The option \fBexit_what\fR -allows to control which jobs get terminated when \fBexitall\fR is enabled. The -default is \fBgroup\fR and does not change the behaviour of \fBexitall\fR. The -setting \fBall\fR terminates all jobs. 
The setting \fBstonewall\fR terminates -all currently running jobs across all groups and continues execution with the -next stonewalled group. +allows you to control which jobs get terminated when \fBexitall\fR is enabled. +The default value is \fBgroup\fR. +The allowed values are: +.RS +.RS +.TP +.B all +terminates all jobs. +.TP +.B group +is the default and does not change the behaviour of \fBexitall\fR. +.TP +.B stonewall +terminates all currently running jobs across all groups and continues +execution with the next stonewalled group. +.RE +.RE .TP .BI exec_prerun \fR=\fPstr Before running this job, issue the command specified through @@ -2997,23 +3108,24 @@ Disable measurements of submission latency numbers. See Disable measurements of throughput/bandwidth numbers. See \fBdisable_lat\fR. .TP +.BI slat_percentiles \fR=\fPbool +Report submission latency percentiles. Submission latency is not recorded +for synchronous ioengines. +.TP .BI clat_percentiles \fR=\fPbool -Enable the reporting of percentiles of completion latencies. This option is -mutually exclusive with \fBlat_percentiles\fR. +Report completion latency percentiles. .TP .BI lat_percentiles \fR=\fPbool -Enable the reporting of percentiles of I/O latencies. This is similar to -\fBclat_percentiles\fR, except that this includes the submission latency. -This option is mutually exclusive with \fBclat_percentiles\fR. +Report total latency percentiles. Total latency is the sum of submission +latency and completion latency. .TP .BI percentile_list \fR=\fPfloat_list -Overwrite the default list of percentiles for completion latencies and the -block error histogram. Each number is a floating number in the range +Overwrite the default list of percentiles for latencies and the +block error histogram. Each number is a floating point number in the range (0,100], and the maximum length of the list is 20. Use ':' to separate the -numbers, and list the numbers in ascending order. 
For example, -`\-\-percentile_list=99.5:99.9' will cause fio to report the values of -completion latency below which 99.5% and 99.9% of the observed latencies -fell, respectively. +numbers. For example, `\-\-percentile_list=99.5:99.9' will cause fio to +report the latency durations below which 99.5% and 99.9% of the observed +latencies fell, respectively. .TP .BI significant_figures \fR=\fPint If using \fB\-\-output\-format\fR of `normal', set the significant figures @@ -3826,7 +3938,8 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth, and IOPS. The logs share a common format, which looks like this: .RS .P -time (msec), value, data direction, block size (bytes), offset (bytes) +time (msec), value, data direction, block size (bytes), offset (bytes), +command priority .RE .P `Time' for the log entry is always in milliseconds. The `value' logged depends @@ -3860,6 +3973,9 @@ The entry's `block size' is always in bytes. The `offset' is the position in byt from the start of the file for that particular I/O. The logging of the offset can be toggled with \fBlog_offset\fR. .P +`Command priority` is 0 for normal priority and 1 for high priority. This is controlled +by the ioengine specific \fBcmdprio_percentage\fR. +.P Fio defaults to logging every individual I/O but when windowed logging is set through \fBlog_avg_msec\fR, either the average (by default) or the maximum (\fBlog_max_value\fR is set) `value' seen over the specified period of time