X-Git-Url: https://git.kernel.dk/?p=fio.git;a=blobdiff_plain;f=HOWTO;h=41b959d176dd2f3a52b6db81d76ac67adebe9f6e;hp=4c117c2473d6693158f1b3f346387d342b8d55bd;hb=f2d6de5d997b039cebac9c34912871baa5e12d49;hpb=fee14ab846ef542d9bb9ebf68f11f0ecb8636f5e diff --git a/HOWTO b/HOWTO index 4c117c24..41b959d1 100644 --- a/HOWTO +++ b/HOWTO @@ -100,6 +100,10 @@ Command line options Parse options only, don't start any I/O. +.. option:: --merge-blktrace-only + + Merge blktraces only, don't start any I/O. + .. option:: --output=filename Write output to file `filename`. @@ -194,7 +198,10 @@ Command line options Force a full status dump of cumulative (from job start) values at `time` intervals. This option does *not* provide per-period measurements. So values such as bandwidth are running averages. When the time unit is omitted, - `time` is interpreted in seconds. + `time` is interpreted in seconds. Note that using this option with + ``--output-format=json`` will yield output that technically isn't valid + json, since the output will be collated sets of valid json. It will need + to be split into valid sets of json after the run. .. option:: --section=name @@ -952,18 +959,92 @@ Target file/device Unlink job files after each iteration or loop. Default: false. -.. option:: zonesize=int +.. option:: zonemode=str - Divide a file into zones of the specified size. See :option:`zoneskip`. + Accepted values are: + + **none** + The :option:`zonerange`, :option:`zonesize` and + :option:`zoneskip` parameters are ignored. + **strided** + I/O happens in a single zone until + :option:`zonesize` bytes have been transferred. + After that number of bytes has been + transferred processing of the next zone + starts. + **zbd** + Zoned block device mode. I/O happens + sequentially in each zone, even if random I/O + has been selected. Random I/O happens across + all zones instead of being restricted to a + single zone. The :option:`zoneskip` parameter + is ignored. 
:option:`zonerange` and + :option:`zonesize` must be identical. .. option:: zonerange=int - Give size of an I/O zone. See :option:`zoneskip`. + Size of a single zone. See also :option:`zonesize` and + :option:`zoneskip`. + +.. option:: zonesize=int + + For :option:`zonemode` =strided, this is the number of bytes to + transfer before skipping :option:`zoneskip` bytes. If this parameter + is smaller than :option:`zonerange` then only a fraction of each zone + with :option:`zonerange` bytes will be accessed. If this parameter is + larger than :option:`zonerange` then each zone will be accessed + multiple times before skipping to the next zone. + + For :option:`zonemode` =zbd, this is the size of a single zone. The + :option:`zonerange` parameter is ignored in this mode. .. option:: zoneskip=int - Skip the specified number of bytes when :option:`zonesize` data has been - read. The two zone options can be used to only do I/O on zones of a file. + For :option:`zonemode` =strided, the number of bytes to skip after + :option:`zonesize` bytes of data have been transferred. This parameter + must be zero for :option:`zonemode` =zbd. + +.. option:: read_beyond_wp=bool + + This parameter applies to :option:`zonemode` =zbd only. + + Zoned block devices are block devices that consist of multiple zones. + Each zone has a type, e.g. conventional or sequential. A conventional + zone can be written at any offset that is a multiple of the block + size. Sequential zones must be written sequentially. The position at + which a write must occur is called the write pointer. A zoned block + device can be either drive managed, host managed or host aware. For + host managed devices the host must ensure that writes happen + sequentially. Fio recognizes host managed devices and serializes + writes to sequential zones for these devices. 
+ + If a read occurs in a sequential zone beyond the write pointer then + the zoned block device will complete the read without reading any data + from the storage medium. Since such reads lead to unrealistically high + bandwidth and IOPS numbers fio only reads beyond the write pointer if + explicitly told to do so. Default: false. + +.. option:: max_open_zones=int + + When running a random write test across an entire drive many more + zones will be open than in a typical application workload. Hence this + option, which limits the number of open zones. The + number of open zones is defined as the number of zones to which write + commands are issued. + +.. option:: zone_reset_threshold=float + + A number between zero and one that indicates the ratio of logical + blocks with data to the total number of logical blocks in the test + above which zones should be reset periodically. + +.. option:: zone_reset_frequency=float + + A number between zero and one that indicates how often a zone reset + should be issued if the zone reset threshold has been exceeded. A zone + reset is submitted after each (1 / zone_reset_frequency) write + requests. This and the previous parameter can be used to simulate + garbage collection activity. I/O type @@ -1724,6 +1805,11 @@ I/O engine **pvsync2** Basic :manpage:`preadv2(2)` or :manpage:`pwritev2(2)` I/O. + **io_uring** + Fast Linux native asynchronous I/O. Supports async IO + for both direct and buffered IO. + This engine defines engine specific options. + **libaio** Linux native asynchronous I/O. Note that Linux may only support queued behavior with non-buffered I/O (set ``direct=1`` or @@ -1829,6 +1915,15 @@ I/O engine (RBD) via librbd without the need to use the kernel rbd driver. This ioengine defines engine specific options. + **http** + I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to + a WebDAV or S3 endpoint. This ioengine defines engine specific options.
+ + This engine only supports direct IO of iodepth=1; you need to scale this + via numjobs. blocksize defines the size of the objects to be created. + + TRIM is translated to object deletion. + **gfapi** Using GlusterFS libgfapi sync interface to direct access to GlusterFS volumes without having to go through FUSE. This ioengine @@ -1886,6 +1981,26 @@ I/O engine mounted with DAX on a persistent memory device through the PMDK libpmem library. + **ime_psync** + Synchronous read and write using DDN's Infinite Memory Engine (IME). + This engine is very basic and issues calls to IME whenever an IO is + queued. + + **ime_psyncv** + Synchronous read and write using DDN's Infinite Memory Engine (IME). + This engine uses iovecs and will try to stack as many IOs as possible + (if the IOs are "contiguous" and the IO depth is not exceeded) + before issuing a call to IME. + + **ime_aio** + Asynchronous read and write using DDN's Infinite Memory Engine (IME). + This engine will try to stack as many IOs as possible by creating + requests for IME. FIO will then decide when to commit these requests. + **libiscsi** + Read and write iscsi lun with libiscsi. + **nbd** + Read and write a Network Block Device (NBD). + I/O engine specific parameters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1894,6 +2009,35 @@ In addition, there are some parameters which are only valid when a specific with the caveat that when used on the command line, they must come after the :option:`ioengine` that defines them is selected. +.. option:: hipri : [io_uring] + + If this option is set, fio will attempt to use polled IO completions. + Normal IO completions generate interrupts to signal the completion of + IO; polled completions do not. Hence they require active reaping + by the application. The benefits are more efficient IO for high IOPS + scenarios, and lower latencies for low queue depth IO. + +..
option:: fixedbufs : [io_uring] + + If fio is asked to do direct IO, then Linux will map pages for each + IO call, and release them when IO is done. If this option is set, the + pages are pre-mapped before IO is started. This eliminates the need to + map and release for each IO. This is more efficient, and reduces the + IO latency as well. + +.. option:: sqthread_poll : [io_uring] + + Normally fio will submit IO by issuing a system call to notify the + kernel of available items in the SQ ring. If this option is set, the + act of submitting IO will be done by a polling thread in the kernel. + This frees up cycles for fio, at the cost of using more CPU in the + system. + +.. option:: sqthread_poll_cpu : [io_uring] + + When :option:`sqthread_poll` is set, this option provides a way to + define which CPU should be used for the polling thread. + .. option:: userspace_reap : [libaio] Normally, with the libaio engine in use, fio will use the @@ -2109,6 +2253,63 @@ with the caveat that when used on the command line, they must come after the transferred to the device. The writefua option is ignored with this selection. +.. option:: http_host=str : [http] + + Hostname to connect to. For S3, this could be the bucket hostname. + Default is **localhost** + +.. option:: http_user=str : [http] + + Username for HTTP authentication. + +.. option:: http_pass=str : [http] + + Password for HTTP authentication. + +.. option:: https=str : [http] + + Enable HTTPS instead of http. *on* enables HTTPS; *insecure* + will enable HTTPS, but disable SSL peer verification (use with + caution!). Default is **off** + +.. option:: http_mode=str : [http] + + Which HTTP access mode to use: *webdav*, *swift*, or *s3*. + Default is **webdav** + +.. option:: http_s3_region=str : [http] + + The S3 region/zone string. + Default is **us-east-1** + +.. option:: http_s3_key=str : [http] + + The S3 secret key. + +.. option:: http_s3_keyid=str : [http] + + The S3 key/access id. + +.. 
option:: http_swift_auth_token=str : [http] + + The Swift auth token. See the example configuration file on how + to retrieve this. + +.. option:: http_verbose=int : [http] + + Enable verbose requests from libcurl. Useful for debugging. 1 + turns on verbose logging from libcurl, 2 additionally enables + HTTP IO tracing. Default is **0** + +.. option:: uri=str : [nbd] + + Specify the NBD URI of the server to test. The string + is a standard NBD URI + (see https://github.com/NetworkBlockDevice/nbd/tree/master/doc). + Example URIs: nbd://localhost:10809 + nbd+unix:///?socket=/tmp/socket + nbds://tlshost/exportname + I/O depth ~~~~~~~~~ @@ -2185,8 +2386,13 @@ I/O depth ``serialize_overlap`` tells fio to avoid provoking this behavior by explicitly serializing in-flight I/Os that have a non-zero overlap. Note that setting this option can reduce both performance and the :option:`iodepth` achieved. - Additionally this option does not work when :option:`io_submit_mode` is set to - offload. Default: false. + + This option only applies to I/Os issued for a single job except when it is + enabled along with :option:`io_submit_mode`=offload. In offload mode, fio + will check for overlap among all I/Os submitted by offload jobs with :option:`serialize_overlap` + enabled. + + Default: false. .. option:: io_submit_mode=str @@ -2330,6 +2536,10 @@ I/O replay :manpage:`blktrace(8)` for how to capture such logging data. For blktrace replay, the file needs to be turned into a blkparse binary data file first (``blkparse -o /dev/null -d file_for_fio.bin``). + You can specify a number of files by separating the names with a ':' + character. See the :option:`filename` option for information on how to + escape ':' and '\' characters within the file names. These files will + be sequentially assigned to job clones created by :option:`numjobs`. .. option:: read_iolog_chunked=bool @@ -2337,6 +2547,33 @@ I/O replay will be read at once. If selected true, input from iolog will be read gradually. 
Useful when iolog is very large, or it is generated. +.. option:: merge_blktrace_file=str + + When specified, rather than replaying the logs passed to :option:`read_iolog`, + the logs go through a merge phase which aggregates them into a single + blktrace. The resulting file is then passed on as the :option:`read_iolog` + parameter. The intention here is to make the order of events consistent. + This limits the influence of the scheduler compared to replaying multiple + blktraces via concurrent jobs. + +.. option:: merge_blktrace_scalars=float_list + + This is a percentage based option that is index paired with the list of + files passed to :option:`read_iolog`. When merging is performed, scale + the time of each event by the corresponding amount. For example, + ``--merge_blktrace_scalars="50:100"`` runs the first trace in halftime + and the second trace in realtime. This knob is separately tunable from + :option:`replay_time_scale` which scales the trace during runtime and + does not change the output of the merge unlike this option. + +.. option:: merge_blktrace_iters=float_list + + This is a whole number option that is index paired with the list of files + passed to :option:`read_iolog`. When merging is performed, run each trace + for the specified number of iterations. For example, + ``--merge_blktrace_iters="2:1"`` runs the first trace for two iterations + and the second trace for one iteration. + .. option:: replay_no_stall=bool When replaying I/O with :option:`read_iolog` the default behavior is to @@ -2374,12 +2611,13 @@ I/O replay .. option:: replay_align=int - Force alignment of I/O offsets and lengths in a trace to this power of 2 - value. + Force alignment of the byte offsets in a trace to this value. The value + must be a power of 2. .. option:: replay_scale=int - Scale sector offsets down by this factor when replaying traces. + Scale byte offsets down by this factor when replaying traces. Should most + likely use :option:`replay_align` as well. .. 
option:: replay_skip=str @@ -2815,6 +3053,10 @@ Steady state data from the rolling collection window. Threshold limits can be expressed as a fixed value or as a percentage of the mean in the collection window. + When using this feature, most jobs should include the :option:`time_based` + and :option:`runtime` options or the :option:`loops` option so that fio does not + stop running after it has covered the full size of the specified file(s) or device(s). + **iops** Collect IOPS data. Stop the job if all individual IOPS measurements are within the specified limit of the mean IOPS (e.g., ``iops:2`` @@ -3496,7 +3738,8 @@ is one long line of values, such as:: 2;card0;0;0;7139336;121836;60004;1;10109;27.932460;116.933948;220;126861;3495.446807;1085.368601;226;126864;3523.635629;1089.012448;24063;99944;50.275485%;59818.274627;5540.657370;7155060;122104;60004;1;8338;29.086342;117.839068;388;128077;5032.488518;1234.785715;391;128085;5061.839412;1236.909129;23436;100928;50.287926%;59964.832030;5644.844189;14.595833%;19.394167%;123706;0;7313;0.1%;0.1%;0.1%;0.1%;0.1%;0.1%;100.0%;0.00%;0.00%;0.00%;0.00%;0.00%;0.00%;0.01%;0.02%;0.05%;0.16%;6.04%;40.40%;52.68%;0.64%;0.01%;0.00%;0.01%;0.00%;0.00%;0.00%;0.00%;0.00% A description of this job goes here. -The job description (if provided) follows on a second line. +The job description (if provided) follows on a second line for terse v2. +It appears on the same line for other terse versions. To enable terse output, use the :option:`--minimal` or :option:`--output-format`\=terse command line options. 
The @@ -3581,6 +3824,11 @@ minimal output v3, separated by semicolons:: terse_version_3;fio_version;jobname;groupid;error;read_kb;read_bandwidth;read_iops;read_runtime_ms;read_slat_min;read_slat_max;read_slat_mean;read_slat_dev;read_clat_min;read_clat_max;read_clat_mean;read_clat_dev;read_clat_pct01;read_clat_pct02;read_clat_pct03;read_clat_pct04;read_clat_pct05;read_clat_pct06;read_clat_pct07;read_clat_pct08;read_clat_pct09;read_clat_pct10;read_clat_pct11;read_clat_pct12;read_clat_pct13;read_clat_pct14;read_clat_pct15;read_clat_pct16;read_clat_pct17;read_clat_pct18;read_clat_pct19;read_clat_pct20;read_tlat_min;read_lat_max;read_lat_mean;read_lat_dev;read_bw_min;read_bw_max;read_bw_agg_pct;read_bw_mean;read_bw_dev;write_kb;write_bandwidth;write_iops;write_runtime_ms;write_slat_min;write_slat_max;write_slat_mean;write_slat_dev;write_clat_min;write_clat_max;write_clat_mean;write_clat_dev;write_clat_pct01;write_clat_pct02;write_clat_pct03;write_clat_pct04;write_clat_pct05;write_clat_pct06;write_clat_pct07;write_clat_pct08;write_clat_pct09;write_clat_pct10;write_clat_pct11;write_clat_pct12;write_clat_pct13;write_clat_pct14;write_clat_pct15;write_clat_pct16;write_clat_pct17;write_clat_pct18;write_clat_pct19;write_clat_pct20;write_tlat_min;write_lat_max;write_lat_mean;write_lat_dev;write_bw_min;write_bw_max;write_bw_agg_pct;write_bw_mean;write_bw_dev;cpu_user;cpu_sys;cpu_csw;cpu_mjf;cpu_minf;iodepth_1;iodepth_2;iodepth_4;iodepth_8;iodepth_16;iodepth_32;iodepth_64;lat_2us;lat_4us;lat_10us;lat_20us;lat_50us;lat_100us;lat_250us;lat_500us;lat_750us;lat_1000us;lat_2ms;lat_4ms;lat_10ms;lat_20ms;lat_50ms;lat_100ms;lat_250ms;lat_500ms;lat_750ms;lat_1000ms;lat_2000ms;lat_over_2000ms;disk_name;disk_read_iops;disk_write_iops;disk_read_merges;disk_write_merges;disk_read_ticks;write_ticks;disk_queue_time;disk_util +In client/server mode terse output differs from what appears when jobs are run +locally. 
Disk utilization data is omitted from the standard terse output and +for v3 and later appears on its own separate line at the end of each terse +reporting cycle. + JSON output ------------ @@ -3685,6 +3933,46 @@ given in bytes. The `action` can be one of these: **trim** Trim the given file from the given `offset` for `length` bytes. + +I/O Replay - Merging Traces +--------------------------- + +Colocation is a common practice used to get the most out of a machine. +Knowing which workloads play nicely with each other and which ones don't is +a much harder task. While fio can replay workloads concurrently via multiple +jobs, it leaves some variability up to the scheduler, making results harder to +reproduce. Merging is a way to make the order of events consistent. + +Merging is integrated into I/O replay and done when a +:option:`merge_blktrace_file` is specified. The files passed to +:option:`read_iolog` go through the merge process, which writes a single +merged file to the specified path. The output file is passed on as if it were the +only file passed to :option:`read_iolog`. An example would look like:: + + $ fio --read_iolog=":" --merge_blktrace_file="" + +Creating only the merged file can be done by passing the command line argument +:option:`--merge-blktrace-only`. + +Scaling traces can be done to see the relative impact of any particular trace +being slowed down or sped up. :option:`merge_blktrace_scalars` takes a colon +separated list of percentage scalars. It is index paired with the files passed +to :option:`read_iolog`. + +With scaling, it may be desirable to match the running time of all traces. +This can be done with :option:`merge_blktrace_iters`. It is index paired with +:option:`read_iolog` just like :option:`merge_blktrace_scalars`. + +As an example, consider two traces, A and B, each 60s long.
If we want to see +the impact of trace A issuing IOs twice as fast, repeating trace A over the +runtime of trace B, the following can be done:: + + $ fio --read_iolog=":" --merge_blktrace_file="" --merge_blktrace_scalars="50:100" --merge_blktrace_iters="2:1" + +This runs trace A at 2x the speed twice for approximately the same runtime as +a single run of trace B. + + CPU idleness profiling ---------------------- @@ -3819,6 +4107,7 @@ is recorded. Each *data direction* seen within the window period will aggregate its values in a separate row. Further, when using windowed logging the *block size* and *offset* entries will always contain 0. + Client/Server ------------- @@ -3906,3 +4195,6 @@ containing two hostnames ``h1`` and ``h2`` with IP addresses 192.168.10.120 and /mnt/nfs/fio/192.168.10.120.fileio.tmp /mnt/nfs/fio/192.168.10.121.fileio.tmp + +Terse output in client/server mode will differ slightly from what is produced +when fio is run in stand-alone mode. See the terse output section for details.