X-Git-Url: https://git.kernel.dk/?p=fio.git;a=blobdiff_plain;f=HOWTO;h=0c5b7109df30c55c35bbc2d8441be0ee97d3ecb8;hp=acb9e97fc35bf6ca790d45bb93b7be0260509a8f;hb=f88817479014b87bfed3a20f9fe7dae0efb33dee;hpb=804c08390492023293f7febb71172e5fe0aefbf5 diff --git a/HOWTO b/HOWTO index acb9e97f..0c5b7109 100644 --- a/HOWTO +++ b/HOWTO @@ -163,12 +163,12 @@ Command line options .. option:: --readonly - Turn on safety read-only checks, preventing writes. The ``--readonly`` - option is an extra safety guard to prevent users from accidentally starting - a write workload when that is not desired. Fio will only write if - `rw=write/randwrite/rw/randrw` is given. This extra safety net can be used - as an extra precaution as ``--readonly`` will also enable a write check in - the I/O engine core to prevent writes due to unknown user space bug(s). + Turn on safety read-only checks, preventing writes and trims. The + ``--readonly`` option is an extra safety guard to prevent users from + accidentally starting a write or trim workload when that is not desired. + Fio will only modify the device under test if + `rw=write/randwrite/rw/randrw/trim/randtrim/trimwrite` is given. This + safety net can be used as an extra precaution. .. option:: --eta=when @@ -194,7 +194,10 @@ Command line options Force a full status dump of cumulative (from job start) values at `time` intervals. This option does *not* provide per-period measurements. So values such as bandwidth are running averages. When the time unit is omitted, - `time` is interpreted in seconds. + `time` is interpreted in seconds. Note that using this option with + ``--output-format=json`` will yield output that technically isn't valid + json, since the output will be collated sets of valid json. It will need + to be split into valid sets of json after the run. .. option:: --section=name @@ -283,7 +286,8 @@ Command line options .. option:: --aux-path=path - Use this `path` for fio state generated files. + Use the directory specified by `path` for generated state files instead + of the current working directory. Any parameters following the options will be assumed to be job files, unless they match a job file parameter. Multiple job files can be listed and each job @@ -748,12 +752,15 @@ Target file/device assigned equally distributed to job clones created by :option:`numjobs` as long as they are using generated filenames. If specific `filename(s)` are set fio will use the first listed directory, and thereby matching the - `filename` semantic which generates a file each clone if not specified, but - let all clones use the same if set. + `filename` semantic (which generates a file for each clone if not + specified, but lets all clones use the same file if set). See the :option:`filename` option for information on how to escape "``:``" and "``\``" characters within the directory path itself. + Note: To control the directory fio will use for internal state files + use :option:`--aux-path`. + .. option:: filename=str Fio normally makes up a `filename` based on the job name, thread number, and @@ -948,18 +955,92 @@ Target file/device Unlink job files after each iteration or loop. Default: false. -.. option:: zonesize=int +.. option:: zonemode=str + + Accepted values are: - Divide a file into zones of the specified size. See :option:`zoneskip`. + **none** + The :option:`zonerange`, :option:`zonesize` and + :option:`zoneskip` parameters are ignored. + **strided** + I/O happens in a single zone until + :option:`zonesize` bytes have been transferred. 
+			After that number of bytes has been
+			transferred, processing of the next zone
+			starts.
+		**zbd**
+			Zoned block device mode. I/O happens
+			sequentially in each zone, even if random I/O
+			has been selected. Random I/O happens across
+			all zones instead of being restricted to a
+			single zone. The :option:`zoneskip` parameter
+			is ignored. :option:`zonerange` and
+			:option:`zonesize` must be identical.
 
 .. option:: zonerange=int
 
-	Give size of an I/O zone. See :option:`zoneskip`.
+	Size of a single zone. See also :option:`zonesize` and
+	:option:`zoneskip`.
+
+.. option:: zonesize=int
+
+	For :option:`zonemode` =strided, this is the number of bytes to
+	transfer before skipping :option:`zoneskip` bytes. If this parameter
+	is smaller than :option:`zonerange` then only a fraction of each zone
+	with :option:`zonerange` bytes will be accessed. If this parameter is
+	larger than :option:`zonerange` then each zone will be accessed
+	multiple times before skipping to the next zone.
+
+	For :option:`zonemode` =zbd, this is the size of a single zone. The
+	:option:`zonerange` parameter is ignored in this mode.
 
 .. option:: zoneskip=int
 
-	Skip the specified number of bytes when :option:`zonesize` data has been
-	read. The two zone options can be used to only do I/O on zones of a file.
+	For :option:`zonemode` =strided, the number of bytes to skip after
+	:option:`zonesize` bytes of data have been transferred. This parameter
+	must be zero for :option:`zonemode` =zbd.
+
+.. option:: read_beyond_wp=bool
+
+	This parameter applies to :option:`zonemode` =zbd only.
+
+	Zoned block devices are block devices that consist of multiple zones.
+	Each zone has a type, e.g. conventional or sequential. A conventional
+	zone can be written at any offset that is a multiple of the block
+	size. Sequential zones must be written sequentially. The position at
+	which a write must occur is called the write pointer. A zoned block
+	device can be either drive managed, host managed or host aware. For
+	host managed devices the host must ensure that writes happen
+	sequentially. Fio recognizes host managed devices and serializes
+	writes to sequential zones for these devices.
+
+	If a read occurs in a sequential zone beyond the write pointer then
+	the zoned block device will complete the read without reading any data
+	from the storage medium. Since such reads lead to unrealistically high
+	bandwidth and IOPS numbers, fio only reads beyond the write pointer if
+	explicitly told to do so. Default: false.
+
+.. option:: max_open_zones=int
+
+	When running a random write test across an entire drive many more
+	zones will be open than in a typical application workload. Hence this
+	command line option allows one to limit the number of open zones. The
+	number of open zones is defined as the number of zones to which write
+	commands are issued.
+
+.. option:: zone_reset_threshold=float
+
+	A number between zero and one that indicates the ratio of logical
+	blocks with data to the total number of logical blocks in the test
+	above which zones should be reset periodically.
+
+.. option:: zone_reset_frequency=float
+
+	A number between zero and one that indicates how often a zone reset
+	should be issued if the zone reset threshold has been exceeded. A zone
+	reset is submitted after each (1 / zone_reset_frequency) write
+	requests. This and the previous parameter can be used to simulate
+	garbage collection activity.
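A minimal sketch of how the zoned options above might be combined, assuming a
zoned block device at the placeholder path /dev/sdz whose zone size is 256m;
all values below are illustrative placeholders, not recommendations::

	; hypothetical zbd job -- device path and sizes are placeholders
	[zbd-randwrite]
	filename=/dev/sdz
	direct=1
	rw=randwrite
	bs=256k
	zonemode=zbd
	zonesize=256m
	zonerange=256m
	max_open_zones=16
	zone_reset_threshold=0.5
	zone_reset_frequency=0.01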
 
 
 I/O type
 --------
 
@@ -991,13 +1072,15 @@ I/O type
 	**write**
 		Sequential writes.
 	**trim**
-		Sequential trims (Linux block devices only).
+		Sequential trims (Linux block devices and SCSI
+		character devices only).
 	**randread**
 		Random reads.
 	**randwrite**
 		Random writes.
 	**randtrim**
-		Random trims (Linux block devices only).
+		Random trims (Linux block devices and SCSI
+		character devices only).
 	**rw,readwrite**
 		Sequential mixed reads and writes.
 	**randrw**
@@ -1329,7 +1412,9 @@ I/O type
 	and that some blocks may be read/written more than once. If this option is
 	used with :option:`verify` and multiple blocksizes (via :option:`bsrange`),
 	only intact blocks are verified, i.e., partially-overwritten blocks are
-	ignored.
+	ignored. With an async I/O engine and an I/O depth > 1, it is possible for
+	the same block to be overwritten, which can cause verification errors. Either
+	do not use norandommap in this case, or also use the lfsr random generator.
 
 .. option:: softrandommap=bool
 
@@ -1429,7 +1514,7 @@ Block size
 	If you want a workload that has 50% 2k reads and 50% 4k reads, while having
 	90% 4k writes and 10% 8k writes, you would specify::
 
-		bssplit=2k/50:4k/50,4k/90,8k/10
+		bssplit=2k/50:4k/50,4k/90:8k/10
 
 	Fio supports defining up to 64 different weights for each data direction.
 
@@ -1746,7 +1831,8 @@ I/O engine
 		ioctl, or if the target is an sg character device we use
 		:manpage:`read(2)` and :manpage:`write(2)` for asynchronous I/O.
 		Requires :option:`filename` option to specify either block or
-		character devices.
+		character devices. This engine supports trim operations.
+		The sg engine includes engine specific options.
 
 	**null**
 		Doesn't transfer any data, just pretends to. This is mainly used to
@@ -1820,6 +1906,15 @@ I/O engine
 		(RBD) via librbd without the need to use the kernel rbd driver. This
 		ioengine defines engine specific options.
 
+	**http**
+		I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+		a WebDAV or S3 endpoint. This ioengine defines engine specific options.
+
+		This engine only supports direct I/O with iodepth=1; you need to scale
+		this via numjobs. blocksize defines the size of the objects to be created.
+
+		TRIM is translated to object deletion.
+
 	**gfapi**
 		Using GlusterFS libgfapi sync interface to direct access to GlusterFS
 		volumes without having to go through FUSE. This ioengine
@@ -1853,12 +1948,12 @@ I/O engine
 
 	**pmemblk**
 		Read and write using filesystem DAX to a file on a filesystem
-		mounted with DAX on a persistent memory device through the NVML
+		mounted with DAX on a persistent memory device through the PMDK
 		libpmemblk library.
 
 	**dev-dax**
 		Read and write using device DAX to a persistent memory device (e.g.,
-		/dev/dax0.0) through the NVML libpmem library.
+		/dev/dax0.0) through the PMDK libpmem library.
 
 	**external**
 		Prefix to specify loading an external I/O engine object file. Append
@@ -1874,9 +1969,25 @@ I/O engine
 
 	**libpmem**
 		Read and write using mmap I/O to a file on a filesystem
-		mounted with DAX on a persistent memory device through the NVML
+		mounted with DAX on a persistent memory device through the PMDK
 		libpmem library.
 
+	**ime_psync**
+		Synchronous read and write using DDN's Infinite Memory Engine (IME).
+		This engine is very basic and issues calls to IME whenever an IO is
+		queued.
+
+	**ime_psyncv**
+		Synchronous read and write using DDN's Infinite Memory Engine (IME).
+		This engine uses iovecs and will try to stack as many I/Os as possible
+		(if the I/Os are "contiguous" and the I/O depth is not exceeded)
+		before issuing a call to IME.
+
+	**ime_aio**
+		Asynchronous read and write using DDN's Infinite Memory Engine (IME).
+		This engine will try to stack as many I/Os as possible by creating
+		requests for IME. Fio will then decide when to commit these requests.
+
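A minimal sketch of how the sg engine's trim support noted above might be
exercised, assuming a placeholder SCSI generic device at /dev/sg1::

	; hypothetical sg trim job -- the device path is a placeholder
	[sg-trim]
	ioengine=sg
	filename=/dev/sg1
	rw=trim
	bs=64k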
 
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2068,6 +2179,86 @@ with the caveat that when used on the command line, they must come after the
 	multiple paths exist between the client and the server or in certain loopback
 	configurations.
 
+.. option:: readfua=bool : [sg]
+
+	With the readfua option set to 1, read operations include
+	the force unit access (fua) flag. Default is 0.
+
+.. option:: writefua=bool : [sg]
+
+	With the writefua option set to 1, write operations include
+	the force unit access (fua) flag. Default is 0.
+
+.. option:: sg_write_mode=str : [sg]
+
+	Specify the type of write commands to issue. This option can take three values:
+
+	**write**
+		This is the default where write opcodes are issued as usual.
+	**verify**
+		Issue WRITE AND VERIFY commands. The BYTCHK bit is set to 0. This
+		directs the device to carry out a medium verification with no data
+		comparison. The writefua option is ignored with this selection.
+	**same**
+		Issue WRITE SAME commands. This transfers a single block to the device
+		and writes this same block of data to a contiguous sequence of LBAs
+		beginning at the specified offset. fio's block size parameter specifies
+		the amount of data written with each command. However, the amount of data
+		actually transferred to the device is equal to the device's block
+		(sector) size. For a device with 512 byte sectors, blocksize=8k will
+		write 16 sectors with each command. fio will still generate 8k of data
+		for each command but only the first 512 bytes will be used and
+		transferred to the device. The writefua option is ignored with this
+		selection.
+
+.. option:: http_host=str : [http]
+
+	Hostname to connect to. For S3, this could be the bucket hostname.
+	Default is **localhost**.
+
+.. option:: http_user=str : [http]
+
+	Username for HTTP authentication.
+
+.. option:: http_pass=str : [http]
+
+	Password for HTTP authentication.
+
+.. option:: https=str : [http]
+
+	Enable HTTPS instead of http. *on* enables HTTPS; *insecure*
+	will enable HTTPS, but disable SSL peer verification (use with
+	caution!). Default is **off**.
+
+.. option:: http_mode=str : [http]
+
+	Which HTTP access mode to use: *webdav*, *swift*, or *s3*.
+	Default is **webdav**.
+
+.. option:: http_s3_region=str : [http]
+
+	The S3 region/zone string.
+	Default is **us-east-1**.
+
+.. option:: http_s3_key=str : [http]
+
+	The S3 secret key.
+
+.. option:: http_s3_keyid=str : [http]
+
+	The S3 key/access id.
+
+.. option:: http_swift_auth_token=str : [http]
+
+	The Swift auth token. See the example configuration file on how
+	to retrieve this.
+
+.. option:: http_verbose=int : [http]
+
+	Enable verbose requests from libcurl. Useful for debugging. 1
+	turns on verbose logging from libcurl, 2 additionally enables
+	HTTP IO tracing. Default is **0**.
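A minimal sketch combining the http engine options above for an S3 target;
the hostname, region, credentials, and sizes are placeholders and must be
replaced with real values::

	; hypothetical S3 job -- host, region and keys are placeholders
	[s3-put]
	ioengine=http
	rw=write
	bs=4m
	size=64m
	iodepth=1
	numjobs=4
	https=on
	http_mode=s3
	http_host=mybucket.s3.amazonaws.com
	http_s3_region=us-east-1
	http_s3_keyid=AKIAEXAMPLE
	http_s3_key=mysecretkey

As documented above, the engine itself is limited to iodepth=1, so any
parallelism has to come from numjobs, and the block size determines the size
of the objects that are created.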
 
 I/O depth
 ~~~~~~~~~
 
@@ -2289,6 +2480,16 @@ I/O replay
 	:manpage:`blktrace(8)` for how to capture such logging data. For blktrace
 	replay, the file needs to be turned into a blkparse binary data file first
 	(``blkparse -o /dev/null -d file_for_fio.bin``).
+	You can specify a number of files by separating the names with a ':'
+	character. See the :option:`filename` option for information on how to
+	escape ':' and '\' characters within the file names. These files will
+	be sequentially assigned to job clones created by :option:`numjobs`.
+
+.. option:: read_iolog_chunked=bool
+
+	Determines how the iolog is read. If false (the default), the entire
+	:option:`read_iolog` will be read at once. If true, the iolog will be read
+	gradually. Useful when the iolog is very large, or is being generated.
 
 .. option:: replay_no_stall=bool
 
@@ -2299,6 +2500,14 @@ I/O replay
 	still respecting ordering. The result is the same I/O pattern to a given
 	device, but different timings.
 
+.. option:: replay_time_scale=int
+
+	When replaying I/O with :option:`read_iolog`, fio will honor the
+	original timing in the trace. With this option, it's possible to scale
+	the time. It's a percentage option; if set to 50 it means run at 50% of
+	the original I/O rate in the trace. If set to 200, run at twice the
+	original I/O rate. Defaults to 100.
+
 .. option:: replay_redirect=str
 
 	While replaying I/O patterns using :option:`read_iolog` the default behavior
@@ -2326,6 +2535,14 @@ I/O replay
 
 	Scale sector offsets down by this factor when replaying traces.
 
+.. option:: replay_skip=str
+
+	Sometimes it's useful to skip certain I/O types in a replay trace.
+	This could be, for instance, eliminating the writes in the trace, or
+	not replaying the trims/discards if you are redirecting to a device
+	that doesn't support them. This option takes a comma separated list
+	of read, write, trim, and sync.
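A minimal sketch of a trace replay combining the options above, assuming a
placeholder blkparse binary trace file and a placeholder target device::

	; hypothetical replay job -- trace.bin and /dev/sdz are placeholders
	[replay]
	read_iolog=trace.bin
	replay_redirect=/dev/sdz
	replay_time_scale=50
	replay_skip=trim,sync

Here the trace is replayed at half of its original I/O rate, and any trims
and syncs in the trace are skipped.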
 
 
 Threads, processes and job synchronization
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2365,24 +2582,27 @@ Threads, processes and job synchronization
 
 	Set the I/O priority class. See man :manpage:`ionice(1)`.
 
-.. option:: cpumask=int
-
-	Set the CPU affinity of this job. The parameter given is a bit mask of
-	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
-	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
-	:manpage:`sched_setaffinity(2)`. This may not work on all supported
-	operating systems or kernel versions. This option doesn't work well for a
-	higher CPU count than what you can store in an integer mask, so it can only
-	control cpus 1-32. For boxes with larger CPU counts, use
-	:option:`cpus_allowed`.
-
 .. option:: cpus_allowed=str
 
 	Controls the same options as :option:`cpumask`, but accepts a textual
-	specification of the permitted CPUs instead. So to use CPUs 1 and 5 you
-	would specify ``cpus_allowed=1,5``. This option also allows a range of CPUs
-	to be specified -- say you wanted a binding to CPUs 1, 5, and 8 to 15, you
-	would set ``cpus_allowed=1,5,8-15``.
+	specification of the permitted CPUs instead and CPUs are indexed from 0. So
+	to use CPUs 0 and 5 you would specify ``cpus_allowed=0,5``. This option also
+	allows a range of CPUs to be specified -- say you wanted a binding to CPUs
+	0, 5, and 8 to 15, you would set ``cpus_allowed=0,5,8-15``.
+
+	On Windows, when ``cpus_allowed`` is unset only CPUs from fio's current
+	processor group will be used and affinity settings are inherited from the
+	system. An fio build configured to target Windows 7 makes options that set
+	CPUs processor group aware and values will set both the processor group
+	and a CPU from within that group. For example, on a system where processor
+	group 0 has 40 CPUs and processor group 1 has 32 CPUs, ``cpus_allowed``
+	values between 0 and 39 will bind CPUs from processor group 0 and
+	``cpus_allowed`` values between 40 and 71 will bind CPUs from processor
+	group 1. When using ``cpus_allowed_policy=shared`` all CPUs specified by a
+	single ``cpus_allowed`` option must be from the same processor group. For
+	Windows fio builds not built for Windows 7, CPUs will only be selected from
+	(and be relative to) whatever processor group fio happens to be running in
+	and CPUs from other processor groups cannot be used.
 
 .. option:: cpus_allowed_policy=str
 
@@ -2399,6 +2619,17 @@ Threads, processes and job synchronization
 	enough CPUs are given for the jobs listed, then fio will roundrobin the CPUs in
 	the set.
 
+.. option:: cpumask=int
+
+	Set the CPU affinity of this job. The parameter given is a bit mask of
+	allowed CPUs the job may run on. So if you want the allowed CPUs to be 1
+	and 5, you would pass the decimal value of (1 << 1 | 1 << 5), or 34. See man
+	:manpage:`sched_setaffinity(2)`. This may not work on all supported
+	operating systems or kernel versions. This option doesn't work well for a
+	higher CPU count than what you can store in an integer mask, so it can only
+	control cpus 1-32. For boxes with larger CPU counts, use
+	:option:`cpus_allowed`.
+
 .. option:: numa_cpu_nodes=str
 
 	Set this job running on specified NUMA nodes' CPUs. The arguments allow
@@ -2601,17 +2832,10 @@ Verification
 	previously written file. If the data direction includes any form of write,
 	the verify will be of the newly written data.
 
-.. option:: verifysort=bool
-
-	If true, fio will sort written verify blocks when it deems it faster to read
-	them back in a sorted manner. This is often the case when overwriting an
-	existing file, since the blocks are already laid out in the file system. You
-	can ignore this option unless doing huge amounts of really fast I/O where
-	the red-black tree sorting CPU time becomes significant. Default: true.
-
-.. option:: verifysort_nr=int
-
-	Pre-load and sort verify blocks for a read workload.
+	To avoid false verification errors, do not use the norandommap option when
+	verifying data with async I/O engines and I/O depths > 1. Or use the
+	norandommap and the lfsr random generator together to avoid writing to the
+	same offset with multiple outstanding I/Os.
 
 .. option:: verify_offset=int
 
@@ -2849,9 +3073,11 @@ Measurements and reporting
 .. option:: write_iops_log=str
 
 	Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
-	:file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
-	details about the filename format and `Log File Formats`_ for how data
-	is structured within the file.
+	:file:`name_iops.x.log`) instead. Because fio defaults to individual
+	I/O logging, the value entry in the IOPS log will be 1 unless windowed
+	logging (see :option:`log_avg_msec`) has been enabled. See
+	:option:`write_bw_log` for details about the filename format and `Log
+	File Formats`_ for how data is structured within the file.
 
 .. option:: log_avg_msec=int
 
@@ -2909,7 +3135,8 @@ Measurements and reporting
 
 	Define the set of CPUs that are allowed to handle online log compression
 	for the I/O jobs. This can provide better isolation between performance
-	sensitive jobs, and background compression work.
+	sensitive jobs, and background compression work. See
+	:option:`cpus_allowed` for the format used.
 
 .. option:: log_store_compressed=bool
 
@@ -3735,17 +3962,16 @@ on the type of log, it will be one of the following:
 	**2**
 		I/O is a TRIM
 
-The entry's *block size* is always in bytes. The *offset* is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's *block size* is always in bytes. 
The *offset* is the position in bytes +from the start of the file for that particular I/O. The logging of the offset can be toggled with :option:`log_offset`. -Fio defaults to logging every individual I/O. When IOPS are logged for individual -I/Os the *value* entry will always be 1. If windowed logging is enabled through -:option:`log_avg_msec`, fio logs the average values over the specified period of time. -If windowed logging is enabled and :option:`log_max_value` is set, then fio logs -maximum values in that window instead of averages. Since *data direction*, *block -size* and *offset* are per-I/O values, if windowed logging is enabled they -aren't applicable and will be 0. +Fio defaults to logging every individual I/O but when windowed logging is set +through :option:`log_avg_msec`, either the average (by default) or the maximum +(:option:`log_max_value` is set) *value* seen over the specified period of time +is recorded. Each *data direction* seen within the window period will aggregate +its values in a separate row. Further, when using windowed logging the *block +size* and *offset* entries will always contain 0. Client/Server -------------