Revert "smalloc: smalloc() already clears memory, scalloc() need not do it again"

diff --git a/HOWTO.rst b/HOWTO.rst

index 847c035637226aaf32c2f6b37cfef37dc6c62b73..9eeb203e58277087f0699b917a0916f1be83595c 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -756,8 +756,9 @@ Time related parameters
         CPU mask of other jobs.
  
  .. option:: job_start_clock_id=int
-   The clock_id passed to the call to `clock_gettime` used to record job_start
-   in the `json` output format. Default is 0, or CLOCK_REALTIME.
+
+        The clock_id passed to the call to `clock_gettime` used to record
+        job_start in the `json` output format. Default is 0, or CLOCK_REALTIME.
  
  
  Target file/device
@@ -970,13 +971,13 @@ Target file/device
  
  .. option:: unlink=bool
  
-       Unlink the job files when done. Not the default, as repeated runs of that
+       Unlink (delete) the job files when done. Not the default, as repeated runs of that
         job would then waste time recreating the file set again and again. Default:
         false.
  
  .. option:: unlink_each_loop=bool
  
-       Unlink job files after each iteration or loop.  Default: false.
+       Unlink (delete) job files after each iteration or loop.  Default: false.
  
  .. option:: zonemode=str
  
@@ -984,14 +985,14 @@ Target file/device
  
                 **none**
                                 The :option:`zonerange`, :option:`zonesize`,
-                               :option `zonecapacity` and option:`zoneskip`
+                               :option:`zonecapacity` and :option:`zoneskip`
                                 parameters are ignored.
                 **strided**
                                 I/O happens in a single zone until
                                 :option:`zonesize` bytes have been transferred.
                                 After that number of bytes has been
                                 transferred processing of the next zone
-                               starts. :option `zonecapacity` is ignored.
+                               starts. :option:`zonecapacity` is ignored.
                 **zbd**
                                 Zoned block device mode. I/O happens
                                 sequentially in each zone, even if random I/O
@@ -1630,7 +1631,7 @@ Block size
         Comma-separated ranges may be specified for reads, writes, and trims as
         described in :option:`blocksize`.
  
-       Example: ``bsrange=1k-4k,2k-8k``.
+       Example: ``bsrange=1k-4k,2k-8k``, or with the ':' delimiter, ``bsrange=1k:4k,2k:8k``.
  
  .. option:: bssplit=str[,str][,str]
  
@@ -1991,7 +1992,9 @@ I/O engine
  
  .. option:: ioengine=str
  
-       Defines how the job issues I/O to the file. The following types are defined:
+       fio supports two kinds of performance measurement: I/O and file/directory operations.
+
+       I/O engines define how the job issues I/O to the file. The following types are defined:
  
                 **sync**
                         Basic :manpage:`read(2)` or :manpage:`write(2)`
@@ -2176,21 +2179,6 @@ I/O engine
                         absolute or relative. See :file:`engines/skeleton_external.c` for
                         details of writing an external I/O engine.
  
-               **filecreate**
-                       Simply create the files and do no I/O to them.  You still need to
-                       set  `filesize` so that all the accounting still occurs, but no
-                       actual I/O will be done other than creating the file.
-
-               **filestat**
-                       Simply do stat() and do no I/O to the file. You need to set 'filesize'
-                       and 'nrfiles', so that files will be created.
-                       This engine is to measure file lookup and meta data access.
-
-               **filedelete**
-                       Simply delete the files by unlink() and do no I/O to them. You need to set 'filesize'
-                       and 'nrfiles', so that the files will be created.
-                       This engine is to measure file delete.
-
                 **libpmem**
                         Read and write using mmap I/O to a file on a filesystem
                         mounted with DAX on a persistent memory device through the PMDK
@@ -2260,6 +2248,50 @@ I/O engine
                         several instances to access the same device or file
                         simultaneously, but allow it for threads.
  
+       File/directory operation engines define how the job operates on files or directories. The
+       following types are defined:
+
+               **filecreate**
+                       Simply create the files and do no I/O to them.  You still need to
+                       set  `filesize` so that all the accounting still occurs, but no
+                       actual I/O will be done other than creating the file.
+                       Example job file: filecreate-ioengine.fio.
+
+               **filestat**
+                       Simply do stat() and do no I/O to the file. You need to set 'filesize'
+                       and 'nrfiles', so that files will be created.
+                       This engine is to measure file lookup and meta data access.
+                       Example job file: filestat-ioengine.fio.
+
+               **filedelete**
+                       Simply delete the files by unlink() and do no I/O to them. You need to set 'filesize'
+                       and 'nrfiles', so that the files will be created.
+                       This engine is to measure file delete.
+                       Example job file: filedelete-ioengine.fio.
+
+               **dircreate**
+                       Simply create the directories and do no I/O to them.  You still need to
+                       set  `filesize` so that all the accounting still occurs, but no
+                       actual I/O will be done other than creating the directories.
+                       Example job file: dircreate-ioengine.fio.
+
+               **dirstat**
+                       Simply do stat() and do no I/O to the directories. You need to set 'filesize'
+                       and 'nrfiles', so that directories will be created.
+                       This engine is to measure directory lookup and meta data access.
+                       Example job file: dirstat-ioengine.fio.
+
+               **dirdelete**
+                       Simply delete the directories by rmdir() and do no I/O to them. You need to set 'filesize'
+                       and 'nrfiles', so that the directories will be created.
+                       This engine is to measure directory delete.
+                       Example job file: dirdelete-ioengine.fio.
+
+               For file and directory operation engines there is no I/O throughput, so the
+               statistics in the report have different meanings. The meaningful output metrics are 'iops' and 'clat';
+               'bw' is meaningless. Refer to the section "Interpreting the output" for more details.
+
+
  I/O engine specific parameters
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
@@ -2468,7 +2500,24 @@ with the caveat that when used on the command line, they must come after the
  
         Enable Flexible Data Placement mode for write commands.
  
-.. option:: fdp_pli_select=str : [io_uring_cmd] [xnvme]
+.. option:: dataplacement=str : [io_uring_cmd] [xnvme]
+
+        Specifies the data placement directive type to use for write commands.
+        The following types are supported:
+
+                **none**
+                        Do not use a data placement directive. This is the
+                        default.
+
+                **fdp**
+                        Use Flexible Data Placement directives for write
+                        commands. This is equivalent to specifying
+                        :option:`fdp`\ =1.
+
+               **streams**
+                        Use Streams directives for write commands.
+
+.. option:: plid_select=str, fdp_pli_select=str : [io_uring_cmd] [xnvme]
  
         Defines how fio decides which placement ID to use next. The following
         types are defined:
@@ -2480,22 +2529,52 @@ with the caveat that when used on the command line, they must come after the
                         Round robin over available placement IDs. This is the
                         default.
  
-       The available placement ID index/indices is defined by the option
-       :option:`fdp_pli`.
+               **scheme**
+                       Choose a placement ID (index) based on the scheme file defined by
+                       the option :option:`dp_scheme`.
  
-.. option:: fdp_pli=str : [io_uring_cmd] [xnvme]
+       The available placement IDs (or indices) are defined by the option :option:`fdp_pli`
+       or :option:`plids`, except in the case of **scheme**.
  
-       Select which Placement ID Index/Indicies this job is allowed to use for
-       writes. By default, the job will cycle through all available Placement
-        IDs, so use this to isolate these identifiers to specific jobs. If you
-        want fio to use placement identifier only at indices 0, 2 and 5 specify
-        ``fdp_pli=0,2,5``.
+.. option:: plids=str, fdp_pli=str : [io_uring_cmd] [xnvme]
  
-.. option:: md_per_io_size=int : [io_uring_cmd]
+        Select which Placement ID Indices (FDP) or Placement IDs (streams) this
+        job is allowed to use for writes. This option accepts a comma-separated
+        list of values or ranges (e.g., 1,2-4,5,6-8).
+
+        For FDP, the job will by default cycle through all available Placement
+        IDs, so use this option to be selective. The values specified here are
+        array indices for the list of placement IDs returned by the nvme-cli
+        command ``nvme fdp status``. If you want fio to use FDP placement
+        identifiers only at indices 0, 2 and 5, set ``plids=0,2,5``.
+
+        For streams this should be a list of Stream IDs.
+
+.. option:: dp_scheme=str : [io_uring_cmd] [xnvme]
+
+       Defines which placement ID (index) is selected based on the offset (LBA) range.
+       The file should contain one or more scheme entries in the following format:
+
+               0, 10737418240, 0
+               10737418240, 21474836480, 1
+               21474836480, 32212254720, 2
+               ...
+
+       Each line is a scheme entry containing a start offset, an end offset, and a placement
+       ID (index), separated by commas. If the write offset falls within the range of a
+       scheme entry (start offset ≤ offset < end offset), the corresponding placement ID
+       (index) is selected. If the write offset falls within multiple scheme entries,
+       the first matching entry is applied. If the offset is not within the range of any
+       scheme entry, the dspec field is set to 0, the default RUH. (Caution: when a job
+       uses multiple devices, all devices of the job are affected by the scheme. If
+       this option is specified, the options :option:`plids` and :option:`fdp_pli` are
+       ignored.)
+
+.. option:: md_per_io_size=int : [io_uring_cmd] [xnvme]
  
         Size in bytes for separate metadata buffer per IO. Default: 0.
  
-.. option:: pi_act=int : [io_uring_cmd]
+.. option:: pi_act=int : [io_uring_cmd] [xnvme]
  
         Action to take when nvme namespace is formatted with protection
         information. If this is set to 1 and namespace is formatted with
@@ -2511,7 +2590,7 @@ with the caveat that when used on the command line, they must come after the
         it will use the default slower generator.
         (see: https://github.com/intel/isa-l)
  
-.. option:: pi_chk=str[,str][,str] : [io_uring_cmd]
+.. option:: pi_chk=str[,str][,str] : [io_uring_cmd] [xnvme]
  
         Controls the protection information check. This can take one or more
         of these values. Default: none.
@@ -2524,16 +2603,25 @@ with the caveat that when used on the command line, they must come after the
         **APPTAG**
                 Enables protection information checking of application tag field.
  
-.. option:: apptag=int : [io_uring_cmd]
+.. option:: apptag=int : [io_uring_cmd] [xnvme]
  
         Specifies logical block application tag value, if namespace is
         formatted to use end to end protection information. Default: 0x1234.
  
-.. option:: apptag_mask=int : [io_uring_cmd]
+.. option:: apptag_mask=int : [io_uring_cmd] [xnvme]
  
         Specifies logical block application tag mask value, if namespace is
         formatted to use end to end protection information. Default: 0xffff.
  
+.. option:: num_range=int : [io_uring_cmd]
+
+       For trim commands, this is the number of ranges to trim per I/O
+       request. The number of logical blocks per range is determined by the
+       :option:`bs` option, which should be a multiple of the logical block size.
+       This cannot be used with read or write. Note that when this option is
+       set to a value greater than 1, :option:`log_offset` will not be able to
+       log all the offsets. Default: 1.
+
  .. option:: cpuload=int : [cpuio]
  
         Attempt to use the specified percentage of CPU cycles. This is a mandatory
@@ -2626,10 +2714,13 @@ with the caveat that when used on the command line, they must come after the
                 User datagram protocol V6.
         **unix**
                 UNIX domain socket.
+       **vsock**
+               VSOCK protocol.
  
-       When the protocol is TCP or UDP, the port must also be given, as well as the
-       hostname if the job is a TCP listener or UDP reader. For unix sockets, the
+       When the protocol is TCP, UDP or VSOCK, the port must also be given, as well as the
+       hostname if the job is a TCP or VSOCK listener or UDP reader. For unix sockets, the
         normal :option:`filename` option should be used and the port is invalid.
+       When the protocol is VSOCK, the :option:`hostname` is the CID of the remote VM.
  
  .. option:: listen : [netsplice] [net]
  
@@ -2761,19 +2852,35 @@ with the caveat that when used on the command line, they must come after the
         Specify stat system call type to measure lookup/getattr performance.
         Default is **stat** for :manpage:`stat(2)`.
  
-.. option:: readfua=bool : [sg]
+.. option:: readfua=bool : [sg] [io_uring_cmd]
  
         With readfua option set to 1, read operations include
         the force unit access (fua) flag. Default is 0.
  
-.. option:: writefua=bool : [sg]
+.. option:: writefua=bool : [sg] [io_uring_cmd]
  
         With writefua option set to 1, write operations include
         the force unit access (fua) flag. Default is 0.
  
+.. option:: write_mode=str : [io_uring_cmd]
+
+        Specifies the type of write operation.  Defaults to 'write'.
+
+                **write**
+                        Use Write commands for write operations
+
+                **uncor**
+                        Use Write Uncorrectable commands for write operations
+
+                **zeroes**
+                        Use Write Zeroes commands for write operations
+
+                **verify**
+                        Use Verify commands for write operations
+
  .. option:: sg_write_mode=str : [sg]
  
-       Specify the type of write commands to issue. This option can take three values:
+       Specify the type of write commands to issue. This option can take ten values:
  
         **write**
                 This is the default where write opcodes are issued as usual.
@@ -3310,8 +3417,8 @@ I/O rate
  
  .. option:: rate_cycle=int
  
-       Average bandwidth for :option:`rate` and :option:`rate_min` over this number
-       of milliseconds. Defaults to 1000.
+        Average bandwidth for :option:`rate_min` and :option:`rate_iops_min`
+        over this number of milliseconds. Defaults to 1000.
  
  
  I/O latency
@@ -3984,12 +4091,12 @@ Measurements and reporting
         same reporting group, unless if separated by a :option:`stonewall`, or by
         using :option:`new_group`.
  
-    NOTE: When :option: `group_reporting` is used along with `json` output,
-    there are certain per-job properties which can be different between jobs
-    but do not have a natural group-level equivalent. Examples include
-    `kb_base`, `unit_base`, `sig_figs`, `thread_number`, `pid`, and
-    `job_start`. For these properties, the values for the first job are
-    recorded for the group.
+       NOTE: When :option:`group_reporting` is used along with `json` output,
+       there are certain per-job properties which can be different between jobs
+       but do not have a natural group-level equivalent. Examples include
+       `kb_base`, `unit_base`, `sig_figs`, `thread_number`, `pid`, and
+       `job_start`. For these properties, the values for the first job are
+       recorded for the group.
  
  .. option:: new_group
  
@@ -4063,12 +4170,15 @@ Measurements and reporting
  
  .. option:: log_avg_msec=int
  
-       By default, fio will log an entry in the iops, latency, or bw log for every
-       I/O that completes. When writing to the disk log, that can quickly grow to a
-       very large size. Setting this option makes fio average the each log entry
-       over the specified period of time, reducing the resolution of the log.  See
-       :option:`log_max_value` as well. Defaults to 0, logging all entries.
-       Also see `Log File Formats`_.
+        By default, fio will log an entry in the iops, latency, or bw log for
+        every I/O that completes. When writing to the disk log, that can
+        quickly grow to a very large size. Setting this option directs fio to
+        instead record an average over the specified duration for each log
+        entry, reducing the resolution of the log. When the job completes, fio
+        will flush any accumulated latency log data, so the final log interval
+        may not match the value specified by this option and there may even be
+        duplicate timestamps. See :option:`log_window_value` as well. Defaults
+        to 0, logging entries for each I/O. Also see `Log File Formats`_.
  
  .. option:: log_hist_msec=int
  
@@ -4088,11 +4198,28 @@ Measurements and reporting
         histogram logs contain 1216 latency bins. See :option:`write_hist_log`
         and `Log File Formats`_.
  
-.. option:: log_max_value=bool
+.. option:: log_window_value=str, log_max_value=str
+
+       If :option:`log_avg_msec` is set, fio by default logs the average over that
+       window. This option determines whether fio logs the average, the maximum, or
+       both values over the window. This only affects latency logging,
+       as the average and maximum values for the iops or bw logs will be the same.
+       Accepted values are:
  
-       If :option:`log_avg_msec` is set, fio logs the average over that window. If
-       you instead want to log the maximum value, set this option to 1. Defaults to
-       0, meaning that averaged values are logged.
+               **avg**
+                       Log average value over the window. The default.
+
+               **max**
+                       Log maximum value in the window.
+
+               **both**
+                       Log both average and maximum value over the window.
+
+               **0**
+                       Backward-compatible alias for **avg**.
+
+               **1**
+                       Backward-compatible alias for **max**.
  
  .. option:: log_offset=bool
  
@@ -4535,6 +4662,21 @@ writes in the example above).  In the order listed, they denote:
                  commit if available) functions were completed to when the I/O's
                  completion was reaped by fio.
  
+               For file and directory operation engines, **clat** denotes the time
+               to complete one file or directory operation.
+
+                 **filecreate engine**: the time cost to create a new file
+
+                 **filestat engine**:   the time cost to look up an existing file
+
+                 **filedelete engine**: the time cost to delete a file
+
+                 **dircreate engine**: the time cost to create a new directory
+
+                 **dirstat engine**:   the time cost to look up an existing directory
+
+                 **dirdelete engine**: the time cost to delete a directory
+
  **lat**
                 Total latency. Same names as slat and clat, this denotes the time from
                 when fio created the I/O unit to completion of the I/O operation.
@@ -4553,12 +4695,30 @@ writes in the example above).  In the order listed, they denote:
                 are on the same disk, since they are then competing for disk
                 access.
  
+               For file and directory operation engines, **bw** is meaningless.
+
  **iops**
                 IOPS statistics based on measurements from discrete intervals.
                 For details see the description for bw above. See
                 :option:`iopsavgtime` to control the duration of the intervals.
                 Same values reported here as for bw except for percentage.
  
+               For file and directory operation engines, **iops** is the most
+               fundamental performance metric.
+               It denotes how many files or directories can be operated on per second.
+
+                 **filecreate engine**: number of files that can be created per second
+
+                 **filestat engine**:   number of files that can be looked up per second
+
+                 **filedelete engine**: number of files that can be deleted per second
+
+                 **dircreate engine**:  number of directories that can be created per second
+
+                 **dirstat engine**:    number of directories that can be looked up per second
+
+                 **dirdelete engine**:  number of directories that can be deleted per second
+
  **lat (nsec/usec/msec)**
                 The distribution of I/O completion latencies. This is the time from when
                 I/O leaves fio and when it gets completed. Unlike the separate
@@ -5061,11 +5221,19 @@ toggled with :option:`log_offset`.
  by the ioengine specific :option:`cmdprio_percentage`.
  
  Fio defaults to logging every individual I/O but when windowed logging is set
-through :option:`log_avg_msec`, either the average (by default) or the maximum
-(:option:`log_max_value` is set) *value* seen over the specified period of time
-is recorded. Each *data direction* seen within the window period will aggregate
-its values in a separate row. Further, when using windowed logging the *block
-size* and *offset* entries will always contain 0.
+through :option:`log_avg_msec`, either the average (by default), the maximum
+(:option:`log_window_value` is set to max) *value* seen over the specified period
+of time, or both the average *value* and maximum *value1* (:option:`log_window_value`
+is set to both) is recorded. The log file format when both values are reported
+takes this form:
+
+    *time* (`msec`), *value*, *value1*, *data direction*, *block size* (`bytes`),
+    *offset* (`bytes`), *command priority*
+
+
+Each *data direction* seen within the window period will aggregate its values in a
+separate row. Further, when using windowed logging the *block size* and *offset*
+entries will always contain 0.
  
  
  Client/Server