Merge branch 'atomic-writes'

diff --git a/HOWTO.rst b/HOWTO.rst
index 56951f13075d45498186588e18942d2b2e3b719d..4f071484bdc3fa7b02cc4dd51f87ec3ef97a601d 100644
--- a/HOWTO.rst
+++ b/HOWTO.rst
@@ -805,6 +805,11 @@ Target file/device
         Note: Windows and FreeBSD (refer to geom(4)) prevent write access to areas
         of the disk containing in-use data (e.g. filesystems).
  
+       For HTTP and S3 access, specify a valid URL path or S3 key, respectively.
+       A filename for path-style S3 includes a bucket name (:file:`/bucket/k/e.y`)
+       while a virtual-hosted-style S3 filename :file:`/k/e.y` does not because 
+       its bucket name is specified in :option:`http_host`.
+
         The filename "`-`" is a reserved name, meaning *stdin* or *stdout*.  Which
         of the two depends on the read/write direction set.
  
@@ -2496,6 +2501,20 @@ with the caveat that when used on the command line, they must come after the
         For direct I/O, requests will only succeed if cache invalidation isn't required,
         file blocks are fully allocated and the disk request could be issued immediately.
  
+.. option:: atomic=bool : [pvsync2] [libaio] [io_uring]
+
+       When enabled, writes are issued with torn-write protection: after a power
+       failure or kernel crash, either all or none of the data from the write is
+       stored, never a mix of old and new data. Torn-write protection is also
+       known as atomic writes.
+
+       This option sets the RWF_ATOMIC flag (supported from the 6.11 Linux kernel) on
+       a per-IO basis.
+
+       Writes with RWF_ATOMIC set will be rejected by the kernel when the file does
+       not support torn-write protection. To learn a file's torn-write limits, issue
+       statx with STATX_WRITE_ATOMIC.
+
  .. option:: fdp=bool : [io_uring_cmd] [xnvme]
  
         Enable Flexible Data Placement mode for write commands.
@@ -2529,17 +2548,46 @@ with the caveat that when used on the command line, they must come after the
                         Round robin over available placement IDs. This is the
                         default.
  
-       The available placement ID (indices) are defined by the option
-       :option:`plids`.
+               **scheme**
+                       Choose a placement ID (index) based on the scheme file defined by
+                       the option :option:`dp_scheme`.
+
+       The available placement ID (indices) are defined by the option :option:`fdp_pli`
+       or :option:`plids` except for the case of **scheme**.
  
  .. option:: plids=str, fdp_pli=str : [io_uring_cmd] [xnvme]
  
-        Select which Placement IDs (streams) or Placement ID Indices (FDP) this
-        job is allowed to use for writes. For FDP by default, the job will
-        cycle through all available Placement IDs, so use this to isolate these
-        identifiers to specific jobs. If you want fio to use FDP placement
-        identifiers only at indices 0, 2 and 5 specify ``plids=0,2,5``. For
-        streams this should be a comma-separated list of Stream IDs.
+        Select which Placement ID Indices (FDP) or Placement IDs (streams) this
+        job is allowed to use for writes. This option accepts a comma-separated
+        list of values or ranges (e.g., 1,2-4,5,6-8).
+
+        For FDP by default, the job will cycle through all available Placement
+        IDs, so use this option to be selective. The values specified here are
+        array indices for the list of placement IDs returned by the nvme-cli
+        command ``nvme fdp status``. If you want fio to use FDP placement
+        identifiers only at indices 0, 2 and 5, set ``plids=0,2,5``.
+
+        For streams this should be a list of Stream IDs.
+
+.. option:: dp_scheme=str : [io_uring_cmd] [xnvme]
+
+       Defines which placement ID (index) is selected based on the offset (LBA)
+       range. The file should contain one or more scheme entries in this format:
+
+               0, 10737418240, 0
+               10737418240, 21474836480, 1
+               21474836480, 32212254720, 2
+               ...
+
+       Each line, a scheme entry, contains a start offset, an end offset, and a
+       placement ID (index), separated by commas. If the write offset falls within
+       the range of a scheme entry (start offset ≤ offset < end offset), the
+       corresponding placement ID (index) is selected. If the write offset matches
+       multiple scheme entries, the first matching entry is applied. If the offset
+       is not within the range of any scheme entry, the dspec field is set to 0,
+       the default RUH. (Caution: if a job uses multiple devices, the scheme applies
+       to all of them. If this option is specified, the option :option:`plids` or
+       :option:`fdp_pli` is ignored.)
  
  .. option:: md_per_io_size=int : [io_uring_cmd] [xnvme]
  
@@ -2633,7 +2681,7 @@ with the caveat that when used on the command line, they must come after the
                 this will be the starting port number since fio will use a range of
                 ports.
  
-   [rdma], [librpma_*]
+   [rdma]
  
                 The port to use for RDMA-CM communication. This should be the same value
                 on the client and the server side.
@@ -2644,20 +2692,6 @@ with the caveat that when used on the command line, they must come after the
         is a TCP listener or UDP reader, the hostname is not used and must be omitted
         unless it is a valid UDP multicast address.
  
-.. option:: serverip=str : [librpma_*]
-
-       The IP address to be used for RDMA-CM based I/O.
-
-.. option:: direct_write_to_pmem=bool : [librpma_*]
-
-       Set to 1 only when Direct Write to PMem from the remote host is possible.
-       Otherwise, set to 0.
-
-.. option:: busy_wait_polling=bool : [librpma_*_server]
-
-       Set to 0 to wait for completion instead of busy-wait polling completion.
-       Default: 1.
-
  .. option:: interface=str : [netsplice] [net]
  
         The IP address of the network interface used to send or receive UDP
@@ -2833,6 +2867,22 @@ with the caveat that when used on the command line, they must come after the
         With writefua option set to 1, write operations include
         the force unit access (fua) flag. Default is 0.
  
+.. option:: write_mode=str : [io_uring_cmd]
+
+        Specifies the type of write operation.  Defaults to 'write'.
+
+                **write**
+                        Use Write commands for write operations
+
+                **uncor**
+                        Use Write Uncorrectable commands for write operations
+
+                **zeroes**
+                        Use Write Zeroes commands for write operations
+
+                **verify**
+                        Use Verify commands for write operations
+
  .. option:: sg_write_mode=str : [sg]
  
         Specify the type of write commands to issue. This option can take ten values:
@@ -2889,8 +2939,13 @@ with the caveat that when used on the command line, they must come after the
  
  .. option:: http_host=str : [http]
  
-       Hostname to connect to. For S3, this could be the bucket hostname.
-       Default is **localhost**
+       Hostname to connect to. HTTP port 80 is used automatically when the value of 
+       the https parameter is *off*, and HTTPS port 443 if it is *on*. A 
+       virtual-hosted-style S3 hostname starts with a bucket name, while a 
+       path-style S3 hostname does not. See 
+       https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html for 
+       detailed examples.
+       Default is **localhost** (path-style S3 hostname)
  
  .. option:: http_user=str : [http]
  
@@ -3947,6 +4002,17 @@ Verification
          instead resets the file after the write phase and then replays I/Os for
          the verification phase.
  
+.. option:: verify_write_sequence=bool
+
+        Verify the header write sequence number. In a scenario with multiple jobs,
+        verification of the write sequence number may fail. Disabling this option
+        will mean that write sequence number checking is skipped. Doing that can be
+        useful for testing atomic writes, as it means that checksum verification can
+        still be attempted. When :option:`atomic` is enabled, checksum
+        verification is expected to succeed (while write sequence checking can still
+        fail).
+        Defaults to true.
+
  .. option:: trim_percentage=int
  
         Number of verify blocks to discard/trim.
@@ -4182,6 +4248,21 @@ Measurements and reporting
         entry as well as the other data values. Defaults to 0 meaning that
         offsets are not present in logs. Also see `Log File Formats`_.
  
+.. option:: log_prio=bool
+
+       If this is set, the *Command priority* field in `Log File Formats`_
+       shows the priority value and the IO priority class of the command.
+       Otherwise, the field shows if the command has the highest RT
+       priority class or not. Also see `Log File Formats`_.
+
+.. option:: log_issue_time=bool
+
+       If this is set, the iolog options will include the command issue time
+       for the I/O entry as well as the other data values. Defaults to 0
+       meaning that command issue times are not present in logs. Also see
+       `Log File Formats`_. This option shall be set together with
+       :option:`write_lat_log` and :option:`log_offset`.
+
  .. option:: log_compression=int
  
         If this is set, fio will compress the I/O logs as it goes, to keep the
@@ -5147,7 +5228,7 @@ Fio supports a variety of log file formats, for logging latencies, bandwidth,
  and IOPS. The logs share a common format, which looks like this:
  
      *time* (`msec`), *value*, *data direction*, *block size* (`bytes`),
-    *offset* (`bytes`), *command priority*
+    *offset* (`bytes`), *command priority*, *issue time* (`nsec`)
  
  *Time* for the log entry is always in milliseconds. The *value* logged depends
  on the type of log, it will be one of the following:
@@ -5172,8 +5253,21 @@ The entry's *block size* is always in bytes. The *offset* is the position in byt
  from the start of the file for that particular I/O. The logging of the offset can be
  toggled with :option:`log_offset`.
  
-*Command priority* is 0 for normal priority and 1 for high priority. This is controlled
-by the ioengine specific :option:`cmdprio_percentage`.
+If :option:`log_prio` is not set, the entry's *Command priority* is 1 for an IO
+executed with the highest RT priority class (:option:`prioclass` =1 or
+:option:`cmdprio_class` =1) and 0 otherwise. This is controlled by the
+:option:`prioclass` option and the ioengine specific
+:option:`cmdprio_percentage` and :option:`cmdprio_class` options. If
+:option:`log_prio` is set, the entry's *Command priority* is the priority set
+for the IO, as a 16-bit hexadecimal number with the lowest 13 bits indicating
+the priority value (:option:`prio` and :option:`cmdprio` options) and the
+highest 3 bits indicating the IO priority class (:option:`prioclass` and
+:option:`cmdprio_class` options).
+
+The entry's *issue time* is the command issue time in nanoseconds. The logging
+of the issue time can be toggled with :option:`log_issue_time`. This field has
+valid values only in the completion latency (clat) and submission latency
+(slat) log files. The field is 0 in other log files.
  
  Fio defaults to logging every individual I/O but when windowed logging is set
  through :option:`log_avg_msec`, either the average (by default), the maximum
@@ -5183,12 +5277,12 @@ is set to both) is recorded. The log file format when both the values are report
  takes this form:
  
      *time* (`msec`), *value*, *value1*, *data direction*, *block size* (`bytes`),
-    *offset* (`bytes`), *command priority*
+    *offset* (`bytes`), *command priority*, *issue time* (`nsec`)
  
  
  Each *data direction* seen within the window period will aggregate its values in a
-separate row. Further, when using windowed logging the *block size* and *offset*
-entries will always contain 0.
+separate row. Further, when using windowed logging the *block size*, *offset*
+and *issue time* entries will always contain 0.
  
  
  Client/Server