zbd: Improve random zone index generation logic

diff --git a/fio.1 b/fio.1

index 27cf2f15ed19f0e7bb0ab0d7462256a6dc26f693..9c12ad1318b7e130aacf403fc5e8375eb5d87f9b 100644
--- a/fio.1
+++ b/fio.1
@@ -288,6 +288,15 @@ Pi means pebi (Pi) or 1024**5
  .PD
  .RE
  .P
+For Zone Block Device Mode:
+.RS
+.P
+.PD 0
+z means Zone
+.P
+.PD
+.RE
+.P
  With `kb_base=1024' (the default), the unit prefixes are opposite
  from those specified in the SI and IEC 80000-13 standards to provide
  compatibility with old scripts. For example, 4k means 4096.
@@ -690,7 +699,8 @@ of how that would work.
  .TP
  .BI ioscheduler \fR=\fPstr
  Attempt to switch the device hosting the file to the specified I/O scheduler
-before running.
+before running. If the file is a pipe or a character device file, or if the device
+hosting the file cannot be determined, this option is ignored.
  .TP
  .BI create_serialize \fR=\fPbool
  If true, serialize the file creation for the jobs. This may be handy to
@@ -825,6 +835,11 @@ threads/processes.
  .BI job_max_open_zones \fR=\fPint
  Limit on the number of simultaneously opened zones per single thread/process.
  .TP
+.BI ignore_zone_limits \fR=\fPbool
+If this isn't set, fio will query the max open zones limit from the zoned block
+device, and exit if the specified \fBmax_open_zones\fR value is larger than the
+limit reported by the device. Default: false.
+.TP
  .BI zone_reset_threshold \fR=\fPfloat
  A number between zero and one that indicates the ratio of logical blocks with
  data to the total number of logical blocks in the test above which zones
@@ -924,10 +939,32 @@ behaves in a similar fashion, except it sends the same offset 8 number of
  times before generating a new offset.
  .RE
  .TP
-.BI unified_rw_reporting \fR=\fPbool
+.BI unified_rw_reporting \fR=\fPstr
  Fio normally reports statistics on a per data direction basis, meaning that
-reads, writes, and trims are accounted and reported separately. If this
-option is set fio sums the results and report them as "mixed" instead.
+reads, writes, and trims are accounted and reported separately. This option
+determines whether fio reports the results normally, summed together, or as
+both options.
+Accepted values are:
+.RS
+.TP
+.B none
+Normal statistics reporting.
+.TP
+.B mixed
+Statistics are summed per data direction and reported together.
+.TP
+.B both
+Statistics are reported normally, followed by the mixed statistics.
+.TP
+.B 0
+Backward-compatible alias for \fBnone\fR.
+.TP
+.B 1
+Backward-compatible alias for \fBmixed\fR.
+.TP
+.B 2
+Alias for \fBboth\fR.
+.RE
  .TP
  .BI randrepeat \fR=\fPbool
  Seed the random number generator used for random I/O patterns in a
@@ -1038,13 +1075,14 @@ should be associated with them.
  .TP
  .BI offset \fR=\fPint[%|z]
  Start I/O at the provided offset in the file, given as either a fixed size in
-bytes or a percentage. If a percentage is given, the generated offset will be
+bytes, zones or a percentage. If a percentage is given, the generated offset will be
  aligned to the minimum \fBblocksize\fR or to the value of \fBoffset_align\fR if
  provided. Data before the given offset will not be touched. This
  effectively caps the file size at `real_size \- offset'. Can be combined with
  \fBsize\fR to constrain the start and end range of the I/O workload.
  A percentage can be specified by a number between 1 and 100 followed by '%',
-for example, `offset=20%' to specify 20%.
+for example, `offset=20%' to specify 20%. In ZBD mode, the value can be given as a
+number of zones using 'z'.
  .TP
  .BI offset_align \fR=\fPint
  If set to non-zero value, the byte offset generated by a percentage \fBoffset\fR
@@ -1059,7 +1097,8 @@ specified). This option is useful if there are several jobs which are
  intended to operate on a file in parallel disjoint segments, with even
  spacing between the starting points. Percentages can be used for this option.
  If a percentage is given, the generated offset will be aligned to the minimum
-\fBblocksize\fR or to the value of \fBoffset_align\fR if provided.
+\fBblocksize\fR or to the value of \fBoffset_align\fR if provided. In ZBD mode, the value
+can be given as a number of zones using 'z'.
  .TP
  .BI number_ios \fR=\fPint
  Fio will normally perform I/Os until it has exhausted the size of the region
@@ -1470,6 +1509,48 @@ all \-\- this option only controls the distribution of unique buffers. Setting
  this option will also enable \fBrefill_buffers\fR to prevent every buffer
  being identical.
  .TP
+.BI dedupe_mode \fR=\fPstr
+If \fBdedupe_percentage\fR is given, then this option controls how fio
+generates the dedupe buffers.
+.RS
+.RS
+.TP
+.B repeat
+.P
+.RS
+Generate dedupe buffers by repeating previous writes.
+.RE
+.TP
+.B working_set
+.P
+.RS
+Generate dedupe buffers from a working set.
+.RE
+.RE
+.P
+\fBrepeat\fR is the default option for fio. Dedupe buffers are generated
+by repeating previous unique writes.
+
+\fBworking_set\fR is a more realistic workload.
+With \fBworking_set\fR, \fBdedupe_working_set_percentage\fR should be provided.
+Given that, fio will use the initial unique write buffers as its working set.
+Upon deciding to dedupe, fio will randomly choose a buffer from the working set.
+Note that with \fBworking_set\fR the dedupe percentage will converge
+to the desired value over time, while \fBrepeat\fR maintains the desired percentage
+throughout the job.
+.RE
+.RE
+.TP
+.BI dedupe_working_set_percentage \fR=\fPint
+If \fBdedupe_mode\fR is set to \fBworking_set\fR, then this controls
+the percentage of the file or device size used as the pool of buffers
+from which fio chooses when generating dedupe buffers.
+.P
+.RS
+Note that \fBsize\fR needs to be explicitly provided and only one file
+per job is supported.
+.RE
+.TP
  .BI invalidate \fR=\fPbool
  Invalidate the buffer/page cache parts of the files to be used prior to
  starting I/O if the platform and file type support it. Defaults to true.
@@ -1584,9 +1665,9 @@ set to the physical size of the given files or devices if they exist.
  If this option is not specified, fio will use the full size of the given
  files or devices. If the files do not exist, size must be given. It is also
  possible to give size as a percentage between 1 and 100. If `size=20%' is
-given, fio will use 20% of the full size of the given files or devices.
-Can be combined with \fBoffset\fR to constrain the start and end range
-that I/O will be done within.
+given, fio will use 20% of the full size of the given files or devices. In ZBD mode,
+size can be given as a number of zones using 'z'. Can be combined with \fBoffset\fR to
+constrain the start and end range that I/O will be done within.
  .TP
  .BI io_size \fR=\fPint[%|z] "\fR,\fB io_limit" \fR=\fPint[%|z]
  Normally fio operates within the region set by \fBsize\fR, which means
@@ -1598,7 +1679,8 @@ will perform I/O within the first 20GiB but exit when 5GiB have been
  done. The opposite is also possible \-\- if \fBsize\fR is set to 20GiB,
  and \fBio_size\fR is set to 40GiB, then fio will do 40GiB of I/O within
  the 0..20GiB region. Value can be set as percentage: \fBio_size\fR=N%.
-In this case \fBio_size\fR multiplies \fBsize\fR= value.
+In this case \fBio_size\fR multiplies \fBsize\fR= value. In ZBD mode, the value can
+also be given as a number of zones using 'z'.
  .TP
  .BI filesize \fR=\fPirange(int)
  Individual file sizes. May be a range, in which case fio will select sizes
@@ -1615,11 +1697,10 @@ of a file. This option is ignored on non-regular files.
  .TP
  .BI fill_device \fR=\fPbool "\fR,\fB fill_fs" \fR=\fPbool
  Sets size to something really large and waits for ENOSPC (no space left on
-device) as the terminating condition. Only makes sense with sequential
+device) or EDQUOT (disk quota exceeded)
+as the terminating condition. Only makes sense with sequential
  write. For a read workload, the mount point will be filled first then I/O
-started on the result. This option doesn't make sense if operating on a raw
-device node, since the size of that is already known by the file system.
-Additionally, writing beyond end-of-device will not return ENOSPC there.
+started on the result.
  .SS "I/O engine"
  .TP
  .BI ioengine \fR=\fPstr
@@ -1825,6 +1906,11 @@ Simply do stat() and do no I/O to the file. You need to set 'filesize'
  and 'nrfiles', so that files will be created.
  This engine is to measure file lookup and meta data access.
  .TP
+.B filedelete
+Simply delete files with unlink() and do no I/O to them. You need to set 'filesize'
+and 'nrfiles', so that files will be created.
+This engine is intended to measure file deletion.
+.TP
  .B libpmem
  Read and write using mmap I/O to a file on a filesystem
  mounted with DAX on a persistent memory device through the PMDK
@@ -1860,6 +1946,15 @@ not be \fBcudamalloc\fR. This ioengine defines engine specific options.
  .B dfs
  I/O engine supporting asynchronous read and write operations to the DAOS File
  System (DFS) via libdfs.
+.TP
+.B nfs
+I/O engine supporting asynchronous read and write operations to
+NFS filesystems from userspace via libnfs. This is useful for
+achieving higher concurrency and thus throughput than is possible
+via kernel NFS.
+.TP
+.B exec
+Execute third-party tools. Can be used to perform monitoring during a job's runtime.
  .SS "I/O engine specific parameters"
  In addition, there are some parameters which are only valid when a specific
  \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -1956,7 +2051,7 @@ The TCP or UDP port to bind to or connect to. If this is used with
  this will be the starting port number since fio will use a range of
  ports.
  .TP
-.BI (rdma)port
+.BI (rdma,librpma_*)port
  The port to use for RDMA-CM communication. This should be the same
  value on the client and the server side.
  .TP
@@ -1965,6 +2060,16 @@ The hostname or IP address to use for TCP, UDP or RDMA-CM based I/O.
  If the job is a TCP listener or UDP reader, the hostname is not used
  and must be omitted unless it is a valid UDP multicast address.
  .TP
+.BI (librpma_*)serverip \fR=\fPstr
+The IP address to be used for RDMA-CM based I/O.
+.TP
+.BI (librpma_*_server)direct_write_to_pmem \fR=\fPbool
+Set to 1 only when Direct Write to PMem from the remote host is possible. Otherwise, set to 0.
+.TP
+.BI (librpma_*_server)busy_wait_polling \fR=\fPbool
+Set to 0 to wait for completion instead of busy-wait polling for completion.
+Default: 1.
+.TP
  .BI (netsplice,net)interface \fR=\fPstr
  The IP address of the network interface used to send or receive UDP
  multicast.
@@ -2059,6 +2164,11 @@ by default.
  Poll store instead of waiting for completion. Usually this provides better
  throughput at cost of higher(up to 100%) CPU utilization.
  .TP
+.BI (rados)touch_objects \fR=\fPbool
+During initialization, touch (create if they do not exist) all objects (files).
+Touching all objects affects Ceph caches and likely impacts test results.
+Enabled by default.
+.TP
  .BI (http)http_host \fR=\fPstr
  Hostname to connect to. For S3, this could be the bucket name. Default
  is \fBlocalhost\fR
@@ -2227,6 +2337,35 @@ Use DAOS container's chunk size by default.
  .BI (dfs)object_class
  Specificy a different object class for the dfs file.
  Use DAOS container's object class by default.
+.TP
+.BI (nfs)nfs_url
+URL in libnfs format, e.g. nfs://<server|ipv4|ipv6>/path[?arg=val[&arg=val]*].
+Refer to the libnfs README for more details.
+.TP
+.BI (exec)program \fR=\fPstr
+Specify the program to execute.
+Note that the program will receive a SIGTERM when the job reaches the time limit.
+A SIGKILL is sent once the job is over. The delay between the two signals is defined by the \fBgrace_time\fR option.
+.TP
+.BI (exec)arguments \fR=\fPstr
+Specify arguments to pass to the program.
+Some special variables can be expanded to pass fio's job details to the program:
+.RS
+.RS
+.TP
+.B %r
+replaced by the duration of the job in seconds
+.TP
+.B %n
+replaced by the name of the job
+.RE
+.RE
+.TP
+.BI (exec)grace_time \fR=\fPint
+Defines the time between the SIGTERM and SIGKILL signals. Default is 1 second.
+.TP
+.BI (exec)std_redirect \fR=\fPbool
+If set, stdout and stderr streams are redirected to files named after the job. Default is true.
  .SS "I/O depth"
  .TP
  .BI iodepth \fR=\fPint