Merge branch 'multiple-read_iolog' of https://github.com/aclamk/fio
author: Jens Axboe <axboe@kernel.dk>
Mon, 20 Aug 2018 14:32:29 +0000 (08:32 -0600)
committer: Jens Axboe <axboe@kernel.dk>
Mon, 20 Aug 2018 14:32:29 +0000 (08:32 -0600)
* 'multiple-read_iolog' of https://github.com/aclamk/fio:
  iolog: Now --read_iolog can contain multiple replay files, separated by ':'.

27 files changed:
.travis.yml
HOWTO
Makefile
arch/arch-arm.h
backend.c
configure
engines/http.c [new file with mode: 0644]
engines/ime.c [new file with mode: 0644]
eta.c
examples/http-s3.fio [new file with mode: 0644]
examples/http-swift.fio [new file with mode: 0644]
examples/http-webdav.fio [new file with mode: 0644]
examples/ime.fio [new file with mode: 0644]
fio.1
gclient.c
lib/axmap.c
lib/axmap.h
log.h
optgroup.h
options.c
oslib/asprintf.c
t/axmap.c
t/steadystate_tests.py [new file with mode: 0755]
tools/fio.service
unit_tests/steadystate_tests.py [deleted file]
zone-dist.c [new file with mode: 0644]
zone-dist.h [new file with mode: 0644]

index 94f69fb59763d5ceb9ac4824f9771e4d6e36cb14..4a87fe6c45f80baff08e5ac5e037d51c0477578b 100644 (file)
@@ -25,6 +25,10 @@ matrix:
       compiler: clang
       osx_image: xcode8.3
       env: BUILD_ARCH="x86_64"
+    - os: osx
+      compiler: clang
+      osx_image: xcode9.4
+      env: BUILD_ARCH="x86_64"
   exclude:
     - os: osx
       compiler: gcc
diff --git a/HOWTO b/HOWTO
index 16c5ae3163074fcb08a6caca741cb78a0b56c29b..ff7aa096495a969aea9eddb0c8159f8054b6b0d1 100644 (file)
--- a/HOWTO
+++ b/HOWTO
@@ -283,7 +283,8 @@ Command line options
 
 .. option:: --aux-path=path
 
-       Use this `path` for fio state generated files.
+       Use the directory specified by `path` for generated state files instead
+       of the current working directory.
 
 Any parameters following the options will be assumed to be job files, unless
 they match a job file parameter. Multiple job files can be listed and each job
@@ -748,12 +749,15 @@ Target file/device
        assigned equally distributed to job clones created by :option:`numjobs` as
        long as they are using generated filenames. If specific `filename(s)` are
        set fio will use the first listed directory, and thereby matching the
-       `filename` semantic which generates a file each clone if not specified, but
-       let all clones use the same if set.
+       `filename` semantic (which generates a file for each clone if not
+       specified, but lets all clones use the same file if set).
 
        See the :option:`filename` option for information on how to escape "``:``" and
        "``\``" characters within the directory path itself.
 
+       Note: To control the directory fio will use for internal state files
+       use :option:`--aux-path`.
+
 .. option:: filename=str
 
        Fio normally makes up a `filename` based on the job name, thread number, and
@@ -948,18 +952,24 @@ Target file/device
 
        Unlink job files after each iteration or loop.  Default: false.
 
-.. option:: zonesize=int
+.. option:: zonerange=int
 
-       Divide a file into zones of the specified size. See :option:`zoneskip`.
+       Size of a single zone in which I/O occurs. See also :option:`zonesize`
+       and :option:`zoneskip`.
 
-.. option:: zonerange=int
+.. option:: zonesize=int
 
-       Give size of an I/O zone.  See :option:`zoneskip`.
+       Number of bytes to transfer before skipping :option:`zoneskip`
+       bytes. If this parameter is smaller than :option:`zonerange` then only
+       a fraction of each zone with :option:`zonerange` bytes will be
+       accessed.  If this parameter is larger than :option:`zonerange` then
+       each zone will be accessed multiple times before skipping to the next
+       zone.
 
 .. option:: zoneskip=int
 
-       Skip the specified number of bytes when :option:`zonesize` data has been
-       read. The two zone options can be used to only do I/O on zones of a file.
+       Skip the specified number of bytes when :option:`zonesize` data have
+       been transferred. The three zone options can be used to do strided I/O
+       on a file.
 
 
 I/O type
@@ -1825,6 +1835,15 @@ I/O engine
                        (RBD) via librbd without the need to use the kernel rbd driver. This
                        ioengine defines engine specific options.
 
+               **http**
+                       I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+                       a WebDAV or S3 endpoint.  This ioengine defines engine specific options.
+
+                       This engine only supports direct IO of iodepth=1; you need to scale this
+                       via numjobs. blocksize defines the size of the objects to be created.
+
+                       TRIM is translated to object deletion.
+
                **gfapi**
                        Using GlusterFS libgfapi sync interface to direct access to
                        GlusterFS volumes without having to go through FUSE.  This ioengine
@@ -1882,6 +1901,22 @@ I/O engine
                        mounted with DAX on a persistent memory device through the PMDK
                        libpmem library.
 
+               **ime_psync**
+                       Synchronous read and write using DDN's Infinite Memory Engine (IME).
+                       This engine is very basic and issues calls to IME whenever an IO is
+                       queued.
+
+               **ime_psyncv**
+                       Synchronous read and write using DDN's Infinite Memory Engine (IME).
+                       This engine uses iovecs and will try to stack as many IOs as possible
+                       (if the IOs are "contiguous" and the IO depth is not exceeded)
+                       before issuing a call to IME.
+
+               **ime_aio**
+                       Asynchronous read and write using DDN's Infinite Memory Engine (IME).
+                       This engine will try to stack as many IOs as possible by creating
+                       requests for IME. FIO will then decide when to commit these requests.
+
 I/O engine specific parameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -2105,6 +2140,54 @@ with the caveat that when used on the command line, they must come after the
                transferred to the device. The writefua option is ignored with this
                selection.
 
+.. option:: http_host=str : [http]
+
+       Hostname to connect to. For S3, this could be the bucket hostname.
+       Default is **localhost**
+
+.. option:: http_user=str : [http]
+
+       Username for HTTP authentication.
+
+.. option:: http_pass=str : [http]
+
+       Password for HTTP authentication.
+
+.. option:: https=str : [http]
+
+       Enable HTTPS instead of http. *on* enables HTTPS; *insecure*
+       will enable HTTPS, but disable SSL peer verification (use with
+       caution!). Default is **off**
+
+.. option:: http_mode=str : [http]
+
+       Which HTTP access mode to use: *webdav*, *swift*, or *s3*.
+       Default is **webdav**
+
+.. option:: http_s3_region=str : [http]
+
+       The S3 region/zone string.
+       Default is **us-east-1**
+
+.. option:: http_s3_key=str : [http]
+
+       The S3 secret key.
+
+.. option:: http_s3_keyid=str : [http]
+
+       The S3 key/access id.
+
+.. option:: http_swift_auth_token=str : [http]
+
+       The Swift auth token. See the example configuration file on how
+       to retrieve this.
+
+.. option:: http_verbose=int : [http]
+
+       Enable verbose requests from libcurl. Useful for debugging. 1
+       turns on verbose logging from libcurl, 2 additionally enables
+       HTTP IO tracing. Default is **0**
+
 I/O depth
 ~~~~~~~~~
 
@@ -2915,9 +2998,11 @@ Measurements and reporting
 .. option:: write_iops_log=str
 
        Same as :option:`write_bw_log`, but writes an IOPS file (e.g.
-       :file:`name_iops.x.log`) instead. See :option:`write_bw_log` for
-       details about the filename format and `Log File Formats`_ for how data
-       is structured within the file.
+       :file:`name_iops.x.log`) instead. Because fio defaults to individual
+       I/O logging, the value entry in the IOPS log will be 1 unless windowed
+       logging (see :option:`log_avg_msec`) has been enabled. See
+       :option:`write_bw_log` for details about the filename format and `Log
+       File Formats`_ for how data is structured within the file.
 
 .. option:: log_avg_msec=int
 
@@ -3802,17 +3887,16 @@ on the type of log, it will be one of the following:
        **2**
                I/O is a TRIM
 
-The entry's *block size* is always in bytes. The *offset* is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's *block size* is always in bytes. The *offset* is the position in bytes
+from the start of the file for that particular I/O. The logging of the offset can be
 toggled with :option:`log_offset`.
 
-Fio defaults to logging every individual I/O.  When IOPS are logged for individual
-I/Os the *value* entry will always be 1. If windowed logging is enabled through
-:option:`log_avg_msec`, fio logs the average values over the specified period of time.
-If windowed logging is enabled and :option:`log_max_value` is set, then fio logs
-maximum values in that window instead of averages. Since *data direction*, *block
-size* and *offset* are per-I/O values, if windowed logging is enabled they
-aren't applicable and will be 0.
+Fio defaults to logging every individual I/O but when windowed logging is set
+through :option:`log_avg_msec`, either the average (by default) or the maximum
+(:option:`log_max_value` is set) *value* seen over the specified period of time
+is recorded. Each *data direction* seen within the window period will aggregate
+its values in a separate row. Further, when using windowed logging the *block
+size* and *offset* entries will always contain 0.
 
 Client/Server
 -------------
index 20d3ec124a5ad1ed55694b4718b9d9e20a7a206b..e8e15fe863ae1a4bbf5511f222000dfd5c6b800f 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -50,7 +50,7 @@ SOURCE :=     $(sort $(patsubst $(SRCDIR)/%,%,$(wildcard $(SRCDIR)/crc/*.c)) \
                gettime-thread.c helpers.c json.c idletime.c td_error.c \
                profiles/tiobench.c profiles/act.c io_u_queue.c filelock.c \
                workqueue.c rate-submit.c optgroup.c helper_thread.c \
-               steadystate.c
+               steadystate.c zone-dist.c
 
 ifdef CONFIG_LIBHDFS
   HDFSFLAGS= -I $(JAVA_HOME)/include -I $(JAVA_HOME)/include/linux -I $(FIO_LIBHDFS_INCLUDE)
@@ -101,6 +101,9 @@ endif
 ifdef CONFIG_RBD
   SOURCE += engines/rbd.c
 endif
+ifdef CONFIG_HTTP
+  SOURCE += engines/http.c
+endif
 SOURCE += oslib/asprintf.c
 ifndef CONFIG_STRSEP
   SOURCE += oslib/strsep.c
@@ -142,6 +145,9 @@ endif
 ifdef CONFIG_LIBPMEM
   SOURCE += engines/libpmem.c
 endif
+ifdef CONFIG_IME
+  SOURCE += engines/ime.c
+endif
 
 ifeq ($(CONFIG_TARGET_OS), Linux)
   SOURCE += diskutil.c fifo.c blktrace.c cgroup.c trim.c engines/sg.c \
index dd286d04464f20bbb3aabbb3997f8b410af325f1..fc1c4844e5e4c18b0b066243218ae75d0c7ac592 100644 (file)
@@ -6,7 +6,8 @@
 #if defined (__ARM_ARCH_4__) || defined (__ARM_ARCH_4T__) \
        || defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5T__) || defined (__ARM_ARCH_5E__)\
        || defined (__ARM_ARCH_5TE__) || defined (__ARM_ARCH_5TEJ__) \
-       || defined(__ARM_ARCH_6__)  || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__)
+       || defined(__ARM_ARCH_6__)  || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
+       || defined(__ARM_ARCH_6KZ__)
 #define nop             __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t")
 #define read_barrier() __asm__ __volatile__ ("" : : : "memory")
 #define write_barrier()        __asm__ __volatile__ ("" : : : "memory")
index f6cfbdd8288230be53136b8ad002439c862118df..36bde6a587535635ee9d1dcef6c0bc0c5160f772 100644 (file)
--- a/backend.c
+++ b/backend.c
@@ -47,6 +47,7 @@
 #include "rate-submit.h"
 #include "helper_thread.h"
 #include "pshared.h"
+#include "zone-dist.h"
 
 static struct fio_sem *startup_sem;
 static struct flist_head *cgroup_list;
@@ -1592,6 +1593,8 @@ static void *thread_main(void *data)
                goto err;
        }
 
+       td_zone_gen_index(td);
+
        /*
         * Do this early, we don't want the compress threads to be limited
         * to the same CPUs as the IO workers. So do this before we set
@@ -1907,15 +1910,7 @@ err:
        close_ioengine(td);
        cgroup_shutdown(td, cgroup_mnt);
        verify_free_state(td);
-
-       if (td->zone_state_index) {
-               int i;
-
-               for (i = 0; i < DDIR_RWDIR_CNT; i++)
-                       free(td->zone_state_index[i]);
-               free(td->zone_state_index);
-               td->zone_state_index = NULL;
-       }
+       td_zone_free_index(td);
 
        if (fio_option_is_set(o, cpumask)) {
                ret = fio_cpuset_exit(&o->cpumask);
index 9bdc7a156e73ab46bd6102615f4dfbfe4619b14f..fb8b2433a7a743255f64e934643aea70552aac29 100755 (executable)
--- a/configure
+++ b/configure
@@ -14,12 +14,13 @@ else
 fi
 
 TMPC="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}.c"
+TMPC2="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}-2.c"
 TMPO="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}.o"
 TMPE="${TMPDIR1}/fio-conf-${RANDOM}-$$-${RANDOM}.exe"
 
 # NB: do not call "exit" in the trap handler; this is buggy with some shells;
 # see <1285349658-3122-1-git-send-email-loic.minier@linaro.org>
-trap "rm -f $TMPC $TMPO $TMPE" EXIT INT QUIT TERM
+trap "rm -f $TMPC $TMPC2 $TMPO $TMPE" EXIT INT QUIT TERM
 
 rm -rf config.log
 
@@ -181,6 +182,8 @@ for opt do
   ;;
   --disable-rbd) disable_rbd="yes"
   ;;
+  --disable-http) disable_http="yes"
+  ;;
   --disable-gfapi) disable_gfapi="yes"
   ;;
   --enable-libhdfs) libhdfs="yes"
@@ -199,6 +202,8 @@ for opt do
   ;;
   --disable-native) disable_native="yes"
   ;;
+  --with-ime=*) ime_path="$optarg"
+  ;;
   --help)
     show_help="yes"
     ;;
@@ -230,6 +235,7 @@ if test "$show_help" = "yes" ; then
   echo "--disable-optimizations Don't enable compiler optimizations"
   echo "--enable-cuda           Enable GPUDirect RDMA support"
   echo "--disable-native        Don't build for native host"
+  echo "--with-ime=             Install path for DDN's Infinite Memory Engine"
   exit $exit_val
 fi
 
@@ -1566,6 +1572,61 @@ if compile_prog "" "" "ipv6"; then
 fi
 print_config "IPv6 helpers" "$ipv6"
 
+##########################################
+# check for http
+if test "$http" != "yes" ; then
+  http="no"
+fi
+# check for openssl >= 1.1.0, which uses an opaque HMAC_CTX pointer
+cat > $TMPC << EOF
+#include <curl/curl.h>
+#include <openssl/hmac.h>
+
+int main(int argc, char **argv)
+{
+  CURL *curl;
+  HMAC_CTX *ctx;
+
+  curl = curl_easy_init();
+  curl_easy_cleanup(curl);
+
+  ctx = HMAC_CTX_new();
+  HMAC_CTX_reset(ctx);
+  HMAC_CTX_free(ctx);
+  return 0;
+}
+EOF
+# openssl < 1.1.0 uses the HMAC_CTX type directly
+cat > $TMPC2 << EOF
+#include <curl/curl.h>
+#include <openssl/hmac.h>
+
+int main(int argc, char **argv)
+{
+  CURL *curl;
+  HMAC_CTX ctx;
+
+  curl = curl_easy_init();
+  curl_easy_cleanup(curl);
+
+  HMAC_CTX_init(&ctx);
+  HMAC_CTX_cleanup(&ctx);
+  return 0;
+}
+EOF
+if test "$disable_http" != "yes"; then
+  HTTP_LIBS="-lcurl -lssl -lcrypto"
+  if compile_prog "" "$HTTP_LIBS" "curl-new-ssl"; then
+    output_sym "CONFIG_HAVE_OPAQUE_HMAC_CTX"
+    http="yes"
+    LIBS="$HTTP_LIBS $LIBS"
+  elif mv $TMPC2 $TMPC && compile_prog "" "$HTTP_LIBS" "curl-old-ssl"; then
+    http="yes"
+    LIBS="$HTTP_LIBS $LIBS"
+  fi
+fi
+print_config "http engine" "$http"
+
 ##########################################
 # check for rados
 if test "$rados" != "yes" ; then
@@ -1904,6 +1965,29 @@ print_config "PMDK dev-dax engine" "$devdax"
 # Report whether libpmem engine is enabled
 print_config "PMDK libpmem engine" "$pmem"
 
+##########################################
+# Check whether we support DDN's IME
+if test "$libime" != "yes" ; then
+  libime="no"
+fi
+cat > $TMPC << EOF
+#include <ime_native.h>
+int main(int argc, char **argv)
+{
+  int rc;
+  ime_native_init();
+  rc = ime_native_finalize();
+  return 0;
+}
+EOF
+if compile_prog "-I${ime_path}/include" "-L${ime_path}/lib -lim_client" "libime"; then
+  libime="yes"
+  CFLAGS="-I${ime_path}/include $CFLAGS"
+  LDFLAGS="-Wl,-rpath ${ime_path}/lib -L${ime_path}/lib $LDFLAGS"
+  LIBS="-lim_client $LIBS"
+fi
+print_config "DDN's Infinite Memory Engine" "$libime"
+
 ##########################################
 # Check if we have lex/yacc available
 yacc="no"
@@ -2346,6 +2430,9 @@ fi
 if test "$ipv6" = "yes" ; then
   output_sym "CONFIG_IPV6"
 fi
+if test "$http" = "yes" ; then
+  output_sym "CONFIG_HTTP"
+fi
 if test "$rados" = "yes" ; then
   output_sym "CONFIG_RADOS"
 fi
@@ -2394,6 +2481,9 @@ fi
 if test "$pmem" = "yes" ; then
   output_sym "CONFIG_LIBPMEM"
 fi
+if test "$libime" = "yes" ; then
+  output_sym "CONFIG_IME"
+fi
 if test "$arith" = "yes" ; then
   output_sym "CONFIG_ARITHMETIC"
   if test "$yacc_is_bison" = "yes" ; then
diff --git a/engines/http.c b/engines/http.c
new file mode 100644 (file)
index 0000000..93fcd0d
--- /dev/null
@@ -0,0 +1,661 @@
+/*
+ * HTTP GET/PUT IO engine
+ *
+ * IO engine to perform HTTP(S) GET/PUT requests via libcurl-easy.
+ *
+ * Copyright (C) 2018 SUSE LLC
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the Free
+ * Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+ * Boston, MA 02110-1301, USA.
+ */
+
+#include <pthread.h>
+#include <time.h>
+#include <curl/curl.h>
+#include <openssl/hmac.h>
+#include <openssl/sha.h>
+#include <openssl/md5.h>
+#include "fio.h"
+#include "../optgroup.h"
+
+
+enum {
+       FIO_HTTP_WEBDAV     = 0,
+       FIO_HTTP_S3         = 1,
+       FIO_HTTP_SWIFT      = 2,
+
+       FIO_HTTPS_OFF       = 0,
+       FIO_HTTPS_ON        = 1,
+       FIO_HTTPS_INSECURE  = 2,
+};
+
+struct http_data {
+       CURL *curl;
+};
+
+struct http_options {
+       void *pad;
+       unsigned int https;
+       char *host;
+       char *user;
+       char *pass;
+       char *s3_key;
+       char *s3_keyid;
+       char *s3_region;
+       char *swift_auth_token;
+       int verbose;
+       unsigned int mode;
+};
+
+struct http_curl_stream {
+       char *buf;
+       size_t pos;
+       size_t max;
+};
+
+static struct fio_option options[] = {
+       {
+               .name     = "https",
+               .lname    = "https",
+               .type     = FIO_OPT_STR,
+               .help     = "Enable https",
+               .off1     = offsetof(struct http_options, https),
+               .def      = "off",
+               .posval = {
+                         { .ival = "off",
+                           .oval = FIO_HTTPS_OFF,
+                           .help = "No HTTPS",
+                         },
+                         { .ival = "on",
+                           .oval = FIO_HTTPS_ON,
+                           .help = "Enable HTTPS",
+                         },
+                         { .ival = "insecure",
+                           .oval = FIO_HTTPS_INSECURE,
+                           .help = "Enable HTTPS, disable peer verification",
+                         },
+               },
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_host",
+               .lname    = "http_host",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "Hostname (S3 bucket)",
+               .off1     = offsetof(struct http_options, host),
+               .def      = "localhost",
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_user",
+               .lname    = "http_user",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "HTTP user name",
+               .off1     = offsetof(struct http_options, user),
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_pass",
+               .lname    = "http_pass",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "HTTP password",
+               .off1     = offsetof(struct http_options, pass),
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_s3_key",
+               .lname    = "S3 secret key",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "S3 secret key",
+               .off1     = offsetof(struct http_options, s3_key),
+               .def      = "",
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_s3_keyid",
+               .lname    = "S3 key id",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "S3 key id",
+               .off1     = offsetof(struct http_options, s3_keyid),
+               .def      = "",
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_swift_auth_token",
+               .lname    = "Swift auth token",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "OpenStack Swift auth token",
+               .off1     = offsetof(struct http_options, swift_auth_token),
+               .def      = "",
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_s3_region",
+               .lname    = "S3 region",
+               .type     = FIO_OPT_STR_STORE,
+               .help     = "S3 region",
+               .off1     = offsetof(struct http_options, s3_region),
+               .def      = "us-east-1",
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_mode",
+               .lname    = "Request mode to use",
+               .type     = FIO_OPT_STR,
+               .help     = "Whether to use WebDAV, Swift, or S3",
+               .off1     = offsetof(struct http_options, mode),
+               .def      = "webdav",
+               .posval = {
+                         { .ival = "webdav",
+                           .oval = FIO_HTTP_WEBDAV,
+                           .help = "WebDAV server",
+                         },
+                         { .ival = "s3",
+                           .oval = FIO_HTTP_S3,
+                           .help = "S3 storage backend",
+                         },
+                         { .ival = "swift",
+                           .oval = FIO_HTTP_SWIFT,
+                           .help = "OpenStack Swift storage",
+                         },
+               },
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = "http_verbose",
+               .lname    = "HTTP verbosity level",
+               .type     = FIO_OPT_INT,
+               .help     = "increase http engine verbosity",
+               .off1     = offsetof(struct http_options, verbose),
+               .def      = "0",
+               .category = FIO_OPT_C_ENGINE,
+               .group    = FIO_OPT_G_HTTP,
+       },
+       {
+               .name     = NULL,
+       },
+};
+
+static char *_aws_uriencode(const char *uri)
+{
+       size_t bufsize = 1024;
+       char *r = malloc(bufsize);
+       char c;
+       int i, n;
+       const char *hex = "0123456789ABCDEF";
+
+       if (!r) {
+               log_err("malloc failed\n");
+               return NULL;
+       }
+
+       n = 0;
+       for (i = 0; (c = uri[i]); i++) {
+               if (n > bufsize-5) {
+                       log_err("encoding the URL failed\n");
+                       return NULL;
+               }
+
+               if ( (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
+               || (c >= '0' && c <= '9') || c == '_' || c == '-'
+               || c == '~' || c == '.' || c == '/')
+                       r[n++] = c;
+               else {
+                       r[n++] = '%';
+                       r[n++] = hex[(c >> 4 ) & 0xF];
+                       r[n++] = hex[c & 0xF];
+               }
+       }
+       r[n++] = 0;
+       return r;
+}
+
+static char *_conv_hex(const unsigned char *p, size_t len)
+{
+       char *r;
+       int i,n;
+       const char *hex = "0123456789abcdef";
+       r = malloc(len * 2 + 1);
+       n = 0;
+       for (i = 0; i < len; i++) {
+               r[n++] = hex[(p[i] >> 4 ) & 0xF];
+               r[n++] = hex[p[i] & 0xF];
+       }
+       r[n] = 0;
+
+       return r;
+}
+
+static char *_gen_hex_sha256(const char *p, size_t len)
+{
+       unsigned char hash[SHA256_DIGEST_LENGTH];
+
+       SHA256((unsigned char*)p, len, hash);
+       return _conv_hex(hash, SHA256_DIGEST_LENGTH);
+}
+
+static char *_gen_hex_md5(const char *p, size_t len)
+{
+       unsigned char hash[MD5_DIGEST_LENGTH];
+
+       MD5((unsigned char*)p, len, hash);
+       return _conv_hex(hash, MD5_DIGEST_LENGTH);
+}
+
+static void _hmac(unsigned char *md, void *key, int key_len, char *data) {
+#ifndef CONFIG_HAVE_OPAQUE_HMAC_CTX
+       HMAC_CTX _ctx;
+#endif
+       HMAC_CTX *ctx;
+       unsigned int hmac_len;
+
+#ifdef CONFIG_HAVE_OPAQUE_HMAC_CTX
+       ctx = HMAC_CTX_new();
+#else
+       ctx = &_ctx;
+#endif
+       HMAC_Init_ex(ctx, key, key_len, EVP_sha256(), NULL);
+       HMAC_Update(ctx, (unsigned char*)data, strlen(data));
+       HMAC_Final(ctx, md, &hmac_len);
+#ifdef CONFIG_HAVE_OPAQUE_HMAC_CTX
+       HMAC_CTX_free(ctx);
+#else
+       HMAC_CTX_cleanup(ctx);
+#endif
+}
+
+static int _curl_trace(CURL *handle, curl_infotype type,
+            char *data, size_t size,
+            void *userp)
+{
+       const char *text;
+       (void)handle; /* prevent compiler warning */
+       (void)userp;
+
+       switch (type) {
+       case CURLINFO_TEXT:
+       fprintf(stderr, "== Info: %s", data);
+       default:
+       case CURLINFO_SSL_DATA_OUT:
+       case CURLINFO_SSL_DATA_IN:
+               return 0;
+
+       case CURLINFO_HEADER_OUT:
+               text = "=> Send header";
+               break;
+       case CURLINFO_DATA_OUT:
+               text = "=> Send data";
+               break;
+       case CURLINFO_HEADER_IN:
+               text = "<= Recv header";
+               break;
+       case CURLINFO_DATA_IN:
+               text = "<= Recv data";
+               break;
+       }
+
+       log_info("%s: %s", text, data);
+       return 0;
+}
+
+/* https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html
+ * https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html#signing-request-intro
+ */
+static void _add_aws_auth_header(CURL *curl, struct curl_slist *slist, struct http_options *o,
+               int op, const char *uri, char *buf, size_t len)
+{
+       char date_short[16];
+       char date_iso[32];
+       char method[8];
+       char dkey[128];
+       char creq[512];
+       char sts[256];
+       char s[512];
+       char *uri_encoded = NULL;
+       char *dsha = NULL;
+       char *csha = NULL;
+       char *signature = NULL;
+       const char *service = "s3";
+       const char *aws = "aws4_request";
+       unsigned char md[SHA256_DIGEST_LENGTH];
+
+       time_t t = time(NULL);
+       struct tm *gtm = gmtime(&t);
+
+       strftime (date_short, sizeof(date_short), "%Y%m%d", gtm);
+       strftime (date_iso, sizeof(date_iso), "%Y%m%dT%H%M%SZ", gtm);
+       uri_encoded = _aws_uriencode(uri);
+
+       if (op == DDIR_WRITE) {
+               dsha = _gen_hex_sha256(buf, len);
+               sprintf(method, "PUT");
+       } else {
+               /* DDIR_READ && DDIR_TRIM supply an empty body */
+               if (op == DDIR_READ)
+                       sprintf(method, "GET");
+               else
+                       sprintf(method, "DELETE");
+               dsha = _gen_hex_sha256("", 0);
+       }
+
+       /* Create the canonical request first */
+       snprintf(creq, sizeof(creq),
+       "%s\n"
+       "%s\n"
+       "\n"
+       "host:%s\n"
+       "x-amz-content-sha256:%s\n"
+       "x-amz-date:%s\n"
+       "\n"
+       "host;x-amz-content-sha256;x-amz-date\n"
+       "%s"
+       , method
+       , uri_encoded, o->host, dsha, date_iso, dsha);
+
+       csha = _gen_hex_sha256(creq, strlen(creq));
+       snprintf(sts, sizeof(sts), "AWS4-HMAC-SHA256\n%s\n%s/%s/%s/%s\n%s",
+               date_iso, date_short, o->s3_region, service, aws, csha);
+
+       snprintf((char *)dkey, sizeof(dkey), "AWS4%s", o->s3_key);
+       _hmac(md, dkey, strlen(dkey), date_short);
+       _hmac(md, md, SHA256_DIGEST_LENGTH, o->s3_region);
+       _hmac(md, md, SHA256_DIGEST_LENGTH, (char*) service);
+       _hmac(md, md, SHA256_DIGEST_LENGTH, (char*) aws);
+       _hmac(md, md, SHA256_DIGEST_LENGTH, sts);
+
+       signature = _conv_hex(md, SHA256_DIGEST_LENGTH);
+
+       /* Suppress automatic Accept: header */
+       slist = curl_slist_append(slist, "Accept:");
+
+       snprintf(s, sizeof(s), "x-amz-content-sha256: %s", dsha);
+       slist = curl_slist_append(slist, s);
+
+       snprintf(s, sizeof(s), "x-amz-date: %s", date_iso);
+       slist = curl_slist_append(slist, s);
+
+       snprintf(s, sizeof(s), "Authorization: AWS4-HMAC-SHA256 Credential=%s/%s/%s/s3/aws4_request,"
+       "SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=%s",
+       o->s3_keyid, date_short, o->s3_region, signature);
+       slist = curl_slist_append(slist, s);
+
+       curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
+
+       free(uri_encoded);
+       free(csha);
+       free(dsha);
+       free(signature);
+}
+
+static void _add_swift_header(CURL *curl, struct curl_slist *slist, struct http_options *o,
+               int op, const char *uri, char *buf, size_t len)
+{
+       char *dsha = NULL;
+       char s[512];
+
+       if (op == DDIR_WRITE) {
+               dsha = _gen_hex_md5(buf, len);
+       }
+       /* Suppress automatic Accept: header */
+       slist = curl_slist_append(slist, "Accept:");
+
+       snprintf(s, sizeof(s), "etag: %s", dsha);
+       slist = curl_slist_append(slist, s);
+
+       snprintf(s, sizeof(s), "x-auth-token: %s", o->swift_auth_token);
+       slist = curl_slist_append(slist, s);
+
+       curl_easy_setopt(curl, CURLOPT_HTTPHEADER, slist);
+
+       free(dsha);
+}
+
+static void fio_http_cleanup(struct thread_data *td)
+{
+       struct http_data *http = td->io_ops_data;
+
+       if (http) {
+               curl_easy_cleanup(http->curl);
+               free(http);
+       }
+}
+
+static size_t _http_read(void *ptr, size_t size, size_t nmemb, void *stream)
+{
+       struct http_curl_stream *state = stream;
+       size_t len = size * nmemb;
+       /* We're retrieving; nothing is supposed to be read locally */
+       if (!stream)
+               return 0;
+       if (len+state->pos > state->max)
+               len = state->max - state->pos;
+       memcpy(ptr, &state->buf[state->pos], len);
+       state->pos += len;
+       return len;
+}
+
+/* CURLOPT_WRITEFUNCTION: copy downloaded data into the io_u buffer.
+ * Returning anything other than the number of bytes handled makes
+ * libcurl abort the transfer. */
+static size_t _http_write(void *ptr, size_t size, size_t nmemb, void *stream)
+{
+       struct http_curl_stream *state = stream;
+       /* We're just discarding the returned body after a PUT */
+       if (!stream)
+               return nmemb;
+       /* NOTE(review): CURLE_WRITE_ERROR is returned as a byte count here;
+        * it only signals failure because it differs from nmemb - verify. */
+       if (size != 1)
+               return CURLE_WRITE_ERROR;
+       if (nmemb + state->pos > state->max)
+               return CURLE_WRITE_ERROR;
+       memcpy(&state->buf[state->pos], ptr, nmemb);
+       state->pos += nmemb;
+       return nmemb;
+}
+
+/* CURLOPT_SEEKFUNCTION: reposition the upload stream; only absolute
+ * seeks inside the buffer are supported. */
+static int _http_seek(void *stream, curl_off_t offset, int origin)
+{
+       struct http_curl_stream *state = stream;
+
+       if (origin != SEEK_SET || offset >= state->max)
+               return CURL_SEEKFUNC_FAIL;
+
+       state->pos = offset;
+       return CURL_SEEKFUNC_OK;
+}
+
+/* Issue one HTTP request per io_u: PUT for writes, GET for reads,
+ * DELETE for trims. The object name encodes file name, offset and
+ * length, so every (offset, len) pair maps to its own remote object.
+ * Synchronous engine: always returns FIO_Q_COMPLETED. */
+static enum fio_q_status fio_http_queue(struct thread_data *td,
+                                        struct io_u *io_u)
+{
+       struct http_data *http = td->io_ops_data;
+       struct http_options *o = td->eo;
+       struct http_curl_stream _curl_stream;
+       struct curl_slist *slist = NULL;
+       char object[512];
+       char url[1024];
+       long status;
+       CURLcode res;
+       int r = -1;
+
+       fio_ro_check(td, io_u);
+       memset(&_curl_stream, 0, sizeof(_curl_stream));
+       snprintf(object, sizeof(object), "%s_%llu_%llu", td->files[0]->file_name,
+               io_u->offset, io_u->xfer_buflen);
+       if (o->https == FIO_HTTPS_OFF)
+               snprintf(url, sizeof(url), "http://%s%s", o->host, object);
+       else
+               snprintf(url, sizeof(url), "https://%s%s", o->host, object);
+       curl_easy_setopt(http->curl, CURLOPT_URL, url);
+       _curl_stream.buf = io_u->xfer_buf;
+       _curl_stream.max = io_u->xfer_buflen;
+       curl_easy_setopt(http->curl, CURLOPT_SEEKDATA, &_curl_stream);
+       curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)io_u->xfer_buflen);
+
+       /* Backend-specific auth headers; slist is freed at "out" below */
+       if (o->mode == FIO_HTTP_S3)
+               _add_aws_auth_header(http->curl, slist, o, io_u->ddir, object,
+                       io_u->xfer_buf, io_u->xfer_buflen);
+       else if (o->mode == FIO_HTTP_SWIFT)
+               _add_swift_header(http->curl, slist, o, io_u->ddir, object,
+                       io_u->xfer_buf, io_u->xfer_buflen);
+
+       if (io_u->ddir == DDIR_WRITE) {
+               curl_easy_setopt(http->curl, CURLOPT_READDATA, &_curl_stream);
+               curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, NULL);
+               curl_easy_setopt(http->curl, CURLOPT_UPLOAD, 1L);
+               res = curl_easy_perform(http->curl);
+               if (res == CURLE_OK) {
+                       curl_easy_getinfo(http->curl, CURLINFO_RESPONSE_CODE, &status);
+                       if (status == 100 || (status >= 200 && status <= 204))
+                               goto out;
+                       log_err("DDIR_WRITE failed with HTTP status code %ld\n", status);
+                       goto err;
+               }
+       } else if (io_u->ddir == DDIR_READ) {
+               curl_easy_setopt(http->curl, CURLOPT_READDATA, NULL);
+               curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, &_curl_stream);
+               curl_easy_setopt(http->curl, CURLOPT_HTTPGET, 1L);
+               res = curl_easy_perform(http->curl);
+               if (res == CURLE_OK) {
+                       curl_easy_getinfo(http->curl, CURLINFO_RESPONSE_CODE, &status);
+                       if (status == 200)
+                               goto out;
+                       else if (status == 404) {
+                               /* Object doesn't exist. Pretend we read
+                                * zeroes */
+                               memset(io_u->xfer_buf, 0, io_u->xfer_buflen);
+                               goto out;
+                       }
+                       log_err("DDIR_READ failed with HTTP status code %ld\n", status);
+               }
+               goto err;
+       } else if (io_u->ddir == DDIR_TRIM) {
+               curl_easy_setopt(http->curl, CURLOPT_HTTPGET, 1L);
+               curl_easy_setopt(http->curl, CURLOPT_CUSTOMREQUEST, "DELETE");
+               curl_easy_setopt(http->curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)0);
+               curl_easy_setopt(http->curl, CURLOPT_READDATA, NULL);
+               curl_easy_setopt(http->curl, CURLOPT_WRITEDATA, NULL);
+               res = curl_easy_perform(http->curl);
+               if (res == CURLE_OK) {
+                       curl_easy_getinfo(http->curl, CURLINFO_RESPONSE_CODE, &status);
+                       /* 404 is fine: trimming an absent object is a no-op */
+                       if (status == 200 || status == 202 || status == 204 || status == 404)
+                               goto out;
+                       log_err("DDIR_TRIM failed with HTTP status code %ld\n", status);
+               }
+               goto err;
+       }
+
+       log_err("WARNING: Only DDIR_READ/DDIR_WRITE/DDIR_TRIM are supported!\n");
+
+err:
+       /* NOTE(review): r is -1 here; fio conventionally stores a positive
+        * errno-style value in io_u->error - consider EIO. Verify. */
+       io_u->error = r;
+       td_verror(td, io_u->error, "transfer");
+out:
+       curl_slist_free_all(slist);
+       return FIO_Q_COMPLETED;
+}
+
+/* Never invoked with work pending: getevents always reports zero. */
+static struct io_u *fio_http_event(struct thread_data *td, int event)
+{
+       /* sync IO engine - never any outstanding events */
+       return NULL;
+}
+
+/* sync IO engine - never any outstanding events. Made static for
+ * consistency with the other engine hooks; it is only referenced via
+ * the ioengine ops table in this file. */
+static int fio_http_getevents(struct thread_data *td, unsigned int min,
+       unsigned int max, const struct timespec *t)
+{
+       return 0;
+}
+
+/* Per-thread setup: allocate engine state and configure a reusable curl
+ * easy handle. Returns 0 on success, 1 on failure. */
+static int fio_http_setup(struct thread_data *td)
+{
+       struct http_data *http = NULL;
+       struct http_options *o = td->eo;
+
+       /* allocate engine specific structure to deal with libhttp. */
+       http = calloc(1, sizeof(*http));
+       if (!http) {
+               log_err("calloc failed.\n");
+               goto cleanup;
+       }
+
+       http->curl = curl_easy_init();
+       if (!http->curl) {
+               /* Previously unchecked: a NULL handle would have been
+                * dereferenced by every curl_easy_setopt() below. */
+               log_err("curl_easy_init failed.\n");
+               goto cleanup;
+       }
+       if (o->verbose)
+               curl_easy_setopt(http->curl, CURLOPT_VERBOSE, 1L);
+       if (o->verbose > 1)
+               curl_easy_setopt(http->curl, CURLOPT_DEBUGFUNCTION, &_curl_trace);
+       curl_easy_setopt(http->curl, CURLOPT_NOPROGRESS, 1L);
+       curl_easy_setopt(http->curl, CURLOPT_FOLLOWLOCATION, 1L);
+       curl_easy_setopt(http->curl, CURLOPT_PROTOCOLS, CURLPROTO_HTTP|CURLPROTO_HTTPS);
+       if (o->https == FIO_HTTPS_INSECURE) {
+               curl_easy_setopt(http->curl, CURLOPT_SSL_VERIFYPEER, 0L);
+               curl_easy_setopt(http->curl, CURLOPT_SSL_VERIFYHOST, 0L);
+       }
+       curl_easy_setopt(http->curl, CURLOPT_READFUNCTION, _http_read);
+       curl_easy_setopt(http->curl, CURLOPT_WRITEFUNCTION, _http_write);
+       curl_easy_setopt(http->curl, CURLOPT_SEEKFUNCTION, &_http_seek);
+       if (o->user && o->pass) {
+               curl_easy_setopt(http->curl, CURLOPT_USERNAME, o->user);
+               curl_easy_setopt(http->curl, CURLOPT_PASSWORD, o->pass);
+               curl_easy_setopt(http->curl, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
+       }
+
+       td->io_ops_data = http;
+
+       /* Force single process mode. */
+       td->o.use_thread = 1;
+
+       return 0;
+cleanup:
+       /* td->io_ops_data was never set on this path, so fio_http_cleanup()
+        * would not see (and free) http - release it directly. The curl
+        * handle is NULL on both failure paths. */
+       free(http);
+       return 1;
+}
+
+/* Objects are addressed per-io_u in fio_http_queue, so opening is a no-op. */
+static int fio_http_open(struct thread_data *td, struct fio_file *f)
+{
+       return 0;
+}
+/* No local cache is kept, so there is nothing to invalidate. */
+static int fio_http_invalidate(struct thread_data *td, struct fio_file *f)
+{
+       return 0;
+}
+
+/* Engine ops table; FIO_DISKLESSIO because no local files are touched. */
+static struct ioengine_ops ioengine = {
+       .name = "http",
+       .version                = FIO_IOOPS_VERSION,
+       .flags                  = FIO_DISKLESSIO,
+       .setup                  = fio_http_setup,
+       .queue                  = fio_http_queue,
+       .getevents              = fio_http_getevents,
+       .event                  = fio_http_event,
+       .cleanup                = fio_http_cleanup,
+       .open_file              = fio_http_open,
+       .invalidate             = fio_http_invalidate,
+       .options                = options,
+       .option_struct_size     = sizeof(struct http_options),
+};
+
+/* Constructor/destructor hooks (fio_init/fio_exit) register the engine
+ * with fio when the binary or plugin is loaded/unloaded. */
+static void fio_init fio_http_register(void)
+{
+       register_ioengine(&ioengine);
+}
+
+static void fio_exit fio_http_unregister(void)
+{
+       unregister_ioengine(&ioengine);
+}
diff --git a/engines/ime.c b/engines/ime.c
new file mode 100644 (file)
index 0000000..4298402
--- /dev/null
@@ -0,0 +1,899 @@
+/*
+ * FIO engines for DDN's Infinite Memory Engine.
+ * This file defines 3 engines: ime_psync, ime_psyncv, and ime_aio
+ *
+ * Copyright (C) 2018      DataDirect Networks. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License,
+ * version 2 as published by the Free Software Foundation..
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+/*
+ * Some details about the new engines are given below:
+ *
+ *
+ * ime_psync:
+ * Most basic engine that issues calls to ime_native whenever an IO is queued.
+ *
+ * ime_psyncv:
+ * This engine tries to queue the IOs (by creating iovecs) if asked by FIO (via
+ * iodepth_batch). It refuses to queue when the iovecs can't be appended, and
+ * waits for FIO to issue a commit. After a call to commit and get_events, new
+ * IOs can be queued.
+ *
+ * ime_aio:
+ * This engine tries to queue the IOs (by creating iovecs) if asked by FIO (via
+ * iodepth_batch). When the iovecs can't be appended to the current request, a
+ * new request for IME is created. These requests will be issued to IME when
+ * commit is called. Contrary to ime_psyncv, there can be several requests at
+ * once. We don't need to wait for a request to terminate before creating a new
+ * one.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <linux/limits.h>
+#include <ime_native.h>
+
+#include "../fio.h"
+
+
+/**************************************************************
+ *              Types and constants definitions
+ *
+ **************************************************************/
+
+/* define constants for async IOs */
+#define FIO_IME_IN_PROGRESS -1
+#define FIO_IME_REQ_ERROR   -2
+
+/* This flag is used when some jobs were created using threads. In that
+   case, IME can't be finalized in the engine-specific cleanup function,
+   because other threads might still use IME. Instead, IME is finalized
+   in the destructor (see fio_ime_unregister), only when the flag
+   fio_ime_is_initialized is true (which means at least one thread has
+   initialized IME).
+   NOTE(review): written from each job's init without synchronization;
+   presumably benign since all writers store the same value - verify. */
+static bool fio_ime_is_initialized = false;
+
+/* State of the single outstanding synchronous (psyncv) request */
+struct imesio_req {
+       int                     fd;             /* File descriptor */
+       enum fio_ddir   ddir;   /* Type of IO (read or write) */
+       off_t                   offset; /* File offset */
+};
+/* One asynchronous request; completion is signalled via cond_endio */
+struct imeaio_req {
+       struct ime_aiocb        iocb;                   /* IME aio request */
+       ssize_t                 status;                 /* Status of the IME request */
+       enum fio_ddir           ddir;                   /* Type of IO (read or write) */
+       pthread_cond_t          cond_endio;             /* Condition var to notify FIO */
+       pthread_mutex_t         status_mutex;   /* Mutex for cond_endio */
+};
+
+/* This structure will be used for 2 engines: ime_psyncv and ime_aio */
+struct ime_data {
+       union {
+               struct imeaio_req       *aioreqs;       /* array of aio requests */
+               struct imesio_req       *sioreq;        /* pointer to the only syncio request */
+       };
+       struct iovec    *iovecs;                /* array of queued iovecs */
+       struct io_u     **io_us;                /* array of queued io_u pointers */
+       struct io_u     **event_io_us;  /* array of the events retrieved after get_events*/
+       unsigned int    queued;                 /* iovecs/io_us in the queue */
+       unsigned int    events;                 /* number of committed iovecs/io_us */
+
+       /* variables used to implement a "ring" queue */
+       unsigned int depth;                     /* max entries in the queue */
+       unsigned int head;                      /* index used to append */
+       unsigned int tail;                      /* index used to pop */
+       unsigned int cur_commit;        /* index of the first uncommitted req */
+
+       /* offset used by the last iovec (used to check if the iovecs can be appended)*/
+       unsigned long long      last_offset;
+
+       /* The variables below are used for aio only */
+       struct imeaio_req       *last_req; /* last request awaiting committing */
+};
+
+
+/**************************************************************
+ *         Private functions for queueing/unqueueing
+ *
+ **************************************************************/
+
+/* Append one slot at the head of the ring */
+static void fio_ime_queue_incr (struct ime_data *ime_d)
+{
+       ime_d->head = (ime_d->head + 1) % ime_d->depth;
+       ime_d->queued++;
+}
+
+/* Pop one completed slot from the tail of the ring */
+static void fio_ime_queue_red (struct ime_data *ime_d)
+{
+       ime_d->tail = (ime_d->tail + 1) % ime_d->depth;
+       ime_d->queued--;
+       ime_d->events--;
+}
+
+/* Mark iovcnt queued slots as committed (submitted to IME) */
+static void fio_ime_queue_commit (struct ime_data *ime_d, int iovcnt)
+{
+       ime_d->cur_commit = (ime_d->cur_commit + iovcnt) % ime_d->depth;
+       ime_d->events += iovcnt;
+}
+
+/* Empty the ring entirely */
+static void fio_ime_queue_reset (struct ime_data *ime_d)
+{
+       ime_d->head = 0;
+       ime_d->tail = 0;
+       ime_d->cur_commit = 0;
+       ime_d->queued = 0;
+       ime_d->events = 0;
+}
+
+/**************************************************************
+ *                   General IME functions
+ *             (needed for both sync and async IOs)
+ **************************************************************/
+
+/* Prefix the fio file name with DEFAULT_IME_FILE_PREFIX so the IME
+ * client intercepts the path. Returns a pointer to thread-local storage
+ * (overwritten by the next call on this thread), or NULL on error or
+ * truncation. */
+static char *fio_set_ime_filename(char* filename)
+{
+       static __thread char ime_filename[PATH_MAX];
+       int ret;
+
+       ret = snprintf(ime_filename, PATH_MAX, "%s%s", DEFAULT_IME_FILE_PREFIX, filename);
+       /* snprintf can return a negative value on error; the previous
+        * "ret < PATH_MAX" check would have treated that as success. */
+       if (ret >= 0 && ret < PATH_MAX)
+               return ime_filename;
+
+       return NULL;
+}
+
+/* Stat the file through the IME client and record its real size.
+ * Returns 0 on success, 1 on error (td_verror set on stat failure). */
+static int fio_ime_get_file_size(struct thread_data *td, struct fio_file *f)
+{
+       struct stat buf;
+       int ret;
+       char *ime_filename;
+
+       dprint(FD_FILE, "get file size %s\n", f->file_name);
+
+       ime_filename = fio_set_ime_filename(f->file_name);
+       if (ime_filename == NULL)
+               return 1;
+       ret = ime_native_stat(ime_filename, &buf);
+       if (ret == -1) {
+               td_verror(td, errno, "fstat");
+               return 1;
+       }
+
+       f->real_file_size = buf.st_size;
+       return 0;
+}
+
+/* This function mimics the generic_file_open function, but issues
+   IME native calls instead of POSIX calls. Returns 0 on success,
+   1 on failure (td_verror set). */
+static int fio_ime_open_file(struct thread_data *td, struct fio_file *f)
+{
+       int flags = 0;
+       int ret;
+       uint64_t desired_fs;
+       char *ime_filename;
+
+       dprint(FD_FILE, "fd open %s\n", f->file_name);
+
+       if (td_trim(td)) {
+               td_verror(td, EINVAL, "IME does not support TRIM operation");
+               return 1;
+       }
+
+       if (td->o.oatomic) {
+               td_verror(td, EINVAL, "IME does not support atomic IO");
+               return 1;
+       }
+       if (td->o.odirect)
+               flags |= O_DIRECT;
+       if (td->o.sync_io)
+               flags |= O_SYNC;
+       if (td->o.create_on_open && td->o.allow_create)
+               flags |= O_CREAT;
+
+       if (td_write(td)) {
+               if (!read_only)
+                       flags |= O_RDWR;
+
+               if (td->o.allow_create)
+                       flags |= O_CREAT;
+       } else if (td_read(td)) {
+               flags |= O_RDONLY;
+       } else {
+               /* We should never get here. */
+               td_verror(td, EINVAL, "Unsupported open mode");
+               return 1;
+       }
+
+       ime_filename = fio_set_ime_filename(f->file_name);
+       if (ime_filename == NULL)
+               return 1;
+       f->fd = ime_native_open(ime_filename, flags, 0600);
+       if (f->fd == -1) {
+               char buf[FIO_VERROR_SIZE];
+               int __e = errno;
+
+               snprintf(buf, sizeof(buf), "open(%s)", f->file_name);
+               td_verror(td, __e, buf);
+               return 1;
+       }
+
+       /* Now we need to make sure the real file size is sufficient for FIO
+          to do its things. This is normally done before the file open function
+          is called, but because FIO would use POSIX calls, we need to do it
+          ourselves */
+       ret = fio_ime_get_file_size(td, f);
+       /* fio_ime_get_file_size returns 0 or 1, never a negative value,
+        * so the previous "ret < 0" test could never detect failure. */
+       if (ret) {
+               ime_native_close(f->fd);
+               td_verror(td, errno, "ime_get_file_size");
+               return 1;
+       }
+
+       desired_fs = f->io_size + f->file_offset;
+       if (td_write(td)) {
+               dprint(FD_FILE, "Laying out file %s%s\n",
+                       DEFAULT_IME_FILE_PREFIX, f->file_name);
+               if (!td->o.create_on_open &&
+                               f->real_file_size < desired_fs &&
+                               ime_native_ftruncate(f->fd, desired_fs) < 0) {
+                       ime_native_close(f->fd);
+                       td_verror(td, errno, "ime_native_ftruncate");
+                       return 1;
+               }
+               if (f->real_file_size < desired_fs)
+                       f->real_file_size = desired_fs;
+       } else if (td_read(td) && f->real_file_size < desired_fs) {
+               ime_native_close(f->fd);
+               /* Cast explicitly: %lu does not match uint64_t on 32-bit
+                * platforms (undefined behavior in printf). */
+               log_err("error: can't read %llu bytes from file with "
+                                               "%llu bytes\n",
+                       (unsigned long long) desired_fs,
+                       (unsigned long long) f->real_file_size);
+               return 1;
+       }
+
+       return 0;
+}
+
+/* Close the IME descriptor; returns 0 or the errno from the close. */
+static int fio_ime_close_file(struct thread_data fio_unused *td, struct fio_file *f)
+{
+       int err = 0;
+
+       dprint(FD_FILE, "fd close %s\n", f->file_name);
+
+       if (ime_native_close(f->fd) < 0)
+               err = errno;
+       f->fd = -1;
+
+       return err;
+}
+
+/* Remove the file through its IME-prefixed path; returns 0, 1 when the
+ * prefixed name cannot be built, or the errno from unlink. */
+static int fio_ime_unlink_file(struct thread_data *td, struct fio_file *f)
+{
+       char *ime_filename = fio_set_ime_filename(f->file_name);
+
+       if (!ime_filename)
+               return 1;
+
+       if (unlink(ime_filename) < 0)
+               return errno;
+       return 0;
+}
+
+/* Return the event-th io_u stored by the last getevents call */
+static struct io_u *fio_ime_event(struct thread_data *td, int event)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+
+       return ime_d->event_io_us[event];
+}
+
+/* Setup hook used to replace get_file_sizes when setting up the files.
+   Instead we will set real_file_size to 0 for each file. This way we
+   can avoid calling ime_native_init before the forks are created. */
+static int fio_ime_setup(struct thread_data *td)
+{
+       struct fio_file *f;
+       unsigned int i;
+
+       for_each_file(td, f, i) {
+               dprint(FD_FILE, "setup: set file size to 0 for %p/%d/%s\n",
+                       f, i, f->file_name);
+               f->real_file_size = 0;
+       }
+
+       return 0;
+}
+
+/* Per-job init: bring up the IME client library and seed provisional
+ * file sizes (corrected later in fio_ime_open_file). Always returns 0. */
+static int fio_ime_engine_init(struct thread_data *td)
+{
+       struct fio_file *f;
+       unsigned int i;
+
+       dprint(FD_IO, "ime engine init\n");
+       if (fio_ime_is_initialized && !td->o.use_thread) {
+               log_err("Warning: something might go wrong. Not all threads/forks were"
+                               " created before the FIO jobs were initialized.\n");
+       }
+
+       ime_native_init();
+       fio_ime_is_initialized = true;
+
+       /* We have to temporarily set real_file_size so that
+          FIO can initialize properly. It will be corrected
+          on file open. */
+       for_each_file(td, f, i)
+               f->real_file_size = f->io_size + f->file_offset;
+
+       return 0;
+}
+
+/* Per-job teardown. When jobs run as threads, finalization is deferred
+ * to the fio_exit destructor (see the fio_ime_is_initialized comment). */
+static void fio_ime_engine_finalize(struct thread_data *td)
+{
+       /* Only finalize IME when using forks */
+       if (!td->o.use_thread) {
+               if (ime_native_finalize() < 0)
+                       log_err("error in ime_native_finalize\n");
+               fio_ime_is_initialized = false;
+       }
+}
+
+
+/**************************************************************
+ *             Private functions for blocking IOs
+ *                     (without iovecs)
+ **************************************************************/
+
+/* Notice: this function comes from the sync engine */
+/* It is used by the commit function to return a proper code and fill
+   some attributes in the io_u used for the IO.
+   ret is the byte count returned by the IME call (or -1 on error). */
+static int fio_ime_psync_end(struct thread_data *td, struct io_u *io_u, ssize_t ret)
+{
+       if (ret != (ssize_t) io_u->xfer_buflen) {
+               if (ret >= 0) {
+                       /* Short transfer: record the residual, not an error */
+                       io_u->resid = io_u->xfer_buflen - ret;
+                       io_u->error = 0;
+                       return FIO_Q_COMPLETED;
+               } else
+                       io_u->error = errno;
+       }
+
+       if (io_u->error) {
+               io_u_log_error(td, io_u);
+               td_verror(td, io_u->error, "xfer");
+       }
+
+       return FIO_Q_COMPLETED;
+}
+
+/* ime_psync queue hook: issue the IO synchronously right away.
+ * Always completes inline (FIO_Q_COMPLETED). */
+static enum fio_q_status fio_ime_psync_queue(struct thread_data *td,
+                                          struct io_u *io_u)
+{
+       struct fio_file *f = io_u->file;
+       ssize_t ret;
+
+       fio_ro_check(td, io_u);
+
+       if (io_u->ddir == DDIR_READ)
+               ret = ime_native_pread(f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+       else if (io_u->ddir == DDIR_WRITE)
+               ret = ime_native_pwrite(f->fd, io_u->xfer_buf, io_u->xfer_buflen, io_u->offset);
+       else if (io_u->ddir == DDIR_SYNC)
+               ret = ime_native_fsync(f->fd);
+       else {
+               /* Full buflen so psync_end does not report a short transfer */
+               ret = io_u->xfer_buflen;
+               io_u->error = EINVAL;
+       }
+
+       return fio_ime_psync_end(td, io_u, ret);
+}
+
+
+/**************************************************************
+ *             Private functions for blocking IOs
+ *                       (with iovecs)
+ **************************************************************/
+
+/* True when io_u can join the single pending vectored request */
+static bool fio_ime_psyncv_can_queue(struct ime_data *ime_d, struct io_u *io_u)
+{
+       /* We can only queue if:
+         - There are no queued iovecs
+         - Or if there is at least one:
+                - There must be no event waiting for retrieval
+                - The offsets must be contiguous
+                - The ddir and fd must be the same */
+       return (ime_d->queued == 0 || (
+                       ime_d->events == 0 &&
+                       ime_d->last_offset == io_u->offset &&
+                       ime_d->sioreq->ddir == io_u->ddir &&
+                       ime_d->sioreq->fd == io_u->file->fd));
+}
+
+/* Before using this function, we should have already
+   ensured that the queue is not full */
+static void fio_ime_psyncv_enqueue(struct ime_data *ime_d, struct io_u *io_u)
+{
+       struct imesio_req *ioreq = ime_d->sioreq;
+       struct iovec *iov = &ime_d->iovecs[ime_d->head];
+
+       iov->iov_base = io_u->xfer_buf;
+       iov->iov_len = io_u->xfer_buflen;
+
+       /* The first iovec fixes the request's fd, ddir and base offset */
+       if (ime_d->queued == 0) {
+               ioreq->offset = io_u->offset;
+               ioreq->ddir = io_u->ddir;
+               ioreq->fd = io_u->file->fd;
+       }
+
+       ime_d->io_us[ime_d->head] = io_u;
+       ime_d->last_offset = io_u->offset + io_u->xfer_buflen;
+       fio_ime_queue_incr(ime_d);
+}
+
+/* Tries to queue an IO. It will fail if the IO can't be appended to the
+   current request or if the current request has been committed but not
+   yet retrieved by get_events. */
+static enum fio_q_status fio_ime_psyncv_queue(struct thread_data *td,
+       struct io_u *io_u)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+
+       fio_ro_check(td, io_u);
+
+       if (ime_d->queued == ime_d->depth)
+               return FIO_Q_BUSY;
+
+       if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+               if (!fio_ime_psyncv_can_queue(ime_d, io_u))
+                       return FIO_Q_BUSY;
+
+               dprint(FD_IO, "queue: ddir=%d at %u commit=%u queued=%u events=%u\n",
+                       io_u->ddir, ime_d->head, ime_d->cur_commit,
+                       ime_d->queued, ime_d->events);
+               fio_ime_psyncv_enqueue(ime_d, io_u);
+               return FIO_Q_QUEUED;
+       } else if (io_u->ddir == DDIR_SYNC) {
+               /* fsync bypasses the queue and completes inline */
+               if (ime_native_fsync(io_u->file->fd) < 0) {
+                       io_u->error = errno;
+                       td_verror(td, io_u->error, "fsync");
+               }
+               return FIO_Q_COMPLETED;
+       } else {
+               io_u->error = EINVAL;
+               td_verror(td, io_u->error, "wrong ddir");
+               return FIO_Q_COMPLETED;
+       }
+}
+
+/* Notice: this function comes from the sync engine */
+/* It is used by the commit function to return a proper code and fill
+   some attributes in the io_us appended to the current request.
+   bytes is the total preadv/pwritev result; it is distributed over the
+   queued io_us in order. */
+static int fio_ime_psyncv_end(struct thread_data *td, ssize_t bytes)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+       struct io_u *io_u;
+       unsigned int i;
+       /* Capture errno immediately: the caller invokes us right after the
+        * (possibly failed) IME call, before anything else can clobber it */
+       int err = errno;
+
+       for (i = 0; i < ime_d->queued; i++) {
+               io_u = ime_d->io_us[i];
+
+               if (bytes == -1)
+                       io_u->error = err;
+               else {
+                       unsigned int this_io;
+
+                       this_io = bytes;
+                       if (this_io > io_u->xfer_buflen)
+                               this_io = io_u->xfer_buflen;
+
+                       io_u->resid = io_u->xfer_buflen - this_io;
+                       io_u->error = 0;
+                       bytes -= this_io;
+               }
+       }
+
+       if (bytes == -1) {
+               td_verror(td, err, "xfer psyncv");
+               return -err;
+       }
+
+       return 0;
+}
+
+/* Commits the current request by calling ime_native (with one or several
+   iovecs). After this commit, the corresponding events (one per iovec)
+   can be retrieved by get_events. */
+static int fio_ime_psyncv_commit(struct thread_data *td)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+       struct imesio_req *ioreq;
+       int ret = 0;
+
+       /* Exit if there are no (new) events to commit
+          or if the previous committed event haven't been retrieved */
+       if (!ime_d->queued || ime_d->events)
+               return 0;
+
+       ioreq = ime_d->sioreq;
+       ime_d->events = ime_d->queued;
+       if (ioreq->ddir == DDIR_READ)
+               ret = ime_native_preadv(ioreq->fd, ime_d->iovecs, ime_d->queued, ioreq->offset);
+       else
+               ret = ime_native_pwritev(ioreq->fd, ime_d->iovecs, ime_d->queued, ioreq->offset);
+
+       dprint(FD_IO, "committed %d iovecs\n", ime_d->queued);
+
+       /* psyncv_end reads errno, so call it before any other libc call */
+       return fio_ime_psyncv_end(td, ret);
+}
+
+/* Hand every committed io_u back to fio and reset the ring. min, max
+ * and t are ignored: the commit already completed synchronously. */
+static int fio_ime_psyncv_getevents(struct thread_data *td, unsigned int min,
+                               unsigned int max, const struct timespec *t)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+       struct io_u *io_u;
+       int events = 0;
+       unsigned int count;
+
+       if (ime_d->events) {
+               for (count = 0; count < ime_d->events; count++) {
+                       io_u = ime_d->io_us[count];
+                       ime_d->event_io_us[events] = io_u;
+                       events++;
+               }
+               fio_ime_queue_reset(ime_d);
+       }
+
+       dprint(FD_IO, "getevents(%u,%u) ret=%d queued=%u events=%u\n",
+               min, max, events, ime_d->queued, ime_d->events);
+       return events;
+}
+
+/* ime_psyncv init: bring up IME and allocate the ring of iovecs/io_us.
+ * Returns 0 on success, 1 on failure. */
+static int fio_ime_psyncv_init(struct thread_data *td)
+{
+       struct ime_data *ime_d;
+
+       if (fio_ime_engine_init(td) < 0)
+               return 1;
+
+       ime_d = calloc(1, sizeof(*ime_d));
+       if (!ime_d)
+               return 1;
+
+       ime_d->sioreq = malloc(sizeof(struct imesio_req));
+       ime_d->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
+       ime_d->io_us = malloc(2 * td->o.iodepth * sizeof(struct io_u *));
+       /* Previously unchecked: any NULL here would have been dereferenced
+        * on the first queued IO */
+       if (!ime_d->sioreq || !ime_d->iovecs || !ime_d->io_us) {
+               free(ime_d->sioreq);
+               free(ime_d->iovecs);
+               free(ime_d->io_us);
+               free(ime_d);
+               return 1;
+       }
+       ime_d->event_io_us = ime_d->io_us + td->o.iodepth;
+
+       ime_d->depth = td->o.iodepth;
+
+       td->io_ops_data = ime_d;
+       return 0;
+}
+
+/* ime_psyncv cleanup: free the ring allocations, then (for forked jobs)
+ * finalize IME. Safe to call with io_ops_data unset. */
+static void fio_ime_psyncv_clean(struct thread_data *td)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+
+       if (ime_d) {
+               free(ime_d->sioreq);
+               free(ime_d->iovecs);
+               free(ime_d->io_us);
+               free(ime_d);
+               td->io_ops_data = NULL;
+       }
+
+       fio_ime_engine_finalize(td);
+}
+
+
+/**************************************************************
+ *           Private functions for non-blocking IOs
+ *
+ **************************************************************/
+
+/* IME aio completion callback: publish the request status under the
+ * mutex and wake up any getevents waiter. Made static: its address is
+ * only taken in this file (fio_ime_aio_enqueue), and every other
+ * function in the engine is file-local. */
+static void fio_ime_aio_complete_cb  (struct ime_aiocb *aiocb, int err,
+                                                          ssize_t bytes)
+{
+       struct imeaio_req *ioreq = (struct imeaio_req *) aiocb->user_context;
+
+       pthread_mutex_lock(&ioreq->status_mutex);
+       ioreq->status = err == 0 ? bytes : FIO_IME_REQ_ERROR;
+       pthread_mutex_unlock(&ioreq->status_mutex);
+
+       pthread_cond_signal(&ioreq->cond_endio);
+}
+
+/* True when io_u can enter the ring at all (always, today) */
+static bool fio_ime_aio_can_queue (struct ime_data *ime_d, struct io_u *io_u)
+{
+       /* So far we can queue in any case. */
+       return true;
+}
+/* True when io_u can extend the last uncommitted aio request */
+static bool fio_ime_aio_can_append (struct ime_data *ime_d, struct io_u *io_u)
+{
+       /* We can only append if:
+               - The iovecs will be contiguous in the array
+               - There is already a queued iovec
+               - The offsets are contiguous
+               - The ddir and fd are the same */
+       return (ime_d->head != 0 &&
+                       ime_d->queued - ime_d->events > 0 &&
+                       ime_d->last_offset == io_u->offset &&
+                       ime_d->last_req->ddir == io_u->ddir &&
+                       ime_d->last_req->iocb.fd == io_u->file->fd);
+}
+
+/* Before using this function, we should have already
+   ensured that the queue is not full */
+static void fio_ime_aio_enqueue(struct ime_data *ime_d, struct io_u *io_u)
+{
+       struct imeaio_req *ioreq = &ime_d->aioreqs[ime_d->head];
+       struct ime_aiocb *iocb = &ioreq->iocb;
+       struct iovec *iov = &ime_d->iovecs[ime_d->head];
+
+       iov->iov_base = io_u->xfer_buf;
+       iov->iov_len = io_u->xfer_buflen;
+
+       if (fio_ime_aio_can_append(ime_d, io_u))
+               /* Grow the last request: iovecs are contiguous in the array */
+               ime_d->last_req->iocb.iovcnt++;
+       else {
+               /* Start a brand new aio request at this slot */
+               ioreq->status = FIO_IME_IN_PROGRESS;
+               ioreq->ddir = io_u->ddir;
+               ime_d->last_req = ioreq;
+
+               iocb->complete_cb = &fio_ime_aio_complete_cb;
+               iocb->fd = io_u->file->fd;
+               iocb->file_offset = io_u->offset;
+               iocb->iov = iov;
+               iocb->iovcnt = 1;
+               iocb->flags = 0;
+               iocb->user_context = (intptr_t) ioreq;
+       }
+
+       ime_d->io_us[ime_d->head] = io_u;
+       ime_d->last_offset = io_u->offset + io_u->xfer_buflen;
+       fio_ime_queue_incr(ime_d);
+}
+
+/* Tries to queue an IO. It will create a new request if the IO can't be
+   appended to the current request. It will fail if the queue can't contain
+   any more io_u/iovec. In this case, commit and then get_events need to be
+   called. */
+static enum fio_q_status fio_ime_aio_queue(struct thread_data *td,
+               struct io_u *io_u)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+
+       fio_ro_check(td, io_u);
+
+       dprint(FD_IO, "queue: ddir=%d at %u commit=%u queued=%u events=%u\n",
+               io_u->ddir, ime_d->head, ime_d->cur_commit,
+               ime_d->queued, ime_d->events);
+
+       if (ime_d->queued == ime_d->depth)
+               return FIO_Q_BUSY;
+
+       if (io_u->ddir == DDIR_READ || io_u->ddir == DDIR_WRITE) {
+               if (!fio_ime_aio_can_queue(ime_d, io_u))
+                       return FIO_Q_BUSY;
+
+               fio_ime_aio_enqueue(ime_d, io_u);
+               return FIO_Q_QUEUED;
+       } else if (io_u->ddir == DDIR_SYNC) {
+               /* fsync bypasses the ring and completes inline */
+               if (ime_native_fsync(io_u->file->fd) < 0) {
+                       io_u->error = errno;
+                       td_verror(td, io_u->error, "fsync");
+               }
+               return FIO_Q_COMPLETED;
+       } else {
+               io_u->error = EINVAL;
+               td_verror(td, io_u->error, "wrong ddir");
+               return FIO_Q_COMPLETED;
+       }
+}
+
+/* Submits every queued-but-uncommitted request to IME, one request per
+   loop iteration. Returns 0 on success or a negative error code if a
+   submission fails. */
+static int fio_ime_aio_commit(struct thread_data *td)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+       struct imeaio_req *ioreq;
+       int ret = 0;
+
+       /* Loop while there are events to commit */
+       while (ime_d->queued - ime_d->events) {
+               ioreq = &ime_d->aioreqs[ime_d->cur_commit];
+               if (ioreq->ddir == DDIR_READ)
+                       ret = ime_native_aio_read(&ioreq->iocb);
+               else
+                       ret = ime_native_aio_write(&ioreq->iocb);
+
+               /* Advance cur_commit past all the io_us in this request */
+               fio_ime_queue_commit(ime_d, ioreq->iocb.iovcnt);
+
+               /* fio needs a negative error code */
+               if (ret < 0) {
+                       ioreq->status = FIO_IME_REQ_ERROR;
+                       /* NOTE(review): assumes ime_native_aio_read/write set
+                          errno on failure -- confirm against the IME API */
+                       return -errno;
+               }
+
+               io_u_mark_submit(td, ioreq->iocb.iovcnt);
+               dprint(FD_IO, "committed %d iovecs commit=%u queued=%u events=%u\n",
+                       ioreq->iocb.iovcnt, ime_d->cur_commit,
+                       ime_d->queued, ime_d->events);
+       }
+
+       return 0;
+}
+
+/* Reaps completed requests from the tail of the queue, filling
+   event_io_us for later retrieval via the ->event hook. Blocks on the
+   request's condition variable if the oldest request is still in flight.
+   NOTE(review): the 'min' argument is never consulted; the loop drains
+   based solely on pending events and 'max' -- confirm this is intended. */
+static int fio_ime_aio_getevents(struct thread_data *td, unsigned int min,
+                               unsigned int max, const struct timespec *t)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+       struct imeaio_req *ioreq;
+       struct io_u *io_u;
+       int events = 0;
+       unsigned int count;
+       ssize_t bytes;
+
+       while (ime_d->events) {
+               ioreq = &ime_d->aioreqs[ime_d->tail];
+
+               /* Break if we already got events, and if we will
+                  exceed max if we append the next events */
+               if (events && events + ioreq->iocb.iovcnt > max)
+                       break;
+
+               if (ioreq->status != FIO_IME_IN_PROGRESS) {
+
+                       /* On success, status holds the byte count for the
+                          whole request; apportion it across the io_us */
+                       bytes = ioreq->status;
+                       for (count = 0; count < ioreq->iocb.iovcnt; count++) {
+                               io_u = ime_d->io_us[ime_d->tail];
+                               ime_d->event_io_us[events] = io_u;
+                               events++;
+                               fio_ime_queue_red(ime_d);
+
+                               if (ioreq->status == FIO_IME_REQ_ERROR)
+                                       io_u->error = EIO;
+                               else {
+                                       io_u->resid = bytes > io_u->xfer_buflen ?
+                                                                       0 : io_u->xfer_buflen - bytes;
+                                       io_u->error = 0;
+                                       /* Consume this io_u's share of bytes */
+                                       bytes -= io_u->xfer_buflen - io_u->resid;
+                               }
+                       }
+               } else {
+                       /* Oldest request still in flight: wait for the
+                          completion callback to signal it */
+                       pthread_mutex_lock(&ioreq->status_mutex);
+                       while (ioreq->status == FIO_IME_IN_PROGRESS)
+                               pthread_cond_wait(&ioreq->cond_endio, &ioreq->status_mutex);
+                       pthread_mutex_unlock(&ioreq->status_mutex);
+               }
+
+       }
+
+       dprint(FD_IO, "getevents(%u,%u) ret=%d queued=%u events=%u\n", min, max,
+               events, ime_d->queued, ime_d->events);
+       return events;
+}
+
+/* Allocates the per-thread ime_aio state: request, iovec and io_u arrays
+   sized by the configured iodepth. Returns 0 on success, 1 on failure. */
+static int fio_ime_aio_init(struct thread_data *td)
+{
+       struct ime_data *ime_d;
+       struct imeaio_req *ioreq;
+       unsigned int i;
+
+       if (fio_ime_engine_init(td) < 0)
+               return 1;
+
+       ime_d = calloc(1, sizeof(*ime_d));
+
+       /* NOTE(review): calloc/malloc results are not checked here; an
+          allocation failure would be dereferenced just below */
+       ime_d->aioreqs = malloc(td->o.iodepth * sizeof(struct imeaio_req));
+       ime_d->iovecs = malloc(td->o.iodepth * sizeof(struct iovec));
+       /* Single buffer: first half holds queued io_us, second half is the
+          completion list filled by getevents */
+       ime_d->io_us = malloc(2 * td->o.iodepth * sizeof(struct io_u *));
+       ime_d->event_io_us = ime_d->io_us + td->o.iodepth;
+
+       ime_d->depth = td->o.iodepth;
+       for (i = 0; i < ime_d->depth; i++) {
+               ioreq = &ime_d->aioreqs[i];
+               pthread_cond_init(&ioreq->cond_endio, NULL);
+               pthread_mutex_init(&ioreq->status_mutex, NULL);
+       }
+
+       td->io_ops_data = ime_d;
+       return 0;
+}
+
+/* Releases everything fio_ime_aio_init allocated (condvars, mutexes and
+   the backing arrays), then tears down the engine-level state. */
+static void fio_ime_aio_clean(struct thread_data *td)
+{
+       struct ime_data *ime_d = td->io_ops_data;
+       struct imeaio_req *ioreq;
+       unsigned int i;
+
+       if (ime_d) {
+               for (i = 0; i < ime_d->depth; i++) {
+                       ioreq = &ime_d->aioreqs[i];
+                       pthread_cond_destroy(&ioreq->cond_endio);
+                       pthread_mutex_destroy(&ioreq->status_mutex);
+               }
+               free(ime_d->aioreqs);
+               free(ime_d->iovecs);
+               /* event_io_us points into io_us, so only one free needed */
+               free(ime_d->io_us);
+               free(ime_d);
+               td->io_ops_data = NULL;
+       }
+
+       fio_ime_engine_finalize(td);
+}
+
+
+/**************************************************************
+ *                   IO engines definitions
+ *
+ **************************************************************/
+
+/* The FIO_DISKLESSIO flag used for these engines is necessary to prevent
+   FIO from using POSIX calls. See fio_ime_open_file for more details. */
+
+/* ime_psync: fully synchronous, one IME call per queued io_u */
+static struct ioengine_ops ioengine_prw = {
+       .name           = "ime_psync",
+       .version        = FIO_IOOPS_VERSION,
+       .setup          = fio_ime_setup,
+       .init           = fio_ime_engine_init,
+       .cleanup        = fio_ime_engine_finalize,
+       .queue          = fio_ime_psync_queue,
+       .open_file      = fio_ime_open_file,
+       .close_file     = fio_ime_close_file,
+       .get_file_size  = fio_ime_get_file_size,
+       .unlink_file    = fio_ime_unlink_file,
+       .flags          = FIO_SYNCIO | FIO_DISKLESSIO,
+};
+
+/* ime_psyncv: synchronous, but batches contiguous IOs into iovecs */
+static struct ioengine_ops ioengine_pvrw = {
+       .name           = "ime_psyncv",
+       .version        = FIO_IOOPS_VERSION,
+       .setup          = fio_ime_setup,
+       .init           = fio_ime_psyncv_init,
+       .cleanup        = fio_ime_psyncv_clean,
+       .queue          = fio_ime_psyncv_queue,
+       .commit         = fio_ime_psyncv_commit,
+       .getevents      = fio_ime_psyncv_getevents,
+       .event          = fio_ime_event,
+       .open_file      = fio_ime_open_file,
+       .close_file     = fio_ime_close_file,
+       .get_file_size  = fio_ime_get_file_size,
+       .unlink_file    = fio_ime_unlink_file,
+       .flags          = FIO_SYNCIO | FIO_DISKLESSIO,
+};
+
+/* ime_aio: asynchronous engine built on the IME aio API (no FIO_SYNCIO) */
+static struct ioengine_ops ioengine_aio = {
+       .name           = "ime_aio",
+       .version        = FIO_IOOPS_VERSION,
+       .setup          = fio_ime_setup,
+       .init           = fio_ime_aio_init,
+       .cleanup        = fio_ime_aio_clean,
+       .queue          = fio_ime_aio_queue,
+       .commit         = fio_ime_aio_commit,
+       .getevents      = fio_ime_aio_getevents,
+       .event          = fio_ime_event,
+       .open_file      = fio_ime_open_file,
+       .close_file     = fio_ime_close_file,
+       .get_file_size  = fio_ime_get_file_size,
+       .unlink_file    = fio_ime_unlink_file,
+       .flags          = FIO_DISKLESSIO,
+};
+
+static void fio_init fio_ime_register(void)
+{
+       register_ioengine(&ioengine_prw);
+       register_ioengine(&ioengine_pvrw);
+       register_ioengine(&ioengine_aio);
+}
+
+static void fio_exit fio_ime_unregister(void)
+{
+       unregister_ioengine(&ioengine_prw);
+       unregister_ioengine(&ioengine_pvrw);
+       unregister_ioengine(&ioengine_aio);
+
+       /* IME itself is initialized once per process; finalize it on exit */
+       if (fio_ime_is_initialized && ime_native_finalize() < 0)
+               log_err("Warning: IME did not finalize properly\n");
+}
diff --git a/eta.c b/eta.c
index 9111f5ec1e670bfc4177c12e24601795be291efb..970a67dfd0ac8d6672758b99d528f32526f26e4e 100644 (file)
--- a/eta.c
+++ b/eta.c
@@ -518,6 +518,63 @@ bool calc_thread_status(struct jobs_eta *je, int force)
        return true;
 }
 
+/*
+ * Format the rate and IOPS portions of the ETA line into 'p' (capacity
+ * 'left'), including only the data directions that show activity.
+ * Returns the number of characters appended.
+ *
+ * NOTE(review): if a snprintf truncates, 'l' can exceed 'left' and the
+ * subsequent 'left - l' underflows (size_t) -- callers must size the
+ * buffer generously; confirm against display_thread_status.
+ */
+static int gen_eta_str(struct jobs_eta *je, char *p, size_t left,
+                      char **rate_str, char **iops_str)
+{
+       bool has_r = je->rate[DDIR_READ] || je->iops[DDIR_READ];
+       bool has_w = je->rate[DDIR_WRITE] || je->iops[DDIR_WRITE];
+       bool has_t = je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM];
+       int l = 0;
+
+       if (!has_r && !has_w && !has_t)
+               return 0;
+
+       /* Rate section: [r=...,w=...,t=...] */
+       if (has_r) {
+               l += snprintf(p + l, left - l, "[r=%s", rate_str[DDIR_READ]);
+               /* Only close the bracket if no other direction follows,
+                  otherwise trim-only-after-read yields "[r=x],t=y]" */
+               if (!has_w && !has_t)
+                       l += snprintf(p + l, left - l, "]");
+       }
+       if (has_w) {
+               if (has_r)
+                       l += snprintf(p + l, left - l, ",");
+               else
+                       l += snprintf(p + l, left - l, "[");
+               l += snprintf(p + l, left - l, "w=%s", rate_str[DDIR_WRITE]);
+               if (!has_t)
+                       l += snprintf(p + l, left - l, "]");
+       }
+       if (has_t) {
+               if (has_r || has_w)
+                       l += snprintf(p + l, left - l, ",");
+               else
+                       l += snprintf(p + l, left - l, "[");
+               l += snprintf(p + l, left - l, "t=%s]", rate_str[DDIR_TRIM]);
+       }
+
+       /* IOPS section, same layout and the same bracket rule */
+       if (has_r) {
+               l += snprintf(p + l, left - l, "[r=%s", iops_str[DDIR_READ]);
+               if (!has_w && !has_t)
+                       l += snprintf(p + l, left - l, " IOPS]");
+       }
+       if (has_w) {
+               if (has_r)
+                       l += snprintf(p + l, left - l, ",");
+               else
+                       l += snprintf(p + l, left - l, "[");
+               l += snprintf(p + l, left - l, "w=%s", iops_str[DDIR_WRITE]);
+               if (!has_t)
+                       l += snprintf(p + l, left - l, " IOPS]");
+       }
+       if (has_t) {
+               if (has_r || has_w)
+                       l += snprintf(p + l, left - l, ",");
+               else
+                       l += snprintf(p + l, left - l, "[");
+               l += snprintf(p + l, left - l, "t=%s IOPS]", iops_str[DDIR_TRIM]);
+       }
+
+       return l;
+}
+
 void display_thread_status(struct jobs_eta *je)
 {
        static struct timespec disp_eta_new_line;
@@ -592,21 +649,10 @@ void display_thread_status(struct jobs_eta *je)
                }
 
                left = sizeof(output) - (p - output) - 1;
+               l = snprintf(p, left, ": [%s][%s]", je->run_str, perc_str);
+               l += gen_eta_str(je, p + l, left - l, rate_str, iops_str);
+               l += snprintf(p + l, left - l, "[eta %s]", eta_str);
 
-               if (je->rate[DDIR_TRIM] || je->iops[DDIR_TRIM])
-                       l = snprintf(p, left,
-                               ": [%s][%s][r=%s,w=%s,t=%s][r=%s,w=%s,t=%s IOPS][eta %s]",
-                               je->run_str, perc_str, rate_str[DDIR_READ],
-                               rate_str[DDIR_WRITE], rate_str[DDIR_TRIM],
-                               iops_str[DDIR_READ], iops_str[DDIR_WRITE],
-                               iops_str[DDIR_TRIM], eta_str);
-               else
-                       l = snprintf(p, left,
-                               ": [%s][%s][r=%s,w=%s][r=%s,w=%s IOPS][eta %s]",
-                               je->run_str, perc_str,
-                               rate_str[DDIR_READ], rate_str[DDIR_WRITE],
-                               iops_str[DDIR_READ], iops_str[DDIR_WRITE],
-                               eta_str);
                /* If truncation occurred adjust l so p is on the null */
                if (l >= left)
                        l = left - 1;
diff --git a/examples/http-s3.fio b/examples/http-s3.fio
new file mode 100644 (file)
index 0000000..2dcae36
--- /dev/null
@@ -0,0 +1,34 @@
+# Example test for the HTTP engine's S3 support against Amazon AWS.
+# Obviously, you have to adjust the S3 credentials; for this example,
+# they're passed in via the environment.
+#
+
+[global]
+ioengine=http
+name=test
+direct=1
+filename=/larsmb-fio-test/object
+http_verbose=0
+https=on
+http_mode=s3
+http_s3_key=${S3_KEY}
+http_s3_keyid=${S3_ID}
+http_host=s3.eu-central-1.amazonaws.com
+http_s3_region=eu-central-1
+group_reporting
+
+# With verify, this both writes and reads the object
+[create]
+rw=write
+bs=4k
+size=64k
+io_size=4k
+verify=sha256
+
+[trim]
+stonewall
+rw=trim
+bs=4k
+size=64k
+io_size=4k
+
diff --git a/examples/http-swift.fio b/examples/http-swift.fio
new file mode 100644 (file)
index 0000000..b591adb
--- /dev/null
@@ -0,0 +1,32 @@
+[global]
+ioengine=http
+rw=randwrite
+name=test
+direct=1
+http_verbose=0
+http_mode=swift
+https=on
+# This is the hostname and port portion of the public access link for
+# the container:
+http_host=swift.srv.openstack.local:8081
+filename_format=/swift/v1/fio-test/bucket.$jobnum
+group_reporting
+bs=64k
+size=1M
+# Currently, fio cannot yet generate the Swift Auth-Token itself.
+# You need to set this prior to running fio via
+# eval $(openstack token issue -f shell --prefix SWIFT_) ; export SWIFT_id
+http_swift_auth_token=${SWIFT_id}
+
+[create]
+numjobs=1
+rw=randwrite
+io_size=256k
+verify=sha256
+
+# This will delete all created objects again
+[trim]
+stonewall
+numjobs=1
+rw=trim
+io_size=64k
diff --git a/examples/http-webdav.fio b/examples/http-webdav.fio
new file mode 100644 (file)
index 0000000..2d1ca73
--- /dev/null
@@ -0,0 +1,26 @@
+[global]
+ioengine=http
+rw=randwrite
+name=test
+direct=1
+http_verbose=0
+http_mode=webdav
+https=off
+http_host=localhost
+filename_format=/dav/bucket.$jobnum
+group_reporting
+bs=64k
+size=1M
+
+[create]
+numjobs=16
+rw=randwrite
+io_size=10M
+verify=sha256
+
+# This will delete all created objects again
+[trim]
+stonewall
+numjobs=16
+rw=trim
+io_size=1M
diff --git a/examples/ime.fio b/examples/ime.fio
new file mode 100644 (file)
index 0000000..e97fd1d
--- /dev/null
@@ -0,0 +1,51 @@
+# This jobfile performs basic write+read operations using
+# DDN's Infinite Memory Engine.
+
+[global]
+
+# Use as many jobs as possible to maximize performance
+numjobs=8
+
+# The filename should be uniform so that "read" jobs can read what
+# the "write" jobs have written.
+filename_format=fio-test-ime.$jobnum.$filenum
+
+size=25g
+bs=128k
+
+# These settings are useful for the asynchronous ime_aio engine:
+# by setting the io depth to twice the size of a "batch", we can
+# queue IOs while other IOs are "in-flight".
+iodepth=32
+iodepth_batch=16
+iodepth_batch_complete=16
+
+[write-psync]
+stonewall
+rw=write
+ioengine=ime_psync
+
+[read-psync]
+stonewall
+rw=read
+ioengine=ime_psync
+
+[write-psyncv]
+stonewall
+rw=write
+ioengine=ime_psyncv
+
+[read-psyncv]
+stonewall
+rw=read
+ioengine=ime_psyncv
+
+[write-aio]
+stonewall
+rw=write
+ioengine=ime_aio
+
+[read-aio]
+stonewall
+rw=read
+ioengine=ime_aio
\ No newline at end of file
diff --git a/fio.1 b/fio.1
index 11f81fb279af500b740d37f7f517bbb59eedb5ff..4071947f24f303edb8739f2b8b048ea9a6ee84d7 100644 (file)
--- a/fio.1
+++ b/fio.1
@@ -168,7 +168,8 @@ Set this \fIcommand\fR as local trigger.
 Set this \fIcommand\fR as remote trigger.
 .TP
 .BI \-\-aux\-path \fR=\fPpath
-Use this \fIpath\fR for fio state generated files.
+Use the directory specified by \fIpath\fP for generated state files instead
+of the current working directory.
 .SH "JOB FILE FORMAT"
 Any parameters following the options will be assumed to be job files, unless
 they match a job file parameter. Multiple job files can be listed and each job
@@ -523,12 +524,15 @@ separating the names with a ':' character. These directories will be
 assigned equally distributed to job clones created by \fBnumjobs\fR as
 long as they are using generated filenames. If specific \fBfilename\fR(s) are
 set fio will use the first listed directory, and thereby matching the
-\fBfilename\fR semantic which generates a file each clone if not specified, but
-let all clones use the same if set.
+\fBfilename\fR semantic (which generates a file for each clone if not
+specified, but lets all clones use the same file if set).
 .RS
 .P
-See the \fBfilename\fR option for information on how to escape ':' and '\'
+See the \fBfilename\fR option for information on how to escape ':' and '\\'
 characters within the directory path itself.
+.P
+Note: To control the directory fio will use for internal state files
+use \fB\-\-aux\-path\fR.
 .RE
 .TP
 .BI filename \fR=\fPstr
@@ -545,13 +549,13 @@ by this option will be \fBsize\fR divided by number of files unless an
 explicit size is specified by \fBfilesize\fR.
 .RS
 .P
-Each colon and backslash in the wanted path must be escaped with a '\'
+Each colon and backslash in the wanted path must be escaped with a '\\'
 character. For instance, if the path is `/dev/dsk/foo@3,0:c' then you
 would use `filename=/dev/dsk/foo@3,0\\:c' and if the path is
-`F:\\\\filename' then you would use `filename=F\\:\\\\filename'.
+`F:\\filename' then you would use `filename=F\\:\\\\filename'.
 .P
-On Windows, disk devices are accessed as `\\\\\\\\.\\\\PhysicalDrive0' for
-the first device, `\\\\\\\\.\\\\PhysicalDrive1' for the second etc.
+On Windows, disk devices are accessed as `\\\\.\\PhysicalDrive0' for
+the first device, `\\\\.\\PhysicalDrive1' for the second etc.
 Note: Windows and FreeBSD prevent write access to areas
 of the disk containing in\-use data (e.g. filesystems).
 .P
@@ -720,15 +724,22 @@ false.
 .BI unlink_each_loop \fR=\fPbool
 Unlink job files after each iteration or loop. Default: false.
 .TP
-.BI zonesize \fR=\fPint
-Divide a file into zones of the specified size. See \fBzoneskip\fR.
+Fio supports strided data access. After having read \fBzonesize\fR bytes from an area that is \fBzonerange\fR bytes big, \fBzoneskip\fR bytes are skipped.
 .TP
 .BI zonerange \fR=\fPint
-Give size of an I/O zone. See \fBzoneskip\fR.
+Size of a single zone in which I/O occurs.
+.TP
+.BI zonesize \fR=\fPint
+Number of bytes to transfer before skipping \fBzoneskip\fR bytes. If this
+parameter is smaller than \fBzonerange\fR then only a fraction of each zone
+with \fBzonerange\fR bytes will be accessed.  If this parameter is larger than
+\fBzonerange\fR then each zone will be accessed multiple times before skipping
+to the next zone.
 .TP
 .BI zoneskip \fR=\fPint
-Skip the specified number of bytes when \fBzonesize\fR data has been
-read. The two zone options can be used to only do I/O on zones of a file.
+Skip the specified number of bytes after \fBzonesize\fR bytes of data have been
+transferred.
+
 .SS "I/O type"
 .TP
 .BI direct \fR=\fPbool
@@ -1597,6 +1608,15 @@ I/O engine supporting direct access to Ceph Rados Block Devices
 (RBD) via librbd without the need to use the kernel rbd driver. This
 ioengine defines engine specific options.
 .TP
+.B http
+I/O engine supporting GET/PUT requests over HTTP(S) with libcurl to
+a WebDAV or S3 endpoint.  This ioengine defines engine specific options.
+
+This engine only supports direct IO of iodepth=1; you need to scale this
+via numjobs. blocksize defines the size of the objects to be created.
+
+TRIM is translated to object deletion.
+.TP
 .B gfapi
 Using GlusterFS libgfapi sync interface to direct access to
 GlusterFS volumes without having to go through FUSE. This ioengine
@@ -1653,6 +1673,20 @@ done other than creating the file.
 Read and write using mmap I/O to a file on a filesystem
 mounted with DAX on a persistent memory device through the PMDK
 libpmem library.
+.TP
+.B ime_psync
+Synchronous read and write using DDN's Infinite Memory Engine (IME). This
+engine is very basic and issues calls to IME whenever an IO is queued.
+.TP
+.B ime_psyncv
+Synchronous read and write using DDN's Infinite Memory Engine (IME). This
+engine uses iovecs and will try to stack as many I/Os as possible (if the IOs
+are "contiguous" and the IO depth is not exceeded) before issuing a call to IME.
+.TP
+.B ime_aio
+Asynchronous read and write using DDN's Infinite Memory Engine (IME). This
+engine will try to stack as many I/Os as possible by creating requests for IME.
+FIO will then decide when to commit these requests.
 .SS "I/O engine specific parameters"
 In addition, there are some parameters which are only valid when a specific
 \fBioengine\fR is in use. These are used identically to normal parameters,
@@ -1799,6 +1833,43 @@ by default.
 Poll store instead of waiting for completion. Usually this provides better
 throughput at cost of higher(up to 100%) CPU utilization.
 .TP
+.BI (http)http_host \fR=\fPstr
+Hostname to connect to. For S3, this could be the bucket name. Default
+is \fBlocalhost\fR.
+.TP
+.BI (http)http_user \fR=\fPstr
+Username for HTTP authentication.
+.TP
+.BI (http)http_pass \fR=\fPstr
+Password for HTTP authentication.
+.TP
+.BI (http)https \fR=\fPstr
+Whether to use HTTPS instead of plain HTTP. \fRon\fP enables HTTPS;
+\fRinsecure\fP will enable HTTPS, but disable SSL peer verification (use
+with caution!).  Default is \fBoff\fR.
+.TP
+.BI (http)http_mode \fR=\fPstr
+Which HTTP access mode to use: webdav, swift, or s3. Default is
+\fBwebdav\fR.
+.TP
+.BI (http)http_s3_region \fR=\fPstr
+The S3 region/zone to include in the request. Default is \fBus-east-1\fR.
+.TP
+.BI (http)http_s3_key \fR=\fPstr
+The S3 secret key.
+.TP
+.BI (http)http_s3_keyid \fR=\fPstr
+The S3 key/access id.
+.TP
+.BI (http)http_swift_auth_token \fR=\fPstr
+The Swift auth token. See the example configuration file on how to
+retrieve this.
+.TP
+.BI (http)http_verbose \fR=\fPint
+Enable verbose requests from libcurl. Useful for debugging. 1 turns on
+verbose logging from libcurl, 2 additionally enables HTTP IO tracing.
+Default is \fB0\fR.
+.TP
 .BI (mtd)skip_bad \fR=\fPbool
 Skip operations against known bad blocks.
 .TP
@@ -2612,9 +2683,11 @@ within the file.
 .TP
 .BI write_iops_log \fR=\fPstr
 Same as \fBwrite_bw_log\fR, but writes an IOPS file (e.g.
-`name_iops.x.log') instead. See \fBwrite_bw_log\fR for
-details about the filename format and the \fBLOG FILE FORMATS\fR section for how data
-is structured within the file.
+`name_iops.x.log`) instead. Because fio defaults to individual
+I/O logging, the value entry in the IOPS log will be 1 unless windowed
+logging (see \fBlog_avg_msec\fR) has been enabled. See
+\fBwrite_bw_log\fR for details about the filename format and \fBLOG
+FILE FORMATS\fR for how data is structured within the file.
 .TP
 .BI log_avg_msec \fR=\fPint
 By default, fio will log an entry in the iops, latency, or bw log for every
@@ -3531,17 +3604,16 @@ I/O is a WRITE
 I/O is a TRIM
 .RE
 .P
-The entry's `block size' is always in bytes. The `offset' is the offset, in bytes,
-from the start of the file, for that particular I/O. The logging of the offset can be
+The entry's `block size' is always in bytes. The `offset' is the position in bytes
+from the start of the file for that particular I/O. The logging of the offset can be
 toggled with \fBlog_offset\fR.
 .P
-Fio defaults to logging every individual I/O. When IOPS are logged for individual
-I/Os the `value' entry will always be 1. If windowed logging is enabled through
-\fBlog_avg_msec\fR, fio logs the average values over the specified period of time.
-If windowed logging is enabled and \fBlog_max_value\fR is set, then fio logs
-maximum values in that window instead of averages. Since `data direction', `block size'
-and `offset' are per\-I/O values, if windowed logging is enabled they
-aren't applicable and will be 0.
+Fio defaults to logging every individual I/O but when windowed logging is set
+through \fBlog_avg_msec\fR, either the average (by default) or the maximum
+(\fBlog_max_value\fR is set) `value' seen over the specified period of time
+is recorded. Each `data direction' seen within the window period will aggregate
+its values in a separate row. Further, when using windowed logging the `block
+size' and `offset' entries will always contain 0.
 .SH CLIENT / SERVER
 Normally fio is invoked as a stand\-alone application on the machine where the
 I/O workload should be generated. However, the backend and frontend of fio can
index 7e5071d65b420315aea022b7d2dccce65dde74da..04275a1384c21ee74f44d13912c282c77660e65d 100644 (file)
--- a/gclient.c
+++ b/gclient.c
@@ -121,7 +121,7 @@ static void gfio_text_op(struct fio_client *client, struct fio_net_cmd *cmd)
        GtkTreeIter iter;
        struct tm *tm;
        time_t sec;
-       char tmp[64], timebuf[80];
+       char tmp[64], timebuf[96];
 
        sec = p->log_sec;
        tm = localtime(&sec);
index 454af0b9d0361806bbf515c2e9c5c94776db3891..923aae403085ac0cb1ee2efe48aef5b0207de651 100644 (file)
@@ -35,8 +35,6 @@
 #define BLOCKS_PER_UNIT                (1U << UNIT_SHIFT)
 #define BLOCKS_PER_UNIT_MASK   (BLOCKS_PER_UNIT - 1)
 
-#define firstfree_valid(b)     ((b)->first_free != (uint64_t) -1)
-
 static const unsigned long bit_masks[] = {
        0x0000000000000000, 0x0000000000000001, 0x0000000000000003, 0x0000000000000007,
        0x000000000000000f, 0x000000000000001f, 0x000000000000003f, 0x000000000000007f,
@@ -68,7 +66,6 @@ struct axmap_level {
 struct axmap {
        unsigned int nr_levels;
        struct axmap_level *levels;
-       uint64_t first_free;
        uint64_t nr_bits;
 };
 
@@ -89,8 +86,6 @@ void axmap_reset(struct axmap *axmap)
 
                memset(al->map, 0, al->map_size * sizeof(unsigned long));
        }
-
-       axmap->first_free = 0;
 }
 
 void axmap_free(struct axmap *axmap)
@@ -192,24 +187,6 @@ static bool axmap_handler_topdown(struct axmap *axmap, uint64_t bit_nr,
        return false;
 }
 
-static bool axmap_clear_fn(struct axmap_level *al, unsigned long offset,
-                          unsigned int bit, void *unused)
-{
-       if (!(al->map[offset] & (1UL << bit)))
-               return true;
-
-       al->map[offset] &= ~(1UL << bit);
-       return false;
-}
-
-void axmap_clear(struct axmap *axmap, uint64_t bit_nr)
-{
-       axmap_handler(axmap, bit_nr, axmap_clear_fn, NULL);
-
-       if (bit_nr < axmap->first_free)
-               axmap->first_free = bit_nr;
-}
-
 struct axmap_set_data {
        unsigned int nr_bits;
        unsigned int set_bits;
@@ -262,10 +239,6 @@ static void __axmap_set(struct axmap *axmap, uint64_t bit_nr,
 {
        unsigned int set_bits, nr_bits = data->nr_bits;
 
-       if (axmap->first_free >= bit_nr &&
-           axmap->first_free < bit_nr + data->nr_bits)
-               axmap->first_free = -1ULL;
-
        if (bit_nr > axmap->nr_bits)
                return;
        else if (bit_nr + nr_bits > axmap->nr_bits)
@@ -336,99 +309,119 @@ bool axmap_isset(struct axmap *axmap, uint64_t bit_nr)
        return false;
 }
 
-static uint64_t axmap_find_first_free(struct axmap *axmap, unsigned int level,
-                                      uint64_t index)
+/*
+ * Find the first free bit that is at least as large as bit_nr.  Return
+ * -1 if no free bit is found before the end of the map.
+ */
+static uint64_t axmap_find_first_free(struct axmap *axmap, uint64_t bit_nr)
 {
-       uint64_t ret = -1ULL;
-       unsigned long j;
        int i;
+       unsigned long temp;
+       unsigned int bit;
+       uint64_t offset, base_index, index;
+       struct axmap_level *al;
 
-       /*
-        * Start at the bottom, then converge towards first free bit at the top
-        */
-       for (i = level; i >= 0; i--) {
-               struct axmap_level *al = &axmap->levels[i];
-
-               if (index >= al->map_size)
-                       goto err;
-
-               for (j = index; j < al->map_size; j++) {
-                       if (al->map[j] == -1UL)
-                               continue;
+       index = 0;
+       for (i = axmap->nr_levels - 1; i >= 0; i--) {
+               al = &axmap->levels[i];
 
-                       /*
-                        * First free bit here is our index into the first
-                        * free bit at the next higher level
-                        */
-                       ret = index = (j << UNIT_SHIFT) + ffz(al->map[j]);
-                       break;
+               /* Shift previously calculated index for next level */
+               index <<= UNIT_SHIFT;
+
+               /*
+                * Start from an index that's at least as large as the
+                * originally passed in bit number.
+                */
+               base_index = bit_nr >> (UNIT_SHIFT * i);
+               if (index < base_index)
+                       index = base_index;
+
+               /* Get the offset and bit for this level */
+               offset = index >> UNIT_SHIFT;
+               bit = index & BLOCKS_PER_UNIT_MASK;
+
+               /*
+                * If the previous level had unused bits in its last
+                * word, the offset could be bigger than the map at
+                * this level. That means no free bits exist before the
+                * end of the map, so return -1.
+                */
+               if (offset >= al->map_size)
+                       return -1ULL;
+
+               /* Check the first word starting with the specific bit */
+               temp = ~bit_masks[bit] & ~al->map[offset];
+               if (temp)
+                       goto found;
+
+               /*
+                * No free bit in the first word, so iterate
+                * looking for a word with one or more free bits.
+                */
+               for (offset++; offset < al->map_size; offset++) {
+                       temp = ~al->map[offset];
+                       if (temp)
+                               goto found;
                }
-       }
-
-       if (ret < axmap->nr_bits)
-               return ret;
-
-err:
-       return (uint64_t) -1ULL;
-}
-
-static uint64_t axmap_first_free(struct axmap *axmap)
-{
-       if (!firstfree_valid(axmap))
-               axmap->first_free = axmap_find_first_free(axmap, axmap->nr_levels - 1, 0);
-
-       return axmap->first_free;
-}
-
-struct axmap_next_free_data {
-       unsigned int level;
-       unsigned long offset;
-       uint64_t bit;
-};
 
-static bool axmap_next_free_fn(struct axmap_level *al, unsigned long offset,
-                              unsigned int bit, void *__data)
-{
-       struct axmap_next_free_data *data = __data;
-       uint64_t mask = ~bit_masks[(data->bit + 1) & BLOCKS_PER_UNIT_MASK];
-
-       if (!(mask & ~al->map[offset]))
-               return false;
+               /* Did not find a free bit */
+               return -1ULL;
 
-       if (al->map[offset] != -1UL) {
-               data->level = al->level;
-               data->offset = offset;
-               return true;
+found:
+               /* Compute the index of the free bit just found */
+               index = (offset << UNIT_SHIFT) + ffz(~temp);
        }
 
-       data->bit = (data->bit + BLOCKS_PER_UNIT - 1) / BLOCKS_PER_UNIT;
-       return false;
+       /* If found an unused bit in the last word of level 0, return -1 */
+       if (index >= axmap->nr_bits)
+               return -1ULL;
+
+       return index;
 }
 
 /*
  * 'bit_nr' is already set. Find the next free bit after this one.
+ * Return -1 if no free bits found.
  */
 uint64_t axmap_next_free(struct axmap *axmap, uint64_t bit_nr)
 {
-       struct axmap_next_free_data data = { .level = -1U, .bit = bit_nr, };
        uint64_t ret;
+       uint64_t next_bit = bit_nr + 1;
+       unsigned long temp;
+       uint64_t offset;
+       unsigned int bit;
 
-       if (firstfree_valid(axmap) && bit_nr < axmap->first_free)
-               return axmap->first_free;
+       if (bit_nr >= axmap->nr_bits)
+               return -1ULL;
 
-       if (!axmap_handler(axmap, bit_nr, axmap_next_free_fn, &data))
-               return axmap_first_free(axmap);
+       /* If at the end of the map, wrap-around */
+       if (next_bit == axmap->nr_bits)
+               next_bit = 0;
 
-       assert(data.level != -1U);
+       offset = next_bit >> UNIT_SHIFT;
+       bit = next_bit & BLOCKS_PER_UNIT_MASK;
 
        /*
-        * In the rare case that the map is unaligned, we might end up
-        * finding an offset that's beyond the valid end. For that case,
-        * find the first free one, the map is practically full.
+        * As an optimization, do a quick check for a free bit
+        * in the current word at level 0. If not found, do
+        * a topdown search.
         */
-       ret = axmap_find_first_free(axmap, data.level, data.offset);
-       if (ret != -1ULL)
-               return ret;
+       temp = ~bit_masks[bit] & ~axmap->levels[0].map[offset];
+       if (temp) {
+               ret = (offset << UNIT_SHIFT) + ffz(~temp);
+
+               /* Might have found an unused bit at level 0 */
+               if (ret >= axmap->nr_bits)
+                       ret = -1ULL;
+       } else
+               ret = axmap_find_first_free(axmap, next_bit);
 
-       return axmap_first_free(axmap);
+       /*
+        * If there are no free bits starting at next_bit and going
+        * to the end of the map, wrap around by searching again
+        * starting at bit 0.
+        */
+       if (ret == -1ULL && next_bit != 0)
+               ret = axmap_find_first_free(axmap, 0);
+       return ret;
 }
index a7a6f9429b8163e5c5d239a402d1d9aabd83ce63..55349d8731f2e4edfcc01f7aad025e309782acf6 100644 (file)
@@ -8,7 +8,6 @@ struct axmap;
 struct axmap *axmap_new(unsigned long nr_bits);
 void axmap_free(struct axmap *bm);
 
-void axmap_clear(struct axmap *axmap, uint64_t bit_nr);
 void axmap_set(struct axmap *axmap, uint64_t bit_nr);
 unsigned int axmap_set_nr(struct axmap *axmap, uint64_t bit_nr, unsigned int nr_bits);
 bool axmap_isset(struct axmap *axmap, uint64_t bit_nr);
diff --git a/log.h b/log.h
index b50d4484cb575f295bae3428a7e5bd94574ef750..562f3f42027064898829202025c54ea9047170a4 100644 (file)
--- a/log.h
+++ b/log.h
@@ -3,6 +3,7 @@
 
 #include <stdio.h>
 #include <stdarg.h>
+#include <unistd.h>
 
 #include "lib/output_buffer.h"
 
index d5e968d4208920dcc8212332066642ecd54ae12b..adf4d09bbc18af119f843abd9e59eb6361cea5a2 100644 (file)
@@ -56,6 +56,7 @@ enum opt_category_group {
        __FIO_OPT_G_ACT,
        __FIO_OPT_G_LATPROF,
        __FIO_OPT_G_RBD,
+       __FIO_OPT_G_HTTP,
        __FIO_OPT_G_GFAPI,
        __FIO_OPT_G_MTD,
        __FIO_OPT_G_HDFS,
@@ -91,6 +92,7 @@ enum opt_category_group {
        FIO_OPT_G_ACT           = (1ULL << __FIO_OPT_G_ACT),
        FIO_OPT_G_LATPROF       = (1ULL << __FIO_OPT_G_LATPROF),
        FIO_OPT_G_RBD           = (1ULL << __FIO_OPT_G_RBD),
+       FIO_OPT_G_HTTP          = (1ULL << __FIO_OPT_G_HTTP),
        FIO_OPT_G_GFAPI         = (1ULL << __FIO_OPT_G_GFAPI),
        FIO_OPT_G_MTD           = (1ULL << __FIO_OPT_G_MTD),
        FIO_OPT_G_HDFS          = (1ULL << __FIO_OPT_G_HDFS),
index 58395051aa6ce23bb4c553031f7f18d4604f8354..86ab5d6d230cf0df8d218a66aabe90a01adf6cbf 100644 (file)
--- a/options.c
+++ b/options.c
@@ -959,48 +959,6 @@ static int zone_split_ddir(struct thread_options *o, enum fio_ddir ddir,
        return 0;
 }
 
-static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
-{
-       unsigned int i, j, sprev, aprev;
-       uint64_t sprev_sz;
-
-       td->zone_state_index[ddir] = malloc(sizeof(struct zone_split_index) * 100);
-
-       sprev_sz = sprev = aprev = 0;
-       for (i = 0; i < td->o.zone_split_nr[ddir]; i++) {
-               struct zone_split *zsp = &td->o.zone_split[ddir][i];
-
-               for (j = aprev; j < aprev + zsp->access_perc; j++) {
-                       struct zone_split_index *zsi = &td->zone_state_index[ddir][j];
-
-                       zsi->size_perc = sprev + zsp->size_perc;
-                       zsi->size_perc_prev = sprev;
-
-                       zsi->size = sprev_sz + zsp->size;
-                       zsi->size_prev = sprev_sz;
-               }
-
-               aprev += zsp->access_perc;
-               sprev += zsp->size_perc;
-               sprev_sz += zsp->size;
-       }
-}
-
-/*
- * Generate state table for indexes, so we don't have to do it inline from
- * the hot IO path
- */
-static void td_zone_gen_index(struct thread_data *td)
-{
-       int i;
-
-       td->zone_state_index = malloc(DDIR_RWDIR_CNT *
-                                       sizeof(struct zone_split_index *));
-
-       for (i = 0; i < DDIR_RWDIR_CNT; i++)
-               __td_zone_gen_index(td, i);
-}
-
 static int parse_zoned_distribution(struct thread_data *td, const char *input,
                                    bool absolute)
 {
@@ -1055,9 +1013,7 @@ static int parse_zoned_distribution(struct thread_data *td, const char *input,
                return ret;
        }
 
-       if (!ret)
-               td_zone_gen_index(td);
-       else {
+       if (ret) {
                for (i = 0; i < DDIR_RWDIR_CNT; i++)
                        td->o.zone_split_nr[i] = 0;
        }
@@ -1906,6 +1862,17 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
                          },
 
 #endif
+#ifdef CONFIG_IME
+                         { .ival = "ime_psync",
+                           .help = "DDN's IME synchronous IO engine",
+                         },
+                         { .ival = "ime_psyncv",
+                           .help = "DDN's IME synchronous IO engine using iovecs",
+                         },
+                         { .ival = "ime_aio",
+                           .help = "DDN's IME asynchronous IO engine",
+                         },
+#endif
 #ifdef CONFIG_LINUX_DEVDAX
                          { .ival = "dev-dax",
                            .help = "DAX Device based IO engine",
@@ -1923,6 +1890,11 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
                          { .ival = "libpmem",
                            .help = "PMDK libpmem based IO engine",
                          },
+#endif
+#ifdef CONFIG_HTTP
+                         { .ival = "http",
+                           .help = "HTTP (WebDAV/S3) IO engine",
+                         },
 #endif
                },
        },
index 969479faf4f41cd9fd7f2e156508372cdf89abcc..ff503c52222f5c694584182b70defdf87953c3b7 100644 (file)
@@ -6,38 +6,38 @@
 #ifndef CONFIG_HAVE_VASPRINTF
 int vasprintf(char **strp, const char *fmt, va_list ap)
 {
-    va_list ap_copy;
-    char *str;
-    int len;
+       va_list ap_copy;
+       char *str;
+       int len;
 
 #ifdef va_copy
-    va_copy(ap_copy, ap);
+       va_copy(ap_copy, ap);
 #else
-    __va_copy(ap_copy, ap);
+       __va_copy(ap_copy, ap);
 #endif
-    len = vsnprintf(NULL, 0, fmt, ap_copy);
-    va_end(ap_copy);
+       len = vsnprintf(NULL, 0, fmt, ap_copy);
+       va_end(ap_copy);
 
-    if (len < 0)
-        return len;
+       if (len < 0)
+               return len;
 
-    len++;
-    str = malloc(len);
-    *strp = str;
-    return str ? vsnprintf(str, len, fmt, ap) : -1;
+       len++;
+       str = malloc(len);
+       *strp = str;
+       return str ? vsnprintf(str, len, fmt, ap) : -1;
 }
 #endif
 
 #ifndef CONFIG_HAVE_ASPRINTF
 int asprintf(char **strp, const char *fmt, ...)
 {
-    va_list arg;
-    int done;
+       va_list arg;
+       int done;
 
-    va_start(arg, fmt);
-    done = vasprintf(strp, fmt, arg);
-    va_end(arg);
+       va_start(arg, fmt);
+       done = vasprintf(strp, fmt, arg);
+       va_end(arg);
 
-    return done;
+       return done;
 }
 #endif
index 1512737eda8228a35589122e50cbc1128721a31b..1752439ae66e79a021773cc0d1a581f0c5b4278b 100644 (file)
--- a/t/axmap.c
+++ b/t/axmap.c
@@ -9,8 +9,6 @@ static int test_regular(size_t size, int seed)
 {
        struct fio_lfsr lfsr;
        struct axmap *map;
-       size_t osize;
-       uint64_t ff;
        int err;
 
        printf("Using %llu entries...", (unsigned long long) size);
@@ -18,7 +16,6 @@ static int test_regular(size_t size, int seed)
 
        lfsr_init(&lfsr, size, seed, seed & 0xF);
        map = axmap_new(size);
-       osize = size;
        err = 0;
 
        while (size--) {
@@ -45,11 +42,154 @@ static int test_regular(size_t size, int seed)
        if (err)
                return err;
 
-       ff = axmap_next_free(map, osize);
-       if (ff != (uint64_t) -1ULL) {
-               printf("axmap_next_free broken: got %llu\n", (unsigned long long) ff);
+       printf("pass!\n");
+       axmap_free(map);
+       return 0;
+}
+
+static int check_next_free(struct axmap *map, uint64_t start, uint64_t expected)
+{
+
+       uint64_t ff;
+
+       ff = axmap_next_free(map, start);
+       if (ff != expected) {
+               printf("axmap_next_free broken: Expected %llu, got %llu\n",
+                               (unsigned long long)expected, (unsigned long long) ff);
                return 1;
        }
+       return 0;
+}
+
+static int test_next_free(size_t size, int seed)
+{
+       struct fio_lfsr lfsr;
+       struct axmap *map;
+       size_t osize;
+       uint64_t ff, lastfree;
+       int err, i;
+
+       printf("Test next_free %llu entries...", (unsigned long long) size);
+       fflush(stdout);
+
+       map = axmap_new(size);
+       err = 0;
+
+
+       /* Empty map.  Next free after 0 should be 1. */
+       if (check_next_free(map, 0, 1))
+               err = 1;
+
+       /* Empty map.  Next free after 63 should be 64. */
+       if (check_next_free(map, 63, 64))
+               err = 1;
+
+       /* Empty map.  Next free after size - 2 should be size - 1 */
+       if (check_next_free(map, size - 2, size - 1))
+               err = 1;
+
+       /* Empty map.  Next free after size - 1 should be 0 */
+       if (check_next_free(map, size - 1, 0))
+               err = 1;
+
+       /* Empty map.  Next free after 63 should be 64. */
+       if (check_next_free(map, 63, 64))
+               err = 1;
+
+
+       /* Bit 63 set.  Next free after 62 should be 64. */
+       axmap_set(map, 63);
+       if (check_next_free(map, 62, 64))
+               err = 1;
+
+       /* Last bit set.  Next free after size - 2 should be 0. */
+       axmap_set(map, size - 1);
+       if (check_next_free(map, size - 2, 0))
+               err = 1;
+
+       /* Last bit set.  Next free after size - 1 should be 0. */
+       if (check_next_free(map, size - 1, 0))
+               err = 1;
+       
+       /* Last 64 bits set.  Next free after size - 66 or size - 65 should be 0. */
+       for (i=size - 65; i < size; i++)
+               axmap_set(map, i);
+       if (check_next_free(map, size - 66, 0))
+               err = 1;
+       if (check_next_free(map, size - 65, 0))
+               err = 1;
+       
+       /* Last 64 bits set.  Next free after size - 67 should be size - 66. */
+       if (check_next_free(map, size - 67, size - 66))
+               err = 1;
+
+       axmap_free(map);
+       
+       /* Start with a fresh map and mostly fill it up */
+       lfsr_init(&lfsr, size, seed, seed & 0xF);
+       map = axmap_new(size);
+       osize = size;
+
+       /* Leave 1 entry free */
+       size--;
+       while (size--) {
+               uint64_t val;
+
+               if (lfsr_next(&lfsr, &val)) {
+                       printf("lfsr: short loop\n");
+                       err = 1;
+                       break;
+               }
+               if (axmap_isset(map, val)) {
+                       printf("bit already set\n");
+                       err = 1;
+                       break;
+               }
+               axmap_set(map, val);
+               if (!axmap_isset(map, val)) {
+                       printf("bit not set\n");
+                       err = 1;
+                       break;
+               }
+       }
+
+       /* Get last free bit */
+       lastfree = axmap_next_free(map, 0);
+       if (lastfree == -1ULL) {
+               printf("axmap_next_free broken: Couldn't find last free bit\n");
+               err = 1;
+       }
+
+       /* Start with last free bit and test wrap-around */
+       ff = axmap_next_free(map, lastfree);
+       if (ff != lastfree) {
+               printf("axmap_next_free broken: wrap-around test #1 failed\n");
+               err = 1;
+       }
+
+       /* Start with last bit and test wrap-around */
+       ff = axmap_next_free(map, osize - 1);
+       if (ff != lastfree) {
+               printf("axmap_next_free broken: wrap-around test #2 failed\n");
+               err = 1;
+       }
+
+       /* Set last free bit */
+       axmap_set(map, lastfree);
+       ff = axmap_next_free(map, 0);
+       if (ff != -1ULL) {
+               printf("axmap_next_free broken: Expected -1 from full map\n");
+               err = 1;
+       }
+
+       ff = axmap_next_free(map, osize);
+       if (ff != -1ULL) {
+               printf("axmap_next_free broken: Expected -1 from out of bounds request\n");
+               err = 1;
+       }
+
+       if (err)
+               return err;
 
        printf("pass!\n");
        axmap_free(map);
@@ -269,6 +409,16 @@ int main(int argc, char *argv[])
                return 3;
        if (test_overlap())
                return 4;
+       if (test_next_free(size, seed))
+               return 5;
+
+       /* Test 3 levels, all full:  64*64*64 */
+       if (test_next_free(64*64*64, seed))
+               return 6;
+
+       /* Test 4 levels, with 2 inner levels not full */
+       if (test_next_free(((((64*64)-63)*64)-63)*64*12, seed))
+               return 7;
 
        return 0;
 }
diff --git a/t/steadystate_tests.py b/t/steadystate_tests.py
new file mode 100755 (executable)
index 0000000..50254dc
--- /dev/null
@@ -0,0 +1,226 @@
+#!/usr/bin/python2.7
+# Note: this script is compatible with both Python 2 and Python 3.
+#
+# steadystate_tests.py
+#
+# Test option parsing and functionality for fio's steady state detection feature.
+#
+# steadystate_tests.py --read file-for-read-testing --write file-for-write-testing ./fio
+#
+# REQUIREMENTS
+# Python 2.6+
+# SciPy
+#
+# KNOWN ISSUES
+# only option parsing and read tests are carried out
+# On Windows this script works under Cygwin but not from cmd.exe
+# On Windows I encounter frequent fio problems generating JSON output (nothing to decode)
+# min runtime:
+# if ss attained: min runtime = ss_dur + ss_ramp
+# if not attained: runtime = timeout
+
+from __future__ import absolute_import
+from __future__ import print_function
+import os
+import sys
+import json
+import uuid
+import pprint
+import argparse
+import subprocess
+from scipy import stats
+from six.moves import range
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('fio',
+                        help='path to fio executable')
+    parser.add_argument('--read',
+                        help='target for read testing')
+    parser.add_argument('--write',
+                        help='target for write testing')
+    args = parser.parse_args()
+
+    return args
+
+
+def check(data, iops, slope, pct, limit, dur, criterion):
+    measurement = 'iops' if iops else 'bw'
+    data = data[measurement]
+    mean = sum(data) / len(data)
+    if slope:
+        x = list(range(len(data)))
+        m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
+        m = abs(m)
+        if pct:
+            target = m / mean * 100
+            criterion = criterion[:-1]
+        else:
+            target = m
+    else:
+        maxdev = 0
+        for x in data:
+            maxdev = max(abs(mean-x), maxdev)
+        if pct:
+            target = maxdev / mean * 100
+            criterion = criterion[:-1]
+        else:
+            target = maxdev
+
+    criterion = float(criterion)
+    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    pp = pprint.PrettyPrinter(indent=4)
+
+#
+# test option parsing
+#
+    parsing = [ { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10", "--ss_ramp=5"],
+                  'output': "set steady state IOPS threshold to 10.000000" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 10.000000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:.1%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 0.100000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:10%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 10.000000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:.1%", "--ss_ramp=5"],
+                  'output': "set steady state threshold to 0.100000%" },
+                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:12", "--ss_ramp=5"],
+                  'output': "set steady state BW threshold to 12" },
+              ]
+    for test in parsing:
+        output = subprocess.check_output([args.fio] + test['args'])
+        if test['output'] in output.decode():
+            print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
+        else:
+            print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
+
+#
+# test some read workloads
+#
+# if ss active and attained,
+#   check that runtime is less than job time
+#   check criteria
+#   how to check ramp time?
+#
+# if ss inactive
+#   check that runtime is what was specified
+#
+    reads = [ {'s': True, 'timeout': 100, 'numjobs': 1, 'ss_dur': 5, 'ss_ramp': 3, 'iops': True, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': False, 'timeout': 20, 'numjobs': 2},
+              {'s': True, 'timeout': 100, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 5, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+              {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
+            ]
+
+    if args.read == None:
+        if os.name == 'posix':
+            args.read = '/dev/zero'
+            extra = [ "--size=134217728" ]  # 128 MiB
+        else:
+            print("ERROR: file for read testing must be specified on non-posix systems")
+            sys.exit(1)
+    else:
+        extra = []
+
+    jobnum = 0
+    for job in reads:
+
+        tf = uuid.uuid4().hex
+        parameters = [ "--name=job{0}".format(jobnum) ]
+        parameters.extend(extra)
+        parameters.extend([ "--thread",
+                            "--output-format=json",
+                            "--output={0}".format(tf),
+                            "--filename={0}".format(args.read),
+                            "--rw=randrw",
+                            "--rwmixread=100",
+                            "--stonewall",
+                            "--group_reporting",
+                            "--numjobs={0}".format(job['numjobs']),
+                            "--time_based",
+                            "--runtime={0}".format(job['timeout']) ])
+        if job['s']:
+           if job['iops']:
+               ss = 'iops'
+           else:
+               ss = 'bw'
+           if job['slope']:
+               ss += "_slope"
+           ss += ":" + str(job['ss_limit'])
+           if job['pct']:
+               ss += '%'
+           parameters.extend([ '--ss_dur={0}'.format(job['ss_dur']),
+                               '--ss={0}'.format(ss),
+                               '--ss_ramp={0}'.format(job['ss_ramp']) ])
+
+        output = subprocess.call([args.fio] + parameters)
+        with open(tf, 'r') as source:
+            jsondata = json.loads(source.read())
+        os.remove(tf)
+
+        for jsonjob in jsondata['jobs']:
+            line = "job {0}".format(jsonjob['job options']['name'])
+            if job['s']:
+                if jsonjob['steadystate']['attained'] == 1:
+                    # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
+                    mintime = (job['ss_dur'] + job['ss_ramp']) * 1000
+                    actual = jsonjob['read']['runtime']
+                    if mintime > actual:
+                        line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
+                    else:
+                        line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
+                            iops=job['iops'],
+                            slope=job['slope'],
+                            pct=job['pct'],
+                            limit=job['ss_limit'],
+                            dur=job['ss_dur'],
+                            criterion=jsonjob['steadystate']['criterion'])
+                        if not objsame:
+                            line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                        else:
+                            if met:
+                                line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
+                            else:
+                                line = 'FAILED ' + line + ' target {0} < limit {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                else:
+                    # check runtime, confirm criterion calculation, and confirm that criterion was not met
+                    expected = job['timeout'] * 1000
+                    actual = jsonjob['read']['runtime']
+                    if abs(expected - actual) > 10:
+                        line = 'FAILED ' + line + ' ss not attained, expected runtime {0} != actual runtime {1}'.format(expected, actual)
+                    else:
+                        line = line + ' ss not attained, runtime {0} != ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
+                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
+                            iops=job['iops'],
+                            slope=job['slope'],
+                            pct=job['pct'],
+                            limit=job['ss_limit'],
+                            dur=job['ss_dur'],
+                            criterion=jsonjob['steadystate']['criterion'])
+                        if not objsame:
+                            if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
+                                line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
+                            else:
+                                line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
+                        else:
+                            if met:
+                                line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
+                            else:
+                                line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
+            else:
+                expected = job['timeout'] * 1000
+                actual = jsonjob['read']['runtime']
+                if abs(expected - actual) < 10:
+                    result = 'PASSED '
+                else:
+                    result = 'FAILED '
+                line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
+            print(line)
+            if 'steadystate' in jsonjob:
+                pp.pprint(jsonjob['steadystate'])
+        jobnum += 1
index 21de0b7a47fc6f2468438129198c38a61ba26cd6..678158befc57da04dc9306f44b5347a6ffe5974f 100644 (file)
@@ -1,10 +1,10 @@
 [Unit]
-
-Description=flexible I/O tester server
+Description=Flexible I/O tester server
 After=network.target
 
 [Service]
-
 Type=simple
-PIDFile=/run/fio.pid
 ExecStart=/usr/bin/fio --server
+
+[Install]
+WantedBy=multi-user.target
diff --git a/unit_tests/steadystate_tests.py b/unit_tests/steadystate_tests.py
deleted file mode 100755 (executable)
index 50254dc..0000000
+++ /dev/null
@@ -1,226 +0,0 @@
-#!/usr/bin/python2.7
-# Note: this script is python2 and python 3 compatible.
-#
-# steadystate_tests.py
-#
-# Test option parsing and functonality for fio's steady state detection feature.
-#
-# steadystate_tests.py --read file-for-read-testing --write file-for-write-testing ./fio
-#
-# REQUIREMENTS
-# Python 2.6+
-# SciPy
-#
-# KNOWN ISSUES
-# only option parsing and read tests are carried out
-# On Windows this script works under Cygwin but not from cmd.exe
-# On Windows I encounter frequent fio problems generating JSON output (nothing to decode)
-# min runtime:
-# if ss attained: min runtime = ss_dur + ss_ramp
-# if not attained: runtime = timeout
-
-from __future__ import absolute_import
-from __future__ import print_function
-import os
-import sys
-import json
-import uuid
-import pprint
-import argparse
-import subprocess
-from scipy import stats
-from six.moves import range
-
-def parse_args():
-    parser = argparse.ArgumentParser()
-    parser.add_argument('fio',
-                        help='path to fio executable')
-    parser.add_argument('--read',
-                        help='target for read testing')
-    parser.add_argument('--write',
-                        help='target for write testing')
-    args = parser.parse_args()
-
-    return args
-
-
-def check(data, iops, slope, pct, limit, dur, criterion):
-    measurement = 'iops' if iops else 'bw'
-    data = data[measurement]
-    mean = sum(data) / len(data)
-    if slope:
-        x = list(range(len(data)))
-        m, intercept, r_value, p_value, std_err = stats.linregress(x,data)
-        m = abs(m)
-        if pct:
-            target = m / mean * 100
-            criterion = criterion[:-1]
-        else:
-            target = m
-    else:
-        maxdev = 0
-        for x in data:
-            maxdev = max(abs(mean-x), maxdev)
-        if pct:
-            target = maxdev / mean * 100
-            criterion = criterion[:-1]
-        else:
-            target = maxdev
-
-    criterion = float(criterion)
-    return (abs(target - criterion) / criterion < 0.005), target < limit, mean, target
-
-
-if __name__ == '__main__':
-    args = parse_args()
-
-    pp = pprint.PrettyPrinter(indent=4)
-
-#
-# test option parsing
-#
-    parsing = [ { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10", "--ss_ramp=5"],
-                  'output': "set steady state IOPS threshold to 10.000000" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:10%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 10.000000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=iops:.1%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 0.100000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:10%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 10.000000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:.1%", "--ss_ramp=5"],
-                  'output': "set steady state threshold to 0.100000%" },
-                { 'args': ["--parse-only", "--debug=parse", "--ss_dur=10s", "--ss=bw:12", "--ss_ramp=5"],
-                  'output': "set steady state BW threshold to 12" },
-              ]
-    for test in parsing:
-        output = subprocess.check_output([args.fio] + test['args'])
-        if test['output'] in output.decode():
-            print("PASSED '{0}' found with arguments {1}".format(test['output'], test['args']))
-        else:
-            print("FAILED '{0}' NOT found with arguments {1}".format(test['output'], test['args']))
-
-#
-# test some read workloads
-#
-# if ss active and attained,
-#   check that runtime is less than job time
-#   check criteria
-#   how to check ramp time?
-#
-# if ss inactive
-#   check that runtime is what was specified
-#
-    reads = [ {'s': True, 'timeout': 100, 'numjobs': 1, 'ss_dur': 5, 'ss_ramp': 3, 'iops': True, 'slope': True, 'ss_limit': 0.1, 'pct': True},
-              {'s': False, 'timeout': 20, 'numjobs': 2},
-              {'s': True, 'timeout': 100, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 5, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
-              {'s': True, 'timeout': 10, 'numjobs': 3, 'ss_dur': 10, 'ss_ramp': 500, 'iops': False, 'slope': True, 'ss_limit': 0.1, 'pct': True},
-            ]
-
-    if args.read == None:
-        if os.name == 'posix':
-            args.read = '/dev/zero'
-            extra = [ "--size=134217728" ]  # 128 MiB
-        else:
-            print("ERROR: file for read testing must be specified on non-posix systems")
-            sys.exit(1)
-    else:
-        extra = []
-
-    jobnum = 0
-    for job in reads:
-
-        tf = uuid.uuid4().hex
-        parameters = [ "--name=job{0}".format(jobnum) ]
-        parameters.extend(extra)
-        parameters.extend([ "--thread",
-                            "--output-format=json",
-                            "--output={0}".format(tf),
-                            "--filename={0}".format(args.read),
-                            "--rw=randrw",
-                            "--rwmixread=100",
-                            "--stonewall",
-                            "--group_reporting",
-                            "--numjobs={0}".format(job['numjobs']),
-                            "--time_based",
-                            "--runtime={0}".format(job['timeout']) ])
-        if job['s']:
-           if job['iops']:
-               ss = 'iops'
-           else:
-               ss = 'bw'
-           if job['slope']:
-               ss += "_slope"
-           ss += ":" + str(job['ss_limit'])
-           if job['pct']:
-               ss += '%'
-           parameters.extend([ '--ss_dur={0}'.format(job['ss_dur']),
-                               '--ss={0}'.format(ss),
-                               '--ss_ramp={0}'.format(job['ss_ramp']) ])
-
-        output = subprocess.call([args.fio] + parameters)
-        with open(tf, 'r') as source:
-            jsondata = json.loads(source.read())
-        os.remove(tf)
-
-        for jsonjob in jsondata['jobs']:
-            line = "job {0}".format(jsonjob['job options']['name'])
-            if job['s']:
-                if jsonjob['steadystate']['attained'] == 1:
-                    # check runtime >= ss_dur + ss_ramp, check criterion, check criterion < limit
-                    mintime = (job['ss_dur'] + job['ss_ramp']) * 1000
-                    actual = jsonjob['read']['runtime']
-                    if mintime > actual:
-                        line = 'FAILED ' + line + ' ss attained, runtime {0} < ss_dur {1} + ss_ramp {2}'.format(actual, job['ss_dur'], job['ss_ramp'])
-                    else:
-                        line = line + ' ss attained, runtime {0} > ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
-                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
-                            iops=job['iops'],
-                            slope=job['slope'],
-                            pct=job['pct'],
-                            limit=job['ss_limit'],
-                            dur=job['ss_dur'],
-                            criterion=jsonjob['steadystate']['criterion'])
-                        if not objsame:
-                            line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
-                        else:
-                            if met:
-                                line = 'PASSED ' + line + ' target {0} < limit {1}'.format(target, job['ss_limit'])
-                            else:
-                                line = 'FAILED ' + line + ' target {0} < limit {1} but fio reports ss not attained '.format(target, job['ss_limit'])
-                else:
-                    # check runtime, confirm criterion calculation, and confirm that criterion was not met
-                    expected = job['timeout'] * 1000
-                    actual = jsonjob['read']['runtime']
-                    if abs(expected - actual) > 10:
-                        line = 'FAILED ' + line + ' ss not attained, expected runtime {0} != actual runtime {1}'.format(expected, actual)
-                    else:
-                        line = line + ' ss not attained, runtime {0} != ss_dur {1} + ss_ramp {2},'.format(actual, job['ss_dur'], job['ss_ramp'])
-                        objsame, met, mean, target = check(data=jsonjob['steadystate']['data'],
-                            iops=job['iops'],
-                            slope=job['slope'],
-                            pct=job['pct'],
-                            limit=job['ss_limit'],
-                            dur=job['ss_dur'],
-                            criterion=jsonjob['steadystate']['criterion'])
-                        if not objsame:
-                            if actual > (job['ss_dur'] + job['ss_ramp'])*1000:
-                                line = 'FAILED ' + line + ' fio criterion {0} != calculated criterion {1} '.format(jsonjob['steadystate']['criterion'], target)
-                            else:
-                                line = 'PASSED ' + line + ' fio criterion {0} == 0.0 since ss_dur + ss_ramp has not elapsed '.format(jsonjob['steadystate']['criterion'])
-                        else:
-                            if met:
-                                line = 'FAILED ' + line + ' target {0} < threshold {1} but fio reports ss not attained '.format(target, job['ss_limit'])
-                            else:
-                                line = 'PASSED ' + line + ' criterion {0} > threshold {1}'.format(target, job['ss_limit'])
-            else:
-                expected = job['timeout'] * 1000
-                actual = jsonjob['read']['runtime']
-                if abs(expected - actual) < 10:
-                    result = 'PASSED '
-                else:
-                    result = 'FAILED '
-                line = result + line + ' no ss, expected runtime {0} ~= actual runtime {1}'.format(expected, actual)
-            print(line)
-            if 'steadystate' in jsonjob:
-                pp.pprint(jsonjob['steadystate'])
-        jobnum += 1
diff --git a/zone-dist.c b/zone-dist.c
new file mode 100644 (file)
index 0000000..819d531
--- /dev/null
@@ -0,0 +1,74 @@
+#include <stdlib.h>
+#include "fio.h"
+#include "zone-dist.h"
+
+/*
+ * Build the per-direction zone lookup table for one data direction.
+ * For each zone_split entry, every access-percentage point it covers gets
+ * a slot recording the cumulative size percentage/byte ranges, so the hot
+ * IO path can map a random [0,100) draw to a zone with a single index.
+ */
+static void __td_zone_gen_index(struct thread_data *td, enum fio_ddir ddir)
+{
+       unsigned int i, j, sprev, aprev;
+       uint64_t sprev_sz;
+
+       /*
+        * NOTE(review): 100 slots, one per access percentage point. Assumes
+        * the access_perc values sum to <= 100 and malloc succeeds — TODO
+        * confirm option parsing validates the sum before this runs.
+        */
+       td->zone_state_index[ddir] = malloc(sizeof(struct zone_split_index) * 100);
+
+       sprev_sz = sprev = aprev = 0;
+       for (i = 0; i < td->o.zone_split_nr[ddir]; i++) {
+               struct zone_split *zsp = &td->o.zone_split[ddir][i];
+
+               /* Fill this split's share of the 100-slot table */
+               for (j = aprev; j < aprev + zsp->access_perc; j++) {
+                       struct zone_split_index *zsi = &td->zone_state_index[ddir][j];
+
+                       /* Cumulative [prev, prev + this split) percentage range */
+                       zsi->size_perc = sprev + zsp->size_perc;
+                       zsi->size_perc_prev = sprev;
+
+                       /* Same range expressed in absolute bytes */
+                       zsi->size = sprev_sz + zsp->size;
+                       zsi->size_prev = sprev_sz;
+               }
+
+               /* Advance cumulative access %, size %, and byte offsets */
+               aprev += zsp->access_perc;
+               sprev += zsp->size_perc;
+               sprev_sz += zsp->size;
+       }
+}
+
+/*
+ * Return true if any data direction has zone_split entries configured,
+ * i.e. whether zone index tables need to be generated at all.
+ */
+static bool has_zones(struct thread_data *td)
+{
+       int i, zones = 0;
+
+       for (i = 0; i < DDIR_RWDIR_CNT; i++)
+               zones += td->o.zone_split_nr[i];
+
+       return zones != 0;
+}
+
+/*
+ * Generate state table for indexes, so we don't have to do it inline from
+ * the hot IO path. No-op when no zone splits are configured; otherwise
+ * allocates one lookup table per data direction (freed by
+ * td_zone_free_index()).
+ */
+void td_zone_gen_index(struct thread_data *td)
+{
+       int i;
+
+       if (!has_zones(td))
+               return;
+
+       /* NOTE(review): malloc result unchecked — presumably fio aborts
+        * elsewhere on OOM; verify against project convention. */
+       td->zone_state_index = malloc(DDIR_RWDIR_CNT *
+                                       sizeof(struct zone_split_index *));
+
+       for (i = 0; i < DDIR_RWDIR_CNT; i++)
+               __td_zone_gen_index(td, i);
+}
+
+/*
+ * Release the zone index tables created by td_zone_gen_index().
+ * Safe to call when none were allocated; pointers are NULLed after
+ * freeing so a repeat call is a no-op (guards against double-free).
+ */
+void td_zone_free_index(struct thread_data *td)
+{
+       int i;
+
+       if (!td->zone_state_index)
+               return;
+
+       for (i = 0; i < DDIR_RWDIR_CNT; i++) {
+               free(td->zone_state_index[i]);
+               td->zone_state_index[i] = NULL;
+       }
+
+       free(td->zone_state_index);
+       td->zone_state_index = NULL;
+}
diff --git a/zone-dist.h b/zone-dist.h
new file mode 100644 (file)
index 0000000..c0b2884
--- /dev/null
@@ -0,0 +1,7 @@
+#ifndef FIO_ZONE_DIST_H
+#define FIO_ZONE_DIST_H
+
+/* Build per-direction zone split lookup tables (no-op without zone splits) */
+void td_zone_gen_index(struct thread_data *td);
+/* Free tables built by td_zone_gen_index(); safe if none were built */
+void td_zone_free_index(struct thread_data *td);
+
+#endif