From: Manish Mandlik
Date: Thu, 14 Aug 2014 17:45:16 +0000 (-0600)
Subject: Update libhdfs engine documentation and options
X-Git-Tag: fio-2.1.12~9
X-Git-Url: https://git.kernel.dk/?p=fio.git;a=commitdiff_plain;h=b74e419ec6152ae2dd4b9f36c2559961f4fab5cf

Update libhdfs engine documentation and options

Signed-off-by: Jens Axboe
---

diff --git a/HOWTO b/HOWTO
index d7283535..a0b89c80 100644
--- a/HOWTO
+++ b/HOWTO
@@ -694,7 +694,21 @@ ioengine=str	Defines how the job issues io to the file. The following
 			having to go through FUSE. This ioengine
 			defines engine specific options.
 
-		hdfs	Read and write through Hadoop (HDFS).
+		libhdfs	Read and write through Hadoop (HDFS).
+			The 'filename' option is used to specify the
+			host and port of the hdfs name-node to
+			connect to. This engine interprets offsets
+			slightly differently. In HDFS, files once
+			created cannot be modified, so random writes
+			are not possible. To imitate this, the libhdfs
+			engine expects a set of small files to be
+			created over HDFS, and will randomly pick a
+			file from that set based on the offset
+			generated by the fio backend (see the example
+			job file for how to create such files; use the
+			rw=write option). Please note, you may need to
+			set the necessary environment variables to
+			work with hdfs/libhdfs properly.
 
 		external Prefix to specify loading an external
 			IO engine object file. Append the engine

diff --git a/examples/libhdfs.fio b/examples/libhdfs.fio
new file mode 100644
index 00000000..d5c0ba66
--- /dev/null
+++ b/examples/libhdfs.fio
@@ -0,0 +1,8 @@
+[global]
+runtime=300
+
+[hdfs]
+filename=dfs-perftest-base.dfs-perftest-base,9000
+ioengine=libhdfs
+rw=read
+bs=256k

diff --git a/fio.1 b/fio.1
index b5ff3ccb..c61948bb 100644
--- a/fio.1
+++ b/fio.1
@@ -613,8 +613,16 @@ Using Glusterfs libgfapi async interface to direct access to Glusterfs volumes w
 having to go through FUSE. This ioengine defines engine specific options.
 .TP
-.B hdfs
-Read and write through Hadoop (HDFS)
+.B libhdfs
+Read and write through Hadoop (HDFS). The \fBfilename\fR option is used to
+specify the host and port of the hdfs name-node to connect to. This engine
+interprets offsets slightly differently. In HDFS, files once created cannot
+be modified, so random writes are not possible. To imitate this, the libhdfs
+engine expects a set of small files to be created over HDFS, and will
+randomly pick a file from that set based on the offset generated by the fio
+backend (see the example job file for how to create such files; use the
+rw=write option). Please note, you may need to set the necessary environment
+variables to work with hdfs/libhdfs properly.
 .RE
 .P
 .RE

diff --git a/options.c b/options.c
index 484efc1a..3acfdc86 100644
--- a/options.c
+++ b/options.c
@@ -672,7 +672,7 @@ static int str_numa_mpol_cb(void *data, char *input)
 		}
 		td->o.numa_memnodes = strdup(nodelist);
 		numa_free_nodemask(verify_bitmask);
-		
+
 		break;
 	case MPOL_LOCAL:
 	case MPOL_DEFAULT:
@@ -1542,7 +1542,7 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 	  },
 #endif
 #ifdef CONFIG_LIBHDFS
-	  { .ival = "hdfs",
+	  { .ival = "libhdfs",
 	    .help = "Hadoop Distributed Filesystem (HDFS) engine"
 	  },
 #endif
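
The documentation added above says that random writes are impossible on HDFS, so a set of small files must first be laid out with rw=write before a random-access job can pick among them. A minimal pre-creation job might mirror the shipped examples/libhdfs.fio with the data direction flipped; this is only a sketch, the [hdfs-prep] job name is hypothetical, and the filename reuses the host,port pair from the example rather than a value you should copy verbatim:

```ini
[global]
runtime=300

# Hypothetical pre-creation job: same name-node host,port as the
# shipped read example, but rw=write so the files get created first.
[hdfs-prep]
filename=dfs-perftest-base.dfs-perftest-base,9000
ioengine=libhdfs
rw=write
bs=256k
```

Once this job has populated HDFS, the read example in examples/libhdfs.fio (or a randread variant) can select files by offset as described in the HOWTO text above.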