tools/hist/fiologparser_hist.py.1

.TH fiologparser_hist.py 1 "August 18, 2016"
.SH NAME
fiologparser_hist.py \- Calculate statistics from fio histograms
.SH SYNOPSIS
.B fiologparser_hist.py
[\fIoptions\fR] [clat_hist_files]...
.SH DESCRIPTION
.B fiologparser_hist.py
is a utility for converting *_clat_hist* files
generated by fio into a CSV of latency statistics: minimum, average, and
maximum latency, and the 50th (median), 90th, 95th, and 99th percentiles.
.SH EXAMPLES
.PP
.nf
$ fiologparser_hist.py *_clat_hist*
end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
\&...
.fi
.PP

.SH OPTIONS
.TP
.BR \-\-help
Print these options.
.TP
.BR \-\-buff_size \fR=\fPint
Number of samples to buffer into numpy at a time. Default is 10,000.
Adjusting this value may improve performance.
.TP
.BR \-\-max_latency \fR=\fPint
Number of seconds of data to process at a time. Defaults to 20 seconds,
in order to handle the 17-second upper bound on latency in histograms
reported by fio. This should be increased if fio has been
run with a larger maximum latency. Lowering this when a lower maximum
latency is known can improve performance. See NOTES for more details.
.TP
.BR \-i ", " \-\-interval \fR=\fPint
Interval at which statistics are reported. Defaults to 1000 ms. This
should be set to at least the value of \fBlog_hist_msec\fR given
to fio.
.TP
.BR \-d ", " \-\-divisor \fR=\fPint
Divide statistics by this value. Defaults to 1. Useful if you want to
convert latencies from milliseconds to seconds (\fB\-\-divisor\fR=1000).
.TP
.BR \-\-warn
Enable warning messages printed to stderr. Useful for debugging.
.TP
.BR \-\-group_nr \fR=\fPint
Set this to the value of \fIFIO_IO_U_PLAT_GROUP_NR\fR as defined in
\fIstat.h\fR if fio has been recompiled. Defaults to 19, the
current value used in fio. See NOTES for more details.

.SH NOTES
End-times are calculated to be uniform increments of the \fB\-\-interval\fR value given,
regardless of when histogram samples are reported. Of note:

.RS
Intervals with no samples are omitted. In the example above this means
"no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
of the interval from 3 to 4 seconds".
.LP
Intervals with a single sample will have the same value for all statistics.
.RE
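.PP
A minimal Python sketch of the end-time bucketing described above. It assumes
each histogram sample is reduced to its end timestamp in milliseconds and that
each interval covers the interval_ms milliseconds up to and including its end
time. The helper interval_end_times() is hypothetical: it only counts samples
and ignores samples that straddle an interval boundary, which the script
handles by weighting.
.RS
.PP
.nf
```python
import math
from collections import Counter


def interval_end_times(sample_times_ms, interval_ms=1000):
    # Hypothetical helper: each sample is reduced to its end timestamp (ms).
    counts = Counter()
    for t in sample_times_ms:
        # Round up to the next multiple of interval_ms to get a uniform end time.
        end = math.ceil(t / interval_ms) * interval_ms
        counts[end] += 1
    # Only intervals that received at least one sample are reported.
    return sorted(counts.items())


print(interval_end_times([950, 1000, 1700, 3100, 3900]))
# prints [(1000, 2), (2000, 1), (4000, 2)]
```
.fi
.PP
.RE
.PP
Here the interval ending at 3000 ms receives no samples and is omitted,
mirroring the gap between the 2000 and 4000 rows in the example output.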

.PP
The number of samples is unweighted: it is the total number of samples
that have any effect whatsoever on the interval.
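.PP
The difference between the unweighted sample count and per-sample weights can
be sketched as follows. This assumes, hypothetically, that each sample covers a
time window [start, end) in milliseconds and that its weight is the fraction of
that window overlapping the interval; overlap_weight() and count_and_weights()
are illustrative names, not the script's own functions.
.RS
.PP
.nf
```python
def overlap_weight(sample_start, sample_end, iv_start, iv_end):
    # Hypothetical: fraction of the sample window [start, end) inside the interval.
    overlap = min(sample_end, iv_end) - max(sample_start, iv_start)
    return max(overlap, 0) / (sample_end - sample_start)


def count_and_weights(samples, iv_start, iv_end):
    weights = [overlap_weight(s, e, iv_start, iv_end) for s, e in samples]
    # The unweighted count includes every sample with any overlap at all.
    n = sum(1 for w in weights if w > 0)
    return n, weights


print(count_and_weights([(0, 1000), (500, 1500), (1000, 2000)], 1000, 2000))
# prints (2, [0.0, 0.5, 1.0])
```
.fi
.PP
.RE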

Min statistics are computed using the value of the lower boundary of the first bin
(in increasing bin order) with non-zero samples in it. Similarly, for max
we take the upper boundary of the last bin with non-zero samples in it.
This is semantically identical to taking the 0th and 100th percentiles with a
50% bin-width buffer (because percentiles are computed using midpoints of
the bins). This enforces the following nice properties:

.RS
min <= 50th <= 90th <= 95th <= 99th <= max
.LP
min and max are strict lower and upper bounds on the actual
min / max seen by fio (and reported in *_clat.* with averaging turned off).
.RE
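.PP
A short sketch of the min/max rule, assuming the bin boundaries are already
known as two parallel lists of lower and upper edges; hist_min_max() is a
hypothetical helper, not part of the script.
.RS
.PP
.nf
```python
import numpy as np


def hist_min_max(lower_edges, upper_edges, counts):
    # Indices of bins that received at least one sample.
    nonzero = np.flatnonzero(np.asarray(counts) > 0)
    if nonzero.size == 0:
        return None  # empty interval: nothing to report
    # min: lower edge of the first non-empty bin; max: upper edge of the last one.
    return lower_edges[nonzero[0]], upper_edges[nonzero[-1]]


print(hist_min_max([0, 10, 20, 30], [10, 20, 30, 40], [0, 5, 0, 2]))
# prints (10, 40)
```
.fi
.PP
.RE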

.PP
Average statistics use a standard weighted arithmetic mean.
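.PP
For a histogram, the weighted mean can be sketched as below, assuming each bin
is represented by its midpoint latency and weighted by its (possibly
fractional) sample count; hist_mean() is a hypothetical helper built on
numpy.average().
.RS
.PP
.nf
```python
import numpy as np


def hist_mean(bin_midpoints, weights):
    # Weighted arithmetic mean: sum(w_i * v_i) / sum(w_i).
    return float(np.average(bin_midpoints, weights=weights))


print(hist_mean([10, 20, 30], [1, 0, 3]))  # prints 25.0
```
.fi
.PP
.RE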

Percentile statistics are computed using the weighted percentile method as
described here: \fIhttps://en.wikipedia.org/wiki/Percentile#Weighted_percentile\fR.
See weights() in the script source for details on how weights are computed for
individual samples. In process_interval(), these weights are further multiplied
by the height of each bin to get weighted histograms.
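.PP
A minimal sketch of that weighted percentile method, assuming the values are
bin midpoints and the weights are the weighted bin heights. Following the
Wikipedia definition, the n-th sorted value is assigned the percentile rank
100 / S_N * (S_n - w_n / 2), where S_n is the cumulative weight up to and
including value n, and other percentiles are linearly interpolated. This
weighted_percentile() is illustrative and may differ in detail from the
script's own implementation.
.RS
.PP
.nf
```python
import numpy as np


def weighted_percentile(percs, values, weights):
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort values (and their weights) in increasing order.
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # Percentile rank of each sorted value: 100 / S_N * (S_n - w_n / 2).
    ranks = 100.0 * (np.cumsum(weights) - weights / 2.0) / weights.sum()
    # Interpolate between neighbouring ranks; np.interp clamps outside the range.
    return np.interp(percs, ranks, values)


print(weighted_percentile([50, 90, 95, 99], [4, 1, 3, 2], [1, 1, 1, 1]))
```
.fi
.PP
.RE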

Files given on the command line are assumed to be fio histogram files.
An individual histogram file can contain the
histograms for multiple r/w directions (notably when \fB\-\-rw\fR=\fPrandrw\fR). This
is accounted for by tracking each r/w direction separately. In the statistics
reported, we ultimately merge \fIall\fR histograms (regardless of r/w direction).
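.PP
One way to track directions separately and then merge them for reporting is
sketched below. It assumes, hypothetically, that each parsed log line has
already been reduced to a tuple (end_time_ms, ddir, counts); merge_directions()
is an illustrative helper rather than the script's own code.
.RS
.PP
.nf
```python
from collections import defaultdict

import numpy as np


def merge_directions(rows):
    # Hypothetical input: (end_time_ms, ddir, counts) tuples.
    # Track each r/w direction separately: ddir -> end_time -> summed counts.
    per_dir = defaultdict(lambda: defaultdict(int))
    for end_time, ddir, counts in rows:
        per_dir[ddir][end_time] = per_dir[ddir][end_time] + np.asarray(counts)
    # For reporting, merge all directions per end time.
    merged = defaultdict(int)
    for series in per_dir.values():
        for end_time, counts in series.items():
            merged[end_time] = merged[end_time] + counts
    return dict(merged)


m = merge_directions([(1000, 0, [1, 2, 0]), (1000, 1, [0, 1, 4])])
print(m[1000])
```
.fi
.PP
.RE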

The value of *_GROUP_NR in \fIstat.h\fR (and *_BITS) determines how many latency bins
fio outputs when histogramming is enabled. Namely, for the current default of
GROUP_NR=19, we get 1,216 bins (19 groups of 64) with a maximum latency of approximately 17
seconds. For certain applications this may not be sufficient. With GROUP_NR=24
we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If
you expect your application to experience latencies greater than 17 seconds,
you will need to recompile fio with a larger GROUP_NR, e.g. with:

.RS
.PP
.nf
sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
make fio
.fi
.PP
.RE

.PP
Quick reference table for the max latency corresponding to a sampling of
values for GROUP_NR:

.RS
.PP
.nf
GROUP_NR | # bins | max latency bin value
19       | 1216   | 16.9 sec
20       | 1280   | 33.8 sec
21       | 1344   | 67.6 sec
22       | 1408   | 2  min, 15 sec
23       | 1472   | 4  min, 32 sec
24       | 1536   | 9  min, 4  sec
25       | 1600   | 18 min, 8  sec
26       | 1664   | 36 min, 16 sec
.fi
.PP
.RE
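.PP
The bin counts and maximum bin values above can be approximated with the
sketch below. It assumes FIO_IO_U_PLAT_BITS=6 (64 bins per group, consistent
with 19 x 64 = 1,216), latencies in microseconds, and a Python port of the
bucketing in fio's plat_idx_to_val() from stat.c, evaluated at the bin
midpoint. Under these assumptions the last bin for GROUP_NR=19 sits at about
16.7 seconds; the printed values are midpoints and may differ slightly from
the rounded figures in the table.
.RS
.PP
.nf
```python
# Assumed constants mirroring fio's stat.h: 64 bins per group.
FIO_IO_U_PLAT_BITS = 6
FIO_IO_U_PLAT_VAL = 1 << FIO_IO_U_PLAT_BITS


def plat_idx_to_val(idx, edge=0.5):
    # Port of fio's bucketing (assumed); edge=0.5 gives the bin midpoint.
    if idx < (FIO_IO_U_PLAT_VAL << 1):
        return idx  # the first two groups map one value per bin
    error_bits = (idx >> FIO_IO_U_PLAT_BITS) - 1
    base = 1 << (error_bits + FIO_IO_U_PLAT_BITS)
    k = idx % FIO_IO_U_PLAT_VAL
    return base + (k + edge) * (1 << error_bits)


for group_nr in range(19, 27):
    nbins = group_nr * FIO_IO_U_PLAT_VAL
    max_usec = plat_idx_to_val(nbins - 1)  # last bin, assumed microseconds
    print(f"{group_nr:>8} | {nbins:>6} | {max_usec / 1e6:9.1f} sec")
```
.fi
.PP
.RE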

.PP
At present this program automatically detects the number of histogram bins in
the log files and adjusts the bin latency values accordingly. In particular, if
you use the \fB\-\-log_hist_coarseness\fR parameter of fio, you get output files with
a number of bins according to the following table (note that the first
row is identical to the # bins column of the table above):

.RS
.PP
.nf
coarse \\ GROUP_NR
        19     20    21     22     23     24     25     26
   -------------------------------------------------------
  0  [[ 1216,  1280,  1344,  1408,  1472,  1536,  1600,  1664],
  1   [  608,   640,   672,   704,   736,   768,   800,   832],
  2   [  304,   320,   336,   352,   368,   384,   400,   416],
  3   [  152,   160,   168,   176,   184,   192,   200,   208],
  4   [   76,    80,    84,    88,    92,    96,   100,   104],
  5   [   38,    40,    42,    44,    46,    48,    50,    52],
  6   [   19,    20,    21,    22,    23,    24,    25,    26],
  7   [  N/A,    10,   N/A,    11,   N/A,    12,   N/A,    13],
  8   [  N/A,     5,   N/A,   N/A,   N/A,     6,   N/A,   N/A]]
.fi
.PP
.RE

.PP
For other values of GROUP_NR and coarseness, this table can be computed like this:

.RS
.PP
.nf
import numpy as np
bins = [1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664]  # GROUP_NR = 19..26
max_coarse = 8
# Bin count at each coarseness level (halved per level); NaN if not evenly divisible
def fncn(z):
    return [z / 2**x if z % 2**x == 0 else np.nan for x in range(max_coarse + 1)]
print(np.transpose([fncn(z) for z in bins]))
.fi
.PP
.RE
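.PP
Going the other way, the automatic bin-count detection mentioned above can be
sketched by inverting the table: given the number of bin columns in a log line
and an assumed GROUP_NR, the coarseness is the base-2 logarithm of the ratio.
This assumes each coarseness level merges adjacent pairs of bins (so coarse bin
j spans 2**coarse fine-grained bins); detect_coarseness() and
coarse_bin_range() are hypothetical helpers.
.RS
.PP
.nf
```python
import math


def detect_coarseness(ncols, group_nr=19, plat_bits=6):
    # Bins at coarseness 0, e.g. 19 * 64 = 1216 (plat_bits=6 is an assumption).
    full = group_nr << plat_bits
    if ncols <= 0 or full % ncols:
        raise ValueError(f"{ncols} bins is not a coarsening of {full}")
    ratio = full // ncols
    coarse = int(math.log2(ratio))
    # Each coarseness level is assumed to halve the bin count exactly.
    if 1 << coarse != ratio:
        raise ValueError(f"{ncols} bins is not a power-of-two coarsening of {full}")
    return coarse


def coarse_bin_range(j, coarse):
    # Inclusive range of fine-grained bin indices merged into coarse bin j.
    width = 1 << coarse
    return j * width, (j + 1) * width - 1


print(detect_coarseness(304))  # prints 2
print(coarse_bin_range(3, 2))  # prints (12, 15)
```
.fi
.PP
.RE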

.PP
If you have not adjusted GROUP_NR for your (high-latency) application, then you
will see the percentiles computed by this tool max out at the max latency bin
value, as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
a max latency of ~16.7 seconds in the red line):

.RS
\fIhttps://www.cronburg.com/fio/max_latency_bin_value_bug.png\fR
.RE

.PP
The motivation, design decisions, and implementation process are
described in further detail here:

.RS
\fIhttps://www.cronburg.com/fio/cloud-latency-problem-measurement/\fR
.RE

.SH AUTHOR
.B fiologparser_hist.py
and this manual page were written by Karl Cronburg <karl.cronburg@gmail.com>.
.SH "REPORTING BUGS"
Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>.