.TH fiologparser_hist.py 1 "August 18, 2016"
.SH NAME
fiologparser_hist.py \- Calculate statistics from fio histograms
.SH SYNOPSIS
.B fiologparser_hist.py
[\fIoptions\fR] [clat_hist_files]...
.SH DESCRIPTION
.B fiologparser_hist.py
is a utility for converting *_clat_hist* files
generated by fio into a CSV of latency statistics including minimum,
average, and maximum latency, as well as selectable percentiles.
.SH EXAMPLES
.PP
.nf
$ fiologparser_hist.py *_clat_hist*
end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
\[char46]..
.fi
.PP
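.PP
The CSV output shown above can be post-processed with any CSV reader. The
following is a minimal Python sketch, assuming the header names and the
comma-plus-space layout of the example output. The \fIload_stats\fR helper is
hypothetical and is not part of fio.
.RS
.PP
.nf
```python
import csv
import io

def load_stats(text):
    """Parse fiologparser_hist.py CSV text into a list of dicts of floats."""
    # skipinitialspace drops the blank that follows each comma
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key: float(value) for key, value in row.items()} for row in reader]

# Two rows copied from the example output above
sample = """end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
"""
rows = load_stats(sample)
print(rows[0]["end-time"], rows[0]["median"])
```
.fi
.PP
.RE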
.SH OPTIONS
.TP
.BR \-\-help
Print these options.
.TP
.BR \-\-buff_size \fR=\fPint
Number of samples to buffer into numpy at a time. Default is 10,000.
This can be adjusted to help performance.
.TP
.BR \-\-max_latency \fR=\fPint
Number of seconds of data to process at a time. Defaults to 20 seconds,
in order to handle the 17 second upper bound on latency in histograms
reported by fio. This should be increased if fio has been
run with a larger maximum latency. Lowering this when a lower maximum
latency is known can improve performance. See NOTES for more details.
.TP
.BR \-i ", " \-\-interval \fR=\fPint
Interval at which statistics are reported. Defaults to 1000 ms. This
should be set to at least the value of \fBlog_hist_msec\fR as given
to fio.
.TP
.BR \-\-noweight
Do not perform weighting of samples between output intervals. Default is False.
.TP
.BR \-d ", " \-\-divisor \fR=\fPint
Divide statistics by this value. Defaults to 1. Useful if you want to
convert latencies from milliseconds to seconds (\fBdivisor\fR=\fP1000\fR).
.TP
.BR \-\-warn
Enables warning messages printed to stderr, useful for debugging.
.TP
.BR \-\-group_nr \fR=\fPint
Set this to the value of \fIFIO_IO_U_PLAT_GROUP_NR\fR as defined in
\fIstat.h\fR if fio has been recompiled. Defaults to 19, the
current value used in fio. See NOTES for more details.
.TP
.BR \-\-percentiles \fR=\fPstr
Pass the desired list of comma- or colon-separated percentiles to print.
The default is "90.0:95.0:99.0", but the min, median (50%), and max are always printed.
.TP
.BR \-\-usbin
Indicates to the parser that histogram bin latency values are in microseconds.
The default is nanoseconds, but histogram logs from fio versions <= 2.99 are in microseconds.
.TP
.BR \-\-directions \fR=\fPstr
By default, histogram bins for all directions (e.g. read and write) are combined,
producing one 'mixed' result.
To produce independent directional results, pass some combination of
\'rwtm\' characters with the \-\-directions\fR=\fPrwtm option.
A \'dir\' column is added indicating the result direction for a row.

.SH NOTES
End-times are calculated to be uniform increments of the \fB\-\-interval\fR value given,
regardless of when histogram samples are reported. Of note:

.RS
Intervals with no samples are omitted. In the example above this means
"no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
of the interval from 3 to 4 seconds".
.LP
Intervals with a single sample will have the same value for all statistics.
.RE

.PP
The number of samples is unweighted, corresponding to the total number of samples
which have any effect whatsoever on the interval.

Min statistics are computed using the value of the lower boundary of the first bin
(in increasing bin order) with non-zero samples in it. Similarly for max,
we take the upper boundary of the last bin with non-zero samples in it.
This is semantically identical to taking the 0th and 100th percentiles with a
50% bin-width buffer (because percentiles are computed using mid-points of
the bins). This enforces the following nice properties:

.RS
min <= 50th <= 90th <= 95th <= 99th <= max
.LP
min and max are strict lower and upper bounds on the actual
min / max seen by fio (and reported in *_clat.* with averaging turned off).
.RE
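.PP
The min and max rule described above can be sketched in Python as follows.
This is a minimal illustration and not the tool's actual code; the
\fIbin_min_max\fR helper and the example bin boundaries are hypothetical.
.RS
.PP
.nf
```python
import numpy as np

def bin_min_max(lower, upper, counts):
    """Return (min, max) from bin boundaries and per-bin sample counts."""
    nonzero = np.flatnonzero(np.asarray(counts))
    if nonzero.size == 0:
        raise ValueError("histogram has no samples")
    # Lower boundary of the first populated bin, upper boundary of the last
    return lower[nonzero[0]], upper[nonzero[-1]]

# Hypothetical bins [0,10), [10,20), [20,30), [30,40); only the middle two hold samples
lower = np.array([0, 10, 20, 30])
upper = np.array([10, 20, 30, 40])
counts = np.array([0, 3, 5, 0])
lo, hi = bin_min_max(lower, upper, counts)
print(lo, hi)
```
.fi
.PP
.RE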
.PP
Average statistics use a standard weighted arithmetic mean.

When the \fB\-\-noweight\fR option is not given (the default),
percentile statistics are computed using the weighted percentile method as
described here: \fIhttps://en.wikipedia.org/wiki/Percentile#Weighted_percentile\fR.
See the weights() method for details on how weights are computed for individual
samples. In process_interval() we further multiply by the height of each bin
to get weighted histograms.
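.PP
As a rough illustration of the weighted mean and the weighted percentile
method referenced above, the sketch below interpolates linearly between
percent ranks built from cumulative weights. It assumes sorted values with
positive weights, and its function names and sample data are hypothetical;
it is not the code used by weights() or process_interval().
.RS
.PP
.nf
```python
import numpy as np

def weighted_mean(values, weights):
    """Standard weighted arithmetic mean."""
    return np.average(values, weights=weights)

def weighted_percentile(values, weights, percentiles):
    """Weighted percentiles by linear interpolation (values must be sorted)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cumulative = np.cumsum(weights)
    # Percent rank of each value: 100 * (S_n - w_n / 2) / S_N
    ranks = 100.0 * (cumulative - weights / 2.0) / cumulative[-1]
    # np.interp clamps requests outside the first and last rank
    return np.interp(percentiles, ranks, values)

# Hypothetical bin mid-points and weighted bin heights
mids = [100.0, 200.0, 300.0]
heights = [1.0, 2.0, 1.0]
print(weighted_mean(mids, heights))
print(weighted_percentile(mids, heights, [50.0, 90.0]))
```
.fi
.PP
.RE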
.PP
Files given on the command line are assumed to be fio histogram files.
An individual histogram file can contain the
histograms for multiple different r/w directions (notably when \fB\-\-rw\fR=\fPrandrw\fR). This
is accounted for by tracking each r/w direction separately. By default, the statistics
reported merge *all* histograms (regardless of r/w direction); see \fB\-\-directions\fR.

The value of *_GROUP_NR in \fIstat.h\fR (and *_BITS) determines how many latency bins
fio outputs when histogramming is enabled. For the current default of
GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17
seconds. For certain applications this may not be sufficient. With GROUP_NR=24
we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If
you expect your application to experience latencies greater than 17 seconds,
you will need to recompile fio with a larger GROUP_NR, e.g. with:

.RS
.PP
.nf
sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19$/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
make fio
.fi
.PP
.RE
.PP
Quick reference table for the max latency corresponding to a sampling of
values for GROUP_NR:

.RS
.PP
.nf
GROUP_NR | # bins | max latency bin value
19       | 1216   | 16.9 sec
20       | 1280   | 33.8 sec
21       | 1344   | 67.6 sec
22       | 1408   | 2 min, 15 sec
23       | 1472   | 4 min, 32 sec
24       | 1536   | 9 min, 4 sec
25       | 1600   | 18 min, 8 sec
26       | 1664   | 36 min, 16 sec
.fi
.PP
.RE
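.PP
The bin counts in the table above are consistent with 64 bins per group
(1216 / 19 = 64). A small Python sketch, assuming that per-group count, which
is inferred from the table rather than read from \fIstat.h\fR; the
\fInum_bins\fR helper is hypothetical.
.RS
.PP
.nf
```python
BINS_PER_GROUP = 64  # assumption inferred from the table: 1216 / 19

def num_bins(group_nr):
    """Number of histogram bins for a given GROUP_NR."""
    return group_nr * BINS_PER_GROUP

for group_nr in range(19, 27):
    print(group_nr, num_bins(group_nr))
```
.fi
.PP
.RE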
.PP
This program automatically detects the number of histogram bins in
the log files, and adjusts the bin latency values accordingly. In particular, if
you use the \fB\-\-log_hist_coarseness\fR parameter of fio, you get output files with
a number of bins according to the following table (note that the first
row is identical to the table above):

.RS
.PP
.nf
coarse \\ GROUP_NR
          19     20     21     22     23     24     25     26
      -------------------------------------------------------
  0 [[  1216,  1280,  1344,  1408,  1472,  1536,  1600,  1664],
  1  [   608,   640,   672,   704,   736,   768,   800,   832],
  2  [   304,   320,   336,   352,   368,   384,   400,   416],
  3  [   152,   160,   168,   176,   184,   192,   200,   208],
  4  [    76,    80,    84,    88,    92,    96,   100,   104],
  5  [    38,    40,    42,    44,    46,    48,    50,    52],
  6  [    19,    20,    21,    22,    23,    24,    25,    26],
  7  [   N/A,    10,   N/A,    11,   N/A,    12,   N/A,    13],
  8  [   N/A,     5,   N/A,   N/A,   N/A,     6,   N/A,   N/A]]
.fi
.PP
.RE
.PP
For other values of GROUP_NR and coarseness, this table can be computed like this:

.RS
.PP
.nf
import numpy as np

bins = [1216,1280,1344,1408,1472,1536,1600,1664]
max_coarse = 8
# nan marks coarseness levels at which the bin count no longer halves evenly
fncn = lambda z: list(map(lambda x: z/2**x if z % 2**x == 0 else np.nan, range(max_coarse + 1)))
np.transpose(list(map(fncn, bins)))
.fi
.PP
.RE
.PP
If you have not adjusted GROUP_NR for your (high latency) application, then you
will see the percentiles computed by this tool max out at the max latency bin
value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
a max latency of ~16.7 seconds in the red line):

.RS
\fIhttps://www.cronburg.com/fio/max_latency_bin_value_bug.png\fR
.RE

.PP
The motivation, design decisions, and implementation process are
described in further detail here:

.RS
\fIhttps://www.cronburg.com/fio/cloud-latency-problem-measurement/\fR
.RE

.SH AUTHOR
.B fiologparser_hist.py
and this manual page were written by Karl Cronburg <karl.cronburg@gmail.com>.
.SH "REPORTING BUGS"
Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>.