.TH fiologparser_hist.py 1 "August 18, 2016"
.SH NAME
fiologparser_hist.py \- Calculate statistics from fio histograms
.SH SYNOPSIS
.B fiologparser_hist.py
[\fIoptions\fR] [clat_hist_files]...
.SH DESCRIPTION
.B fiologparser_hist.py
is a utility for converting *_clat_hist* files
generated by fio into a CSV of latency statistics including minimum,
average, and maximum latency, and the 50th, 90th, 95th, and 99th percentiles.
.SH EXAMPLES
.PP
.nf
$ fiologparser_hist.py *_clat_hist*
end-time, samples, min, avg, median, 90%, 95%, 99%, max
1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000
2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000
4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744
\[char46]..
.fi
.PP

.SH OPTIONS
.TP
.BR \-\-help
Print these options.
.TP
.BR \-\-buff_size \fR=\fPint
Number of samples to buffer into numpy at a time. Default is 10,000.
This can be adjusted to help performance.
.TP
.BR \-\-max_latency \fR=\fPint
Number of seconds of data to process at a time. Defaults to 20 seconds,
in order to handle the 17 second upper bound on latency in histograms
reported by fio. This should be increased if fio has been
run with a larger maximum latency. Lowering this when a lower maximum
latency is known can improve performance. See NOTES for more details.
.TP
.BR \-i ", " \-\-interval \fR=\fPint
Interval at which statistics are reported. Defaults to 1000 ms. This
should be set to at least the value of \fBlog_hist_msec\fR given
to fio.
.TP
.BR \-d ", " \-\-divisor \fR=\fPint
Divide statistics by this value. Defaults to 1. Useful if you want to
convert latencies from milliseconds to seconds (\fB\-\-divisor\fR=1000).
.TP
.BR \-\-warn
Enables warning messages printed to stderr, useful for debugging.
.TP
.BR \-\-group_nr \fR=\fPint
Set this to the value of \fIFIO_IO_U_PLAT_GROUP_NR\fR as defined in
\fIstat.h\fR if fio has been recompiled. Defaults to 19, the
current value used in fio. See NOTES for more details.

.SH NOTES
End-times are calculated to be uniform increments of the \fB\-\-interval\fR value given,
regardless of when histogram samples are reported. Of note:

.RS
Intervals with no samples are omitted. In the example above this means
"no statistics from 2 to 3 seconds" and "39 samples influenced the statistics
of the interval from 3 to 4 seconds".
.LP
Intervals with a single sample will have the same value for all statistics.
.RE
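.PP
The end-time rule above can be illustrated with a small Python sketch. It
is a simplification under stated assumptions: each sample is reduced to a
hypothetical timestamp in milliseconds, intervals are taken to be
(end \- interval, end], and the per-sample weighting the tool applies (see
weights() below) is ignored. bucket_end_times() is a hypothetical helper,
not a function of fiologparser_hist.py.
.RS
.PP
.nf
```python
from collections import Counter

def bucket_end_times(timestamps_ms, interval_ms=1000):
    """Count samples per reporting interval, keyed by the interval end-time.

    Hypothetical simplification: intervals are (end - interval_ms, end], so
    every key is a multiple of interval_ms regardless of when samples arrive.
    """
    counts = Counter()
    for t in timestamps_ms:
        end = -(-t // interval_ms) * interval_ms  # round up to a multiple
        counts[end] += 1
    # Intervals that received no samples never appear in the result
    return dict(sorted(counts.items()))

print(bucket_end_times([950, 1020, 1500, 3100, 3900]))
```
.fi
.PP
.RE
.PP
This prints {1000: 1, 2000: 2, 4000: 2}: no sample falls between 2000 and
3000 ms, so no 3000 end-time is reported, as in the example above.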

.PP
The number of samples is unweighted, corresponding to the total number of samples
which have any effect whatsoever on the interval.

Min statistics are computed using the value of the lower boundary of the first bin
(in increasing bin order) with non-zero samples in it. Similarly for max,
we take the upper boundary of the last bin with non-zero samples in it.
This is semantically identical to taking the 0th and 100th percentiles with a
50% bin-width buffer (because percentiles are computed using mid-points of
the bins). This enforces the following properties:

.RS
min <= 50th <= 90th <= 95th <= 99th <= max
.LP
min and max are strict lower and upper bounds on the actual
min / max seen by fio (and reported in *_clat.* with averaging turned off).
.RE
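.PP
A minimal sketch of the min/max rule, assuming each bin is described by
hypothetical arrays of lower and upper latency boundaries plus a sample
count. hist_min_max() is illustrative only and is not the tool's own
function.
.RS
.PP
.nf
```python
import numpy as np

def hist_min_max(lower_edges, upper_edges, counts):
    """Return (min, max) for one histogram, or None if it is empty.

    min is the lower edge of the first non-empty bin and max is the upper
    edge of the last non-empty bin; bins are assumed sorted by latency.
    """
    nonzero = np.flatnonzero(np.asarray(counts))  # indices of non-empty bins
    if nonzero.size == 0:
        return None
    return float(lower_edges[nonzero[0]]), float(upper_edges[nonzero[-1]])

# Hypothetical bins [0,10), [10,20), [20,40), [40,80) with these counts
lower = np.array([0, 10, 20, 40])
upper = np.array([10, 20, 40, 80])
counts = [0, 3, 5, 0]
print(hist_min_max(lower, upper, counts))
```
.fi
.PP
.RE
.PP
This prints (10.0, 40.0). Any percentile computed from bin mid-points (15
or 30 here) necessarily lies between these two bounds.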

.PP
Average statistics use a standard weighted arithmetic mean.
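.PP
As a sketch, assuming each bin is represented by its mid-point (as for the
percentiles) and weighted by its sample count, the mean reduces to numpy's
average() with weights; the values below are hypothetical.
.RS
.PP
.nf
```python
import numpy as np

# Hypothetical bin mid-points (latency values) and sample counts per bin
mids = np.array([15.0, 30.0, 60.0])
counts = np.array([3, 5, 2])

# Weighted arithmetic mean: sum(value * count) / sum(count)
avg = np.average(mids, weights=counts)
print(avg)
```
.fi
.PP
.RE
.PP
This prints 31.5, i.e. (15*3 + 30*5 + 60*2) / 10.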

Percentile statistics are computed using the weighted percentile method as
described here: \fIhttps://en.wikipedia.org/wiki/Percentile#Weighted_percentile\fR.
See the weights() method for details on how weights are computed for individual
samples. In process_interval() we further multiply by the height of each bin
to get weighted histograms.
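.PP
One way to implement the formula from the page cited above is sketched
below with numpy. It reflects our reading of that method and is not
necessarily the code in process_interval(); weighted_percentile() and the
input values are hypothetical.
.RS
.PP
.nf
```python
import numpy as np

def weighted_percentile(percs, values, weights):
    """Weighted percentiles by linear interpolation.

    Uses p_n = 100 * (S_n - w_n / 2) / S_N, where S_n is the cumulative
    weight up to and including the n-th (value-sorted) sample and S_N is
    the total weight.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)  # sort values, and their weights, by value
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    ranks = 100.0 * (cum - weights / 2.0) / cum[-1]
    # np.interp returns the first/last value for percentiles outside the ranks
    return np.interp(percs, ranks, values)

# Hypothetical bin mid-points and their (already weighted) counts
mids = [15.0, 30.0, 60.0]
counts = [3, 5, 2]
print(weighted_percentile([50, 90, 95, 99], mids, counts))
```
.fi
.PP
.RE
.PP
Here the median comes out at 28.125, and the 90th, 95th, and 99th
percentiles all clamp to the largest mid-point, 60.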

Files given on the command line are assumed to be fio histogram files.
An individual histogram file can contain the histograms for multiple
different r/w directions (notably when \fB\-\-rw\fR=\fPrandrw\fR). This is
accounted for by tracking each r/w direction separately. In the statistics
reported we ultimately merge *all* histograms (regardless of r/w direction).
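.PP
The per-direction tracking and final merge can be sketched as follows. The
row layout "time, ddir, block size, bin counts..." and the premise that all
rows belong to one reporting interval (so counts can simply be added) are
assumptions of this sketch; merge_directions() is hypothetical.
.RS
.PP
.nf
```python
import numpy as np

def merge_directions(lines):
    """Group histogram rows by r/w direction, then merge all directions.

    Assumption for this sketch: each line looks like
    "time_ms, ddir, block_size, bin0, bin1, ..." and all lines belong to the
    same reporting interval, so their bin counts can simply be added.
    """
    per_dir = {}
    for line in lines:
        fields = [int(f) for f in line.split(",")]
        ddir, counts = fields[1], np.array(fields[3:])
        # Track each direction separately (0 = read, 1 = write, ...)
        per_dir[ddir] = per_dir.get(ddir, 0) + counts
    # Merge *all* histograms regardless of direction
    merged = sum(per_dir.values())
    return per_dir, merged

rows = [
    "1000, 0, 4096, 0, 2, 1, 0",  # hypothetical read histogram
    "1000, 1, 4096, 1, 0, 3, 0",  # hypothetical write histogram
]
per_dir, merged = merge_directions(rows)
print(merged)
```
.fi
.PP
.RE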

The value of *_GROUP_NR in \fIstat.h\fR (and *_BITS) determines how many latency bins
fio outputs when histogramming is enabled. Namely, for the current default of
GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17
seconds. For certain applications this may not be sufficient. With GROUP_NR=24
we have 1,536 bins, giving us a maximum latency of 541 seconds (~9 minutes). If
you expect your application to experience latencies greater than 17 seconds,
you will need to recompile fio with a larger GROUP_NR, e.g. with:

.RS
.PP
.nf
sed \-i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h
make fio
.fi
.PP
.RE

.PP
Quick reference table for the max latency corresponding to a sampling of
values for GROUP_NR:

.RS
.PP
.nf
GROUP_NR | # bins | max latency bin value
      19 |   1216 | 16.9 sec
      20 |   1280 | 33.8 sec
      21 |   1344 | 67.6 sec
      22 |   1408 | 2 min, 15 sec
      23 |   1472 | 4 min, 32 sec
      24 |   1536 | 9 min, 4 sec
      25 |   1600 | 18 min, 8 sec
      26 |   1664 | 36 min, 16 sec
.fi
.PP
.RE
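.PP
The bin counts in the table follow from GROUP_NR * 64, assuming
FIO_IO_U_PLAT_BITS is 6. The sketch below also mirrors, as we understand
it, the bin-index-to-latency mapping of plat_idx_to_val() in fio's stat.c,
assuming latencies are recorded in microseconds; the Python names are
hypothetical.
.RS
.PP
.nf
```python
# Assumptions: FIO_IO_U_PLAT_BITS = 6 (64 bins per group) and latencies
# recorded in microseconds, as in the fio version this page describes.
PLAT_BITS = 6
PLAT_VAL = 1 << PLAT_BITS

def num_bins(group_nr):
    """Number of histogram bins for a given GROUP_NR."""
    return group_nr * PLAT_VAL

def plat_idx_to_val(idx, edge=0.5):
    """Latency of bin idx: edge 0.0, 0.5, 1.0 give lower bound, mid-point, upper bound."""
    if idx < (PLAT_VAL << 1):
        return idx  # the first 128 bins hold one latency value each
    error_bits = (idx >> PLAT_BITS) - 1
    base = 1 << (error_bits + PLAT_BITS)  # smallest latency in this group
    k = idx % PLAT_VAL  # bucket number within the group
    return base + (k + edge) * (1 << error_bits)

for g in range(19, 27):
    last = num_bins(g) - 1
    print(g, num_bins(g), round(plat_idx_to_val(last) / 1e6, 1), "sec")
```
.fi
.PP
.RE
.PP
Because this computes the mid-point of the last bin, the results differ
slightly from the rounded figures in the table; for GROUP_NR=19 it gives
16,711,680 usec, about 16.7 seconds.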

.PP
At present this program automatically detects the number of histogram bins in
the log files, and adjusts the bin latency values accordingly. In particular if
you use the \fB\-\-log_hist_coarseness\fR parameter of fio, you get output files with
a number of bins according to the following table (note that the first
row is identical to the table above):

.RS
.PP
.nf
coarse \\ GROUP_NR
         19    20    21    22    23    24    25    26
      -----------------------------------------------
 0  [[ 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664],
 1   [  608,  640,  672,  704,  736,  768,  800,  832],
 2   [  304,  320,  336,  352,  368,  384,  400,  416],
 3   [  152,  160,  168,  176,  184,  192,  200,  208],
 4   [   76,   80,   84,   88,   92,   96,  100,  104],
 5   [   38,   40,   42,   44,   46,   48,   50,   52],
 6   [   19,   20,   21,   22,   23,   24,   25,   26],
 7   [  N/A,   10,  N/A,   11,  N/A,   12,  N/A,   13],
 8   [  N/A,    5,  N/A,  N/A,  N/A,    6,  N/A,  N/A]]
.fi
.PP
.RE

.PP
For other values of GROUP_NR and coarseness, this table can be computed like this:

.RS
.PP
.nf
import numpy as np
bins = [1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664]
max_coarse = 8
# Each coarseness level halves the bin count; nan marks counts that do not divide evenly
fncn = lambda z: [z / 2**x if z % 2**x == 0 else np.nan for x in range(max_coarse + 1)]
print(np.transpose([fncn(b) for b in bins]))
.fi
.PP
.RE
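.PP
Going the other way, the automatic detection of bin counts described above
amounts to finding which (GROUP_NR, coarseness) pair produces the number of
bins seen in a log. A sketch, assuming 64 bins per group and that
coarseness c merges 2**c adjacent bins; candidate_configs() is hypothetical
and not the tool's own code.
.RS
.PP
.nf
```python
PLAT_VAL = 64  # assumed bins per group (FIO_IO_U_PLAT_BITS = 6)

def candidate_configs(n_bins, group_nrs=range(19, 27), max_coarse=8):
    """Return every (GROUP_NR, coarseness) pair that yields n_bins bins.

    Coarseness c merges 2**c adjacent bins, so the bin count is
    GROUP_NR * 64 / 2**c, and only exact divisions are possible.
    """
    matches = []
    for g in group_nrs:
        total = g * PLAT_VAL
        for c in range(max_coarse + 1):
            if total % (1 << c) == 0 and total >> c == n_bins:
                matches.append((g, c))
    return matches

print(candidate_configs(1216))
print(candidate_configs(40))
```
.fi
.PP
.RE
.PP
This prints [(19, 0)] and [(20, 5)]. Since 19 to 26 spans less than a factor
of two, each count in the table maps back to exactly one pair.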

.PP
If you have not adjusted GROUP_NR for your (high latency) application, then you
will see the percentiles computed by this tool max out at the max latency bin
value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see
a max latency of ~16.7 seconds in the red line):

.RS
\fIhttps://www.cronburg.com/fio/max_latency_bin_value_bug.png\fR
.RE

.PP
The motivation, design decisions, and implementation process are
described in further detail here:

.RS
\fIhttps://www.cronburg.com/fio/cloud-latency-problem-measurement/\fR
.RE

.SH AUTHOR
.B fiologparser_hist.py
and this manual page were written by Karl Cronburg <karl.cronburg@gmail.com>.
.SH "REPORTING BUGS"
Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>.