Commit | Line | Data |
---|---|---|
d1f6fcad KC |
1 | .TH fiologparser_hist.py 1 "August 18, 2016" |
2 | .SH NAME | |
3 | fiologparser_hist.py \- Calculate statistics from fio histograms | |
4 | .SH SYNOPSIS | |
5 | .B fiologparser_hist.py | |
6 | [\fIoptions\fR] [clat_hist_files]... | |
7 | .SH DESCRIPTION | |
8 | .B fiologparser_hist.py | |
9 | is a utility for converting *_clat_hist* files | |
10 | generated by fio into a CSV of latency statistics: minimum, average, | |
11 | and maximum latency, plus the 50th, 90th, 95th, and 99th percentiles. | |
12 | .SH EXAMPLES | |
13 | .PP | |
14 | .nf | |
15 | $ fiologparser_hist.py *_clat_hist* | |
16 | end-time, samples, min, avg, median, 90%, 95%, 99%, max | |
17 | 1000, 15, 192, 1678.107, 1788.859, 1856.076, 1880.040, 1899.208, 1888.000 | |
18 | 2000, 43, 152, 1642.368, 1714.099, 1816.659, 1845.552, 1888.131, 1888.000 | |
19 | 4000, 39, 1152, 1546.962, 1545.785, 1627.192, 1640.019, 1691.204, 1744 | |
afc764cc | 20 | \[char46].. |
d1f6fcad KC |
21 | .fi |
22 | .PP | |
23 | ||
24 | .SH OPTIONS | |
25 | .TP | |
26 | .BR \-\-help | |
27 | Print these options. | |
28 | .TP | |
29 | .BR \-\-buff_size \fR=\fPint | |
30 | Number of samples to buffer into numpy at a time. Default is 10,000. | |
31 | This can be adjusted to help performance. | |
32 | .TP | |
33 | .BR \-\-max_latency \fR=\fPint | |
34 | Number of seconds of data to process at a time. Defaults to 20 seconds, | |
35 | in order to handle the 17 second upper bound on latency in histograms | |
36 | reported by fio. This should be increased if fio has been | |
37 | run with a larger maximum latency. Lowering this when a lower maximum | |
38 | latency is known can improve performance. See NOTES for more details. | |
39 | .TP | |
40 | .BR \-i ", " \-\-interval \fR=\fPint | |
41 | Interval at which statistics are reported. Defaults to 1000 ms. This | |
42 | should be set to at least the value of \fBlog_hist_msec\fR as given | |
43 | to fio. | |
44 | .TP | |
45 | .BR \-d ", " \-\-divisor \fR=\fPint | |
46 | Divide statistics by this value. Defaults to 1. Useful if you want to | |
47 | convert latencies from milliseconds to seconds (\fB\-\-divisor\fR=1000). | |
48 | .TP | |
49 | .BR \-\-warn | |
50 | Enables warning messages printed to stderr, useful for debugging. | |
51 | .TP | |
52 | .BR \-\-group_nr \fR=\fPint | |
53 | Set this to the value of \fIFIO_IO_U_PLAT_GROUP_NR\fR as defined in | |
54 | \fIstat.h\fR if fio has been recompiled. Defaults to 19, the | |
55 | current value used in fio. See NOTES for more details. | |
56 | ||
57 | .SH NOTES | |
58 | End-times are calculated as uniform increments of the \fB\-\-interval\fR value given, | |
59 | regardless of when histogram samples are reported. Of note: | |
60 | ||
61 | .RS | |
62 | Intervals with no samples are omitted. In the example above this means | |
63 | "no statistics from 2 to 3 seconds" and "39 samples influenced the statistics | |
64 | of the interval from 3 to 4 seconds". | |
65 | .LP | |
66 | Intervals with a single sample will have the same value for all statistics. | |
67 | .RE | |
68 | ||
69 | .PP | |
70 | The number of samples is unweighted, corresponding to the total number of samples | |
71 | which have any effect whatsoever on the interval. | |
72 | ||
73 | Min statistics are computed using the value of the lower boundary of the first bin | |
74 | (in increasing bin order) with non-zero samples in it. Similarly for max, | |
75 | we take the upper boundary of the last bin with non-zero samples in it. | |
76 | This is semantically identical to taking the 0th and 100th percentiles with a | |
77 | 50% bin-width buffer (because percentiles are computed using mid-points of | |
78 | the bins). This enforces the following nice properties: | |
79 | ||
80 | .RS | |
81 | min <= 50th <= 90th <= 95th <= 99th <= max | |
82 | .LP | |
83 | min and max are strict lower and upper bounds on the actual | |
84 | min / max seen by fio (and reported in *_clat.* with averaging turned off). | |
85 | .RE | |
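.PP
The min and max rule described above can be sketched as follows. This is an
illustrative example only, not the tool's own code: the function name
hist_min_max and the four-bin histogram with edges 0, 10, 20, 30, 40 are
hypothetical, and the bin boundaries are assumed to be given in increasing order.
.PP
.nf
```python
def hist_min_max(lower_edges, upper_edges, counts):
    """Return (min, max) taken from the first and last non-empty bins.

    min is the lower boundary of the first bin with a non-zero count,
    max is the upper boundary of the last bin with a non-zero count.
    Returns None when every bin is empty (an interval with no samples).
    """
    nonzero = [i for i, c in enumerate(counts) if c > 0]
    if not nonzero:
        return None
    return lower_edges[nonzero[0]], upper_edges[nonzero[-1]]


# Hypothetical histogram: four bins covering 0-10, 10-20, 20-30, 30-40.
lower = [0, 10, 20, 30]
upper = [10, 20, 30, 40]
result = hist_min_max(lower, upper, [0, 3, 5, 0])
print(result)
```
.fi
.PP
Because the boundaries of the outermost non-empty bins are used, every latency
recorded in those bins lies between the two returned values.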
86 | ||
87 | .PP | |
88 | Average statistics use a standard weighted arithmetic mean. | |
89 | ||
90 | Percentile statistics are computed using the weighted percentile method as | |
91 | described here: \fIhttps://en.wikipedia.org/wiki/Percentile#Weighted_percentile\fR. | |
92 | See weights() method for details on how weights are computed for individual | |
93 | samples. In process_interval() we further multiply by the height of each bin | |
94 | to get weighted histograms. | |
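.PP
As a rough illustration of the weighted mean and the weighted percentile method
referenced above, the sketch below works on bin mid-points and per-bin weights.
It is not the code used by this tool: the function names, the example
mid-points, and the equal weights are hypothetical, and the weights are assumed
to already combine the per-sample weight with the bin height.
.PP
.nf
```python
def weighted_mean(mids, weights):
    """Standard weighted arithmetic mean of the bin mid-points."""
    total = sum(weights)
    return sum(m * w for m, w in zip(mids, weights)) / total


def weighted_percentile(mids, weights, pct):
    """Weighted percentile with linear interpolation between bins.

    mids must be sorted in increasing order. Each non-empty bin n gets
    the percent rank 100 * (S_n - w_n / 2) / S_N, where S_n is the
    running sum of weights and S_N the total weight.
    """
    pairs = [(m, w) for m, w in zip(mids, weights) if w > 0]
    if not pairs:
        raise ValueError("histogram has no samples")
    total = sum(w for _, w in pairs)
    ranks = []
    running = 0.0
    for _, w in pairs:
        running += w
        ranks.append(100.0 * (running - w / 2.0) / total)
    # Outside the range of computed ranks, clamp to the end bins.
    if pct <= ranks[0]:
        return pairs[0][0]
    if pct >= ranks[-1]:
        return pairs[-1][0]
    for k in range(len(ranks) - 1):
        if ranks[k] <= pct <= ranks[k + 1]:
            frac = (pct - ranks[k]) / (ranks[k + 1] - ranks[k])
            return pairs[k][0] + frac * (pairs[k + 1][0] - pairs[k][0])


# Hypothetical mid-points of four bins, each with the same weight.
mids = [5, 15, 25, 35]
weights = [1, 1, 1, 1]
median = weighted_percentile(mids, weights, 50)
mean = weighted_mean(mids, weights)
```
.fi
.PP
With a single non-empty bin, every percentile in this sketch collapses to that
bin's mid-point.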
95 | ||
96 | Files given on the command line are assumed to be fio histogram files. | |
97 | An individual histogram file can contain the | |
98 | histograms for multiple different r/w directions (notably when \fB\-\-rw\fR=\fPrandrw\fR). This | |
99 | is accounted for by tracking each r/w direction separately. In the statistics | |
100 | reported we ultimately merge *all* histograms (regardless of r/w direction). | |
101 | ||
102 | The value of *_GROUP_NR in \fIstat.h\fR (and *_BITS) determines how many latency bins | |
103 | fio outputs when histogramming is enabled. Namely for the current default of | |
104 | GROUP_NR=19, we get 1,216 bins with a maximum latency of approximately 17 | |
105 | seconds. For certain applications this may not be sufficient. With GROUP_NR=24 | |
106 | we have 1,536 bins, giving us a maximum latency of 541 seconds (~ 9 minutes). If | |
107 | you expect your application to experience latencies greater than 17 seconds, | |
108 | you will need to recompile fio with a larger GROUP_NR, e.g. with: | |
109 | ||
110 | .RS | |
111 | .PP | |
112 | .nf | |
113 | sed -i.bak 's/^#define FIO_IO_U_PLAT_GROUP_NR 19$/#define FIO_IO_U_PLAT_GROUP_NR 24/' stat.h | |
114 | make fio | |
115 | .fi | |
116 | .PP | |
117 | .RE | |
118 | ||
119 | .PP | |
120 | Quick reference table for the max latency corresponding to a sampling of | |
121 | values for GROUP_NR: | |
122 | ||
123 | .RS | |
124 | .PP | |
125 | .nf | |
126 | GROUP_NR | # bins | max latency bin value | |
127 | 19 | 1216 | 16.9 sec | |
128 | 20 | 1280 | 33.8 sec | |
129 | 21 | 1344 | 67.6 sec | |
130 | 22 | 1408 | 2 min, 15 sec | |
131 | 23 | 1472 | 4 min, 32 sec | |
132 | 24 | 1536 | 9 min, 4 sec | |
133 | 25 | 1600 | 18 min, 8 sec | |
134 | 26 | 1664 | 36 min, 16 sec | |
135 | .fi | |
136 | .PP | |
137 | .RE | |
138 | ||
139 | .PP | |
140 | At present this program automatically detects the number of histogram bins in | |
141 | the log files, and adjusts the bin latency values accordingly. In particular if | |
142 | you use the \fB\-\-log_hist_coarseness\fR parameter of fio, you get output files with | |
143 | a number of bins according to the following table (note that the first | |
144 | row is identical to the table above): | |
145 | ||
146 | .RS | |
147 | .PP | |
148 | .nf | |
149 | coarse \\ GROUP_NR | |
150 | 19 20 21 22 23 24 25 26 | |
151 | ------------------------------------------------------- | |
152 | 0 [[ 1216, 1280, 1344, 1408, 1472, 1536, 1600, 1664], | |
153 | 1 [ 608, 640, 672, 704, 736, 768, 800, 832], | |
154 | 2 [ 304, 320, 336, 352, 368, 384, 400, 416], | |
155 | 3 [ 152, 160, 168, 176, 184, 192, 200, 208], | |
156 | 4 [ 76, 80, 84, 88, 92, 96, 100, 104], | |
157 | 5 [ 38, 40, 42, 44, 46, 48, 50, 52], | |
158 | 6 [ 19, 20, 21, 22, 23, 24, 25, 26], | |
159 | 7 [ N/A, 10, N/A, 11, N/A, 12, N/A, 13], | |
160 | 8 [ N/A, 5, N/A, N/A, N/A, 6, N/A, N/A]] | |
161 | .fi | |
162 | .PP | |
163 | .RE | |
164 | ||
165 | .PP | |
166 | For other values of GROUP_NR and coarseness, this table can be computed like this: | |
167 | ||
168 | .RS | |
169 | .PP | |
170 | .nf | |
 | import numpy as np | |
171 | bins = [1216,1280,1344,1408,1472,1536,1600,1664] | |
172 | max_coarse = 8 | |
173 | fncn = lambda z: list(map(lambda x: z // 2**x if z % 2**x == 0 else np.nan, range(max_coarse + 1))) | |
174 | np.transpose(list(map(fncn, bins))) | |
175 | .fi | |
176 | .PP | |
177 | .RE | |
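.PP
The bin counts can also be derived directly from GROUP_NR instead of from a
fixed list. The sketch below assumes 64 bins per group, a value inferred from
the tables above (19 * 64 = 1216) rather than read from \fIstat.h\fR, so adjust
the hypothetical PLAT_BITS constant if your build differs. The function name
num_bins is likewise only illustrative.
.PP
.nf
```python
# Assumption: 2**6 = 64 bins per group, consistent with 19 * 64 = 1216.
PLAT_BITS = 6


def num_bins(group_nr, coarseness=0):
    """Number of histogram bins for a GROUP_NR and coarseness level.

    Each coarseness level halves the bin count. Returns None when the
    full bin count is not evenly divisible, matching the N/A entries
    in the table.
    """
    full = group_nr * 2 ** PLAT_BITS
    step = 2 ** coarseness
    return full // step if full % step == 0 else None


default_bins = num_bins(19)
coarse_bins = num_bins(24, 8)
```
.fi
.PP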
178 | ||
179 | .PP | |
180 | If you have not adjusted GROUP_NR for your (high latency) application, then you | |
181 | will see the percentiles computed by this tool max out at the max latency bin | |
182 | value as in the first table above, and in this plot (where GROUP_NR=19 and thus we see | |
183 | a max latency of ~16.7 seconds in the red line): | |
184 | ||
185 | .RS | |
186 | \fIhttps://www.cronburg.com/fio/max_latency_bin_value_bug.png\fR | |
187 | .RE | |
188 | ||
189 | .PP | |
190 | The motivation, design decisions, and implementation process are | |
191 | described in further detail here: | |
192 | ||
193 | .RS | |
194 | \fIhttps://www.cronburg.com/fio/cloud-latency-problem-measurement/\fR | |
195 | .RE | |
196 | ||
197 | .SH AUTHOR | |
198 | .B fiologparser_hist.py | |
199 | and this manual page were written by Karl Cronburg <karl.cronburg@gmail.com>. | |
200 | .SH "REPORTING BUGS" | |
201 | Report bugs to the \fBfio\fR mailing list <fio@vger.kernel.org>. |