Commit | Line | Data |
---|---|---|
512566d4 | 1 | \documentclass{article} |
69040794 | 2 | \usepackage{epsfig,placeins} |
512566d4 AB |
3 | |
4 | % | |
5 | % Copyright (C) 2007 Alan D. Brunelle <Alan.Brunelle@hp.com> | |
6 | % | |
7 | % This program is free software; you can redistribute it and/or modify | |
8 | % it under the terms of the GNU General Public License as published by | |
9 | % the Free Software Foundation; either version 2 of the License, or | |
10 | % (at your option) any later version. | |
11 | % | |
12 | % This program is distributed in the hope that it will be useful, | |
13 | % but WITHOUT ANY WARRANTY; without even the implied warranty of | |
14 | % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
15 | % GNU General Public License for more details. | |
16 | % | |
17 | % You should have received a copy of the GNU General Public License | |
18 | % along with this program; if not, write to the Free Software | |
19 | % Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA | |
20 | % | |
21 | % vi :set textwidth=75 | |
22 | ||
23 | \title{\texttt{btt} User Guide} | |
24 | \author{Alan D. Brunelle (Alan.Brunelle@hp.com)} | |
50f73899 | 25 | \date{13 November 2007} |
512566d4 AB |
26 | |
27 | \begin{document} | |
28 | \maketitle | |
29 | %-------------- | |
30 | \section{\label{sec:intro}Introduction} | |
31 | ||
32 | \texttt{btt} is a post-processing tool for the block layer IO tracing tool called blktrace. As noted in its Users Guide, blktrace | |
33 | \begin{quotation} | |
34 | is a block layer IO tracing mechanism which provides detailed | |
35 | information about request queue operations up to user space. | |
36 | \end{quotation} | |
37 | ||
38 | blktrace is capable of producing tremendous amounts of output in the | |
39 | form of multiple individual traces per IO executed during the traced | |
40 | run. It is also capable of producing some general statistics concerning | |
41 | IO rates and the like. \texttt{btt} goes further and produces a variety | |
42 | of overall statistics about each of the individual handling of IOs, and | |
43 | provides data we believe is useful to plot to provide visual comparisons | |
44 | for evaluation. | |
45 | ||
46 | This document will discuss \texttt{btt} usage, provide some sample output, | |
47 | and also show some interesting plots generated from the data provided | |
48 | by the \texttt{btt} utility. | |
49 | ||
50 | \bigskip | |
51 | A short note on the ordering of this document -- the actual | |
52 | command-line usage section occurs relatively late in the document (see | |
53 | section~\ref{sec:cmd-line}), as we felt that discussing some of the | |
54 | capabilities and output formats would make the parameter discussion | |
55 | easier. | |
56 | ||
57 | \bigskip | |
58 | This document refers to the output formats generated by \texttt{btt} | |
50f73899 | 59 | version 2.00. However, the descriptions are general enough to cover |
512566d4 AB |
60 | output formats prior to that. |
61 | ||
62 | \newpage\tableofcontents | |
63 | ||
64 | \newpage\section{\label{sec:getting-started}Getting Started} | |
65 | ||
66 | The simple pipeline to get going with \texttt{btt} is to perform the | |
67 | following steps: | |
68 | ||
69 | \begin{enumerate} | |
70 | \item Run \texttt{blktrace}, specifying whatever devices and other | |
71 | parameters you want. You must save the traces to disk in this step, | |
72 | btt does not work in live mode. | |
73 | ||
74 | \item After tracing completes, run \texttt{blkrawverify}, specifying | |
75 | all devices that were traced (or at least on all devices that you | |
76 | will use \texttt{btt} with -- section~\ref{sec:o-D} shows how you | |
77 | can dictate which devices to use with btt). If blkrawverify finds | |
78 | errors in the trace streams saved, it is best to recapture the data | |
79 | -- utilizing \texttt{btt} on \emph{unclean} trace files produces | |
80 | inconsistent results. | |
81 | ||
82 | While this step is optional, we have found that performing this | |
83 | helps to ensure data coming from \texttt{btt} makes the most sense. | |
84 | ||
85 | \item Run \texttt{blkparse} with the \texttt{-d} option specifying | |
86 | a file to store the combined binary stream. (e.g.: \texttt{blkparse | |
87 | -d bp.bin ...}). | |
88 | ||
89 | \texttt{blktrace} produces a series of binary files | |
90 | containing parallel trace streams -- one file per CPU per | |
91 | device. \texttt{blkparse} provides the ability to combine all the | |
92 | files into one time-ordered stream of traces for all devices. | |
93 | ||
94 | \item Run \texttt{btt} specifying the file produced by | |
95 | \texttt{blkparse} utilizing the \texttt{-i} option (e.g.: \texttt{btt | |
96 | -i bp.bin ...}). | |
97 | ||
98 | \end{enumerate} | |
99 | ||
100 | \newpage\section{\label{sec:output-overview}Output Overview} | |
101 | ||
102 | The major default areas of output provided by \texttt{btt} | |
103 | include\label{tl-defs}: | |
104 | ||
105 | \begin{description} | |
106 | \item[average component times across all IOs] The time line of each IO | |
107 | is broken down into 3 major regions: | |
108 | ||
109 | \begin{enumerate} | |
110 | \item Time needed to insert or merge an incoming IO onto the request | |
111 | queue. This is the average time from when the IO enters the block | |
112 | IO layer (queue trace) until it is inserted (insert trace) or merged | |
113 | (back merge or front merge trace). | |
114 | ||
115 | This is denoted as \emph{Q2I} time. | |
116 | ||
117 | \item Time spent on the request queue. The average time from when | |
118 | the IO is inserted or merged onto the request queue, until it is | |
119 | issued (issue trace) to the lower level driver. | |
120 | ||
121 | Referred to as \emph{I2D} time\footnote{The \emph{issue} trace | |
122 | is represented by a D in the blkparse output, hence its usage in | |
123 | btt to refer to issue traces. Note that an I is used to refer to | |
124 | \emph{insert} traces.}. | |
125 | ||
126 | \item Driver and device time -- the average time from when the | |
127 | actual IO was issued to the driver until is completed (completion | |
128 | trace) back to the block IO layer. | |
129 | ||
130 | This is referred to as the \emph{D2C} time\ | |
131 | \end{enumerate} | |
132 | ||
133 | Two other sets of results are presented in this section: | |
134 | ||
135 | \begin{enumerate} | |
136 | \item \emph{Q2Q} which measures the time between queue traces | |
137 | in the system. This provides some idea as to how quickly IOs are | |
138 | being handed to the block IO layer. | |
139 | ||
140 | \item \emph{Q2C} which measures the times for the complete life cycle | |
141 | of IOs during the run\footnote{One of the areas that needs some | |
142 | work in \texttt{btt} is to better understand the multiplex nature of | |
143 | IOs during a run. In theory, one would like ${Q2I} + {I2D} + {D2C} | |
144 | = {Q2C}$ however, typically there are multiple queue traces that | |
145 | are combined via merges into a single IO issued and completed. We | |
146 | currently average the queue-to-insert and queue-to-merge times, | |
147 | and thus tend to be quite close to the expected equation.} | |
148 | ||
149 | \end{enumerate} | |
150 | ||
151 | For each row in this output, we provide a minimum, average, maximum | |
152 | (which are all presented in seconds), and overall count. As an | |
153 | example\footnote{As with this display, the author has taken some liberty | |
154 | in reformatting the output for better display on the printed page.}: | |
155 | ||
156 | \begin{verbatim} | |
157 | ALL MIN AVG MAX N | |
158 | ---- ------------- ------------- ------------- ----------- | |
159 | Q2Q 0.000000058 0.000012761 9.547941661 2262310 | |
160 | Q2I 0.000000272 0.000005995 0.104588839 2262311 | |
161 | I2D 0.000001446 0.094992714 0.239636864 2262311 | |
162 | D2C 0.000193721 0.030406554 1.634221408 2262311 | |
163 | Q2C 0.000207665 0.125405263 1.830917198 2262311 | |
164 | \end{verbatim} | |
165 | ||
166 | \item[Device Overhead] | |
167 | ||
168 | Using the data from the previous chart, we can then provide some idea | |
169 | as to where IO spend most of the time on average. The following output | |
170 | shows the percentage of time spent in each of the 3 phases of an IO: | |
171 | ||
172 | \begin{verbatim} | |
173 | DEV | Q2I I2D D2C | |
174 | ---------- | ------ ------ ------ | |
175 | ( 68, 64) | 0.0% 75.7% 24.2% | |
176 | \end{verbatim} | |
177 | ||
178 | \item[Device Merge Information] | |
179 | ||
180 | A key measurement when making changes in the system (software \emph{or} | |
181 | hardware) is to understand the block IO layer ends up merging incoming | |
182 | requests into fewer, but larger, IOs to the underlying driver. In this | |
183 | section, we show the number of incoming requests (Q), the number of | |
184 | issued requests (D) and the resultant ratio. We also provide values | |
185 | for the minimum, average and maximum IOs generated. | |
186 | ||
187 | Looking at the following example: | |
188 | ||
189 | \begin{verbatim} | |
190 | DEV | #Q #D Ratio | BLKmin BLKavg BLKmax Total | |
191 | ---------- | ------- ----- ----- | ------ ------ ------ ------- | |
192 | ( 68, 64) | 2262311 18178 124.5 | 2 124 128 2262382 | |
193 | \end{verbatim} | |
194 | ||
195 | we see that (on average) the block IO layer is combining upwards of | |
196 | 125 incoming requests into a single request down the IO stack. The | |
197 | resultant average IO size is 124 blocks. | |
198 | ||
199 | \item[Device Seek Information] | |
200 | ||
201 | Another useful measure is the variability in the sector distances | |
4c48f14e AB |
202 | between consecutively \emph{recieved -- queued} and \emph{submitted |
203 | -- issued} IOs. The next two sections provides some rudimentary | |
512566d4 AB |
204 | statistics to gauge the general nature of the sector differences |
205 | between IOs. Values provided include the number of seeks (number of IOs | |
206 | submitted to lower level drivers), the \emph{mean} distance between | |
207 | IOs, the \emph{median} value for all seeks, and the \emph{mode} - | |
208 | the value(s) and the counts are provided for the latter. | |
209 | ||
4c48f14e AB |
210 | The first of the two sections displays values for Q2Q seek distances -- |
211 | providing a set of indicators showing how close incoming IO requests | |
212 | are to each other. The second section shows D2D seek distances -- | |
213 | providing a set of indicators showing how close the IO requests are | |
214 | that are handled by underlying drivers. | |
215 | ||
512566d4 AB |
216 | \begin{verbatim} |
217 | DEV | NSEEKS MEAN MEDIAN | MODE | |
218 | --------- | ------ ------- ------ | ------- | |
219 | ( 68, 64) | 18178 19611.3 0 | 0(17522) | |
220 | \end{verbatim} | |
221 | ||
222 | We have almost exclusively seen median and mode values of 0, indicating | |
223 | that seeks tend to have an equal amount of forward and backwards | |
224 | seeks. The larger the count for the mode in comparison to the total | |
225 | number of seeks is indicative as to how many IOs are coming out of | |
226 | the block IO layer in adjacent sectors. (Obviously, the higher this | |
227 | percentage, the better the underlying subsystems can handle them.) | |
228 | ||
512566d4 AB |
229 | \item[Request Queue Plug Information] |
230 | ||
231 | During normal operation, requests queues are \emph{plugged} and during | |
232 | such times the IO request queue elements are not able to be processed | |
233 | by underlying drivers. The next section shows how often the request | |
234 | queue was in such a state. | |
235 | ||
236 | \begin{verbatim} | |
237 | DEV | # Plugs # Timer Us | % Time Q Plugged | |
238 | --------- | ------- ---------- | ---------------- | |
239 | ( 68, 64) | 833( 0) | 0.356511895% | |
240 | \end{verbatim} | |
241 | ||
242 | There are two major reasons why request queues are unplugged, and both | |
243 | are represented in the above table. | |
244 | ||
245 | \begin{enumerate} | |
246 | \item Explicit unplug request from some subsystem in the kernel. | |
247 | ||
248 | \item Timed unplugs, due to a request queue exceeding some temporal | |
249 | limit for being plugged. | |
250 | \end{enumerate} | |
251 | ||
252 | The total number of unplugs is equal to the number of plugs less the | |
253 | ones due to timer unplugs. | |
50f73899 AB |
254 | |
255 | \item[Active Requests At Q Information] | |
256 | ||
257 | An important consideration when analyzing block IO schedulers is to | |
258 | know how many requests the scheduler has to work with. The metric | |
259 | provided in this section details how many requests (on average) were | |
260 | being held by the IO scheduler when an incoming IO request was being | |
261 | handled. To determine this, \texttt{btt} keeps track of how many Q | |
262 | requests came in, and subtacts requests that have been issued (D). | |
263 | ||
264 | Here is a sample output of this sections: | |
265 | ||
266 | \begin{verbatim} | |
267 | ==================== Active Requests At Q Information ==================== | |
268 | ||
269 | DEV | Avg Reqs @ Q | |
270 | ---------- | ------------- | |
271 | ( 65, 80) | 12.0 | |
272 | ( 65,240) | 16.9 | |
273 | ... | |
274 | ( 66,112) | 44.2 | |
275 | ---------- | ------------- | |
276 | Overall | Avgs Reqs @ Q | |
277 | Average | 17.4 | |
278 | \end{verbatim} | |
279 | ||
512566d4 AB |
280 | \end{description} |
281 | ||
4c48f14e | 282 | \newpage |
512566d4 AB |
283 | \subsection{\label{sec:detailed-data}Detailed Data} |
284 | ||
285 | In addition to the default sections output, if one supplies the | |
286 | \texttt{--all-data} or \texttt{-A} argument (see section~\ref{sec:o-A}) | |
287 | to \texttt{btt} further sections are output: | |
288 | ||
289 | \begin{description} | |
290 | \item[Per Process] As traces are emitted, they are tagged with the | |
291 | process ID of the currently running thread in the kernel. The process | |
292 | names are also preserved, and mapped to the ID. For each of the parts | |
293 | of the time line discussed above on page~\pageref{tl-defs}, a chart is | |
294 | provided which breaks down the traces according to process ID (name). | |
295 | ||
296 | One must be aware, however, that the process ID may not have anything | |
297 | to do with the originating IO. For example, if an application is | |
298 | doing buffered IO, then the actual submitted IOs will most likely | |
299 | come from some page buffer management daemon thread (like pdflush, | |
300 | or kjournald for example). Similarly, completion traces are rarely | |
301 | (if ever?) going to be associated with the process which submitted | |
302 | the IO in the first place. | |
303 | ||
304 | Here is a sample portion of this type of chart, showing Q2Q times | |
305 | per process: | |
306 | ||
307 | \begin{verbatim} | |
308 | Q2Q MIN AVG MAX N | |
309 | ------------- ----------- ----------- ----------- ------- | |
310 | mkfs.ext3 0.000000778 0.000009074 1.797176188 1899371 | |
311 | mount 0.000000885 0.000672513 0.030638128 73 | |
312 | pdflush 0.000000790 0.000006752 0.247231307 179791 | |
313 | \end{verbatim} | |
314 | ||
315 | \item[Per Process Averages] The average columns from the above charts, | |
316 | are also presented in their own chart. | |
317 | ||
318 | \item[Per Device] Similar to the per-process display, \texttt{btt} | |
319 | will also break down the various parts of an IOs time line based upon a | |
320 | per-device criteria. Here's a portion of this area, displayed showing | |
321 | the issued to complete times (D2C). | |
322 | ||
323 | \begin{verbatim} | |
324 | D2C MIN AVG MAX N | |
325 | --------- ----------- ----------- ----------- ------ | |
326 | ( 65, 80) 0.000140488 0.001076906 0.149739869 169112 | |
327 | ( 65, 96) 0.000142762 0.001215221 0.173263182 155488 | |
328 | ( 65,112) 0.000145221 0.001254966 0.124929936 165726 | |
329 | ( 65,128) 0.000141896 0.001159596 0.775231052 169015 | |
330 | ( 65,144) 0.000140832 0.001290985 0.211384698 210661 | |
331 | ( 65,160) 0.000139915 0.001175554 0.073512063 133973 | |
332 | ( 65,176) 0.000141254 0.001104870 0.073231310 145764 | |
333 | ( 65,192) 0.000141453 0.001234460 0.167622507 140618 | |
512566d4 AB |
334 | ... |
335 | \end{verbatim} | |
336 | ||
337 | \item[Per Device Averages] The average columns from the above charts, | |
338 | are also presented in their own chart. | |
951ea56c AB |
339 | |
340 | \item[Q2D Histogram] A display of histogram buckets for the Q to D times | |
341 | -- basically, from where an IO enters the block IO layer for a given | |
342 | device, and when it is dispatched. The buckets are arranged via the | |
343 | time in seconds, as in: | |
344 | ||
345 | \begin{verbatim} | |
346 | ==================== Q2D Histogram ==================== | |
347 | ||
348 | DEV | <.005 <.010 <.025 <.050 <.075 <.100 <.250 <.500 < 1.0 >=1.0 | |
349 | --------- | ===== ===== ===== ===== ===== ===== ===== ===== ===== ===== | |
350 | ( 66, 80) | 61.2 7.9 12.1 7.9 3.0 1.4 1.5 0.2 0.0 4.6 | |
351 | ( 65,192) | 42.3 5.0 8.7 30.0 8.9 3.0 1.8 0.1 0.0 0.1 | |
352 | ( 65,128) | 34.3 5.3 8.9 32.0 9.7 3.7 5.3 0.6 0.0 0.1 | |
353 | ... | |
354 | ( 65, 64) | 59.9 4.2 6.0 24.6 4.2 0.8 0.1 0.0 0.0 0.1 | |
355 | ( 66, 64) | 62.6 8.1 12.7 7.9 2.4 0.6 0.1 0.0 0.0 5.4 | |
356 | ========== | ===== ===== ===== ===== ===== ===== ===== ===== ===== ===== | |
357 | AVG | 52.9 6.2 10.0 20.1 5.3 1.7 1.4 0.2 0.0 2.1 | |
358 | \end{verbatim} | |
359 | ||
512566d4 AB |
360 | \end{description} |
361 | ||
362 | \newpage\section{\label{sec:data-files}Data Files Output} | |
363 | ||
364 | Besides the averages output by default, the following 3 files are also | |
365 | created with data points which may be plotted. | |
366 | ||
367 | \begin{description} | |
368 | \item[\emph{file}.dat] This file provides a notion of \emph{activity} | |
369 | for the system, devices and processes. The details of this file are | |
370 | provided in section~\ref{sec:activity}. | |
371 | ||
372 | \item[\emph{file}\_qhist.dat] Provides histogram data for the size of | |
373 | incoming IO requests, for more information see section~\ref{sec:qhist}. | |
374 | ||
375 | \item[\emph{file}\_dhist.dat] Provides histogram data for the size | |
376 | of IO requests submitted to lower layer drivers, for more information | |
377 | see section~\ref{sec:dhist}. | |
378 | ||
379 | \end{description} | |
380 | ||
381 | Besides the default data files output, there are optional data files | |
382 | which can be generated by btt. These include: | |
383 | ||
384 | \begin{description} | |
385 | \item[iostat] iostat-like data can be distilled by btt, and is | |
386 | described in section~\ref{sec:iostat}. | |
387 | ||
388 | \item[per IO detail] Each and every IO traced can be output in a form | |
389 | that shows each of the IO components on consecutive lines (rather | |
390 | than grepping through a blkparse output file for example). The | |
391 | details on this file is included in section~\ref{sec:per-io}. | |
392 | ||
393 | \item[iostat] Latency information -- both Q2C and D2C -- | |
394 | on a per-IO basis can be generated. These are described in | |
395 | sections~\ref{sec:lat-q2c} and~\ref{sec:lat-d2c}. | |
396 | ||
4c48f14e | 397 | \item[seek details] A set of data files containing all IO-to-IO |
512566d4 AB |
398 | sector differences can be output, with details found in |
399 | section~\ref{sec:seek}. | |
fc16a815 AB |
400 | |
401 | \item[unplug histogram details] A data file per device containing | |
402 | histogram output for the amount of IOs released at unplug time. | |
403 | Section~\ref{sec:o-u} has more details. | |
512566d4 AB |
404 | \end{description} |
405 | ||
406 | \newpage\section{\label{sec:activity}Activity Data File} | |
407 | ||
408 | The activity data file contains a series of data values that indicate | |
409 | those periods of time when queue and complete traces are being | |
410 | processed. The values happen to be in a format easily handled by | |
411 | xmgrace\footnote{\texttt{http://plasma-gate.weizmann.ac.il/Grace/} | |
412 | ``Grace is a WYSIWYG 2D plotting tool for the X Window System and | |
413 | M*tif.''}, but is easy to parse for other plotting and/or analysis | |
414 | programs. | |
415 | ||
416 | The file is split into pairs of sets of data points, where each pair | |
417 | contains a set of queue activity and a set of completion activity. The | |
418 | points are presented with the first column (X values) being the time | |
419 | (in seconds), and the second column (Y values) providing an on/off | |
420 | type of setting. For each pair, the Y values have two settings off | |
421 | (low) and on (high). For example, here is a snippet of a file showing | |
422 | some Q activity: | |
423 | ||
424 | \begin{verbatim} | |
425 | # Total System | |
426 | # Total System : q activity | |
427 | 0.000000000 0.0 | |
428 | 0.000000000 0.4 | |
429 | 0.000070381 0.4 | |
430 | 0.000070381 0.0 | |
431 | 1.023482637 0.0 | |
432 | 1.023482637 0.4 | |
433 | 6.998746618 0.4 | |
434 | 6.998746618 0.0 | |
435 | 7.103336799 0.0 | |
436 | 7.103336799 0.4 | |
437 | 17.235419786 0.4 | |
438 | 17.235419786 0.0 | |
439 | 26.783361447 0.0 | |
440 | 26.783361447 0.4 | |
441 | 26.832454929 0.4 | |
442 | 26.832454929 0.0 | |
443 | 28.870431266 0.0 | |
444 | 28.870431266 0.4 | |
445 | 28.870431266 0.4 | |
446 | 28.870431266 0.0 | |
447 | \end{verbatim} | |
448 | ||
449 | What this indicates is that there was q activity for the system | |
450 | from 0.000000000 through 0.000070381, but was inactive from there to | |
451 | 1.023482637, and so on. Section~\ref{sec:o-d} contains details on how | |
452 | to adjust btt's notion of what constitutes activity. | |
453 | ||
454 | The pairs are arranged as follows: | |
455 | ||
456 | \begin{itemize} | |
457 | \item First there is the total system activity -- meaning activity | |
458 | in either queue or completion traces across all devices. | |
459 | ||
460 | \item Next comes per-device activity information -- for each device | |
461 | being traced, that request queues Q and C traces are presented. | |
462 | ||
463 | \item Last we present pairs per-process. | |
464 | \end{itemize} | |
465 | ||
466 | Using this, one is then able to plot regions of activity versus | |
467 | inactivity -- and one can gather a sense of deltas between the queueing | |
468 | of IOs and when they are completed. Figure~\ref{fig:activity} shows | |
469 | a very simplistic chart showing some activity: | |
470 | ||
471 | \begin{figure}[hb] | |
472 | \leavevmode\centering | |
7174703f | 473 | \epsfig{file=activity.eps,width=4.5in} |
512566d4 AB |
474 | \caption{\label{fig:activity}Simple Activity Chart} |
475 | \end{figure} | |
476 | ||
7174703f AB |
477 | When the black line (system Q activity) is \emph{high}, then the system |
478 | is seeing relatively continuous incoming queues. Conversely, when it is | |
479 | low, it represents an extended period of time where no queue requests | |
480 | were coming in. Similarly for the red line and C activity. | |
512566d4 AB |
481 | |
482 | \newpage\section{\label{sec:hist}Histogram Data Files} | |
483 | ||
484 | The histogram data files provide information concerning incoming and | |
485 | outgoing IO sizes (in blocks). For simplicity, the histogram buckets | |
486 | are one-for-one for sizes up to 1,024 blocks in the IO, and then a | |
487 | single bucket for all sizes greater than or equal to 1,024 blocks. | |
488 | ||
489 | The files are again in grace-friendly format, with the first set | |
490 | containing data for the first 1,023 buckets, and a separate set | |
491 | representing sizes $\ge 1024$ blocks. (This is done so that one can | |
492 | easily use a separate formatting specification for the latter set.) | |
493 | ||
494 | The first column (X values) is the various IO sizes, and the second | |
495 | column (Y values) represents the number of IOs of that size. | |
496 | ||
497 | \subsection{\label{sec:qhist}Q Histogram Data File} | |
498 | ||
499 | Figure~\ref{fig:qhist} is a sample graph generated from data used during | |
500 | some real-world analysis\footnote{Note the logarithmic nature of the | |
501 | Y axis for this chart.}. With the visual representation provided by | |
502 | this, one can quickly discern some different characteristics between | |
503 | the 3 runs -- in particular, one can see that there is only a single | |
504 | red point (representing 8 blocks per IO), whereas the other two had | |
505 | multiple data points greater than 8 blocks. | |
506 | ||
507 | \begin{figure}[hb] | |
508 | \leavevmode\centering | |
7174703f | 509 | \epsfig{file=qhist.eps,width=4.5in} |
512566d4 AB |
510 | \caption{\label{fig:qhist}Q Histogram} |
511 | \end{figure} | |
512 | ||
513 | \subsection{\label{sec:dhist}D Histogram Data File} | |
514 | ||
515 | Figure~\ref{fig:dhist} is a sample graph generated from data used during | |
516 | some real-world analysis\footnote{Note the logarithmic nature of the | |
517 | Y axis for this chart.}. Again, visually, one can see that the black | |
518 | and blue dots are somewhat similar below about 192 blocks per IO going | |
519 | out. And then one can make the broad generalization of higher reds, | |
520 | lower blues and blacks in the middle. | |
521 | ||
522 | \begin{figure}[hb] | |
523 | \leavevmode\centering | |
7174703f | 524 | \epsfig{file=dhist.eps,width=4.5in} |
512566d4 AB |
525 | \caption{\label{fig:dhist}D Histogram} |
526 | \end{figure} | |
527 | ||
528 | \newpage\section{\label{sec:iostat}iostat Data File} | |
529 | \texttt{btt} attempts to produce the results from running an | |
530 | \texttt{iostat -x} command in parallel with the system as it is being | |
531 | traced. The fields (columns) generated by the \texttt{--iostat} or | |
532 | \texttt{-I} option can be seen from the following output snippet -- | |
533 | note that the line has been split to fit on the printed page: | |
534 | ||
535 | \begin{verbatim} | |
536 | Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s | |
537 | rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util Stamp | |
538 | ... | |
539 | ( 8, 16) 0.00 0.00 0.00 1005.30 0.00 152806.36 | |
540 | 0.00 76403.18 152.00 31.00 0.00 0.00 0.00 71.79 | |
541 | ... | |
542 | ( 8, 16) 1.02 5.80 0.34 1.07 4.03 55.62 | |
543 | 2.02 27.81 42.13 0.61 0.00 21.90 0.00 TOTAL | |
544 | \end{verbatim} | |
545 | ||
546 | Note that the STAMP field contains the runtime (in seconds) for that | |
547 | line of data. | |
548 | ||
549 | \newpage\section{\label{sec:per-io}Per-IO Data File} | |
550 | ||
551 | \texttt{btt} can produce a text file containing time line data for each | |
552 | IO processed. The time line data contains rudimentary information for | |
553 | the following stages: | |
554 | ||
555 | \begin{itemize} | |
556 | \item queue traces | |
557 | \item get request traces | |
558 | \item insert traces | |
559 | \item merge traces | |
560 | \item issue traces | |
561 | \item completion traces | |
562 | \item remap traces | |
563 | \end{itemize} | |
564 | ||
565 | The \emph{--per-io-dump} or \emph{-p} option triggers this behavior, | |
566 | and will produce a file containing streams of IOs (separated by blank | |
4c48f14e AB |
567 | spaces). As an example, here is a snippet of 4 IOs that were merged |
568 | together, you will note there are 3 merged IOs, and 1 inserted in the | |
569 | stream. The issue and completion traces are replicated per IO. | |
512566d4 AB |
570 | |
571 | \begin{verbatim} | |
4c48f14e AB |
572 | 66,0 : 0.763283556 Q 6208+8 |
573 | 0.763300157 I 6208+8 | |
574 | 0.763296365 G 6208+8 | |
575 | 0.763338848 D 6208+32 | |
576 | 0.763705760 C 6208+32 | |
577 | ||
578 | 66,0 : 0.763314550 Q 6224+8 | |
579 | 0.763315341 M 6224+8 | |
580 | 0.763338848 D 6208+32 | |
581 | 0.763705760 C 6208+32 | |
582 | ||
583 | 66,0 : 0.763321010 Q 6232+8 | |
584 | 0.763321775 M 6232+8 | |
585 | 0.763338848 D 6208+32 | |
586 | 0.763705760 C 6208+32 | |
587 | ||
588 | 65,240: 0.763244173 Q 6216+8 | |
589 | 0.763244974 M 6216+8 | |
590 | 0.763374288 D 6208+32 | |
591 | 0.763826610 C 6208+32 | |
512566d4 AB |
592 | \end{verbatim} |
593 | ||
594 | The columns provide the following information: | |
595 | ||
596 | \begin{enumerate} | |
512566d4 AB |
597 | \item Device major/minor. |
598 | ||
4c48f14e AB |
599 | \item Time of the trace (seconds from the start of the run) |
600 | ||
512566d4 AB |
601 | \item Trace type |
602 | ||
603 | \item start block + number of blocks | |
604 | \end{enumerate} | |
605 | ||
512566d4 AB |
606 | \newpage\section{\label{sec:lat}\label{sec:lat-q2c}\label{sec:lat-d2c}Latency Data Files} |
607 | ||
608 | The latency data files which can be optionally produced by \texttt{btt} | |
609 | provide per-IO latency information, one for total IO time (Q2C) and | |
610 | one for latencies induced by lower layer drivers and devices (D2C). | |
611 | ||
612 | In both cases, the first column (X values) represent runtime (seconds), | |
613 | while the second column (Y values) shows the actual latency for a | |
614 | command at that time (either Q2C or D2C). | |
615 | ||
512566d4 AB |
616 | \newpage\section{\label{sec:seek}Seek Data File} |
617 | ||
4c48f14e | 618 | \texttt{btt} can also produce two data files containing all IO-to-IO sector |
512566d4 AB |
619 | deltas, providing seek information which can then be plotted. The |
620 | produced data file contains 3 sets of data: | |
621 | ||
622 | \begin{enumerate} | |
623 | \item Combined data -- all read and write IOs | |
624 | ||
625 | \item Read data -- just seek deltas for reads | |
626 | ||
627 | \item Write data -- just seek deltas for writes | |
628 | \end{enumerate} | |
629 | ||
4c48f14e AB |
630 | The format of the output file names is to have the name generated by |
631 | the following fields separated by underscores (\texttt{\_}): | |
632 | ||
633 | \begin{itemize} | |
634 | \item The prefix provided as the argument to the \texttt{-s} option. | |
635 | \item The major and minor numbers of the device separated by a comma. | |
636 | \item The string \texttt{q2q} or \texttt{d2d}, indicating the Q2Q or | |
637 | D2D seeks, respectively. | |
638 | \item One of the following characters: | |
639 | \begin{description} | |
640 | \item[r] For read (device to system) IOs | |
641 | \item[w] For write (system to device) IOs | |
642 | \item[c] Combined -- both read and write IOs | |
643 | \end{description} | |
644 | \end{itemize} | |
645 | ||
646 | An example name would be after specifying \texttt{-s seek} would be: | |
647 | \texttt{seek\_065,048\_q2q\_w.dat}. | |
648 | ||
512566d4 AB |
649 | The format of the data is to have the runtime values (seconds since |
650 | the start of the run) in column 1 (X values); and the difference in | |
651 | sectors from the previous IO in column 2 (Y values). Here is a snippet | |
652 | of the first few items from a file: | |
653 | ||
654 | \begin{verbatim} | |
655 | # Combined | |
656 | 0.000034733 35283790.0 | |
657 | 0.000106453 35283790.0 | |
658 | 0.005239009 35283950.0 | |
659 | 0.006968575 35283886.0 | |
660 | 0.007218709 35283694.0 | |
661 | 0.012145393 35283566.0 | |
662 | 0.014980835 -35848914.0 | |
663 | 0.024239323 -35848914.0 | |
664 | 0.024249402 -35848914.0 | |
665 | 0.025707095 -35849072.0 | |
666 | ... | |
667 | \end{verbatim} | |
668 | ||
669 | Figure~\ref{fig:seek} shows a simple graph that can be produced which | |
670 | provides visual details concerning seek patterns. | |
671 | ||
7174703f | 672 | \begin{figure}[h!] |
512566d4 | 673 | \leavevmode\centering |
7174703f | 674 | \epsfig{file=seek.eps,width=4.5in} |
512566d4 AB |
675 | \caption{\label{fig:seek}Seek Chart} |
676 | \end{figure} | |
69040794 AB |
677 | \FloatBarrier |
678 | ||
679 | The seek difference is calculated in one of two ways: | |
680 | ||
681 | \begin{description} | |
682 | \item[default] By default, the seek distance is calculated as the | |
683 | \emph{closest} distance between the previous IO and this IO. The | |
684 | concept of \emph{closeness} means that it could either be the | |
685 | \emph{end} of the previous IO and the beginning of the next, or the | |
686 | end of this IO and the start of the next. | |
687 | ||
688 | \item[\texttt{-a}] If the \texttt{-a} or \texttt{--seek-absolute} | |
689 | option is specified, then the seek distance is simply the difference | |
690 | between the end of the previous IO and the start of this IO. | |
691 | \end{description} | |
512566d4 AB |
692 | |
693 | \newpage\section{\label{sec:cmd-line}Command Line} | |
694 | ||
695 | \begin{verbatim} | |
50f73899 | 696 | Usage: btt 2.00 |
69040794 | 697 | [ -a | --seek-absolute ] |
512566d4 | 698 | [ -A | --all-data ] |
69040794 | 699 | [ -B <output name> | --dump-blocknos=<output name> ] |
512566d4 AB |
700 | [ -d <seconds> | --range-delta=<seconds> ] |
701 | [ -D <dev;...> | --devices=<dev;...> ] | |
702 | [ -e <exe,...> | --exes=<exe,...> ] | |
703 | [ -h | --help ] | |
704 | [ -i <input name> | --input-file=<input name> ] | |
705 | [ -I <output name> | --iostat=<output name> ] | |
706 | [ -l <output name> | --d2c-latencies=<output name> ] | |
707 | [ -M <dev map> | --dev-maps=<dev map> | |
708 | [ -o <output name> | --output-file=<output name> ] | |
709 | [ -p <output name> | --per-io-dump=<output name> ] | |
710 | [ -q <output name> | --q2c-latencies=<output name> ] | |
711 | [ -s <output name> | --seeks=<output name> ] | |
712 | [ -S <interval> | --iostat-interval=<interval> ] | |
713 | [ -t <sec> | --time-start=<sec> ] | |
714 | [ -T <sec> | --time-end=<sec> ] | |
fc16a815 | 715 | [ -u <output name> | --unplug-hist=<output name> ] |
512566d4 AB |
716 | [ -V | --version ] |
717 | [ -v | --verbose ] | |
718 | \end{verbatim} | |
719 | ||
69040794 AB |
720 | \subsection{\label{sec:o-a}\texttt{--seek-absolute}/\texttt{-a}} |
721 | ||
722 | When specified on the command line, this directs btt to calculate | |
723 | seek distances based solely upon the ending block address of one IO, | |
724 | and the start of the next. By default \texttt{btt} uses the concept | |
725 | of the closeness to either the beginning or end of the previous IO. See | |
726 | section~\ref{sec:seek} for more details about seek distances. | |
727 | ||
512566d4 AB |
728 | \subsection{\label{sec:o-A}\texttt{--all-data}/\texttt{-A}} |
729 | ||
730 | Normally \texttt{btt} will not print out verbose information | |
731 | concerning per-process and per-device data (as outlined in | |
732 | section~\ref{sec:detailed-data}). If you desire that level of | |
733 | detail you can specify this option. | |
734 | ||
69040794 AB |
735 | \subsection{\label{sec:o-B}\texttt{--dump-blocknos}/\texttt{-B}} |
736 | ||
737 | This option will output absolute block numbers to three files prefixed | |
738 | by the specified output name: | |
739 | ||
740 | \begin{description} | |
741 | \item[\emph{prefix}\_\emph{device}\_r.dat] All read block numbers are | |
742 | output, first column is time (seconds), second is the block number, | |
743 | and the third column is the ending block number. | |
744 | ||
745 | \item[\emph{prefix}\_\emph{device}\_w.dat] All write block numbers are | |
746 | output, first column is time (seconds), second is the block number, | |
747 | and the third column is the ending block number. | |
748 | ||
749 | \item[\emph{prefix}\_\emph{device}\_c.dat] All block numbers (read | |
750 | and write) are output, first column is time (seconds), second is | |
751 | the block number, and the third column is the ending block number. | |
752 | \end{description} | |
753 | ||
512566d4 AB |
754 | \subsection{\label{sec:o-d}\texttt{--range-delta}/\texttt{-d}} |
755 | ||
756 | Section~\ref{sec:activity} discussed how \texttt{btt} outputs a file | |
757 | containing Q and C activity, the notion of \emph{active} traces simply | |
758 | means that there are Q or C traces occurring within a certain period | |
759 | of each other. The default values is 0.1 seconds; with this option | |
760 | allowing one to change that granularity. The smaller the value, the | |
761 | more data points provided. | |
762 | ||
763 | \subsection{\label{sec:o-D}\texttt{--devices}/\texttt{-D}} | |
764 | ||
765 | Normally, \texttt{btt} will produce data for all devices detected in | |
766 | the traces parsed. With this option, one can reduce the analysis to | |
767 | one or more devices provided in the string passed to this option. The | |
768 | device identifiers are the major and minor number of each device, and | |
769 | each device identifier is separated by a colon (:). A valid specifier | |
770 | for devices 8,0 and 8,8 would then be: \texttt{"8,0:8,8"}. | |
771 | ||
772 | \subsection{\label{sec:o-e}\texttt{--exes}/\texttt{-e}} | |
773 | ||
774 | Likewise, \texttt{btt} will produce data for all processes (executables) | |
775 | found in the traces. With this option, one can specify which processes | |
776 | you want displayed in the output. The format of the string passed is | |
777 | a list of executable \emph{names} separated by commas (,). An example | |
778 | would be \texttt{"-e mkfs.ext3,mount"}. | |
779 | ||
780 | \subsection{\label{sec:o-h}\texttt{--help}/\texttt{-h}} | |
781 | ||
782 | Prints out the simple help information, as seen at the top of | |
783 | section~\ref{sec:cmd-line}. | |
784 | ||
785 | \subsection{\label{sec:o-i}\texttt{--input-file}/\texttt{-i}} | |
786 | ||
787 | Specifies the binary input file that \texttt{btt} will interpret traces | |
788 | in. See section~\ref{sec:getting-started} for information concerning | |
789 | binary trace files. | |
790 | ||
791 | \subsection{\label{sec:o-I}\texttt{--iostat}/\texttt{-I}} | |
792 | ||
793 | This option triggers \texttt{btt} to generate iostat-like output to the | |
794 | file specified. Refer to section~\ref{sec:iostat} for more information | |
795 | on the output produced. | |
796 | ||
797 | \subsection{\label{sec:o-l}\texttt{--d2c-latencies}/\texttt{-l}} | |
798 | ||
799 | This option instructs \texttt{btt} to generate the D2C latency file | |
800 | discussed in section~\ref{sec:lat-d2c}. | |
801 | ||
802 | \subsection{\label{sec:o-M}\texttt{--dev-maps}/\texttt{-M}} | |
803 | ||
804 | Internal option, still under construction. | |
805 | ||
806 | \subsection{\label{sec:o-o}\texttt{--output-file}/\texttt{-o}} | |
807 | ||
808 | Normally \texttt{btt} sends the statistical output (covered in | |
809 | section~\ref{sec:output-overview}) to standard out, if you specify | |
810 | this option this data is redirected to the file specified. | |
811 | ||
812 | \subsection{\label{sec:o-p}\texttt{--per-io-dump}/\texttt{-p}} | |
813 | ||
814 | This option tells \texttt{btt} to generate the per IO dump file as | |
815 | discussed in section~\ref{sec:per-io}. | |
816 | ||
817 | \subsection{\label{sec:o-q}\texttt{--q2c-latencies}/\texttt{-q}} | |
818 | ||
819 | This option instructs \texttt{btt} to generate the Q2C latency file | |
820 | discussed in section~\ref{sec:lat-q2c}. | |
821 | ||
822 | \subsection{\label{sec:o-s}\texttt{--seeks}/\texttt{-s}} | |
823 | ||
fc16a815 | 824 | This option instructs \texttt{btt} to generate the seek data file |
512566d4 AB |
825 | discussed in section~\ref{sec:seek}. |
826 | ||
827 | \subsection{\label{sec:o-S}\texttt{--iostat-interval}/\texttt{-S}} | |
828 | ||
829 | The normal \texttt{iostat} command allows one to specify the snapshot | |
830 | interval, likewise, \texttt{btt} allows one to specify how many seconds | |
831 | between its generation of snapshots of the data via this option. Details | |
832 | about the iostat-like capabilities of \texttt{btt} may be found in | |
7174703f | 833 | section~\ref{sec:iostat}. |
512566d4 AB |
834 | |
835 | \subsection{\label{sec:o-tT}\texttt{--time-start}/\texttt{-t} and | |
836 | \texttt{--time-end}/\texttt{T}} | |
837 | ||
838 | \begin{quote} | |
839 | \emph{This \texttt{btt} capability is still under construction, results are | |
840 | not always consistent at this point in time.} | |
841 | \end{quote} | |
842 | ||
843 | These options allow one to dictate to \texttt{btt} when to start and stop | |
844 | parsing of trace data in terms of seconds since the start of the run. The | |
845 | trace chosen will be between the start time (or 0.0 if not | |
846 | specified) and end time (or the end of the run) specified. | |
847 | ||
fc16a815 AB |
848 | \subsection{\label{sec:o-u}\texttt{--unplug-hist}/\texttt{-u}} |
849 | ||
850 | This option instructs \texttt{btt} to generate a data file containing | |
851 | histogram information for \emph{unplug} traces on a per device | |
852 | basis. It shows how many times an unplug was hit with a specified | |
853 | number of IOs released. There are 21 output values into the file, as | |
854 | follows: | |
855 | ||
856 | \medskip | |
857 | \begin{tabular}{ll} | |
858 | \textbf{X value} & \textbf{Representing Counts} \\\hline | |
859 | 0 & 0\dots\/4 \\ | |
860 | 1 & 5\dots\/9 \\ | |
861 | 2 & 10\dots\/14 \\ | |
862 | \dots & \dots\dots\\ | |
863 | 19 & 95\dots\/99 \\ | |
864 | 20 & 100+ \\ | |
865 | \end{tabular} | |
866 | ||
867 | \medskip | |
868 | The file name(s) generated use the text string passed as an argument for | |
869 | the prefix, followed by the device identifier in \texttt{major,minor} | |
870 | form, with a \texttt{.dat} extension (as an example, with \texttt{-u | |
871 | up\_hist} specified on the command line: \texttt{up\_hist\_008,032.dat}. | |
872 | ||
512566d4 AB |
873 | \subsection{\label{sec:o-V}\texttt{--version}/\texttt{-V}} |
874 | ||
875 | Prints out the \texttt{btt} version, and exits. | |
876 | ||
877 | \subsection{\label{sec:o-v}\texttt{--verbose}/\texttt{-v}} | |
878 | ||
879 | While \texttt{btt} is processing data, it will put out periodic (1-second | |
880 | granularity) values describing the progress it is making through the | |
881 | input trace stream. The value describes how many traces have been | |
882 | processed. At the end of the run, the overall number of traces, trace | |
883 | rate (number of thousands of traces per second), and the real time for | |
884 | trace processing and output are displayed. Example (note: the interim | |
885 | trace counts are put out with carriage returns, hence, they overwrite | |
886 | each time): | |
887 | ||
888 | \begin{verbatim} | |
889 | # btt -i bp.bin -o btt -v | |
890 | Sending range data to bttX.dat | |
891 | Sending stats data to bttX.avg | |
892 | 287857 t | |
893 | 1414173 t | |
894 | 1691581 t | |
895 | ... | |
896 | 4581291 traces @ 279.7 Ktps | |
897 | 16.379036+0.000005=16.379041 | |
898 | \end{verbatim} | |
899 | ||
fc16a815 | 900 | \newpage\section*{\label{sec:appendix}Appendix: Sample \texttt{btt} |
512566d4 AB |
901 | Output} |
902 | ||
903 | Here is a complete output file from a btt run, illustrating a lot of the | |
904 | capabilities of btt. | |
4c48f14e | 905 | \input{sample-btt-output.tex} |
512566d4 | 906 | \end{document} |