\documentclass{article}
%
-% Copyright (C) 2005 Alan D. Brunelle <Alan.Brunelle@hp.com>
+% Copyright (C) 2005, 2006 Alan D. Brunelle <Alan.Brunelle@hp.com>
%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
%
\title{blktrace User Guide}
-\author{blktrace: Jens Axboe (axboe@suse.de)\\
+\author{blktrace: Jens Axboe (jens.axboe@oracle.com)\\
User Guide: Alan D. Brunelle (Alan.Brunelle@hp.com)}
-\date{4 October 2005}
+\date{27 May 2008}
\begin{document}
\maketitle
\begin{description}
\item[Kernel patch] A patch to the Linux kernel which includes the
kernel event logging interfaces, and patches to areas within the block
- layer to emit event traces.
+ layer to emit event traces. If you run a 2.6.17-rc1 or newer kernel,
+ you don't need to apply the patch, as blktrace support is already included.
\item[blktrace] A utility which transfers event traces from the kernel
into either long-term on-disk storage, or provides direct formatted
The blktrace and blkparse utilities and associated kernel patch are provided
as part of the following git repository:
-rsync://rsync.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git
+git://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git
%--------------------------
\newpage\section{\label{sec:quick-start}Quick Start Guide}
As noted above, the kernel patch along with the blktrace and blkparse utilities are stored in a git repository. One simple way to get going would be:
\begin{verbatim}
-% git clone rsync://rsync.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git bt
+% git clone git://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git bt
% cd bt
% git checkout
\end{verbatim}
\emph{bt} is the name of the directory from the above git sequence). The
detailed actual patching instructions for a Linux kernel is outside the
scope of this document, but the following may be used as a sample template.
+Note that you may skip this step if your kernel is at least 2.6.17-rc1.
As an example, bt/kernel contains blk-trace-2.6.14-rc1-git-G2, download
linux-2.6.13.tar.bz2 and patch-2.6.14-rc1.bz2
Install the new kernel (and modules\ldots) and reboot.
-\subsection{\label{sec:mount}Mounting the RelayFS file system}
+\subsection{\label{sec:mount}Mounting the debugfs file system}
-blktrace utilizes files under the Relay file system, and thus must have
-the mount point set up -- mounted on the directory /relay. To do this
-one may do either of the following:
+blktrace utilizes files under the debug file system, and thus must have
+the mount point set up -- mounted on the directory /sys/kernel/debug.
+To do this one may do either of the following:
\begin{enumerate}
\item Manually mount after each boot:
\begin{verbatim}
-% mount -t relayfs relayfs /relay
+% mount -t debugfs debugfs /sys/kernel/debug
\end{verbatim}
\item Add an entry into /etc/fstab, and have it done automatically at
each boot\footnote{Note: after adding the entry to /etc/fstab, you
- could then mount the directory this time only by doing: \% mount /relay}:
+ could then mount the directory this time only by doing: \% mount debug}:
\begin{verbatim}
-relay /relay relayfs default 0 0
+debug /sys/kernel/debug debugfs defaults 0 0
\end{verbatim}
\end{enumerate}
\newpage\section{\label{sec:blktrace-ug}blktrace User Guide}
The \emph{blktrace} utility extracts event traces from the kernel (via
-the relay file system). Some background details concerning the run-time
-behaviour of blktrace will help to understand some of the more arcane
-command line options:
+the relay interface exposed through the debug file system). Some background details
+concerning the run-time behaviour of blktrace will help to understand some
+of the more arcane command line options:
\begin{itemize}
\item blktrace receives data from the kernel in buffers passed up
- through the Relay file system (RelayFS). Each device being traced has
- a file created in the mounted directory for the RelayFS, which defaults
- to \emph{/relay} -- this can be overridden with the \emph{-r} command
- line argument.
+ through the debug file system (relay). Each device being traced has
+ a file created in the mounted directory for the debugfs, which defaults
+ to \emph{/sys/kernel/debug} -- this can be overridden with the \emph{-r}
+ command line argument.
\item blktrace defaults to collecting \emph{all} events that can be
traced. To limit the events being captured, you can specify one or
representation of the filter mask.)
\item As noted above, the events are passed up via a series of buffers
- stored into RelayFS files. The size and number of buffers can be
+ stored into debugfs files. The size and number of buffers can be
specified via the \emph{-b} and \emph{-n} arguments respectively.
\item blktrace stores the extracted data into files stored in the
-d \emph{dev} & --dev=\emph{dev} & Adds \emph{dev} as a device to trace \\ \hline
-k & --kill & Kill on-going trace \\ \hline
-n \emph{num-sub} & --num-sub=\emph{num-sub} & Specifies number of buffers to use \\ \hline
--o \emph{file} & --output=\emph{file} & Prepend \emph{file} to output file name(s) \\ \hline
--r \emph{rel-path} & --relay=\emph{rel-path} & Specifies RelayFS mount point \\ \hline
+-o \emph{file} & --output=\emph{file} & Prepend \emph{file} to output file name(s) \\
+ & & \textbf{This only works when using a single device} \\
+ & & \textbf{or when piping the output via \texttt{-o -}} \\
+ & & \textbf{with multiple devices.} \\ \hline
+-r \emph{rel-path} & --relay=\emph{rel-path} & Specifies debugfs mount point \\ \hline
-V & --version & Outputs version \\ \hline
-w \emph{seconds} & --stopwatch=\emph{seconds} & Sets run time to the number of seconds specified \\ \hline
+-I \emph{devs file}& --input-devs=\emph{devs file}& Adds devices found in \emph{devs file} to list of devices to trace. \\
+ & & (One device per line.) \\ \hline
\end{tabular}
\subsubsection{\label{sec:filter-mask}Filter Masks}
requeue & \emph{requeue} operations \\ \hline
sync & \emph{synchronous} attribute \\ \hline
write & \emph{write} traces \\ \hline
+notify & \emph{notify} trace messages \\ \hline
\end{tabular}
\subsubsection{\label{sec:request-types}Request types}
\item The format of the output data can be controlled via the \emph{-f}
or \emph{-F} options -- see section~\ref{sec:blkparse-format} for details.
- By default, blkparse sends formatted data to standard output. This
- may be changed via the \emph{-o} option.
+ By default, blkparse sends formatted data to standard output. This may
+ be changed via the \emph{-o} option, or text output can be disabled
+ via the \emph{-O} option. A merged binary stream can be produced using
+ the \emph{-d} option.
\end{itemize}
-m & --missing & Print missing entries\\ \hline
--n & --hash-by-name & Hash processes by name, not by PID\\ \hline
+-h & --hash-by-name & Hash processes by name, not by PID\\ \hline
-o \emph{file} & --output=\emph{file} & Output file \\ \hline
+-O & --no-text-output & Do \emph{not} produce text output, used for binary (-d) only \\ \hline
+
+-d \emph{file} & --dump-binary=\emph{file} & Binary output file \\ \hline
-q & --quiet & Quite mode \\ \hline
& & \emph{start:end-time} -- Display traces from time \emph{start} \\
& & through {end-time} (in ns). \\ \hline
+-M & --no-msgs & Do not add messages to binary output file \\\hline
-v & --verbose & More verbose marginal on marginal errors \\ \hline
-V & --version & Display version \\ \hline
\item[Q -- queued] This notes intent to queue io at the given location.
No real requests exists yet.
- \item[W -- bounced] The data pages attached to this \emph{bio} are
+ \item[B -- bounced] The data pages attached to this \emph{bio} are
not reachable by the hardware and must be bounced to a lower memory
location. This causes a big slowdown in io performance, since the data
must be copied to/from kernel buffers. Usually this can be fixed with
using better hardware - either a better io controller, or a platform
with an IOMMU.
- \item[B -- back merge] A previously inserted request exists that ends
+ \item[m -- message] Text message generated via kernel call to
+ \texttt{blk\_add\_trace\_msg}.
+
+ \item[M -- back merge] A previously inserted request exists that ends
on the boundary of where this io begins, so the io scheduler can merge
them together.
\item[F -- front merge] Same as the back merge, except this io ends
where a previously inserted requests starts.
- \item[M -- front or back merge] One of the above.
-
\item[G -- get request] To send any type of request to a block device,
a \emph{struct request} container must be allocated first.
\begin{tabular}{|l|l|}\hline
Act & Description \\ \hline\hline
A & IO was remapped to a different device \\ \hline
-B & IO back merged with request on queue \\ \hline
+B & IO bounced \\ \hline
C & IO completion \\ \hline
D & IO issued to driver \\ \hline
F & IO front merged with request on queue \\ \hline
G & Get request \\ \hline
I & IO inserted onto request queue \\ \hline
+M & IO back merged with request on queue \\ \hline
P & Plug request \\ \hline
Q & IO handled by request queue code \\ \hline
S & Sleep request \\ \hline
T & Unplug due to timeout \\ \hline
U & Unplug request \\ \hline
-W & IO bounced \\ \hline
X & Split \\ \hline
\end{tabular}
\subsubsection{\label{sec:act-table}RWBS Description}
This is a small string containing at least one character ('R' for read,
-'W' for write operation), and optionally either a 'B' (for barrier
-operations) or 'S' (for synchronous operations).
+'W' for write, or 'D' for block discard operation), and optionally either
+a 'B' (for barrier operations) or 'S' (for synchronous operations).
\subsubsection{\label{sec:default-output}Default output}
\item[D -- issued]
\item[I -- inserted]
\item[Q -- queued]
- \item[W -- bounced] If a payload is present, the number of payload bytes
+ \item[B -- bounced] If a payload is present, the number of payload bytes
is output, followed by the payload in hexadecimal between parenthesis.
If no payload is present, the sector and number of blocks are presented
either case, it is followed by the command associated with the event
(surrounded by square brackets).
- \item[B -- back merge]
+ \item[M -- back merge]
\item[F -- front merge]
\item[G -- get request]
- \item[M -- front or back merge]
\item[S -- sleep] The starting sector and number of blocks is output
(with an intervening plus (+) character), followed by the command
associated with the event (surrounded by square brackets).
\item[A -- remap] Sector and length is output, along with the original
device and sector offset.
+ \item[m -- message] The supplied message is appended to the end of
+ the standard header.
+
\end{description}
%------------------------------
Adds a trace with a remap event. \emph{dev} and \emph{sector} denote
the original device this \emph{bio} was mapped from.
+ \item[blk\_add\_trace\_msg(struct request\_queue *q, char *fmt, ...)]
+ Adds a formatted message to the output stream. The total message
+ size cannot exceed BLK\_TN\_MAX\_MSG characters (currently
+ 1024). Standard format conversions are supported (as supplied
+ by \texttt{vscnprintf}).
\end{description}
\end{document}