Age | Commit message (Collapse) | Author |
|
Commit dd093eb1c48e ("Fix warnings on newer gcc") moved string buffers holding
device names during map file parse stage to stack. However, only pointers to
them are being stored in the allocated "struct map_dev" structure. These
pointers are invalid outside of scope of this function and in a different
thread context. Also "release_map_devs" function still tries to "free" them
later as if they were allocated on the heap.
Moving the buffers back to the heap by instructing "fscanf" to allocate them
while parsing the file.
Alternatively, we could redefine the "struct map_dev" to include the whole
buffers instead of just pointers to them and free them as part of releasing the
whole "struct map_dev".
Fixes: dd093eb1c48e ("Fix warnings on newer gcc")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Displays each program's data sorted by program name or io event, like
Queued, Read, Write and Complete. When -S is specified the -s will be ignored.
The capital letters Q,R,W,C stand for KB, then q/r/w/c stand for IO.
The N is used for sorting programs by name, same to -s.
If you want to sort programs by how many data they queued, you can use:
blkparse -i sda.blktrace. -q -S Q -o sda.parse
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
iowatcher currently always spawns 8 rsvg-convert processes, no matter
how many CPUs a system has. I did some limited testing of different
numbers of rsvg-convert processes. Here are the results:
8 processes:
real 4m2.194s
user 23m36.665s
sys 0m38.523s
20 processes:
real 2m28.935s
user 24m51.817s
sys 0m49.227s
40 processes:
real 2m28.150s
user 24m56.994s
sys 0m49.621s
Note that this is the time it takes for a full run of iowatcher -- I
didn't separate out just the rsvg-convert portion.
Given the above results, it seems like a reasonable thing to spawn one
rsvg-convert process per cpu.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Hi,
Bryan Gurney reported iowatcher taking a *really* long time to generate
a movie for 16GiB worth of trace data. I took a look, and the io hash
was growing without bounds. The reason was that the I/O pattern looks
like this:
259,0 5 8 0.000208501 31708 A W 5435592 + 8 <- (259,1) 5433544
259,1 5 9 0.000209537 31708 Q W 5435592 + 8 [kvdo30:bioQ0]
259,1 5 10 0.000209880 31708 G W 5435592 + 8 [kvdo30:bioQ0]
259,0 5 11 0.000211064 31708 A W 5435600 + 8 <- (259,1) 5433552
259,1 5 12 0.000211347 31708 Q W 5435600 + 8 [kvdo30:bioQ0]
259,1 5 13 0.000212957 31708 M W 5435600 + 8 [kvdo30:bioQ0]
259,0 5 14 0.000213379 31708 A W 5435608 + 8 <- (259,1) 5433560
259,1 5 15 0.000213629 31708 Q W 5435608 + 8 [kvdo30:bioQ0]
259,1 5 16 0.000213937 31708 M W 5435608 + 8 [kvdo30:bioQ0]
...
259,1 5 107 0.000246274 31708 D W 5435592 + 256 [kvdo30:bioQ0]
For each of those Q events, an entry was created in the io_hash. Then,
upon I/O completion, only the first event (with the right starting
sector) was removed! The runtime overhead of just iterating the hash
chains was enormous.
The solution is to simply ignore the Q events, so long as there are D
events in the trace. If there are no D events, then go ahead and hash
the Q events as before. I'm hoping that if we only have Q and C, that
they will actually be aligned. If that's an incorrect assumption, we
could account merges in an rbtree. I'll defer that work until someone
can show me blktrace data that needs it.
The comments should be self explanatory. Review would be appreciated
as the code isn't well documented, and I don't know if I'm missing some
hidden assumption about the data.
Before applying this patch, iowatcher would take more than 12 hours to
complete. After the patch:
real 9m44.476s
user 41m35.426s
sys 3m29.106s
'nuf said.
Cheers,
Jeff
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Many distributions are moving to python3 by default. Here's
an attempt to make the python scripts in blktrace python3-ready.
Most of this was done with automated tools. I hand fixed some
space-vs tab issues, and cast an array index to integer. It
passes rudimentary testing when run under python2.7 as well
as python3.
This doesn't do anything with the shebangs, it leaves them both
invoking whatever "env python" coughs up on the system.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Herbo Zhang reports:
I found a bug in blktrace/btt/devmap.c. The code is just as follows:
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git/tree/btt/devmap.c?id=8349ad2f2d19422a6241f94ea84d696b21de4757
struct devmap {
struct list_head head;
char device[32], devno[32]; // #1
};
LIST_HEAD(all_devmaps);
static int dev_map_add(char *line)
{
struct devmap *dmp;
if (strstr(line, "Device") != NULL)
return 1;
dmp = malloc(sizeof(struct devmap));
if (sscanf(line, "%s %s", dmp->device, dmp->devno) != 2) { //#2
free(dmp);
return 1;
}
list_add_tail(&dmp->head, &all_devmaps);
return 0;
}
int dev_map_read(char *fname)
{
char line[256]; // #3
FILE *fp = my_fopen(fname, "r");
if (!fp) {
perror(fname);
return 1;
}
while (fscanf(fp, "%255[a-zA-Z0-9 :.,/_-]\n", line) == 1) {
if (dev_map_add(line))
break;
}
fclose(fp);
return 0;
}
The line length is 256, but the dmp->device, dmp->devno max length
is only 32. We can put strings longer than 32 into dmp->device and
dmp->devno , and then they will be overflowed.
we can trigger this bug just as follows:
$ python -c "print 'A'*256" > ./test
$ btt -M ./test
*** Error in btt': free(): invalid next size (fast): 0x000055ad7349b250 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7f158ce7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f7f158d6e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7f158da98c]
btt(+0x32e0)[0x55ad7306f2e0]
btt(+0x2c5f)[0x55ad7306ec5f]
btt(+0x251f)[0x55ad7306e51f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7f15877830]
btt(+0x26b9)[0x55ad7306e6b9]
======= Memory map: ========
55ad7306c000-55ad7307f000 r-xp 00000000 08:14 3698139
/usr/bin/btt
55ad7327e000-55ad7327f000 r--p 00012000 08:14 3698139
/usr/bin/btt
55ad7327f000-55ad73280000 rw-p 00013000 08:14 3698139
/usr/bin/btt
55ad73280000-55ad73285000 rw-p 00000000 00:00 0
55ad7349a000-55ad734bb000 rw-p 00000000 00:00 0
[heap]
7f7f10000000-7f7f10021000 rw-p 00000000 00:00 0
7f7f10021000-7f7f14000000 ---p 00000000 00:00 0
7f7f15640000-7f7f15656000 r-xp 00000000 08:14 14942237
/lib/x86_64-linux-gnu/libgcc_s.so.1
7f7f15656000-7f7f15855000 ---p 00016000 08:14 14942237
/lib/x86_64-linux-gnu/libgcc_s.so.1
7f7f15855000-7f7f15856000 r--p 00015000 08:14 14942237
/lib/x86_64-linux-gnu/libgcc_s.so.1
7f7f15856000-7f7f15857000 rw-p 00016000 08:14 14942237
/lib/x86_64-linux-gnu/libgcc_s.so.1
7f7f15857000-7f7f15a16000 r-xp 00000000 08:14 14948477
/lib/x86_64-linux-gnu/libc-2.23.so
7f7f15a16000-7f7f15c16000 ---p 001bf000 08:14 14948477
/lib/x86_64-linux-gnu/libc-2.23.so
7f7f15c16000-7f7f15c1a000 r--p 001bf000 08:14 14948477
/lib/x86_64-linux-gnu/libc-2.23.so
7f7f15c1a000-7f7f15c1c000 rw-p 001c3000 08:14 14948477
/lib/x86_64-linux-gnu/libc-2.23.so
7f7f15c1c000-7f7f15c20000 rw-p 00000000 00:00 0
7f7f15c20000-7f7f15c46000 r-xp 00000000 08:14 14948478
/lib/x86_64-linux-gnu/ld-2.23.so
7f7f15e16000-7f7f15e19000 rw-p 00000000 00:00 0
7f7f15e42000-7f7f15e45000 rw-p 00000000 00:00 0
7f7f15e45000-7f7f15e46000 r--p 00025000 08:14 14948478
/lib/x86_64-linux-gnu/ld-2.23.so
7f7f15e46000-7f7f15e47000 rw-p 00026000 08:14 14948478
/lib/x86_64-linux-gnu/ld-2.23.so
7f7f15e47000-7f7f15e48000 rw-p 00000000 00:00 0
7ffdebe5c000-7ffdebe7d000 rw-p 00000000 00:00 0
[stack]
7ffdebebc000-7ffdebebe000 r--p 00000000 00:00 0
[vvar]
7ffdebebe000-7ffdebec0000 r-xp 00000000 00:00 0
[vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
[vsyscall]
[1] 6272 abort btt -M test
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
remove dupliated entry 'M' for man page of blkparse.
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
if we run blktrace on same device twice, the second time will failed
to ioctl(BLKTRACESETUP), then it will call __stop_tracer, which lead
the first blktrace failed to access debugfs entries. So this patch add
a check to handle this case, to avoid stop tracer uncondionally.
Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When building in parallel, the btreplay/btrecord and btreplay/btreplay
targets cause make to kick off two jobs for `make -C btreplay` and they
sometimes end up clobbering each other. We could fix this by making one
a dependency of the other, but it's a bit cleaner to refactor things to
be based on subdirs. This way changes in subdirs also get noticed:
$ touch btreplay/*.[ch]
$ make
<btreplay is now correctly updated>
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Keep scanning the tree for overlapping IO otherwise Q2G and process
traces will be incorrect.
Let assume we have 2 IOs:
A A+a
|---------------------------------------|
B B+b
|-----------------|
In the red/black tree we have:
o -> [A,A+a]
/ \
left right
/ \
[...]o o -> [B, B+b]
In the current code, if we would not be able to find [B+b] in the tree:
B is greater than A, so we won't go left
B+b is smaller than A+a, so we are not going right either.
When we have a [X, X+x] IO to look for:
We need to check for right when either:
X+x >= A+a (for merged IO)
and
X > A (for overlapping IO)
TEST=Check with a trace with overlapping IO: Q2C and Q2G are expected.
Signed-off-by: Gwendal Grignou <gwendal@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If we fail doing the BLKTRACESETUP ioctl, blktrace still marches on
and sets up the rest. This results in errors like the below:
blktrace /dev/sdf
BLKTRACESETUP(2) /dev/sdf failed: 5/Input/output error
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file or directory
Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file or directory
[...]
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted
FAILED to start thread on CPU 2: 1/Operation not permitted
and blktrace continues to run, though it can't do anything in this
state.
If the ioctl setup fails, just abort.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When CPU number space is sparse, we don't start threads for non-existent
CPUs. As a result, there are no output files created for these CPUs
which confuses tools like blkparse which expect that CPU numbers are
contiguous. Create fake empty files for non-existent CPUs so that other
tools don't have to bother.
Note that in network mode, the server will create all files in the range
0..max_cpus automatically.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
We would like to generate output file name without having corresponding
iop structure. Reorganize the function to allow that. Also fix couple of
overflows possible when generating the file name when we are modifying
the code anyway.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
On some machines CPU numbers do not form a contiguous interval. In such
cases blktrace will fail to start threads for missing CPUs and exit
effectively rendering itself unusable.
Add support into blktrace to handle systems with sparse CPU numbers.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Some C libraries (notably uClibc) have the posix_spawn*() functions in
librt, so let's link iowatcher with -lrt.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
An earlier commit:
fb7f8667 blktrace: disable kill option - take 2
removed the "-k" option documentation, but left
it in the synopsis.
This is a bit unusual and unhelpful and probably
unintended; remove it from the synopsis as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Proper graph name is queue-depth, not queue_depth.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Trace label isn't properly separated with space from suffix (Read /
Write). Fix it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When user specifies trace files directly via -t option, it doesn't make
sense to prepend blktrace destination directory to them (it is
especially confusing if you specify absolute path names with -t option
and this logic breaks the path names). So avoid that.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently btt keeps the original IO in its RB-tree even if it sees new
IO that is beginning at the same sector. However such IO most likely
means that we have just lost the completion event for the IO that is
still in the tree. So in such case replacing the IO in RB-tree makes
more sense to avoid bogus IOs being reported as taking huge amount of
time.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently queue depth and latency graphs are generated from ISSUE and
COMPLETE events. For traces which miss the ISSUE events (e.g. from
device mapper) use QUEUE events instead. The result won't be as great
but it still conveys some useful information.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
When parsing blktrace data, process notify events even outside the
specified interval. This way we can learn about time stamps, process
names etc.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
So far we used maximum of the first trace for the maximum range of the
queue depth graph. Use maximum over all traces similarly as for other
line graphs.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Use maximum of rolling average as the upper range end for the line graph
to use better the available space in the plot.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Just replace the malloc/memset with a calloc().
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Using __DATE__ and __TIME__ will break reproducible builds. The
resulting binary will change with each rebuild even if the source and
toolchain is identical.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
is_reap_done() must also check that SIGINT or SIGTERM have come, or
we hang forever with such backtraces after Ctrl-C:
(gdb) thr a a bt
Thread 3 (Thread 0x7fbff8ff9700 (LWP 12607)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x0000000000402698 in replay_rec () at btreplay.c:1035
#2 0x00007fc001fe5454 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007fc001d1eecd in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7fbfea7fc700 (LWP 12611)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x0000000000402698 in replay_rec () at btreplay.c:1035
#2 0x00007fc001fe5454 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007fc001d1eecd in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fc00282e700 (LWP 12597)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x0000000000402303 in __wait_cv () at btreplay.c:413
#2 0x0000000000401ae8 in main () at btreplay.c:426
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: <linux-btrace@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
getpid() is a pid of a process, at least tid must be provided.
But if zero is passed, then calling thread will be used.
That exactly what is needed.
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: <linux-btrace@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Size should be provided, not cpus number.
Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: <linux-btrace@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently, blktrace uses _SC_NPROCESSORS_CONF to find out the number of
CPUs. This is a problem, because if you reduce the number of online
CPUs by passing kernel parameter maxcpus, then blktrace fails to start
with the error:
FAILED to start thread on CPU 4: 22/Invalid argument
FAILED to start thread on CPU 5: 22/Invalid argument
...
The attached patch fixes it to use _SC_NPROCESSORS_ONLN.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Avoids the build failures when sys/types.h does not get included
indirectly through other headers.
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
In get_ncpus, we default to using 4096 CPUs if _SC_NPROCESSORS_CONF isn't
enabled. If that is insufficient, sched_getaffinity will fail and we
retry after doubling the size of the cpu_set_t allocation. There's a typo
in there that means we don't actually double the size and will loop
forever allocating the same sized cpu_set_t instead.
Signed-off-by: Josef Cejka <jcejka@suse.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Kills the errors on unchecked return of system()
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Andrew says:
Here are some trivial tweaks which I found were needed or desirable while
adding iowatcher to the blktrace packaging in Fedora. They improve the
integration of iowatcher into the tree and reduce duplication of docs.
|
|
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:
iowatcher/Makefile
|
|
Merge the requirements bits of iowatcher/README into README
Signed-off-by: Andrew Price <anprice@redhat.com>
|
|
Signed-off-by: Chris Mason <clm@fb.com>
|
|
We were setting C=gcc instead of CC=gcc, and using -O0. Fix both.
Signed-off-by: Chris Mason <clm@fb.com>
|
|
This README is getting out-of-date and its contents are duplicated in
the iowatcher manpage which is up-to-date, so remove it to reduce
duplication of effort.
Signed-off-by: Andrew Price <anprice@redhat.com>
|
|
iowatcher's manpage wasn't being installed with the other manpages so
add it to the doc directory.
Signed-off-by: Andrew Price <anprice@redhat.com>
|
|
Signed-off-by: Andrew Price <anprice@redhat.com>
|