2019-09-16 btreplay: fix device IO remap functionality (Ignat Korchagin)
Commit dd093eb1c48e ("Fix warnings on newer gcc") moved the string buffers holding device names during the map file parse stage to the stack. However, only pointers to them are stored in the allocated "struct map_dev" structure. These pointers are invalid outside the scope of this function and in a different thread context. Also, the "release_map_devs" function still tries to "free" them later as if they were allocated on the heap. Move the buffers back to the heap by instructing "fscanf" to allocate them while parsing the file. Alternatively, we could redefine "struct map_dev" to include the whole buffers instead of just pointers to them and free them as part of releasing the whole "struct map_dev". Fixes: dd093eb1c48e ("Fix warnings on newer gcc") Signed-off-by: Ignat Korchagin <> Signed-off-by: Jens Axboe <>
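The heap-allocation approach described above can be sketched as follows. This is a minimal illustration, not the btreplay code: `struct map_entry`, `parse_map_line` and `release_map_entry` are hypothetical names, and the sketch assumes a C library that supports the POSIX.1-2008 `m` allocation modifier for the scanf family (glibc does).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for btreplay's "struct map_dev". */
struct map_entry {
	char *from_dev;	/* heap-allocated by sscanf() */
	char *to_dev;	/* heap-allocated by sscanf() */
};

/* Parse "from to" and let the scanf family allocate both strings. */
static int parse_map_line(const char *line, struct map_entry *e)
{
	assert(line != NULL && e != NULL);
	e->from_dev = NULL;
	e->to_dev = NULL;
	if (sscanf(line, "%ms %ms", &e->from_dev, &e->to_dev) != 2) {
		/* A partial match may have allocated the first string. */
		free(e->from_dev);
		free(e->to_dev);
		e->from_dev = e->to_dev = NULL;
		return -1;
	}
	return 0;
}

/* The strings live on the heap, so free() is now valid. */
static void release_map_entry(struct map_entry *e)
{
	free(e->from_dev);
	free(e->to_dev);
	e->from_dev = e->to_dev = NULL;
}
```

Because the strings outlive the parsing function, the stored pointers stay valid in other threads until `release_map_entry()` runs.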
2019-05-21 blkparse: add support sort program by io event (Weiping Zhang)
Displays each program's data sorted by program name or by io event, such as Queued, Read, Write and Complete. When -S is specified, -s is ignored. The capital letters Q, R, W, C stand for KB, and the lowercase q, r, w, c stand for IO. N sorts programs by name, the same as -s. To sort programs by how much data they queued, you can use: blkparse -i sda.blktrace. -q -S Q -o sda.parse Signed-off-by: Weiping Zhang <> Signed-off-by: Jens Axboe <>
2018-08-31 iowatcher: spawn NPROCESSORS_ONLN for rsvg-convert-s (Jeff Moyer)
iowatcher currently always spawns 8 rsvg-convert processes, no matter how many CPUs a system has. I did some limited testing of different numbers of rsvg-convert processes. Here are the results:

    8 processes:   real 4m2.194s   user 23m36.665s  sys 0m38.523s
    20 processes:  real 2m28.935s  user 24m51.817s  sys 0m49.227s
    40 processes:  real 2m28.150s  user 24m56.994s  sys 0m49.621s

Note that this is the time it takes for a full run of iowatcher -- I didn't separate out just the rsvg-convert portion. Given the above results, it seems like a reasonable thing to spawn one rsvg-convert process per CPU.

Signed-off-by: Jeff Moyer <> Signed-off-by: Jens Axboe <>
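Sizing a worker pool by the number of online CPUs can be sketched like this. It is an illustration only: `worker_count` is a hypothetical helper, not iowatcher's function, and the fallback to a single worker is an assumption of the sketch.

```c
#include <assert.h>
#include <unistd.h>

/* Return how many worker processes to spawn: one per online CPU. */
static long worker_count(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	/* sysconf() returns -1 on error; fall back to one worker. */
	return n > 0 ? n : 1;
}
```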
2018-08-30 iowatcher: don't add Q events to the io hash (Jeff Moyer)
Hi,

Bryan Gurney reported iowatcher taking a *really* long time to generate a movie for 16GiB worth of trace data. I took a look, and the io hash was growing without bounds. The reason was that the I/O pattern looks like this:

    259,0 5 8 0.000208501 31708 A W 5435592 + 8 <- (259,1) 5433544
    259,1 5 9 0.000209537 31708 Q W 5435592 + 8 [kvdo30:bioQ0]
    259,1 5 10 0.000209880 31708 G W 5435592 + 8 [kvdo30:bioQ0]
    259,0 5 11 0.000211064 31708 A W 5435600 + 8 <- (259,1) 5433552
    259,1 5 12 0.000211347 31708 Q W 5435600 + 8 [kvdo30:bioQ0]
    259,1 5 13 0.000212957 31708 M W 5435600 + 8 [kvdo30:bioQ0]
    259,0 5 14 0.000213379 31708 A W 5435608 + 8 <- (259,1) 5433560
    259,1 5 15 0.000213629 31708 Q W 5435608 + 8 [kvdo30:bioQ0]
    259,1 5 16 0.000213937 31708 M W 5435608 + 8 [kvdo30:bioQ0]
    ...
    259,1 5 107 0.000246274 31708 D W 5435592 + 256 [kvdo30:bioQ0]

For each of those Q events, an entry was created in the io_hash. Then, upon I/O completion, only the first event (with the right starting sector) was removed! The runtime overhead of just iterating the hash chains was enormous.

The solution is to simply ignore the Q events, so long as there are D events in the trace. If there are no D events, then go ahead and hash the Q events as before. I'm hoping that if we only have Q and C, that they will actually be aligned. If that's an incorrect assumption, we could account merges in an rbtree. I'll defer that work until someone can show me blktrace data that needs it.

The comments should be self explanatory. Review would be appreciated as the code isn't well documented, and I don't know if I'm missing some hidden assumption about the data.

Before applying this patch, iowatcher would take more than 12 hours to complete. After the patch:

    real 9m44.476s
    user 41m35.426s
    sys 3m29.106s

'nuf said.

Cheers, Jeff

Reviewed-by: Chris Mason <> Signed-off-by: Jeff Moyer <> Signed-off-by: Jens Axboe <>
2018-05-16 make btt scripts python3-ready (Eric Sandeen)
Many distributions are moving to python3 by default. Here's an attempt to make the python scripts in blktrace python3-ready. Most of this was done with automated tools. I hand-fixed some space-vs-tab issues, and cast an array index to integer. It passes rudimentary testing when run under python2.7 as well as python3. This doesn't do anything with the shebangs; it leaves them both invoking whatever "env python" coughs up on the system. Signed-off-by: Eric Sandeen <> Signed-off-by: Jens Axboe <>
2018-05-02 btt: make device/devno use PATH_MAX to avoid overflow (Jens Axboe)
Herbo Zhang reports:

I found a bug in blktrace/btt/devmap.c. The code is just as follows:

    struct devmap {
        struct list_head head;
        char device[32], devno[32]; // #1
    };

    LIST_HEAD(all_devmaps);

    static int dev_map_add(char *line)
    {
        struct devmap *dmp;

        if (strstr(line, "Device") != NULL)
            return 1;

        dmp = malloc(sizeof(struct devmap));
        if (sscanf(line, "%s %s", dmp->device, dmp->devno) != 2) { //#2
            free(dmp);
            return 1;
        }

        list_add_tail(&dmp->head, &all_devmaps);
        return 0;
    }

    int dev_map_read(char *fname)
    {
        char line[256]; // #3
        FILE *fp = my_fopen(fname, "r");

        if (!fp) {
            perror(fname);
            return 1;
        }

        while (fscanf(fp, "%255[a-zA-Z0-9 :.,/_-]\n", line) == 1) {
            if (dev_map_add(line))
                break;
        }

        fclose(fp);
        return 0;
    }

The line length is 256, but the maximum length of dmp->device and dmp->devno is only 32. We can put strings longer than 32 into dmp->device and dmp->devno, and then they will overflow. We can trigger this bug as follows:

    $ python -c "print 'A'*256" > ./test
    $ btt -M ./test
    *** Error in `btt': free(): invalid next size (fast): 0x000055ad7349b250 ***
    ======= Backtrace: =========
    /lib/x86_64-linux-gnu/[0x7f7f158ce7e5]
    /lib/x86_64-linux-gnu/[0x7f7f158d6e0a]
    /lib/x86_64-linux-gnu/[0x7f7f158da98c]
    btt(+0x32e0)[0x55ad7306f2e0]
    btt(+0x2c5f)[0x55ad7306ec5f]
    btt(+0x251f)[0x55ad7306e51f]
    /lib/x86_64-linux-gnu/[0x7f7f15877830]
    btt(+0x26b9)[0x55ad7306e6b9]
    ======= Memory map: ========
    55ad7306c000-55ad7307f000 r-xp 00000000 08:14 3698139 /usr/bin/btt
    55ad7327e000-55ad7327f000 r--p 00012000 08:14 3698139 /usr/bin/btt
    55ad7327f000-55ad73280000 rw-p 00013000 08:14 3698139 /usr/bin/btt
    55ad73280000-55ad73285000 rw-p 00000000 00:00 0
    55ad7349a000-55ad734bb000 rw-p 00000000 00:00 0 [heap]
    7f7f10000000-7f7f10021000 rw-p 00000000 00:00 0
    7f7f10021000-7f7f14000000 ---p 00000000 00:00 0
    7f7f15640000-7f7f15656000 r-xp 00000000 08:14 14942237 /lib/x86_64-linux-gnu/
    7f7f15656000-7f7f15855000 ---p 00016000 08:14 14942237 /lib/x86_64-linux-gnu/
    7f7f15855000-7f7f15856000 r--p 00015000 08:14 14942237 /lib/x86_64-linux-gnu/
    7f7f15856000-7f7f15857000 rw-p 00016000 08:14 14942237 /lib/x86_64-linux-gnu/
    7f7f15857000-7f7f15a16000 r-xp 00000000 08:14 14948477 /lib/x86_64-linux-gnu/
    7f7f15a16000-7f7f15c16000 ---p 001bf000 08:14 14948477 /lib/x86_64-linux-gnu/
    7f7f15c16000-7f7f15c1a000 r--p 001bf000 08:14 14948477 /lib/x86_64-linux-gnu/
    7f7f15c1a000-7f7f15c1c000 rw-p 001c3000 08:14 14948477 /lib/x86_64-linux-gnu/
    7f7f15c1c000-7f7f15c20000 rw-p 00000000 00:00 0
    7f7f15c20000-7f7f15c46000 r-xp 00000000 08:14 14948478 /lib/x86_64-linux-gnu/
    7f7f15e16000-7f7f15e19000 rw-p 00000000 00:00 0
    7f7f15e42000-7f7f15e45000 rw-p 00000000 00:00 0
    7f7f15e45000-7f7f15e46000 r--p 00025000 08:14 14948478 /lib/x86_64-linux-gnu/
    7f7f15e46000-7f7f15e47000 rw-p 00026000 08:14 14948478 /lib/x86_64-linux-gnu/
    7f7f15e47000-7f7f15e48000 rw-p 00000000 00:00 0
    7ffdebe5c000-7ffdebe7d000 rw-p 00000000 00:00 0 [stack]
    7ffdebebc000-7ffdebebe000 r--p 00000000 00:00 0 [vvar]
    7ffdebebe000-7ffdebec0000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
    [1] 6272 abort btt -M test

Signed-off-by: Jens Axboe <>
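The overflow in the report comes from an unbounded "%s" writing into fixed-size fields. The commit itself enlarges the fields to PATH_MAX; as a complementary illustration, the sketch below shows length-checked tokenizing that rejects over-long fields instead of overflowing. All names here (`struct devmap_entry`, `DEVMAP_NAME_LEN`, `copy_token`, `parse_devmap_line`) are hypothetical and not taken from btt.

```c
#include <assert.h>
#include <string.h>

#define DEVMAP_NAME_LEN 32	/* hypothetical field size */

struct devmap_entry {
	char device[DEVMAP_NAME_LEN];
	char devno[DEVMAP_NAME_LEN];
};

/* Copy the next whitespace-delimited token, refusing tokens that
 * would not fit in dst together with the terminating NUL. */
static int copy_token(const char **p, char *dst, size_t dstlen)
{
	size_t len;

	*p += strspn(*p, " \t\n");	/* skip leading whitespace */
	len = strcspn(*p, " \t\n");	/* length of the token */
	if (len == 0 || len >= dstlen)
		return -1;
	memcpy(dst, *p, len);
	dst[len] = '\0';
	*p += len;
	return 0;
}

/* Parse "device devno"; returns 0 on success, -1 on a bad line. */
static int parse_devmap_line(const char *line, struct devmap_entry *e)
{
	const char *p = line;

	assert(line != NULL && e != NULL);
	if (copy_token(&p, e->device, sizeof(e->device)) != 0)
		return -1;
	return copy_token(&p, e->devno, sizeof(e->devno));
}
```

With this approach the 256-character line from the report is rejected rather than written past the end of the 32-byte fields.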
2018-04-09 blkparse: add documentation for 'R' requeue request (Weiping Zhang)
Signed-off-by: Weiping Zhang <> Signed-off-by: Jens Axboe <>
2018-04-09 blkparse: remove duplicated entry for flag M (Weiping Zhang)
Remove the duplicated entry 'M' from the blkparse man page. Reviewed-by: Steffen Maier <> Signed-off-by: Weiping Zhang <> Signed-off-by: Jens Axboe <>
2018-01-24 blktrace: don't stop tracer if not setup trace successfully (weiping zhang)
If we run blktrace on the same device twice, the second instance fails the ioctl(BLKTRACESETUP) and then calls __stop_tracer, which causes the first blktrace to fail to access its debugfs entries. This patch adds a check to handle this case and avoid stopping the tracer unconditionally. Signed-off-by: weiping zhang <> Signed-off-by: Jens Axboe <>
2018-01-23 fix parallel build failures (Robin H. Johnson)
When building in parallel, the btreplay/btrecord and btreplay/btreplay targets cause make to kick off two jobs for `make -C btreplay` and they sometimes end up clobbering each other. We could fix this by making one a dependency of the other, but it's a bit cleaner to refactor things to be based on subdirs. This way changes in subdirs also get noticed:

    $ touch btreplay/*.[ch]
    $ make
    <btreplay is now correctly updated>

Signed-off-by: Robin H. Johnson <> Signed-off-by: Mike Frysinger <> Signed-off-by: Jens Axboe <>
2018-01-23 respect LDFLAGS when linking programs (Robin H. Johnson)
Signed-off-by: Robin H. Johnson <> Signed-off-by: Mike Frysinger <> Signed-off-by: Jens Axboe <>
2017-11-07 btt: Fix overlapping IO stats. (Gwendal Grignou)
Keep scanning the tree for overlapping IO, otherwise Q2G and process traces will be incorrect.

Let's assume we have 2 IOs:

    A                                       A+a
    |---------------------------------------|
          B                 B+b
          |-----------------|

In the red/black tree we have:

             o -> [A, A+a]
            / \
        left   right
          /     \
    [...]o       o -> [B, B+b]

With the current code, we would not be able to find [B, B+b] in the tree: B is greater than A, so we won't go left; B+b is smaller than A+a, so we won't go right either.

When we have a [X, X+x] IO to look for, we need to check the right subtree when either X+x >= A+a (for merged IO) or X > A (for overlapping IO).

TEST=Check with a trace with overlapping IO: Q2C and Q2G are expected.

Signed-off-by: Gwendal Grignou <> Signed-off-by: Jens Axboe <>
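The descent rule described above can be sketched on a plain binary tree ordered by start sector. This is an illustration under stated assumptions, not btt's code: `struct io_node` and `find_io` are hypothetical, the tree is unbalanced, and the lookup is an exact-match search rather than btt's traversal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: an IO covering sectors [start, end]. */
struct io_node {
	unsigned long long start, end;
	struct io_node *left, *right;
};

/* Look for the IO [s, e] in a tree ordered by start sector. */
static const struct io_node *find_io(const struct io_node *n,
				     unsigned long long s,
				     unsigned long long e)
{
	const struct io_node *hit = NULL;

	if (n == NULL)
		return NULL;
	if (n->start == s && n->end == e)
		return n;
	if (s < n->start)
		hit = find_io(n->left, s, e);
	/* Go right for merged IO (e >= n->end) and also for
	 * overlapping IO that starts later (s > n->start). */
	if (hit == NULL && (e >= n->end || s > n->start))
		hit = find_io(n->right, s, e);
	return hit;
}
```

With only the `e >= n->end` test, an IO nested inside its parent's range, such as [B, B+b] in the diagram, would never be reached.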
2017-11-05 btt/devs: silence warning on sprintf overflow (Jens Axboe)
Signed-off-by: Jens Axboe <>
2017-11-05 jhash: fix annoying gcc fall through warnings (Jens Axboe)
Signed-off-by: Jens Axboe <>
2017-11-04 Blktrace 1.2.0 blktrace-1.2.0 (Jens Axboe)
Signed-off-by: Jens Axboe <>
2017-11-04 blktrace: abort if device ioctl setup fails (Jens Axboe)
If we fail doing the BLKTRACESETUP ioctl, blktrace still marches on and sets up the rest. This results in errors like the below:

    blktrace /dev/sdf
    BLKTRACESETUP(2) /dev/sdf failed: 5/Input/output error
    Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
    Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file or directory
    Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file or directory
    [...]
    FAILED to start thread on CPU 0: 1/Operation not permitted
    FAILED to start thread on CPU 1: 1/Operation not permitted
    FAILED to start thread on CPU 2: 1/Operation not permitted

and blktrace continues to run, though it can't do anything in this state. If the ioctl setup fails, just abort.

Signed-off-by: Jens Axboe <>
2017-01-26 blktrace: Create empty output files for non-existent cpus (Jan Kara)
When CPU number space is sparse, we don't start threads for non-existent CPUs. As a result, there are no output files created for these CPUs which confuses tools like blkparse which expect that CPU numbers are contiguous. Create fake empty files for non-existent CPUs so that other tools don't have to bother. Note that in network mode, the server will create all files in the range 0..max_cpus automatically. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2017-01-26 blktrace: Reorganize creation of output file name (Jan Kara)
We would like to generate the output file name without having the corresponding iop structure. Reorganize the function to allow that. Also fix a couple of overflows that are possible when generating the file name, since we are modifying the code anyway. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2017-01-26 blktrace: Add support for sparse CPU numbers (Jan Kara)
On some machines CPU numbers do not form a contiguous interval. In such cases blktrace will fail to start threads for missing CPUs and exit, effectively rendering itself unusable. Add support to blktrace for handling systems with sparse CPU numbers. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-08-23 iowatcher: link with -lrt (Thomas Petazzoni)
Some C libraries (notably uClibc) have the posix_spawn*() functions in librt, so let's link iowatcher with -lrt. Signed-off-by: Thomas Petazzoni <> Signed-off-by: Jens Axboe <>
2016-05-19 blktrace: remove -k from manpage synopsis (Eric Sandeen)
An earlier commit, fb7f8667 ("blktrace: disable kill option - take 2"), removed the "-k" option documentation, but left it in the synopsis. This is a bit unusual and unhelpful and probably unintended; remove it from the synopsis as well. Signed-off-by: Eric Sandeen <> Signed-off-by: Jens Axboe <>
2016-05-05 Fixup graph name in help text (Jan Kara)
Proper graph name is queue-depth, not queue_depth. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 Separate prefix in legend with space (Jan Kara)
The trace label isn't properly separated by a space from the suffix (Read / Write). Fix it. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 Don't prepend blktrace destination dir if we didn't run blktrace (Jan Kara)
When the user specifies trace files directly via the -t option, it doesn't make sense to prepend the blktrace destination directory to them (it is especially confusing if you specify absolute path names with the -t option, because this logic breaks the path names). So avoid that. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 Zero sectors are strange (Jan Kara)
Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 btt: Replace overlapping IO (Jan Kara)
Currently btt keeps the original IO in its RB-tree even if it sees a new IO that begins at the same sector. However, such an IO most likely means that we have just lost the completion event for the IO that is still in the tree. In that case, replacing the IO in the RB-tree makes more sense, to avoid bogus IOs being reported as taking a huge amount of time. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 iowatcher: Use queue events if issue not available (Jan Kara)
Currently queue depth and latency graphs are generated from ISSUE and COMPLETE events. For traces which miss the ISSUE events (e.g. from device mapper) use QUEUE events instead. The result won't be as great but it still conveys some useful information. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 Process notify events outside of given interval (Jan Kara)
When parsing blktrace data, process notify events even outside the specified interval. This way we can learn about time stamps, process names etc. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 Use maximum over all traces for queue depth (Jan Kara)
So far we used the maximum of the first trace for the maximum range of the queue depth graph. Use the maximum over all traces, as is done for other line graphs. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-05 Better max estimate for line graphs (Jan Kara)
Use the maximum of the rolling average as the upper end of the range for the line graph, to make better use of the available space in the plot. Signed-off-by: Jan Kara <> Signed-off-by: Jens Axboe <>
2016-05-03 btt/unplug_hist: fix bad memset (Jens Axboe)
Just replace the malloc/memset with a calloc(). Signed-off-by: Jens Axboe <>
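The commit message does not say what was wrong with the original memset, so the sketch below only illustrates the general pattern it names: allocating zeroed storage with a single calloc() call so there is no separate size argument to get wrong. `alloc_hist` and the bucket type are hypothetical, not taken from btt.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate a histogram with every bucket already set to zero.
 * calloc() takes the element count and element size separately,
 * so no hand-written byte count is passed to memset(). */
static unsigned long *alloc_hist(size_t nbuckets)
{
	return calloc(nbuckets, sizeof(unsigned long));
}
```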
2016-04-25 btreplay: remove timestamps (Olaf Hering)
Using __DATE__ and __TIME__ will break reproducible builds. The resulting binary will change with each rebuild even if the source and toolchain are identical. Signed-off-by: Olaf Hering <> Signed-off-by: Jens Axboe <>
2016-04-25 btreplay: make Ctrl-C work (Roman Pen)
is_reap_done() must also check whether SIGINT or SIGTERM has arrived, or we hang forever with such backtraces after Ctrl-C:

    (gdb) thr a a bt

    Thread 3 (Thread 0x7fbff8ff9700 (LWP 12607)):
    #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1 0x0000000000402698 in replay_rec () at btreplay.c:1035
    #2 0x00007fc001fe5454 in start_thread () from /lib/x86_64-linux-gnu/
    #3 0x00007fc001d1eecd in ?? () from /lib/x86_64-linux-gnu/
    #4 0x0000000000000000 in ?? ()

    Thread 2 (Thread 0x7fbfea7fc700 (LWP 12611)):
    #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1 0x0000000000402698 in replay_rec () at btreplay.c:1035
    #2 0x00007fc001fe5454 in start_thread () from /lib/x86_64-linux-gnu/
    #3 0x00007fc001d1eecd in ?? () from /lib/x86_64-linux-gnu/
    #4 0x0000000000000000 in ?? ()

    Thread 1 (Thread 0x7fc00282e700 (LWP 12597)):
    #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
    #1 0x0000000000402303 in __wait_cv () at btreplay.c:413
    #2 0x0000000000401ae8 in main () at btreplay.c:426

Signed-off-by: Roman Pen <> Cc: Jens Axboe <> Cc: <> Signed-off-by: Jens Axboe <>
2016-04-25 btreplay: fix sched_{set|get}affinity (Roman Pen)
getpid() returns the pid of the process, but at least a tid must be provided. If zero is passed, the calling thread is used, which is exactly what is needed. Signed-off-by: Roman Pen <> Cc: Jens Axboe <> Cc: <> Signed-off-by: Jens Axboe <>
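Passing zero as the first argument can be sketched as follows, assuming Linux with glibc. The helper names `pin_calling_thread` and `first_allowed_cpu` are hypothetical and only illustrate the call; they are not btreplay functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to one CPU; pid 0 means "calling thread". */
static int pin_calling_thread(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the lowest CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```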
2016-04-25 btreplay: fix memory corruption caused by CPU_ZERO_S (Roman Pen)
CPU_ZERO_S() must be given the size of the set in bytes, not the number of CPUs. Signed-off-by: Roman Pen <> Cc: Jens Axboe <> Cc: <> Signed-off-by: Jens Axboe <>
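The distinction between a CPU count and a set size in bytes can be sketched with glibc's dynamic CPU set macros. `count_after_zero` is a hypothetical demo helper; the point is that CPU_ZERO_S() receives the value from CPU_ALLOC_SIZE(), since passing a larger CPU count instead would zero more bytes than CPU_ALLOC() allocated.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Allocate a set for ncpus CPUs, clear it, set CPU 0 and return
 * how many CPUs are set afterwards (or -1 on allocation failure). */
static int count_after_zero(int ncpus)
{
	cpu_set_t *set = CPU_ALLOC(ncpus);
	size_t size = CPU_ALLOC_SIZE(ncpus);	/* bytes, not CPUs */
	int count;

	if (set == NULL)
		return -1;
	CPU_ZERO_S(size, set);
	CPU_SET_S(0, size, set);
	count = CPU_COUNT_S(size, set);
	CPU_FREE(set);
	return count;
}
```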
2016-02-09 blktrace: Use number of online CPUs (Abutalib Aghayev)
Currently, blktrace uses _SC_NPROCESSORS_CONF to find out the number of CPUs. This is a problem, because if you reduce the number of online CPUs by passing the kernel parameter maxcpus, then blktrace fails to start with the error:

    FAILED to start thread on CPU 4: 22/Invalid argument
    FAILED to start thread on CPU 5: 22/Invalid argument
    ...

The attached patch fixes it to use _SC_NPROCESSORS_ONLN.

Signed-off-by: Jens Axboe <>
2016-01-08 Add the "-a discard" filter option to the blktrace.8 man page (John Groves)
Signed-off-by: Jens Axboe <>
2015-09-15 Fix warnings on newer gcc (Jens Axboe)
Signed-off-by: Jens Axboe <>
2015-09-15 include sys/types.h for dev_t definition (Khem Raj)
Avoids the build failures when sys/types.h does not get included indirectly through other headers. Signed-off-by: Khem Raj <> Signed-off-by: Jens Axboe <>
2015-08-20 btreplay: Fix typo in scaling up the dynamic cpu set size. (Josef Cejka)
In get_ncpus, we default to using 4096 CPUs if _SC_NPROCESSORS_CONF isn't enabled. If that is insufficient, sched_getaffinity will fail and we retry after doubling the size of the cpu_set_t allocation. There's a typo in there that means we don't actually double the size and will loop forever allocating the same sized cpu_set_t instead. Signed-off-by: Josef Cejka <> Signed-off-by: Jeff Mahoney <> Signed-off-by: Jens Axboe <>
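The retry-with-doubling loop described above can be sketched as follows, assuming Linux with glibc. `count_cpus_from` is a hypothetical function, not btreplay's get_ncpus(); the upper bound on the set size is an assumption of the sketch so the loop cannot run forever.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Count the CPUs in the calling thread's affinity mask, starting
 * with a set sized for `start` CPUs and doubling on EINVAL. */
static int count_cpus_from(int start)
{
	int ncpus = start;

	assert(start > 0);
	while (ncpus <= (1 << 20)) {	/* arbitrary safety cap */
		cpu_set_t *set = CPU_ALLOC(ncpus);
		size_t size = CPU_ALLOC_SIZE(ncpus);
		int ret, saved_errno;

		if (set == NULL)
			return -1;
		ret = sched_getaffinity(0, size, set);
		saved_errno = errno;
		if (ret == 0) {
			int n = CPU_COUNT_S(size, set);
			CPU_FREE(set);
			return n;
		}
		CPU_FREE(set);
		if (saved_errno != EINVAL)
			return -1;
		ncpus *= 2;	/* actually grow the set before retrying */
	}
	return -1;
}
```

If the doubling step is missing, each retry allocates the same size and fails the same way, which is the endless loop the commit describes.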
2015-02-18 Refer to sda instead of hda in man pages (Olaf Hering)
Signed-off-by: Olaf Hering <> Signed-off-by: Jens Axboe <>
2014-09-25 iowatcher: wrap system() in a checker function (Jens Axboe)
Kills the errors on unchecked return of system() Signed-off-by: Jens Axboe <>
2014-09-25Merge branch 'for-upstream' of Axboe
Andrew says: Here are some trivial tweaks which I found were needed or desirable while adding iowatcher to the blktrace packaging in Fedora. They improve the integration of iowatcher into the tree and reduce duplication of docs.
2014-09-25Merge git:// Axboe
Signed-off-by: Jens Axboe <> Conflicts: iowatcher/Makefile
2014-09-25 Add iowatcher requirements to README (Andrew Price)
Merge the requirements bits of iowatcher/README into README Signed-off-by: Andrew Price <>
2014-09-25 iowatcher: check the return value from write() (Chris Mason)
Signed-off-by: Chris Mason <>
2014-09-25 iowatcher: fixup the Makefile (Chris Mason)
We were setting C=gcc instead of CC=gcc, and using -O0. Fix both. Signed-off-by: Chris Mason <>
2014-09-25 iowatcher: Remove iowatcher/README (Andrew Price)
This README is getting out-of-date and its contents are duplicated in the iowatcher manpage which is up-to-date, so remove it to reduce duplication of effort. Signed-off-by: Andrew Price <>
2014-09-25 iowatcher: Move iowatcher.1 into doc directory (Andrew Price)
iowatcher's manpage wasn't being installed with the other manpages so add it to the doc directory. Signed-off-by: Andrew Price <>
2014-09-25 iowatcher: Add iowatcher to .gitignore (Andrew Price)
Signed-off-by: Andrew Price <>