path: root/blktrace.c
Age | Commit message | Author
2021-06-28 | blktrace: exit directly when nthreads_running != ncpus in run_tracers() | lijinlin
We found that blktrace got stuck when a cgroup restricts the CPUs blktrace may use. The messages and stack are:
[root@localhost ~]# blktrace -w 10 -o- /dev/sda
FAILED to start thread on CPU 1: 22/Invalid argument
FAILED to start thread on CPU 2: 22/Invalid argument
[root@localhost ~]# cat /proc/1385110/stack
[<0>] __switch_to+0xe8/0x150
[<0>] futex_wait_queue_me+0xd4/0x158
[<0>] futex_wait+0xf4/0x230
[<0>] do_futex+0x470/0x900
[<0>] __arm64_sys_futex+0x13c/0x188
[<0>] el0_svc_common+0x80/0x200
[<0>] el0_svc_handler+0x78/0xe0
[<0>] el0_svc+0x10/0x260
[<0>] 0xffffffffffffffff
Blktrace fails to start a thread because the thread can't be locked onto the restricted CPU. In this case, blktrace doesn't schedule an alarm to set the variable 'done' to 1 after the defined time. We debugged the code and found the call trace below:
main()
==>run_tracers()
==>wait_tracers()
==>process_trace_bufs()
==>wait_empty_entries()
==>t_pthread_cond_wait()
Blktrace was set to piped output, so the process is stuck in wait_empty_entries() waiting for the variable 'done' to be set to 1. Fix the problem by setting 'done' to 1 in run_tracers() when 'nthreads_running' is not equal to 'ncpus'.
Signed-off-by: lijinlin <>
Signed-off-by: Zhiqiang Liu <>
Signed-off-by: Lixiaokeng <>
Signed-off-by: Jens Axboe <>
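A minimal sketch of why the hang happens and why setting 'done' fixes it, assuming a simplified model where a consumer sleeps on a condition variable until either work arrives or a shutdown flag is set. Only `done`, `nthreads_running` and `ncpus` come from the commit message; `wait_for_work`, `check_tracers_started`, `pending_entries` and the other names are hypothetical. Broadcasting after setting the flag matters if a waiter is already blocked.

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared state; `done` mirrors the flag named in the commit message. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int done;              /* 1 = stop waiting, nothing more will arrive */
static int pending_entries;   /* hypothetical count of queued trace buffers */

/* Consumer side, in the spirit of wait_empty_entries(): sleep until there
 * is work or until shutdown is requested. Returns true if work is pending. */
static bool wait_for_work(void)
{
    bool have_work;

    pthread_mutex_lock(&lock);
    while (pending_entries == 0 && !done)
        pthread_cond_wait(&cond, &lock);
    have_work = pending_entries > 0;
    pthread_mutex_unlock(&lock);
    return have_work;
}

/* Hypothetical check after trying to start one tracer thread per CPU:
 * if any failed, no alarm will ever set `done`, so set it here and wake
 * every waiter. Returns -1 if not all tracers started, 0 otherwise. */
static int check_tracers_started(int nthreads_running, int ncpus)
{
    if (nthreads_running == ncpus)
        return 0;
    pthread_mutex_lock(&lock);
    done = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    return -1;
}

/* Hypothetical thread entry that runs the consumer in the background. */
static void *consumer_thread(void *arg)
{
    *(bool *)arg = wait_for_work();
    return NULL;
}
```

Without the `done = 1` path, a consumer started before the failed tracers would block in `pthread_cond_wait` indefinitely, matching the futex stack shown above.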
2021-02-19 | blktrace: inclusive terminology | Eric Sandeen
Use more inclusive terminology in a couple of places.
Signed-off-by: Eric Sandeen <>
Signed-off-by: Jens Axboe <>
2018-01-24 | blktrace: don't stop tracer if not setup trace successfully | weiping zhang
If we run blktrace on the same device twice, the second run fails the BLKTRACESETUP ioctl and then calls __stop_tracer, which causes the first blktrace to fail to access its debugfs entries. This patch adds a check for this case to avoid stopping the tracer unconditionally.
Signed-off-by: weiping zhang <>
Signed-off-by: Jens Axboe <>
2017-11-04 | blktrace: abort if device ioctl setup fails | Jens Axboe
If the BLKTRACESETUP ioctl fails, blktrace still marches on and sets up the rest. This results in errors like the ones below:
blktrace /dev/sdf
BLKTRACESETUP(2) /dev/sdf failed: 5/Input/output error
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file or directory
Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file or directory
[...]
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted
FAILED to start thread on CPU 2: 1/Operation not permitted
and blktrace continues to run, though it can't do anything in this state. If the ioctl setup fails, just abort.
Signed-off-by: Jens Axboe <>
2017-01-26 | blktrace: Create empty output files for non-existent cpus | Jan Kara
When the CPU number space is sparse, we don't start threads for non-existent CPUs. As a result, no output files are created for these CPUs, which confuses tools like blkparse that expect CPU numbers to be contiguous. Create fake empty files for non-existent CPUs so that other tools don't have to bother. Note that in network mode, the server creates all files in the range 0..max_cpus automatically.
Signed-off-by: Jan Kara <>
Signed-off-by: Jens Axboe <>
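A sketch of the idea, assuming a simple per-CPU presence array and the "<name>.blktrace.<cpu>" naming described in the 2010-09-16 entry of this log. The helper `create_empty_cpu_files` and the `present` array are hypothetical, not the blktrace implementation.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * For every CPU id in [0, max_cpus) with no tracer thread, create an empty
 * "<dir>/<name>.blktrace.<cpu>" file so that tools expecting contiguous CPU
 * numbers find one file per id. `present` is a hypothetical per-CPU flag
 * array (nonzero = a tracer writes this CPU's file).
 * Returns the number of empty files created, or -1 on error.
 */
static int create_empty_cpu_files(const char *dir, const char *name,
                                  const unsigned char *present, int max_cpus)
{
    char path[4096];
    int created = 0;

    for (int cpu = 0; cpu < max_cpus; cpu++) {
        int n, fd;

        if (present[cpu])
            continue;
        n = snprintf(path, sizeof(path), "%s/%s.blktrace.%d", dir, name, cpu);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;                   /* name would be truncated */
        /* O_TRUNC also empties a stale file left by an earlier run. */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        close(fd);
        created++;
    }
    return created;
}
```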
2017-01-26 | blktrace: Reorganize creation of output file name | Jan Kara
We would like to generate the output file name without having the corresponding iop structure. Reorganize the function to allow that. Also fix a couple of possible overflows when generating the file name, since we are modifying the code anyway.
Signed-off-by: Jan Kara <>
Signed-off-by: Jens Axboe <>
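One common way to avoid file-name overflows is `snprintf`, whose return value is the length the full string would have needed; a result at or above the buffer size means truncation. This is a sketch under assumptions: `make_output_name` is hypothetical, and the "<dir>/<name>.blktrace.<cpu>" layout is taken from naming described elsewhere in this log. It needs only plain values, no per-CPU I/O structure.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<dir>/<name>.blktrace.<cpu>", or
 * "<name>.blktrace.<cpu>" when dir is NULL. Returns 0 on success, -1 if
 * the result would not fit in buf (buf then holds a truncated string).
 */
static int make_output_name(char *buf, size_t len, const char *dir,
                            const char *name, int cpu)
{
    int n;

    if (dir)
        n = snprintf(buf, len, "%s/%s.blktrace.%d", dir, name, cpu);
    else
        n = snprintf(buf, len, "%s.blktrace.%d", name, cpu);

    /* snprintf reports the untruncated length; >= len means it was cut. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```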
2017-01-26 | blktrace: Add support for sparse CPU numbers | Jan Kara
On some machines, CPU numbers do not form a contiguous interval. In such cases blktrace fails to start threads for the missing CPUs and exits, effectively rendering itself unusable. Add support to blktrace for systems with sparse CPU numbers.
Signed-off-by: Jan Kara <>
Signed-off-by: Jens Axboe <>
2016-02-09 | blktrace: Use number of online CPUs | Abutalib Aghayev
Currently, blktrace uses _SC_NPROCESSORS_CONF to find out the number of CPUs. This is a problem: if you reduce the number of online CPUs by passing the kernel parameter maxcpus, blktrace fails to start with the error:
FAILED to start thread on CPU 4: 22/Invalid argument
FAILED to start thread on CPU 5: 22/Invalid argument
...
The attached patch changes it to use _SC_NPROCESSORS_ONLN.
Signed-off-by: Jens Axboe <>
2014-09-08 | signal condition variable at end of stop_tracers | Robert Schiele
stop_tracers modifies tp->is_done and thus must signal the condition variable that tracer_wait_unblock waits on to monitor tp->is_done. Not doing so might cause the tool to deadlock if stop_tracers is called while a tracer thread is in tracer_wait_unblock.
Signed-off-by: Robert Schiele <>
Signed-off-by: Jens Axboe <>
2013-08-01 | blktrace blkreplay: convert to use a dynamic cpu_set_t | Nathan Zimmer
Some distros have changed CPU_SETSIZE in glibc to 4096, since that matches NR_CPUS in the Linux kernel config file. Others have decided to leave CPU_SETSIZE at 1024. This is a problem if you want to run such a distro on a very large machine. CPU_SETSIZE is used by the cpu_set_t type. This means that to deal with more than 1024 CPUs you must use dynamic CPU sets, which involves converting from macros like CPU_SET to CPU_SET_S.
Cc: Jens Axboe <>
Modified by Jens to fix the CPU_{SET,ZERO}_S pointer mixup.
Signed-off-by: Nathan Zimmer <>
Signed-off-by: Jens Axboe <>
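A minimal sketch of the dynamic-set style using the glibc macros `CPU_ALLOC`, `CPU_ALLOC_SIZE`, `CPU_ZERO_S`, `CPU_COUNT_S` and `CPU_FREE`; the helper `count_allowed_cpus` is hypothetical. Note that the `_S` macros take the set size in bytes, not a CPU count, which is an easy place for exactly the kind of mixup mentioned above. `sched_getaffinity` also rejects sets smaller than the kernel's own CPU mask, so the sketch assumes the caller passes a generous upper bound.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/*
 * Count the CPUs this process may run on, using a dynamically sized
 * cpu_set_t so CPU ids beyond the compiled-in CPU_SETSIZE still fit.
 * `max_cpus` is an upper bound on CPU ids (hypothetical parameter).
 * Returns the count, or -1 on error.
 */
static int count_allowed_cpus(int max_cpus)
{
    cpu_set_t *set = CPU_ALLOC(max_cpus);
    size_t size = CPU_ALLOC_SIZE(max_cpus);   /* bytes, not CPUs */
    int count;

    if (!set)
        return -1;
    CPU_ZERO_S(size, set);
    if (sched_getaffinity(0, size, set) < 0) {
        CPU_FREE(set);
        return -1;
    }
    count = CPU_COUNT_S(size, set);
    CPU_FREE(set);
    return count;
}
```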
2013-08-01 | blktrace: use number of configured cpus instead of online cpus | Nathan Zimmer
We want to run on all online processors. However, if there is a hole in the online cpumask, this won't happen. We need the number of configured processors instead of online ones.
Cc: Jens Axboe <>
Signed-off-by: Nathan Zimmer <>
Signed-off-by: Jens Axboe <>
2012-02-01 | avoid string overflows | Eric Sandeen
Several places using strcpy would benefit from strncpy for safety.
Signed-off-by: Eric Sandeen <>
Signed-off-by: Jens Axboe <>
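One caveat when switching from strcpy to strncpy: strncpy does not NUL-terminate the destination if the source is at least as long as the limit, so the usual pattern copies at most `size - 1` bytes and terminates explicitly. A minimal sketch; `copy_bounded` is a hypothetical helper, not a blktrace function.

```c
#include <string.h>

/* Copy src into dst (capacity `size` bytes), always NUL-terminating.
 * Returns 1 if src had to be truncated, 0 otherwise. */
static int copy_bounded(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return 1;
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';          /* strncpy may leave dst unterminated */
    return strlen(src) >= size;
}
```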
2012-02-01 | blktrace: remove unused variable | Eric Sandeen
sp was being incremented without initialization, but thankfully was not used otherwise. Just remove it.
Signed-off-by: Eric Sandeen <>
Signed-off-by: Jens Axboe <>
2012-02-01 | Close stream in 'I' switch handling | Eric Sandeen
The file containing the list of devices was never closed after processing was complete.
Signed-off-by: Eric Sandeen <>
Signed-off-by: Jens Axboe <>
2012-02-01 | Check setvbuf return value | Eric Sandeen
Check for setvbuf failure.
Signed-off-by: Eric Sandeen <>
Signed-off-by: Jens Axboe <>
2012-01-31 | Fix for realloc bug and wrong error logging | Mikulas Patocka
This patch fixes two bugs in blktrace:
1. realloc is called on the wrong memory address (glibc reports heap corruption if the user sends the output to a pipe, for example "blktrace /dev/sdc -o -").
2. An errno of 0 is reported if debugfs is not mounted.
Mikulas
Signed-off-by: Jens Axboe <>
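For reference, the usual safe pattern for growing a buffer with realloc: always pass the base pointer that malloc/realloc returned (never an interior pointer such as `data + offset`, which is what triggers glibc's heap-corruption reports), and assign the result to a temporary so the original allocation survives a failure. This sketch uses a hypothetical `struct growbuf` and is not the blktrace code.

```c
#include <stdlib.h>
#include <string.h>

struct growbuf {      /* hypothetical growable output buffer */
    char *data;       /* base of the allocation, as returned by realloc */
    size_t len;       /* bytes in use */
    size_t cap;       /* bytes allocated */
};

/* Append n bytes, doubling capacity as needed. Returns 0 or -1. */
static int growbuf_append(struct growbuf *b, const void *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;

        while (cap < b->len + n)
            cap *= 2;
        p = realloc(b->data, cap);     /* base pointer, never data + len */
        if (!p)
            return -1;                 /* b->data is still valid */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```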
2011-03-16 | blktrace: Document default values for -b and -n | Justin TerAvest
To help users deal with the log message "You have dropped events, consider using a larger buffer size (-b)", it's helpful to list the defaults used for sub-buffer management when no flags are given.
Signed-off-by: Justin TerAvest <>
Signed-off-by: Jens Axboe <>
2011-02-09 | blktrace: remove unused idx from devpath. | Tao Ma
idx isn't used, so remove it.
Cc: Jens Axboe <>
Signed-off-by: Tao Ma <>
Signed-off-by: Jens Axboe <>
2011-02-09 | blktrace: break mlock in case of is_done. | Tao Ma
In 2.6.38-rc2 there is a bug in mlock that makes mlock return an error in blktrace (I have sent the corresponding patch to LKML). So when we try to stop blktrace with Ctrl+C, mlock loops forever, and in the end I have to use "kill -9" to kill it and then run "blktrace -k" to stop the tracer. I don't think that is good.
Reproducing it is simple: use a 2.6.38-rc kernel, run blktrace /dev/sdx, then press Ctrl+C; it doesn't exit.
So this patch adds a check for tp->is_done. If is_done is set, break out of mlock so that we don't loop forever. To allow for a real mlock error, it still retries up to 10 times once tp->is_done is set. If tp isn't set or tp->is_done isn't set, it works as originally designed.
Cc: Jens Axboe <>
Signed-off-by: Tao Ma <>
Signed-off-by: Jens Axboe <>
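A sketch of the retry policy described above, assuming a plain flag stands in for tp->is_done (`lock_with_retry` is a hypothetical name, and the original loop may do more than retry, such as adjusting limits). With no stop request it keeps retrying as before; once a stop is requested it gives up after 10 failed attempts, so an interrupted run can exit.

```c
#include <stddef.h>
#include <sys/mman.h>

/*
 * Retry mlock() until it succeeds. `is_done` stands in for tp->is_done and
 * may be NULL. Returns 0 on success, -1 if a stop was requested and mlock
 * still failed 10 times.
 */
static int lock_with_retry(const void *addr, size_t len,
                           const volatile int *is_done)
{
    int tries_after_done = 0;

    while (mlock(addr, len) < 0) {
        /* Without a stop request, keep retrying as the original loop did. */
        if (is_done && *is_done && ++tries_after_done >= 10)
            return -1;
    }
    return 0;
}
```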
2010-10-22 | blktrace: blktrace documentation update | Edward Shishkin
Fixup for RH Bugzilla 595620. Document the previously undocumented blktrace options and update the man pages.
Signed-off-by: Edward Shishkin <>
Signed-off-by: Jens Axboe <>
2010-09-16 | blktrace: disallow -o when using multiple devices | Alan D. Brunelle
Document that "-o" does not work when specifying multiple devices to blktrace, and enforce this by stopping blktrace when one tries to do so.
The technical reason why "-o" doesn't work with multiple devices is that we use multiple threads of execution (one per device/CPU pair), and each of them opens a file named "<prefix>.blktrace.<cpu>". With "-o", all of the "<prefix>" values are the same, so multiple threads open the same file and try to write to it. Not good. Without "-o" we get unique files named "<device>.blktrace.<cpu>", as the tuple (<device>,<cpu>) is unique.
Signed-off-by: Alan D. Brunelle <>
2010-04-20 | blktrace: disable kill option - take 2 | Edward Shishkin
Fixup for 513950.
Problem: 'blktrace -d <device> -k' does not kill a running background trace. Executing 'blktrace -d <device> -k' a second time results in a "BLKTRACETEARDOWN: Invalid argument" message, and from then on every run of blktrace on that machine prints: BLKTRACESETUP: No such file or directory.
The bug: the -k option makes the kernel clobber the information about the running trace (blk_trace_remove), while the resources (files in debugfs opened by the running background blktrace) are not released.
Solution: Update the documentation: undocument the non-working "kill" option, and advise sending SIGINT via kill(1) to the running background blktrace to terminate it correctly.
Signed-off-by: Edward Shishkin <>
Signed-off-by: Jens Axboe <>
2010-04-20 | blktrace: print correct usage | Edward Shishkin
Fixup for 498898:
Problem: When somebody runs blktrace without parameters, it shows the usage message. The usage message suggests that the version number "x.y.z" is a required parameter, which is not true.
Solution: Don't print the version number when running blktrace, blkparse, or btt without parameters.
Signed-off-by: Edward Shishkin <>
Signed-off-by: Jens Axboe <>
2010-04-20 | blktrace: avoid device duplication | Edward Shishkin
Fixup for bz 501457.
Problem: If the device list file contains the same device as supplied on the command line, blktrace stops immediately and further I/O tracing is impossible.
Bug: a duplicated device in devpaths ends in program termination (the BLKTRACESETUP ioctl returns an error) while resources (open files in debugfs) are not released.
Solution: Make sure devices are not duplicated in the devpaths pool.
Signed-off-by: Edward Shishkin <>
Signed-off-by: Jens Axboe <>
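A minimal sketch of rejecting duplicates before they enter the pool. The array-based list and `add_devpath_unique` are hypothetical simplifications of blktrace's devpaths. Comparing path strings does not catch two different names for the same device (for example a symlink); comparing `st_rdev` from `stat()` would be a more robust check.

```c
#include <string.h>

/*
 * Add `path` to a fixed-size list only if an identical string is not
 * already present. Returns 1 if added, 0 if it was a duplicate, -1 if
 * the list is full.
 */
static int add_devpath_unique(const char **paths, int *npaths, int max,
                              const char *path)
{
    for (int i = 0; i < *npaths; i++)
        if (strcmp(paths[i], path) == 0)
            return 0;                   /* already being traced */
    if (*npaths >= max)
        return -1;
    paths[(*npaths)++] = path;
    return 1;
}
```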
2009-04-17 | handle race to mkdir at startup | Jeff Moyer
I ran into a problem when specifying -D dirname-that-doesnt-yet-exist. Blktrace would fail, spewing the following messages:
[root@megadeth blktrace]# ./blktrace -d /dev/cciss/c0d1 -D ./2.6.30-rc2-cfq-local
Destination dir ./2.6.30-rc2-cfq-local/ can't be made: 17/File exists
Destination dir ./2.6.30-rc2-cfq-local/ can't be made: 17/File exists
Destination dir ./2.6.30-rc2-cfq-local/ can't be made: 17/File exists
Destination dir ./2.6.30-rc2-cfq-local/ can't be made: 17/File exists
Destination dir ./2.6.30-rc2-cfq-local/ can't be made: 17/File exists
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 4: 1/Operation not permitted
FAILED to start thread on CPU 5: 1/Operation not permitted
FAILED to start thread on CPU 6: 1/Operation not permitted
FAILED to start thread on CPU 7: 1/Operation not permitted
I tracked it down to the fact that there is no synchronization between threads when they try to create the output directory. The fix is simple: just allow the race to happen and detect it. It's not really worth adding any extra synchronization, and it looks like nothing else in that startup path needs synchronization either.
This patch fixes the issue for me. I tested it by running the very command that had failed for me 100% of the time before. I also did a chattr +i on the directory and verified that blktrace really fails when it can't create the directory.
Signed-off-by: Jens Axboe <>
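"Allow the race and detect it" typically means treating `EEXIST` from `mkdir()` as success once the path is confirmed to be a directory, so whichever thread loses the race still proceeds. A minimal sketch; `ensure_dir` is a hypothetical name, not necessarily how the patch is written.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create `dir` if needed, tolerating a concurrent creator: if mkdir()
 * fails with EEXIST, succeed as long as the path is now a directory.
 * Returns 0 on success, -1 otherwise (errno set).
 */
static int ensure_dir(const char *dir)
{
    struct stat st;

    if (mkdir(dir, 0755) == 0)
        return 0;
    if (errno != EEXIST)
        return -1;            /* a genuine failure, e.g. a permission error */
    if (stat(dir, &st) < 0)
        return -1;
    if (!S_ISDIR(st.st_mode)) {
        errno = ENOTDIR;      /* exists, but is not a directory */
        return -1;
    }
    return 0;                 /* another thread won the race; that's fine */
}
```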
2009-03-23 | Blktrace failed to lock reader threads on the cpu used by the corresponding | Tom Zanussi
writer. This resulted in stale data being consumed when blktrace accidentally read at a position that was being written to at the same time. This issue surfaced as "bad trace magic" warnings emitted by blktrace tools. The problem occurred on an SMP System z machine. The patch fixes the issue.
Signed-off-by: Martin Peschke <>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
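Locking a reader thread to a CPU on Linux is commonly done with the GNU extension `pthread_setaffinity_np`. The sketch below pins the calling thread to a single CPU; `pin_self_to_cpu` is a hypothetical helper and not taken from the patch.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/*
 * Pin the calling thread to one CPU, so a per-CPU reader runs on the same
 * CPU as the writer of that CPU's buffer. Returns 0 on success or an
 * error number (pthread functions do not set errno).
 */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```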
2009-02-17 | Fixed EAGAIN handling in blktrace.c | Alan D. Brunelle
EAGAIN was causing header failures in network mode. Added a usleep and retried the recv().
Signed-off-by: Alan D. Brunelle <>
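A sketch of receiving a fixed-size header with that retry behavior: back off briefly on `EAGAIN`/`EWOULDBLOCK` and call `recv()` again. `recv_full` and the 1 ms delay are assumptions, and restarting on `EINTR` is an addition not mentioned in the commit.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Receive exactly `len` bytes (e.g. a fixed-size header). On EAGAIN,
 * sleep briefly and retry; restart on EINTR. Returns 0 on success, -1 on
 * error or if the peer closed the connection early.
 */
static int recv_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);

        if (n > 0) {
            p += n;
            len -= (size_t)n;
        } else if (n == 0) {
            return -1;                 /* connection closed mid-header */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            usleep(1000);              /* hypothetical 1 ms back-off */
        } else if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}
```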
2009-02-12 | Code review updates | Alan D. Brunelle
Re-coding large functions, re-arranging some stuff.
2009-02-12 | Reworked blktrace master/thread interface | Alan D. Brunelle
Allows parallel initializations.
Signed-off-by: Alan D. Brunelle <>
2009-02-11 | Moved starting of tracing after tracers are going | Alan D. Brunelle
Hold off BLKTRACESTART until the threads are ready to consume traces.
2009-02-11 | Synchronized trace gathering | Alan D. Brunelle
Previously, each tracer thread would start gathering traces as soon as it got going, which might slow down later thread start-ups. With this change, each thread gets ready to gather traces, and then the main thread starts all threads gathering at the same time.
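One way to build such a start gate with a mutex and condition variable: each worker announces it is ready and then waits for a "go" flag, while the main thread waits until all workers are ready and releases them together. All names (`struct start_gate`, `gate_ready_and_wait`, `gate_wait_all_and_start`, `gate_worker`) are hypothetical; this is a sketch of the technique, not the blktrace code.

```c
#include <pthread.h>

/* Hypothetical start gate: workers report ready, then wait for "go". */
struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int nready;
    int go;
};

#define START_GATE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* Called by each tracer thread once it is able to consume trace data. */
static void gate_ready_and_wait(struct start_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->nready++;
    pthread_cond_broadcast(&g->cond);      /* tell the main thread */
    while (!g->go)
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Called by the main thread: wait for n workers, then release them all. */
static void gate_wait_all_and_start(struct start_gate *g, int n)
{
    pthread_mutex_lock(&g->lock);
    while (g->nready < n)
        pthread_cond_wait(&g->cond, &g->lock);
    g->go = 1;                 /* point at which tracing would be started */
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Hypothetical worker entry: get ready, then wait for the common start. */
static void *gate_worker(void *arg)
{
    gate_ready_and_wait(arg);
    /* ... gather traces here ... */
    return NULL;
}
```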
2009-02-11 | Invoke gethostbyname once, handle errors better | Alan D. Brunelle
Instead of invoking gethostbyname once per client, we only need to do it once at initialization time. Plus: gethostbyname has a non-standard error reporting mechanism (it reports through h_errno rather than errno); handle this better.
2009-02-11 | Added accept as a system call needing resource increases | Alan D. Brunelle
accept(2) opens a socket, and thus needs to handle EMFILE/ENFILE errors like other system calls.
2009-02-09 | Rewrote blktrace to have a single thread per CPU | Alan D. Brunelle
Massive changes: mostly around the notion of having far fewer threads (instead of N(devs) X N(cpus) threads, we'll have just N(cpus)). This is very important for larger systems (with lots of devices to trace). A lot of the code was stolen from the original blktrace code. Major changes include:
o On the client side we only have a single thread per client CPU. Each thread opens all device files for that CPU and uses poll to determine which file needs processing.
o For network client mode with sendfile, this means that a single socket carries all data to the remote network server. The network server side then distributes its reads off that one socket onto different trace files.
o For network client mode without sendfile, we fall back to doing things like piped mode: keep buffers of traces read in, and then the main thread issues these on sockets to the server. In this case, the main thread still has a single socket per CPU.
o For networked mode we added an OPEN concept on the client side: as soon as the connection to the server is set up, a "header" is sent signifying that this connection will handle a <cpu, device> tuple. For each socket opened on the client side, it sends a header per device being managed. The server side uses these opens to set up appropriate data structures to handle incoming data streams.
o For both the OPEN and CLOSE headers the server acknowledges with a short write back to the client. This allows the client and server sides to close socket connections gracefully.
o I also redid the resource limitation handling a bit differently: for open calls (including socket) and for memory map/lock calls I have provided a wrapper function that tries to increase specific limits as needed. The previous method (attempting to do it at the beginning of the run) fails for network server mode: you don't know at initialization how many devices and CPUs will be handled.
o The standard output is slightly different in a few places; if this is a compatibility problem we can work to rectify that. The command-line argument handling is identical, though.
o Using code stolen from Linux to manipulate doubly-linked lists. I've found that this makes the code easier to read/write (but it may be a bit of overkill here...).
o The code passes valgrind quite well (at least for my tests so far). The only nit has to do with inet_ntoa, but that is out of our control.
Thanks to Stefan Raspl <> for testing, finding some issues, and providing suggestions.
Signed-off-by: Alan D. Brunelle <>
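A rough sketch of the "one thread per CPU, poll all device files" idea from the first bullet: a single `poll()` call watches every per-device file for one CPU, and only the readable ones are drained. `drain_ready_fds` is hypothetical, and the test uses pipes as stand-ins for the debugfs trace files.

```c
#include <poll.h>
#include <unistd.h>

/*
 * One pass of a hypothetical per-CPU loop: wait up to timeout_ms for any
 * of the nfds per-device files to become readable, then read from each
 * ready one. Returns total bytes read, 0 on timeout, or -1 on error.
 */
static long drain_ready_fds(struct pollfd *pfds, int nfds, int timeout_ms)
{
    char buf[4096];
    long total = 0;
    int ret = poll(pfds, (nfds_t)nfds, timeout_ms);

    if (ret <= 0)
        return ret;                    /* 0: timeout, -1: error */
    for (int i = 0; i < nfds; i++) {
        ssize_t n;

        if (!(pfds[i].revents & POLLIN))
            continue;
        n = read(pfds[i].fd, buf, sizeof(buf));
        if (n > 0)
            total += n;                /* real code would write buf out */
    }
    return total;
}
```

The design point is that one thread services N device files, instead of one thread per device/CPU pair.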
2009-01-23 | Increased limits to allow for large system runs | Alan D. Brunelle
On a 16-way with 104 disks and a 32-way with 96 disks, I was getting:
$ sudo blktrace -b 1024 -n 8 -I ../files
./cciss_c1d6.blktrace.10: Too many open files
Failed to start worker threads
Because of our N(cpus) X N(devices) file opens and our N(cpus) X N(devices) X N(buffers) X (buffer size) worth of mmap()ed memory, we're exceeding both the RLIMIT_NOFILE and RLIMIT_MEMLOCK limits. This patch raises RLIMIT_NOFILE and RLIMIT_MEMLOCK to "infinity", which allows blktrace to handle large(ish) systems. (If these settings fail, we "guesstimate" how much we really need.)
There is still an underlying blktrace and/or kernel problem: the directory /sys/kernel/debug/block/<DSF>, where <DSF> is the device that hit the limit, is left behind (not cleaned up correctly). This stops blktrace from running a second time (even on another device):
$ ls /sys/kernel/debug/block
cciss_c1d6
$ sudo blktrace /dev/sda
BLKTRACESETUP: No such file or directory
Failed to start trace on /dev/sda
and requires a reboot. (I'm looking into that next, as this patch, while stopping the original problem from happening, does not address the secondary problem. And there may be other ways for the secondary problem to still occur...)
I also fixed a warning about ftruncate's return value being ignored.
Signed-off-by: Alan D. Brunelle <>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
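A sketch of raising a limit with a fallback, using `getrlimit`/`setrlimit`. The commit raises the limits to infinity and then "guesstimates"; this sketch swaps in a simpler fallback (raise the soft limit to the current hard limit), which an unprivileged process is always allowed to do. `raise_limit` is a hypothetical name.

```c
#include <sys/resource.h>

/*
 * Try to raise `resource` (e.g. RLIMIT_NOFILE or RLIMIT_MEMLOCK) to
 * "infinity"; if that is refused, raise the soft limit to the hard limit.
 * Returns 0 if either attempt succeeds, -1 otherwise.
 */
static int raise_limit(int resource)
{
    struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };

    if (setrlimit(resource, &rl) == 0)
        return 0;
    if (getrlimit(resource, &rl) < 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;       /* largest value allowed without privilege */
    return setrlimit(resource, &rl) == 0 ? 0 : -1;
}
```

On Linux, RLIMIT_NOFILE can never be set to infinity (it is capped by `fs.nr_open`), so for that resource the fallback path is the one that normally takes effect.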
2008-10-30 | Set release version 1.0.0 (blktrace-1.0.0) | Jens Axboe
Signed-off-by: Jens Axboe <>
2008-10-16 | blktrace: accept -v (lower case) for version info as well | Jens Axboe
Christof Schmitt <> points out that the documentation uses -v but blktrace supports only -V, so change blktrace to accept both cases.
Signed-off-by: Jens Axboe <>
2007-11-14 | memset() must be done after NULL check | Jens Axboe
Pointed out by Jan Blunck <>
Signed-off-by: Jens Axboe <>
2007-10-29 | blktrace segfault | Aneesh Kumar K.V
Core was generated by `blktrace -d /dev/hdc'.
Program terminated with signal 11, Segmentation fault.
#0 0xb7e4cdec in ?? ()
(gdb) where
#0 0xb7e4cdec in ?? ()
#1 0xb7dbf000 in ?? ()
#2 0x00021000 in ?? ()
#3 0xb7dee6e8 in ?? ()
#4 0x0804ecf0 in ?? ()
#5 0x00000001 in ?? ()
#6 0x6c616367 in ?? ()
#7 0xbfee3f68 in ?? ()
#8 0xb7f51300 in ?? ()
#9 0x00000168 in ?? ()
#10 0x0804ecf0 in ?? ()
#11 0x00000001 in ?? ()
#12 0xbfee3f88 in ?? ()
#13 0xbfee3f68 in ?? ()
#14 0x080499dc in close_thread (tip=0xb7f1eff4) at blktrace.c:637
Backtrace stopped: frame did not save the PC
(gdb)
The diff below fixes it.
Signed-off-by: Jens Axboe <>
2007-08-28 | blktrace 0.99.3 (blktrace-0.99.3) | Jens Axboe
Signed-off-by: Jens Axboe <>
2007-02-26 | Added ability to add device names from a file to blktrace. | Alan D. Brunelle
Added a new argument to blktrace: -I <devs file>, where <devs file> has one device per line. Each device is added to any explicit -d args or trailing device arguments.
Signed-off-by: Jens Axboe <>
2007-02-07 | [PATCH] Fix debugfs references in docs | Vasily Tarasov
In several places the docs mention the old mount point for debugfs. The patch just corrects these misprints.
Signed-off-by: Jens Axboe <>
2006-12-01 | [PATCH] Bump version (blktrace-0.99.2) | Jens Axboe
Signed-off-by: Jens Axboe <>
2006-10-30 | [PATCH] Ignore -o (output_name) when in server mode | Jens Axboe
Reported by
It doesn't make a lot of sense to allow directories inside the session-private directory, and we currently do not handle multiple directories inside the generated dir. So just ignore that option and inform the user.
Signed-off-by: Jens Axboe <>
2006-10-30 | [PATCH] Default debugfs path to /sys/kernel/debug | Jens Axboe
This seems to be where distros put it.
Signed-off-by: Jens Axboe <>
2006-09-05 | [PATCH] email update | Jens Axboe
2006-05-08 | [PATCH] blktrace.c should ignore SIGPIPE | Alex Polvi
After sending a SIGINT to a "blktrace .. | blkparse .." pipeline, occasionally I could not run blktrace again. On my next run of the same pipeline I would get the following error:
cs411vm:/usr/src/bt# ./blktrace -d /dev/sdb -o - | ./blkparse -i -
BLKTRACESETUP: No such file or directory
Failed to start trace on /dev/sdb
After rebooting, I could reproduce this by starting the pipeline, kicking off a big write, and sending a SIGINT to blktrace. This is what I used for the write:
while [ 1 ]
do
dd if=/dev/zero of=/test/write bs=1M
done
It seemed like blktrace was not handling signals correctly, so I strace'd the process to see what was happening. Sure enough:
--- SIGINT (Interrupt) @ 0 (0) ---
ioctl(3, 0x1275, 0xb7fa6000) = 0
sigreturn() = ? (mask now [])
write(1, "O\0\0\0\20\0\200\0\0\0\0\0\0\0\0\0\7taeN3\1\0\257R\260"..., 4096) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
+++ killed by SIGPIPE +++
Any idea what clean-up is not happening? I tried to figure it out, but my only workaround was a reboot.
My patch against 2.6.17-rc3 just tells blktrace to ignore SIGPIPE. Nothing crazy. :) blktrace already does enough error checking elsewhere to handle a bad write.
On a completely unrelated note, the write_data function calls fwrite, which does not return -1 on error. Instead you need to use ferror to check whether there was a problem. I'm not sure if this causes any bugs, but it seemed worth mentioning.
Furthermore, I updated the URLs in the documentation to point at a valid git repo.
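A small demonstration of what ignoring SIGPIPE changes: once it is ignored, writing to a pipe whose reader has exited makes `write()` fail with `EPIPE` instead of killing the process, so the normal cleanup path (such as tearing down the trace) can still run. `ignore_sigpipe` and `write_all` are hypothetical helpers.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Call once at startup, before any output is written. */
static void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Write all of buf, restarting on EINTR. Returns 0 on success, -1 on
 * error; with SIGPIPE ignored, a vanished pipe reader yields EPIPE. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;                 /* errno left set for the caller */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```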
2006-05-08 | [PATCH] fread/fwrite error handling | Jens Axboe
Need to check ferror(), not the return value. Thanks to Alex Polvi.
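A sketch of the checked-write pattern: `fwrite()` returns the number of items written (never -1), so a failure shows up as a short count and in the stream's error indicator, which `ferror()` reports. `write_data_checked` is a hypothetical helper, not the blktrace `write_data`.

```c
#include <stdio.h>

/*
 * Write len bytes to f. A short item count or a set error indicator
 * means failure. Returns 0 on success, -1 on error.
 */
static int write_data_checked(FILE *f, const void *buf, size_t len)
{
    if (fwrite(buf, 1, len, f) != len || ferror(f))
        return -1;
    return 0;
}
```

Because stdio buffers output, some errors only appear when data is flushed, so the return values of `fflush()` and `fclose()` are worth checking as well.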
2006-03-28 | [PATCH] blktrace: fix sendfile problem with > buf_size xmits | Tom Zanussi
Basically, it adds buf_size, buf_nr and page_size to the net header, and this info is put into each tip for easy access in mmap_subbuf(). The global buf_size etc. are still used by clients, since the global values are all they need. (For clients, the tip is filled in with the global values anyway, so that mmap_subbuf() can use them. Maybe tip->buf_size should be used everywhere instead for consistency, with the global values used only for the initial args. Also, each tip doesn't really need to hold these values; they could be stored in tip->device, but I thought it would be easier and quicker to access tip->buf_size than tip->device->buf_size.)
Actually, this data only needs to be sent once per trace, but since there's no header covering the entire trace, it is sent each time. That could be an optimization to make later...
I've tested it using sendfile over the network with different buffer sizes, with normal non-sendfile networking, and with non-network tracing straight to disk, and haven't seen any problems. Each test generated a little over 1Gb of trace data in a little under 2 minutes, just under 23 million events after parsing, so it looks to me like it's doing the job at this point...
2006-03-16 | [PATCH] blktrace: fix get_subbuf() leak | Jens Axboe