iowatcher: don't add Q events to the io hash
author Jeff Moyer <jmoyer@redhat.com>
Thu, 30 Aug 2018 22:14:20 +0000 (18:14 -0400)
committer Jens Axboe <axboe@kernel.dk>
Fri, 31 Aug 2018 02:57:53 +0000 (20:57 -0600)
Hi,

Bryan Gurney reported iowatcher taking a *really* long time to generate
a movie from 16GiB of trace data.  I took a look, and the io hash was
growing without bound.  The reason is that the I/O pattern looks like
this:

259,0    5        8     0.000208501 31708  A   W 5435592 + 8 <- (259,1) 5433544
259,1    5        9     0.000209537 31708  Q   W 5435592 + 8 [kvdo30:bioQ0]
259,1    5       10     0.000209880 31708  G   W 5435592 + 8 [kvdo30:bioQ0]
259,0    5       11     0.000211064 31708  A   W 5435600 + 8 <- (259,1) 5433552
259,1    5       12     0.000211347 31708  Q   W 5435600 + 8 [kvdo30:bioQ0]
259,1    5       13     0.000212957 31708  M   W 5435600 + 8 [kvdo30:bioQ0]
259,0    5       14     0.000213379 31708  A   W 5435608 + 8 <- (259,1) 5433560
259,1    5       15     0.000213629 31708  Q   W 5435608 + 8 [kvdo30:bioQ0]
259,1    5       16     0.000213937 31708  M   W 5435608 + 8 [kvdo30:bioQ0]
...
259,1    5      107     0.000246274 31708  D   W 5435592 + 256 [kvdo30:bioQ0]

For each of those 8-sector Q events, an entry was created in the io_hash,
and the M events merged them into a single 256-sector request that was
issued (D) and completed once.  Upon that completion, only the first
event's entry (the one with the matching starting sector) was removed!
The remaining entries were never freed, and the runtime overhead of just
iterating the ever-growing hash chains was enormous.
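
To make the failure mode concrete, here is a minimal, self-contained
sketch (hypothetical code, not taken from iowatcher; the pending[] array
and the on_queue()/on_complete() helpers are simplified stand-ins for the
io_hash handling) of how one merged request leaks all but one of its
queued entries:

#include <stdio.h>

#define MAX_PENDING 1024

/* Stand-in for the io_hash: one slot per queued (Q) sector. */
static unsigned long long pending[MAX_PENDING];
static int npending;

static void on_queue(unsigned long long sector)
{
        if (npending < MAX_PENDING)
                pending[npending++] = sector;   /* every Q adds an entry */
}

static void on_complete(unsigned long long sector)
{
        int i;

        /* Only the entry whose sector matches the completion is removed. */
        for (i = 0; i < npending; i++) {
                if (pending[i] == sector) {
                        pending[i] = pending[--npending];
                        return;
                }
        }
}

int main(void)
{
        unsigned long long s;

        /* 32 bios of 8 sectors each, merged into one 256-sector request... */
        for (s = 5435592; s < 5435592 + 256; s += 8)
                on_queue(s);
        /* ...which completes exactly once, at its starting sector. */
        on_complete(5435592);

        printf("leaked entries: %d\n", npending);       /* prints 31 */
        return 0;
}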

The solution is to simply ignore the Q events, so long as there are D
events in the trace.  If there are no D events, then go ahead and hash
the Q events as before.  I'm hoping that if we only have Q and C events,
they will actually be aligned.  If that's an incorrect assumption, we
could account for merges in an rbtree.  I'll defer that work until
someone can show me blktrace data that needs it.
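
Condensed, the new handling of a Q event amounts to the following (just a
compact restatement of the logic in the hunk below, not additional code):

                if (trace->found_issue)
                        return;                 /* D events do the accounting */
                if (!trace->found_completion)
                        goto account_io;        /* no D and no C: account Q,
                                                   but never hash it */
                pio = hash_queued_io(trace->io);        /* no D, but C present:
                                                           hash Q so completion
                                                           can compute latency */
                if (pio) {
                        pio->dispatch_time = io->time;
                        goto account_io;
                }
                return;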

The comments should be self-explanatory.  Review would be appreciated,
as the code isn't well documented and I don't know whether I'm missing
some hidden assumption about the data.

Before applying this patch, iowatcher would take more than 12 hours to
complete.  After the patch:

real 9m44.476s
user 41m35.426s
sys 3m29.106s

'nuf said.

Cheers,
Jeff

Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
iowatcher/blkparse.c

index c7d1d652c78e6bbe9a513bfeaeca2d2fe0207a49..41e20f0c01f9ed986a885916e0e85250a180c83a 100644
@@ -1099,16 +1099,28 @@ void add_pending_io(struct trace *trace, struct graph_line_data *gld)
        if (action == __BLK_TA_QUEUE) {
                if (io->sector == 0)
                        return;
-               if (trace->found_issue || trace->found_completion) {
-                       pio = hash_queued_io(trace->io);
-                       /*
-                        * When there are no ISSUE events count depth and
-                        * latency at least from queue events
-                        */
-                       if (pio && !trace->found_issue) {
-                               pio->dispatch_time = io->time;
-                               goto account_io;
-                       }
+               /*
+                * If D (issue) events are available, use them for I/O
+                * accounting.  Nothing needs to be done for Q.
+                */
+               if (trace->found_issue)
+                       return;
+               /*
+                * If there are no D or C events, then all that can be
+                * done is to account the Q event (and make sure not to
+                * add the I/O to the hash, because it will never be
+                * removed).
+                */
+               if (!trace->found_completion)
+                       goto account_io;
+               /*
+                * When there are no ISSUE events, count depth and
+                * latency from queue events.
+                */
+               pio = hash_queued_io(trace->io);
+               if (pio) {
+                       pio->dispatch_time = io->time;
+                       goto account_io;
                }
                return;
        }