writeback: throttle buffered writeback
Test patch that throttles buffered writeback to make it a lot
more smooth, and has way less impact on other system activity.
Background writeback should be, by definition, background
activity. The fact that we flush huge bundles of it at the time
means that it potentially has heavy impacts on foreground workloads,
which isn't ideal. We can't easily limit the sizes of writes that
we do, since that would impact file system layout in the presence
of delayed allocation. So just throttle back buffered writeback,
unless someone is waiting for it.
The algorithm for when to throttle takes its inspiration in the
CoDel networking scheduling algorithm. Like CoDel, blk-wb monitors
the minimum latencies of requests over a window of time. In that
window of time, if the minimum latency of any request exceeds a
given target, then a scale count is incremented and the queue depth
is shrunk. The next monitoring window is shrunk accordingly. Unlike
CoDel, if we hit a window that exhibits good behavior, then we
simply increment the scale count and re-calculate the limits for that
scale value. This prevents us from oscillating between a
close-to-ideal value and max all the time, instead remaining in the
windows where we get good behavior.
The patch registers two sysfs entries. The first one, 'wb_lat_usec',
sets the latency target for the window. It defaults to 2 msec for
non-rotational storage, and 75 msec for rotational storage. Setting
this value to '0' disables blk-wb.
The second entry, 'wb_stats', is a debug entry, that simply shows the
current internal state of the throttling machine:
$ cat /sys/block/nvme0n1/queue/wb_stats
background=16, normal=32, max=64, inflight=0, wait=0, bdp_wait=0
'background' denotes how many requests we will allow in-flight for
idle background buffered writeback, 'normal' for higher priority
writeback, and 'max' for when it's urgent we clean pages.
'inflight' shows how many requests are currently in-flight for
buffered writeback, 'wait' shows if anyone is currently waiting for
access, and 'bdp_wait' shows if someone is currently throttled on this
device in balance_dirty_pages().
blk-wb also registers a few trace events, that can be used to monitor
the state changes:
block_wb_lat: Latency
2446318
block_wb_stat: read lat: mean=
2446318, min=
2446318, max=
2446318, samples=1,
write lat: mean=518866, min=15522, max=
5330353, samples=57
block_wb_step: step down: step=1, background=8, normal=16, max=32
'block_wb_lat' logs a violation in sync issue latency, 'block_wb_stat'
logs a window violation of latencies and dumps the stats that lead to
that, and finally, 'block_wb_stat' logs a step up/down and the new
limits associated with that state.
Signed-off-by: Jens Axboe <axboe@fb.com>