block: explicit plugging
author Jens Axboe <jaxboe@fusionio.com>
Tue, 22 Jun 2010 10:31:25 +0000 (12:31 +0200)
committer Jens Axboe <jaxboe@fusionio.com>
Tue, 22 Jun 2010 10:31:25 +0000 (12:31 +0200)
commit bd9bd51b669e0abad657e050a33841076c99eb37
tree e8779c239c50d774b0d3b1197cfee433905b6c3c
parent c3284799cfcd220fe1335a99b2d6f3bf914987ec

Nick writes:

This is a patch to perform block device plugging explicitly in the submitting
process context rather than implicitly by the block device.

There are several advantages to plugging in process context over plugging
by the block device:

- Implicit plugging is only active when the queue empties, so any
  advantages are lost if there is parallel IO occurring. Not so with
  explicit plugging.

- Implicit plugging relies on a timer and watermarks and a kind-of-explicit
  directive in lock_page which directs plugging. These are heuristics and
  can cost performance due to holding a block device idle longer than it
  should be. Explicit plugging avoids most of these issues by only holding
  the device idle when it is known more requests will be submitted.

- This lock_page directive uses a roundabout way to attempt to minimise
  intrusiveness of plugging on the VM. In doing so, it gets needlessly
  complex: the VM really is in a good position to direct the block layer
  as to the nature of its requests, so there is no need to try to hide
  the fact.

- Explicit plugging keeps a process-private queue of requests being held.
  This offers some advantages over immediately sending requests to the
  block device: firstly, merging can be attempted on requests in this list
  (currently only attempted on the head of the list) without taking any
  locks; secondly, when unplugging occurs, the requests can be delivered
  to the block device queue in a batch, so the lock acquisitions can be
  batched up.
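
The last point can be illustrated with a small userspace sketch. It is not
the code from this patch: every toy_* name, the fixed array sizes and the
simulated lock counter are assumptions made for illustration only. It shows
a process-private list on which a merge is attempted against a single entry
(here the most recently held request) without taking any lock, and a flush
that delivers the whole batch under one simulated lock acquisition.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PLUG_MAX  16   /* hypothetical cap on privately held requests */
#define TOY_QUEUE_MAX 64   /* hypothetical capacity of the device queue */

/* A request covering sectors [start, start + len). */
struct toy_request {
	unsigned long start;
	unsigned long len;
};

/* Stand-in for the shared block device queue. lock_acquisitions counts
 * how often the (simulated) queue lock would have been taken. */
struct toy_queue {
	struct toy_request reqs[TOY_QUEUE_MAX];
	size_t count;
	unsigned long lock_acquisitions;
};

/* Process-private list of held requests; touching it needs no lock. */
struct toy_plug {
	struct toy_queue *q;
	struct toy_request held[TOY_PLUG_MAX];
	size_t count;
	unsigned long merges;
};

static void toy_plug_start(struct toy_plug *plug, struct toy_queue *q)
{
	plug->q = q;
	plug->count = 0;
	plug->merges = 0;
}

/* Deliver every held request under one simulated lock acquisition. */
static void toy_plug_flush(struct toy_plug *plug)
{
	size_t i;

	if (plug->count == 0)
		return;

	plug->q->lock_acquisitions++;	/* one lock round trip per batch */
	for (i = 0; i < plug->count; i++) {
		assert(plug->q->count < TOY_QUEUE_MAX);
		plug->q->reqs[plug->q->count++] = plug->held[i];
	}
	plug->count = 0;
}

/* Submit a request: try a lock-free back merge with the most recently
 * held request only, otherwise hold the request on the private list. */
static void toy_submit(struct toy_plug *plug, unsigned long start,
		       unsigned long len)
{
	if (plug->count > 0) {
		struct toy_request *last = &plug->held[plug->count - 1];

		if (last->start + last->len == start) {
			last->len += len;	/* contiguous: merge, no lock */
			plug->merges++;
			return;
		}
	}

	if (plug->count == TOY_PLUG_MAX)
		toy_plug_flush(plug);	/* private list full: flush early */

	plug->held[plug->count].start = start;
	plug->held[plug->count].len = len;
	plug->count++;
}

/* Unplug: hand everything still held to the device queue. */
static void toy_plug_finish(struct toy_plug *plug)
{
	toy_plug_flush(plug);
}
```

A caller would bracket its submissions with toy_plug_start() and
toy_plug_finish(). Submitting sectors 0-7, 8-15 and 16-23 followed by
100-107 merges the first three into one request on the private list, so
two requests reach the queue and the simulated lock is taken once.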

On a parallel tiobench benchmark, of the 800 000 calls to __make_request
performed, this patch avoids 490 000 (62%) of queue_lock acquisitions by
early merging on the private plugged list.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Changes so far by me:

- Don't invoke ->request_fn() in blk_queue_invalidate_tags

- Fixup all filesystems for block_sync_page()

- Add blk_delay_queue() to handle the old plugging-on-shortage usage.

- Unconditionally run replug_current_nested() in io_schedule()

- Fixup queue start/stop

- Fixup all the remaining drivers

- Change the namespace (prefix the plug functions with blk_)

- Fixup ext4

- Dead code removal

- Fixup blktrace plug/unplug notifications

- __make_request() cleanups

- bio_sync() fixups

- Kill queue empty checking

- Make barriers work again, using QRCU

- Make blk_sync_queue() work again, reuse barrier SRCU handling

- Fixup fuse

- Make it work with ioc sharing

- Tons of other fixes and improvements

This patch needs more work and some dedicated testing.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

100 files changed:
Documentation/block/biodoc.txt
block/blk-core.c
block/blk-exec.c
block/blk-ioc.c
block/blk-settings.c
block/blk-sysfs.c
block/blk.h
block/cfq-iosched.c
block/deadline-iosched.c
block/elevator.c
block/noop-iosched.c
drivers/block/cciss.c
drivers/block/cpqarray.c
drivers/block/floppy.c
drivers/block/loop.c
drivers/block/pktcdvd.c
drivers/block/umem.c
drivers/ide/ide-cd.c
drivers/ide/ide-io.c
drivers/md/bitmap.c
drivers/md/dm-raid1.c
drivers/md/dm-table.c
drivers/md/dm.c
drivers/md/linear.c
drivers/md/md.c
drivers/md/multipath.c
drivers/md/raid0.c
drivers/md/raid1.c
drivers/md/raid10.c
drivers/md/raid5.c
drivers/message/i2o/i2o_block.c
drivers/mmc/card/queue.c
drivers/s390/block/dasd.c
drivers/s390/char/tape_block.c
drivers/scsi/scsi_lib.c
drivers/scsi/scsi_transport_fc.c
drivers/scsi/scsi_transport_sas.c
fs/adfs/inode.c
fs/affs/file.c
fs/aio.c
fs/befs/linuxvfs.c
fs/bfs/file.c
fs/block_dev.c
fs/btrfs/disk-io.c
fs/btrfs/inode.c
fs/btrfs/volumes.c
fs/buffer.c
fs/cifs/file.c
fs/direct-io.c
fs/efs/inode.c
fs/ext2/inode.c
fs/ext3/inode.c
fs/ext4/inode.c
fs/fat/inode.c
fs/freevxfs/vxfs_subr.c
fs/fuse/inode.c
fs/gfs2/aops.c
fs/gfs2/meta_io.c
fs/hfs/inode.c
fs/hfsplus/inode.c
fs/hpfs/file.c
fs/isofs/inode.c
fs/jfs/inode.c
fs/jfs/jfs_metapage.c
fs/minix/inode.c
fs/nilfs2/btnode.c
fs/nilfs2/gcinode.c
fs/nilfs2/inode.c
fs/nilfs2/mdt.c
fs/ntfs/aops.c
fs/ntfs/compress.c
fs/ocfs2/aops.c
fs/ocfs2/cluster/heartbeat.c
fs/omfs/file.c
fs/qnx4/inode.c
fs/reiserfs/inode.c
fs/sysv/itree.c
fs/udf/file.c
fs/udf/inode.c
fs/ufs/inode.c
fs/ufs/truncate.c
fs/xfs/linux-2.6/xfs_aops.c
fs/xfs/linux-2.6/xfs_buf.c
include/linux/backing-dev.h
include/linux/blkdev.h
include/linux/buffer_head.h
include/linux/elevator.h
include/linux/fs.h
include/linux/iocontext.h
include/linux/pagemap.h
include/linux/swap.h
kernel/sched.c
mm/backing-dev.c
mm/filemap.c
mm/nommu.c
mm/page-writeback.c
mm/readahead.c
mm/shmem.c
mm/swap_state.c
mm/swapfile.c