summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-04-17writeback: throttle buffered writebackwb-buf-throttle-v4Jens Axboe
Test patch that throttles buffered writeback to make it a lot more smooth, and has way less impact on other system activity. Background writeback should be, by definition, background activity. The fact that we flush huge bundles of it at the time means that it potentially has heavy impacts on foreground workloads, which isn't ideal. We can't easily limit the sizes of writes that we do, since that would impact file system layout in the presence of delayed allocation. So just throttle back buffered writeback, unless someone is waiting for it. The algorithm for when to throttle takes its inspiration in the CoDel networking scheduling algorithm. Like CoDel, blk-wb monitors the minimum latencies of requests over a window of time. In that window of time, if the minimum latency of any request exceeds a given target, then a scale count is incremented and the queue depth is shrunk. The next monitoring window is shrunk accordingly. Unlike CoDel, if we hit a window that exhibits good behavior, then we simply increment the scale count and re-calculate the limits for that scale value. This prevents us from oscillating between a close-to-ideal value and max all the time, instead remaining in the windows where we get good behavior. The patch registers two sysfs entries. The first one, 'wb_lat_usec', sets the latency target for the window. It defaults to 2 msec for non-rotational storage, and 75 msec for rotational storage. Setting this value to '0' disables blk-wb. The second entry, 'wb_stats', is a debug entry, that simply shows the current internal state of the throttling machine: $ cat /sys/block/nvme0n1/queue/wb_stats background=16, normal=32, max=64, inflight=0, wait=0, bdp_wait=0 'background' denotes how many requests we will allow in-flight for idle background buffered writeback, 'normal' for higher priority writeback, and 'max' for when it's urgent we clean pages. 'inflight' shows how many requests are currently in-flight for buffered writeback, 'wait' shows if anyone is currently waiting for access, and 'bdp_wait' shows if someone is currently throttled on this device in balance_dirty_pages(). blk-wb also registers a few trace events, that can be used to monitor the state changes: block_wb_lat: Latency 2446318 block_wb_stat: read lat: mean=2446318, min=2446318, max=2446318, samples=1, write lat: mean=518866, min=15522, max=5330353, samples=57 block_wb_step: step down: step=1, background=8, normal=16, max=32 'block_wb_lat' logs a violation in sync issue latency, 'block_wb_stat' logs a window violation of latencies and dumps the stats that lead to that, and finally, 'block_wb_stat' logs a step up/down and the new limits associated with that state. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17block: add scalable completion tracking of requestsJens Axboe
For legacy block, we simply track them in the request queue. For blk-mq, we track them on a per-sw queue basis, which we can then sum up through the hardware queues and finally to a per device state. The stats are tracked in, roughly, 0.1s interval windows. Add sysfs files to display the stats. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17block: add code to track actual device queue depthJens Axboe
For blk-mq, ->nr_requests does track queue depth, at least at init time. But for the older queue paths, it's simply a soft setting. On top of that, it's generally larger than the hardware setting on purpose, to allow backup of requests for merging. Fill a hole in struct request with a 'queue_depth' member, that drivers can call to more closely inform the block layer of the real queue depth. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17writeback: increment page wait count when waitingJens Axboe
If we end up waiting on a page that is dirty or marked writeback, then increment the corresponding bdi_writeback counter. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17writeback: track if we're sleeping on progress in balance_dirty_pages()Jens Axboe
Note in the bdi_writeback structure if a task is currently being limited in balance_dirty_pages(), waiting for writeback to proceed. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17writeback: use WRITE_BG for kupdate and background writebackJens Axboe
If we're doing background type writes, then use the appropriate write command for that. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17writeback: add wbc_to_write_cmd()Jens Axboe
Add wbc_to_write_cmd(), which returns the write type to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17block: add WRITE_BGJens Axboe
This adds a new request flag, REQ_BG, that callers can use to tell the block layer that this is background (non-urgent) IO. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17block: kill off q->flush_flagsJens Axboe
Now that we converted everything to the newer block write cache interface, kill off the queue flush_flags and queueable flush entries. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17block: kill blk_queue_flush()Jens Axboe
We don't have any drivers left using it, so kill it off. Update documentation to use the newer blk_queue_write_cache(). Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17um: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17mtd: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17mmc/block: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17md: update to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17ide-disk: update to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17xen-blkfront: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17dm: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17bcache: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17virtio_blk: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17ps3disk: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17skd_main: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17osdblk: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17nbd: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17mtip32xx: remove call to blk_queue_flush()Jens Axboe
The driver calls it with 0 for flags, since it doesn't have a writeback cache. Just remove the call, as it's a no-op right now. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17loop: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17drbd: switch to using blk_queue_write_cache()Jens Axboe
Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17NVMe: switch to using blk_queue_write_cache()Jens Axboe
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-17sd: switch to using blk_queue_write_cache()Jens Axboe
Switch to the newer interface, instead of using blk_queue_flush() directly. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-04-17block: add ability to flag write back caching on a deviceJens Axboe
Add an internal helper and flag for setting whether a queue has write back caching, or write through (or none). Add a sysfs file to show this as well, and make it changeable from user space. This will replace the (awkward) blk_queue_flush() interface that drivers currently use to inform the block layer of write cache state and capabilities. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-04-17Merge tag 'dm-4.6-fix-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fix from Mike Snitzer: "Fix for earlier 4.6-rc4 stable@ commit that introduced improper use of write lock in cmd_read_lock() -- due to cut-n-paste gone awry (and sparse didn't catch it)" * tag 'dm-4.6-fix-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache metadata: fix cmd_read_lock() acquiring write lock
2016-04-17dm cache metadata: fix cmd_read_lock() acquiring write lockAhmed Samy
Commit 9567366fefdd ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros") uses down_write() instead of down_read() in cmd_read_lock(), yet up_read() is used to release the lock in READ_UNLOCK(). Fix it. Fixes: 9567366fefdd ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros") Cc: stable@vger.kernel.org Signed-off-by: Ahmed Samy <f.fallen45@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-04-16Merge tag 'char-misc-4.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are some small char/misc driver fixes for 4.6-rc4. Full details are in the shortlog, nothing major here. These have all been in linux-next for a while with no reported issues" * tag 'char-misc-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: lkdtm: do not leak free page on kmalloc failure lkdtm: fix memory leak of base lkdtm: fix memory leak of val extcon: palmas: Drop stray IRQF_EARLY_RESUME flag
2016-04-16Merge tag 'driver-core-4.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull misc fixes from Greg KH: "Here are three small fixes for 4.6-rc4. Two fix up some lz4 issues with big endian systems, and the remaining one resolves a minor debugfs issue that was reported. All have been in linux-next with no reported issues" * tag 'driver-core-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: lib: lz4: cleanup unaligned access efficiency detection lib: lz4: fixed zram with lz4 on big endian machines debugfs: Make automount point inodes permanently empty
2016-04-16Merge tag 'usb-4.6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB driver fixes from Greg KH: "Here are some small USB fixes for 4.6-rc4. Mostly xhci fixes for reported issues, a UAS bug that has hit a number of people, including stable tree users, and a few other minor things. All have been in linux-next for a while with no reported issues" * tag 'usb-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: hcd: out of bounds access in for_each_companion USB: uas: Add a new NO_REPORT_LUNS quirk USB: uas: Limit qdepth at the scsi-host level doc: usb: Fix typo in gadget_multi documentation usb: host: xhci-plat: Make enum xhci_plat_type start at a non zero value xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers usb: xhci: fix wild pointers in xhci_mem_cleanup usb: host: xhci-plat: fix cannot work if R-Car Gen2/3 run on above 4GB phys usb: host: xhci: add a new quirk XHCI_NO_64BIT_SUPPORT xhci: resume USB 3 roothub first usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host cdc-acm: fix crash if flushed with nothing buffered
2016-04-16Merge tag 'dmaengine-fix-4.6-rc4' of ↵Linus Torvalds
git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "This time we have some odd fixes in hsu, edma, omap and xilinx. Usual fixes and nothing special" * tag 'dmaengine-fix-4.6-rc4' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: dw: fix master selection dmaengine: edma: special case slot limit workaround dmaengine: edma: Remove dynamic TPTC power management feature dmaengine: vdma: don't crash when bad channel is requested dmaengine: omap-dma: Do not suppress interrupts for memcpy dmaengine: omap-dma: Fix polled channel completion detection and handling dmaengine: hsu: correct use of channel status register dmaengine: hsu: correct residue calculation of active descriptor dmaengine: hsu: set HSU_CH_MTSR to memory width
2016-04-16Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixlet from Ingo Molnar: "Fixes a build warning on certain Kconfig combinations" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix print_collision() unused warning
2016-04-16Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fix from Ingo Molnar: "An arm64 boot crash fix" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Don't apply MEMBLOCK_NOMAP to UEFI memory map mapping
2016-04-16Merge branch 'fix/edma' into fixesVinod Koul
2016-04-16Merge branch 'fix/xilinx' into fixesVinod Koul
2016-04-16Merge branch 'fix/omap' into fixesVinod Koul
2016-04-16Merge branch 'fix/hsu' into fixesVinod Koul
2016-04-15Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes for the current series. This contains: - Two fixes for NVMe: One fixes a reset race that can be triggered by repeated insert/removal of the module. The other fixes an issue on some platforms, where we get probe timeouts since legacy interrupts isn't working. This used not to be a problem since we had the worker thread poll for completions, but since that was killed off, it means those poor souls can't successfully probe their NVMe device. Use a proper IRQ check and probe (msi-x -> msi ->legacy), like most other drivers to work around this. Both from Keith. - A loop corruption issue with offset in iters, from Ming Lei. - A fix for not having the partition stat per cpu ref count initialized before sending out the KOBJ_ADD, which could cause user space to access the counter prior to initialization. Also from Ming Lei. - A fix for using the wrong congestion state, from Kaixu Xia" * 'for-linus' of git://git.kernel.dk/linux-block: block: loop: fix filesystem corruption in case of aio/dio NVMe: Always use MSI/MSI-x interrupts NVMe: Fix reset/remove race writeback: fix the wrong congested state variable definition block: partition: initialize percpuref before sending out KOBJ_ADD
2016-04-15Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Ross Zwisler: "Two fixes: - Fix memcpy_from_pmem() to fallback to memcpy() for architectures where CONFIG_ARCH_HAS_PMEM_API=n. - Add a comment explaining why we write data twice when clearing poison in pmem_do_bvec(). This has passed a boot test on an X86_32 config, which was the architecture where issue #1 above was first noticed" Dan Williams adds: "We're giving this multi-maintainer setup a shot, so expect libnvdimm pull requests from either Ross or I going forward" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, pmem: clarify the write+clear_poison+write flow pmem: fix BUG() error in pmem.h:48 on X86_32
2016-04-15Merge tag 'for-linus-20160415' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD fix from Brian Norris: "One MTD fix for v4.6-rc4: In the v4.4 cycle, we relaxed the requirement for assigning mtd->owner, but we didn't remove this error case. It's hit only by drivers that are both: (a) using nand_scan() directly and (b) built as modules We haven't seen explicit complaints about this (most use cases don't fit one or both of the above), but we should definitely not be BUG()'ing here" * tag 'for-linus-20160415' of git://git.infradead.org/linux-mtd: mtd: nand: Drop mtd.owner requirement in nand_scan
2016-04-15Merge tag 'mmc-v4.6-rc3' of git://git.linaro.org/people/ulf.hansson/mmcLinus Torvalds
Pull MMC fixes from Ulf Hansson: "Here are a couple of mmc fixes intended for v4.6 rc4. Regarding the fix for the regression about mmcblk device indexes. The approach taken to solve the problem seems to be good enough. There were some discussions around the solution, but it seems like people were happy about it in the end. MMC core: - Restore similar old behaviour when assigning mmcblk device indexes MMC host: - tegra: Disable UHS-I modes for Tegra124 to fix regression" * tag 'mmc-v4.6-rc3' of git://git.linaro.org/people/ulf.hansson/mmc: mmc: tegra: Disable UHS-I modes for Tegra124 mmc: block: Use the mmc host device index as the mmcblk device index
2016-04-15Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "This contains fixes for exynos, amdgpu, radeon, i915 and qxl. It also contains some fixes to the core drm edid parser. qxl: - fix for a cursor hotspot issue radeon: - some MST fixes that I've been running locally and make my monitor a bit happier exynos: - fix some regressions and build fixes amdgpu: - a couple of small fixes i915: - two DP MST fixes and a couple of other regression fixes Nothing too out of the ordinary or surprising at this point" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/exynos: Use VIDEO_SAMSUNG_S5P_G2D=n as G2D Kconfig dependency drm/exynos: fix a warning message drm/exynos: mic: fix an error code drm/exynos: fimd: fix broken dp_clock control drm/exynos: build fbdev code conditionally drm/exynos: fix adjusted_mode pointer in exynos_plane_mode_set drm/exynos: fix error handling in exynos_drm_subdrv_open drm/amd/amdgpu: fix irq domain remove for tonga ih drm/i915: fix deadlock on lid open drm/radeon: use helper for mst connector dpms. drm/radeon/mst: port some MST setup code from DAL. drm/amdgpu: add invisible pin size statistic drm/edid: Fix DMT 1024x768@43Hz (interlaced) timings drm/i915: Exit cherryview_irq_handler() after one pass drm/i915: Call intel_dp_mst_resume() before resuming displays drm/i915: Fix race condition in intel_dp_destroy_mst_connector() drm/edid: Fix parsing of EDID 1.4 Established Timings III descriptor drm/edid: Fix EDID Established Timings I and II drm/qxl: fix cursor position with non-zero hotspot
2016-04-15Merge branch 'parisc-4.6-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc ftrace fixes from Helge Deller: "This is (most likely) the last pull request for v4.6 for the parisc architecture. It fixes the FTRACE feature for parisc, which is horribly broken since quite some time and doesn't even compile. This patch just fixes the bare minimum (it actually removes more lines than it adds), so that the function tracer works again on 32- and 64bit kernels. I've queued up additional patches on top of this patch which e.g. add the syscall tracer, but those have to wait for the merge window for v4.7." * 'parisc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Fix ftrace function tracer
2016-04-15libnvdimm, pmem: clarify the write+clear_poison+write flowDan Williams
The ACPI specification does not specify the state of data after a clear poison operation. Potential future libnvdimm bus implementations for other architectures also might not specify or disagree on the state of data after clear poison. Clarify why we write twice. Reported-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
2016-04-15block: loop: fix filesystem corruption in case of aio/dioMing Lei
Starting from commit e36f620428(block: split bios to max possible length), block core starts to split bio in the middle of bvec. Unfortunately loop dio/aio doesn't consider this situation, and always treat 'iter.iov_offset' as zero. Then filesystem corruption is observed. This patch figures out the offset of the base bvevc via 'bio->bi_iter.bi_bvec_done' and fixes the issue by passing the offset to iov iterator. Fixes: e36f6204288088f (block: split bios to max possible length) Cc: Keith Busch <keith.busch@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org (4.5) Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-04-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: a binutils fix, an lguest fix, an mcelog fix and a missing documentation fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Avoid using object after free in genpool lguest, x86/entry/32: Fix handling of guest syscalls using interrupt gates x86/build: Build compressed x86 kernels as PIE x86/mm/pkeys: Add missing Documentation