summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-07-15block: by default, limit maximum discard size to 64MBdiscard-rwJens Axboe
Lots of devices exhibit very high latencies for big discards, hurting reads and writes. By default, limit the max discard we will build to 64MB. This value has shown good results across a number of devices. This will potentially hurt discard throughput, from a provisioning point of view (when the user does mkfs.xfs, for instance, and mkfs issues a full device discard). If that becomes an issue, we could have different behavior for provisioning vs runtime discards. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-14block: make /sys/block/<dev>/queue/discard_max_bytes writeableJens Axboe
Lots of devices support huge discard sizes these days. Depending on how the device handles them internally, huge discards can introduce massive latencies (hundreds of msec) on the device side. We have a sysfs file, discard_max_bytes, that advertises the max hardware supported discard size. Make this writeable, and split the settings into a soft and hard limit. This can be set from 'discard_granularity' and up to the hardware limit. Add a new sysfs file, 'discard_max_hw_bytes', that shows the hw set limit. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-14block: have drivers use blk_queue_max_discard_sectors()Jens Axboe
Some drivers use it now, others just set the limits field manually. But in preparation for splitting this into a hard and soft limit, ensure that they all call the proper function for setting the hw limit for discards. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-11bcache: don't embed 'return' statements in closure macrosJens Axboe
This is horribly confusing, it breaks the flow of the code without it being apparent in the caller. Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Christoph Hellwig <hch@lst.de>
2015-07-09blkcg: fix blkcg_policy_data allocation bugTejun Heo
e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg data") updated per-blkcg policy data to be dynamically allocated. When a policy is registered, its policy data aren't created. Instead, when the policy is activated on a queue, the policy data are allocated if there are blkg's (blkcg_gq's) which are attached to a given blkcg. This is buggy. Consider the following scenario. 1. A blkcg is created. No blkg's attached yet. 2. The policy is registered. No policy data is allocated. 3. The policy is activated on a queue. As the above blkcg doesn't have any blkg's, it won't allocate the matching blkcg_policy_data. 4. An IO is issued from the blkcg and blkg is created and the blkcg still doesn't have the matching policy data allocated. With cfq-iosched, this leads to an oops. It also doesn't free policy data on policy unregistration assuming that freeing of all policy data on blkcg destruction should take care of it; however, this also is incorrect. 1. A blkcg has policy data. 2. The policy gets unregistered but the policy data remains. 3. Another policy gets registered on the same slot. 4. Later, the new policy tries to allocate policy data on the previous blkcg but the slot is already occupied and gets skipped. The policy ends up operating on the policy data of the previous policy. There's no reason to manage blkcg_policy_data lazily. The reason we do lazy allocation of blkg's is that the number of all possible blkg's is the product of cgroups and block devices which can reach a surprising level. blkcg_policy_data is contrained by the number of cgroups and shouldn't be a problem. This patch makes blkcg_policy_data to be allocated for all existing blkcg's on policy registration and freed on unregistration and removes blkcg_policy_data handling from policy [de]activation paths. This makes that blkcg_policy_data are created and removed with the policy they belong to and fixes the above described problems. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg data") Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-09blkcg: implement all_blkcgs listTejun Heo
Add all_blkcgs list goes through blkcg->all_blkcgs_node and is protected by blkcg_pol_mutex. This will be used to fix blkcg_policy_data allocation bug. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-09blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating ↵Tejun Heo
blkcg_policy[] An entry in blkcg_policy[] is stable while there are non-bypassing in-flight IOs on a request_queue which has the policy activated. This is why most derefs of blkcg_policy[] don't need explicit locking; however, blkcg_css_alloc() isn't invoked from IO path and thus doesn't have this protection and may race policies being added and removed. Fix it by adding explicit blkcg_pol_mutex protection around blkcg_policy[] iteration in blkcg_css_alloc(). Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: e48453c386f3 ("block, cgroup: implement policy-specific per-blkcg data") Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-09blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methodsTejun Heo
blkcg_pol_mutex primarily protects the blkcg_policy array. It also protects cgroup file type [un]registration during policy addition / removal. This puts blkcg_pol_mutex outside cgroup internal synchronization and in turn makes it impossible to grab from blkcg's cgroup methods as that leads to cyclic dependency. Another problematic dependency arising from this is through cgroup interface file deactivation. Removing a cftype requires removing all files of the type which in turn involves draining all on-going invocations of the file methods. This means that an interface file implementation can't grab blkcg_pol_mutex as draining can lead to AA deadlock. blkcg_reset_stats() is already in this situation. It currently trylocks blkcg_pol_mutex and then unwinds and retries the whole operation on failure, which is cumbersome at best. It has a lengthy comment explaining how cgroup internal synchronization is involved and expected to be updated but as explained above this doesn't need cgroup internal locking to deadlock. It's a self-contained AA deadlock. The described circular dependencies can be easily broken by moving cftype [un]registration out of blkcg_pol_mutex and protect them with an outer mutex. This patch introduces blkcg_pol_register_mutex which wraps entire policy [un]registration including cftype operations and shrinks blkcg_pol_mutex critical section. This also makes the trylock dancing in blkcg_reset_stats() unnecessary. Removed. This patch is necessary for the following blkcg_policy_data allocation bug fixes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-07block/blk-cgroup.c: free per-blkcg data when freeing the blkcgArianna Avanzini
Currently, per-blkcg data is freed each time a policy is deactivated, that is also upon scheduler switch. However, when switching from a scheduler implementing a policy which requires per-blkcg data to another one, that same policy might be active on other devices, and therefore those same per-blkcg data could be still in use. This commit lets per-blkcg data be freed when the blkcg is freed instead of on policy deactivation. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Reported-and-tested-by: Michael Kaminsky <kaminsky@cs.cmu.edu> Fixes: e48453c3 ("block, cgroup: implement policy-specific per-blkcg data") Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-07block: use FIELD_SIZEOF to calculate size of a fieldManinder Singh
use FIELD_SIZEOF instead of open coding Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-07bio integrity: do not assume bio_integrity_pool exists if bioset existsMike Snitzer
bio_integrity_alloc() and bio_integrity_free() assume that if a bio was allocated from a bioset that that bioset also had its bio_integrity_pool allocated using bioset_integrity_create(). This is a very bad assumption given that bioset_create() and bioset_integrity_create() are completely disjoint. Not all callers of bioset_create() have been trained to also call bioset_integrity_create() -- and they may not care to be. Fix this by falling back to kmalloc'ing 'struct bio_integrity_payload' rather than force all bioset consumers to (wastefully) preallocate a bio_integrity_pool that they very likely won't actually need (given the niche nature of the current block integrity support). Otherwise, a NULL pointer "Kernel BUG" with a trace like the following will be observed (as seen on s390x using zfcp storage) because dm-io doesn't use bioset_integrity_create() when creating its bioset: [ 791.643338] Call Trace: [ 791.643339] ([<00000003df98b848>] 0x3df98b848) [ 791.643341] [<00000000002c5de8>] bio_integrity_alloc+0x48/0xf8 [ 791.643348] [<00000000002c6486>] bio_integrity_prep+0xae/0x2f0 [ 791.643349] [<0000000000371e38>] blk_queue_bio+0x1c8/0x3d8 [ 791.643355] [<000000000036f8d0>] generic_make_request+0xc0/0x100 [ 791.643357] [<000000000036f9b2>] submit_bio+0xa2/0x198 [ 791.643406] [<000003ff801f9774>] dispatch_io+0x15c/0x3b0 [dm_mod] [ 791.643419] [<000003ff801f9b3e>] dm_io+0x176/0x2f0 [dm_mod] [ 791.643423] [<000003ff8074b28a>] do_reads+0x13a/0x1a8 [dm_mirror] [ 791.643425] [<000003ff8074b43a>] do_mirror+0x142/0x298 [dm_mirror] [ 791.643428] [<0000000000154fca>] process_one_work+0x18a/0x3f8 [ 791.643432] [<000000000015598a>] worker_thread+0x132/0x3b0 [ 791.643435] [<000000000015d49a>] kthread+0xd2/0xd8 [ 791.643438] [<00000000005bc0ca>] kernel_thread_starter+0x6/0xc [ 791.643446] [<00000000005bc0c4>] kernel_thread_starter+0x0/0xc Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: - fix the perf build, by fixing the rbtree.c sharing bug between kernel and tools/perf by creating a local copy of rbtree.c (more will be done for v4.3) - fix an AUX buffer (Intel-PT support) refcounting bug - fix copy_from_user_nmi() return value" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix copy_from_user_nmi() return if range is not ok perf: Fix AUX buffer refcounting tools: Copy rbtree_augmented.h from the kernel tools: Move rbtree.h from tools/perf/ tools: Copy lib/rbtree.c to tools/lib/ perf tools: Copy rbtree.h from the kernel tools: Adopt {READ,WRITE_ONCE} from the kernel
2015-07-06perf/x86: Fix copy_from_user_nmi() return if range is not okYann Droneaud
Commit 0a196848ca36 ("perf: Fix arch_perf_out_copy_user default"), changes copy_from_user_nmi() to return the number of remaining bytes so that it behave like copy_from_user(). Unfortunately, when the range is outside of the process memory, the return value is still the number of byte copied, eg. 0, instead of the remaining bytes. As all users of copy_from_user_nmi() were modified as part of commit 0a196848ca36, the function should be fixed to return the total number of bytes if range is not correct. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435001923-30986-1-git-send-email-ydroneaud@opteya.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06perf: Fix AUX buffer refcountingPeter Zijlstra
Its currently possible to drop the last refcount to the aux buffer from NMI context, which results in the expected fireworks. The refcounting needs a bigger overhaul, but to cure the immediate problem, delay the freeing by using an irq_work. Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06Merge branch 'perf/rbtree_copy' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull rbtree build fix from Arnaldo Carvalho de Melo. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-05tools: Copy rbtree_augmented.h from the kernelArnaldo Carvalho de Melo
To complete the transitioning to not to share the same files with the kernel, also moving it from tools/perf/include/linux/ to tools/include/linux to make the whoke rbtree kit to other tools/ living codebases. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5bxyehixafckqm6ez25alnfo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05tools: Move rbtree.h from tools/perf/Arnaldo Carvalho de Melo
The previous step, copying the contents minus the rcupdate.h parts, was done as a minimal fix, now do the move from tools/perf/. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-52fllxtsgmtke66pmv98mcma@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05tools: Copy lib/rbtree.c to tools/lib/Arnaldo Carvalho de Melo
So that we can remove kernel specific stuff we've been stubbing out via a tools/include/linux/export.h that gets removed in this patch and to avoid breakages in the future like the one fixed recently where rcupdate.h started being used in rbtree.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-rxuzfsozpb8hv1emwpx06rm6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Bug fixes (all for stable kernels) for ext4: - address corner cases for indirect blocks->extent migration - fix reserved block accounting invalidate_page when page_size != block_size (i.e., ppc or 1k block size file systems) - fix deadlocks when a memcg is under heavy memory pressure - fix fencepost error in lazytime optimization" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: replace open coded nofail allocation in ext4_free_blocks() ext4: correctly migrate a file with a hole at the beginning ext4: be more strict when migrating to non-extent based file ext4: fix reservation release on invalidatepage for delalloc fs ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp bufferhead: Add _gfp version for sb_getblk() ext4: fix fencepost error in lazytime optimization
2015-07-05perf tools: Copy rbtree.h from the kernelArnaldo Carvalho de Melo
We were using the include/linux/rbtree.h directly from the kernel, which broke the build as soon as it started using rcupdate.h, to avoid dragging the rcu header files into tools/, for which there is no use so far, grab a copy of rbtree.h. This is the minimal fix, later patches will copy as well lib/rbtree.c and move rbtree.h into tools/include/, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-dfmuj0j63w4by7vhlh4hhn74@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05tools: Adopt {READ,WRITE_ONCE} from the kernelArnaldo Carvalho de Melo
We need it to build rbtree.c after this cset: commit d72da4a4d973 Author: Peter Zijlstra <peterz@infradead.org> Date: Wed May 27 11:09:36 2015 +0930 rbtree: Make lockless searches non-fatal Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-qlnzhezv5ddwst0w9fydju0y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05Linux 4.2-rc1v4.2-rc1Linus Torvalds
2015-07-05Merge tag 'platform-drivers-x86-v4.2-2' of ↵Linus Torvalds
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull late x86 platform driver updates from Darren Hart: "The following came in a bit later and I wanted them to bake in next a few more days before submitting, thus the second pull. A new intel_pmc_ipc driver, a symmetrical allocation and free fix in dell-laptop, a couple minor fixes, and some updated documentation in the dell-laptop comments. intel_pmc_ipc: - Add Intel Apollo Lake PMC IPC driver tc1100-wmi: - Delete an unnecessary check before the function call "kfree" dell-laptop: - Fix allocating & freeing SMI buffer page - Show info about WiGig and UWB in debugfs - Update information about wireless control" * tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver tc1100-wmi: Delete an unnecessary check before the function call "kfree" dell-laptop: Fix allocating & freeing SMI buffer page dell-laptop: Show info about WiGig and UWB in debugfs dell-laptop: Update information about wireless control
2015-07-05ext4: replace open coded nofail allocation in ext4_free_blocks()Michal Hocko
ext4_free_blocks is looping around the allocation request and mimics __GFP_NOFAIL behavior without any allocation fallback strategy. Let's remove the open coded loop and replace it with __GFP_NOFAIL. Without the flag the allocator has no way to find out never-fail requirement and cannot help in any way. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-07-04bluetooth: fix list handlingLinus Torvalds
Commit 835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning") thought that the code was sabotaging the list poisoning when NULL'ing out the list pointers and removed it. But what was going on was that the bluetooth code was using NULL pointers for the list as a way to mark it empty, and that commit just broke it (and replaced the test with NULL with a "list_empty()" test on a uninitialized list instead, breaking things even further). So fix it all up to use the regular and real list_empty() handling (which does not use NULL, but a pointer to itself), also making sure to initialize the list properly (the previous NULL case was initialized implicitly by the session being allocated with kzalloc()) This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong An. [ I would normally expect to get this through the bt tree, but I'm going to release -rc1, so I'm just committing this directly - Linus ] Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Original-by: Tedd Ho-Jeong An <tedd.an@intel.com> Original-by: Marcel Holtmann <marcel@holtmann.org>: Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-04Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "It's been a busy development cycle for target-core in a number of different areas. The fabric API usage for se_node_acl allocation is now within target-core code, dropping the external API callers for all fabric drivers tree-wide. There is a new conversion to RCU hlists for se_node_acl and se_portal_group LUN mappings, that turns fast-past LUN lookup into a completely lockless code-path. It also removes the original hard-coded limitation of 256 LUNs per fabric endpoint. The configfs attributes for backends can now be shared between core and driver code, allowing existing drivers to use common code while still allowing flexibility for new backend provided attributes. The highlights include: - Merge sbc_verify_dif_* into common code (sagi) - Remove iscsi-target support for obsolete IFMarker/OFMarker (Christophe Vu-Brugier) - Add bidi support in target/user backend (ilias + vangelis + agover) - Move se_node_acl allocation into target-core code (hch) - Add crc_t10dif_update common helper (akinobu + mkp) - Handle target-core odd SGL mapping for data transfer memory (akinobu) - Move transport ID handling into target-core (hch) - Move task tag into struct se_cmd + support 64-bit tags (bart) - Convert se_node_acl->device_list[] to RCU hlist (nab + hch + paulmck) - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch + paulmck) - Simplify target backend driver registration (hch) - Consolidate + simplify target backend attribute implementations (hch + nab) - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch) - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab) - Drop unnecessary core_tpg_register TFO parameter (nab) - Use 64-bit LUNs tree-wide (hannes) - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits) target: Bump core version to v5.0 target: remove target_core_configfs.h target: remove unused TARGET_CORE_CONFIG_ROOT define target: consolidate version defines target: implement WRITE_SAME with UNMAP bit using ->execute_unmap target: simplify UNMAP handling target: replace se_cmd->execute_rw with a protocol_data field target/user: Fix inconsistent kmap_atomic/kunmap_atomic target: Send UA when changing LUN inventory target: Send UA upon LUN RESET tmr completion target: Send UA on ALUA target port group change target: Convert se_lun->lun_deve_lock to normal spinlock target: use 'se_dev_entry' when allocating UAs target: Remove 'ua_nacl' pointer from se_ua structure target_core_alua: Correct UA handling when switching states xen-scsiback: Fix compile warning for 64-bit LUN target: Remove TARGET_MAX_LUNS_PER_TRANSPORT target: use 64-bit LUNs target: Drop duplicate + unused se_dev_check_wce target: Drop unnecessary core_tpg_register TFO parameter ...
2015-07-04Merge tag 'ntb-4.2' of git://github.com/jonmason/ntbLinus Torvalds
Pull NTB updates from Jon Mason: "This includes a pretty significant reworking of the NTB core code, but has already produced some significant performance improvements. An abstraction layer was added to allow the hardware and clients to be easily added. This required rewriting the NTB transport layer for this abstraction layer. This modification will allow future "high performance" NTB clients. In addition to this change, a number of performance modifications were added. These changes include NUMA enablement, using CPU memcpy instead of asyncdma, and modification of NTB layer MTU size" * tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits) NTB: Add split BAR output for debugfs stats NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe NTB: Print driver name and version in module init NTB: Increase transport MTU to 64k from 16k NTB: Rename Intel code names to platform names NTB: Default to CPU memcpy for performance NTB: Improve performance with write combining NTB: Use NUMA memory in Intel driver NTB: Use NUMA memory and DMA chan in transport NTB: Rate limit ntb_qp_link_work NTB: Add tool test client NTB: Add ping pong test client NTB: Add parameters for Intel SNB B2B addresses NTB: Reset transport QP link stats on down NTB: Do not advance transport RX on link down NTB: Differentiate transport link down messages NTB: Check the device ID to set errata flags NTB: Enable link for Intel root port mode in probe NTB: Read peer info from local SPAD in transport NTB: Split ntb_hw_intel and ntb_transport drivers ...
2015-07-049p: cope with bogus responses from server in p9_client_{read,write}Al Viro
if server claims to have written/read more than we'd told it to, warn and cap the claimed byte count to avoid advancing more than we are ready to.
2015-07-04p9_client_write(): avoid double p9_free_req()Al Viro
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *"; if response is impossible to parse and we discard the request, get the out of the loop right there. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-049p: forgetting to cancel request on interrupted zero-copy RPCAl Viro
If we'd already sent a request and decide to abort it, we *must* issue TFLUSH properly and not just blindly reuse the tag, or we'll get seriously screwed when response eventually arrives and we confuse it for response to later request that had reused the same tag. Cc: stable@vger.kernel.org # v3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: bdev_direct_access() may sleepMatthew Wilcox
The brd driver is the only in-tree driver that may sleep currently. After some discussion on linux-fsdevel, we decided that any driver may choose to sleep in its ->direct_access method. To ensure that all callers of bdev_direct_access() are prepared for this, add a call to might_sleep(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04block: Add support for DAX reads/writes to block devicesMatthew Wilcox
If a block device supports the ->direct_access methods, bypass the normal DIO path and use DAX to go straight to memcpy() instead of allocating a DIO and a BIO. Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in do_blockdev_direct_IO(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Use copy_from_iter_nocacheMatthew Wilcox
When userspace does a write, there's no need for the written data to pollute the CPU cache. This matches the original XIP code. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Add block size note to documentationMatthew Wilcox
For block devices which are small enough, mkfs will default to creating a filesystem with block sizes smaller than page size. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Except for the preempt notifiers fix, these are all small bugfixes that could have been waited for -rc2. Sending them now since I was taking care of Peter's patch anyway" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: add hyper-v crash msrs values KVM: x86: remove data variable from kvm_get_msr_common KVM: s390: virtio-ccw: don't overwrite config space values KVM: x86: keep track of LVT0 changes under APICv KVM: x86: properly restore LVT0 KVM: x86: make vapics_in_nmi_mode atomic sched, preempt_notifier: separate notifier registration from static_key inc/dec
2015-07-04NTB: Add split BAR output for debugfs statsDave Jiang
When split BAR is enabled, the driver needs to dump out the split BAR registers rather than the original 64bit BAR registers. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Change WARN_ON_ONCE to pr_warn_once on unsafeDave Jiang
The unsafe doorbell and scratchpad access should display reason when WARN is called. Otherwise we get a stack dump without any explanation. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Print driver name and version in module initDave Jiang
Printouts driver name and version to indicate what is being loaded. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Increase transport MTU to 64k from 16kDave Jiang
Benchmarking showed a significant performance increase with the MTU size to 64k instead of 16k. Change the driver default to 64k. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rename Intel code names to platform namesDave Jiang
Instead of using the platform code names, use the correct platform names to identify the respective Intel NTB hardware. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Default to CPU memcpy for performanceDave Jiang
Disable DMA usage by default, since the CPU provides much better performance with write combining. Provide a module parameter to enable DMA usage when offloading the memcpy is preferred. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Improve performance with write combiningDave Jiang
Changing the memory window BAR mappings to write combining significantly boosts the performance. We will also use memcpy that uses non-temporal store, which showed performance improvement when doing non-cached memcpys. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory in Intel driverAllen Hubbe
Allocate memory for the NUMA node of the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory and DMA chan in transportAllen Hubbe
Allocate memory and request the DMA channel for the same NUMA node as the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rate limit ntb_qp_link_workAllen Hubbe
When the ntb transport is connecting and waiting for the peer, the debug console receives lots of debug level messages about the remote qp link status being down. Rate limit those messages. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add tool test clientAllen Hubbe
This is a simple debugging driver that enables the doorbell and scratch pad registers to be read and written from the debugfs. This tool enables more complicated debugging to be scripted from user space. This driver may be used to test that your ntb hardware and drivers are functioning at a basic level. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add ping pong test clientAllen Hubbe
This is a simple ping pong driver that exercises the scratch pads and doorbells of the ntb hardware. This driver may be used to test that your ntb hardware and drivers are functioning at a basic level. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Add parameters for Intel SNB B2B addressesAllen Hubbe
Add module parameters for the addresses to be used in B2B topology. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Reset transport QP link stats on downAllen Hubbe
Reset the link stats when the link goes down. In particular, the TX and RX index and count must be reset, or else the TX side will be sending packets to the RX side where the RX side is not expecting them. Reset all the stats, to be consistent. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>