AgeCommit message (Collapse)Author
2014-08-05partitions: aix.c: off by one bugfor-3.17/coreDan Carpenter
The lvip[] array has "state->limit" elements so the condition here should be >= instead of >. Fixes: 6ceea22bbbc8 ('partitions: add aix lvm partition support files') Signed-off-by: Dan Carpenter <> Acked-by: Philippe De Muyter <> Signed-off-by: Jens Axboe <>
2014-07-16blkcg: don't call into policy draining if root_blkg is already goneTejun Heo
While a queue is being destroyed, all the blkgs are destroyed and its ->root_blkg pointer is set to NULL. If someone else starts to drain while the queue is in this state, the following oops happens. NULL pointer dereference at 0000000000000028 IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 PGD e4a1067 PUD b773067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched] CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000 RIP: 0010:[<ffffffff8144e944>] [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230 RSP: 0018:ffff88000efd7bf0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001 R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450 R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28 FS: 00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0 Stack: ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80 ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58 ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450 Call Trace: [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60 [<ffffffff81427641>] __blk_drain_queue+0x71/0x180 [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0 [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120 [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50 [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40 [<ffffffff8142d476>] blk_release_queue+0x26/0xd0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff81427505>] blk_put_queue+0x15/0x20 [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0 [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0 [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20 [<ffffffff817930e2>] device_release+0x32/0xa0 [<ffffffff81454968>] kobject_cleanup+0x38/0x70 [<ffffffff81454848>] kobject_put+0x28/0x60 [<ffffffff817934d7>] put_device+0x17/0x20 [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0 [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40 [<ffffffff817d1257>] sdev_store_delete+0x27/0x30 [<ffffffff81792ca8>] dev_attr_store+0x18/0x30 [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50 [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170 [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0 [<ffffffff811f69bd>] SyS_write+0x4d/0xc0 [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b 776687bce42b ("block, blk-mq: draining can't be skipped even if bypass_depth was non-zero") made it easier to trigger this bug by making blk_queue_bypass_start() drain even when it loses the first bypass test to blk_cleanup_queue(); however, the bug has always been there even before the commit as blk_queue_bypass_start() could race against queue destruction, win the initial bypass test but perform the actual draining after blk_cleanup_queue() already destroyed all blkgs. Fix it by skippping calling into policy draining if all the blkgs are already gone. Signed-off-by: Tejun Heo <> Reported-by: Shirish Pargaonkar <> Reported-by: Sasha Levin <> Reported-by: Jet Chen <> Cc: Tested-by: Shirish Pargaonkar <> Signed-off-by: Jens Axboe <>
2014-07-14Revert "bio: modify __bio_add_page() to accept pages that don't start a new ↵Jens Axboe
segment" This reverts commit 254c4407cb84a6dec90336054615b0f0e996bb7c. It causes crashes with cryptsetup, even after a few iterations and updates. Drop it for now.
2014-07-01bio: modify __bio_add_page() to accept pages that don't start a new segmentMaurizio Lombardi
The original behaviour is to refuse to add a new page if the maximum number of segments has been reached, regardless of the fact the page we are going to add can be merged into the last segment or not. Unfortunately, when the system runs under heavy memory fragmentation conditions, a driver may try to add multiple pages to the last segment. The original code won't accept them and EBUSY will be reported to userspace. This patch modifies the function so it refuses to add a page only in case the latter starts a new segment and the maximum number of segments has already been reached. The bug can be easily reproduced with the st driver: 1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE to 16 2) modprobe st buffer_kbs=1024 3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10 dd: error writing `/dev/st0': Device or resource busy [ update bi_iter.bi_size before recounting segments] Signed-off-by: Maurizio Lombardi <> Signed-off-by: Ming Lei <> Tested-by: Dongsu Park <> Tested-by: Jet Chen <> Cc: Al Viro <> Cc: Christoph Hellwig <> Cc: Kent Overstreet <> Cc: Jens Axboe <> Signed-off-by: Andrew Morton <> Signed-off-by: Jens Axboe <>
2014-07-01block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is hugeAkinobu Mita
SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved buffer in bytes as int type. The value needs to be capped at the request queue's max_sectors. But integer overflow is not correctly handled in the calculation when converting max_sectors from sectors to bytes. Signed-off-by: Akinobu Mita <> Cc: Jens Axboe <> Cc: "James E.J. Bottomley" <> Cc: Douglas Gilbert <> Cc: Reviewed-by: Christoph Hellwig <> Signed-off-by: Jens Axboe <>
2014-07-01block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAXAkinobu Mita
BLKSECTGET ioctl loads the request queue's max_sectors as unsigned short value to the argument pointer. So if the max_sector is greater than USHRT_MAX, the upper 16 bits of that is just discarded. In such case, USHRT_MAX is more preferable than the lower 16 bits of max_sectors. Signed-off-by: Akinobu Mita <> Cc: Jens Axboe <> Cc: "James E.J. Bottomley" <> Cc: Douglas Gilbert <> Cc: Reviewed-by: Christoph Hellwig <> Signed-off-by: Jens Axboe <>
2014-07-01block/partitions/efi.c: kerneldoc fixingFabian Frederick
Adding function documentation and fixing kerneldoc warnings ('field: description' uniformization). Cc: Davidlohr Bueso <> Cc: Jens Axboe <> Signed-off-by: Fabian Frederick <> Signed-off-by: Jens Axboe <>
2014-07-01block/partitions/msdos.c: code clean-upFabian Frederick
checkpatch fixing: WARNING: Missing a blank line after declarations WARNING: space prohibited between function name and open parenthesis '(' ERROR: spaces required around that '<' (ctx:VxV) Cc: Jens Axboe <> Cc: Andrew Morton <> Signed-off-by: Fabian Frederick <> Signed-off-by: Jens Axboe <>
2014-07-01block/partitions/amiga.c: replace nolevel printk by pr_errFabian Frederick
Also add no prefix pr_fmt to avoid any future default format update Cc: Jens Axboe <> Cc: Andrew Morton <> Signed-off-by: Fabian Frederick <> Signed-off-by: Jens Axboe <>
2014-07-01block/partitions/aix.c: replace count*size kzalloc by kcallocFabian Frederick
kcalloc manages count*sizeof overflow. Cc: Jens Axboe <> Cc: Andrew Morton <> Signed-off-by: Fabian Frederick <> Signed-off-by: Jens Axboe <>
2014-07-01bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payloadGu Zheng
Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from Martin introduces the function bip_integrity_vecs(get the useful vectors) to fix the issue about nr_vecs for inline integrity vectors that reported by David Milburn. But it seems that bip_integrity_vecs() will return the wrong number if the bio is not based on any bio_set for some reason(bio->bi_pool == NULL), because in that case, the bip_inline_vecs[0] is malloced directly. So here we add the bip_max_vcnt to record the count of vector slots, and cleanup the function bip_integrity_vecs(). Signed-off-by: Gu Zheng <> Cc: Martin K. Petersen <> Cc: Kent Overstreet <> Signed-off-by: Jens Axboe <>
2014-07-01blk-mq: use percpu_ref for mq usage countTejun Heo
Currently, blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism. This type of code has relatively high chance of subtle bugs which are extremely difficult to trigger and it's way too hairy to be open coded in blk-mq. percpu_ref can serve the same purpose after the recent changes. This patch replaces the open-coded per-cpu usage counting and draining mechanism with percpu_ref. blk_mq_queue_enter() performs tryget_live on the ref and exit() performs put. blk_mq_freeze_queue() kills the ref and waits until the reference count reaches zero. blk_mq_unfreeze_queue() revives the ref and wakes up the waiters. Signed-off-by: Tejun Heo <> Cc: Jens Axboe <> Cc: Nicholas A. Bellinger <> Cc: Kent Overstreet <> Signed-off-by: Jens Axboe <>
2014-07-01blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue()Tejun Heo
Keeping __blk_mq_drain_queue() as a separate function doesn't buy us anything and it's gonna be further simplified. Let's flatten it into its caller. This patch doesn't make any functional change. Signed-off-by: Tejun Heo <> Cc: Jens Axboe <> Cc: Nicholas A. Bellinger <> Signed-off-by: Jens Axboe <>
2014-07-01blk-mq: decouble blk-mq freezing from generic bypassingTejun Heo
blk_mq freezing is entangled with generic bypassing which bypasses blkcg and io scheduler and lets IO requests fall through the block layer to the drivers in FIFO order. This allows forward progress on IOs with the advanced features disabled so that those features can be configured or altered without worrying about stalling IO which may lead to deadlock through memory allocation. However, generic bypassing doesn't quite fit blk-mq. blk-mq currently doesn't make use of blkcg or ioscheds and it maps bypssing to freezing, which blocks request processing and drains all the in-flight ones. This causes problems as bypassing assumes that request processing is online. blk-mq works around this by conditionally allowing request processing for the problem case - during queue initialization. Another weirdity is that except for during queue cleanup, bypassing started on the generic side prevents blk-mq from processing new requests but doesn't drain the in-flight ones. This shouldn't break anything but again highlights that something isn't quite right here. The root cause is conflating blk-mq freezing and generic bypassing which are two different mechanisms. The only intersecting purpose that they serve is during queue cleanup. Let's properly separate blk-mq freezing from generic bypassing and simply use it where necessary. * request_queue->mq_freeze_depth is added and blk_mq_[un]freeze_queue() now operate on this counter instead of ->bypass_depth. The replacement for QUEUE_FLAG_BYPASS isn't added but the counter is tested directly. This will be further updated by later changes. * blk_mq_drain_queue() is dropped and "__" prefix is dropped from blk_mq_freeze_queue(). Queue cleanup path now calls blk_mq_freeze_queue() directly. * blk_queue_enter()'s fast path condition is simplified to simply check @q->mq_freeze_depth. Previously, the condition was !blk_queue_dying(q) && (!blk_queue_bypass(q) || !blk_queue_init_done(q)) mq_freeze_depth is incremented right after dying is set and blk_queue_init_done() exception isn't necessary as blk-mq doesn't start frozen, which only leaves the blk_queue_bypass() test which can be replaced by @q->mq_freeze_depth test. This change simplifies the code and reduces confusion in the area. Signed-off-by: Tejun Heo <> Cc: Jens Axboe <> Cc: Nicholas A. Bellinger <> Signed-off-by: Jens Axboe <>
2014-07-01block, blk-mq: draining can't be skipped even if bypass_depth was non-zeroTejun Heo
Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue() skip queue draining if bypass_depth was already above zero. The assumption is that the one which bumped the bypass_depth should have performed draining already; however, there's nothing which prevents a new instance of bypassing/freezing from starting before the previous one finishes draining. The current code may allow the later bypassing/freezing instances to complete while there still are in-flight requests which haven't finished draining. Fix it by draining regardless of bypass_depth. We still skip draining from blk_queue_bypass_start() while the queue is initializing to avoid introducing excessive delays during boot. INIT_DONE setting is moved above the initial blk_queue_bypass_end() so that bypassing attempts can't slip inbetween. Signed-off-by: Tejun Heo <> Cc: Jens Axboe <> Cc: Nicholas A. Bellinger <> Signed-off-by: Jens Axboe <>
2014-07-01blk-mq: fix a memory ordering bug in blk_mq_queue_enter()Tejun Heo
blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism; unfortunately, it contains a subtle bug - smp_wmb() in blk_mq_queue_enter() doesn't prevent prevent the cpu from fetching @q->bypass_depth before incrementing @q->mq_usage_counter and if freezing happens inbetween the caller can slip through and freezing can be complete while there are active users. Use smp_mb() instead so that bypass_depth and mq_usage_counter modifications and tests are properly interlocked. Signed-off-by: Tejun Heo <> Cc: Jens Axboe <> Cc: Nicholas A. Bellinger <> Signed-off-by: Jens Axboe <>
2014-07-01Merge branch 'for-3.17' of ↵Jens Axboe
git:// into for-3.17/core Merge the percpu_ref changes from Tejun, he says they are stable now.
2014-06-29Linux 3.16-rc3v3.16-rc3Linus Torvalds
2014-06-29Merge branch 'fixes' of git:// Torvalds
Pull ARM fixes from Russell King: "Another round of ARM fixes. The largest change here is the L2 changes to work around problems for the Armada 37x/380 devices, where most of the size comes down to comments rather than code. The other significant fix here is for the ptrace code, to ensure that rewritten syscalls work as intended. This was pointed out by Kees Cook, but Will Deacon reworked the patch to be more elegant. The remainder are fairly trivial changes" * 'fixes' of git:// ARM: 8087/1: ptrace: reload syscall number after secure_computing() check ARM: 8086/1: Set memblock limit for nommu ARM: 8085/1: sa1100: collie: add top boot mtd partition ARM: 8084/1: sa1100: collie: revert back to cfi_probe ARM: 8080/1: mcpm.h: remove unused variable declaration ARM: 8076/1: mm: add support for HW coherent systems in PL310 cache
2014-06-29MAINTAINERS: exceptions for Documentation maintainerRandy Dunlap
Note that I don't maintain Documentation/ABI/, Documentation/devicetree/, or the language translation files. Signed-off-by: Randy Dunlap <> Signed-off-by: Linus Torvalds <>
2014-06-29Documentation: add section about git to email-clients.txtDan Carpenter
These days most people use git to send patches so I have added a section about that. Signed-off-by: Dan Carpenter <> Signed-off-by: Randy Dunlap <> Signed-off-by: Linus Torvalds <>
2014-06-29ARM: 8087/1: ptrace: reload syscall number after secure_computing() checkWill Deacon
On the syscall tracing path, we call out to secure_computing() to allow seccomp to check the syscall number being attempted. As part of this, a SIGTRAP may be sent to the tracer and the syscall could be re-written by a subsequent SET_SYSCALL ptrace request. Unfortunately, this new syscall is ignored by the current code unless TIF_SYSCALL_TRACE is also set on the current thread. This patch slightly reworks the enter path of the syscall tracing code so that we always reload the syscall number from current_thread_info()->syscall after the potential ptrace traps. Acked-by: Kees Cook <> Tested-by: Kees Cook <> Signed-off-by: Will Deacon <> Signed-off-by: Russell King <>
2014-06-29ARM: 8086/1: Set memblock limit for nommuLaura Abbott
Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) changed find_limits to use memblock_get_current_limit for calculating the max_low pfn. nommu targets never actually set a limit on memblock though which means memblock_get_current_limit will just return the default value. Set the memblock_limit to be the end of DDR to make sure bounds are calculated correctly. Signed-off-by: Laura Abbott <> Signed-off-by: Russell King <>
2014-06-29ARM: 8085/1: sa1100: collie: add top boot mtd partitionAndrea Adami
The CFI mapping is now perfect so we can expose the top block, read only. There isn't much to read, though, just the sharpsl_params values. Signed-off-by: Andrea Adami <> Signed-off-by: Russell King <>
2014-06-29ARM: 8084/1: sa1100: collie: revert back to cfi_probeAndrea Adami
Reverts commit d26b17edafc45187c30cae134a5e5429d58ad676 ARM: sa1100: collie.c: fall back to jedec_probe flash detection Unfortunately the detection was challenged on the defective unit used for tests: one of the NOR chips did not respond to the CFI query. Moreover that bad device needed extra delays on erase-suspend/resume cycles. Tested personally on 3 different units and with feedback of two other users. Signed-off-by: Andrea Adami <> Signed-off-by: Russell King <>
2014-06-29ARM: 8080/1: mcpm.h: remove unused variable declarationNicolas Pitre
The sync_phys variable has been replaced by link time computation in mcpm_head.S before the code was submitted upstream. Signed-off-by: Nicolas Pitre <> Signed-off-by: Russell King <>
2014-06-29ARM: 8076/1: mm: add support for HW coherent systems in PL310 cacheThomas Petazzoni
When a PL310 cache is used on a system that provides hardware coherency, the outer cache sync operation is useless, and can be skipped. Moreover, on some systems, it is harmful as it causes deadlocks between the Marvell coherency mechanism, the Marvell PCIe controller and the Cortex-A9. To avoid this, this commit introduces a new Device Tree property 'arm,io-coherent' for the L2 cache controller node, valid only for the PL310 cache. It identifies the usage of the PL310 cache in an I/O coherent configuration. Internally, it makes the driver disable the outer cache sync operation. Note that technically speaking, a fully coherent system wouldn't require any of the other .outer_cache operations. However, in practice, when booting secondary CPUs, these are not yet coherent, and therefore a set of cache maintenance operations are necessary at this point. This explains why we keep the other .outer_cache operations and only ->sync is disabled. While in theory any write to a PL310 register could cause the deadlock, in practice, disabling ->sync is sufficient to workaround the deadlock, since the other cache maintenance operations are only used in very specific situations. Contrary to previous versions of this patch, this new version does not simply NULL-ify the ->sync member, because the l2c_init_data structures are now 'const' and therefore cannot be modified, which is a good thing. Therefore, this patch introduces a separate l2c_init_data instance, called of_l2c310_coherent_data. Signed-off-by: Thomas Petazzoni <> Signed-off-by: Russell King <>
2014-06-28Merge tag 'spi-v3.16-rc2' of ↵Linus Torvalds
git:// Pull spi fixes from Mark Brown: "A few driver specific fixes, the biggest one being a fix for the newly added Qualcomm SPI controller driver to make it not use its internal chip select due to hardware bugs, replacing it with GPIOs" * tag 'spi-v3.16-rc2' of git:// spi: qup: Remove chip select function spi: qup: Fix order of spi_register_master spi: sh-sci: fix use-after-free in sh_sci_spi_remove() spi/pxa2xx: fix incorrect SW mode chipselect setting for BayTrail LPSS SPI
2014-06-28Merge tag 'regulator-v3.16-rc2' of ↵Linus Torvalds
git:// Pull regulator fixes from Mark Brown: "Several driver specific fixes here, the palmas fixes being especially important for a range of boards - the recent updates to support new devices have introduced several regressions" * tag 'regulator-v3.16-rc2' of git:// regulator: tps65218: Correct the the config register for LDO1 regulator: tps65218: Add the missing of_node assignment in probe regulator: palmas: fix typo in enable_reg calculation regulator: bcm590xx: fix vbus name regulator: palmas: Fix SMPS enable/disable/is_enabled
2014-06-28Merge git:// Torvalds
Pull SCSI target fixes from Nicholas Bellinger: "Mostly minor fixes this time around. The highlights include: - iscsi-target CHAP authentication fixes to enforce explicit key values (Tejas Vaykole + rahul.rane) - fix a long-standing OOPs in target-core when a alua configfs attribute is accessed after port symlink has been removed. (Sebastian Herbszt) - fix a v3.10.y iscsi-target regression causing the login reject status class/detail to be ignored (Christoph Vu-Brugier) - fix a v3.10.y iscsi-target regression to avoid rejecting an existing ITT during Data-Out when data-direction is wrong (Santosh Kulkarni + Arshad Hussain) - fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas Patocka) - fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)" * git:// iscsi-target: fix iscsit_del_np deadlock on unload iovec: move memcpy_from/toiovecend to lib/iovec.c iscsi-target: Avoid rejecting incorrect ITT for Data-Out tcm_loop: Fix memory leak in tcm_loop_submission_work error path iscsi-target: Explicily clear login response PDU in exception path target: Fix left-over se_lun->lun_sep pointer OOPs iscsi-target; Enforce 1024 byte maximum for CHAP_C key value iscsi-target: Convert chap_server_compute_md5 to use kstrtoul
2014-06-28Merge remote-tracking branches 'spi/fix/pxa2xx', 'spi/fix/qup' and ↵Mark Brown
'spi/fix/sh-sci' into spi-linus
2014-06-28Merge remote-tracking branches 'regulator/fix/bcm590xx', ↵Mark Brown
'regulator/fix/palmas' and 'regulator/fix/tps65218' into regulator-linus
2014-06-28percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()Tejun Heo
Now that explicit invocation of percpu_ref_exit() is necessary to free the percpu counter, we can implement percpu_ref_reinit() which reinitializes a released percpu_ref. This can be used implement scalable gating switch which can be drained and then re-opened without worrying about memory allocation failures. percpu_ref_is_zero() is added to be used in a sanity check in percpu_ref_exit(). As this function will be useful for other purposes too, make it a public interface. v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We only need data dep barrier and smp_load_acquire() is stronger and heavier on some archs. Spotted by Lai Jiangshan. Signed-off-by: Tejun Heo <> Cc: Kent Overstreet <> Cc: Christoph Lameter <> Cc: Lai Jiangshan <>
2014-06-28percpu-refcount: require percpu_ref to be exited explicitlyTejun Heo
Currently, a percpu_ref undoes percpu_ref_init() automatically by freeing the allocated percpu area when the percpu_ref is killed. While seemingly convenient, this has the following niggles. * It's impossible to re-init a released reference counter without going through re-allocation. * In the similar vein, it's impossible to initialize a percpu_ref count with static percpu variables. * We need and have an explicit destructor anyway for failure paths - percpu_ref_cancel_init(). This patch removes the automatic percpu counter freeing in percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a generic destructor now named percpu_ref_exit(). percpu_ref_destroy() is considered but it gets confusing with percpu_ref_kill() while "exit" clearly indicates that it's the counterpart of percpu_ref_init(). All percpu_ref_cancel_init() users are updated to invoke percpu_ref_exit() instead and explicit percpu_ref_exit() calls are added to the destruction path of all percpu_ref users. Signed-off-by: Tejun Heo <> Acked-by: Benjamin LaHaise <> Cc: Kent Overstreet <> Cc: Christoph Lameter <> Cc: Benjamin LaHaise <> Cc: Nicholas A. Bellinger <> Cc: Li Zefan <>
2014-06-28percpu-refcount: use unsigned long for pcpu_count pointerTejun Heo
percpu_ref->pcpu_count is a percpu pointer with a status flag in its lowest bit. As such, it always goes through arithmetic operations which is very cumbersome to do on a pointer. It has to be first casted to unsigned long and then back. Let's just make the field unsigned long so that we can skip the first casts. While at it, rename it to pcpu_counter_ptr to clarify that it's a pointer value. Signed-off-by: Tejun Heo <> Cc: Kent Overstreet <> Cc: Christoph Lameter <>
2014-06-28percpu-refcount: add helpers for ->percpu_count accessesTejun Heo
* All four percpu_ref_*() operations implemented in the header file perform the same operation to determine whether the percpu_ref is alive and extract the percpu pointer. Factor out the common logic into __pcpu_ref_alive(). This doesn't change the generated code. * There are a couple places in percpu-refcount.c which masks out PCPU_REF_DEAD to obtain the percpu pointer. Factor it out into pcpu_count_ptr(). * The above changes make the WARN_ON_ONCE() conditional at the top of percpu_ref_kill_and_confirm() the only user of REF_STATUS(). Test PCPU_REF_DEAD directly and remove REF_STATUS(). This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <> Cc: Kent Overstreet <> Cc: Christoph Lameter <>
2014-06-28percpu-refcount: one bit is enough for REF_STATUSTejun Heo
percpu-refcount currently reserves two lowest bits of its percpu pointer to indicate its state; however, only one bit is used for PCPU_REF_DEAD. Simplify it by removing PCPU_STATUS_BITS/MASK and testing PCPU_REF_DEAD directly. This also allows the compiler to choose a more efficient instruction depending on the architecture. Signed-off-by: Tejun Heo <> Cc: Kent Overstreet <> Cc: Christoph Lameter <>
2014-06-28percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()Tejun Heo
ioctx_alloc() reaches inside percpu_ref and directly frees ->pcpu_count in its failure path, which is quite gross. percpu_ref has been providing a proper interface to do this, percpu_ref_cancel_init(), for quite some time now. Let's use that instead. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <> Acked-by: Benjamin LaHaise <> Cc: Kent Overstreet <>
2014-06-27iscsi-target: fix iscsit_del_np deadlock on unloadMikulas Patocka
On uniprocessor preemptible kernel, target core deadlocks on unload. The following events happen: * iscsit_del_np is called * it calls send_sig(SIGINT, np->np_thread, 1); * the scheduler switches to the np_thread * the np_thread is woken up, it sees that kthread_should_stop() returns false, so it doesn't terminate * the np_thread clears signals with flush_signals(current); and goes back to sleep in iscsit_accept_np * the scheduler switches back to iscsit_del_np * iscsit_del_np calls kthread_stop(np->np_thread); * the np_thread is waiting in iscsit_accept_np and it doesn't respond to kthread_stop The deadlock could be resolved if the administrator sends SIGINT signal to the np_thread with killall -INT iscsi_np The reproducible deadlock was introduced in commit db6077fd0b7dd41dc6ff18329cec979379071f87, but the thread-stopping code was racy even before. This patch fixes the problem. Using kthread_should_stop to stop the np_thread is unreliable, so we test np_thread_state instead. If np_thread_state equals ISCSI_NP_THREAD_SHUTDOWN, the thread exits. Signed-off-by: Mikulas Patocka <> Cc: Signed-off-by: Nicholas Bellinger <>
2014-06-27Merge tag 'iommu-fixes-v3.16-rc1' of ↵Linus Torvalds
git:// Pull IOMMU fixes from Joerg Roedel: - fix VT-d regression with handling multiple RMRR entries per device - fix a small race that was left in the mmu_notifier handling in the AMD IOMMUv2 driver * tag 'iommu-fixes-v3.16-rc1' of git:// iommu/amd: Fix small race between invalidate_range_end/start iommu/vt-d: fix bug in handling multiple RMRRs for the same PCI device
2014-06-27Merge branch 'x86/urgent' of ↵Linus Torvalds
git:// Pull x86 fixes from Peter Anvin: "A pile of fixes related to the VDSO, EFI and 32-bit badsys handling. It turns out that removing the section headers from the VDSO breaks gdb, so this puts back most of them. A very simple typo broke rt_sigreturn on some versions of glibc, with obviously disastrous results. The rest is pretty much fixes for the corresponding fallout. The EFI fixes fixes an arithmetic overflow on 32-bit systems and quiets some build warnings. Finally, when invoking an invalid system call number on x86-32, we bypass a bunch of handling, which can make the audit code oops" * 'x86/urgent' of git:// efi-pstore: Fix an overflow on 32-bit builds x86/vdso: Error out in vdso2c if DT_RELA is present x86/vdso: Move DISABLE_BRANCH_PROFILING into the vdso makefile x86_32, signal: Fix vdso rt_sigreturn x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508) x86/vdso: Create .build-id links for unstripped vdso files x86/vdso: Remove some redundant in-memory section headers x86/vdso: Improve the fake section headers x86/vdso2c: Use better macros for ELF bitness x86/vdso: Discard the __bug_table section efi: Fix compiler warnings (unused, const, type)
2014-06-27Merge branch 'upstream' of git:// Torvalds
Pull MIPS fixes from Ralf Baechle: "This is dominated by a large number of changes necessary for the MIPS BPF code. code. Aside of that there are - a fix for the MSC system controller support code. - a Turbochannel fix. - a recordmcount fix that's MIPS-specific. - barrier fixes to smp-cps / pm-cps after unrelated changes elsewhere in the kernel. - revert support for MSA registers in the signal frames. The reverted patch did modify the signal stack frame which of course is inacceptable. - fix math-emu build breakage with older compilers. - some related cleanup. - fix Lasat build error if CONFIG_CRC32 isn't set to y by the user" * 'upstream' of git:// (27 commits) MIPS: Lasat: Fix build error if CRC32 is not enabled. TC: Handle device_register() errors. MIPS: MSC: Prevent out-of-bounds writes to MIPS SC ioremap'd region MIPS: bpf: Fix stack space allocation for BPF memwords on MIPS64 MIPS: BPF: Use 32 or 64-bit load instruction to load an address to register MIPS: bpf: Fix PKT_TYPE case for big-endian cores MIPS: BPF: Prevent kernel fall over for >=32bit shifts MIPS: bpf: Drop update_on_xread and always initialize the X register MIPS: bpf: Fix is_range() semantics MIPS: bpf: Use pr_debug instead of pr_warn for unhandled opcodes MIPS: bpf: Fix return values for VLAN_TAG_PRESENT case MIPS: bpf: Use correct mask for VLAN_TAG case MIPS: bpf: Fix branch conditional for BPF_J{GT/GE} cases MIPS: bpf: Add SEEN_SKB to flags when looking for the PKT_TYPE MIPS: bpf: Use 'andi' instead of 'and' for the VLAN cases MIPS: bpf: Return error code if the offset is a negative number MIPS: bpf: Use the LO register to get division's quotient MIPS: mm: uasm: Fix lh micro-assembler instruction MIPS: uasm: Add SLT uasm instruction MIPS: uasm: Add s3s1s2 instruction builder ...
2014-06-27Merge tag 'arc-fixes-for-3.16' of ↵Linus Torvalds
git:// Pull ARC fixes from Vineet Gupta: "Some SMP changes, a ptrace request for NPTL debugging, bunch of build breakages/warnings" * tag 'arc-fixes-for-3.16' of git:// ARC: [SMP] Enable icache coherency ARC: [SMP] Fix IPI IRQ registration ARC: Implement ptrace(PTRACE_GET_THREAD_AREA) ARC: optimize kernel bss clearing in early boot code ARC: Fix build breakage for !CONFIG_ARC_DW2_UNWIND ARC: fix build warning in devtree ARC: remove checks for CONFIG_ARC_MMU_V4
2014-06-27Merge tag 'compress-3.16-rc3' of ↵Linus Torvalds
git:// Pull compress bugfix from Greg KH: "Here is another lz4 bugfix for 3.16-rc3 that resolves a reported issue with that compression algorithm" * tag 'compress-3.16-rc3' of git:// lz4: fix another possible overrun
2014-06-27Merge tag 'stable/for-linus-3.16-rc1-tag' of ↵Linus Torvalds
git:// Pull swiotlb bugfix from Konrad Rzeszutek Wilk: "One bug-fix that had been in tree for quite some time. We had assumed that the physical address zero was invalid and would fail it. But that is not true and on some architectures it is not reserved and valid. This fixes it" * tag 'stable/for-linus-3.16-rc1-tag' of git:// swiotlb: don't assume PA 0 is invalid
2014-06-27Merge tag 'sound-3.16-rc3' of ↵Linus Torvalds
git:// Pull sound fixes from Takashi Iwai: "Here includes a few patchset for fixing mostly HD-audio issues in addition to a patch assuring the compress API bytes alignment and a fix for the die-hard existing race condition at USB-audio disconnection. The volume looks big in Realtek HD-audio code, but it's just a translation of the fixup tables, and the actual changes are rather trivial" * tag 'sound-3.16-rc3' of git:// ALSA: hda - restore BCLK M/N values when resuming HSW/BDW display controller ALSA: usb-audio: Fix races at disconnection and PCM closing ALSA: hda - Adjust speaker HPF and add LED support for HP Spectre 13 ALSA: hda - Make the pin quirk tables use the SND_HDA_PIN_QUIRK macro ALSA: hda - Make a SND_HDA_PIN_QUIRK macro ALSA: hda - Add pin quirk for Dell XPS 15 ALSA: hda - hdmi: call overridden init on resume ALSA: hda - Fix usage of "model" module parameter ALSA: compress: fix the struct alignment to 4 bytes
2014-06-27Merge tag 'mfd-fixes-3.16' of ↵Linus Torvalds
git:// Pull MFD fixes from Lee Jones: "Couple of simple fixes due for the v3.16 -rcs" * tag 'mfd-fixes-3.16' of git:// mfd: ab8500: Fix dt irq mapping mfd: davinci: Voicecodec needs regmap_mmio mfd: STw481x: Allow modular build mfd: UCB1x00: Enable modular build
2014-06-27Merge branch 'drm-fixes' of git:// Torvalds
Pull drm fixes from Dave Airlie: "Exynos, i915 and msm fixes and one core fix. exynos: hdmi power off and mixer issues msm: iommu, build fixes, i915: regression races and warning fixes" * 'drm-fixes' of git:// (22 commits) drm/i915: vlv_prepare_pll is only needed in case of non DSI interfaces drm: fix NULL pointer access by wrong ioctl drm/exynos: enable vsync interrupt while waiting for vblank drm/exynos: soft reset mixer before reconfigure after power-on drm/exynos: allow multiple layer updates per vsync for mixer drm/i915: Hold the table lock whilst walking the file's idr and counting the objects in debugfs drm/i915: BDW: Adding Reserved PCI IDs. drm/i915: Only mark the ctx as initialised after a SET_CONTEXT operation drm/exynos: stop mixer before gating clocks during poweroff drm/exynos: set power state variable after enabling clocks and power drm/exynos: disable unused windows on apply drm/exynos: Fix de-registration ordering drm/exynos: change zero to NULL for sparse drm/exynos: dpi: Fix NULL pointer dereference with legacy bindings drm/exynos: hdmi: fix power order issue drm/i915: default to having backlight if VBT not available drm/i915: cache hw power well enabled state drm/msm: fix IOMMU cleanup for -EPROBE_DEFER drm/msm: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE) drm/msm/hdmi: set hdp clock rate before prepare_enable ...
2014-06-27iovec: move memcpy_from/toiovecend to lib/iovec.cMichael S. Tsirkin
ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined! commit 9f977ef7b671f6169eca78bf40f230fe84b7c7e5 vhost-scsi: Include prot_bytes into expected data transfer length in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend(). This function is not available when CONFIG_NET is not enabled. socket.h already includes uio.h, so no callers need updating. Reported-by: Randy Dunlap <> Cc: Stephen Rothwell <> Cc: "David S. Miller" <> Signed-off-by: David S. Miller <> Signed-off-by: Michael S. Tsirkin <> Signed-off-by: Nicholas Bellinger <>
2014-06-27iscsi-target: Avoid rejecting incorrect ITT for Data-OutNicholas Bellinger
This patch changes iscsit_check_dataout_hdr() to dump the incoming Data-Out payload when the received ITT is not associated with a WRITE, instead of calling iscsit_reject_cmd() for the non WRITE ITT descriptor. This addresses a bug where an initiator sending an Data-Out for an ITT associated with a READ would end up generating a reject for the READ, eventually resulting in list corruption. Reported-by: Santosh Kulkarni <> Reported-by: Arshad Hussain <> Cc: # 3.10+ Signed-off-by: Nicholas Bellinger <>