Jens Axboe [Tue, 6 Dec 2016 03:49:19 +0000 (20:49 -0700)]
blk-mq: test patch to get legacy IO schedulers working
With this applied, a single queue blk-mq manage device can use
any of the legacy IO schedulers. This is exposed through Kconfig,
exposing MQ scheduler choices. Like with the legacy devices,
the IO scheduler can be runtime switched by echoing something else
into:
/sys/block/</dev>/queue/scheduler
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Mon, 5 Dec 2016 18:15:02 +0000 (11:15 -0700)]
blk-mq: add BLK_MQ_F_NO_SCHED flag
Drivers can use this to prevent IO scheduling on a queue. Use this
for NVMe, for the admin queue, which doesn't handle file system
requests.
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Mon, 5 Dec 2016 18:13:16 +0000 (11:13 -0700)]
block: use appropriate queue running functions
Use MQ variants for MQ, legacy ones for legacy.
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Sat, 3 Dec 2016 02:52:14 +0000 (19:52 -0700)]
cfq-iosched: use appropriate run queue function
For MQ devices, we have to use other functions to run the queue.
No functional changes in this patch, just a prep patch for
support legacy schedulers on blk-mq.
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Sat, 3 Dec 2016 02:50:41 +0000 (19:50 -0700)]
block: use legacy path for flush requests for MQ with a scheduler
No functional changes with this patch, it's just in preparation for
supporting legacy schedulers on blk-mq.
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Tue, 6 Dec 2016 15:28:15 +0000 (08:28 -0700)]
Merge branch 'master' into blk-mq-legacy-sched
Signed-off-by: Jens Axboe <axboe@fb.com>
Jens Axboe [Tue, 6 Dec 2016 15:06:19 +0000 (08:06 -0700)]
Merge branch 'nvmf-4.10' of git://git.infradead.org/nvme-fabrics into for-4.10/block
Sagi writes:
The major addition here is the nvme FC transport implementation
from James.
What else:
- some cleanups and memory leak fixes in the host side fabrics code from Bart
- possible rcu violation fix from Sasha
- logging change from Max
- small include cleanup
James Smart [Fri, 2 Dec 2016 08:28:44 +0000 (00:28 -0800)]
nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
Add FC LLDD loopback driver to test FC host and target transport within
nvme-fabrics
To aid in the development and testing of the lower-level api of the FC
transport, this loopback driver has been created to act as if it were a
FC hba driver supporting both the host interfaces as well as the target
interfaces with the nvme FC transport.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 2 Dec 2016 08:28:43 +0000 (00:28 -0800)]
nvme-fabrics: Add target support for FC transport
Implements the FC-NVME T11 definition of how nvme fabric capsules are
performed on an FC fabric. Utilizes a lower-layer API to FC host adapters
to send/receive FC-4 LS operations and perform the FCP transactions
necessary to perform and FCP IO request for NVME.
The T11 definitions for FC-4 Link Services are implemented which create
NVMeOF connections. Implements the hooks with nvmet layer to pass NVME
commands to it for processing and posting of data/response base to the
host via the different connections.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 2 Dec 2016 08:28:42 +0000 (00:28 -0800)]
nvme-fabrics: Add host support for FC transport
Implements the FC-NVME T11 definition of how nvme fabric capsules are
performed on an FC fabric. Utilizes a lower-layer API to FC host adapters
to send/receive FC-4 LS operations and FCP operations that comprise NVME
over FC operation.
The T11 definitions for FC-4 Link Services are implemented which create
NVMeOF connections. Implements the hooks with blk-mq to then submit admin
and io requests to the different connections.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 2 Dec 2016 08:28:41 +0000 (00:28 -0800)]
nvme-fabrics: Add FC transport LLDD api definitions
Host:
- LLDD registration with the host transport
- registering host ports (local ports) and target ports seen on
fabric (remote ports)
- Data structures and call points for FC-4 LS's and FCP IO requests
Target:
- LLDD registration with the target transport
- registering nvme subsystem ports (target ports)
- Data structures and call points for reception of FC-4 LS's and
FCP IO requests, and callbacks to perform data and rsp transfers
for the io.
Add to MAINTAINERS file
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 2 Dec 2016 08:28:40 +0000 (00:28 -0800)]
nvme-fabrics: Add FC transport FC-NVME definitions
- Formats for Cmd, Data, Rsp IUs
- Formats FC-4 LS definitions
- Add to MAINTAINERS file
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 2 Dec 2016 08:28:39 +0000 (00:28 -0800)]
nvme-fabrics: Add FC transport error codes to nvme.h
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 2 Dec 2016 08:28:38 +0000 (00:28 -0800)]
Add type 0x28 NVME type code to scsi fc headers
Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 21 Oct 2016 20:32:51 +0000 (23:32 +0300)]
nvme-fabrics: patch target code in prep for FC transport support
- Add FC transport type decoding
- Add FC address family decoding
Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
James Smart [Fri, 21 Oct 2016 20:33:34 +0000 (23:33 +0300)]
nvme-fabrics: set sqe.command_id in core not transports
Currently, core.c sets command_id only on rd/wr commands, leaving it to
the transport to set it again to ensure the request had a command id.
Move location of set in core so applies to all commands.
Remove transport sets.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
James Smart [Fri, 21 Oct 2016 20:51:54 +0000 (23:51 +0300)]
parser: add u64 number parser
Will be used by the nvme-fabrics FC transport in parsing options
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Max Gurtovoy [Wed, 23 Nov 2016 09:38:48 +0000 (11:38 +0200)]
nvme-rdma: align to generic ib_event logging helper
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Max Gurtovoy [Wed, 23 Nov 2016 09:38:47 +0000 (11:38 +0200)]
nvmet-rdma: align to generic ib_event logging helper
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Sagi Grimberg [Tue, 1 Nov 2016 08:41:00 +0000 (10:41 +0200)]
nvme-rdma: remove redundant define
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Bart Van Assche [Tue, 18 Oct 2016 20:10:24 +0000 (13:10 -0700)]
nvme-fabrics: Adjust source code indentation
Adjust indentation such that arguments are aligned.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Bart Van Assche [Tue, 18 Oct 2016 20:10:03 +0000 (13:10 -0700)]
nvme/scsi: Remove set-but-not-used variables
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Solganik Alexander [Sun, 30 Oct 2016 08:35:15 +0000 (10:35 +0200)]
nvmet: Fix possible infinite loop triggered on hot namespace removal
When removing a namespace we delete it from the subsystem namespaces
list with list_del_init which allows us to know if it is enabled or
not.
The problem is that list_del_init initialize the list next and does
not respect the RCU list-traversal we do on the IO path for locating
a namespace. Instead we need to use list_del_rcu which is allowed to
run concurrently with the _rcu list-traversal primitives (keeps list
next intact) and guarantees concurrent nvmet_find_naespace forward
progress.
By changing that, we cannot rely on ns->dev_link for knowing if the
namspace is enabled, so add enabled indicator entry to nvmet_ns for
that.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com>
Cc: <stable@vger.kernel.org> # v4.8+
Bart Van Assche [Tue, 18 Oct 2016 20:11:03 +0000 (13:11 -0700)]
nvme-fabrics: Fix a memory leak in an nvmf_create_ctrl() error path
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Bart Van Assche [Tue, 18 Oct 2016 20:10:46 +0000 (13:10 -0700)]
nvme-fabrics: Fix memory leaks in nvmf_parse_options()
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Samuel Jones [Tue, 25 Oct 2016 07:22:34 +0000 (09:22 +0200)]
nvme-rdma: force queue size to respect controller capability
Queue size needs to respect the Maximum Queue Entries Supported advertised by
the controller in its Capability register.
Signed-off-by: Samuel Jones <sjones@kalray.eu>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[sagig: fixed queue_size adjustment according to
Daniel Verkamp <daniel.verkamp@intel.com> comment]
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Bart Van Assche [Tue, 18 Oct 2016 19:59:47 +0000 (12:59 -0700)]
nvmet-rdma: Fix REJ status code
nvmet_sq_init() returns a value <= 0. nvmet_rdma_cm_reject() expects
a second argument that is a NVME_RDMA_CM_* constant. Hence this patch.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Jens Axboe [Sat, 3 Dec 2016 03:00:14 +0000 (20:00 -0700)]
blk-mq: blk_account_io_start() takes a bool
Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Linus Torvalds [Mon, 5 Dec 2016 18:30:12 +0000 (10:30 -0800)]
Merge tag 'powerpc-4.9-7' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Four fixes, the first for code we merged this cycle and three that are
also going to stable:
- On 64-bit Book3E we were not placing the .text section where we
said we would in the asm.
- We broke building the boot wrapper on some 32-bit toolchains.
- Lazy icache flushing was broken on pre-POWER5 machines.
- One of the error paths in our EEH code would lead to a deadlock.
Thanks to: Andrew Donnellan, Ben Hutchings, Benjamin Herrenschmidt,
Nicholas Piggin"
* tag 'powerpc-4.9-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix placement of .text to be immediately following .head.text
powerpc/eeh: Fix deadlock when PE frozen state can't be cleared
powerpc/mm: Fix lazy icache flush on pre-POWER5
powerpc/boot: Fix build failure in 32-bit boot wrapper
Linus Torvalds [Mon, 5 Dec 2016 17:16:10 +0000 (09:16 -0800)]
Merge branch 'linus' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- Intermittent build failure in RSA
- Memory corruption in chelsio crypto driver
- Regression in DRBG due to vmalloced stack"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: rsa - Add Makefile dependencies to fix parallel builds
crypto: chcr - Fix memory corruption
crypto: drbg - prevent invalid SG mappings
Nicolai Stange [Sun, 4 Dec 2016 13:56:39 +0000 (14:56 +0100)]
block: fix unintended fallthrough in generic_make_request_checks()
Since commit
e73c23ff736e ("block: add async variant of
blkdev_issue_zeroout") messages like the following show up:
EXT4-fs (dm-1): Delayed block allocation failed for inode
2368848 at
logical offset 0 with max blocks 1 with error 95
EXT4-fs (dm-1): This should not happen!! Data will be lost
Due to the following fallthrough introduced with
commit
2d253440b5af ("block: Define zoned block device operations"),
generic_make_request_checks() would accept a REQ_OP_WRITE_SAME bio only
if the block device supports "write same" *and* is a zoned one:
switch (bio_op(bio)) {
[...]
case REQ_OP_WRITE_SAME:
if (!bdev_write_same(bio->bi_bdev))
goto not_supported;
case REQ_OP_ZONE_REPORT:
case REQ_OP_ZONE_RESET:
if (!bdev_is_zoned(bio->bi_bdev))
goto not_supported;
break;
[...]
}
Thus, although the bio setup as done by __blkdev_issue_write_same() from
commit
e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
would succeed, its actual submission would not, resulting in the
EOPNOTSUPP == 95.
Fix this by removing the fallthrough which, due to the lack of an explicit
comment, seems to be unintended anyway.
Fixes:
e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
Fixes:
2d253440b5af ("block: Define zoned block device operations")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Linus Torvalds [Sun, 4 Dec 2016 20:50:51 +0000 (12:50 -0800)]
Linux 4.9-rc8
Linus Torvalds [Sun, 4 Dec 2016 00:40:21 +0000 (16:40 -0800)]
Merge tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A pretty small pull request: a couple of AMD powerxpress regression
fixes and a power management fix, a couple of i915 fixes and one hdlcd
fix, along with one core don't oops because of incorrect API usage fix"
* tag 'drm-fixes-for-v4.9-rc8' of git://people.freedesktop.org/~airlied/linux:
drm/i915: drop the struct_mutex when wedged or trying to reset
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
drm: Don't call drm_for_each_crtc with a non-KMS driver
drm/radeon: fix check for port PM availability
drm/amdgpu: fix check for port PM availability
drm/amd/powerplay: initialize the soft_regs offset in struct smu7_hwmgr
drm: hdlcd: Fix cleanup order
Dave Airlie [Sat, 3 Dec 2016 20:31:26 +0000 (06:31 +1000)]
Merge tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel into drm-fixes
2 intel fixes.
* tag 'drm-intel-fixes-2016-12-01' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: drop the struct_mutex when wedged or trying to reset
drm/i915: Don't touch NULL sg on i915_gem_object_get_pages_gtt() error
Jens Axboe [Sat, 3 Dec 2016 19:08:03 +0000 (12:08 -0700)]
nbd: fix 64-bit division
We have this:
ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined!
ERROR: "__divdi3" [drivers/block/nbd.ko] undefined!
nbd.c:(.text+0x247c72): undefined reference to `__divdi3'
due to a recent commit, that did 64-bit division. Use the proper
divider function so that 32-bit compiles don't break.
Fixes:
ef77b515243b ("nbd: use loff_t for blocksize and nbd_set_size args")
Signed-off-by: Jens Axboe <axboe@fb.com>
Josef Bacik [Fri, 2 Dec 2016 21:19:12 +0000 (16:19 -0500)]
nbd: use loff_t for blocksize and nbd_set_size args
If we have large devices (say like the 40t drive I was trying to test with) we
will end up overflowing the int arguments to nbd_set_size and not get the right
size for our device. Fix this by using loff_t everywhere so I don't have to
think about this again. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Shaohua Li [Sat, 3 Dec 2016 01:13:09 +0000 (17:13 -0800)]
blk-stat: fix a typo
Signed-off-by: Shaohua Li <shli@fb.com>
Fixes:
cf43e6be865a ("block: add scalable completion tracking of requests")
Signed-off-by: Jens Axboe <axboe@fb.com>
Linus Torvalds [Sat, 3 Dec 2016 02:48:11 +0000 (18:48 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge more fixes from Andrew Morton:
"2 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, vmscan: add cond_resched() into shrink_node_memcg()
mm: workingset: fix NULL ptr in count_shadow_nodes
Michal Hocko [Sat, 3 Dec 2016 01:26:48 +0000 (17:26 -0800)]
mm, vmscan: add cond_resched() into shrink_node_memcg()
Boris Zhmurov has reported RCU stalls during the kswapd reclaim:
INFO: rcu_sched detected stalls on CPUs/tasks:
23-...: (22 ticks this GP) idle=92f/
140000000000000/0 softirq=
2638404/
2638404 fqs=23
(detected by 4, t=6389 jiffies, g=786259, c=786258, q=42115)
Task dump for CPU 23:
kswapd1 R running task 0 148 2 0x00000008
Call Trace:
shrink_node+0xd2/0x2f0
kswapd+0x2cb/0x6a0
mem_cgroup_shrink_node+0x160/0x160
kthread+0xbd/0xe0
__switch_to+0x1fa/0x5c0
ret_from_fork+0x1f/0x40
kthread_create_on_node+0x180/0x180
a closer code inspection has shown that we might indeed miss all the
scheduling points in the reclaim path if no pages can be isolated from
the LRU list. This is a pathological case but other reports from Donald
Buczek have shown that we might indeed hit such a path:
clusterd-989 [009] .... 118023.654491: mm_vmscan_direct_reclaim_end: nr_reclaimed=193
kswapd1-86 [001] dN.. 118023.987475: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239830 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.320968: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239844 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.654375: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239858 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118024.987036: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239872 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.319651: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239886 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.652248: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239900 nr_taken=0 file=1
kswapd1-86 [001] dN.. 118025.984870: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4239914 nr_taken=0 file=1
[...]
kswapd1-86 [001] dN.. 118084.274403: mm_vmscan_lru_isolate: isolate_mode=0 classzone=0 order=0 nr_requested=32 nr_scanned=
4241133 nr_taken=0 file=1
this is minute long snapshot which didn't take a single page from the
LRU. It is not entirely clear why only 1303 pages have been scanned
during that time (maybe there was a heavy IRQ activity interfering).
In any case it looks like we can really hit long periods without
scheduling on non preemptive kernels so an explicit cond_resched() in
shrink_node_memcg which is independent on the reclaim operation is due.
Link: http://lkml.kernel.org/r/20161202095841.16648-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Boris Zhmurov <bb@kernelpanic.ru>
Tested-by: Boris Zhmurov <bb@kernelpanic.ru>
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Reported-by: "Christopher S. Aker" <caker@theshore.net>
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Sat, 3 Dec 2016 01:26:45 +0000 (17:26 -0800)]
mm: workingset: fix NULL ptr in count_shadow_nodes
Commit
0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg
aware") has made the workingset shadow nodes shrinker memcg aware. The
implementation is not correct though because memcg_kmem_enabled() might
become true while we are doing a global reclaim when the sc->memcg might
be NULL which is exactly what Marek has seen:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000400
IP: [<
ffffffff8122d520>] mem_cgroup_node_nr_lru_pages+0x20/0x40
PGD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 60 Comm: kswapd0 Tainted: G O 4.8.10-12.pvops.qubes.x86_64 #1
task:
ffff880011863b00 task.stack:
ffff880011868000
RIP: mem_cgroup_node_nr_lru_pages+0x20/0x40
RSP: e02b:
ffff88001186bc70 EFLAGS:
00010293
RAX:
0000000000000000 RBX:
ffff88001186bd20 RCX:
0000000000000002
RDX:
000000000000000c RSI:
0000000000000000 RDI:
0000000000000000
RBP:
ffff88001186bc70 R08:
28f5c28f5c28f5c3 R09:
0000000000000000
R10:
0000000000006c34 R11:
0000000000000333 R12:
00000000000001f6
R13:
ffffffff81c6f6a0 R14:
0000000000000000 R15:
0000000000000000
FS:
0000000000000000(0000) GS:
ffff880013c00000(0000) knlGS:
ffff880013d00000
CS: e033 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000000000400 CR3:
00000000122f2000 CR4:
0000000000042660
Call Trace:
count_shadow_nodes+0x9a/0xa0
shrink_slab.part.42+0x119/0x3e0
shrink_node+0x22c/0x320
kswapd+0x32c/0x700
kthread+0xd8/0xf0
ret_from_fork+0x1f/0x40
Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 3b 35 dd eb b1 00 55 48 89 e5 73 2c 89 d2 31 c9 31 c0 4c 63 ce 48 0f a3 ca 73 13 <4a> 8b b4 cf 00 04 00 00 41 89 c8 4a 03 84 c6 80 00 00 00 83 c1
RIP mem_cgroup_node_nr_lru_pages+0x20/0x40
RSP <
ffff88001186bc70>
CR2:
0000000000000400
---[ end trace
100494b9edbdfc4d ]---
This patch fixes the issue by checking sc->memcg rather than
memcg_kmem_enabled() which is sufficient because shrink_slab makes sure
that only memcg aware shrinkers will get non-NULL memcgs and only if
memcg_kmem_enabled is true.
Fixes:
0a6b76dd23fa ("mm: workingset: make shadow node shrinker memcg aware")
Link: http://lkml.kernel.org/r/20161201132156.21450-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Tested-by: Marek Marczykowski-Górecki <marmarek@mimuw.edu.pl>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Pitre [Fri, 2 Dec 2016 20:11:50 +0000 (15:11 -0500)]
kbuild: fix building bzImage with CONFIG_TRIM_UNUSED_KSYMS enabled
When building a specific target such as bzImage, modules aren't normally
built. However if CONFIG_TRIM_UNUSED_KSYMS is enabled, no built modules
means none of the exported symbols are used and therefore they will all
be trimmed away from the final kernel. A subsequent "make modules" will
fail because modpost cannot find the needed symbols for those modules in
the kernel binary.
Let's make sure modules are also built whenever CONFIG_TRIM_UNUSED_KSYMS
is enabled and that the kernel binary is properly rebuilt accordingly.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 2 Dec 2016 21:34:37 +0000 (13:34 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This should be the last set of bugfixes for arm-soc in v4.9. None of
these are critical regressions, but it would be nice to still get them
merged.
- On the Juno platform, the idle latency was described wrong, leading
to suboptimal cpuidle tuning.
- Also on the same platform, PCI I/O space was set up incorrectly and
could not work.
- On the sti platform, a syntactically incorrect DT entry caused
warnings.
- The newly added 'gr8' platform has somewhat confusing file names,
which we rename for consistency"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
arm64: dts: juno: Correct PCI IO window
ARM: dts: STiH407-family: fix i2c nodes
ARM: gr8: Rename the DTSI and relevant DTS
Linus Torvalds [Fri, 2 Dec 2016 19:45:27 +0000 (11:45 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Lots more phydev and probe error path leaks in various drivers by
Johan Hovold.
2) Fix race in packet_set_ring(), from Philip Pettersson.
3) Use after free in dccp_invalid_packet(), from Eric Dumazet.
4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
Dumazet.
5) When tunneling between ipv4 and ipv6 we can be left with the wrong
skb->protocol value as we enter the IPSEC engine and this causes all
kinds of problems. Set it before the output path does any
dst_output() calls, from Eli Cooper.
6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
Florian Fainelli.
7) Various netfilter nat bug fixes from FLorian Westphal.
8) Fix memory leak in ipvlan_link_new(), from Gao Feng.
9) Locking fixes, particularly wrt. socket lookups, in l2tp from
Guillaume Nault.
10) Avoid invoking rhash teardowns in atomic context by moving netlink
cb->done() dump completion from a worker thread. Fix from Herbert
Xu.
11) Buffer refcount problems in tun and macvtap on errors, from Jason
Wang.
12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
selects BBR. Fix from Julian Wollrath.
13) Fix deadlock in transmit path on altera TSE driver, from Lino
Sanfilippo.
14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
Yushchenko.
15) tc_tunnel_key needs to be properly exported to userspace via uapi,
fix from Roi Dayan.
16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
from Sowmini Varadhan.
17) Stale packet header pointer access after pskb_expand_head() in
genenve driver, fix from Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
geneve: avoid use-after-free of skb->data
tipc: check minimum bearer MTU
net: renesas: ravb: unintialized return value
sh_eth: remove unchecked interrupts for RZ/A1
net: bcmgenet: Utilize correct struct device for all DMA operations
NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
cdc_ether: Fix handling connection notification
ip6_offload: check segs for NULL in ipv6_gso_segment.
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
ipv6: Set skb->protocol properly for local output
ipv4: Set skb->protocol properly for local output
packet: fix race condition in packet_set_ring
net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
net: ethernet: stmmac: platform: fix outdated function header
net: ethernet: stmmac: dwmac-meson8b: fix probe error path
net: ethernet: stmmac: dwmac-generic: fix probe error path
...
Eric Dumazet [Fri, 2 Dec 2016 17:44:53 +0000 (09:44 -0800)]
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...
Note that before commit
82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.
This needs to be backported to all known linux kernels.
Again, many thanks to syzkaller team for discovering this gem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 2 Dec 2016 15:49:29 +0000 (16:49 +0100)]
geneve: avoid use-after-free of skb->data
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.
Fixes:
08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Fri, 2 Dec 2016 08:33:41 +0000 (09:33 +0100)]
tipc: check minimum bearer MTU
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.
As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.
Fixes:
b97bf3fd8f6a ("[TIPC] Initial merge")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 19:02:13 +0000 (14:02 -0500)]
Merge tag 'linux-can-fixes-for-4.9-
20161201' of git://git./linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-12-02
this is a pull request for net/master.
There are two patches by Stephane Grosjean, who adds support for the new
PCAN-USB X6 USB interface to the pcan_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 1 Dec 2016 20:57:44 +0000 (23:57 +0300)]
net: renesas: ravb: unintialized return value
We want to set the other "err" variable here so that we can return it
later. My version of GCC misses this issue but I caught it with a
static checker.
Fixes:
9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Brandt [Thu, 1 Dec 2016 18:32:14 +0000 (13:32 -0500)]
sh_eth: remove unchecked interrupts for RZ/A1
When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.
irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<
c036b71c>] sh_eth_interrupt
Disabling IRQ #21
Fixes:
db893473d313a4ad ("sh_eth: Add support for r7s72100")
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 1 Dec 2016 17:45:45 +0000 (09:45 -0800)]
net: bcmgenet: Utilize correct struct device for all DMA operations
__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.
Fixes:
1c1008c793fa ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 18:48:50 +0000 (10:48 -0800)]
Fix up a couple of field names in the CREDITS file
Ozgur Karatas reported that the very first entry in the CREDITS file had
the wrong tag for name (M: instead of N: - it happened when moving the
entry from the MAINTAINERS file, where 'M:' stands for "Maintainer").
And when I went looking, I found a couple of other cases of wrong
tagging too.
Reported-by: Ozgur Karatas <mueddib@yandex.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniele Palmas [Thu, 1 Dec 2016 15:52:05 +0000 (16:52 +0100)]
NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
This patch adds support for PID 0x1040 of Telit LE922A.
The qmi adapter requires to have DTR set for proper working,
so QMI_WWAN_QUIRK_DTR has been enabled.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kristian Evensen [Thu, 1 Dec 2016 13:23:17 +0000 (14:23 +0100)]
cdc_ether: Fix handling connection notification
Commit
bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduced a work-around in usbnet_cdc_status() for devices that exported
cdc carrier on twice on connect. Before the commit, this behavior caused
the link state to be incorrect. It was assumed that all CDC Ethernet
devices would either export this behavior, or send one off and then one on
notification (which seems to be the default behavior).
Unfortunately, it turns out multiple devices sends a connection
notification multiple times per second (via an interrupt), even when
connection state does not change. This has been observed with several
different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
After
bfe9b9d2df66, the link state has been set as down and then up for
each notification. This has caused a flood of Netlink NEWLINK messages and
syslog to be flooded with messages similar to:
cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped
This commit fixes the behavior by reverting usbnet_cdc_status() to how it
was before
bfe9b9d2df66. The work-around has been moved to a separate
status-function which is only called when a known, affect device is
detected.
v1->v2:
* Do not open-code netif_carrier_ok() (thanks Henning Schild).
* Call netif_carrier_off() instead of usb_link_change(). This prevents
calling schedule_work() twice without giving the work queue a chance to be
processed (thanks Bjørn Mork).
Fixes:
bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Artem Savkov [Thu, 1 Dec 2016 13:06:04 +0000 (14:06 +0100)]
ip6_offload: check segs for NULL in ipv6_gso_segment.
segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
[ 97.811262] BUG: unable to handle kernel NULL pointer dereference at
00000000000000cc
[ 97.819112] IP: [<
ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 97.825214] PGD 0 [ 97.827047]
[ 97.828540] Oops: 0000 [#1] SMP
[ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
[ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
[ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
[ 97.941806] task:
ffff880129a1c080 task.stack:
ffffc90001bcc000
[ 97.947720] RIP: 0010:[<
ffffffff816e52f9>] [<
ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 97.956251] RSP: 0018:
ffff88012fc43a10 EFLAGS:
00010207
[ 97.961557] RAX:
0000000000000000 RBX:
ffff8801292c8700 RCX:
0000000000000594
[ 97.968687] RDX:
0000000000000593 RSI:
ffff880129a846c0 RDI:
0000000000240000
[ 97.975814] RBP:
ffff88012fc43a68 R08:
ffff880129a8404e R09:
0000000000000000
[ 97.982942] R10:
0000000000000000 R11:
ffff880129a84076 R12:
00000020002949b3
[ 97.990070] R13:
ffff88012a580000 R14:
0000000000000000 R15:
ffff88012a580000
[ 97.997198] FS:
0000000000000000(0000) GS:
ffff88012fc40000(0000) knlGS:
0000000000000000
[ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 98.011021] CR2:
00000000000000cc CR3:
0000000126c5d000 CR4:
00000000000006e0
[ 98.018149] Stack:
[ 98.020157]
00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
[ 98.027584]
0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
[ 98.035010]
ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
[ 98.042437] Call Trace:
[ 98.044879] <IRQ> [ 98.046803] [<
ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
[ 98.053156] [<
ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
[ 98.059158] [<
ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
[ 98.064985] [<
ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
[ 98.070899] [<
ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
[ 98.077073] [<
ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
[ 98.082726] [<
ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
[ 98.088554] [<
ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
[ 98.094380] [<
ffffffff815effa0>] dev_queue_xmit+0x10/0x20
[ 98.099863] [<
ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
[ 98.106907] [<
ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
[ 98.113430] [<
ffffffff81627cf2>] ? nf_iterate+0x52/0x60
[ 98.118735] [<
ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
[ 98.124216] [<
ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
[ 98.130480] [<
ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
[ 98.137785] [<
ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
[ 98.143701] [<
ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
[ 98.150834] [<
ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
[ 98.157355] [<
ffffffff8102fb89>] ? sched_clock+0x9/0x10
[ 98.162662] [<
ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
[ 98.168403] [<
ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
[ 98.174926] [<
ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
[ 98.180580] [<
ffffffff815ed548>] __netif_receive_skb+0x18/0x60
[ 98.186494] [<
ffffffff815ee625>] process_backlog+0x95/0x140
[ 98.192145] [<
ffffffff815edccd>] net_rx_action+0x16d/0x380
[ 98.197713] [<
ffffffff8170cff1>] __do_softirq+0xd1/0x283
[ 98.203106] [<
ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
[ 98.209107] <EOI> [ 98.211029] [<
ffffffff8108a5c0>] do_softirq+0x50/0x60
[ 98.216166] [<
ffffffff815ec853>] netif_rx_ni+0x33/0x80
[ 98.221386] [<
ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
[ 98.227388] [<
ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
[ 98.233129] [<
ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
[ 98.239392] [<
ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
[ 98.245916] [<
ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
[ 98.251919] [<
ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
[ 98.258440] [<
ffffffff81003a47>] ? do_syscall_64+0x67/0x180
[ 98.264094] [<
ffffffff810a44d9>] kthread+0xd9/0xf0
[ 98.268965] [<
ffffffff810a4400>] ? kthread_park+0x60/0x60
[ 98.274444] [<
ffffffff8170a4d5>] ret_from_fork+0x25/0x30
[ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
[ 98.299425] RIP [<
ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
[ 98.305612] RSP <
ffff88012fc43a10>
[ 98.309094] CR2:
00000000000000cc
[ 98.312406] ---[ end trace
726a2c7a2d2d78d0 ]---
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Thu, 1 Dec 2016 12:44:43 +0000 (04:44 -0800)]
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
If some error is encountered in rds_tcp_init_net, make sure to
unregister_netdevice_notifier(), else we could trigger a panic
later on, when the modprobe from a netns fails.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cooper [Thu, 1 Dec 2016 02:05:12 +0000 (10:05 +0800)]
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
This reverts commit
ae148b085876fa771d9ef2c05f85d4b4bf09ce0d
("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
dst_output() is called. It is no longer necessary to do it for each tunnel.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cooper [Thu, 1 Dec 2016 02:05:11 +0000 (10:05 +0800)]
ipv6: Set skb->protocol properly for local output
When xfrm is applied to TSO/GSO packets, it follows this path:
xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
where skb_gso_segment() relies on skb->protocol to function properly.
This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
when xfrm is involved.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cooper [Thu, 1 Dec 2016 02:05:10 +0000 (10:05 +0800)]
ipv4: Set skb->protocol properly for local output
When xfrm is applied to TSO/GSO packets, it follows this path:
xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
where skb_gso_segment() relies on skb->protocol to function properly.
This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
fixing a bug where GSO packets sent through a sit tunnel are dropped
when xfrm is involved.
Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Pettersson [Wed, 30 Nov 2016 22:55:36 +0000 (14:55 -0800)]
packet: fix race condition in packet_set_ring
When packet_set_ring creates a ring buffer it will initialize a
struct timer_list if the packet version is TPACKET_V3. This value
can then be raced by a different thread calling setsockopt to
set the version to TPACKET_V1 before packet_set_ring has finished.
This leads to a use-after-free on a function pointer in the
struct timer_list when the socket is closed as the previously
initialized timer will not be deleted.
The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
changing the packet version while also taking the lock at the start
of packet_set_ring.
Fixes:
f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 17:15:26 +0000 (09:15 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"All architectures avoid memory corruption in an error path. ARM
prevents bogus acknowledgement of interrupts"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: use after free in kvm_ioctl_create_device()
KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs
Linus Torvalds [Fri, 2 Dec 2016 17:12:44 +0000 (09:12 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"Here is the revert for the regression of the i2c-octeon driver I
mentioned last time. I wished for a bit more feedback, but all people
working actively on it are in need of this patch, so here it goes"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: octeon: thunderx: Limit register access retries"
Lino Sanfilippo [Wed, 30 Nov 2016 22:48:32 +0000 (23:48 +0100)]
net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
The driver already uses its private lock for synchronization between xmit
and xmit completion handler making the additional use of the xmit_lock
unnecessary.
Furthermore the driver does not set NETIF_F_LLTX resulting in xmit to be
called with the xmit_lock held and then taking the private lock while xmit
completion handler does the reverse, first take the private lock, then the
xmit_lock.
Fix these issues by not taking the xmit_lock in the tx completion handler.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lino Sanfilippo [Wed, 30 Nov 2016 22:48:31 +0000 (23:48 +0100)]
net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
An explicit dma sync for device directly after mapping as well as an
explicit dma sync for cpu directly before unmapping is unnecessary and
costly on the hotpath. So remove these calls.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 2 Dec 2016 12:40:27 +0000 (13:40 +0100)]
default exported asm symbols to zero
With binutils-2.26 and before, a weak missing symbol was kept during the
final link, and a missing CRC for an export would lead to that CRC being
treated as zero implicitly. With binutils-2.27, the crc symbol gets
dropped, and any module trying to use it will fail to load.
This sets the weak CRC symbol to zero explicitly, making it defined in
vmlinux, which in turn lets us load the modules referring to that CRC.
The comment above the __CRC_SYMBOL macro suggests that this was always
the intention, although it also seems that all symbols defined in C have
a correct CRC these days, and only the exports that are now done in
assembly need this.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudeep Holla [Wed, 16 Nov 2016 17:31:31 +0000 (17:31 +0000)]
arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
The core and the cluster sleep state entry latencies can't be same as
cluster sleep involves more work compared to core level e.g. shared
cache maintenance.
Experiments have shown on an average about 100us more latency for the
cluster sleep state compared to the core level sleep. This patch fixes
the entry latency for the cluster sleep state.
Fixes:
28e10a8f3a03 ("arm64: dts: juno: Add idle-states to device tree")
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: "Jon Medhurst (Tixy)" <tixy@linaro.org>
Reviewed-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
David S. Miller [Fri, 2 Dec 2016 15:42:47 +0000 (10:42 -0500)]
Merge branch 'stmmac-probe-error-handling-and-phydev-leaks'
Johan Hovold says:
====================
net: stmmac: fix probe error handling and phydev leaks
This series fixes a number of issues with the stmmac-driver probe error
handling, which for example left clocks enabled after probe failures.
The final patch fixes a failure to deregister and free any fixed-link
PHYs that were registered during probe on probe errors and on driver
unbind. It also fixes a related of-node leak on late probe errors.
This series depends on the of_phy_deregister_fixed_link() helper that
was just merged to net.
As mentioned earlier, one staging driver also suffers from a similar
leak and can be fixed up once the above mentioned helper hits mainline.
Note that these patches have only been compile tested.
====================
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:55 +0000 (15:29 +0100)]
net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
Make sure to deregister and free any fixed-link phy registered during
probe on probe errors and on driver unbind by adding a new glue helper
function.
Drop the of-node reference taken in the same path also on late probe
errors (and not just on driver unbind) by moving the put from
stmmac_dvr_remove() to the new helper.
Fixes:
277323814e49 ("stmmac: add fixed-link device-tree support")
Fixes:
4613b279bee7 ("ethernet: stmicro: stmmac: add missing of_node_put
after calling of_parse_phandle")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:54 +0000 (15:29 +0100)]
net: ethernet: stmmac: platform: fix outdated function header
Fix the OF-helper function header to reflect that the function no longer
has a platform-data parameter.
Fixes:
b0003ead75f3 ("stmmac: make stmmac_probe_config_dt return the
platform data struct")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:53 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-meson8b: fix probe error path
Make sure to disable clocks before returning on late probe errors.
Fixes:
566e82516253 ("net: stmmac: add a glue driver for the Amlogic
Meson 8b / GXBB DWMAC")
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:52 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-generic: fix probe error path
Make sure to call any exit() callback to undo the effect of init()
before returning on late probe errors.
Fixes:
cf3f047b9af4 ("stmmac: move hw init in the probe (v2)")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:51 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-rk: fix probe error path
Make sure to disable runtime PM, power down the PHY, and disable clocks
before returning on late probe errors.
Fixes:
27ffefd2d109 ("stmmac: dwmac-rk: create a new probe function")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:50 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-sti: fix probe error path
Make sure to disable clocks before returning on late probe errors.
Fixes:
8387ee21f972 ("stmmac: dwmac-sti: turn setup callback into a
probe function")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 30 Nov 2016 14:29:49 +0000 (15:29 +0100)]
net: ethernet: stmmac: dwmac-socfpga: fix use-after-free on probe errors
Make sure to call stmmac_dvr_remove() before returning on late probe
errors so that memory is freed, clocks are disabled, and the netdev is
deregistered before its resources go away.
Fixes:
3c201b5a84ed ("net: stmmac: socfpga: Remove re-registration of
reset controller")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Wed, 30 Nov 2016 13:30:37 +0000 (14:30 +0100)]
net/rtnetlink: fix attribute name in nlmsg_size() comments
Use the correct attribute constant names IFLA_GSO_MAX_{SEGS,SIZE}
instead of IFLA_MAX_GSO_{SEGS,SIZE} for the comments int nlmsg_size().
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 00:44:42 +0000 (16:44 -0800)]
Merge tag 'pci-v4.9-fixes-4' of git://git./linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"PCI fixes:
- Fix Read Completion Boundary setting, which fixes a boot failure on
IBM x3850 with Mellanox MT27500 ConnectX-3
- Update some MAINTAINERS entries and email addresses"
* tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
PCI: Export pcie_find_root_port
PCI: designware-plat: Update author email
PCI: designware: Change maintainer to Joao Pinto
MAINTAINERS: Add devicetree binding to PCI i.MX6 entry
MAINTAINERS: Update Richard Zhu's email address
Alexander Duyck [Mon, 28 Nov 2016 15:42:29 +0000 (10:42 -0500)]
ixgbe/ixgbevf: Don't use lco_csum to compute IPv4 checksum
In the case of IPIP and SIT tunnel frames the outer transport header
offset is actually set to the same offset as the inner transport header.
This results in the lco_csum call not doing any checksum computation over
the inner IPv4/v6 header data.
In order to account for that I am updating the code so that we determine
the location to start the checksum ourselves based on the location of the
IPv4 header and the length.
Fixes:
b83e30104bd9 ("ixgbe/ixgbevf: Add support for GSO partial")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 28 Nov 2016 15:42:23 +0000 (10:42 -0500)]
igb/igbvf: Don't use lco_csum to compute IPv4 checksum
In the case of IPIP and SIT tunnel frames the outer transport header
offset is actually set to the same offset as the inner transport header.
This results in the lco_csum call not doing any checksum computation over
the inner IPv4/v6 header data.
In order to account for that I am updating the code so that we determine
the location to start the checksum ourselves based on the location of the
IPv4 header and the length.
Fixes:
e10715d3e961 ("igb/igbvf: Add support for GSO partial")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
allan [Wed, 30 Nov 2016 08:29:08 +0000 (16:29 +0800)]
net: asix: Fix AX88772_suspend() USB vendor commands failure issues
The change fixes AX88772_suspend() USB vendor commands failure issues.
Signed-off-by: Allan Chou <allan@asix.com.tw>
Tested-by: Allan Chou <allan@asix.com.tw>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 1 Dec 2016 18:31:53 +0000 (10:31 -0800)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fix from Miklos Szeredi:
"This fixes a regression introduced in 4.8"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix d_real() for stacked fs
Linus Torvalds [Thu, 1 Dec 2016 18:29:41 +0000 (10:29 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov: "We are disabling automatic
probing of BYD touchpads as it results in too many false positives,
and the hardware is not terribly popular and having the protocol
support does not result in significantly improved user experience.
We also change keycode for KEY_DATA to avoid clashing with
KEY_FASTREVERSE. Luckily this newish code is used by CEC framework
that is still in staging, so it is extremely unlikely that someone has
already started using this keycode"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: change KEY_DATA from 0x275 to 0x277
Input: psmouse - disable automatic probing of BYD touchpads
Nicolas Pitre [Wed, 30 Nov 2016 22:41:58 +0000 (17:41 -0500)]
kbuild: make sure autoksyms.h exists early
Some people are able to trigger a race where autoksyms.h is used before
its empty version is even created. Let's create it at the same time as
the directory holding it is created.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Thu, 1 Dec 2016 16:35:49 +0000 (11:35 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2016-12-01
1) Change the error value when someone tries to run 32bit
userspace on a 64bit host from -ENOTSUPP to the userspace
exported -EOPNOTSUPP. Fix from Yi Zhao.
2) On inbound, ESN sequence numbers are already in network
byte order. So don't try to convert it again, this fixes
integrity verification for ESN. Fixes from Tobias Brunner.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Dec 2016 16:04:41 +0000 (11:04 -0500)]
Merge git://git./pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is a large batch of Netfilter fixes for net, they are:
1) Three patches to fix NAT conversion to rhashtable: Switch to rhlist
structure that allows to have several objects with the same key.
Moreover, fix wrong comparison logic in nf_nat_bysource_cmp() as this is
expecting a return value similar to memcmp(). Change location of
the nat_bysource field in the nf_conn structure to avoid zeroing
this as it breaks interaction with SLAB_DESTROY_BY_RCU and lead us
to crashes. From Florian Westphal.
2) Don't allow malformed fragments go through in IPv6, drop them,
otherwise we hit GPF, patch from Florian Westphal.
3) Fix crash if attributes are missing in nft_range, from Liping Zhang.
4) Fix arptables 32-bits userspace 64-bits kernel compat, from Hongxu Jia.
5) Two patches from David Ahern to fix netfilter interaction with vrf.
From David Ahern.
6) Fix element timeout calculation in nf_tables, we take milliseconds
from userspace, but we use jiffies from kernelspace. Patch from
Anders K. Pedersen.
7) Missing validation length netlink attribute for nft_hash, from
Laura Garcia.
8) Fix nf_conntrack_helper documentation, we don't default to off
anymore for a bit of time so let's get this in sync with the code.
I know is late but I think these are important, specifically the NAT
bits, as they are mostly addressing fallout from recent changes. I also
read there are chances to have -rc8, if that is the case, that would
also give us a bit more time to test this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ritesh Harjani [Thu, 1 Dec 2016 15:36:16 +0000 (08:36 -0700)]
block: factor out req_set_nomerge
Factor out common code for setting REQ_NOMERGE flag which is being used
out at certain places and make it a helper instead, req_set_nomerge().
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Get rid of the inline.
Signed-off-by: Jens Axboe <axboe@fb.com>
Rabin Vincent [Thu, 1 Dec 2016 08:18:28 +0000 (09:18 +0100)]
block: protect iterate_bdevs() against concurrent close
If a block device is closed while iterate_bdevs() is handling it, the
following NULL pointer dereference occurs because bdev->b_disk is NULL
in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in
turn called by the mapping_cap_writeback_dirty() call in
__filemap_fdatawrite_range()):
BUG: unable to handle kernel NULL pointer dereference at
0000000000000508
IP: [<
ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
PGD
9e62067 PUD
9ee8067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
task:
ffff880009f4d700 ti:
ffff880009f5c000 task.ti:
ffff880009f5c000
RIP: 0010:[<
ffffffff81314790>] [<
ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
RSP: 0018:
ffff880009f5fe68 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
ffff88000ec17a38 RCX:
ffffffff81a4e940
RDX:
7fffffffffffffff RSI:
0000000000000000 RDI:
ffff88000ec176c0
RBP:
ffff880009f5fe68 R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000001 R11:
0000000000000000 R12:
ffff88000ec17860
R13:
ffffffff811b25c0 R14:
ffff88000ec178e0 R15:
ffff88000ec17a38
FS:
00007faee505d700(0000) GS:
ffff88000fb00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
000000008005003b
CR2:
0000000000000508 CR3:
0000000009e8a000 CR4:
00000000000006e0
Stack:
ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff
0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001
ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f
Call Trace:
[<
ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90
[<
ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30
[<
ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20
[<
ffffffff811bc402>] iterate_bdevs+0xf2/0x130
[<
ffffffff811b2763>] sys_sync+0x63/0x90
[<
ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76
Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d
RIP [<
ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20
RSP <
ffff880009f5fe68>
CR2:
0000000000000508
---[ end trace
2487336ceb3de62d ]---
The crash is easily reproducible by running the following command, if an
msleep(100) is inserted before the call to func() in iterate_devs():
while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done
Fix it by holding the bd_mutex across the func() call and only calling
func() if the bdev is opened.
Cc: stable@vger.kernel.org
Fixes:
5c0d6b60a0ba ("vfs: Create function for iterating over block devices")
Reported-and-tested-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Dan Carpenter [Wed, 30 Nov 2016 19:21:05 +0000 (22:21 +0300)]
KVM: use after free in kvm_ioctl_create_device()
We should move the ops->destroy(dev) after the list_del(&dev->vm_node)
so that we don't use "dev" after freeing it.
Fixes:
a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Pan Bian [Thu, 1 Dec 2016 02:10:46 +0000 (10:10 +0800)]
block: mtip32xx: set error code on failure
Fix bug https://bugzilla.kernel.org/show_bug.cgi?id=188531. In function
mtip_block_initialize(), variable rv takes the return value, and its
value should be negative on errors. rv is initialized as 0 and is not
reset when the call to ida_pre_get() fails. So 0 may be returned.
The return value 0 indicates that there is no error, which may be
inconsistent with the execution status. This patch fixes the bug by
explicitly assigning -ENOMEM to rv on the branch that ida_pre_get()
fails.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Chaitanya Kulkarni [Wed, 30 Nov 2016 20:29:02 +0000 (12:29 -0800)]
nvmet: add support for the Write Zeroes command
Add support for handling write zeroes command on target.
Call into __blkdev_issue_zeroout, which the block layer expands into the
best suitable variant of zeroing the LBAs. Allow write zeroes operation
to deallocate the LBAs when calling __blkdev_issue_zeroout.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Chaitanya Kulkarni [Wed, 30 Nov 2016 20:29:01 +0000 (12:29 -0800)]
nvme: add support for the Write Zeroes command
Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block
device, if the device supports optional command bit set for write
zeroes. Add support to setup write zeroes command. Set maximum possible
write zeroes sectors in one write zeroes command according to
nvme write zeroes command definition.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Chaitanya Kulkarni [Wed, 30 Nov 2016 20:29:00 +0000 (12:29 -0800)]
nvme.h: add Write Zeroes definitions
Add the command structure, optional command set support (ONCS) bit and
a new error code for the Write Zeroes command.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Chaitanya Kulkarni [Wed, 30 Nov 2016 20:28:59 +0000 (12:28 -0800)]
block: add support for REQ_OP_WRITE_ZEROES
This adds a new block layer operation to zero out a range of
LBAs. This allows to implement zeroing for devices that don't use
either discard with a predictable zero pattern or WRITE SAME of zeroes.
The prominent example of that is NVMe with the Write Zeroes command,
but in the future, this should also help with improving the way
zeroing discards work. For this operation, suitable entry is exported in
sysfs which indicate the number of maximum bytes allowed in one
write zeroes operation by the device.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Chaitanya Kulkarni [Wed, 30 Nov 2016 20:28:58 +0000 (12:28 -0800)]
block: add async variant of blkdev_issue_zeroout
Similar to __blkdev_issue_discard this variant allows submitting
the final bio asynchronously and chaining multiple ranges
into a single completion.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Damien Le Moal [Thu, 1 Dec 2016 07:00:25 +0000 (16:00 +0900)]
block: Check partition alignment on zoned block devices
Both blkdev_report_zones and blkdev_reset_zones can operate on a partition of
a zoned block device. However, the first and last zones reported for a
partition make sense only if the partition start sector and size are aligned
on the device zone size. The same applies for zone reset. Resetting the first
or the last zone of a partition straddling zones may impact neighboring
partitions. Finally, if a partition start sector is not at the beginning of a
sequential zone, it will be impossible to write to the first sectors of the
partition on a host-managed device.
Avoid all these problems and incoherencies by ignoring partitions that are not
zone aligned.
Note: Even with CONFIG_BLK_DEV_ZONED disabled, bdev_is_zoned() will report the
correct disk zoning type (host-aware, host-managed or none) but
bdev_zone_size() will always return 0 for zoned block devices (i.e. the zone
size is unknown). So test this as a way to ensure that a zoned block device is
being handled as such. As a result, for a host-aware devices, unaligned zone
partitions will be accepted with CONFIG_BLK_DEV_ZONED disabled. That is, the
disk will be treated as a regular block device (as it should). If zoned block
device support is enabled, only aligned partitions will be accepted.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Radim Krčmář [Thu, 1 Dec 2016 13:56:34 +0000 (14:56 +0100)]
Merge tag 'kvm-arm-for-4.9-rc7' of git://git./linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.9-rc7
- Do not call kvm_notify_acked for PPIs
Stephane Grosjean [Thu, 1 Dec 2016 10:41:12 +0000 (11:41 +0100)]
can: peak: Add support for PCAN-USB X6 USB interface
This adds support for PEAK-System PCAN-USB X6 USB to CAN interface.
The CAN FD adapter PCAN-USB X6 allows the connection of up to 6 CAN FD
or CAN networks to a computer via USB. The interface is installed in an
aluminum profile casing and is shipped in versions with D-Sub connectors
or M12 circular connectors.
The PCAN-USB X6 registers in the USB sub-system as if 3x PCAN-USB-Pro FD
adapters were plugged. So, this patch:
- updates the PEAK_USB entry of the corresponding Kconfig file
- defines and adds the device id. of the PCAN-USB X6 (0x0014) into the
table of supported device ids
- defines and adds the new software structure implementing the PCAN-USB X6,
which is obviously a clone of the software structure implementing the
PCAN-USB Pro FD.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Stephane Grosjean [Thu, 1 Dec 2016 10:41:11 +0000 (11:41 +0100)]
can: peak: Fix bittiming fields size in bits
This fixes the bitimings fields ranges supported by all the CAN-FD USB
interfaces of the PEAK-System CAN-FD adapters.
Very first development versions of the IP core API defined smaller TSGEx
and SJW fields for both nominal and data bittimings records than the
production versions. This patch fixes them by enlarging their sizes to
the actual values:
field: old size: fixed size:
nominal TSGEG1 6 8
nominal TSGEG2 4 7
nominal SJW 4 7
data TSGEG1 4 5
data TSGEG2 3 4
data SJW 2 4
Note that this has no other consequences than offering larger choice to
bitrate encoding.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Nicholas Piggin [Sat, 26 Nov 2016 03:20:31 +0000 (14:20 +1100)]
powerpc/64: Fix placement of .text to be immediately following .head.text
Do not introduce any additional alignment. Placement of text section
will be set by fixed section macros. Without this, output section
alignment defaults to 4096, which makes BookE text section start at
0x1000 when it is expected to start at 0x100.
This was introduced by commit
57f266497d81 ("powerpc: Use gas sections
for arranging exception vectors") and was caught with the scripted head
section checker (not yet merged).
Fixes:
57f266497d81 ("powerpc: Use gas sections for arranging exception vectors")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Andrew Donnellan [Thu, 1 Dec 2016 00:23:05 +0000 (11:23 +1100)]
powerpc/eeh: Fix deadlock when PE frozen state can't be cleared
In eeh_reset_device(), we take the pci_rescan_remove_lock immediately after
after we call eeh_reset_pe() to reset the PCI controller. We then call
eeh_clear_pe_frozen_state(), which can return an error. In this case, we
bail out of eeh_reset_device() without calling pci_unlock_rescan_remove().
Add a call to pci_unlock_rescan_remove() in the eeh_clear_pe_frozen_state()
error path so that we don't cause a deadlock later on.
Reported-by: Pradipta Ghosh <pradghos@in.ibm.com>
Fixes:
78954700631f ("powerpc/eeh: Avoid I/O access during PE reset")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Linus Torvalds [Thu, 1 Dec 2016 00:33:41 +0000 (16:33 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
kasan: support use-after-scope detection
kasan: update kasan_global for gcc 7
lib/debugobjects: export for use in modules
zram: fix unbalanced idr management at hot removal
thp: fix corner case of munlock() of PTE-mapped THPs
mm, thp: propagation of conditional compilation in khugepaged.c
Kirill A. Shutemov [Wed, 30 Nov 2016 23:54:19 +0000 (15:54 -0800)]
mm: fix false-positive WARN_ON() in truncate/invalidate for hugetlb
Hugetlb pages have ->index in size of the huge pages (PMD_SIZE or
PUD_SIZE), not in PAGE_SIZE as other types of pages. This means we
cannot user page_to_pgoff() to check whether we've got the right page
for the radix-tree index.
Let's introduce page_to_index() which would return radix-tree index for
given page.
We will be able to get rid of this once hugetlb will be switched to
multi-order entries.
Fixes:
fc127da085c2 ("truncate: handle file thp")
Link: http://lkml.kernel.org/r/20161123093053.mjbnvn5zwxw5e6lk@black.fi.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Doug Nelson <doug.nelson@intel.com>
Tested-by: Doug Nelson <doug.nelson@intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>