linux-2.6-block.git
8 years agoMerge branch 'for-4.11/rq-refactor' into for-next
Jens Axboe [Mon, 30 Jan 2017 03:07:10 +0000 (20:07 -0700)]
Merge branch 'for-4.11/rq-refactor' into for-next

8 years agonvme: fix compilation of scsi component
Jens Axboe [Mon, 30 Jan 2017 03:04:49 +0000 (20:04 -0700)]
nvme: fix compilation of scsi component

Since we moved the cdb parts and define out of the block proper,
we need to include scsi/scsi_request.h for the nvme scsi layer.

Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.11/rq-refactor' into for-next
Jens Axboe [Sat, 28 Jan 2017 02:21:05 +0000 (19:21 -0700)]
Merge branch 'for-4.11/rq-refactor' into for-next

8 years agoMerge branch 'for-4.11/block' into for-next
Jens Axboe [Fri, 27 Jan 2017 22:15:58 +0000 (15:15 -0700)]
Merge branch 'for-4.11/block' into for-next

8 years agoblock: don't assign cmd_flags in __blk_rq_prep_clone
Christoph Hellwig [Mon, 23 Jan 2017 13:31:09 +0000 (14:31 +0100)]
block: don't assign cmd_flags in __blk_rq_prep_clone

These days we have the proper flags set since request allocation time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: split scsi_request out of struct request
Christoph Hellwig [Fri, 27 Jan 2017 08:46:29 +0000 (09:46 +0100)]
block: split scsi_request out of struct request

And require all drivers that want to support BLOCK_PC to allocate it
as the first thing of their private data.  To support this the legacy
IDE and BSG code is switched to set cmd_size on their queues to let
the block layer allocate the additional space.
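
A minimal userspace sketch of the layout rule, not the kernel code; every struct and function name below is a hypothetical stand-in. Because the SCSI request is the first member of the driver's private data, generic code can treat a pointer to the private area as a pointer to the SCSI request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct scsi_request. */
struct sketch_scsi_request {
	unsigned char cmd[16];
	unsigned short cmd_len;
};

/* Hypothetical driver-private data. */
struct sketch_driver_pdu {
	struct sketch_scsi_request sreq;	/* must stay the first member */
	int driver_state;
};

/* Enforce the layout rule at compile time (C11). */
_Static_assert(offsetof(struct sketch_driver_pdu, sreq) == 0,
	       "scsi request must be first in the private data");

/* Generic code only knows where the private area starts. */
static struct sketch_scsi_request *sketch_scsi_req(void *pdu)
{
	assert(pdu != NULL);
	return pdu;
}
```

The compile-time check is one way a driver could guard against a later reordering of its private struct; whether the kernel does so is not stated here.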

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock/bsg: move queue creation into bsg_setup_queue
Christoph Hellwig [Tue, 3 Jan 2017 12:25:02 +0000 (15:25 +0300)]
block/bsg: move queue creation into bsg_setup_queue

Simplify the boilerplate code needed for bsg nodes a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi: allocate scsi_cmnd structures as part of struct request
Christoph Hellwig [Mon, 2 Jan 2017 18:55:26 +0000 (21:55 +0300)]
scsi: allocate scsi_cmnd structures as part of struct request

Rely on the new block layer functionality to allocate additional driver
specific data behind struct request instead of implementing it in SCSI
itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi: remove __scsi_alloc_queue
Christoph Hellwig [Mon, 2 Jan 2017 18:52:10 +0000 (21:52 +0300)]
scsi: remove __scsi_alloc_queue

Instead do an internal export of __scsi_init_queue for the transport
classes that export BSG nodes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi: remove scsi_cmd_dma_pool
Christoph Hellwig [Mon, 2 Jan 2017 12:26:34 +0000 (15:26 +0300)]
scsi: remove scsi_cmd_dma_pool

There is no need for GFP_DMA allocations of the scsi_cmnd structures
themselves; all that might be DMAed to or from is the actual payload,
or the sense buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi: respect unchecked_isa_dma for blk-mq
Christoph Hellwig [Tue, 3 Jan 2017 05:28:41 +0000 (08:28 +0300)]
scsi: respect unchecked_isa_dma for blk-mq

Currently blk-mq always allocates the sense buffer using normal GFP_KERNEL
allocation.  Refactor the cmd pool code to split the cmd and sense allocation
and share the code to allocate the sense buffers as well as the sense buffer
slab caches between the legacy and blk-mq path.

Note that this switches to lazy allocation of the sense slab caches - the
slab caches (not the actual allocations) won't be destroyed until the scsi
module is unloaded instead of keeping track of hosts using them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi: remove gfp_flags member in scsi_host_cmd_pool
Christoph Hellwig [Mon, 2 Jan 2017 11:38:03 +0000 (14:38 +0300)]
scsi: remove gfp_flags member in scsi_host_cmd_pool

When using the slab allocator we already decide at cache creation time if
an allocation comes from a GFP_DMA pool using the SLAB_CACHE_DMA flag,
and there is no point passing the kmalloc-family only GFP_DMA flag to
kmem_cache_alloc.  Drop all the infrastructure for doing so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi_dh_hp_sw: switch to scsi_execute_req_flags()
Hannes Reinecke [Thu, 3 Nov 2016 13:20:23 +0000 (14:20 +0100)]
scsi_dh_hp_sw: switch to scsi_execute_req_flags()

Switch to scsi_execute_req_flags() instead of using the block interface
directly.  This will set REQ_QUIET and REQ_PREEMPT, but this is okay as
we're evaluating the errors anyway and should be able to send the command
even if the device is quiesced.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi_dh_emc: switch to scsi_execute_req_flags()
Hannes Reinecke [Thu, 3 Nov 2016 13:20:22 +0000 (14:20 +0100)]
scsi_dh_emc: switch to scsi_execute_req_flags()

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoscsi_dh_rdac: switch to scsi_execute_req_flags()
Hannes Reinecke [Thu, 3 Nov 2016 13:20:21 +0000 (14:20 +0100)]
scsi_dh_rdac: switch to scsi_execute_req_flags()

Switch to scsi_execute_req_flags() and scsi_get_vpd_page() instead of
open-coding it.  Using scsi_execute_req_flags() will set REQ_QUIET and
REQ_PREEMPT, but this is okay as we're evaluating the errors anyway and
should be able to send the command even if the device is quiesced.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agodm: always defer request allocation to the owner of the request_queue
Christoph Hellwig [Sun, 22 Jan 2017 17:32:46 +0000 (18:32 +0100)]
dm: always defer request allocation to the owner of the request_queue

DM already calls blk_mq_alloc_request on the request_queue of the
underlying device if it is a blk-mq device.  But now that we allow drivers
to allocate additional data and initialize it ahead of time we need to do
the same for all drivers.  Doing so and using the new cmd_size
infrastructure in the block layer greatly simplifies the dm-rq and mpath
code, and should also make arbitrary combinations of SQ and MQ devices
with SQ or MQ device mapper tables easily possible as a further step.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agodm: remove incomplete BLOCK_PC support
Christoph Hellwig [Tue, 10 Jan 2017 09:03:39 +0000 (10:03 +0100)]
dm: remove incomplete BLOCK_PC support

DM tries to copy a few fields around for BLOCK_PC requests, but given
that no dm-target ever wires up scsi_cmd_ioctl, BLOCK_PC can't actually
be sent to dm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: cleanup tracing
Christoph Hellwig [Fri, 27 Jan 2017 08:35:54 +0000 (09:35 +0100)]
block: cleanup tracing

A couple tweaks to the tracing code:

 - trace the request size for all requests
 - trace request sector and nr_sectors only for fs requests, enforced by
   helpers
 - drop SCSI CDB tracing - we have SCSI tracing for this and are going
   to move the CDB out of the generic struct request soon.

With this the tracing code no longer knows about BLOCK_PC requests at all,
it's just FS vs passthrough requests now, where the latter includes any
driver-private requests.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: allow specifying size for extra command data
Christoph Hellwig [Fri, 27 Jan 2017 16:51:45 +0000 (09:51 -0700)]
block: allow specifying size for extra command data

This mirrors the blk-mq capabilities to allocate extra driver-specific
data behind struct request by setting a cmd_size field, as well as having
a constructor / destructor for it.
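
A minimal userspace sketch of the idea in C11, assuming a queue that records the extra size plus optional constructor and destructor hooks. All names (sketch_queue, init_rq, exit_rq, sketch_rq_to_pdu) are hypothetical and do not claim to match the kernel's fields or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct request; aligned so data behind it is too. */
struct sketch_request {
	_Alignas(max_align_t) int tag;
};

struct sketch_queue {
	size_t cmd_size;				/* extra bytes per request */
	int (*init_rq)(struct sketch_request *rq);	/* constructor, may be NULL */
	void (*exit_rq)(struct sketch_request *rq);	/* destructor, may be NULL */
};

/* The driver data sits directly behind the request. */
static void *sketch_rq_to_pdu(struct sketch_request *rq)
{
	return rq + 1;
}

static struct sketch_request *sketch_alloc_request(struct sketch_queue *q)
{
	struct sketch_request *rq;

	assert(q != NULL);
	rq = calloc(1, sizeof(*rq) + q->cmd_size);
	if (rq && q->init_rq && q->init_rq(rq) != 0) {
		free(rq);
		return NULL;
	}
	return rq;
}

static void sketch_free_request(struct sketch_queue *q, struct sketch_request *rq)
{
	if (q->exit_rq)
		q->exit_rq(rq);
	free(rq);
}

/* Example driver data and its constructor (both hypothetical). */
struct sketch_pdu {
	int retries;
};

static int sketch_pdu_init(struct sketch_request *rq)
{
	struct sketch_pdu *pdu = sketch_rq_to_pdu(rq);

	pdu->retries = 3;
	return 0;
}
```

A driver in this model sets cmd_size to sizeof(struct sketch_pdu) and passes sketch_pdu_init as the constructor; each allocated request then carries initialized driver data in the same allocation.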

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: simplify blk_init_allocated_queue
Christoph Hellwig [Tue, 3 Jan 2017 11:52:44 +0000 (14:52 +0300)]
block: simplify blk_init_allocated_queue

Return an errno value instead of the passed in queue so that the callers
don't have to keep track of two queues, and move the assignment of the
request_fn and lock to the caller as passing them as argument doesn't
simplify anything.  While we're at it also remove two pointless NULL
assignments, given that the request structure is zeroed on allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: fix elevator init check
Christoph Hellwig [Wed, 25 Jan 2017 10:17:11 +0000 (11:17 +0100)]
block: fix elevator init check

We can't initialize the elevator fields for flushes as flushes share space
in struct request with the elevator data.  But currently we can't
communicate that a request is a flush through blk_get_request as we
can only pass READ or WRITE, and the low-level code looks at the
possible NULL bio to check for a flush.

Fix this by allowing any block op and flags to be passed, and by checking for
the flush flags in __get_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agomd: cleanup bio op / flags handling in raid1_write_request
Christoph Hellwig [Wed, 25 Jan 2017 10:15:20 +0000 (11:15 +0100)]
md: cleanup bio op / flags handling in raid1_write_request

No need for the local variables, the bio is still live and we can just
assign the bits we want directly.  Makes me wonder why we can't assign
all the bio flags to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.11/block' into for-4.11/rq-refactor
Jens Axboe [Fri, 27 Jan 2017 22:08:31 +0000 (15:08 -0700)]
Merge branch 'for-4.11/block' into for-4.11/rq-refactor

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: fix debugfs compilation issues
Omar Sandoval [Fri, 27 Jan 2017 22:03:01 +0000 (15:03 -0700)]
blk-mq: fix debugfs compilation issues

This fixes a couple of problems:

1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus.
2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at
   all.

Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option.
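
The stub pattern being fixed can be sketched as follows. This is an illustration only: SKETCH_CONFIG_DEBUG_FS and the function names are hypothetical and are not the symbols used in blk-mq-debugfs.

```c
#include <assert.h>
#include <stddef.h>

/* SKETCH_CONFIG_DEBUG_FS is a hypothetical stand-in for a Kconfig symbol. */
#ifdef SKETCH_CONFIG_DEBUG_FS
int sketch_debugfs_register(const char *disk_name);
void sketch_debugfs_unregister(const char *disk_name);
#else
/* Stubs must match the real prototypes so callers build either way. */
static inline int sketch_debugfs_register(const char *disk_name)
{
	(void)disk_name;
	return 0;
}

static inline void sketch_debugfs_unregister(const char *disk_name)
{
	(void)disk_name;
}
#endif

/* A caller needs no #ifdef of its own. */
static int sketch_add_disk(const char *disk_name)
{
	assert(disk_name != NULL);
	return sketch_debugfs_register(disk_name);
}
```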

Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Augment Kconfig description.

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.11/block' into for-next
Jens Axboe [Fri, 27 Jan 2017 16:09:00 +0000 (09:09 -0700)]
Merge branch 'for-4.11/block' into for-next

8 years agoMerge branch 'master' into for-next
Jens Axboe [Fri, 27 Jan 2017 16:08:54 +0000 (09:08 -0700)]
Merge branch 'master' into for-next

8 years agoblock: cleanup remaining manual checks for PREFLUSH|FUA
Jens Axboe [Fri, 27 Jan 2017 16:08:23 +0000 (09:08 -0700)]
block: cleanup remaining manual checks for PREFLUSH|FUA

Use op_is_flush() where applicable.

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq-sched: add flush insertion into blk_mq_sched_insert_request()
Jens Axboe [Fri, 27 Jan 2017 08:00:47 +0000 (01:00 -0700)]
blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()

Instead of letting the caller check this and handle the details
of inserting a flush request, put the logic in the scheduler
insertion function. This fixes direct flush insertion outside
of the usual make_request_fn calls, like from dm via
blk_insert_cloned_request().

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: add a op_is_flush helper
Christoph Hellwig [Fri, 27 Jan 2017 15:30:47 +0000 (08:30 -0700)]
block: add a op_is_flush helper

This centralizes the checks for bios that need to go into the flush
state machine.
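
A minimal sketch of such a helper, assuming the check is a test for the FUA and PREFLUSH flags (as the "PREFLUSH|FUA" cleanup commit above suggests). The flag names and bit values here are hypothetical; the kernel defines its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel assigns its own. */
#define SKETCH_REQ_SYNC		(1u << 0)
#define SKETCH_REQ_FUA		(1u << 1)
#define SKETCH_REQ_PREFLUSH	(1u << 2)

static_assert((SKETCH_REQ_FUA & SKETCH_REQ_PREFLUSH) == 0,
	      "flag bits must be distinct");

/* True if the request needs to go through the flush state machine. */
static bool sketch_op_is_flush(unsigned int op_flags)
{
	return (op_flags & (SKETCH_REQ_FUA | SKETCH_REQ_PREFLUSH)) != 0;
}
```

Callers then replace open-coded mask tests with one call, so a later change to what counts as a flush only touches the helper.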

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq-sched: change ->dispatch_requests() to ->dispatch_request()
Jens Axboe [Thu, 26 Jan 2017 19:40:07 +0000 (12:40 -0700)]
blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()

When we invoke dispatch_requests(), the scheduler empties everything
into the passed in list. This isn't always a good thing, since it
means that we remove items that we could have potentially merged
with.

Change the function to dispatch a single request at a time. If
we do that, we can back off exactly at the point where the device
can't consume more IO, and leave the rest with the scheduler for
better merging and future dispatch decision making.
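
A toy userspace model of the difference, assuming a FIFO scheduler and a device with a fixed number of free slots. The types and functions are hypothetical and much simpler than the blk-mq scheduler hooks.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_QUEUE_DEPTH 8

/* Hypothetical FIFO scheduler holding request ids. */
struct sketch_sched {
	int pending[SKETCH_QUEUE_DEPTH];
	size_t head;
	size_t count;
};

/* Hypothetical device that can accept a limited number of requests. */
struct sketch_device {
	int free_slots;
};

/* Hand out exactly one request, or -1 if nothing is pending. */
static int sketch_dispatch_request(struct sketch_sched *s)
{
	int rq;

	if (s->count == 0)
		return -1;
	rq = s->pending[s->head];
	s->head = (s->head + 1) % SKETCH_QUEUE_DEPTH;
	s->count--;
	return rq;
}

/* Pull requests one at a time and stop as soon as the device is full. */
static int sketch_run_queue(struct sketch_sched *s, struct sketch_device *d)
{
	int issued = 0;

	assert(d->free_slots >= 0);
	while (d->free_slots > 0 && sketch_dispatch_request(s) >= 0) {
		d->free_slots--;
		issued++;
	}
	return issued;
}
```

With five pending requests and two free device slots, two requests are issued and three stay with the scheduler, where they remain candidates for merging.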

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
8 years agoblk-mq-sched: fix starvation for multiple hardware queues and shared tags
Jens Axboe [Thu, 26 Jan 2017 21:42:34 +0000 (14:42 -0700)]
blk-mq-sched: fix starvation for multiple hardware queues and shared tags

If we have both multiple hardware queues and a shared tag map between
devices, we need to ensure that we propagate the hardware queue
restart bit higher up. This is because we can get into a situation
where we don't have any IO pending on a hardware queue, yet we fail
getting a tag to start new IO. If that happens, it's not enough to
mark the hardware queue as needing a restart, we need to bubble
that up to the higher level queue as well.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
8 years agoblk-mq: release driver tag on a requeue event
Jens Axboe [Thu, 26 Jan 2017 19:32:32 +0000 (12:32 -0700)]
blk-mq: release driver tag on a requeue event

We don't want to hold on to this resource when we have a scheduler
attached.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
8 years agoblk-mq: fix potential race in queue restart and driver tag allocation
Jens Axboe [Thu, 26 Jan 2017 19:50:36 +0000 (12:50 -0700)]
blk-mq: fix potential race in queue restart and driver tag allocation

Once we mark the queue as needing a restart, re-check if we can
get a driver tag. This fixes a theoretical issue where the needed
IO completes _after_ blk_mq_get_driver_tag() fails, but before we
manage to set the restart bit.
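
The ordering can be sketched in userspace with C11 atomics. This is a simplified model under the assumption that tags are a plain counter and the restart mark is a single flag; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware queue state. */
struct sketch_hctx {
	atomic_int free_tags;
	atomic_bool need_restart;
};

/* Take one tag if any is free. */
static bool sketch_try_get_tag(struct sketch_hctx *h)
{
	int old = atomic_load(&h->free_tags);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&h->free_tags, &old, old - 1))
			return true;
	}
	return false;
}

/* Returns true if a tag was obtained. */
static bool sketch_get_tag_or_mark_restart(struct sketch_hctx *h)
{
	assert(h != NULL);
	if (sketch_try_get_tag(h))
		return true;

	atomic_store(&h->need_restart, true);

	/*
	 * A completion may have freed a tag after the failed attempt but
	 * before the mark was set, so try once more.
	 */
	return sketch_try_get_tag(h);
}
```

Without the second attempt, a completion landing in that window would see no restart mark and nothing would rerun the queue.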

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
8 years agoblk-mq: improve scheduler queue sync/async running
Jens Axboe [Thu, 26 Jan 2017 19:28:10 +0000 (12:28 -0700)]
blk-mq: improve scheduler queue sync/async running

We'll use the same criteria for whether we need to run the queue sync
or async when we have a scheduler, as we do without one.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Tested-by: Hannes Reinecke <hare@suse.com>
8 years agoMerge branch 'for-4.11/block' into for-next
Jens Axboe [Fri, 27 Jan 2017 15:18:09 +0000 (08:18 -0700)]
Merge branch 'for-4.11/block' into for-next

8 years agoblk-mq: move hctx and ctx counters from sysfs to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:49 +0000 (08:06 -0800)]
blk-mq: move hctx and ctx counters from sysfs to debugfs

These counters aren't as out-of-place in sysfs as the other stuff, but
debugfs is a slightly better home for them.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:48 +0000 (08:06 -0800)]
blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfs

These statistics _might_ be useful to userspace, but it's better not to
commit to an ABI for these yet. Also, the dispatched file in sysfs
couldn't be cleared, so make it clearable like the others in debugfs.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: add tags and sched_tags bitmaps to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:47 +0000 (08:06 -0800)]
blk-mq: add tags and sched_tags bitmaps to debugfs

These can be used to debug issues like tag leaks and stuck requests.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: move tags and sched_tags info from sysfs to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:46 +0000 (08:06 -0800)]
blk-mq: move tags and sched_tags info from sysfs to debugfs

These are very tied to the blk-mq tag implementation, so exposing them
to sysfs isn't a great idea. Move the debugging information to debugfs
and add basic entries for the number of tags and the number of reserved
tags to sysfs.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: export software queue pending map to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:45 +0000 (08:06 -0800)]
blk-mq: export software queue pending map to debugfs

This is useful for debugging problems where we've gotten stuck with
requests in the software queues.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agosbitmap: add helpers for dumping to a seq_file
Omar Sandoval [Wed, 25 Jan 2017 22:32:13 +0000 (14:32 -0800)]
sbitmap: add helpers for dumping to a seq_file

This is useful debugging information that will be used in the blk-mq
debugfs directory.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Changed 'weight' to 'busy'.

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: add extra request information to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:43 +0000 (08:06 -0800)]
blk-mq: add extra request information to debugfs

The request pointers by themselves aren't super useful.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:42 +0000 (08:06 -0800)]
blk-mq: move hctx->dispatch and ctx->rq_list from sysfs to debugfs

These lists are only useful for debugging; they definitely don't belong
in sysfs. Putting them in debugfs also removes the limitation of a
single page of output.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: add hctx->{state,flags} to debugfs
Omar Sandoval [Wed, 25 Jan 2017 16:06:41 +0000 (08:06 -0800)]
blk-mq: add hctx->{state,flags} to debugfs

hctx->state could come in handy for bugs where the hardware queue gets
stuck in the stopped state, and hctx->flags is just useful to know.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: create debugfs directory tree
Omar Sandoval [Wed, 25 Jan 2017 16:06:40 +0000 (08:06 -0800)]
blk-mq: create debugfs directory tree

In preparation for putting blk-mq debugging information in debugfs,
create a directory tree mirroring the one in sysfs:

    # tree -d /sys/kernel/debug/block
    /sys/kernel/debug/block
    |-- nvme0n1
    |   `-- mq
    |       |-- 0
    |       |   `-- cpu0
    |       |-- 1
    |       |   `-- cpu1
    |       |-- 2
    |       |   `-- cpu2
    |       `-- 3
    |           `-- cpu3
    `-- vda
        `-- mq
            `-- 0
                |-- cpu0
                |-- cpu1
                |-- cpu2
                `-- cpu3

Also add the scaffolding for the actual files that will go in here,
either under the hardware queue or software queue directories.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'for-4.11/block' into for-next
Jens Axboe [Thu, 26 Jan 2017 21:53:48 +0000 (14:53 -0700)]
Merge branch 'for-4.11/block' into for-next

8 years agoblk-mq-sched: check for successful allocation before assigning tag
Jens Axboe [Thu, 26 Jan 2017 21:52:20 +0000 (14:52 -0700)]
blk-mq-sched: check for successful allocation before assigning tag

We don't trigger this from the normal IO path, since we always use
blocking allocations from there. But Bart saw it testing multipath
dm, since that is a heavy user of atomic request allocations in
the map and clone path.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-mq: don't lose flags passed in to blk_mq_alloc_request()
Jens Axboe [Thu, 26 Jan 2017 19:22:11 +0000 (12:22 -0700)]
blk-mq: don't lose flags passed in to blk_mq_alloc_request()

If we come in from blk_mq_alloc_request() with NOWAIT set in flags,
we must ensure that we don't later overwrite that in
blk_mq_sched_get_request(). Initialize alloc_data->flags before
passing it in.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agosysctl: fix proc_doulongvec_ms_jiffies_minmax()
Eric Dumazet [Thu, 26 Jan 2017 02:20:55 +0000 (18:20 -0800)]
sysctl: fix proc_doulongvec_ms_jiffies_minmax()

We perform the conversion between kernel jiffies and ms only when
exporting the kernel value to user space.

We need to do the opposite operation when a value is written by the user.

This only matters when HZ != 1000.
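
A small worked example of the two directions, assuming a tick rate of 250 for illustration. The helper names are hypothetical and the arithmetic ignores overflow for very large values.

```c
#include <assert.h>

#define SKETCH_HZ 250UL	/* assumed tick rate, for illustration only */

/* Kernel to user: jiffies to milliseconds. */
static unsigned long sketch_jiffies_to_ms(unsigned long jiffies)
{
	return jiffies * 1000UL / SKETCH_HZ;
}

/* User to kernel: the opposite conversion, needed on the write path. */
static unsigned long sketch_ms_to_jiffies(unsigned long ms)
{
	assert(SKETCH_HZ != 0);
	return ms * SKETCH_HZ / 1000UL;
}
```

With this tick rate, writing 2000 ms stores 500 jiffies and reads back as 2000 ms. Storing the written 2000 unconverted would read back as 8000 ms, which is the kind of mismatch described above.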

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMerge tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Thu, 26 Jan 2017 17:08:49 +0000 (09:08 -0800)]
Merge tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "A bunch of pin control fixes for v4.10 that didn't get sent off until
  now, sorry for the delay.

  It's only driver fixes:

   - A bunch of fixes to the Intel drivers: broxton, baytrail. Bugs
     related to register offsets, IRQ, debounce functionality.

   - Fix a conflict amongst UART settings on the meson.

   - Fix the ethernet setting on the Uniphier.

   - A compilation warning squelched"

* tag 'pinctrl-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: uniphier: fix Ethernet (RMII) pin-mux setting for LD20
  pinctrl: meson: fix uart_ao_b for GXBB and GXL/GXM
  pinctrl: amd: avoid maybe-uninitalized warning
  pinctrl: baytrail: Do not add all GPIOs to IRQ domain
  pinctrl: baytrail: Rectify debounce support
  pinctrl: intel: Set pin direction properly
  pinctrl: broxton: Use correct PADCFGLOCK offset

8 years agoMerge tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airli...
Linus Torvalds [Thu, 26 Jan 2017 16:55:33 +0000 (08:55 -0800)]
Merge tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airlied/linux

Pull drm revert from Dave Airlie:
 "Revert one patch missing some prereqs.

  One of the connector fixes was missing some prereqs, we have an
  alternate driver fix that should work that I'll send tomorrow.

  Today is a holiday here so quickly smashing this out"

Daniel Vetter explains:
 "I pushed a locking change to fix a nouveau rpm issue to -fixes that
  needed the connector_list rework. And that's only in -next, but I
  missed that. Dave has the revert in a pull, and he'll follow-up with
  the hack nouveau patch for 4.10, and then we'll reapply the proper fix
  again for -next and revert the hacks. A bit a mess, but should be
  sorted soon"

* tag 'drm-fixes-for-v4.10-rc6-revert-one' of git://people.freedesktop.org/~airlied/linux:
  Revert "drm/probe-helpers: Drop locking from poll_enable"

8 years agoRevert "drm/probe-helpers: Drop locking from poll_enable"
Dave Airlie [Wed, 25 Jan 2017 20:44:03 +0000 (06:44 +1000)]
Revert "drm/probe-helpers: Drop locking from poll_enable"

This reverts commit 3846fd9b86001bea171943cc3bb9222cb6da6b42.

There were some precursor commits missing for this around connector
locking; we should probably merge Lyude's nouveau patch that avoids the problem.

8 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Wed, 25 Jan 2017 18:25:36 +0000 (10:25 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio/vhost fixes from Michael Tsirkin:

 - ARM DMA fixes

 - vhost vsock bugfix

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vring: Force use of DMA API for ARM-based systems with legacy devices
  virtio_mmio: Set DMA masks appropriately
  vhost/vsock: handle vhost_vq_init_access() error

8 years agoMerge branch 'for-4.11/block' into for-next
Jens Axboe [Wed, 25 Jan 2017 15:13:13 +0000 (08:13 -0700)]
Merge branch 'for-4.11/block' into for-next

8 years agoblk-mq: only apply active queue tag throttling for driver tags
Jens Axboe [Wed, 25 Jan 2017 15:11:38 +0000 (08:11 -0700)]
blk-mq: only apply active queue tag throttling for driver tags

If we have a scheduler attached, we have two sets of tags. We don't
want to apply our active queue throttling for the scheduler side
of tags, that only applies to driver tags since that's the resource
we need to dispatch an IO.

Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Wed, 25 Jan 2017 00:54:39 +0000 (16:54 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge fixes from Andrew Morton:
 "26 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits)
  MAINTAINERS: add Dan Streetman to zbud maintainers
  MAINTAINERS: add Dan Streetman to zswap maintainers
  mm: do not export ioremap_page_range symbol for external module
  mn10300: fix build error of missing fpu_save()
  romfs: use different way to generate fsid for BLOCK or MTD
  frv: add missing atomic64 operations
  mm, page_alloc: fix premature OOM when racing with cpuset mems update
  mm, page_alloc: move cpuset seqcount checking to slowpath
  mm, page_alloc: fix fast-path race with cpuset update or removal
  mm, page_alloc: fix check for NULL preferred_zone
  kernel/panic.c: add missing \n
  fbdev: color map copying bounds checking
  frv: add atomic64_add_unless()
  mm/mempolicy.c: do not put mempolicy before using its nodemask
  radix-tree: fix private list warnings
  Documentation/filesystems/proc.txt: add VmPin
  mm, memcg: do not retry precharge charges
  proc: add a schedule point in proc_pid_readdir()
  mm: alloc_contig: re-allow CMA to compact FS pages
  mm/slub.c: trace free objects at KERN_INFO
  ...

8 years agoMAINTAINERS: add Dan Streetman to zbud maintainers
Dan Streetman [Tue, 24 Jan 2017 23:18:57 +0000 (15:18 -0800)]
MAINTAINERS: add Dan Streetman to zbud maintainers

Add myself as zbud maintainer.

Link: http://lkml.kernel.org/r/20170124221705.26523-1-ddstreet@ieee.org
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoMAINTAINERS: add Dan Streetman to zswap maintainers
Dan Streetman [Tue, 24 Jan 2017 23:18:55 +0000 (15:18 -0800)]
MAINTAINERS: add Dan Streetman to zswap maintainers

Add myself as zswap maintainer.

Link: http://lkml.kernel.org/r/20170124212200.19052-1-ddstreet@ieee.org
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Seth Jennings <sjenning@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: do not export ioremap_page_range symbol for external module
zhong jiang [Tue, 24 Jan 2017 23:18:52 +0000 (15:18 -0800)]
mm: do not export ioremap_page_range symbol for external module

Recently, I've found cases in which ioremap_page_range was used
incorrectly, in external modules, leading to crashes.  This can be
partly attributed to the fact that ioremap_page_range is lower-level,
with fewer protections, as compared to the other functions that an
external module would typically call.  Those include:

     ioremap_cache
     ioremap_nocache
     ioremap_prot
     ioremap_uc
     ioremap_wc
     ioremap_wt

...each of which wraps __ioremap_caller, which in turn provides a safer
way to achieve the mapping.

Therefore, stop EXPORT-ing ioremap_page_range.

Link: http://lkml.kernel.org/r/1485173220-29010-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomn10300: fix build error of missing fpu_save()
Randy Dunlap [Tue, 24 Jan 2017 23:18:49 +0000 (15:18 -0800)]
mn10300: fix build error of missing fpu_save()

When CONFIG_FPU is not enabled on arch/mn10300, <asm/switch_to.h> causes
a build error with a call to fpu_save():

  kernel/built-in.o: In function `.L410':
  core.c:(.sched.text+0x28a): undefined reference to `fpu_save'

Fix this by including <asm/fpu.h> in <asm/switch_to.h> so that an empty
static inline fpu_save() is defined.

Link: http://lkml.kernel.org/r/dc421c4f-4842-4429-1b99-92865c2f24b6@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago romfs: use different way to generate fsid for BLOCK or MTD
Coly Li [Tue, 24 Jan 2017 23:18:46 +0000 (15:18 -0800)]
romfs: use different way to generate fsid for BLOCK or MTD

Commit 8a59f5d25265 ("fs/romfs: return f_fsid for statfs(2)") generates
a 64bit id from sb->s_bdev->bd_dev.  This is only correct when romfs is
defined with CONFIG_ROMFS_ON_BLOCK.  If romfs is only defined with
CONFIG_ROMFS_ON_MTD, sb->s_bdev is NULL, so referencing
sb->s_bdev->bd_dev will trigger an oops.

Richard Weinberger points out that when CONFIG_ROMFS_BACKED_BY_BOTH=y,
both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD are defined.
Therefore when calling huge_encode_dev() to generate a 64bit id, I use
the following order to choose the parameter:

- CONFIG_ROMFS_ON_BLOCK defined
  use sb->s_bdev->bd_dev
- CONFIG_ROMFS_ON_BLOCK undefined and CONFIG_ROMFS_ON_MTD defined
  use sb->s_dev
- both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD undefined
  leave id as 0

When CONFIG_ROMFS_ON_MTD is defined and sb->s_mtd is not NULL, sb->s_dev
is set to a device ID generated by MTD_BLOCK_MAJOR and mtd index,
otherwise sb->s_dev is 0.

This is a best-effort attempt to generate a unique file system ID; if
none of the above conditions is met, f_fsid of this romfs instance will
be 0.  Since generally only one romfs can be built on a single MTD block
device, this method is enough to identify multiple romfs instances in a
computer.
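
For illustration only, the selection order above can be modelled in
userspace as follows.  This is a minimal sketch, not the patch itself:
struct sb_model, romfs_fsid_source() and the on_block/on_mtd booleans
are hypothetical stand-ins for the superblock fields and the two
CONFIG_ROMFS_ON_* options, and the has_bdev guard is an assumption added
so the model stays safe when both backends are configured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the relevant superblock state (hypothetical). */
struct sb_model {
	bool     has_bdev;   /* models sb->s_bdev != NULL */
	uint32_t bdev_dev;   /* models sb->s_bdev->bd_dev */
	uint32_t s_dev;      /* models sb->s_dev          */
};

/*
 * Pick the device number that feeds the 64bit id, in the order listed
 * in the commit message.  on_block and on_mtd model the two config
 * options as runtime flags.
 */
uint64_t romfs_fsid_source(const struct sb_model *sb,
			   bool on_block, bool on_mtd)
{
	assert(sb != NULL);

	if (on_block && sb->has_bdev)
		return sb->bdev_dev;	/* block-backed romfs */
	if (on_mtd)
		return sb->s_dev;	/* 0 when no MTD device is attached */
	return 0;			/* neither backend: leave id as 0 */
}
```

In this model a block-backed instance reports its block device number,
an MTD-only instance falls back to s_dev, and everything else yields 0.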

Link: http://lkml.kernel.org/r/1482928596-115155-1-git-send-email-colyli@suse.de
Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Nong Li <nongli1031@gmail.com>
Tested-by: Nong Li <nongli1031@gmail.com>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago frv: add missing atomic64 operations
Sudip Mukherjee [Tue, 24 Jan 2017 23:18:43 +0000 (15:18 -0800)]
frv: add missing atomic64 operations

Some more atomic64 operations were missing and as a result frv
allmodconfig was failing.  Add the missing operations.

Link: http://lkml.kernel.org/r/1485193844-12850-1-git-send-email-sudip.mukherjee@codethink.co.uk
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm, page_alloc: fix premature OOM when racing with cpuset mems update
Vlastimil Babka [Tue, 24 Jan 2017 23:18:41 +0000 (15:18 -0800)]
mm, page_alloc: fix premature OOM when racing with cpuset mems update

Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers the OOM killer in a few seconds, despite lots of free memory.  The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.

The problem comes from insufficient protection against cpuset changes,
which can cause get_page_from_freelist() to consider all zones as
non-eligible due to nodemask and/or current->mems_allowed.  This was
masked in the past by sufficient retries, but since commit 682a3385e773
("mm, page_alloc: inline the fast path of the zonelist iterator") we fix
the preferred_zoneref once, and don't iterate over the whole zonelist in
further attempts, thus the only eligible zones might be placed in the
zonelist before our starting point and we always miss them.

A previous patch fixed this problem for current->mems_allowed.  However,
cpuset changes also update the task's mempolicy nodemask.  The fix has
two parts.  We have to repeat the preferred_zoneref search when we
detect cpuset update by way of seqcount, and we have to check the
seqcount before considering OOM.
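
As a rough illustration of the second part (check the seqcount before
considering OOM), here is a single-threaded userspace model.  It is only
a sketch under simplifying assumptions: struct cpuset_model,
mems_begin(), mems_retry(), mems_update() and alloc_decide() are
hypothetical names, node sets are plain bitmasks, and the counter is a
bare integer rather than a real seqcount with odd/even write phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cpuset_model {
	unsigned int seq;      /* bumped on every update of the allowed nodes */
	unsigned int allowed;  /* bitmask of currently allowed nodes */
};

enum alloc_result { ALLOC_OK, ALLOC_RETRY, ALLOC_OOM };

/* Take a cookie before looking at the allowed nodes. */
unsigned int mems_begin(const struct cpuset_model *cs)
{
	assert(cs != NULL);
	return cs->seq;
}

/* True if the allowed nodes changed since the cookie was taken. */
bool mems_retry(const struct cpuset_model *cs, unsigned int cookie)
{
	return cs->seq != cookie;
}

/* Writer side: change the allowed nodes and bump the counter. */
void mems_update(struct cpuset_model *cs, unsigned int allowed)
{
	cs->allowed = allowed;
	cs->seq++;
}

/*
 * snapshot is the node mask the allocation attempt started with and
 * node_free the nodes that still have memory.  A failed attempt is
 * only an OOM candidate if the cpuset did not change underneath us.
 */
enum alloc_result alloc_decide(const struct cpuset_model *cs,
			       unsigned int cookie,
			       unsigned int snapshot,
			       unsigned int node_free)
{
	if (snapshot & node_free)
		return ALLOC_OK;
	if (mems_retry(cs, cookie))
		return ALLOC_RETRY;	/* redo the search with fresh state */
	return ALLOC_OOM;
}
```

With a stale snapshot the model asks for a retry instead of reporting
OOM; only a failure under an unchanged cookie is treated as real.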

[akpm@linux-foundation.org: fix typo in comment]
Link: http://lkml.kernel.org/r/20170120103843.24587-5-vbabka@suse.cz
Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm, page_alloc: move cpuset seqcount checking to slowpath
Vlastimil Babka [Tue, 24 Jan 2017 23:18:38 +0000 (15:18 -0800)]
mm, page_alloc: move cpuset seqcount checking to slowpath

This is a preparation for the following patch to make review simpler.
While the primary motivation is a bug fix, this also simplifies the fast
path, although the moved code is only enabled when cpusets are in use.

Link: http://lkml.kernel.org/r/20170120103843.24587-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm, page_alloc: fix fast-path race with cpuset update or removal
Vlastimil Babka [Tue, 24 Jan 2017 23:18:35 +0000 (15:18 -0800)]
mm, page_alloc: fix fast-path race with cpuset update or removal

Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode
triggers the OOM killer in a few seconds, despite lots of free memory.  The
test attempts to repeatedly fault in memory in one process in a cpuset,
while changing allowed nodes of the cpuset between 0 and 1 in another
process.

One possible cause is that in the fast path we find the preferred
zoneref according to current mems_allowed, so that it points to the
middle of the zonelist, skipping e.g. zones of node 1 completely.  If
the mems_allowed is updated to contain only node 1, we never reach it in
the zonelist, and trigger OOM before checking the cpuset_mems_cookie.

This patch fixes the particular case by redoing the preferred zoneref
search if we switch back to the original nodemask.  The condition is
also slightly changed so that when the last non-root cpuset is removed,
we don't miss it.

Note that this is not a full fix, and more patches will follow.

Link: http://lkml.kernel.org/r/20170120103843.24587-3-vbabka@suse.cz
Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm, page_alloc: fix check for NULL preferred_zone
Vlastimil Babka [Tue, 24 Jan 2017 23:18:32 +0000 (15:18 -0800)]
mm, page_alloc: fix check for NULL preferred_zone

Patch series "fix premature OOM regression in 4.7+ due to cpuset races".

This is v2 of my attempt to fix the recent report based on LTP cpuset
stress test [1].  The intention is to go to stable 4.9 LTSS with this,
as triggering repeated OOMs is not nice.  That's why the patches try to
be not too intrusive.

Unfortunately, while investigating I found that modifying the testcase to
use per-VMA policies instead of per-task policies will bring the OOMs
back, but that seems to be a much older and harder-to-fix problem.  I have
posted an RFC [2] but I believe that fixing the recent regressions has a
higher priority.

Longer-term we might try to think about how to fix the cpuset mess in a
better and less error-prone way.  I was, for example, very surprised to
learn that cpuset updates change not only task->mems_allowed, but also
the nodemask of mempolicies.  Until now I expected the parameter to
alloc_pages_nodemask() to be stable.  I wonder why we then treat
cpusets specially in get_page_from_freelist() and distinguish HARDWALL
etc, when there's unconditional intersection between mempolicy and
cpuset.  I would expect the nodemask adjustment for saving overhead in
g_p_f(), but that clearly doesn't happen in the current form.  So we
have both crazy complexity and overhead, AFAICS.

[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz

This patch (of 4):

Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
which can theoretically happen due to concurrent cpuset modification.  We
check the zoneref pointer which is never NULL and we should check the zone
pointer.  Also document this in first_zones_zonelist() comment per Michal
Hocko.
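
The distinction between the two pointers can be shown with a small
userspace model.  This is a sketch only: struct zone_model, struct
zoneref_model, first_allowed() and have_preferred_zone() are
hypothetical names, and the zonelist is reduced to an array ending in an
entry whose zone pointer is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct zone_model { int node; };

/* One zonelist entry; the list ends with an entry whose zone is NULL. */
struct zoneref_model { struct zone_model *zone; };

/*
 * Return the first entry whose node is in nodemask.  If nothing
 * matches, the terminating entry is returned, so the result itself is
 * never NULL.
 */
struct zoneref_model *first_allowed(struct zoneref_model *list,
				    unsigned int nodemask)
{
	assert(list != NULL);
	while (list->zone && !(nodemask & (1u << list->zone->node)))
		list++;
	return list;
}

/* The meaningful test is on the zone pointer, not on the entry. */
bool have_preferred_zone(const struct zoneref_model *ref)
{
	return ref->zone != NULL;
}
```

A check written as "ref != NULL" would pass even when no zone matched,
which is the kind of mistake described above.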

Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago kernel/panic.c: add missing \n
Jiri Slaby [Tue, 24 Jan 2017 23:18:29 +0000 (15:18 -0800)]
kernel/panic.c: add missing \n

When a system panics, the "Rebooting in X seconds.." message is never
printed because it lacks a new line.  Fix it.

Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago fbdev: color map copying bounds checking
Kees Cook [Tue, 24 Jan 2017 23:18:24 +0000 (15:18 -0800)]
fbdev: color map copying bounds checking

Copying color maps to userspace doesn't check the value of to->start,
which will cause a kernel heap buffer OOB read due to signedness wraps.
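
A minimal userspace sketch of the kind of bounds check this calls for is
given below.  It is an illustration under assumptions, not the patch:
struct cmap_model and cmap_copy_window() are hypothetical names, and the
model only computes the overlapping window of two color maps using
unsigned arithmetic, rejecting the copy when either offset falls outside
its map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced color map: first index and number of entries. */
struct cmap_model {
	uint32_t start;
	uint32_t len;
};

/*
 * Compute which entries of 'from' overlap 'to'.  Offsets are unsigned,
 * so a huge to->start cannot turn into a negative offset.  Returns
 * false when the maps do not overlap.
 */
bool cmap_copy_window(const struct cmap_model *from,
		      const struct cmap_model *to,
		      size_t *fromoff, size_t *tooff, size_t *count)
{
	size_t foff = 0, toff = 0, n;

	assert(from && to && fromoff && tooff && count);

	if (to->start > from->start)
		foff = to->start - from->start;
	else
		toff = from->start - to->start;

	/* Reject offsets that lie outside either map. */
	if (foff >= from->len || toff >= to->len)
		return false;

	n = to->len - toff;
	if (from->len - foff < n)
		n = from->len - foff;

	*fromoff = foff;
	*tooff = toff;
	*count = n;
	return true;
}
```

With signed offsets, a start value such as 0x80000000 would wrap
negative and slip past a naive comparison; here it is simply rejected.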

CVE-2016-8405

Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Peter Pi (@heisecode) of Trend Micro
Cc: Min Chong <mchong@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago frv: add atomic64_add_unless()
Sudip Mukherjee [Tue, 24 Jan 2017 23:18:21 +0000 (15:18 -0800)]
frv: add atomic64_add_unless()

The build of frv allmodconfig was failing with the error:

  lib/atomic64_test.c:209:9: error: implicit declaration of function 'atomic64_add_unless'

All the atomic64 operations were defined in frv, but
atomic64_add_unless() was not.

Implement atomic64_add_unless() as done in other arches.
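
For readers unfamiliar with the operation, its semantics (add a value
unless the counter currently equals a given value) can be sketched in
userspace with C11 atomics.  This is an illustration only, not the frv
implementation: add_unless64() is a hypothetical name and the
compare-and-exchange loop is just one common way to build it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Atomically add 'a' to *v unless *v equals 'u'.
 * Returns true if the addition was performed.
 */
bool add_unless64(_Atomic int64_t *v, int64_t a, int64_t u)
{
	int64_t cur;

	assert(v != NULL);

	cur = atomic_load(v);
	while (cur != u) {
		/* On failure, cur is refreshed with the value now stored. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}
```

The loop retries only when another thread changed the counter between
the load and the exchange, and gives up as soon as the forbidden value
is observed.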

Link: http://lkml.kernel.org/r/1484781236-6698-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm/mempolicy.c: do not put mempolicy before using its nodemask
Vlastimil Babka [Tue, 24 Jan 2017 23:18:18 +0000 (15:18 -0800)]
mm/mempolicy.c: do not put mempolicy before using its nodemask

Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to
alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by
mpol_cond_put() before accessing the embedded nodemask by
__alloc_pages_nodemask().  The commit log says it's so "we can use a
single exit path within the function" but that's clearly wrong.  We can
still do that when doing mpol_cond_put() after the allocation attempt.

Make sure the mempolicy is not freed prematurely, otherwise
__alloc_pages_nodemask() can end up using a bogus nodemask, which could
lead e.g. to premature OOM.

Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma")
Link: http://lkml.kernel.org/r/20170118141124.8345-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago radix-tree: fix private list warnings
Matthew Wilcox [Tue, 24 Jan 2017 23:18:16 +0000 (15:18 -0800)]
radix-tree: fix private list warnings

The newly introduced warning in radix_tree_free_nodes() was testing the
wrong variable; it should have been 'old' instead of 'node'.

Fixes: ea07b862ac8e ("mm: workingset: fix use-after-free in shadow node shrinker")
Link: http://lkml.kernel.org/r/20170118163746.GA32495@cmpxchg.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago Documentation/filesystems/proc.txt: add VmPin
Fabian Frederick [Tue, 24 Jan 2017 23:18:13 +0000 (15:18 -0800)]
Documentation/filesystems/proc.txt: add VmPin

Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages")
added VmPin in /proc/<pid>/status.  Report that in
Documentation/filesystems/proc.txt

Also move Umask after Name to keep correct order.

Link: http://lkml.kernel.org/r/20170114201219.30387-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm, memcg: do not retry precharge charges
David Rientjes [Tue, 24 Jan 2017 23:18:10 +0000 (15:18 -0800)]
mm, memcg: do not retry precharge charges

When memory.move_charge_at_immigrate is enabled and precharges are
depleted during move, mem_cgroup_move_charge_pte_range() will attempt to
increase the size of the precharge.

Prevent precharges from ever looping by setting __GFP_NORETRY.  This was
probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is
pointless as written.
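
Why the old expression is a no-op can be seen with plain bit arithmetic.
The sketch below uses made-up bit values (MODEL_GFP_KERNEL and
MODEL_GFP_NORETRY are hypothetical; the real definitions live in
include/linux/gfp.h and differ).  The only property relied on is that
the NORETRY bit is not part of the GFP_KERNEL mask.

```c
#include <assert.h>

/* Illustrative values only, not the kernel's. */
#define MODEL_GFP_KERNEL   0x07u
#define MODEL_GFP_NORETRY  0x10u

/* Clearing a bit that was never set leaves the mask unchanged. */
unsigned int gfp_as_written(void)
{
	assert((MODEL_GFP_KERNEL & MODEL_GFP_NORETRY) == 0);
	return MODEL_GFP_KERNEL & ~MODEL_GFP_NORETRY;
}

/* OR-ing the bit in is one way to actually set it. */
unsigned int gfp_as_intended(void)
{
	return MODEL_GFP_KERNEL | MODEL_GFP_NORETRY;
}
```

In this model gfp_as_written() returns the unmodified mask, while
gfp_as_intended() carries the no-retry bit.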

Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path")
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1701130208510.69402@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago proc: add a schedule point in proc_pid_readdir()
Eric Dumazet [Tue, 24 Jan 2017 23:18:07 +0000 (15:18 -0800)]
proc: add a schedule point in proc_pid_readdir()

We have seen proc_pid_readdir() invocations holding the CPU for more
than 50 ms.  Add a cond_resched() to be gentle with other tasks.

[akpm@linux-foundation.org: coding style fix]
Link: http://lkml.kernel.org/r/1484238380.15816.42.camel@edumazet-glaptop3.roam.corp.google.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm: alloc_contig: re-allow CMA to compact FS pages
Lucas Stach [Tue, 24 Jan 2017 23:18:05 +0000 (15:18 -0800)]
mm: alloc_contig: re-allow CMA to compact FS pages

Commit 73e64c51afc5 ("mm, compaction: allow compaction for GFP_NOFS
requests") changed compaction to skip FS pages if not explicitly allowed
to touch them, but failed to update the CMA compact_control.

This leads to a very high isolation failure rate, crippling performance
of CMA even on a lightly loaded system.  Re-allow CMA to compact FS
pages by setting the correct GFP flags, restoring CMA behavior and
performance to the kernel 4.9 level.

Fixes: 73e64c51afc5 ("mm, compaction: allow compaction for GFP_NOFS requests")
Link: http://lkml.kernel.org/r/20170113115155.24335-1-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm/slub.c: trace free objects at KERN_INFO
Daniel Thompson [Tue, 24 Jan 2017 23:18:02 +0000 (15:18 -0800)]
mm/slub.c: trace free objects at KERN_INFO

Currently when trace is enabled (e.g. slub_debug=T,kmalloc-128) the
trace messages are mostly output at KERN_INFO.  However the trace code
also calls print_section() to hexdump the head of a free object.  This
is hard coded to use KERN_ERR, meaning the console is deluged with trace
messages even if we've asked for quiet.

Fix this the obvious way by adding a level parameter to
print_section(), allowing calls from the trace code to use the same
trace level as other trace messages.

Link: http://lkml.kernel.org/r/20170113154850.518-1-daniel.thompson@linaro.org
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago userfaultfd: fix SIGBUS resulting from false rwsem wakeups
Andrea Arcangeli [Tue, 24 Jan 2017 23:17:59 +0000 (15:17 -0800)]
userfaultfd: fix SIGBUS resulting from false rwsem wakeups

With >=32 CPUs the userfaultfd selftest triggered a graceful but
unexpected SIGBUS because VM_FAULT_RETRY was returned by
handle_userfault() even though the UFFDIO_COPY had not completed.

This seems to be caused by rwsem waking the thread blocked in
handle_userfault(), and we can't run up_read() before the wait_event
sequence is complete.

Keeping the wait_event sequence identical to the first one would require
running userfaultfd_must_wait() again to know if the loop should be
repeated, and it would also require retaking the rwsem and revalidating
the whole vma status.

It seems simpler to wait for the targeted wakeup so that if false wakeups
materialize we still wait for our specific wakeup event, unless of
course there are signals or the uffd was released.

Debug code collecting the stack trace of the wakeup showed this:

  $ ./userfaultfd 100 99999
  nr_pages: 25600, nr_pages_per_cpu: 800
  bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
  bounces: 99997, mode: rnd ver poll, Bus error (core dumped)

    save_stack_trace+0x2b/0x50
    try_to_wake_up+0x2a6/0x580
    wake_up_q+0x32/0x70
    rwsem_wake+0xe0/0x120
    call_rwsem_wake+0x1b/0x30
    up_write+0x3b/0x40
    vm_mmap_pgoff+0x9c/0xc0
    SyS_mmap_pgoff+0x1a9/0x240
    SyS_mmap+0x22/0x30
    entry_SYSCALL_64_fastpath+0x1f/0xbd
    0xffffffffffffffff
    FAULT_FLAG_ALLOW_RETRY missing 70
  CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G        W       4.8.0+ #30
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  Call Trace:
    dump_stack+0xb8/0x112
    handle_userfault+0x572/0x650
    handle_mm_fault+0x12cb/0x1520
    __do_page_fault+0x175/0x500
    trace_do_page_fault+0x61/0x270
    do_async_page_fault+0x19/0x90
    async_page_fault+0x25/0x30

This always happens when the main userfault selftest thread is running
clone() while glibc runs either mprotect or mmap (both taking mmap_sem
down_write()) to allocate the thread stack of the background threads,
while locking/userfault threads already run at full throttle and are
susceptible to false wakeups that may cause handle_userfault() to return
earlier than expected (which results in a graceful SIGBUS at the next
attempt).

This was reproduced only with >=32 CPUs because the loop that starts the
threads with clone() is too quick with fewer CPUs, while with 32 CPUs
there's already significant activity on ~32 locking and userfault
threads when the last background threads are started with clone().

This >=32 CPUs SMP race condition is likely reproducible only with the
selftest because of the much heavier userfault load it generates
compared to real apps.

We'll have to allow "one more" VM_FAULT_RETRY for the WP support, and a
patch floating around that provides it also hid this problem, but in
reality it only succeeds at hiding the problem.

False wakeups could still happen again the second time
handle_userfault() is invoked, even if it is such a rare race condition
that getting false wakeups twice in a row is impossible to reproduce.
This full fix is needed for correctness; the only alternative would be
to allow VM_FAULT_RETRY to be returned infinitely.  With this fix the WP
support can stick to a strict "one more" VM_FAULT_RETRY logic (no need
to return it infinite times to avoid the SIGBUS).

Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Shubham Kumar Sharma <shubham.kumar.sharma@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago drivers/memstick/core/memstick.c: avoid -Wnonnull warning
Arnd Bergmann [Tue, 24 Jan 2017 23:17:56 +0000 (15:17 -0800)]
drivers/memstick/core/memstick.c: avoid -Wnonnull warning

gcc-7 produces a harmless false-positive warning about a possible NULL
pointer access:

  drivers/memstick/core/memstick.c: In function 'h_memstick_read_dev_id':
  drivers/memstick/core/memstick.c:309:3: error: argument 2 null where non-null expected [-Werror=nonnull]
     memcpy(mrq->data, buf, mrq->data_len);

This can't happen because the caller sets the command to 'MS_TPC_READ_REG',
which causes the data direction to be 'READ', so the NULL pointer is
never accessed.

As a simple workaround for the warning, we can pass a pointer to the
data that we actually want to read into.  This is not needed here, but
also harmless, and lets the compiler know that the access is ok.

Link: http://lkml.kernel.org/r/20170111144143.548867-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago kernel/watchdog: prevent false hardlockup on overloaded system
Don Zickus [Tue, 24 Jan 2017 23:17:53 +0000 (15:17 -0800)]
kernel/watchdog: prevent false hardlockup on overloaded system

On an overloaded system, it is possible that a change in the watchdog
threshold can be delayed long enough to trigger a false positive.

This can easily be achieved by having a cpu spinning indefinitely on a
task, while another cpu updates watchdog threshold.

What happens is while trying to park the watchdog threads, the hrtimers
on the other cpus trigger and reprogram themselves with the new slower
watchdog threshold.  Meanwhile, the nmi watchdog is still programmed
with the old faster threshold.

Because the one cpu is blocked, it prevents the thread parking on the
other cpus from completing, which is needed to shutdown the nmi watchdog
and reprogram it correctly.  As a result, a false positive from the nmi
watchdog is reported.

Fix this by setting a park_in_progress flag to block all lockups until
the parking is complete.

Fix provided by Ulrich Obergfell.

[akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago dax: fix build warnings with FS_DAX and !FS_IOMAP
Ross Zwisler [Tue, 24 Jan 2017 23:17:51 +0000 (15:17 -0800)]
dax: fix build warnings with FS_DAX and !FS_IOMAP

As reported by Arnd:

  https://lkml.org/lkml/2017/1/10/756

Compiling with the following configuration:

  # CONFIG_EXT2_FS is not set
  # CONFIG_EXT4_FS is not set
  # CONFIG_XFS_FS is not set
  # CONFIG_FS_IOMAP depends on the above filesystems, as is not set
  CONFIG_FS_DAX=y

generates build warnings about unused functions in fs/dax.c:

  fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function]
   static int dax_insert_mapping(struct address_space *mapping,
              ^~~~~~~~~~~~~~~~~~
  fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function]
   static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size,
              ^~~~~~~~~~~~~
  fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function]
   static int dax_load_hole(struct address_space *mapping, void **entry,
              ^~~~~~~~~~~~~
  fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function]
   static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
                ^~~~~~~~~~~~~~~~~~

Now that the struct buffer_head based DAX fault paths and I/O path have
been removed we really depend on iomap support being present for DAX.
Make this explicit by selecting FS_IOMAP if we compile in DAX support.

This allows us to remove conditional selections of FS_IOMAP when FS_DAX
was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c.

Link: http://lkml.kernel.org/r/1484087383-29478-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
Keno Fischer [Tue, 24 Jan 2017 23:17:48 +0000 (15:17 -0800)]
mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp

In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from
__get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
after a COW was resolved to setting the (newly introduced) FOLL_COW
instead.  Simultaneously, the check in gup.c was updated to still allow
writes with FOLL_FORCE set if FOLL_COW had also been set.

However, a similar check in huge_memory.c was forgotten.  As a result,
remote memory writes to ro regions of memory backed by transparent huge
pages cause an infinite loop in the kernel (handle_mm_fault sets
FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
out immediately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
true).

While in this state the process is still SIGKILLable, but little else
works (e.g. no ptrace attach, no other signals).  This is easily
reproduced with the following code (assuming thp are set to always):

    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define TEST_SIZE (5 * 1024 * 1024)

    int main(void) {
      int status;
      pid_t child;
      int fd = open("/proc/self/mem", O_RDWR);
      void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                        MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
      assert(addr != MAP_FAILED);
      pid_t parent_pid = getpid();
      if ((child = fork()) == 0) {
        void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                           MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
        assert(addr2 != MAP_FAILED);
        memset(addr2, 'a', TEST_SIZE);
        pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
        return 0;
      }
      assert(child == waitpid(child, &status, 0));
      assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
      return 0;
    }

Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
to the update in gup.c in the original commit.  The same pattern exists
in follow_devmap_pmd.  However, we should not be able to reach that
check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
ever do.
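
The shape of the adjusted check can be modelled in userspace as follows.
This is a sketch under stated assumptions, not the patch: the M_FOLL_*
values are illustrative, struct pmd_model, can_follow_write() and
must_bail_out() are hypothetical names, and the extra dirty-bit
condition reflects my reading of the gup.c helper rather than anything
stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for this model only. */
#define M_FOLL_WRITE 0x01u
#define M_FOLL_FORCE 0x10u
#define M_FOLL_COW   0x4000u

/* Reduced view of a huge page table entry. */
struct pmd_model {
	bool writable;
	bool dirty;
};

/*
 * A write may proceed if the entry is writable, or if this is a forced
 * write whose COW has already been resolved (assumed here to also
 * require a dirty entry).
 */
bool can_follow_write(struct pmd_model pmd, unsigned int flags)
{
	/* The model only knows about the three flags above. */
	assert((flags & ~(M_FOLL_WRITE | M_FOLL_FORCE | M_FOLL_COW)) == 0);

	return pmd.writable ||
	       ((flags & M_FOLL_FORCE) && (flags & M_FOLL_COW) && pmd.dirty);
}

/* True when the lookup has to give up and let the fault path run. */
bool must_bail_out(struct pmd_model pmd, unsigned int flags)
{
	return (flags & M_FOLL_WRITE) && !can_follow_write(pmd, flags);
}
```

With the old condition, a read-only entry always bailed out for
FOLL_WRITE, so the retry never made progress; in the model the forced
write is let through once FOLL_COW is set.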

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years ago memory_hotplug: make zone_can_shift() return a boolean value
Yasuaki Ishimatsu [Tue, 24 Jan 2017 23:17:45 +0000 (15:17 -0800)]
memory_hotplug: make zone_can_shift() return a boolean value

online_{kernel|movable} is used to change the memory zone to
ZONE_{NORMAL|MOVABLE} and online the memory.

To check that the memory zone can be changed, zone_can_shift() is used.
Currently the function returns a negative integer, a positive integer,
or 0. When the function returns a negative or positive integer, it
means that the memory zone can be changed to ZONE_{NORMAL|MOVABLE}.

But when the function returns 0, there are two meanings.

One of the meanings is that the memory zone does not need to be changed.
For example, when memory is in ZONE_NORMAL and onlined by online_kernel
the memory zone does not need to be changed.

The other meaning is that the memory zone cannot be changed. When memory
is in ZONE_NORMAL and onlined by online_movable, the memory zone may
not be changed to ZONE_MOVABLE due to a memory online limitation (see
Documentation/memory-hotplug.txt). In this case, memory must not be
onlined.

The patch changes the return type of zone_can_shift() so that memory
online operation fails when memory zone cannot be changed as follows:

Before applying patch:
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320
   # echo online_movable > memory4097/state
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  8388608
           managed  8388608

   online_movable operation succeeded. But memory is onlined as
   ZONE_NORMAL, not ZONE_MOVABLE.

After applying patch:
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320
   # echo online_movable > memory4097/state
   bash: echo: write error: Invalid argument
   # grep -A 35 "Node 2" /proc/zoneinfo
   Node 2, zone   Normal
   <snip>
      node_scanned  0
           spanned  8388608
           present  7864320
           managed  7864320

   online_movable operation failed because the memory zone could not
   be changed from ZONE_NORMAL to ZONE_MOVABLE.
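
The ambiguity of the old return value can be illustrated with a small
userspace sketch. It is not the kernel code: the zone enum, can_shift(),
online_memory() and the movable_allowed flag are hypothetical stand-ins.
The point is only that a boolean result plus a separate shift output
lets "no change needed" and "change impossible" be told apart.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone indices; not the kernel's enum zone_type. */
enum zone_idx { ZONE_IDX_NORMAL = 0, ZONE_IDX_MOVABLE = 1 };

/*
 * Returns true if memory in zone `cur` may be onlined into `target` and
 * stores the signed distance between the two zones in *shift.
 * `movable_allowed` stands in for the real online limitation check.
 */
static bool can_shift(enum zone_idx cur, enum zone_idx target,
                      bool movable_allowed, int *shift)
{
    assert(shift != NULL);
    *shift = 0;

    if (cur == target)
        return true;    /* no change needed: success with shift 0 */

    if (target == ZONE_IDX_MOVABLE && !movable_allowed)
        return false;   /* change impossible: the caller must fail */

    *shift = (int)target - (int)cur;
    return true;
}

/* Caller sketch: 0 on success, -EINVAL when the zone cannot be changed. */
static int online_memory(enum zone_idx cur, enum zone_idx target,
                         bool movable_allowed, enum zone_idx *result)
{
    int shift;

    if (!can_shift(cur, target, movable_allowed, &shift))
        return -EINVAL;

    *result = (enum zone_idx)((int)cur + shift);
    return 0;
}
```

With a single integer return, the first two cases in can_shift() would
both have produced 0 and the caller could not reject the second one.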

Fixes: df429ac03936 ("memory-hotplug: more general validation of zone during online")
Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agovring: Force use of DMA API for ARM-based systems with legacy devices
Will Deacon [Fri, 20 Jan 2017 10:33:32 +0000 (10:33 +0000)]
vring: Force use of DMA API for ARM-based systems with legacy devices

Booting Linux on an ARM fastmodel containing an SMMU emulation results
in an unexpected I/O page fault from the legacy virtio-blk PCI device:

[    1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[    1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
[    1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
[    1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002
[    1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000
[    1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received:
[    1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010
[    1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000
[    1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000
[    1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000

<system hangs failing to read partition table>

This is because the legacy virtio-blk device is behind an SMMU, so we
have consequently swizzled its DMA ops and configured the SMMU to
translate accesses. This then requires the vring code to use the DMA API
to establish translations, otherwise all transactions will result in
fatal faults and termination.

Given that ARM-based systems only see an SMMU if one is really present
(the topology is all described by firmware tables such as device-tree or
IORT), then we can safely use the DMA API for all legacy virtio devices.
Modern devices can advertise the presence of an IOMMU using the
VIRTIO_F_IOMMU_PLATFORM feature flag.
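
The decision described above can be sketched as a small predicate. This
is a userspace illustration under stated assumptions, not the vring
code: struct dev_info and use_dma_api() are hypothetical names, and
other platform-specific cases are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the real check consults. */
struct dev_info {
    bool has_iommu_platform; /* device offered VIRTIO_F_IOMMU_PLATFORM */
    bool is_legacy;          /* legacy virtio device */
    bool arch_is_arm;        /* kernel built for an ARM-based system */
};

/* Returns true if ring buffers should be mapped through the DMA API. */
static bool use_dma_api(const struct dev_info *d)
{
    assert(d != NULL);

    /* A modern device that advertises an IOMMU always uses the DMA API. */
    if (d->has_iommu_platform)
        return true;

    /*
     * On ARM-based systems an SMMU is only visible if firmware describes
     * one, so legacy devices can safely go through the DMA API too.
     */
    if (d->arch_is_arm && d->is_legacy)
        return true;

    /* Otherwise keep the historical behaviour of bypassing the DMA API. */
    return false;
}
```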

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: <stable@vger.kernel.org>
Fixes: 876945dbf649 ("arm64: Hook up IOMMU dma_ops")
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
8 years agovirtio_mmio: Set DMA masks appropriately
Robin Murphy [Tue, 10 Jan 2017 17:51:17 +0000 (17:51 +0000)]
virtio_mmio: Set DMA masks appropriately

Once DMA API usage is enabled, it becomes apparent that virtio-mmio is
inadvertently relying on the default 32-bit DMA mask, which leads to
problems like rapidly exhausting SWIOTLB bounce buffers.

Ensure that we set the appropriate 64-bit DMA mask whenever possible,
with the coherent mask suitably limited for the legacy vring as per
a0be1db4304f ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio
devices").
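
As a rough illustration of the mask selection, the sketch below computes
the two masks in userspace. The helper names, the struct, and the 4 KiB
page size are assumptions made for the example; only the 64-bit and
44-bit figures come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low n bits set, in the spirit of DMA_BIT_MASK(). */
static uint64_t dma_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

struct dma_masks {
    uint64_t streaming; /* mask for ordinary buffer mappings */
    uint64_t coherent;  /* mask for the memory holding the vring */
};

/*
 * wide_ok models whether the platform accepts a 64-bit mask; when it
 * does not, both masks fall back to the 32-bit default.
 */
static struct dma_masks pick_masks(bool legacy, bool wide_ok)
{
    struct dma_masks m;

    if (!wide_ok) {
        m.streaming = dma_bit_mask(32);
        m.coherent = dma_bit_mask(32);
        return m;
    }

    m.streaming = dma_bit_mask(64);
    /* Legacy ring limit: 32 + 12 = 44 bits with 4 KiB pages. */
    m.coherent = legacy ? dma_bit_mask(32 + SKETCH_PAGE_SHIFT)
                        : dma_bit_mask(64);
    return m;
}
```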

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Reported-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Fixes: b42111382f0e ("virtio_mmio: Use the DMA API if enabled")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
8 years agovhost/vsock: handle vhost_vq_init_access() error
Stefan Hajnoczi [Thu, 19 Jan 2017 10:43:53 +0000 (10:43 +0000)]
vhost/vsock: handle vhost_vq_init_access() error

Propagate the error when vhost_vq_init_access() fails and set
vq->private_data to NULL.
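
The pattern can be sketched in userspace as follows. Everything here
(struct fake_vq, start_vq(), the init hooks) is hypothetical and only
mirrors the shape of the change: roll back private_data and return the
error instead of ignoring it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue with an owner pointer. */
struct fake_vq {
    void *private_data;
};

/* Hypothetical init hook: returns 0 or a negative errno value. */
typedef int (*init_fn)(struct fake_vq *vq);

static int start_vq(struct fake_vq *vq, void *owner, init_fn init)
{
    int ret;

    assert(vq != NULL && init != NULL);

    vq->private_data = owner;    /* mark the queue as in use */
    ret = init(vq);
    if (ret) {
        vq->private_data = NULL; /* undo: the queue must not look active */
        return ret;              /* propagate the failure to the caller */
    }
    return 0;
}

/* Two trivial hooks used to exercise both paths. */
static int init_ok(struct fake_vq *vq)   { (void)vq; return 0; }
static int init_fail(struct fake_vq *vq) { (void)vq; return -EFAULT; }
```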

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
8 years agoMerge tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform...
Linus Torvalds [Tue, 24 Jan 2017 20:38:43 +0000 (12:38 -0800)]
Merge tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform-driver fixes from Andy Shevchenko:
 "This is my first pull request since I became a co-maintainer of the
  Platform Drivers x86 subsystem. It's a bit bigger than usual due to
  material collected for almost two weeks in a row.

  MAINTAINERS:
   - Add myself to X86 PLATFORM DRIVERS as a co-maintainer

  ideapad-laptop:
   - handle ACPI event 1

  intel_mid_powerbtn:
   - Set IRQ_ONESHOT

  surface3-wmi:
   - fix uninitialized symbol
   - Shut up unused-function warning

  mlx-platform:
   - free first dev on error"

* tag 'platform-drivers-x86-v4.10-4' of git://git.infradead.org/linux-platform-drivers-x86:
  MAINTAINERS: Add myself to X86 PLATFORM DRIVERS as a co-maintainer
  platform/x86: ideapad-laptop: handle ACPI event 1
  platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT
  platform/x86: surface3-wmi: fix uninitialized symbol
  platform/x86: surface3-wmi: Shut up unused-function warning
  platform/x86: mlx-platform: free first dev on error

8 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm...
Linus Torvalds [Tue, 24 Jan 2017 20:21:51 +0000 (12:21 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull namespace fix from Eric Biederman:
 "This has a single brown bag fix.

  The possible deadlock with dec_pid_namespaces that I had thought was
  fixed earlier turned out only to have been moved. So instead of being
  clever this change takes ucounts_lock with irqs disabled, so
  dec_ucount can be used from any context without fear of deadlock.

  The items accounted for dec_ucount and inc_ucount are all
  comparatively heavyweight objects so I don't expect this will have
  any measurable performance impact"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Make ucounts lock irq-safe

8 years agoMAINTAINERS: Add myself to X86 PLATFORM DRIVERS as a co-maintainer
Andy Shevchenko [Tue, 24 Jan 2017 15:22:01 +0000 (17:22 +0200)]
MAINTAINERS: Add myself to X86 PLATFORM DRIVERS as a co-maintainer

For the last few months Darren and I have been co-maintaining the PDx86
subsystem. Make this fact official by updating the MAINTAINERS database.

Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
8 years agoMerge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux...
Linus Torvalds [Mon, 23 Jan 2017 21:51:59 +0000 (13:51 -0800)]
Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile

Pull tile bugfix from Chris Metcalf:
 "This avoids an issue with short userspace reads for regset via ptrace"

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile/ptrace: Preserve previous registers for short regset write

8 years agoMerge tag 'gpio-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Mon, 23 Jan 2017 21:36:37 +0000 (13:36 -0800)]
Merge tag 'gpio-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
 "A single lockdep fix, nothing else going on. This makes lockdep
  noiseless and work properly with threaded GPIO IRQchips.

  Summary:

  Fix a lockdep issue: the threaded irqchips also need their unique key,
  and take this opportunity to get rid of the horrible macro and replace
  it with a static inline"

* tag 'gpio-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: provide lockdep keys for nested/unnested irqchips

8 years agoMerge tag 'drm-fixes-for-v4.10-rc6' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Mon, 23 Jan 2017 21:10:50 +0000 (13:10 -0800)]
Merge tag 'drm-fixes-for-v4.10-rc6' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "drm fixes across the board.

  Okay holidays and LCA kinda caught up with me, I thought I'd get some
  of this dequeued last week, but Hobart was sunny and warm and not all
  gloomy and rainy as usual.

  This is a bit large, but not too much considering it's two weeks'
  worth of stuff from AMD and Intel.

  core:
   - one locking fix that helps with dynamic suspend/resume races

  i915:
   - mostly GVT updates, GVT was a recent introduction so fixes for it
     shouldn't cause any notable side effects.

  amdgpu:
   - a bunch of fixes for GPUs with a different memory controller design
     that need different firmware.

  exynos:
   - decon regression fixes

  msm:
   - two regression fixes

  etnaviv:
   - a workaround for an mmu bug that needs a lot more work.

  virtio:
   - sparse fix, and a maintainers update"

* tag 'drm-fixes-for-v4.10-rc6' of git://people.freedesktop.org/~airlied/linux: (56 commits)
  drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
  drm/exynos/decon5433: fix CMU programming
  drm/exynos/decon5433: do not disable video after reset
  drm/i915: Ignore bogus plane coordinates on SKL when the plane is not visible
  drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.
  drm/amdgpu: add support for new hainan variants
  drm/radeon: add support for new hainan variants
  drm/amdgpu: change clock gating mode for uvd_v4.
  drm/amdgpu: fix program vce instance logic error.
  drm/amdgpu: fix bug set incorrect value to vce register
  Revert "drm/amdgpu: Only update the CUR_SIZE register when necessary"
  drm/msm: fix potential null ptr issue in non-iommu case
  drm/msm/mdp5: rip out plane->pending tracking
  drm/exynos/decon5433: set STANDALONE_UPDATE_F also if planes are disabled
  drm/exynos/decon5433: update shadow registers iff there are active windows
  drm/i915/gvt: rewrite gt reset handler using new function intel_gvt_reset_vgpu_locked
  drm/i915/gvt: fix vGPU instance reuse issues by vGPU reset function
  drm/i915/gvt: introduce intel_vgpu_reset_mmio() to reset mmio space
  drm/i915/gvt: move mmio init/clean function to mmio.c
  drm/i915/gvt: introduce intel_vgpu_reset_cfg_space to reset configuration space
  ...

8 years agouserns: Make ucounts lock irq-safe
Nikolay Borisov [Fri, 20 Jan 2017 13:21:35 +0000 (15:21 +0200)]
userns: Make ucounts lock irq-safe

The ucounts_lock is used to protect various ucounts lifecycle
management functions. However, those functions can also be invoked
when a pidns is being freed in an RCU callback (i.e. softirq context).
This can lead to deadlocks. There were already efforts to prevent
similar deadlocks in add7c65ca426 ("pid: fix lockdep deadlock
warning due to ucount_lock"), however they just moved the context
from hardirq to softirq. Fix this issue once and for all by explicitly
making the lock disable irqs altogether.
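
Why disabling irqs closes the hole can be shown with a toy single-CPU
model. This is a userspace simulation, not kernel code: struct cpu and
all helpers are hypothetical, and "interrupt" stands for any context
(hardirq, or softirq run on irq exit) that takes the same lock on the
same CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU; every name here is hypothetical. */
struct cpu {
    bool irqs_enabled;
    bool lock_held;
    bool deadlocked;
};

/*
 * An interrupt arrives whose handler needs the same lock.
 * Returns true if the handler ran, false if it stayed pending.
 */
static bool deliver_interrupt(struct cpu *c)
{
    if (!c->irqs_enabled)
        return false;         /* deferred until irqs are restored */
    if (c->lock_held)
        c->deadlocked = true; /* handler spins on a lock its own CPU holds */
    return true;
}

/* Plain lock: interrupts stay enabled while the lock is held. */
static void lock_plain(struct cpu *c)
{
    assert(!c->lock_held);
    c->lock_held = true;
}

static void unlock_plain(struct cpu *c)
{
    c->lock_held = false;
}

/* irq-safe lock: save the irq state and disable irqs before locking. */
static bool lock_irqsave(struct cpu *c)
{
    bool flags = c->irqs_enabled;

    assert(!c->lock_held);
    c->irqs_enabled = false;
    c->lock_held = true;
    return flags;
}

static void unlock_irqrestore(struct cpu *c, bool flags)
{
    c->lock_held = false;
    c->irqs_enabled = flags;
}
```

In the model, an interrupt delivered between lock_plain() and
unlock_plain() marks the CPU as deadlocked, while one delivered inside
the lock_irqsave() section simply stays pending.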

Dmitry Vyukov <dvyukov@google.com> reported:

> I've got the following deadlock report while running syzkaller fuzzer
> on eec0d3d065bfcdf9cd5f56dd2a36b94d12d32297 of linux-next (on odroid
> device if it matters):
>
> =================================
> [ INFO: inconsistent lock state ]
> 4.10.0-rc3-next-20170112-xc2-dirty #6 Not tainted
> ---------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> swapper/2/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
>  (ucounts_lock){+.?...}, at: [<     inline     >] spin_lock
> ./include/linux/spinlock.h:302
>  (ucounts_lock){+.?...}, at: [<ffff2000081678c8>]
> put_ucounts+0x60/0x138 kernel/ucount.c:162
> {SOFTIRQ-ON-W} state was registered at:
> [<ffff2000081c82d8>] mark_lock+0x220/0xb60 kernel/locking/lockdep.c:3054
> [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2941
> [<ffff2000081c97a8>] __lock_acquire+0x388/0x3260 kernel/locking/lockdep.c:3295
> [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
> [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
> [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
> [<     inline     >] spin_lock ./include/linux/spinlock.h:302
> [<     inline     >] get_ucounts kernel/ucount.c:131
> [<ffff200008167c28>] inc_ucount+0x80/0x6c8 kernel/ucount.c:189
> [<     inline     >] inc_mnt_namespaces fs/namespace.c:2818
> [<ffff200008481850>] alloc_mnt_ns+0x78/0x3a8 fs/namespace.c:2849
> [<ffff200008487298>] create_mnt_ns+0x28/0x200 fs/namespace.c:2959
> [<     inline     >] init_mount_tree fs/namespace.c:3199
> [<ffff200009bd6674>] mnt_init+0x258/0x384 fs/namespace.c:3251
> [<ffff200009bd60bc>] vfs_caches_init+0x6c/0x80 fs/dcache.c:3626
> [<ffff200009bb1114>] start_kernel+0x414/0x460 init/main.c:648
> [<ffff200009bb01e8>] __primary_switched+0x6c/0x70 arch/arm64/kernel/head.S:456
> irq event stamp: 2316924
> hardirqs last  enabled at (2316924): [<     inline     >] rcu_do_batch
> kernel/rcu/tree.c:2911
> hardirqs last  enabled at (2316924): [<     inline     >]
> invoke_rcu_callbacks kernel/rcu/tree.c:3182
> hardirqs last  enabled at (2316924): [<     inline     >]
> __rcu_process_callbacks kernel/rcu/tree.c:3149
> hardirqs last  enabled at (2316924): [<ffff200008210414>]
> rcu_process_callbacks+0x7a4/0xc28 kernel/rcu/tree.c:3166
> hardirqs last disabled at (2316923): [<     inline     >] rcu_do_batch
> kernel/rcu/tree.c:2900
> hardirqs last disabled at (2316923): [<     inline     >]
> invoke_rcu_callbacks kernel/rcu/tree.c:3182
> hardirqs last disabled at (2316923): [<     inline     >]
> __rcu_process_callbacks kernel/rcu/tree.c:3149
> hardirqs last disabled at (2316923): [<ffff20000820fe80>]
> rcu_process_callbacks+0x210/0xc28 kernel/rcu/tree.c:3166
> softirqs last  enabled at (2316912): [<ffff20000811b4c4>]
> _local_bh_enable+0x4c/0x80 kernel/softirq.c:155
> softirqs last disabled at (2316913): [<     inline     >]
> do_softirq_own_stack ./include/linux/interrupt.h:488
> softirqs last disabled at (2316913): [<     inline     >]
> invoke_softirq kernel/softirq.c:371
> softirqs last disabled at (2316913): [<ffff20000811c994>]
> irq_exit+0x264/0x308 kernel/softirq.c:405
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(ucounts_lock);
>   <Interrupt>
>     lock(ucounts_lock);
>
>  *** DEADLOCK ***
>
> 1 lock held by swapper/2/0:
>  #0:  (rcu_callback){......}, at: [<     inline     >] __rcu_reclaim
> kernel/rcu/rcu.h:108
>  #0:  (rcu_callback){......}, at: [<     inline     >] rcu_do_batch
> kernel/rcu/tree.c:2919
>  #0:  (rcu_callback){......}, at: [<     inline     >]
> invoke_rcu_callbacks kernel/rcu/tree.c:3182
>  #0:  (rcu_callback){......}, at: [<     inline     >]
> __rcu_process_callbacks kernel/rcu/tree.c:3149
>  #0:  (rcu_callback){......}, at: [<ffff200008210390>]
> rcu_process_callbacks+0x720/0xc28 kernel/rcu/tree.c:3166
>
> stack backtrace:
> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-rc3-next-20170112-xc2-dirty #6
> Hardware name: Hardkernel ODROID-C2 (DT)
> Call trace:
> [<ffff20000808fa60>] dump_backtrace+0x0/0x440 arch/arm64/kernel/traps.c:500
> [<ffff20000808fec0>] show_stack+0x20/0x30 arch/arm64/kernel/traps.c:225
> [<ffff2000088a99e0>] dump_stack+0x110/0x168
> [<ffff2000082fa2b4>] print_usage_bug.part.27+0x49c/0x4bc
> kernel/locking/lockdep.c:2387
> [<     inline     >] print_usage_bug kernel/locking/lockdep.c:2357
> [<     inline     >] valid_state kernel/locking/lockdep.c:2400
> [<     inline     >] mark_lock_irq kernel/locking/lockdep.c:2617
> [<ffff2000081c89ec>] mark_lock+0x934/0xb60 kernel/locking/lockdep.c:3065
> [<     inline     >] mark_irqflags kernel/locking/lockdep.c:2923
> [<ffff2000081c9a60>] __lock_acquire+0x640/0x3260 kernel/locking/lockdep.c:3295
> [<ffff2000081cce24>] lock_acquire+0xa4/0x138 kernel/locking/lockdep.c:3753
> [<     inline     >] __raw_spin_lock ./include/linux/spinlock_api_smp.h:144
> [<ffff200009798128>] _raw_spin_lock+0x90/0xd0 kernel/locking/spinlock.c:151
> [<     inline     >] spin_lock ./include/linux/spinlock.h:302
> [<ffff2000081678c8>] put_ucounts+0x60/0x138 kernel/ucount.c:162
> [<ffff200008168364>] dec_ucount+0xf4/0x158 kernel/ucount.c:214
> [<     inline     >] dec_pid_namespaces kernel/pid_namespace.c:89
> [<ffff200008293dc8>] delayed_free_pidns+0x40/0xe0 kernel/pid_namespace.c:156
> [<     inline     >] __rcu_reclaim kernel/rcu/rcu.h:118
> [<     inline     >] rcu_do_batch kernel/rcu/tree.c:2919
> [<     inline     >] invoke_rcu_callbacks kernel/rcu/tree.c:3182
> [<     inline     >] __rcu_process_callbacks kernel/rcu/tree.c:3149
> [<ffff2000082103d8>] rcu_process_callbacks+0x768/0xc28 kernel/rcu/tree.c:3166
> [<ffff2000080821dc>] __do_softirq+0x324/0x6e0 kernel/softirq.c:284
> [<     inline     >] do_softirq_own_stack ./include/linux/interrupt.h:488
> [<     inline     >] invoke_softirq kernel/softirq.c:371
> [<ffff20000811c994>] irq_exit+0x264/0x308 kernel/softirq.c:405
> [<ffff2000081ecc28>] __handle_domain_irq+0xc0/0x150 kernel/irq/irqdesc.c:636
> [<ffff200008081c80>] gic_handle_irq+0x68/0xd8
> Exception stack(0xffff8000648e7dd0 to 0xffff8000648e7f00)
> 7dc0:                                   ffff8000648d4b3c 0000000000000007
> 7de0: 0000000000000000 1ffff0000c91a967 1ffff0000c91a967 1ffff0000c91a967
> 7e00: ffff20000a4b6b68 0000000000000001 0000000000000007 0000000000000001
> 7e20: 1fffe4000149ae90 ffff200009d35000 0000000000000000 0000000000000002
> 7e40: 0000000000000000 0000000000000000 0000000002624a1a 0000000000000000
> 7e60: 0000000000000000 ffff200009cbcd88 000060006d2ed000 0000000000000140
> 7e80: ffff200009cff000 ffff200009cb6000 ffff200009cc2020 ffff200009d2159d
> 7ea0: 0000000000000000 ffff8000648d4380 0000000000000000 ffff8000648e7f00
> 7ec0: ffff20000820a478 ffff8000648e7f00 ffff20000820a47c 0000000010000145
> 7ee0: 0000000000000140 dfff200000000000 ffffffffffffffff ffff20000820a478
> [<ffff2000080837f8>] el1_irq+0xb8/0x130 arch/arm64/kernel/entry.S:486
> [<     inline     >] arch_local_irq_restore
> ./arch/arm64/include/asm/irqflags.h:81
> [<ffff20000820a47c>] rcu_idle_exit+0x64/0xa8 kernel/rcu/tree.c:1030
> [<     inline     >] cpuidle_idle_call kernel/sched/idle.c:200
> [<ffff2000081bcbfc>] do_idle+0x1dc/0x2d0 kernel/sched/idle.c:243
> [<ffff2000081bd1cc>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:345
> [<ffff200008099f8c>] secondary_start_kernel+0x2cc/0x358
> arch/arm64/kernel/smp.c:276
> [<000000000279f1a4>] 0x279f1a4

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: add7c65ca426 ("pid: fix lockdep deadlock warning due to ucount_lock")
Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Cc: stable@vger.kernel.org
Link: https://www.spinics.net/lists/kernel/msg2426637.html
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
8 years agoMerge branch 'for-4.11/block' into for-next
Jens Axboe [Mon, 23 Jan 2017 15:32:33 +0000 (08:32 -0700)]
Merge branch 'for-4.11/block' into for-next

8 years agocfq-iosched: Adjust one function call together with a variable assignment
Markus Elfring [Sat, 21 Jan 2017 21:44:07 +0000 (22:44 +0100)]
cfq-iosched: Adjust one function call together with a variable assignment

The script "checkpatch.pl" reported the following error.

ERROR: do not use assignment in if condition

Thus fix the affected place in the source code.
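
For readers unfamiliar with this checkpatch rule, here is a generic
before/after sketch. It is not the cfq-iosched code: lookup() and the
two wrapper functions are hypothetical and exist only to show the
transformation.

```c
#include <assert.h>
#include <stddef.h>

static int table[] = { 10, 20, 30 };

/* Hypothetical lookup: returns NULL when idx is out of range. */
static int *lookup(size_t idx)
{
    return idx < sizeof(table) / sizeof(table[0]) ? &table[idx] : NULL;
}

/* Style that checkpatch.pl flags: assignment inside the if condition. */
static int value_before(size_t idx)
{
    int *p;

    if ((p = lookup(idx)) != NULL)
        return *p;
    return -1;
}

/* Preferred style: assign first, then test the variable. */
static int value_after(size_t idx)
{
    int *p = lookup(idx);

    if (p)
        return *p;
    return -1;
}

/* Both variants must behave identically for all inputs tried here. */
static void check_equivalent(void)
{
    for (size_t i = 0; i < 5; i++)
        assert(value_before(i) == value_after(i));
}
```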

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblk-throttle: Adjust two function calls together with a variable assignment
Markus Elfring [Sat, 21 Jan 2017 21:15:33 +0000 (22:15 +0100)]
blk-throttle: Adjust two function calls together with a variable assignment

The script "checkpatch.pl" reported the following error.

ERROR: do not use assignment in if condition

Thus fix the affected places in the source code.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoblock: Initialize cfqq->ioprio_class in cfq_get_queue()
Alexander Potapenko [Mon, 23 Jan 2017 14:06:43 +0000 (15:06 +0100)]
block: Initialize cfqq->ioprio_class in cfq_get_queue()

KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
uninitialized memory in cfq_init_cfqq():

==================================================================
BUG: KMSAN: use of unitialized memory
...
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff8202ac97>] dump_stack+0x157/0x1d0 lib/dump_stack.c:51
 [<ffffffff813e9b65>] kmsan_report+0x205/0x360 ??:?
 [<ffffffff813eabbb>] __msan_warning+0x5b/0xb0 ??:?
 [<     inline     >] cfq_init_cfqq block/cfq-iosched.c:3754
 [<ffffffff8201e110>] cfq_get_queue+0xc80/0x14d0 block/cfq-iosched.c:3857
...
origin:
 [<ffffffff8103ab37>] save_stack_trace+0x27/0x50 arch/x86/kernel/stacktrace.c:67
 [<ffffffff813e836b>] kmsan_internal_poison_shadow+0xab/0x150 ??:?
 [<ffffffff813e88ab>] kmsan_poison_slab+0xbb/0x120 ??:?
 [<     inline     >] allocate_slab mm/slub.c:1627
 [<ffffffff813e533f>] new_slab+0x3af/0x4b0 mm/slub.c:1641
 [<     inline     >] new_slab_objects mm/slub.c:2407
 [<ffffffff813e0ef3>] ___slab_alloc+0x323/0x4a0 mm/slub.c:2564
 [<     inline     >] __slab_alloc mm/slub.c:2606
 [<     inline     >] slab_alloc_node mm/slub.c:2669
 [<ffffffff813dfb42>] kmem_cache_alloc_node+0x1d2/0x1f0 mm/slub.c:2746
 [<ffffffff8201d90d>] cfq_get_queue+0x47d/0x14d0 block/cfq-iosched.c:3850
...
==================================================================
(the line numbers are relative to 4.8-rc6, but the bug persists
upstream)

The uninitialized struct cfq_queue is created by kmem_cache_alloc_node()
and then passed to cfq_init_cfqq(), which accesses cfqq->ioprio_class
before it's initialized.
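
The class of bug can be sketched in userspace like this. The struct,
the constants, and both functions are hypothetical; malloc() stands in
for an allocator that does not zero memory, and setting the field
before the init call is one plausible shape of the fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical class values, chosen only for illustration. */
enum { CLASS_NONE = 0, CLASS_IDLE = 3 };

struct queue {
    int ioprio_class;
    int is_idle;
};

/* Reads ioprio_class, so the caller must have set it beforehand. */
static void init_queue(struct queue *q)
{
    assert(q != NULL);
    q->is_idle = (q->ioprio_class == CLASS_IDLE);
}

static struct queue *get_queue(void)
{
    /* malloc() returns memory with indeterminate contents. */
    struct queue *q = malloc(sizeof(*q));

    if (!q)
        return NULL;

    /* Set the field before init_queue() reads it. */
    q->ioprio_class = CLASS_NONE;
    init_queue(q);
    return q;
}
```

Without the assignment in get_queue(), init_queue() would branch on
whatever bytes the allocator happened to return.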

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
8 years agoMerge tag 'drm-qemu-20170110' of git://git.kraxel.org/linux into drm-fixes
Dave Airlie [Sun, 22 Jan 2017 23:25:53 +0000 (09:25 +1000)]
Merge tag 'drm-qemu-20170110' of git://git.kraxel.org/linux into drm-fixes

drm-qemu: virtio sparse fix, MAINTAINERS updates.

* tag 'drm-qemu-20170110' of git://git.kraxel.org/linux:
  drm: flip cirrus driver status to "obsolete".
  drm: update MAINTAINERS for qemu drivers (bochs, cirrus, qxl, virtio-gpu)
  drm/virtio: fix framebuffer sparse warning

8 years agoMerge branch 'drm-etnaviv-fixes' of https://git.pengutronix.de/git/lst/linux into...
Dave Airlie [Sun, 22 Jan 2017 23:25:00 +0000 (09:25 +1000)]
Merge branch 'drm-etnaviv-fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

A single fix for an FE hang after IOVA rollover on GC3000. This isn't
pretty, but is the minimal fix for the issue. A larger rework of the
code, that will also fix this issue properly, is currently in the works,
but that needs to wait for at least the next feature pull.

* 'drm-etnaviv-fixes' of https://git.pengutronix.de/git/lst/linux:
  drm/etnaviv: trick drm_mm into giving out a low IOVA

8 years agoMerge branch 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Dave Airlie [Sun, 22 Jan 2017 23:14:36 +0000 (09:14 +1000)]
Merge branch 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

Just regression fixes to resolve a page fault issue with the DECON device.

* 'exynos-drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
  drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
  drm/exynos/decon5433: fix CMU programming
  drm/exynos/decon5433: do not disable video after reset
  drm/exynos/decon5433: set STANDALONE_UPDATE_F also if planes are disabled
  drm/exynos/decon5433: update shadow registers iff there are active windows

8 years agoMerge branch 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux into drm...
Dave Airlie [Sun, 22 Jan 2017 23:14:01 +0000 (09:14 +1000)]
Merge branch 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

A little bigger than usual since it's two weeks' worth.  Highlights:
- Add support for new smc firmware on some new hainan variants
- add support for SI chips that require special mc firmware
- remove workarounds for issues fixed by new mc firmware
- fix a regression in cursor handling
- various VCE fixes
- fix for UVD clockgating

* 'drm-fixes-4.10' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: add support for new hainan variants
  drm/radeon: add support for new hainan variants
  drm/amdgpu: change clock gating mode for uvd_v4.
  drm/amdgpu: fix program vce instance logic error.
  drm/amdgpu: fix bug set incorrect value to vce register
  Revert "drm/amdgpu: Only update the CUR_SIZE register when necessary"
  drm/amd/powerplay: refine vce dpm update code on Cz.
  drm/amdgpu: fix vm_fault_stop on gfx6
  drm/amd/powerplay: fix vce cg logic error on CZ/St.
  drm/radeon: drop the mclk quirk for hainan
  drm/radeon: drop oland quirks
  drm/amdgpu: drop the mclk quirk for hainan
  drm/amdgpu: drop oland quirks
  drm/amdgpu/si: load special ucode for certain MC configs
  drm/radeon/si: load special ucode for certain MC configs