Laura Abbott [Tue, 15 Mar 2016 21:55:02 +0000 (14:55 -0700)]
slub: fix/clean free_debug_processing return paths
Since commit
19c7ff9ecd89 ("slub: Take node lock during object free
checks") check_object has been incorrectly returning success as it
follows the out label which just returns the node.
Thanks to refactoring, the out and fail paths are now basically the
same. Combine the two into one and just use a single label.
Credit to Mathias Krause for the original work which inspired this
series
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laura Abbott [Tue, 15 Mar 2016 21:54:59 +0000 (14:54 -0700)]
slub: drop lock at the end of free_debug_processing
This series takes the suggestion of Christoph Lameter and only focuses
on optimizing the slow path where the debug processing runs. The two
main optimizations in this series are letting the consistency checks be
skipped and relaxing the cmpxchg restrictions when we are not doing
consistency checks. With hackbench -g 20 -l 1000 averaged over 100
runs:
Before slub_debug=P
mean 15.607
variance .086
stdev .294
After slub_debug=P
mean 10.836
variance .155
stdev .394
This still isn't as fast as what is in grsecurity unfortunately so there's
still work to be done. Profiling ___slab_alloc shows that 25-50% of time
is spent in deactivate_slab. I haven't looked too closely to see if this
is something that can be optimized. My plan for now is to focus on
getting all of this merged (if appropriate) before digging in to another
task.
This patch (of 4):
Currently, free_debug_processing has a comment "Keep node_lock to preserve
integrity until the object is actually freed". In actuallity, the lock is
dropped immediately in __slab_free. Rather than wait until __slab_free
and potentially throw off the unlikely marking, just drop the lock in
__slab_free. This also lets free_debug_processing take its own copy of
the spinlock flags rather than trying to share the ones from __slab_free.
Since there is no use for the node afterwards, change the return type of
free_debug_processing to return an int like alloc_debug_processing.
Credit to Mathias Krause for the original work which inspired this series
[akpm@linux-foundation.org: fix build]
Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:56 +0000 (14:54 -0700)]
mm/slab: re-implement pfmemalloc support
Current implementation of pfmemalloc handling in SLAB has some problems.
1) pfmemalloc_active is set to true when there is just one or more
pfmemalloc slabs in the system, but it is cleared when there is no
pfmemalloc slab in one arbitrary kmem_cache. So, pfmemalloc_active
could be wrongly cleared.
2) Search to partial and free list doesn't happen when non-pfmemalloc
object are not found in cpu cache. Instead, allocating new slab
happens and it is not optimal.
3) Even after sk_memalloc_socks() is disabled, cpu cache would keep
pfmemalloc objects tagged with SLAB_OBJ_PFMEMALLOC. It isn't cleared
if sk_memalloc_socks() is disabled so it could cause problem.
4) If cpu cache is filled with pfmemalloc objects, it would cause slow
down non-pfmemalloc allocation.
To me, current pointer tagging approach looks complex and fragile so this
patch re-implement whole thing instead of fixing problems one by one.
Design principle for new implementation is that
1) Don't disrupt non-pfmemalloc allocation in fast path even if
sk_memalloc_socks() is enabled. It's more likely case than pfmemalloc
allocation.
2) Ensure that pfmemalloc slab is used only for pfmemalloc allocation.
3) Don't consider performance of pfmemalloc allocation in memory
deficiency state.
As a result, all pfmemalloc alloc/free in memory tight state will be
handled in slow-path. If there is non-pfmemalloc free object, it will be
returned first even for pfmemalloc user in fast-path so that performance
of pfmemalloc user isn't affected in normal case and pfmemalloc objects
will be kept as long as possible.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:53 +0000 (14:54 -0700)]
mm/slab: avoid returning values by reference
Returing values by reference is bad practice. Instead, just use
function return value.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:50 +0000 (14:54 -0700)]
mm/slab: introduce new slab management type, OBJFREELIST_SLAB
SLAB needs an array to manage freed objects in a slab. It is only used
if some objects are freed so we can use free object itself as this
array. This requires additional branch in somewhat critical lock path
to check if it is first freed object or not but that's all we need.
Benefits is that we can save extra memory usage and reduce some
computational overhead by allocating a management array when new slab is
created.
Code change is rather complex than what we can expect from the idea, in
order to handle debugging feature efficiently. If you want to see core
idea only, please remove '#if DEBUG' block in the patch.
Although this idea can apply to all caches whose size is larger than
management array size, it isn't applied to caches which have a
constructor. If such cache's object is used for management array,
constructor should be called for it before that object is returned to
user. I guess that overhead overwhelm benefit in that case so this idea
doesn't applied to them at least now.
For summary, from now on, slab management type is determined by
following logic.
1) if management array size is smaller than object size and no ctor, it
becomes OBJFREELIST_SLAB.
2) if management array size is smaller than leftover, it becomes
NORMAL_SLAB which uses leftover as a array.
3) if OFF_SLAB help to save memory than way 4), it becomes OFF_SLAB.
It allocate a management array from the other cache so memory waste
happens.
4) others become NORMAL_SLAB. It uses dedicated internal memory in a
slab as a management array so it causes memory waste.
In my system, without enabling CONFIG_DEBUG_SLAB, Almost caches become
OBJFREELIST_SLAB and NORMAL_SLAB (using leftover) which doesn't waste
memory. Following is the result of number of caches with specific slab
management type.
TOTAL = OBJFREELIST + NORMAL(leftover) + NORMAL + OFF
/Before/
126 = 0 + 60 + 25 + 41
/After/
126 = 97 + 12 + 15 + 2
Result shows that number of caches that doesn't waste memory increase
from 60 to 109.
I did some benchmarking and it looks that benefit are more than loss.
Kmalloc: Repeatedly allocate then free test
/Before/
[ 0.286809] 1. Kmalloc: Repeatedly allocate then free test
[ 1.143674] 100000 times kmalloc(32) -> 116 cycles kfree -> 78 cycles
[ 1.441726] 100000 times kmalloc(64) -> 121 cycles kfree -> 80 cycles
[ 1.815734] 100000 times kmalloc(128) -> 168 cycles kfree -> 85 cycles
[ 2.380709] 100000 times kmalloc(256) -> 287 cycles kfree -> 95 cycles
[ 3.101153] 100000 times kmalloc(512) -> 370 cycles kfree -> 117 cycles
[ 3.942432] 100000 times kmalloc(1024) -> 413 cycles kfree -> 156 cycles
[ 5.227396] 100000 times kmalloc(2048) -> 622 cycles kfree -> 248 cycles
[ 7.519793] 100000 times kmalloc(4096) -> 1102 cycles kfree -> 452 cycles
/After/
[ 1.205313] 100000 times kmalloc(32) -> 117 cycles kfree -> 78 cycles
[ 1.510526] 100000 times kmalloc(64) -> 124 cycles kfree -> 81 cycles
[ 1.827382] 100000 times kmalloc(128) -> 130 cycles kfree -> 84 cycles
[ 2.226073] 100000 times kmalloc(256) -> 177 cycles kfree -> 92 cycles
[ 2.814747] 100000 times kmalloc(512) -> 286 cycles kfree -> 112 cycles
[ 3.532952] 100000 times kmalloc(1024) -> 344 cycles kfree -> 141 cycles
[ 4.608777] 100000 times kmalloc(2048) -> 519 cycles kfree -> 210 cycles
[ 6.350105] 100000 times kmalloc(4096) -> 789 cycles kfree -> 391 cycles
In fact, I tested another idea implementing OBJFREELIST_SLAB with
extendable linked array through another freed object. It can remove
memory waste completely but it causes more computational overhead in
critical lock path and it seems that overhead outweigh benefit. So, this
patch doesn't include it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:47 +0000 (14:54 -0700)]
mm/slab: factor out debugging initialization in cache_init_objs()
cache_init_objs() will be changed in following patch and current form
doesn't fit well for that change. So, before doing it, this patch
separates debugging initialization. This would cause two loop iteration
when debugging is enabled, but, this overhead seems too light than debug
feature itself so effect may not be visible. This patch will greatly
simplify changes in cache_init_objs() in following patch.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:44 +0000 (14:54 -0700)]
mm/slab: factor out slab list fixup code
Slab list should be fixed up after object is detached from the slab and
this happens at two places. They do exactly same thing. They will be
changed in the following patch, so, to reduce code duplication, this
patch factor out them and make it common function.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:41 +0000 (14:54 -0700)]
mm/slab: make criteria for off slab determination robust and simple
To become an off slab, there are some constraints to avoid bootstrapping
problem and recursive call. This can be avoided differently by simply
checking that corresponding kmalloc cache is ready and it's not a off
slab. It would be more robust because static size checking can be
affected by cache size change or architecture type but dynamic checking
isn't.
One check 'freelist_cache->size > cachep->size / 2' is added to check
benefit of choosing off slab, because, now, there is no size constraint
which ensures enough advantage when selecting off slab.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:38 +0000 (14:54 -0700)]
mm/slab: do not change cache size if debug pagealloc isn't possible
We can fail to setup off slab in some conditions. Even in this case,
debug pagealloc increases cache size to PAGE_SIZE in advance and it is
waste because debug pagealloc cannot work for it when it isn't the off
slab. To improve this situation, this patch checks first that this
cache with increased size is suitable for off slab. It actually
increases cache size when it is suitable for off-slab, so possible waste
is removed.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:35 +0000 (14:54 -0700)]
mm/slab: clean up cache type determination
Current cache type determination code is open-code and looks not
understandable. Following patch will introduce one more cache type and
it would make code more complex. So, before it happens, this patch
abstracts these codes.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:33 +0000 (14:54 -0700)]
mm/slab: align cache size first before determination of OFF_SLAB candidate
Finding suitable OFF_SLAB candidate is more related to aligned cache
size rather than original size. Same reasoning can be applied to the
debug pagealloc candidate. So, this patch moves up alignment fixup to
proper position. From that point, size is aligned so we can remove some
alignment fixups.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:30 +0000 (14:54 -0700)]
mm/slab: put the freelist at the end of slab page
Currently, the freelist is at the front of slab page. This requires
extra space to meet object alignment requirement. If we put the
freelist at the end of a slab page, objects could start at page boundary
and will be at correct alignment. This is possible because freelist has
no alignment constraint itself.
This gives us two benefits: It removes extra memory space for the
freelist alignment and remove complex calculation at cache
initialization step. I can't think notable drawback here.
I mentioned that this would reduce extra memory space, but, this benefit
is rather theoretical because it can be applied to very few cases.
Following is the example cache type that can get benefit from this
change.
size align num before after
32 8 124 4100 4092
64 8 63 4103 4095
88 8 46 4102 4094
272 8 15 4103 4095
408 8 10 4098 4090
32 16 124 4108 4092
64 16 63 4111 4095
32 32 124 4124 4092
64 32 63 4127 4095
96 32 42 4106 4074
before means whole size for objects and aligned freelist before applying
patch and after shows the result of this patch.
Since before is more than 4096, number of object should decrease and
memory waste happens.
Anyway, this patch removes complex calculation so looks beneficial to
me.
[akpm@linux-foundation.org: fix kerneldoc]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:27 +0000 (14:54 -0700)]
mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
Now, we don't use object status buffer in any setup. Remove it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:24 +0000 (14:54 -0700)]
mm/slab: alternative implementation for DEBUG_SLAB_LEAK
DEBUG_SLAB_LEAK is a debug option. It's current implementation requires
status buffer so we need more memory to use it. And, it cause
kmem_cache initialization step more complex.
To remove this extra memory usage and to simplify initialization step,
this patch implement this feature with another way.
When user requests to get slab object owner information, it marks that
getting information is started. And then, all free objects in caches
are flushed to corresponding slab page. Now, we can distinguish all
freed object so we can know all allocated objects, too. After
collecting slab object owner information on allocated objects, mark is
checked that there is no free during the processing. If true, we can be
sure that our information is correct so information is returned to user.
Although this way is rather complex, it has two important benefits
mentioned above. So, I think it is worth changing.
There is one drawback that it takes more time to get slab object owner
information but it is just a debug option so it doesn't matter at all.
To help review, this patch implements new way only. Following patch
will remove useless code.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:21 +0000 (14:54 -0700)]
mm/slab: clean up DEBUG_PAGEALLOC processing code
Currently, open code for checking DEBUG_PAGEALLOC cache is spread to
some sites. It makes code unreadable and hard to change.
This patch cleans up this code. The following patch will change the
criteria for DEBUG_PAGEALLOC cache so this clean-up will help it, too.
[akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:18 +0000 (14:54 -0700)]
mm/slab: use more appropriate condition check for debug_pagealloc
debug_pagealloc debugging is related to SLAB_POISON flag rather than
FORCED_DEBUG option, although FORCED_DEBUG option will enable
SLAB_POISON. Fix it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:15 +0000 (14:54 -0700)]
mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:12 +0000 (14:54 -0700)]
mm/slab: remove the checks for slab implementation bug
Some of "#if DEBUG" are for reporting slab implementation bug rather
than user usecase bug. It's not really needed because slab is stable
for a quite long time and it makes code too dirty. This patch remove
it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:09 +0000 (14:54 -0700)]
mm/slab: remove useless structure define
It is obsolete so remove it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Tue, 15 Mar 2016 21:54:06 +0000 (14:54 -0700)]
mm/slab: fix stale code comment
This patchset implements a new freed object management way, that is,
OBJFREELIST_SLAB. Purpose of it is to reduce memory overhead in SLAB.
SLAB needs a array to manage freed objects in a slab. If there is
leftover after objects are packed into a slab, we can use it as a
management array, and, in this case, there is no memory waste. But, in
the other cases, we need to allocate extra memory for a management array
or utilize dedicated internal memory in a slab for it. Both cases
causes memory waste so it's not good.
With this patchset, freed object itself can be used for a management
array. So, memory waste could be reduced. Detailed idea and numbers
are described in last patch's commit description. Please refer it.
In fact, I tested another idea implementing OBJFREELIST_SLAB with
extendable linked array through another freed object. It can remove
memory waste completely but it causes more computational overhead in
critical lock path and it seems that overhead outweigh benefit. So,
this patchset doesn't include it. I will attach prototype just for a
reference.
This patch (of 16):
We use freelist_idx_t type for free object management whose size would be
smaller than size of unsigned int. Fix it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:54:03 +0000 (14:54 -0700)]
mm: fix some spelling
Fix up trivial spelling errors, noticed while reading the code.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:54:00 +0000 (14:54 -0700)]
mm: new API kfree_bulk() for SLAB+SLUB allocators
This patch introduce a new API call kfree_bulk() for bulk freeing memory
objects not bound to a single kmem_cache.
Christoph pointed out that it is possible to implement freeing of
objects, without knowing the kmem_cache pointer as that information is
available from the object's page->slab_cache. Proposing to remove the
kmem_cache argument from the bulk free API.
Jesper demonstrated that these extra steps per object comes at a
performance cost. It is only in the case CONFIG_MEMCG_KMEM is compiled
in and activated runtime that these steps are done anyhow. The extra
cost is most visible for SLAB allocator, because the SLUB allocator does
the page lookup (virt_to_head_page()) anyhow.
Thus, the conclusion was to keep the kmem_cache free bulk API with a
kmem_cache pointer, but we can still implement a kfree_bulk() API fairly
easily. Simply by handling if kmem_cache_free_bulk() gets called with a
kmem_cache NULL pointer.
This does increase the code size a bit, but implementing a separate
kfree_bulk() call would likely increase code size even more.
Below benchmarks cost of alloc+free (obj size 256 bytes) on CPU i7-4790K
@ 4.00GHz, no PREEMPT and CONFIG_MEMCG_KMEM=y.
Code size increase for SLAB:
add/remove: 0/0 grow/shrink: 1/0 up/down: 74/0 (74)
function old new delta
kmem_cache_free_bulk 660 734 +74
SLAB fastpath: 87 cycles(tsc) 21.814
sz - fallback - kmem_cache_free_bulk - kfree_bulk
1 - 103 cycles 25.878 ns - 41 cycles 10.498 ns - 81 cycles 20.312 ns
2 - 94 cycles 23.673 ns - 26 cycles 6.682 ns - 42 cycles 10.649 ns
3 - 92 cycles 23.181 ns - 21 cycles 5.325 ns - 39 cycles 9.950 ns
4 - 90 cycles 22.727 ns - 18 cycles 4.673 ns - 26 cycles 6.693 ns
8 - 89 cycles 22.270 ns - 14 cycles 3.664 ns - 23 cycles 5.835 ns
16 - 88 cycles 22.038 ns - 14 cycles 3.503 ns - 22 cycles 5.543 ns
30 - 89 cycles 22.284 ns - 13 cycles 3.310 ns - 20 cycles 5.197 ns
32 - 88 cycles 22.249 ns - 13 cycles 3.420 ns - 20 cycles 5.166 ns
34 - 88 cycles 22.224 ns - 14 cycles 3.643 ns - 20 cycles 5.170 ns
48 - 88 cycles 22.088 ns - 14 cycles 3.507 ns - 20 cycles 5.203 ns
64 - 88 cycles 22.063 ns - 13 cycles 3.428 ns - 20 cycles 5.152 ns
128 - 89 cycles 22.483 ns - 15 cycles 3.891 ns - 23 cycles 5.885 ns
158 - 89 cycles 22.381 ns - 15 cycles 3.779 ns - 22 cycles 5.548 ns
250 - 91 cycles 22.798 ns - 16 cycles 4.152 ns - 23 cycles 5.967 ns
SLAB when enabling MEMCG_KMEM runtime:
- kmemcg fastpath: 130 cycles(tsc) 32.684 ns (step:0)
1 - 148 cycles 37.220 ns - 66 cycles 16.622 ns - 66 cycles 16.583 ns
2 - 141 cycles 35.510 ns - 51 cycles 12.820 ns - 58 cycles 14.625 ns
3 - 140 cycles 35.017 ns - 37 cycles 9.326 ns - 33 cycles 8.474 ns
4 - 137 cycles 34.507 ns - 31 cycles 7.888 ns - 33 cycles 8.300 ns
8 - 140 cycles 35.069 ns - 25 cycles 6.461 ns - 25 cycles 6.436 ns
16 - 138 cycles 34.542 ns - 23 cycles 5.945 ns - 22 cycles 5.670 ns
30 - 136 cycles 34.227 ns - 22 cycles 5.502 ns - 22 cycles 5.587 ns
32 - 136 cycles 34.253 ns - 21 cycles 5.475 ns - 21 cycles 5.324 ns
34 - 136 cycles 34.254 ns - 21 cycles 5.448 ns - 20 cycles 5.194 ns
48 - 136 cycles 34.075 ns - 21 cycles 5.458 ns - 21 cycles 5.367 ns
64 - 135 cycles 33.994 ns - 21 cycles 5.350 ns - 21 cycles 5.259 ns
128 - 137 cycles 34.446 ns - 23 cycles 5.816 ns - 22 cycles 5.688 ns
158 - 137 cycles 34.379 ns - 22 cycles 5.727 ns - 22 cycles 5.602 ns
250 - 138 cycles 34.755 ns - 24 cycles 6.093 ns - 23 cycles 5.986 ns
Code size increase for SLUB:
function old new delta
kmem_cache_free_bulk 717 799 +82
SLUB benchmark:
SLUB fastpath: 46 cycles(tsc) 11.691 ns (step:0)
sz - fallback - kmem_cache_free_bulk - kfree_bulk
1 - 61 cycles 15.486 ns - 53 cycles 13.364 ns - 57 cycles 14.464 ns
2 - 54 cycles 13.703 ns - 32 cycles 8.110 ns - 33 cycles 8.482 ns
3 - 53 cycles 13.272 ns - 25 cycles 6.362 ns - 27 cycles 6.947 ns
4 - 51 cycles 12.994 ns - 24 cycles 6.087 ns - 24 cycles 6.078 ns
8 - 50 cycles 12.576 ns - 21 cycles 5.354 ns - 22 cycles 5.513 ns
16 - 49 cycles 12.368 ns - 20 cycles 5.054 ns - 20 cycles 5.042 ns
30 - 49 cycles 12.273 ns - 18 cycles 4.748 ns - 19 cycles 4.758 ns
32 - 49 cycles 12.401 ns - 19 cycles 4.821 ns - 19 cycles 4.810 ns
34 - 98 cycles 24.519 ns - 24 cycles 6.154 ns - 24 cycles 6.157 ns
48 - 83 cycles 20.833 ns - 21 cycles 5.446 ns - 21 cycles 5.429 ns
64 - 75 cycles 18.891 ns - 20 cycles 5.247 ns - 20 cycles 5.238 ns
128 - 93 cycles 23.271 ns - 27 cycles 6.856 ns - 27 cycles 6.823 ns
158 - 102 cycles 25.581 ns - 30 cycles 7.714 ns - 30 cycles 7.695 ns
250 - 107 cycles 26.917 ns - 38 cycles 9.514 ns - 38 cycles 9.506 ns
SLUB when enabling MEMCG_KMEM runtime:
- kmemcg fastpath: 71 cycles(tsc) 17.897 ns (step:0)
1 - 85 cycles 21.484 ns - 78 cycles 19.569 ns - 75 cycles 18.938 ns
2 - 81 cycles 20.363 ns - 45 cycles 11.258 ns - 44 cycles 11.076 ns
3 - 78 cycles 19.709 ns - 33 cycles 8.354 ns - 32 cycles 8.044 ns
4 - 77 cycles 19.430 ns - 28 cycles 7.216 ns - 28 cycles 7.003 ns
8 - 101 cycles 25.288 ns - 23 cycles 5.849 ns - 23 cycles 5.787 ns
16 - 76 cycles 19.148 ns - 20 cycles 5.162 ns - 20 cycles 5.081 ns
30 - 76 cycles 19.067 ns - 19 cycles 4.868 ns - 19 cycles 4.821 ns
32 - 76 cycles 19.052 ns - 19 cycles 4.857 ns - 19 cycles 4.815 ns
34 - 121 cycles 30.291 ns - 25 cycles 6.333 ns - 25 cycles 6.268 ns
48 - 108 cycles 27.111 ns - 21 cycles 5.498 ns - 21 cycles 5.458 ns
64 - 100 cycles 25.164 ns - 20 cycles 5.242 ns - 20 cycles 5.229 ns
128 - 155 cycles 38.976 ns - 27 cycles 6.886 ns - 27 cycles 6.892 ns
158 - 132 cycles 33.034 ns - 30 cycles 7.711 ns - 30 cycles 7.728 ns
250 - 130 cycles 32.612 ns - 38 cycles 9.560 ns - 38 cycles 9.549 ns
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:56 +0000 (14:53 -0700)]
slab: implement bulk free in SLAB allocator
This patch implements the free side of bulk API for the SLAB allocator
kmem_cache_free_bulk(), and concludes the implementation of optimized
bulk API for SLAB allocator.
Benchmarked[1] cost of alloc+free (obj size 256 bytes) on CPU i7-4790K @
4.00GHz, with no debug options, no PREEMPT and CONFIG_MEMCG_KMEM=y but
no active user of kmemcg.
SLAB single alloc+free cost: 87 cycles(tsc) 21.814 ns with this
optimized config.
bulk- Current fallback - optimized SLAB bulk
1 - 102 cycles(tsc) 25.747 ns - 41 cycles(tsc) 10.490 ns - improved 59.8%
2 - 94 cycles(tsc) 23.546 ns - 26 cycles(tsc) 6.567 ns - improved 72.3%
3 - 92 cycles(tsc) 23.127 ns - 20 cycles(tsc) 5.244 ns - improved 78.3%
4 - 90 cycles(tsc) 22.663 ns - 18 cycles(tsc) 4.588 ns - improved 80.0%
8 - 88 cycles(tsc) 22.242 ns - 14 cycles(tsc) 3.656 ns - improved 84.1%
16 - 88 cycles(tsc) 22.010 ns - 13 cycles(tsc) 3.480 ns - improved 85.2%
30 - 89 cycles(tsc) 22.305 ns - 13 cycles(tsc) 3.303 ns - improved 85.4%
32 - 89 cycles(tsc) 22.277 ns - 13 cycles(tsc) 3.309 ns - improved 85.4%
34 - 88 cycles(tsc) 22.246 ns - 13 cycles(tsc) 3.294 ns - improved 85.2%
48 - 88 cycles(tsc) 22.121 ns - 13 cycles(tsc) 3.492 ns - improved 85.2%
64 - 88 cycles(tsc) 22.052 ns - 13 cycles(tsc) 3.411 ns - improved 85.2%
128 - 89 cycles(tsc) 22.452 ns - 15 cycles(tsc) 3.841 ns - improved 83.1%
158 - 89 cycles(tsc) 22.403 ns - 14 cycles(tsc) 3.746 ns - improved 84.3%
250 - 91 cycles(tsc) 22.775 ns - 16 cycles(tsc) 4.111 ns - improved 82.4%
Notice it is not recommended to do very large bulk operation with
this bulk API, because local IRQs are disabled in this period.
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:53 +0000 (14:53 -0700)]
slab: avoid running debug SLAB code with IRQs disabled for alloc_bulk
Move the call to cache_alloc_debugcheck_after() outside the IRQ disabled
section in kmem_cache_alloc_bulk().
When CONFIG_DEBUG_SLAB is disabled the compiler should remove this code.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:50 +0000 (14:53 -0700)]
slab: implement bulk alloc in SLAB allocator
This patch implements the alloc side of bulk API for the SLAB allocator.
Further optimization are still possible by changing the call to
__do_cache_alloc() into something that can return multiple objects.
This optimization is left for later, given end results already show in
the area of 80% speedup.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:47 +0000 (14:53 -0700)]
slab: use slab_post_alloc_hook in SLAB allocator shared with SLUB
Reviewers notice that the order in slab_post_alloc_hook() of
kmemcheck_slab_alloc() and kmemleak_alloc_recursive() gets swapped
compared to slab.c / SLAB allocator.
Also notice memset now occurs before calling kmemcheck_slab_alloc() and
kmemleak_alloc_recursive().
I assume this reordering of kmemcheck, kmemleak and memset is okay
because this is the order they are used by the SLUB allocator.
This patch completes the sharing of alloc_hook's between SLUB and SLAB.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:44 +0000 (14:53 -0700)]
mm: kmemcheck skip object if slab allocation failed
In the SLAB allocator kmemcheck_slab_alloc() is guarded against being
called in case the object is NULL. In SLUB allocator this NULL pointer
invocation can happen, which seems like an oversight.
Move the NULL pointer check into kmemcheck code (kmemcheck_slab_alloc)
so the check gets moved out of the fastpath, when not compiled with
CONFIG_KMEMCHECK.
This is a step towards sharing post_alloc_hook between SLUB and SLAB,
because slab_post_alloc_hook() does not perform this check before
calling kmemcheck_slab_alloc().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:41 +0000 (14:53 -0700)]
slab: use slab_pre_alloc_hook in SLAB allocator shared with SLUB
Deduplicate code in SLAB allocator functions slab_alloc() and
slab_alloc_node() by using the slab_pre_alloc_hook() call, which is now
shared between SLUB and SLAB.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:38 +0000 (14:53 -0700)]
mm: fault-inject take over bootstrap kmem_cache check
Remove the SLAB specific function slab_should_failslab(), by moving the
check against fault-injection for the bootstrap slab, into the shared
function should_failslab() (used by both SLAB and SLUB).
This is a step towards sharing alloc_hook's between SLUB and SLAB.
This bootstrap slab "kmem_cache" is used for allocating struct
kmem_cache objects to the allocator itself.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:35 +0000 (14:53 -0700)]
mm/slab: move SLUB alloc hooks to common mm/slab.h
First step towards sharing alloc_hook's between SLUB and SLAB
allocators. Move the SLUB allocators *_alloc_hook to the common
mm/slab.h for internal slab definitions.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Dangaard Brouer [Tue, 15 Mar 2016 21:53:32 +0000 (14:53 -0700)]
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
This change is primarily an attempt to make it easier to realize the
optimizations the compiler performs in-case CONFIG_MEMCG_KMEM is not
enabled.
Performance wise, even when CONFIG_MEMCG_KMEM is compiled in, the
overhead is zero. This is because, as long as no process have enabled
kmem cgroups accounting, the assignment is replaced by asm-NOP
operations. This is possible because memcg_kmem_enabled() uses a
static_key_false() construct.
It also helps readability as it avoid accessing the p[] array like:
p[size - 1] which "expose" that the array is processed backwards inside
helper function build_detached_freelist().
Lastly this also makes the code more robust, in error case like passing
NULL pointers in the array. Which were previously handled before commit
033745189b1b ("slub: add missing kmem cgroup support to
kmem_cache_free_bulk").
Fixes:
033745189b1b ("slub: add missing kmem cgroup support to kmem_cache_free_bulk")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 15 Mar 2016 21:53:29 +0000 (14:53 -0700)]
paride: make 'verbose' parameter an 'int' again
gcc-6.0 found an ancient bug in the paride driver, which had a
"module_param(verbose, bool, 0);" since before 2.6.12, but actually uses
it to accept '0', '1' or '2' as arguments:
drivers/block/paride/pd.c: In function 'pd_init_dev_parms':
drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare]
#define DBMSG(msg) ((verbose>1)?(msg):NULL)
In 2012, Rusty did a cleanup patch that also changed the type of the
variable to 'bool', which introduced what is now a gcc warning.
This changes the type back to 'int' and adapts the module_param() line
instead, so it should work as documented in case anyone ever cares about
running the ancient driver with debugging.
Fixes:
90ab5ee94171 ("module_param: make bool parameters really bool (drivers & misc)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Rusty Russell <rusty@rustcorp.com.au>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
San Mehat [Tue, 15 Mar 2016 21:53:26 +0000 (14:53 -0700)]
block: partition: add partition specific uevent callbacks for partition info
This patch has been carried in the Android tree for quite some time and
is one of the few patches required to get a mainline kernel up and
running with an exsiting Android userspace. So I wanted to submit it
for review and consideration if it should be merged.
For partitions, add new uevent parameters 'PARTN' which specifies the
partitions index in the table, and 'PARTNAME', which specifies PARTNAME
specifices the partition name of a partition device.
Android's userspace uses this for creating device node links from the
partition name and number, ie:
/dev/block/platform/soc/by-name/system
or
/dev/block/platform/soc/by-num/p1
One can see its usage here:
https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#355
and
https://android.googlesource.com/platform/system/core/+/master/init/devices.cpp#494
[john.stultz@linaro.org: dropped NPARTS and reworded commit message for context]
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Rom Lemarchand <romlem@google.com>
Cc: Android Kernel Team <kernel-team@android.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: <harald@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jun Piao [Tue, 15 Mar 2016 21:53:23 +0000 (14:53 -0700)]
ocfs2/dlm: fix a variable overflow problem in dlmdomain.c
In dlm_send_join_cancels(), node is defined with type unsigned int, but
initialized with -1, this will lead variable overflow. Although this
won't cause any runtime problem, the code looks a little uncoordinated.
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiufei Xue [Tue, 15 Mar 2016 21:53:20 +0000 (14:53 -0700)]
ocfs2: fix a tiny race that leads file system read-only
when o2hb detect a node down, it first set the dead node to recovery map
and create ocfs2rec which will replay journal for dead node. o2hb
thread then call dlm_do_local_recovery_cleanup() to delete the lock for
dead node. After the lock of dead node is gone, locks for other nodes
can be granted and may modify the meta data without replaying journal of
the dead node. The detail is described as follows.
N1 N2 N3(master)
modify the extent tree of
inode, and commit
dirty metadata to journal,
then goes down.
o2hb thread detects
N1 goes down, set
recovery map and
delete the lock of N1.
dlm_thread flush ast
for the lock of N2.
do not detect the death
of N1, so recovery map is
empty.
read inode from disk
without replaying
the journal of N1 and
modify the extent tree
of the inode that N1
had modified.
ocfs2rec recover the
journal of N1.
The modification of N2
is lost.
The modification of N1 and N2 are not serial, and it will lead to
read-only file system. We can set recovery_waiting flag to the lock
resource after delete the lock for dead node to prevent other node from
getting the lock before dlm recovery. After dlm recovery, the recovery
map on N2 is not empty, ocfs2_inode_lock_full_nested() will wait for ocfs2
recovery.
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Tue, 15 Mar 2016 21:53:17 +0000 (14:53 -0700)]
ocfs2/dlm: return EINVAL when the lockres on migration target is in DROPPING_REF state
If master migrate this lock resource to node when it happened to purge
it, a new lock resource will be created and inserted into hash list. If
then master goes down, the lock resource being purged is recovered, so
there exist two lock resource with different owner. So return error to
master if the lock resource is in DROPPING state, master will retry to
migrate this lock resource.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Tue, 15 Mar 2016 21:53:14 +0000 (14:53 -0700)]
ocfs2/dlm: clear DROPPING_REF flag when the master goes down
If the master goes down after return in-progress for deref message. The
lock resource on non-master node can not be purged. Clear the
DROPPING_REF flag and recovery it.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Tue, 15 Mar 2016 21:53:11 +0000 (14:53 -0700)]
ocfs2/dlm: return in progress if master can not clear the refmap bit right now
Master returns in-progress to non-master node when it can not clear the
refmap bit right now. And non-master node will not purge the lock
resource until receiving deref done message.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xuejiufei [Tue, 15 Mar 2016 21:53:08 +0000 (14:53 -0700)]
ocfs2/dlm: add DEREF_DONE message
This series of patches is to fix the dis-order issue of setting/clearing
refmap bit described below.
Node 1 Node 2(master)
dlmlock
dlm_do_master_request
dlm_master_request_handler
-> dlm_lockres_set_refmap_bit
dlmlock succeed
dlmunlock succeed
dlm_purge_lockres
dlm_deref_handler
-> find lock resource is in
DLM_LOCK_RES_SETREF_INPROG state,
so dispatch a deref work
dlm_purge_lockres succeed.
call dlmlock again
dlm_do_master_request
dlm_master_request_handler
-> dlm_lockres_set_refmap_bit
deref work trigger, call
dlm_lockres_clear_refmap_bit
to clear Node 1 from refmap
dlm_purge_lockres succeed
dlm_send_remote_lock_request
return DLM_IVLOCKID because
the lockres is not exist
BUG if the lockres is $RECOVERY
This series of patches add a new message to keep the order of set and
clear. Other nodes can purge the lock resource only after the refmap bit
on master is cleared.
This patch is to add DEREF_DONE message and corresponding handler. Node
can purge the lock resource after receiving this message. As a new
message is added, so increase the minor number of dlm protocol version.
Signed-off-by: xuejiufei <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Tue, 15 Mar 2016 21:53:05 +0000 (14:53 -0700)]
ocfs2/dlm: fix a typo in dlmcommon.h
Refer to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo for
O2NET_MAX_PAYLOAD_BYTES.
Since currently DLM_MIG_LOCKRES_RESERVED is not actually used, it won't
cause any problem. But we'd better correct it for further use.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jiangyiwen [Tue, 15 Mar 2016 21:53:01 +0000 (14:53 -0700)]
ocfs2: use spinlock_irqsave() to downconvert lock in ocfs2_osb_dump()
Commit
a75e9ccabd92 ("ocfs2: use spinlock irqsave for downconvert lock")
missed an unmodified place in ocfs2_osb_dump(), so it still exists a
deadlock scenario.
ocfs2_wake_downconvert_thread
ocfs2_rw_unlock
ocfs2_dio_end_io
dio_complete
.....
bio_endio
req_bio_endio
....
scsi_io_completion
blk_done_softirq
__do_softirq
do_softirq
irq_exit
do_IRQ
ocfs2_osb_dump
cat /sys/kernel/debug/ocfs2/${uuid}/fs_state
This patch still uses spin_lock_irqsave() - replace spin_lock() to solve
this situation.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jiangyiwen [Tue, 15 Mar 2016 21:52:58 +0000 (14:52 -0700)]
ocfs2/cluster: replace the interrupt safe spinlocks with common ones
There actually no hardware or software interrupts in the context which
using o2hb_live_lock, so we don't need to worry about race conditions
caused by irq/softirq with spinlock held. Turning off irq is not good
for system performance after all. Just replace them with a non
interrupt safe function.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Tue, 15 Mar 2016 21:52:55 +0000 (14:52 -0700)]
blackfin: define dummy pgprot_writecombine for !MMU
blackfin allmodconfig build fails with the error:
../sound/core/pcm_native.c: In function 'snd_pcm_lib_default_mmap':
../sound/core/pcm_native.c:3386:24: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration]
area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
^
../sound/core/pcm_native.c:3386:22: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int'
area->vm_page_prot = pgprot_writecombine(area->vm_page_prot);
^
When !MMU, asm-generic will not define default pgprot_writecombine, so
blackfin needs to define it by itself.
The patch idea is from commit
65b9ab888cd7 ("arch/c6x/include/asm/pgtable.h:
define dummy pgprot_writecombine for !MMU")
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Cc: Steven Miao <realmz6@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sudip Mukherjee [Tue, 15 Mar 2016 21:52:52 +0000 (14:52 -0700)]
m32r: mm: fix build warning
While building we are getting warnings:
arch/m32r/mm/init.c:63:17: warning: unused variable 'low'
arch/m32r/mm/init.c:62:17: warning: unused variable 'max_dma'
max_dma and low are only used if CONFIG_MMU is defined. Lets declare
the variables inside the #ifdef.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Tue, 15 Mar 2016 21:52:49 +0000 (14:52 -0700)]
tags: Fix DEFINE_PER_CPU expansions
$ make tags
GEN tags
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"
Which are all the result of the DEFINE_PER_CPU pattern:
scripts/tags.sh:200: '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
scripts/tags.sh:201: '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'
The below cures them. All except the workqueue one are within reasonable
distance of the 80 char limit. TJ do you have any preference on how to
fix the wq one, or shall we just not care its too long?
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geliang Tang [Tue, 15 Mar 2016 21:52:46 +0000 (14:52 -0700)]
init/main.c: use list_for_each_entry()
Use list_for_each_entry() instead of list_for_each() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
James Bottomley [Tue, 15 Mar 2016 22:24:44 +0000 (15:24 -0700)]
Merge branch 'fixes' into misc
Linus Torvalds [Tue, 15 Mar 2016 20:50:29 +0000 (13:50 -0700)]
Merge branch 'smp-hotplug-for-linus' of git://git./linux/kernel/git/tip/tip
Pull cpu hotplug updates from Thomas Gleixner:
"This is the first part of the ongoing cpu hotplug rework:
- Initial implementation of the state machine
- Runs all online and prepare down callbacks on the plugged cpu and
not on some random processor
- Replaces busy loop waiting with completions
- Adds tracepoints so the states can be followed"
More detailed commentary on this work from an earlier email:
"What's wrong with the current cpu hotplug infrastructure?
- Asymmetry
The hotplug notifier mechanism is asymmetric versus the bringup and
teardown. This is mostly caused by the notifier mechanism.
- Largely undocumented dependencies
While some notifiers use explicitely defined notifier priorities,
we have quite some notifiers which use numerical priorities to
express dependencies without any documentation why.
- Control processor driven
Most of the bringup/teardown of a cpu is driven by a control
processor. While it is understandable, that preperatory steps,
like idle thread creation, memory allocation for and initialization
of essential facilities needs to be done before a cpu can boot,
there is no reason why everything else must run on a control
processor. Before this patch series, bringup looks like this:
Control CPU Booting CPU
do preparatory steps
kick cpu into life
do low level init
sync with booting cpu sync with control cpu
bring the rest up
- All or nothing approach
There is no way to do partial bringups. That's something which is
really desired because we waste e.g. at boot substantial amount of
time just busy waiting that the cpu comes to life. That's stupid
as we could very well do preparatory steps and the initial IPI for
other cpus and then go back and do the necessary low level
synchronization with the freshly booted cpu.
- Minimal debuggability
Due to the notifier based design, it's impossible to switch between
two stages of the bringup/teardown back and forth in order to test
the correctness. So in many hotplug notifiers the cancel
mechanisms are either not existant or completely untested.
- Notifier [un]registering is tedious
To [un]register notifiers we need to protect against hotplug at
every callsite. There is no mechanism that bringup/teardown
callbacks are issued on the online cpus, so every caller needs to
do it itself. That also includes error rollback.
What's the new design?
The base of the new design is a symmetric state machine, where both
the control processor and the booting/dying cpu execute a well
defined set of states. Each state is symmetric in the end, except
for some well defined exceptions, and the bringup/teardown can be
stopped and reversed at almost all states.
So the bringup of a cpu will look like this in the future:
Control CPU Booting CPU
do preparatory steps
kick cpu into life
do low level init
sync with booting cpu sync with control cpu
bring itself up
The synchronization step does not require the control cpu to wait.
That mechanism can be done asynchronously via a worker or some
other mechanism.
The teardown can be made very similar, so that the dying cpu cleans
up and brings itself down. Cleanups which need to be done after
the cpu is gone, can be scheduled asynchronously as well.
There is a long way to this, as we need to refactor the notion when a
cpu is available. Today we set the cpu online right after it comes
out of the low level bringup, which is not really correct.
The proper mechanism is to set it to available, i.e. cpu local
threads, like softirqd, hotplug thread etc. can be scheduled on that
cpu, and once it finished all booting steps, it's set to online, so
general workloads can be scheduled on it. The reverse happens on
teardown. First thing to do is to forbid scheduling of general
workloads, then teardown all the per cpu resources and finally shut it
off completely.
This patch series implements the basic infrastructure for this at the
core level. This includes the following:
- Basic state machine implementation with well defined states, so
ordering and prioritization can be expressed.
- Interfaces to [un]register state callbacks
This invokes the bringup/teardown callback on all online cpus with
the proper protection in place and [un]installs the callbacks in
the state machine array.
For callbacks which have no particular ordering requirement we have
a dynamic state space, so that drivers don't have to register an
explicit hotplug state.
If a callback fails, the code automatically does a rollback to the
previous state.
- Sysfs interface to drive the state machine to a particular step.
This is only partially functional today. Full functionality and
therefor testability will be achieved once we converted all
existing hotplug notifiers over to the new scheme.
- Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
processor:
Control CPU Booting CPU
do preparatory steps
kick cpu into life
do low level init
sync with booting cpu sync with control cpu
wait for boot
bring itself up
Signal completion to control cpu
In a previous step of this work we've done a full tree mechanical
conversion of all hotplug notifiers to the new scheme. The balance
is a net removal of about 4000 lines of code.
This is not included in this series, as we decided to take a
different approach. Instead of mechanically converting everything
over, we will do a proper overhaul of the usage sites one by one so
they nicely fit into the symmetric callback scheme.
I decided to do that after I looked at the ugliness of some of the
converted sites and figured out that their hotplug mechanism is
completely buggered anyway. So there is no point to do a
mechanical conversion first as we need to go through the usage
sites one by one again in order to achieve a full symmetric and
testable behaviour"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
cpu/hotplug: Document states better
cpu/hotplug: Fix smpboot thread ordering
cpu/hotplug: Remove redundant state check
cpu/hotplug: Plug death reporting race
rcu: Make CPU_DYING_IDLE an explicit call
cpu/hotplug: Make wait for dead cpu completion based
cpu/hotplug: Let upcoming cpu bring itself fully up
arch/hotplug: Call into idle with a proper state
cpu/hotplug: Move online calls to hotplugged cpu
cpu/hotplug: Create hotplug threads
cpu/hotplug: Split out the state walk into functions
cpu/hotplug: Unpark smpboot threads from the state machine
cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
cpu/hotplug: Implement setup/removal interface
cpu/hotplug: Make target state writeable
cpu/hotplug: Add sysfs state interface
cpu/hotplug: Hand in target state to _cpu_up/down
cpu/hotplug: Convert the hotplugged cpu work to a state machine
cpu/hotplug: Convert to a state machine for the control processor
cpu/hotplug: Add tracepoints
...
Linus Torvalds [Tue, 15 Mar 2016 19:48:48 +0000 (12:48 -0700)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The 4.6 pile of irq updates contains:
- Support for IPI irqdomains to support proper integration of IPIs to
and from coprocessors. The first user of this new facility is
MIPS. The relevant MIPS patches come with the core to avoid merge
ordering issues and have been acked by Ralf.
- A new command line option to set the default interrupt affinity
mask at boot time.
- Support for some more new ARM and MIPS interrupt controllers:
tango, alpine-msix and bcm6345-l1
- Two small cleanups for x86/apic which we merged into irq/core to
avoid yet another branch in x86 with two tiny commits.
- The usual set of updates, cleanups in drivers/irqchip. Mostly in
the area of ARM-GIC, arada-37-xp and atmel chips. Nothing
outstanding here"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
irqchip/irq-alpine-msi: Release the correct domain on error
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
genirq: Export IRQ functions for module use
irqchip/gic/realview: Support more RealView DCC variants
Documentation/bindings: Document the Alpine MSIX driver
irqchip: Add the Alpine MSIX interrupt controller
irqchip/gic-v3: Always return IRQ_SET_MASK_OK_DONE in gic_set_affinity
irqchip/gic-v3-its: Mark its_init() and its children as __init
irqchip/gic-v3: Remove gic_root_node variable from the ITS code
irqchip/gic-v3: ACPI: Add redistributor support via GICC structures
irqchip/gic-v3: Add ACPI support for GICv3/4 initialization
irqchip/gic-v3: Refactor gic_of_init() for GICv3 driver
x86/apic: Deinline _flat_send_IPI_mask, save ~150 bytes
x86/apic: Deinline __default_send_IPI_*, save ~200 bytes
dt-bindings: interrupt-controller: Add SoC-specific compatible string to Marvell ODMI
irqchip/mips-gic: Add new DT property to reserve IPIs
MIPS: Delete smp-gic.c
MIPS: Make smp CMP, CPS and MT use the new generic IPI functions
MIPS: Add generic SMP IPI support
...
Linus Torvalds [Tue, 15 Mar 2016 19:13:56 +0000 (12:13 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The timer department delivers this time:
- Support for cross clock domain timestamps in the core code plus a
first user. That allows more precise timestamping for PTP and
later for audio and other peripherals.
The ptp/e1000e patches have been acked by the relevant maintainers
and are carried in the timer tree to avoid merge ordering issues.
- Support for unregistering the current clocksource watchdog. That
lifts a limitation for switching clocksources which has been there
from day 1
- The usual pile of fixes and updates to the core and the drivers.
Nothing outstanding and exciting"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
time/timekeeping: Work around false positive GCC warning
e1000e: Adds hardware supported cross timestamp on e1000e nic
ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
x86/tsc: Always Running Timer (ART) correlated clocksource
hrtimer: Revert CLOCK_MONOTONIC_RAW support
time: Add history to cross timestamp interface supporting slower devices
time: Add driver cross timestamp interface for higher precision time synchronization
time: Remove duplicated code in ktime_get_raw_and_real()
time: Add timekeeping snapshot code capturing system time and counter
time: Add cycles to nanoseconds translation
jiffies: Use CLOCKSOURCE_MASK instead of constant
clocksource: Introduce clocksource_freq2mult()
clockevents/drivers/exynos_mct: Implement ->set_state_oneshot_stopped()
clockevents/drivers/arm_global_timer: Implement ->set_state_oneshot_stopped()
clockevents/drivers/arm_arch_timer: Implement ->set_state_oneshot_stopped()
clocksource/drivers/arm_global_timer: Register delay timer
clocksource/drivers/lpc32xx: Support timer-based ARM delay
clocksource/drivers/lpc32xx: Support periodic mode
clocksource/drivers/lpc32xx: Don't use the prescaler counter for clockevents
clocksource/drivers/rockchip: Add err handle for rk_timer_init
...
Linus Torvalds [Tue, 15 Mar 2016 18:38:05 +0000 (11:38 -0700)]
Merge branch 'core-rcu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- Miscellaneous fixes, cleanups, restructuring.
- RCU torture-test updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Export rcu_gp_is_normal()
rcu: Remove rcu_user_hooks_switch
rcu: Catch up rcu_report_qs_rdp() comment with reality
rcu: Document unique-name limitation for DEFINE_STATIC_SRCU()
rcu: Make rcu/tiny_plugin.h explicitly non-modular
irq: Privatize irq_common_data::state_use_accessors
RCU: Privatize rcu_node::lock
sparse: Add __private to privatize members of structs
rcu: Remove useless rcu_data_p when !PREEMPT_RCU
rcutorture: Correct no-expedite console messages
rcu: Set rdp->gpwrap when CPU is idle
rcu: Stop treating in-kernel CPU-bound workloads as errors
rcu: Update rcu_report_qs_rsp() comment
rcu: Assign false instead of 0 for ->core_needs_qs
rcutorture: Check for self-detected stalls
rcutorture: Don't keep empty console.log.diags files
rcutorture: Add checks for rcutorture writer starvation
Linus Torvalds [Tue, 15 Mar 2016 18:29:24 +0000 (11:29 -0700)]
Merge branch 'x86-timers-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 timer update from Ingo Molnar:
"A single simplification of the x86 TSC code"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Use topology functions
Linus Torvalds [Tue, 15 Mar 2016 18:20:44 +0000 (11:20 -0700)]
Merge branch 'x86-platform-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 core platform updates from Ingo Molnar:
"Intel Quark and Geode SoC platform updates"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/intel/quark: Drop IMR lock bit support
x86/platform/intel/mid: Remove dead code
x86/platform: Make platform/geode/net5501.c explicitly non-modular
x86/platform: Make platform/geode/alix.c explicitly non-modular
x86/platform: Make platform/geode/geos.c explicitly non-modular
x86/platform: Make platform/intel-quark/imr_selftest.c explicitly non-modular
x86/platform: Make platform/intel-quark/imr.c explicitly non-modular
Linus Torvalds [Tue, 15 Mar 2016 17:45:39 +0000 (10:45 -0700)]
Merge branch 'x86-mm-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
"The main changes in this cycle were:
- Enable full ASLR randomization for 32-bit programs (Hector
Marco-Gisbert)
- Add initial minimal INVPCI support, to flush global mappings (Andy
Lutomirski)
- Add KASAN enhancements (Andrey Ryabinin)
- Fix mmiotrace for huge pages (Karol Herbst)
- ... misc cleanups and small enhancements"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/32: Enable full randomization on i386 and X86_32
x86/mm/kmmio: Fix mmiotrace for hugepages
x86/mm: Avoid premature success when changing page attributes
x86/mm/ptdump: Remove paravirt_enabled()
x86/mm: Fix INVPCID asm constraint
x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()
x86/mm: If INVPCID is available, use it to flush global mappings
x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
x86/mm: Add INVPCID helpers
x86/kasan: Write protect kasan zero shadow
x86/kasan: Clear kasan_zero_page after TLB flush
x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug()
x86/mm/numa: Clean up numa_clear_kernel_node_hotplug()
x86/mm: Make kmap_prot into a #define
x86/mm/32: Set NX in __supported_pte_mask before enabling paging
x86/mm: Streamline and restore probe_memory_block_size()
Linus Torvalds [Tue, 15 Mar 2016 17:39:22 +0000 (10:39 -0700)]
Merge branch 'x86-microcode-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 microcode updates from Ingo Molnar:
"The biggest change in this cycle was the separation of the microcode
loading mechanism from the initrd code plus the support of built-in
microcode images.
There were also lots cleanups and general restructuring (by Borislav
Petkov)"
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/microcode/intel: Drop orig_sum from ext signature checksum
x86/microcode/intel: Improve microcode sanity-checking error messages
x86/microcode/intel: Merge two consecutive if-statements
x86/microcode/intel: Get rid of DWSIZE
x86/microcode/intel: Change checksum variables to u32
x86/microcode: Use kmemdup() rather than duplicating its implementation
x86/microcode: Remove unnecessary paravirt_enabled check
x86/microcode: Document builtin microcode loading method
x86/microcode/AMD: Issue microcode updated message later
x86/microcode/intel: Cleanup get_matching_model_microcode()
x86/microcode/intel: Remove unused arg of get_matching_model_microcode()
x86/microcode/intel: Rename mc_saved_in_initrd
x86/microcode/intel: Use *wrmsrl variants
x86/microcode/intel: Cleanup apply_microcode_intel()
x86/microcode/intel: Move the BUG_ON up and turn it into WARN_ON
x86/microcode/intel: Rename mc_intel variable to mc
x86/microcode/intel: Rename mc_saved_count to num_saved
x86/microcode/intel: Rename local variables of type struct mc_saved_data
x86/microcode/AMD: Drop redundant printk prefix
x86/microcode: Issue update message only once
...
Linus Torvalds [Tue, 15 Mar 2016 17:23:56 +0000 (10:23 -0700)]
Merge branch 'x86-fpu-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
"The biggest change in terms of impact is the changing of the FPU
context switch model to 'eagerfpu' for all CPU types, via: commit
58122bf1d856: "x86/fpu: Default eagerfpu=on on all CPUs"
This makes all FPU saves and restores synchronous and makes the FPU
code a lot more obvious to read. In the next cycle, if this change is
problem free, we'll remove the old lazy FPU restore code altogether.
This change flushed out some old bugs, which should all be fixed by
now, BYMMV"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Default eagerfpu=on on all CPUs
x86/fpu: Speed up lazy FPU restores slightly
x86/fpu: Fold fpu_copy() into fpu__copy()
x86/fpu: Fix FNSAVE usage in eagerfpu mode
x86/fpu: Fix math emulation in eager fpu mode
Linus Torvalds [Tue, 15 Mar 2016 17:16:48 +0000 (10:16 -0700)]
Merge branch 'x86-build-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 build update from Ingo Molnar:
"A single adjustment of a defconfig value"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/defconfigs/32: Set CONFIG_FRAME_WARN to the Kconfig default
Linus Torvalds [Tue, 15 Mar 2016 17:02:25 +0000 (10:02 -0700)]
Merge branch 'x86-boot-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
"Early command line options parsing enhancements from Dave Hansen, plus
minor cleanups and enhancements"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Remove unused 'is_big_kernel' variable
x86/boot: Use proper array element type in memset() size calculation
x86/boot: Pass in size to early cmdline parsing
x86/boot: Simplify early command line parsing
x86/boot: Fix early command-line parsing when partial word matches
x86/boot: Fix early command-line parsing when matching at end
x86/boot: Simplify kernel load address alignment check
x86/boot: Micro-optimize reset_early_page_tables()
Linus Torvalds [Tue, 15 Mar 2016 16:32:27 +0000 (09:32 -0700)]
Merge branch 'x86-asm-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"This is another big update. Main changes are:
- lots of x86 system call (and other traps/exceptions) entry code
enhancements. In particular the complex parts of the 64-bit entry
code have been migrated to C code as well, and a number of dusty
corners have been refreshed. (Andy Lutomirski)
- vDSO special mapping robustification and general cleanups (Andy
Lutomirski)
- cpufeature refactoring, cleanups and speedups (Borislav Petkov)
- lots of other changes ..."
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
x86/cpufeature: Enable new AVX-512 features
x86/entry/traps: Show unhandled signal for i386 in do_trap()
x86/entry: Call enter_from_user_mode() with IRQs off
x86/entry/32: Change INT80 to be an interrupt gate
x86/entry: Improve system call entry comments
x86/entry: Remove TIF_SINGLESTEP entry work
x86/entry/32: Add and check a stack canary for the SYSENTER stack
x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
x86/entry: Vastly simplify SYSENTER TF (single-step) handling
x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
x86/entry/32: Restore FLAGS on SYSEXIT
x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
selftests/x86: In syscall_nt, test NT|TF as well
x86/asm-offsets: Remove PARAVIRT_enabled
x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
uprobes: __create_xol_area() must nullify xol_mapping.fault
x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
...
Bjorn Helgaas [Tue, 15 Mar 2016 13:56:28 +0000 (08:56 -0500)]
Merge branch 'pci/resource' into next
* pci/resource:
PCI: Simplify pci_create_attr() control flow
PCI: Don't leak memory if sysfs_create_bin_file() fails
PCI: Simplify sysfs ROM cleanup
PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
ia64/PCI: Use ioremap() instead of open-coded equivalent
ia64/PCI: Use temporary struct resource * to avoid repetition
PCI: Clean up pci_map_rom() whitespace
PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
PCI: Set ROM shadow location in arch code, not in PCI core
PCI: Don't enable/disable ROM BAR if we're using a RAM shadow copy
PCI: Don't assign or reassign immutable resources
PCI: Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED
x86/PCI: Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs
PCI: Disable IO/MEM decoding for devices with non-compliant BARs
Bjorn Helgaas [Tue, 15 Mar 2016 13:56:16 +0000 (08:56 -0500)]
Merge branch 'pci/host-hv' into next
* pci/host-hv:
PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs
PCI: Look up IRQ domain by fwnode_handle
PCI: Add fwnode_handle to x86 pci_sysdata
Bjorn Helgaas [Tue, 15 Mar 2016 13:55:52 +0000 (08:55 -0500)]
Merge branch 'pci/host-designware' into next
* pci/host-designware:
PCI: designware: Add driver for prototyping kits based on ARC SDP
PCI: designware: Add default link up check if sub-driver doesn't override
PCI: designware: Add generic dw_pcie_wait_for_link()
ARC: Add PCI support
Bjorn Helgaas [Tue, 15 Mar 2016 13:55:19 +0000 (08:55 -0500)]
Merge branches 'pci/host-altera', 'pci/host-imx6', 'pci/host-keystone', 'pci/host-rcar', 'pci/host-tegra', 'pci/host-thunder', 'pci/host-vmd', 'pci/host-xilinx' and 'pci/host-xilinx-nwl' into next
* pci/host-altera:
PCI: altera: Fix altera_pcie_link_is_up()
* pci/host-imx6:
PCI: imx6: Add DT bindings to configure PHY Tx driver settings
* pci/host-keystone:
PCI: keystone: Defer probing if devm_phy_get() returns -EPROBE_DEFER
* pci/host-rcar:
PCI: rcar: Depend on ARCH_RENESAS, not ARCH_SHMOBILE
* pci/host-tegra:
PCI: tegra: Remove misleading PHYS_OFFSET
PCI: tegra: Track bus -> CPU mapping
PCI: tegra: Remove unused struct tegra_pcie.num_ports field
PCI: tegra: Implement ->{add,remove}_bus() callbacks
PCI: Add pci_ops.{add,remove}_bus() callbacks
* pci/host-thunder:
PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices
PCI: thunder: Add PCIe host driver for ThunderX processors
PCI: generic: Expose pci_host_common_probe() for use by other drivers
PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe()
PCI: generic: Move structure definitions to separate header file
* pci/host-vmd:
x86/PCI: VMD: Attach VMD resources to parent domain's resource tree
x86/PCI: VMD: Set bus resource start to 0
x86/PCI: VMD: Document code for maintainability
* pci/host-xilinx:
microblaze/PCI: Support generic Xilinx AXI PCIe Host Bridge IP driver
PCI: xilinx: Update Zynq binding with Microblaze node
PCI: xilinx: Don't call pci_fixup_irqs() on Microblaze
PCI: xilinx: Remove dependency on ARM-specific struct hw_pci
PCI: xilinx: Use of_pci_get_host_bridge_resources() to parse DT
* pci/host-xilinx-nwl:
PCI: xilinx-nwl: Add support for Xilinx NWL PCIe Host Controller
Bjorn Helgaas [Tue, 15 Mar 2016 13:55:02 +0000 (08:55 -0500)]
Merge branches 'pci/aer', 'pci/enumeration', 'pci/kconfig', 'pci/misc', 'pci/virtualization' and 'pci/vpd' into next
* pci/aer:
PCI/AER: Log aer_inject error injections
PCI/AER: Log actual error causes in aer_inject
PCI/AER: Use dev_warn() in aer_inject
PCI/AER: Fix aer_inject error codes
* pci/enumeration:
PCI: Fix broken URL for Dell biosdevname
* pci/kconfig:
PCI: Cleanup pci/pcie/Kconfig whitespace
PCI: Include pci/hotplug Kconfig directly from pci/Kconfig
PCI: Include pci/pcie/Kconfig directly from pci/Kconfig
* pci/misc:
PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
PCI: Add QEMU top-level IDs for (sub)vendor & device
unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition
PCI: Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h
PCI: Move pci_dma_* helpers to common code
frv/PCI: Remove stray pci_{alloc,free}_consistent() declaration
* pci/virtualization:
PCI: Wait for up to 1000ms after FLR reset
PCI: Support SR-IOV on any function type
* pci/vpd:
PCI: Prevent VPD access for buggy devices
PCI: Sleep rather than busy-wait for VPD access completion
PCI: Fold struct pci_vpd_pci22 into struct pci_vpd
PCI: Rename VPD symbols to remove unnecessary "pci22"
PCI: Remove struct pci_vpd_ops.release function pointer
PCI: Move pci_vpd_release() from header file to pci/access.c
PCI: Move pci_read_vpd() and pci_write_vpd() close to other VPD code
PCI: Determine actual VPD size on first access
PCI: Use bitfield instead of bool for struct pci_vpd_pci22.busy
PCI: Allow access to VPD attributes with size 0
PCI: Update VPD definitions
Heikki Krogerus [Tue, 15 Mar 2016 12:06:00 +0000 (14:06 +0200)]
PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
PCI-SIG has defined Interface FEh for Base Class 0Ch, Sub-Class 03h as "USB
Device (not host controller)". It is already being used in various USB
device controller drivers for matching, so add PCI_CLASS_SERIAL_USB_DEVICE
and use it.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Joao Pinto [Thu, 10 Mar 2016 20:44:52 +0000 (14:44 -0600)]
PCI: designware: Add driver for prototyping kits based on ARC SDP
Add a reference platform driver for PCI RC IP Protoyping Kits based on the
ARC SDP.
[bhelgaas: changelog, split patch up, MAINTAINERS update]
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Pratyush Anand <pratyush.anand@gmail.com>
Joao Pinto [Thu, 10 Mar 2016 20:44:44 +0000 (14:44 -0600)]
PCI: designware: Add default link up check if sub-driver doesn't override
Add a default DesignWare "link_up" test for use when a sub-driver doesn't
supply its own pcie_host_ops.link_up() method.
[bhelgaas: changelog, split into its own patch]
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Pratyush Anand <pratyush.anand@gmail.com>
Joao Pinto [Thu, 10 Mar 2016 20:44:35 +0000 (14:44 -0600)]
PCI: designware: Add generic dw_pcie_wait_for_link()
Several DesignWare-based drivers (dra7xx, exynos, imx6, keystone, qcom, and
spear13xx) had similar loops waiting for the link to come up.
Add a generic dw_pcie_wait_for_link() for use by all these drivers so the
waiting is done consistently, e.g., always using usleep_range() rather than
mdelay() and using similar timeouts and retry counts.
Note that this changes the Keystone link training/wait for link strategy,
so we initiate link training, then wait longer for the link to come up
before re-initiating link training.
[bhelgaas: changelog, split into its own patch, update pci-keystone.c, pcie-qcom.c]
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Pratyush Anand <pratyush.anand@gmail.com>
Andreas Ziegler [Tue, 15 Mar 2016 11:28:32 +0000 (12:28 +0100)]
PCI: Cleanup pci/pcie/Kconfig whitespace
Clean up style issues in drivers/pci/pcie/Kconfig, in particular all
indentation is now done using tabs, not spaces, and the definition of
PCIEASPM_DEBUG is now separated from the definition of PCIEASPM with a
newline.
Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Mauro Carvalho Chehab [Tue, 15 Mar 2016 10:48:28 +0000 (07:48 -0300)]
Merge commit '
840f5b0572ea' into v4l_for_linus
* commit '
840f5b0572ea': (381 commits)
media: au0828 disable tuner to demod link in au0828_media_device_register()
[media] touptek: cast char types on %x printk
[media] touptek: don't DMA at the stack
[media] mceusb: use %*ph for small buffer dumps
[media] v4l: exynos4-is: Drop unneeded check when setting up fimc-lite links
[media] v4l: vsp1: Check if an entity is a subdev with the right function
[media] hide unused functions for !MEDIA_CONTROLLER
[media] em28xx: fix Terratec Grabby AC97 codec detection
[media] media: add prefixes to interface types
[media] media: rc: nuvoton: switch attribute wakeup_data to text
[media] v4l2-ioctl: fix YUV422P pixel format description
[media] media: fix null pointer dereference in v4l_vb2q_enable_media_source()
[media] v4l2-mc.h: fix yet more compiler errors
[media] staging/media: add missing TODO files
[media] media.h: always start with 1 for the audio entities
[media] sound/usb: Use meaninful names for goto labels
[media] v4l2-mc.h: fix compiler warnings
[media] media: au0828 audio mixer isn't connected to decoder
[media] sound/usb: Use Media Controller API to share media resources
[media] dw2102: add support for TeVii S662
...
Ingo Molnar [Tue, 15 Mar 2016 08:01:17 +0000 (09:01 +0100)]
Merge commit 'torture.2015.02.23a' into core/rcu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Tue, 15 Mar 2016 08:00:12 +0000 (09:00 +0100)]
Merge commit 'fixes.2015.02.23a' into core/rcu
Conflicts:
kernel/rcu/tree.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 15 Mar 2016 02:44:38 +0000 (19:44 -0700)]
Merge branch 'timers-nohz-for-linus' of git://git./linux/kernel/git/tip/tip
Pull NOHZ updates from Ingo Molnar:
"NOHZ enhancements, by Frederic Weisbecker, which reorganizes/refactors
the NOHZ 'can the tick be stopped?' infrastructure and related code to
be data driven, and harmonizes the naming and handling of all the
various properties"
[ This makes the ugly "fetch_or()" macro that the scheduler used
internally a new generic helper, and does a bad job at it.
I'm pulling it, but I've asked Ingo and Frederic to get this
fixed up ]
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched-clock: Migrate to use new tick dependency mask model
posix-cpu-timers: Migrate to use new tick dependency mask model
sched: Migrate sched to use new tick dependency mask model
sched: Account rr tasks
perf: Migrate perf to use new tick dependency mask model
nohz: Use enum code for tick stop failure tracing message
nohz: New tick dependency mask
nohz: Implement wide kick on top of irq work
atomic: Export fetch_or()
Linus Torvalds [Tue, 15 Mar 2016 02:14:06 +0000 (19:14 -0700)]
Merge branch 'sched-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Make schedstats a runtime tunable (disabled by default) and
optimize it via static keys.
As most distributions enable CONFIG_SCHEDSTATS=y due to its
instrumentation value, this is a nice performance enhancement.
(Mel Gorman)
- Implement 'simple waitqueues' (swait): these are just pure
waitqueues without any of the more complex features of full-blown
waitqueues (callbacks, wake flags, wake keys, etc.). Simple
waitqueues have less memory overhead and are faster.
Use simple waitqueues in the RCU code (in 4 different places) and
for handling KVM vCPU wakeups.
(Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
Marcelo Tosatti)
- sched/numa enhancements (Rik van Riel)
- NOHZ performance enhancements (Rik van Riel)
- Various sched/deadline enhancements (Steven Rostedt)
- Various fixes (Peter Zijlstra)
- ... and a number of other fixes, cleanups and smaller enhancements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
sched/cputime: Fix steal_account_process_tick() to always return jiffies
sched/deadline: Remove dl_new from struct sched_dl_entity
Revert "kbuild: Add option to turn incompatible pointer check into error"
sched/deadline: Remove superfluous call to switched_to_dl()
sched/debug: Fix preempt_disable_ip recording for preempt_disable()
sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
time, acct: Drop irq save & restore from __acct_update_integrals()
acct, time: Change indentation in __acct_update_integrals()
sched, time: Remove non-power-of-two divides from __acct_update_integrals()
sched/rt: Kick RT bandwidth timer immediately on start up
sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
sched/debug: Move sched_domain_sysctl to debug.c
sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
sched/rt: Fix PI handling vs. sched_setscheduler()
sched/core: Remove duplicated sched_group_set_shares() prototype
sched/fair: Consolidate nohz CPU load update code
sched/fair: Avoid using decay_load_missed() with a negative value
sched/deadline: Always calculate end of period on sched_yield()
sched/cgroup: Fix cgroup entity load tracking tear-down
rcu: Use simple wait queues where possible in rcutree
...
Linus Torvalds [Tue, 15 Mar 2016 01:43:51 +0000 (18:43 -0700)]
Merge branch 'ras-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
"Various RAS updates:
- AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
MCA) error decoding support (Aravind Gopalakrishnan)
- x86 memcpy_mcsafe() support, to enable smart(er) hardware error
recovery in NVDIMM drivers, based on an extension of the x86
exception handling code. (Tony Luck)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
EDAC/sb_edac: Fix computation of channel address
x86/mm, x86/mce: Add memcpy_mcsafe()
x86/mce/AMD: Document some functionality
x86/mce: Clarify comments regarding deferred error
x86/mce/AMD: Fix logic to obtain block address
x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
x86/mce: Move MCx_CONFIG MSR definitions
x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
x86/mm: Expand the exception table logic to allow new handling options
x86/mce/AMD: Set MCAX Enable bit
x86/mce/AMD: Carve out threshold block preparation
x86/mce/AMD: Fix LVT offset configuration for thresholding
x86/mce/AMD: Reduce number of blocks scanned per bank
x86/mce/AMD: Do not perform shared bank check for future processors
x86/mce: Fix order of AMD MCE init function call
Linus Torvalds [Tue, 15 Mar 2016 00:58:53 +0000 (17:58 -0700)]
Merge branch 'perf-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Main kernel side changes:
- Big reorganization of the x86 perf support code. The old code grew
organically deep inside arch/x86/kernel/cpu/perf* and its naming
became somewhat messy.
The new location is under arch/x86/events/, using the following
cleaner hierarchy of source code files:
perf/x86: Move perf_event.c .................. => x86/events/core.c
perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c
(Borislav Petkov)
- Update various x86 PMU constraint and hw support details (Stephane
Eranian)
- Optimize kprobes for BPF execution (Martin KaFai Lau)
- Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
Gleixner)
- Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)
- Various fixes and smaller cleanups.
There are lots of perf tooling updates as well. A few highlights:
perf report/top:
- Hierarchy histogram mode for 'perf top' and 'perf report',
showing multiple levels, one per --sort entry: (Namhyung Kim)
On a mostly idle system:
# perf top --hierarchy -s comm,dso
Then expand some levels and use 'P' to take a snapshot:
# cat perf.hist.0
- 92.32% perf
58.20% perf
22.29% libc-2.22.so
5.97% [kernel]
4.18% libelf-0.165.so
1.69% [unknown]
- 4.71% qemu-system-x86
3.10% [kernel]
1.60% qemu-system-x86_64 (deleted)
+ 2.97% swapper
#
- Add 'L' hotkey to dynamicly set the percent threshold for
histogram entries and callchains, i.e. dynamicly do what the
--percent-limit command line option to 'top' and 'report' does.
(Namhyung Kim)
perf mem:
- Allow specifying events via -e in 'perf mem record', also listing
what events can be specified via 'perf mem record -e list' (Jiri
Olsa)
perf record:
- Add 'perf record' --all-user/--all-kernel options, so that one
can tell that all the events in the command line should be
restricted to the user or kernel levels (Jiri Olsa), i.e.:
perf record -e cycles:u,instructions:u
is equivalent to:
perf record --all-user -e cycles,instructions
- Make 'perf record' collect CPU cache info in the perf.data file header:
$ perf record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
$ perf report --header-only -I | tail -10 | head -8
# CPU cache info:
# L1 Data 32K [0-1]
# L1 Instruction 32K [0-1]
# L1 Data 32K [2-3]
# L1 Instruction 32K [2-3]
# L2 Unified 256K [0-1]
# L2 Unified 256K [2-3]
# L3 Unified 4096K [0-3]
Will be used in 'perf c2c' and eventually in 'perf diff' to
allow, for instance running the same workload in multiple
machines and then when using 'diff' show the hardware difference.
(Jiri Olsa)
- Improved support for Java, using the JVMTI agent library to do
jitdumps that then will be inserted in synthesized
PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
ELF files stored in ~/.debug and keyed with build-ids, to allow
symbol resolution and even annotation with source line info, see
the changeset comments to see how to use it (Stephane Eranian)
perf script/trace:
- Decode data_src values (e.g. perf.data files generated by 'perf
mem record') in 'perf script': (Jiri Olsa)
# perf script
perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P:
ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Improve support to 'data_src', 'weight' and 'addr' fields in
'perf script' (Jiri Olsa)
- Handle empty print fmts in 'perf script -s' i.e. when running
python or perl scripts (Taeung Song)
perf stat:
- 'perf stat' now shows shadow metrics (insn per cycle, etc) in
interval mode too. E.g:
# perf stat -I 1000 -e instructions,cycles sleep 1
# time counts unit events
1.
000215928 519,620 instructions # 0.69 insn per cycle
1.
000215928 752,003 cycles
<SNIP>
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
- Implement CSV metrics output in 'perf stat' (Andi Kleen)
perf BPF support:
- Support converting data from bpf events in 'perf data' (Wang Nan)
- Print bpf-output events in 'perf script': (Wang Nan).
# perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
# perf script
usleep 4882 21384.532523: evt:
ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
0008: 42 50 46 20 65 76 65 6e BPF even
0010: 74 21 00 00 t!..
BPF string: "Raise a BPF event!"
#
- Add API to set values of map entries in a BPF object, be it
individual map slots or ranges (Wang Nan)
- Introduce support for the 'bpf-output' event (Wang Nan)
- Add glue to read perf events in a BPF program (Wang Nan)
- Improve support for bpf-output events in 'perf trace' (Wang Nan)
... and tons of other changes as well - see the shortlog and git log
for details!"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
perf stat: Add --metric-only support for -A
perf stat: Implement --metric-only mode
perf stat: Document CSV format in manpage
perf hists browser: Check sort keys before hot key actions
perf hists browser: Allow thread filtering for comm sort key
perf tools: Add sort__has_comm variable
perf tools: Recalc total periods using top-level entries in hierarchy
perf tools: Remove nr_sort_keys field
perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
perf tools: Remove hist_entry->fmt field
perf tools: Fix command line filters in hierarchy mode
perf tools: Add more sort entry check functions
perf tools: Fix hist_entry__filter() for hierarchy
perf jitdump: Build only on supported archs
tools lib traceevent: Add '~' operation within arg_num_eval()
perf tools: Omit unnecessary cast in perf_pmu__parse_scale
perf tools: Pass perf_hpp_list all the way through setup_sort_list
perf tools: Fix perf script python database export crash
perf jitdump: DWARF is also needed
perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
...
Linus Torvalds [Mon, 14 Mar 2016 23:58:50 +0000 (16:58 -0700)]
Merge branch 'mm-readonly-for-linus' of git://git./linux/kernel/git/tip/tip
Pull read-only kernel memory updates from Ingo Molnar:
"This tree adds two (security related) enhancements to the kernel's
handling of read-only kernel memory:
- extend read-only kernel memory to a new class of formerly writable
kernel data: 'post-init read-only memory' via the __ro_after_init
attribute, and mark the ARM and x86 vDSO as such read-only memory.
This kind of attribute can be used for data that requires a once
per bootup initialization sequence, but is otherwise never modified
after that point.
This feature was based on the work by PaX Team and Brad Spengler.
(by Kees Cook, the ARM vDSO bits by David Brown.)
- make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
Kconfig option. This simplifies the kernel and also signals that
read-only memory is the default model and a first-class citizen.
(Kees Cook)"
* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
Linus Torvalds [Mon, 14 Mar 2016 23:31:41 +0000 (16:31 -0700)]
Merge branch 'mm-pat-for-linus' of git://git./linux/kernel/git/tip/tip
Pull dma_*_writecombine rename from Ingo Molnar:
"Rename dma_*_writecombine() to dma_*_wc()
This is a tree-wide API rename, to move the dma_*() write-combining
APIs closer in name to their usual API families. (The old API names
are kept as compatibility wrappers to not introduce extra breakage.)
The patch was Coccinelle generated"
* 'mm-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dma, mm/pat: Rename dma_*_writecombine() to dma_*_wc()
Linus Torvalds [Mon, 14 Mar 2016 22:50:44 +0000 (15:50 -0700)]
Merge branch 'locking-core-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar:
"Various updates:
- Futex scalability improvements: remove page lock use for shared
futex get_futex_key(), which speeds up 'perf bench futex hash'
benchmarks by over 40% on a 60-core Westmere. This makes anon-mem
shared futexes perform close to private futexes. (Mel Gorman)
- lockdep hash collision detection and fix (Alfredo Alvarez
Fernandez)
- lockdep testing enhancements (Alfredo Alvarez Fernandez)
- robustify lockdep init by using hlists (Andrew Morton, Andrey
Ryabinin)
- mutex and csd_lock micro-optimizations (Davidlohr Bueso)
- small x86 barriers tweaks (Michael S Tsirkin)
- qspinlock updates (Waiman Long)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
locking/csd_lock: Explicitly inline csd_lock*() helpers
futex: Replace barrier() in unqueue_me() with READ_ONCE()
locking/lockdep: Detect chain_key collisions
locking/lockdep: Prevent chain_key collisions
tools/lib/lockdep: Fix link creation warning
tools/lib/lockdep: Add tests for AA and ABBA locking
tools/lib/lockdep: Add userspace version of READ_ONCE()
tools/lib/lockdep: Fix the build on recent kernels
locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
locking/mutex: Allow next waiter lockless wakeup
locking/pvqspinlock: Enable slowpath locking count tracking
locking/qspinlock: Use smp_cond_acquire() in pending code
locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
locking/mcs: Fix mcs_spin_lock() ordering
futex: Remove requirement for lock_page() in get_futex_key()
futex: Rename barrier references in ordering guarantees
locking/atomics: Update comment about READ_ONCE() and structures
locking/lockdep: Eliminate lockdep_init()
locking/lockdep: Convert hash tables to hlists
...
Linus Torvalds [Mon, 14 Mar 2016 22:15:51 +0000 (15:15 -0700)]
Merge branch 'core-resources-for-linus' of git://git./linux/kernel/git/tip/tip
Pull ram resource handling changes from Ingo Molnar:
"Core kernel resource handling changes to support NVDIMM error
injection.
This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
for System RAM while keeping the current IORESOURCE_MEM type bit set
for all memory-mapped ranges (including System RAM) for backward
compatibility.
With this resource flag it no longer takes a strcmp() loop through the
resource tree to find "System RAM" resources.
The new resource type is then used to extend ACPI/APEI error injection
facility to also support NVDIMM"
* 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ACPI/EINJ: Allow memory error injection to NVDIMM
resource: Kill walk_iomem_res()
x86/kexec: Remove walk_iomem_res() call with GART type
x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
resource: Add walk_iomem_res_desc()
memremap: Change region_intersects() to take @flags and @desc
arm/samsung: Change s3c_pm_run_res() to use System RAM type
resource: Change walk_system_ram() to use System RAM type
drivers: Initialize resource entry to zero
xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
ia64: Set System RAM type and descriptor
x86/e820: Set System RAM type and descriptor
resource: Add I/O resource descriptor
resource: Handle resource flags properly
resource: Add System RAM resource type
Bryn M. Reeves [Mon, 14 Mar 2016 21:04:34 +0000 (17:04 -0400)]
dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request()
An "old" (.request_fn) DM 'struct request' stores a pointer to the
associated 'struct dm_rq_target_io' in rq->special.
dm_requeue_original_request(), previously named
dm_requeue_unmapped_original_request(), called dm_unprep_request() to
reset rq->special to NULL. But rq_end_stats() would go on to hit a NULL
pointer deference because its call to tio_from_request() returned NULL.
Fix this by calling rq_end_stats() _before_ dm_unprep_request()
Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes:
e262f34741 ("dm stats: add support for request-based DM devices")
Cc: stable@vger.kernel.org # 4.2+
James Bottomley [Mon, 14 Mar 2016 19:42:27 +0000 (12:42 -0700)]
MAINTAINERS: use new email address for James Bottomley
The @odin.com one has been bouncing for a while now, so replace with
new Employer email.
Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Akinobu Mita [Mon, 14 Mar 2016 14:45:01 +0000 (23:45 +0900)]
rtc: pcf2127: add pcf2129 device id
There are only a few differences between PCF2127 and PCF2129 (PCF2127
has 512 bytes of general purpose SRAM and count-down timer).
The rtc-pcf2127 driver currently doesn't use the PCF2127 specific
functionality and Kconfig help text already says this driver supports
PCF2127/29, so we can simply add pcf2129 to device id list.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Mon, 14 Mar 2016 14:45:00 +0000 (23:45 +0900)]
rtc: pcf2127: add support for spi interface
pcf2127 has selectable I2C-bus and SPI-bus interface support.
This adds support for SPI interface.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Mon, 14 Mar 2016 14:44:59 +0000 (23:44 +0900)]
rtc: pcf2127: convert to use regmap
pcf2127 has selectable I2C-bus and SPI-bus interface support.
Currently rtc-pcf2127 driver only supports I2C.
This is preparation for support for SPI interface.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Michael Büsch [Thu, 10 Mar 2016 17:34:46 +0000 (18:34 +0100)]
rtc: rv3029: Add thermometer hwmon support
This adds support to
- enable/disable the thermometer
- set the temperature scanning interval
- read the current temperature that is used for temp compensation.
via hwmon interface
Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Michael Büsch [Thu, 10 Mar 2016 17:34:23 +0000 (18:34 +0100)]
rtc: rv3029: Add update_bits helper for eeprom access
This simplifies the update of single bits in the eeprom.
Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Josh Poimboeuf [Mon, 7 Mar 2016 15:03:02 +0000 (09:03 -0600)]
rtc: ds1685: actually spin forever in poweroff error path
objtool reports the following warnings:
drivers/rtc/rtc-ds1685.o: warning: objtool: ds1685_rtc_work_queue()+0x0: duplicate frame pointer save
drivers/rtc/rtc-ds1685.o: warning: objtool: ds1685_rtc_work_queue()+0x3: duplicate frame pointer setup
drivers/rtc/rtc-ds1685.o: warning: objtool: ds1685_rtc_work_queue()+0x0: frame pointer state mismatch
The warning message needs to be improved, but what it really means in
this case is that ds1685_rtc_poweroff() has a possible code path where
it can actually fall through to the next function in the object code,
ds1685_rtc_work_queue().
The bug is caused by the use of the unreachable() macro in a place which
is actually reachable. That causes gcc to assume that the printk()
immediately before the unreachable() macro never returns, when in fact
it does. So gcc places the printk() at the very end of the function's
object code. When the printk() returns, the next function starts
executing.
The surrounding comment and printk message state that the code should
spin forever, which explains the unreachable() statement. However the
actual spin code is missing.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Alexander Kochetkov [Sun, 6 Mar 2016 09:43:57 +0000 (12:43 +0300)]
rtc: hym8563: fix invalid year calculation
Year field must be in BCD format, according to
hym8563 datasheet.
Due to the bug year 2016 became 2010.
Fixes:
dcaf03849352 ("rtc: add hym8563 rtc-driver")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:53 +0000 (00:27 +0900)]
rtc: ds3232: use rtc->ops_lock to protect alarm operations
ds3232->mutex is used to protect for alarm operations which
need to access status and control registers.
But we can use rtc->ops_lock instead. rtc->ops_lock is held when most
of rtc_class_ops methods are called, so we only need to explicitly
acquire it from irq handler in order to protect form concurrent
accesses.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:52 +0000 (00:27 +0900)]
rtc: ds3232: fix issue when irq is shared several devices
ds3232-core requests irq with IRQF_SHARED, so irq can be shared by
several devices. But the irq handler for ds3232 unconditionally
disables the irq at first and the irq is re-enabled only when the
interrupt source was the ds3232's alarm. This behaviour breaks the
devices sharing the same irq in the various scenarios.
This converts to use threaded irq and remove outdated code in
suspend/resume paths.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:51 +0000 (00:27 +0900)]
rtc: ds3232: remove unused UIE code
UIE mode irqs are handled by the generic rtc core now. But there are
remaining unused code fragments for it.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:50 +0000 (00:27 +0900)]
rtc: ds3232: add register access error checks
Add missing register access error checks and make it return error code
or print error message.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:49 +0000 (00:27 +0900)]
rtc: ds3232: fix read on /dev/rtc after RTC_AIE_ON
The rtctest (tools/testing/selftests/timers/rtctest.c) found that
reading ds3232 rtc device immediately return the value 0x20 (RTC_AF)
without waiting alarm interrupt.
This is because alarm_irq_enable() of ds3232 driver changes RTC_AF
flag in rtc->irq_data. So calling ioctl with RTC_AIE_ON generates
invalid value in rtc device.
The lower-level driver should not touch rtc->irq_data directly.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:48 +0000 (00:27 +0900)]
rtc: merge ds3232 and ds3234
According to "Feature Comparison of the DS323x Real-Time Clocks"
(http://pdfserv.maximintegrated.com/en/an/AN5143.pdf), DS3232 and
DS3234 are very similar.
This merges rtc-ds3232 and rtc-ds3234 with using regmap.
This change also enables to support alarm for ds3234.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Akinobu Mita [Sun, 6 Mar 2016 15:27:47 +0000 (00:27 +0900)]
rtc: ds3232: convert to use regmap
This is preparation for merging rtc-ds3232 i2c driver and rtc-ds3234
spi driver.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: Dennis Aberilla <denzzzhome@yahoo.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Alexandre Belloni [Thu, 10 Mar 2016 04:46:18 +0000 (05:46 +0100)]
rtc: pxa: fix Kconfig indentation
The pxa section is indented using spaces, use tabs.
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Michael Büsch [Fri, 4 Mar 2016 21:41:19 +0000 (22:41 +0100)]
rtc: rv3029: Add device tree property for trickle charger
The trickle charger resistor can be enabled via device tree
property trickle-resistor-ohms.
Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Michael Büsch [Fri, 4 Mar 2016 21:40:55 +0000 (22:40 +0100)]
rtc: rv3029: Add functions for EEPROM access
This adds functions for access to the EEPROM memory on the rv3029.
Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Michael Büsch [Fri, 4 Mar 2016 21:40:30 +0000 (22:40 +0100)]
rtc: rv3029: Add i2c register update-bits helper
This simplifies mask/set operations on device I2C registers.
Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>