Sebastian Andrzej Siewior [Tue, 15 May 2018 16:53:28 +0000 (18:53 +0200)]
sched/fair: Fix documentation file path
The 'tip' prefix probably referred to the -tip tree and is not required,
remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180515165328.24899-1-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Malaterre [Wed, 16 May 2018 20:09:02 +0000 (22:09 +0200)]
sched/deadline: Make the grub_reclaim() function static
Since the grub_reclaim() function can be made static, make it so.
Silences the following GCC warning (W=1):
kernel/sched/deadline.c:1120:5: warning: no previous prototype for ‘grub_reclaim’ [-Wmissing-prototypes]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180516200902.959-1-malat@debian.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mathieu Malaterre [Wed, 16 May 2018 19:53:47 +0000 (21:53 +0200)]
sched/debug: Move the print_rt_rq() and print_dl_rq() declarations to kernel/sched/sched.h
In the following commit:
6b55c9654fcc ("sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h")
the print_cfs_rq() prototype was added to <kernel/sched/sched.h>,
right next to the prototypes for print_cfs_stats(), print_rt_stats()
and print_dl_stats().
Finish this previous commit and also move related prototypes for
print_rt_rq() and print_dl_rq().
Remove existing extern declarations now that they not needed anymore.
Silences the following GCC warning, triggered by W=1:
kernel/sched/debug.c:573:6: warning: no previous prototype for ‘print_rt_rq’ [-Wmissing-prototypes]
kernel/sched/debug.c:603:6: warning: no previous prototype for ‘print_dl_rq’ [-Wmissing-prototypes]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180516195348.30426-1-malat@debian.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Airlie [Fri, 18 May 2018 02:01:49 +0000 (12:01 +1000)]
Merge tag 'drm-intel-fixes-2018-05-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Userptr IOCTL zero size check (Matt)
- Two hardware quirk fixes (Michel & Chris)
* tag 'drm-intel-fixes-2018-05-17' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk
drm/i915/execlists: Use rmb() to order CSB reads
drm/i915/userptr: reject zero user_size
Linus Torvalds [Thu, 17 May 2018 22:58:12 +0000 (15:58 -0700)]
Merge tag 'hwmon-for-linus-v4.17-rc6' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Two k10temp fixes:
- fix race condition when accessing System Management Network
registers
- fix reading critical temperatures on F15h M60h and M70h
Also add PCI ID's for the AMD Raven Ridge root bridge"
* tag 'hwmon-for-linus-v4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (k10temp) Use API function to access System Management Network
x86/amd_nb: Add support for Raven Ridge CPUs
hwmon: (k10temp) Fix reading critical temperature register
Thomas Gleixner [Thu, 17 May 2018 12:36:39 +0000 (14:36 +0200)]
x86/apic/x2apic: Initialize cluster ID properly
Rick bisected a regression on large systems which use the x2apic cluster
mode for interrupt delivery to the commit wich reworked the cluster
management.
The problem is caused by a missing initialization of the clusterid field
in the shared cluster data structures. So all structures end up with
cluster ID 0 which only allows sharing between all CPUs which belong to
cluster 0. All other CPUs with a cluster ID > 0 cannot share the data
structure because they cannot find existing data with their cluster
ID. This causes malfunction with IPIs because IPIs are sent to the wrong
cluster and the caller waits for ever that the target CPU handles the IPI.
Add the missing initialization when a upcoming CPU is the first in a
cluster so that the later booting CPUs can find the data and share it for
proper operation.
Fixes:
023a611748fd ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <rick@microway.com>
Bisected-by: Rick Warner <rick@microway.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rick Warner <rick@microway.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
Linus Torvalds [Thu, 17 May 2018 17:23:36 +0000 (10:23 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- ARM/ARM64 locking fixes
- x86 fixes: PCID, UMIP, locking
- improved support for recent Windows version that have a 2048 Hz APIC
timer
- rename KVM_HINTS_DEDICATED CPUID bit to KVM_HINTS_REALTIME
- better behaved selftests
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
KVM: arm/arm64: Properly protect VGIC locks from IRQs
KVM: X86: Lower the default timer frequency limit to 200us
KVM: vmx: update sec exec controls for UMIP iff emulating UMIP
kvm: x86: Suppress CR3_PCID_INVD bit only when PCIDs are enabled
KVM: selftests: exit with 0 status code when tests cannot be run
KVM: hyperv: idr_find needs RCU protection
x86: Delay skip of emulated hypercall instruction
KVM: Extend MAX_IRQ_ROUTES to 4096 for all archs
Linus Torvalds [Thu, 17 May 2018 17:13:44 +0000 (10:13 -0700)]
Merge tag 'sound-4.17-rc6' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We have a core fix in the compat code for covering a potential race
(double references), but it's a very minor change.
The rest are all small device-specific quirks, as well as a correction
of the new UAC3 support code"
* tag 'sound-4.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Use Class Specific EP for UAC3 devices.
ALSA: hda/realtek - Clevo P950ER ALC1220 Fixup
ALSA: usb: mixer: volume quirk for CM102-A+/102S+
ALSA: hda: Add Lenovo C50 All in one to the power_save blacklist
ALSA: control: fix a redundant-copy issue
Michael S. Tsirkin [Thu, 17 May 2018 14:54:24 +0000 (17:54 +0300)]
kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME
KVM_HINTS_DEDICATED seems to be somewhat confusing:
Guest doesn't really care whether it's the only task running on a host
CPU as long as it's not preempted.
And there are more reasons for Guest to be preempted than host CPU
sharing, for example, with memory overcommit it can get preempted on a
memory access, post copy migration can cause preemption, etc.
Let's call it KVM_HINTS_REALTIME which seems to better
match what guests expect.
Also, the flag most be set on all vCPUs - current guests assume this.
Note so in the documentation.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Thu, 17 May 2018 17:11:44 +0000 (10:11 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- a fix for the vfio ccw translation code
- update an incorrect email address in the MAINTAINERS file
- fix a division by zero oops in the cpum_sf code found by trinity
- two fixes for the error handling of the qdio code
- several spectre related patches to convert all left-over indirect
branches in the kernel to expoline branches
- update defconfigs to avoid warnings due to the netfilter Kconfig
changes
- avoid several compiler warnings in the kexec_file code for s390
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/qdio: don't release memory in qdio_setup_irq()
s390/qdio: fix access to uninitialized qdio_q fields
s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero
s390: use expoline thunks in the BPF JIT
s390: extend expoline to BC instructions
s390: remove indirect branch from do_softirq_own_stack
s390: move spectre sysfs attribute code
s390/kernel: use expoline for indirect branches
s390/ftrace: use expoline for indirect branches
s390/lib: use expoline for indirect branches
s390/crc32-vx: use expoline for indirect branches
s390: move expoline assembler macros to a header
vfio: ccw: fix cleanup if cp_prefetch fails
s390/kexec_file: add declaration of purgatory related globals
s390: update defconfigs
MAINTAINERS: update s390 zcrypt maintainers email address
Linus Torvalds [Thu, 17 May 2018 17:02:19 +0000 (10:02 -0700)]
Merge tag 'selinux-pr-
20180516' of git://git./linux/kernel/git/pcmoore/selinux
Pull SELinux fixes from Paul Moore:
"A small pull request to fix a few regressions in the SELinux/SCTP code
with applications that call bind() with AF_UNSPEC/INADDR_ANY.
The individual commit descriptions have more information, but the
commits themselves should be self explanatory"
* tag 'selinux-pr-
20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: correctly handle sa_family cases in selinux_sctp_bind_connect()
selinux: fix address family in bind() and connect() to match address/port
selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
Willy Tarreau [Fri, 11 May 2018 06:11:44 +0000 (08:11 +0200)]
proc: do not access cmdline nor environ from file-backed areas
proc_pid_cmdline_read() and environ_read() directly access the target
process' VM to retrieve the command line and environment. If this
process remaps these areas onto a file via mmap(), the requesting
process may experience various issues such as extra delays if the
underlying device is slow to respond.
Let's simply refuse to access file-backed areas in these functions.
For this we add a new FOLL_ANON gup flag that is passed to all calls
to access_remote_vm(). The code already takes care of such failures
(including unmapped areas). Accesses via /proc/pid/mem were not
changed though.
This was assigned CVE-2018-1120.
Note for stable backports: the patch may apply to kernels prior to 4.11
but silently miss one location; it must be checked that no call to
access_remote_vm() keeps zero as the last argument.
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Coly Li [Thu, 17 May 2018 15:33:26 +0000 (23:33 +0800)]
bcache: return 0 from bch_debug_init() if CONFIG_DEBUG_FS=n
Commit
539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
returns the return value of debugfs_create_dir() to bcache_init(). When
CONFIG_DEBUG_FS=n, bch_debug_init() always returns 1 and makes
bcache_init() failedi.
This patch makes bch_debug_init() always returns 0 if CONFIG_DEBUG_FS=n,
so bcache can continue to work for the kernels which don't have debugfs
enanbled.
Changelog:
v4: Add Acked-by from Kent Overstreet.
v3: Use IS_ENABLED(CONFIG_DEBUG_FS) to replace #ifdef DEBUG_FS.
v2: Remove a warning information
v1: Initial version.
Fixes: Commit
539d39eb2708 ("bcache: fix wrong return value in bch_debug_init()")
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Reported-by: Massimo B. <massimo.b@gmx.net>
Reported-by: Kai Krakow <kai@kaishome.de>
Tested-by: Kai Krakow <kai@kaishome.de>
Acked-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nicholas Piggin [Mon, 14 May 2018 15:59:47 +0000 (01:59 +1000)]
powerpc/powernv: Fix NVRAM sleep in invalid context when crashing
Similarly to opal_event_shutdown, opal_nvram_write can be called in
the crash path with irqs disabled. Special case the delay to avoid
sleeping in invalid context.
Fixes:
3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
Cc: stable@vger.kernel.org # v3.2
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pierre-Yves MORDRET [Fri, 11 May 2018 08:22:39 +0000 (10:22 +0200)]
MAINTAINERS: add entry for STM32 I2C driver
Add I2C/SMBUS Driver entry for STM32 family from ST Microelectronics.
Signed-off-by: Pierre-Yves MORDRET <pierre-yves.mordret@st.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Anand Jain [Thu, 17 May 2018 07:16:51 +0000 (15:16 +0800)]
btrfs: fix crash when trying to resume balance without the resume flag
We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance()
only, which isn't called during the remount. So when resuming from
the paused balance we hit the bug:
kernel: kernel BUG at fs/btrfs/volumes.c:3890!
::
kernel: balance_kthread+0x51/0x60 [btrfs]
kernel: kthread+0x111/0x130
::
kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP:
ffffba7d0090bde8
Reproducer:
On a mounted filesystem:
btrfs balance start --full-balance /btrfs
btrfs balance pause /btrfs
mount -o remount,ro /dev/sdb /btrfs
mount -o remount,rw /dev/sdb /btrfs
To fix this set the BTRFS_BALANCE_RESUME flag in
btrfs_resume_balance_async().
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Fri, 27 Apr 2018 09:21:53 +0000 (12:21 +0300)]
btrfs: Fix delalloc inodes invalidation during transaction abort
When a transaction is aborted btrfs_cleanup_transaction is called to
cleanup all the various in-flight bits and pieces which migth be
active. One of those is delalloc inodes - inodes which have dirty
pages which haven't been persisted yet. Currently the process of
freeing such delalloc inodes in exceptional circumstances such as
transaction abort boiled down to calling btrfs_invalidate_inodes whose
sole job is to invalidate the dentries for all inodes related to a
root. This is in fact wrong and insufficient since such delalloc inodes
will likely have pending pages or ordered-extents and will be linked to
the sb->s_inode_list. This means that unmounting a btrfs instance with
an aborted transaction could potentially lead inodes/their pages
visible to the system long after their superblock has been freed. This
in turn leads to a "use-after-free" situation once page shrink is
triggered. This situation could be simulated by running generic/019
which would cause such inodes to be left hanging, followed by
generic/176 which causes memory pressure and page eviction which lead
to touching the freed super block instance. This situation is
additionally detected by the unmount code of VFS with the following
message:
"VFS: Busy inodes after unmount of Self-destruct in 5 seconds. Have a nice day..."
Additionally btrfs hits WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
in free_fs_root for the same reason.
This patch aims to rectify the sitaution by doing the following:
1. Change btrfs_destroy_delalloc_inodes so that it calls
invalidate_inode_pages2 for every inode on the delalloc list, this
ensures that all the pages of the inode are released. This function
boils down to calling btrfs_releasepage. During test I observed cases
where inodes on the delalloc list were having an i_count of 0, so this
necessitates using igrab to be sure we are working on a non-freed inode.
2. Since calling btrfs_releasepage might queue delayed iputs move the
call out to btrfs_cleanup_transaction in btrfs_error_commit_super before
calling run_delayed_iputs for the last time. This is necessary to ensure
that delayed iputs are run.
Note: this patch is tagged for 4.14 stable but the fix applies to older
versions too but needs to be backported manually due to conflicts.
CC: stable@vger.kernel.org # 4.14.x: 2b8773313494: btrfs: Split btrfs_del_delalloc_inode into 2 functions
CC: stable@vger.kernel.org # 4.14.x
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment to igrab ]
Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Fri, 27 Apr 2018 09:21:51 +0000 (12:21 +0300)]
btrfs: Split btrfs_del_delalloc_inode into 2 functions
This is in preparation of fixing delalloc inodes leakage on transaction
abort. Also export the new function.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Tue, 15 May 2018 17:37:36 +0000 (01:37 +0800)]
btrfs: fix reading stale metadata blocks after degraded raid1 mounts
If a btree block, aka. extent buffer, is not available in the extent
buffer cache, it'll be read out from the disk instead, i.e.
btrfs_search_slot()
read_block_for_search() # hold parent and its lock, go to read child
btrfs_release_path()
read_tree_block() # read child
Unfortunately, the parent lock got released before reading child, so
commit
5bdd3536cbbe ("Btrfs: Fix block generation verification race") had
used 0 as parent transid to read the child block. It forces
read_tree_block() not to check if parent transid is different with the
generation id of the child that it reads out from disk.
A simple PoC is included in btrfs/124,
0. A two-disk raid1 btrfs,
1. Right after mkfs.btrfs, block A is allocated to be device tree's root.
2. Mount this filesystem and put it in use, after a while, device tree's
root got COW but block A hasn't been allocated/overwritten yet.
3. Umount it and reload the btrfs module to remove both disks from the
global @fs_devices list.
4. mount -odegraded dev1 and write some data, so now block A is allocated
to be a leaf in checksum tree. Note that only dev1 has the latest
metadata of this filesystem.
5. Umount it and mount it again normally (with both disks), since raid1
can pick up one disk by the writer task's pid, if btrfs_search_slot()
needs to read block A, dev2 which does NOT have the latest metadata
might be read for block A, then we got a stale block A.
6. As parent transid is not checked, block A is marked as uptodate and
put into the extent buffer cache, so the future search won't bother
to read disk again, which means it'll make changes on this stale
one and make it dirty and flush it onto disk.
To avoid the problem, parent transid needs to be passed to
read_tree_block().
In order to get a valid parent transid, we need to hold the parent's
lock until finishing reading child.
This patch needs to be slightly adapted for stable kernels, the
&first_key parameter added to read_tree_block() is from 4.16+
(
581c1760415c4). The fix is to replace 0 by 'gen'.
Fixes:
5bdd3536cbbe ("Btrfs: Fix block generation verification race")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Misono Tomohiro [Tue, 15 May 2018 07:51:26 +0000 (16:51 +0900)]
btrfs: property: Set incompat flag if lzo/zstd compression is set
Incompat flag of LZO/ZSTD compression should be set at:
1. mount time (-o compress/compress-force)
2. when defrag is done
3. when property is set
Currently 3. is missing and this commit adds this.
This could lead to a filesystem that uses ZSTD but is not marked as
such. If a kernel without a ZSTD support encounteres a ZSTD compressed
extent, it will handle that but this could be confusing to the user.
Typically the filesystem is mounted with the ZSTD option, but the
discrepancy can arise when a filesystem is never mounted with ZSTD and
then the property on some file is set (and some new extents are
written). A simple mount with -o compress=zstd will fix that up on an
unpatched kernel.
Same goes for LZO, but this has been around for a very long time
(2.6.37) so it's unlikely that a pre-LZO kernel would be used.
Fixes:
5c1aab1dd544 ("btrfs: Add zstd support")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add user visible impact ]
Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Wed, 9 May 2018 15:01:46 +0000 (16:01 +0100)]
Btrfs: fix duplicate extents after fsync of file with prealloc extents
In commit
471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay"), on fsync, we started to always log all prealloc
extents beyond an inode's i_size in order to avoid losing them after a
power failure. However under some cases this can lead to the log replay
code to create duplicate extent items, with different lengths, in the
extent tree. That happens because, as of that commit, we can now log
extent items based on extent maps that are not on the "modified" list
of extent maps of the inode's extent map tree. Logging extent items based
on extent maps is used during the fast fsync path to save time and for
this to work reliably it requires that the extent maps are not merged
with other adjacent extent maps - having the extent maps in the list
of modified extents gives such guarantee.
Consider the following example, captured during a long run of fsstress,
which illustrates this problem.
We have inode 271, in the filesystem tree (root 5), for which all of the
following operations and discussion apply to.
A buffered write starts at offset 312391 with a length of 933471 bytes
(end offset at
1245862). At this point we have, for this inode, the
following extent maps with the their field values:
em A, start 0, orig_start 0, len 40960, block_start
18446744073709551613,
block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 376832, block_start
1106399232,
block_len 376832, orig_block_len 376832
em C, start 417792, orig_start 417792, len 782336, block_start
18446744073709551613, block_len 0, orig_block_len 0
em D, start
1200128, orig_start
1200128, len 835584, block_start
1106776064, block_len 835584, orig_block_len 835584
em E, start
2035712, orig_start
2035712, len 245760, block_start
1107611648, block_len 245760, orig_block_len 245760
Extent map A corresponds to a hole and extent maps D and E correspond to
preallocated extents.
Extent map D ends where extent map E begins (
1106776064 + 835584 =
1107611648), but these extent maps were not merged because they are in
the inode's list of modified extent maps.
An fsync against this inode is made, which triggers the fast path
(BTRFS_INODE_NEEDS_FULL_SYNC is not set). This fsync triggers writeback
of the data previously written using buffered IO, and when the respective
ordered extent finishes, btrfs_drop_extents() is called against the
(aligned) range 311296..
1249279. This causes a split of extent map D at
btrfs_drop_extent_cache(), replacing extent map D with a new extent map
D', also added to the list of modified extents, with the following
values:
em D', start
1249280, orig_start of
1200128,
block_start
1106825216 (=
1106776064 +
1249280 -
1200128),
orig_block_len 835584,
block_len 786432 (835584 - (
1249280 -
1200128))
Then, during the fast fsync, btrfs_log_changed_extents() is called and
extent maps D' and E are removed from the list of modified extents. The
flag EXTENT_FLAG_LOGGING is also set on them. After the extents are logged
clear_em_logging() is called on each of them, and that makes extent map E
to be merged with extent map D' (try_merge_map()), resulting in D' being
deleted and E adjusted to:
em E, start
1249280, orig_start
1200128, len
1032192,
block_start
1106825216, block_len
1032192,
orig_block_len 245760
A direct IO write at offset
1847296 and length of 360448 bytes (end offset
at
2207744) starts, and at that moment the following extent maps exist for
our inode:
em A, start 0, orig_start 0, len 40960, block_start
18446744073709551613,
block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 270336, block_start
1106399232,
block_len 270336, orig_block_len 376832
em C, start 311296, orig_start 311296, len 937984, block_start
1112842240,
block_len 937984, orig_block_len 937984
em E (prealloc), start
1249280, orig_start
1200128, len
1032192,
block_start
1106825216, block_len
1032192, orig_block_len 245760
The dio write results in drop_extent_cache() being called twice. The first
time for a range that starts at offset
1847296 and ends at offset
2035711
(length of 188416), which results in a double split of extent map E,
replacing it with two new extent maps:
em F, start
1249280, orig_start
1200128, block_start
1106825216,
block_len 598016, orig_block_len 598016
em G, start
2035712, orig_start
1200128, block_start
1107611648,
block_len 245760, orig_block_len
1032192
It also creates a new extent map that represents a part of the requested
IO (through create_io_em()):
em H, start
1847296, len 188416, block_start
1107423232, block_len 188416
The second call to drop_extent_cache() has a range with a start offset of
2035712 and end offset of
2207743 (length of 172032). This leads to
replacing extent map G with a new extent map I with the following values:
em I, start
2207744, orig_start
1200128, block_start
1107783680,
block_len 73728, orig_block_len
1032192
It also creates a new extent map that represents the second part of the
requested IO (through create_io_em()):
em J, start
2035712, len 172032, block_start
1107611648, block_len 172032
The dio write set the inode's i_size to
2207744 bytes.
After the dio write the inode has the following extent maps:
em A, start 0, orig_start 0, len 40960, block_start
18446744073709551613,
block_len 0, orig_block_len 0
em B, start 40960, orig_start 40960, len 270336, block_start
1106399232,
block_len 270336, orig_block_len 376832
em C, start 311296, orig_start 311296, len 937984, block_start
1112842240,
block_len 937984, orig_block_len 937984
em F, start
1249280, orig_start
1200128, len 598016,
block_start
1106825216, block_len 598016, orig_block_len 598016
em H, start
1847296, orig_start
1200128, len 188416,
block_start
1107423232, block_len 188416, orig_block_len 835584
em J, start
2035712, orig_start
2035712, len 172032,
block_start
1107611648, block_len 172032, orig_block_len 245760
em I, start
2207744, orig_start
1200128, len 73728,
block_start
1107783680, block_len 73728, orig_block_len
1032192
Now do some change to the file, like adding a xattr for example and then
fsync it again. This triggers a fast fsync path, and as of commit
471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync
log replay"), we use the extent map I to log a file extent item because
it's a prealloc extent and it starts at an offset matching the inode's
i_size. However when we log it, we create a file extent item with a value
for the disk byte location that is wrong, as can be seen from the
following output of "btrfs inspect-internal dump-tree":
item 1 key (271 EXTENT_DATA
2207744) itemoff 3782 itemsize 53
generation 22 type 2 (prealloc)
prealloc data disk byte
1106776064 nr
1032192
prealloc data offset
1007616 nr 73728
Here the disk byte value corresponds to calculation based on some fields
from the extent map I:
1106776064 = block_start (
1107783680) -
1007616 (extent_offset)
extent_offset =
2207744 (start) -
1200128 (orig_start) =
1007616
The disk byte value of
1106776064 clashes with disk byte values of the
file extent items at offsets
1249280 and
1847296 in the fs tree:
item 6 key (271 EXTENT_DATA
1249280) itemoff 3568 itemsize 53
generation 20 type 2 (prealloc)
prealloc data disk byte
1106776064 nr 835584
prealloc data offset 49152 nr 598016
item 7 key (271 EXTENT_DATA
1847296) itemoff 3515 itemsize 53
generation 20 type 1 (regular)
extent data disk byte
1106776064 nr 835584
extent data offset 647168 nr 188416 ram 835584
extent compression 0 (none)
item 8 key (271 EXTENT_DATA
2035712) itemoff 3462 itemsize 53
generation 20 type 1 (regular)
extent data disk byte
1107611648 nr 245760
extent data offset 0 nr 172032 ram 245760
extent compression 0 (none)
item 9 key (271 EXTENT_DATA
2207744) itemoff 3409 itemsize 53
generation 20 type 2 (prealloc)
prealloc data disk byte
1107611648 nr 245760
prealloc data offset 172032 nr 73728
Instead of the disk byte value of
1106776064, the value of
1107611648
should have been logged. Also the data offset value should have been
172032 and not
1007616.
After a log replay we end up getting two extent items in the extent tree
with different lengths, one of 835584, which is correct and existed
before the log replay, and another one of
1032192 which is wrong and is
based on the logged file extent item:
item 12 key (
1106776064 EXTENT_ITEM 835584) itemoff 3406 itemsize 53
refs 2 gen 15 flags DATA
extent data backref root 5 objectid 271 offset
1200128 count 2
item 13 key (
1106776064 EXTENT_ITEM
1032192) itemoff 3353 itemsize 53
refs 1 gen 22 flags DATA
extent data backref root 5 objectid 271 offset
1200128 count 1
Obviously this leads to many problems and a filesystem check reports many
errors:
(...)
checking extents
Extent back ref already exists for
1106776064 parent 0 root 5 owner 271 offset
1200128 num_refs 1
extent item
1106776064 has multiple extent items
ref mismatch on [
1106776064 835584] extent item 2, found 3
Incorrect local backref count on
1106776064 root 5 owner 271 offset
1200128 found 2 wanted 1 back 0x55b1d0ad7680
Backref
1106776064 root 5 owner 271 offset
1200128 num_refs 0 not found in extent tree
Incorrect local backref count on
1106776064 root 5 owner 271 offset
1200128 found 1 wanted 0 back 0x55b1d0ad4e70
Backref bytes do not match extent backref, bytenr=
1106776064, ref bytes=835584, backref bytes=
1032192
backpointer mismatch on [
1106776064 835584]
checking free space cache
block group
1103101952 has wrong amount of free space
failed to load free space cache for block group
1103101952
checking fs roots
(...)
So fix this by logging the prealloc extents beyond the inode's i_size
based on searches in the subvolume tree instead of the extent maps.
Fixes:
471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Srinivas Kandagatla [Thu, 17 May 2018 09:42:32 +0000 (10:42 +0100)]
dmaengine: qcom: bam_dma: check if the runtime pm enabled
Disabling pm runtime at probe is not sufficient to get BAM working
on remotely controller instances. pm_runtime_get_sync() would return
-EACCES in such cases.
So check if runtime pm is enabled before returning error from bam functions.
Fixes:
5b4a68952a89 ("dmaengine: qcom: bam_dma: disable runtime pm on remote controlled")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Dave Airlie [Thu, 17 May 2018 02:00:53 +0000 (12:00 +1000)]
Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes
A single fix for a recent regression.
* 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
Dave Airlie [Thu, 17 May 2018 02:00:17 +0000 (12:00 +1000)]
Merge tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- core: Fix regression in dev node offsets (Haneen)
- vc4: Fix memory leak on driver close (Eric)
- dumb-buffers: Prevent overflow in DIV_ROUND_UP() (Dan)
Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
* tag 'drm-misc-fixes-2018-05-16' of git://anongit.freedesktop.org/drm/drm-misc:
drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()
drm/vc4: Fix leak of the file_priv that stored the perfmon.
drm: Match sysfs name in link removal to link creation
Linus Torvalds [Wed, 16 May 2018 23:45:23 +0000 (16:45 -0700)]
Merge tag 'trace-v4.17-rc4-2' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Some of the ftrace internal events use a zero for a data size of a
field event. This is increasingly important for the histogram trigger
work that is being extended.
While auditing trace events, I found that a couple of the xen events
were used as just marking that a function was called, by creating a
static array of size zero. This can play havoc with the tracing
features if these events are used, because a zero size of a static
array is denoted as a special nul terminated dynamic array (this is
what the trace_marker code uses). But since the xen events have no
size, they are not nul terminated, and unexpected results may occur.
As trace events were never intended on being a marker to denote that a
function was hit or not, especially since function tracing and kprobes
can trivially do the same, the best course of action is to simply
remove these events"
* tag 'trace-v4.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
Linus Torvalds [Wed, 16 May 2018 18:02:54 +0000 (11:02 -0700)]
Merge tag 'trace-v4.17-rc5-vsprintf' of git://git./linux/kernel/git/rostedt/linux-trace
Pull memory barrier for from Steven Rostedt:
"The memory barrier usage in updating the random ptr hash for %p in
vsprintf is incorrect.
Instead of adding the read memory barrier into vsprintf() which will
cause a slight degradation to a commonly used function in the kernel
just to solve a very unlikely race condition that can only happen at
boot up, change the code from using a variable branch to a
static_branch.
Not only does this solve the race condition, it actually will improve
the performance of vsprintf() by removing the conditional branch that
is only needed at boot"
* tag 'trace-v4.17-rc5-vsprintf' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
vsprintf: Replace memory barrier with static_key for random_ptr_key update
Shuah Khan (Samsung OSG) [Tue, 15 May 2018 23:57:23 +0000 (17:57 -0600)]
usbip: usbip_host: fix bad unlock balance during stub_probe()
stub_probe() calls put_busid_priv() in an error path when device isn't
found in the busid_table. Fix it by making put_busid_priv() safe to be
called with null struct bus_id_priv pointer.
This problem happens when "usbip bind" is run without loading usbip_host
driver and then running modprobe. The first failed bind attempt unbinds
the device from the original driver and when usbip_host is modprobed,
stub_probe() runs and doesn't find the device in its busid table and calls
put_busid_priv(0 with null bus_id_priv pointer.
usbip-host 3-10.2: 3-10.2 is not in match_busid table... skip!
[ 367.359679] =====================================
[ 367.359681] WARNING: bad unlock balance detected!
[ 367.359683] 4.17.0-rc4+ #5 Not tainted
[ 367.359685] -------------------------------------
[ 367.359688] modprobe/2768 is trying to release lock (
[ 367.359689]
==================================================================
[ 367.359696] BUG: KASAN: null-ptr-deref in print_unlock_imbalance_bug+0x99/0x110
[ 367.359699] Read of size 8 at addr
0000000000000058 by task modprobe/2768
[ 367.359705] CPU: 4 PID: 2768 Comm: modprobe Not tainted 4.17.0-rc4+ #5
Fixes:
22076557b07c ("usbip: usbip_host: fix NULL-ptr deref and use-after-free errors") in usb-linus
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Wed, 16 May 2018 14:00:26 +0000 (17:00 +0300)]
drm/dumb-buffers: Integer overflow in drm_mode_create_ioctl()
There is a comment here which says that DIV_ROUND_UP() and that's where
the problem comes from. Say you pick:
args->bpp = UINT_MAX - 7;
args->width = 4;
args->height = 1;
The integer overflow in DIV_ROUND_UP() means "cpp" is UINT_MAX / 8 and
because of how we picked args->width that means cpp < UINT_MAX / 4.
I've fixed it by preventing the integer overflow in DIV_ROUND_UP(). I
removed the check for !cpp because it's not possible after this change.
I also changed all the 0xffffffffU references to U32_MAX.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516140026.GA19340@mwanda
Steven Rostedt (VMware) [Wed, 16 May 2018 02:24:52 +0000 (22:24 -0400)]
vsprintf: Replace memory barrier with static_key for random_ptr_key update
Reviewing Tobin's patches for getting pointers out early before
entropy has been established, I noticed that there's a lone smp_mb() in
the code. As with most lone memory barriers, this one appears to be
incorrectly used.
We currently basically have this:
get_random_bytes(&ptr_key, sizeof(ptr_key));
/*
* have_filled_random_ptr_key==true is dependent on get_random_bytes().
* ptr_to_id() needs to see have_filled_random_ptr_key==true
* after get_random_bytes() returns.
*/
smp_mb();
WRITE_ONCE(have_filled_random_ptr_key, true);
And later we have:
if (unlikely(!have_filled_random_ptr_key))
return string(buf, end, "(ptrval)", spec);
/* Missing memory barrier here. */
hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
As the CPU can perform speculative loads, we could have a situation
with the following:
CPU0 CPU1
---- ----
load ptr_key = 0
store ptr_key = random
smp_mb()
store have_filled_random_ptr_key
load have_filled_random_ptr_key = true
BAD BAD BAD! (you're so bad!)
Because nothing prevents CPU1 from loading ptr_key before loading
have_filled_random_ptr_key.
But this race is very unlikely, but we can't keep an incorrect smp_mb() in
place. Instead, replace the have_filled_random_ptr_key with a static_branch
not_filled_random_ptr_key, that is initialized to true and changed to false
when we get enough entropy. If the update happens in early boot, the
static_key is updated immediately, otherwise it will have to wait till
entropy is filled and this happens in an interrupt handler which can't
enable a static_key, as that requires a preemptible context. In that case, a
work_queue is used to enable it, as entropy already took too long to
establish in the first place waiting a little more shouldn't hurt anything.
The benefit of using the static key is that the unlikely branch in
vsprintf() now becomes a nop.
Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes:
ad67b74d2469d ("printk: hash addresses printed with %p")
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Kirill A. Shutemov [Wed, 16 May 2018 08:01:29 +0000 (11:01 +0300)]
x86/boot/compressed/64: Fix moving page table out of trampoline memory
cleanup_trampoline() relocates the top-level page table out of
trampoline memory. We use 'top_pgtable' as our new top-level page table.
But if the 'top_pgtable' would be referenced from C in a usual way,
the address of the table will be calculated relative to RIP.
After kernel gets relocated, the address will be in the middle of
decompression buffer and the page table may get overwritten.
This leads to a crash.
We calculate the address of other page tables relative to the relocation
address. It makes them safe. We should do the same for 'top_pgtable'.
Calculate the address of 'top_pgtable' in assembly and pass down to
cleanup_trampoline().
Move the page table to .pgtable section where the rest of page tables
are. The section is @nobits so we save 4k in kernel image.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes:
e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180516080131.27913-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kirill A. Shutemov [Wed, 16 May 2018 08:01:28 +0000 (11:01 +0300)]
x86/boot/compressed/64: Set up GOT for paging_prepare() and cleanup_trampoline()
Eric and Hugh have reported instant reboot due to my recent changes in
decompression code.
The root cause is that I didn't realize that we need to adjust GOT to be
able to run C code that early.
The problem is only visible with an older toolchain. Binutils >= 2.24 is
able to eliminate GOT references by replacing them with RIP-relative
address loads:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=
80d873266dec
We need to adjust GOT two times:
- before calling paging_prepare() using the initial load address
- before calling C code from the relocated kernel
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes:
194a9749c73d ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Link: http://lkml.kernel.org/r/20180516080131.27913-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Tue, 15 May 2018 21:49:51 +0000 (17:49 -0400)]
locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN
The filesystem freezing code needs to transfer ownership of a rwsem
embedded in a percpu-rwsem from the task that does the freezing to
another one that does the thawing by calling percpu_rwsem_release()
after freezing and percpu_rwsem_acquire() before thawing.
However, the new rwsem debug code runs afoul with this scheme by warning
that the task that releases the rwsem isn't the one that acquires it,
as reported by Amir Goldstein:
DEBUG_LOCKS_WARN_ON(sem->owner != get_current())
WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79
Call Trace:
percpu_up_write+0x1f/0x28
thaw_super_locked+0xdf/0x120
do_vfs_ioctl+0x270/0x5f1
ksys_ioctl+0x52/0x71
__x64_sys_ioctl+0x16/0x19
do_syscall_64+0x5d/0x167
entry_SYSCALL_64_after_hwframe+0x49/0xbe
To work properly with the rwsem debug code, we need to annotate that the
rwsem ownership is unknown during the tranfer period until a brave soul
comes forward to acquire the ownership. During that period, optimistic
spinning will be disabled.
Reported-by: Amir Goldstein <amir73il@gmail.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Theodore Y. Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/1526420991-21213-3-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Tue, 15 May 2018 21:49:50 +0000 (17:49 -0400)]
locking/rwsem: Add a new RWSEM_ANONYMOUSLY_OWNED flag
There are use cases where a rwsem can be acquired by one task, but
released by another task. In thess cases, optimistic spinning may need
to be disabled. One example will be the filesystem freeze/thaw code
where the task that freezes the filesystem will acquire a write lock
on a rwsem and then un-owns it before returning to userspace. Later on,
another task will come along, acquire the ownership, thaw the filesystem
and release the rwsem.
Bit 0 of the owner field was used to designate that it is a reader
owned rwsem. It is now repurposed to mean that the owner of the rwsem
is not known. If only bit 0 is set, the rwsem is reader owned. If bit
0 and other bits are set, it is writer owned with an unknown owner.
One such value for the latter case is (-1L). So we can set owner to 1 for
reader-owned, -1 for writer-owned. The owner is unknown in both cases.
To handle transfer of rwsem ownership, the higher level code should
set the owner field to -1 to indicate a write-locked rwsem with unknown
owner. Optimistic spinning will be disabled in this case.
Once the higher level code figures who the new owner is, it can then
set the owner field accordingly.
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Theodore Y. Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/1526420991-21213-2-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Michel Thierry [Mon, 14 May 2018 16:54:45 +0000 (09:54 -0700)]
drm/i915/gen9: Add WaClearHIZ_WM_CHICKEN3 for bxt and glk
Factor in clear values wherever required while updating destination
min/max.
References: HSDES#
1604444184
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: mesa-dev@lists.freedesktop.org
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180510200708.18097-1-michel.thierry@intel.com
Cc: stable@vger.kernel.org
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180514165445.9198-1-michel.thierry@intel.com
(backported from commit
0c79f9cb77eae28d48a4f9fc1b3341aacbbd260c)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Deepak Rawat [Tue, 15 May 2018 13:39:09 +0000 (15:39 +0200)]
drm/vmwgfx: Set dmabuf_size when vmw_dmabuf_init is successful
SOU primary plane prepare_fb hook depends upon dmabuf_size to pin up BO
(and not call a new vmw_dmabuf_init) when a new fb size is same as
current fb. This was changed in a recent commit which is causing
page_flip to fail on VM with low display memory and multi-mon failure
when cycle monitors from secondary display.
Cc: <stable@vger.kernel.org> # 4.14, 4.16
Fixes:
20fb5a635a0c ("drm/vmwgfx: Unpin the screen object backup buffer when not used")
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Gabriel Fernandez [Thu, 3 May 2018 06:40:09 +0000 (08:40 +0200)]
clk: stm32: fix: stm32 clock drivers are not compiled by default
Clock driver is mandatory if the machine is selected.
Then don't use 'bool' and 'depends on' commands, but 'def_bool'
with the machine(s).
Fixes:
da32d3539fca ("clk: stm32: add configuration flags for each of the stm32 drivers")
Signed-off-by: Gabriel Fernandez <gabriel.fernandez@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Stefan Agner [Wed, 18 Apr 2018 12:49:08 +0000 (14:49 +0200)]
clk: imx6ull: use OSC clock during AXI rate change
On i.MX6 ULL using PLL3 seems to cause a freeze when setting
the parent to IMX6UL_CLK_PLL3_USB_OTG. This only seems to appear
since commit
6f9575e55632 ("clk: imx: Add CLK_IS_CRITICAL flag
for busy divider and busy mux"), probably because the clock is
now forced to be on.
Fixes:
6f9575e55632("clk: imx: Add CLK_IS_CRITICAL flag for busy divider and busy mux")
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Olof Johansson [Tue, 15 May 2018 20:49:55 +0000 (13:49 -0700)]
Merge tag 'davinci-fixes-for-v4.17-part-2' of git://git./linux/kernel/git/nsekhar/linux-davinci into fixes
Second set of fixes for TI DaVinci.
They are needed for DM6467 EVM to work. The first patch fixes an
issue with timer interrupt and the second two are needed for video
driver to probe successfully.
* tag 'davinci-fixes-for-v4.17-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: davinci: board-dm646x-evm: set VPIF capture card name
ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
ARM: davinci: dm646x: fix timer interrupt generation
Signed-off-by: Olof Johansson <olof@lixom.net>
Dexuan Cui [Tue, 15 May 2018 19:52:50 +0000 (19:52 +0000)]
tick/broadcast: Use for_each_cpu() specially on UP kernels
for_each_cpu() unintuitively reports CPU0 as set independent of the actual
cpumask content on UP kernels. This causes an unexpected PIT interrupt
storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
a result, the virtual machine can suffer from a strange random delay of 1~20
minutes during boot-up, and sometimes it can hang forever.
Protect if by checking whether the cpumask is empty before entering the
for_each_cpu() loop.
[ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poulson <jopoulso@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable@vger.kernel.org
Cc: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Nick Dyer [Tue, 15 May 2018 19:03:12 +0000 (12:03 -0700)]
Input: usbtouchscreen - add sysfs attribute for 3M MTouch firmware rev
Allow querying of the firmware revision of the device
example:
4.10
Tested on ZII RDU2 platform and on Intel x86_64 PC.
Signed-off-by: Nick Dyer <nick@shmanahar.org>
Tested-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Wolfram Sang [Tue, 8 May 2018 22:39:31 +0000 (15:39 -0700)]
Input: ati_remote2 - fix typo 'can by' to 'can be'
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Nick Simonov [Tue, 15 May 2018 17:33:52 +0000 (10:33 -0700)]
Input: replace hard coded string with __func__ in pr_err()
Change hardcoded string "input_set_capability" in pr_err() function call,
replace it with "%s" __func__ instead.
Signed-off-by: Nick Simonov <nicksimonovv@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Tue, 15 May 2018 17:48:36 +0000 (10:48 -0700)]
Merge tag 'afs-fixes-
20180514' of git://git./linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"Here's a set of patches that fix a number of bugs in the in-kernel AFS
client, including:
- Fix directory locking to not use individual page locks for
directory reading/scanning but rather to use a semaphore on the
afs_vnode struct as the directory contents must be read in a single
blob and data from different reads must not be mixed as the entire
contents may be shuffled about between reads.
- Fix address list parsing to handle port specifiers correctly.
- Only give up callback records on a server if we actually talked to
that server (we might not be able to access a server).
- Fix some callback handling bugs, including refcounting,
whole-volume callbacks and when callbacks actually get broken in
response to a CB.CallBack op.
- Fix some server/address rotation bugs, including giving up if we
can't probe a server; giving up if a server says it doesn't have a
volume, but there are more servers to try.
- Fix the decoding of fetched statuses to be OpenAFS compatible.
- Fix the handling of server lookups in Cache Manager ops (such as
CB.InitCallBackState3) to use a UUID if possible and to handle no
server being found.
- Fix a bug in server lookup where not all addresses are compared.
- Fix the non-encryption of calls that prevents some servers from
being accessed (this also requires an AF_RXRPC patch that has
already gone in through the net tree).
There's also a patch that adds tracepoints to log Cache Manager ops
that don't find a matching server, either by UUID or by address"
* tag 'afs-fixes-
20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix the non-encryption of calls
afs: Fix CB.CallBack handling
afs: Fix whole-volume callback handling
afs: Fix afs_find_server search loop
afs: Fix the handling of an unfound server in CM operations
afs: Add a tracepoint to record callbacks from unlisted servers
afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
afs: Fix VNOVOL handling in address rotation
afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
afs: Fix server rotation's handling of fileserver probe failure
afs: Fix refcounting in callback registration
afs: Fix giving up callbacks on server destruction
afs: Fix address list parsing
afs: Fix directory page locking
Linus Torvalds [Tue, 15 May 2018 17:15:48 +0000 (10:15 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small driver fixes: aacraid to fix an unknown IU type on task
management functions which causes a firmware fault and vmw_pvscsi to
change a return code to retry the operation instead of causing an
immediate error"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: aacraid: Correct hba_send to include iu_type
scsi: vmw-pvscsi: return DID_BUS_BUSY for adapter-initated aborts
Linus Torvalds [Tue, 15 May 2018 16:58:01 +0000 (09:58 -0700)]
Merge tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux
Pull drm fix from Dave Airlie:
"This fixes the mmap regression reported to me on irc by an i686 kernel
user today, he's tested the fix works, and I've audited all the drm
drivers for the bad mmap usage and since we use the mmap offset as a
lookup in a table we aren't inclined to have anything bad in there"
[ See commit
be83bbf80682 ("mmap: introduce sane default mmap limits")
for details and the note on why the GPU drivers were expected to be a
special case. - Linus ]
* tag 'drm-fixes-for-v4.17-rc6-urgent' of git://people.freedesktop.org/~airlied/linux:
drm: set FMODE_UNSIGNED_OFFSET for drm files
Geert Uytterhoeven [Mon, 14 May 2018 10:49:37 +0000 (12:49 +0200)]
mtd: rawnand: Fix return type of __DIVIDE() when called with 32-bit
The __DIVIDE() macro checks whether it is called with a 32-bit or 64-bit
dividend, to select the appropriate divide-and-round-up routine.
As the check uses the ternary operator, the result will always be
promoted to a type that can hold both results, i.e. unsigned long long.
When using this result in a division on a 32-bit system, this may lead
to link errors like:
ERROR: "__udivdi3" [drivers/mtd/nand/raw/nand.ko] undefined!
Fix this by casting the result of the division to the type of the
dividend.
Fixes:
8878b126df769831 ("mtd: nand: add ->exec_op() implementation")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Andre Przywara [Fri, 11 May 2018 14:20:15 +0000 (15:20 +0100)]
KVM: arm/arm64: VGIC/ITS save/restore: protect kvm_read_guest() calls
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Use the newly introduced wrapper for that.
Cc: Stable <stable@vger.kernel.org> # 4.12+
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andre Przywara [Fri, 11 May 2018 14:20:14 +0000 (15:20 +0100)]
KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock
kvm_read_guest() will eventually look up in kvm_memslots(), which requires
either to hold the kvm->slots_lock or to be inside a kvm->srcu critical
section.
In contrast to x86 and s390 we don't take the SRCU lock on every guest
exit, so we have to do it individually for each kvm_read_guest() call.
Provide a wrapper which does that and use that everywhere.
Note that ending the SRCU critical section before returning from the
kvm_read_guest() wrapper is safe, because the data has been *copied*, so
we don't need to rely on valid references to the memslot anymore.
Cc: Stable <stable@vger.kernel.org> # 4.8+
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andre Przywara [Fri, 11 May 2018 14:20:13 +0000 (15:20 +0100)]
KVM: arm/arm64: VGIC/ITS: Promote irq_lock() in update_affinity
Apparently the development of update_affinity() overlapped with the
promotion of irq_lock to be _irqsave, so the patch didn't convert this
lock over. This will make lockdep complain.
Fix this by disabling IRQs around the lock.
Cc: stable@vger.kernel.org
Fixes:
08c9fd042117 ("KVM: arm/arm64: vITS: Add a helper to update the affinity of an LPI")
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andre Przywara [Fri, 11 May 2018 14:20:12 +0000 (15:20 +0100)]
KVM: arm/arm64: Properly protect VGIC locks from IRQs
As Jan reported [1], lockdep complains about the VGIC not being bullet
proof. This seems to be due to two issues:
- When commit
006df0f34930 ("KVM: arm/arm64: Support calling
vgic_update_irq_pending from irq context") promoted irq_lock and
ap_list_lock to _irqsave, we forgot two instances of irq_lock.
lockdeps seems to pick those up.
- If a lock is _irqsave, any other locks we take inside them should be
_irqsafe as well. So the lpi_list_lock needs to be promoted also.
This fixes both issues by simply making the remaining instances of those
locks _irqsave.
One irq_lock is addressed in a separate patch, to simplify backporting.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/575718.html
Cc: stable@vger.kernel.org
Fixes:
006df0f34930 ("KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context")
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Philippe Bergheaud [Mon, 14 May 2018 08:27:36 +0000 (10:27 +0200)]
cxl: Report the tunneled operations status
Failure to synchronize the tunneled operations does not prevent
the initialization of the cxl card. This patch reports the tunneled
operations status via /sys.
Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Philippe Bergheaud [Mon, 14 May 2018 08:27:35 +0000 (10:27 +0200)]
cxl: Set the PBCQ Tunnel BAR register when enabling capi mode
Skiboot used to set the default Tunnel BAR register value when capi
mode was enabled. This approach was ok for the cxl driver, but
prevented other drivers from choosing different values.
Skiboot versions > 5.11 will not set the default value any longer.
This patch modifies the cxl driver to set/reset the Tunnel BAR
register when entering/exiting the cxl mode, with
pnv_pci_set_tunnel_bar().
That should work with old skiboot (since we are re-writing the value
already set) and new skiboot.
mpe: The tunnel support was only merged into Linux recently, in commit
d6a90bb83b50 ("powerpc/powernv: Enable tunneled operations")
(v4.17-rc1), so with new skiboot kernels between that commit and this
will not work correctly.
Fixes:
d6a90bb83b50 ("powerpc/powernv: Enable tunneled operations")
Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Eric Anholt [Mon, 9 Apr 2018 20:58:13 +0000 (13:58 -0700)]
drm/vc4: Fix leak of the file_priv that stored the perfmon.
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes:
65101d8c9108 ("drm/vc4: Expose performance counters to userspace")
Link: https://patchwork.freedesktop.org/patch/msgid/20180409205813.7077-1-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Wanpeng Li [Sat, 5 May 2018 11:02:32 +0000 (04:02 -0700)]
KVM: X86: Lower the default timer frequency limit to 200us
Anthoine reported:
The period used by Windows change over time but it can be 1
milliseconds or less. I saw the limit_periodic_timer_frequency
print so 500 microseconds is sometimes reached.
As suggested by Paolo, lower the default timer frequency limit to a
smaller interval of 200 us (5000 Hz) to leave some headroom. This
is required due to Windows 10 changing the scheduler tick limit
from 1024 Hz to 2048 Hz.
Reported-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sekhar Nori [Fri, 11 May 2018 15:21:36 +0000 (20:51 +0530)]
ARM: davinci: board-dm646x-evm: set VPIF capture card name
VPIF capture driver expects card name to be set since it
uses it without checking for NULL. The commit which
introduced VPIF display and capture support added card
name only for display, not for capture.
Set it in platform data to probe driver successfully.
While at it, also fix the display card name to something more
appropriate.
Fixes:
85609c1ccda6 ("DaVinci: DM646x - platform changes for vpif capture and display drivers")
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Sekhar Nori [Fri, 11 May 2018 15:21:35 +0000 (20:51 +0530)]
ARM: davinci: board-dm646x-evm: pass correct I2C adapter id for VPIF
commit
a16cb91ad9c4 ("[media] media: vpif: use a configurable
i2c_adapter_id for vpif display") removed hardcoded I2C adaptor
setting in VPIF driver, but missed updating platform data passed
from DM646x board.
Fix it.
Fixes:
a16cb91ad9c4 ("[media] media: vpif: use a configurable i2c_adapter_id for vpif display")
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Sekhar Nori [Fri, 11 May 2018 15:21:34 +0000 (20:51 +0530)]
ARM: davinci: dm646x: fix timer interrupt generation
commit
b38434145b34 ("ARM: davinci: irqs: Correct McASP1 TX interrupt
definition for DM646x") inadvertently removed priority setting for
timer0_12 (bottom half of timer0). This timer is used as clockevent.
When INTPRIn register setting for an interrupt is left at 0, it is
mapped to FIQ by the AINTC causing the timer interrupt to not get
generated.
Fix it by including an entry for timer0_12 in interrupt priority map
array. While at it, move the clockevent comment to the right place.
Fixes:
b38434145b34 ("ARM: davinci: irqs: Correct McASP1 TX interrupt definition for DM646x")
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Shuah Khan (Samsung OSG) [Tue, 15 May 2018 02:49:58 +0000 (20:49 -0600)]
usbip: usbip_host: fix NULL-ptr deref and use-after-free errors
usbip_host updates device status without holding lock from stub probe,
disconnect and rebind code paths. When multiple requests to import a
device are received, these unprotected code paths step all over each
other and drive fails with NULL-ptr deref and use-after-free errors.
The driver uses a table lock to protect the busid array for adding and
deleting busids to the table. However, the probe, disconnect and rebind
paths get the busid table entry and update the status without holding
the busid table lock. Add a new finer grain lock to protect the busid
entry. This new lock will be held to search and update the busid entry
fields from get_busid_idx(), add_match_busid() and del_match_busid().
match_busid_show() does the same to access the busid entry fields.
get_busid_priv() changed to return the pointer to the busid entry holding
the busid lock. stub_probe(), stub_disconnect() and stub_device_rebind()
call put_busid_priv() to release the busid lock before returning. This
changes fixes the unprotected code paths eliminating the race conditions
in updating the busid entries.
Reported-by: Jakub Jirasek
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan (Samsung OSG) [Mon, 30 Apr 2018 22:17:20 +0000 (16:17 -0600)]
usbip: usbip_host: run rebind from exit when module is removed
After removing usbip_host module, devices it releases are left without
a driver. For example, when a keyboard or a mass storage device are
bound to usbip_host when it is removed, these devices are no longer
bound to any driver.
Fix it to run device_attach() from the module exit routine to restore
the devices to their original drivers. This includes cleanup changes
and moving device_attach() code to a common routine to be called from
rebind_store() and usbip_host_exit().
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan (Samsung OSG) [Mon, 30 Apr 2018 22:17:19 +0000 (16:17 -0600)]
usbip: usbip_host: delete device from busid_table after rebind
Device is left in the busid_table after unbind and rebind. Rebind
initiates usb bus scan and the original driver claims the device.
After rescan the device should be deleted from the busid_table as
it no longer belongs to usbip_host.
Fix it to delete the device after device_attach() succeeds.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Shuah Khan [Thu, 12 Apr 2018 00:13:30 +0000 (18:13 -0600)]
usbip: usbip_host: refine probe and disconnect debug msgs to be useful
Refine probe and disconnect debug msgs to be useful and say what is
in progress.
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Rosin [Wed, 9 May 2018 19:47:48 +0000 (21:47 +0200)]
i2c: viperboard: return message count on master_xfer success
Returning zero is wrong in this case.
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes:
174a13aa8669 ("i2c: Add viperboard i2c master driver")
Peter Rosin [Wed, 9 May 2018 19:46:30 +0000 (21:46 +0200)]
i2c: pmcmsp: fix error return from master_xfer
Returning -1 (-EPERM) is not appropriate here, go with -EIO.
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes:
1b144df1d7d6 ("i2c: New PMC MSP71xx TWI bus driver")
Peter Rosin [Wed, 9 May 2018 19:46:29 +0000 (21:46 +0200)]
i2c: pmcmsp: return message count on master_xfer success
Returning zero is wrong in this case.
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes:
1b144df1d7d6 ("i2c: New PMC MSP71xx TWI bus driver")
Ingo Molnar [Tue, 15 May 2018 06:20:45 +0000 (08:20 +0200)]
Merge tag 'perf-urgent-for-mingo-4.17-
20180514' of git://git./linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix segfault when processing unknown threads in cs-etm (Leo Yan)
- Fix "perf test inet_pton" on s390 failing due to missing inline (Thomas Richter)
- Display all available events on 'perf annotate --stdio' (Jin Yao)
- Add missing newline when parsing empty BPF proggie (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Julian Wiedmann [Wed, 2 May 2018 06:28:34 +0000 (08:28 +0200)]
s390/qdio: don't release memory in qdio_setup_irq()
Calling qdio_release_memory() on error is just plain wrong. It frees
the main qdio_irq struct, when following code still uses it.
Also, no other error path in qdio_establish() does this. So trust
callers to clean up via qdio_free() if some step of the QDIO
initialization fails.
Fixes:
779e6e1c724d ("[S390] qdio: new qdio driver.")
Cc: <stable@vger.kernel.org> #v2.6.27+
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Julian Wiedmann [Wed, 2 May 2018 06:48:43 +0000 (08:48 +0200)]
s390/qdio: fix access to uninitialized qdio_q fields
Ever since CQ/QAOB support was added, calling qdio_free() straight after
qdio_alloc() results in qdio_release_memory() accessing uninitialized
memory (ie. q->u.out.use_cq and q->u.out.aobs). Followed by a
kmem_cache_free() on the random AOB addresses.
For older kernels that don't have
6e30c549f6ca, the same applies if
qdio_establish() fails in the DEV_STATE_ONLINE check.
While initializing q->u.out.use_cq would be enough to fix this
particular bug, the more future-proof change is to just zero-alloc the
whole struct.
Fixes:
104ea556ee7f ("qdio: support asynchronous delivery of storage blocks")
Cc: <stable@vger.kernel.org> #v3.2+
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Josh Poimboeuf [Mon, 14 May 2018 13:53:24 +0000 (08:53 -0500)]
objtool: Detect RIP-relative switch table references
Typically a switch table can be found by detecting a .rodata access
followed an indirect jump:
1969: 4a 8b 0c e5 00 00 00 mov 0x0(,%r12,8),%rcx
1970: 00
196d: R_X86_64_32S .rodata+0x438
1971: e9 00 00 00 00 jmpq 1976 <dispc_runtime_suspend+0xb6a>
1972: R_X86_64_PC32 __x86_indirect_thunk_rcx-0x4
Randy Dunlap reported a case (seen with GCC 4.8) where the .rodata
access uses RIP-relative addressing:
19bd: 48 8b 3d 00 00 00 00 mov 0x0(%rip),%rdi # 19c4 <dispc_runtime_suspend+0xbb8>
19c0: R_X86_64_PC32 .rodata+0x45c
19c4: e9 00 00 00 00 jmpq 19c9 <dispc_runtime_suspend+0xbbd>
19c5: R_X86_64_PC32 __x86_indirect_thunk_rdi-0x4
In this case the relocation addend needs to be adjusted accordingly in
order to find the location of the switch table.
The fix is for case 3 (as described in the comments), but also make the
existing case 1 & 2 checks more precise by only adjusting the addend for
R_X86_64_PC32 relocations.
This fixes the following warnings:
drivers/video/fbdev/omap2/omapfb/dss/dispc.o: warning: objtool: dispc_runtime_suspend()+0xbb8: sibling call from callable instruction with modified stack frame
drivers/video/fbdev/omap2/omapfb/dss/dispc.o: warning: objtool: dispc_runtime_resume()+0xcc5: sibling call from callable instruction with modified stack frame
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b6098294fd67afb69af8c47c9883d7a68bf0f8ea.1526305958.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jorge Sanjuan [Fri, 11 May 2018 15:25:35 +0000 (16:25 +0100)]
ALSA: usb-audio: Use Class Specific EP for UAC3 devices.
bmAtributes offset doesn't exist in the UAC3 CS_EP descriptor.
Hence, checking for pitch control as if it was UAC2 doesn't make
any sense. Use the defined UAC3 offsets instead.
Fixes:
9a2fe9b801f5 ("ALSA: usb: initial USB Audio Device Class 3.0 support")
Signed-off-by: Jorge Sanjuan <jorge.sanjuan@codethink.co.uk>
Reviewed-by: Ruslan Bilovol <ruslan.bilovol@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Dave Airlie [Tue, 15 May 2018 03:38:15 +0000 (13:38 +1000)]
drm: set FMODE_UNSIGNED_OFFSET for drm files
Since we have the ttm and gem vma managers using a subset
of the file address space for objects, and these start at
0x100000000 they will overflow the new mmap checks.
I've checked all the mmap routines I could see for any
bad behaviour but overall most people use GEM/TTM VMA
managers even the legacy drivers have a hashtable.
Reported-and-Tested-by: Arthur Marsh (amarsh04 on #radeon)
Fixes:
be83bbf8068 (mmap: introduce sane default mmap limits)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Steven Rostedt (VMware) [Wed, 9 May 2018 18:36:09 +0000 (14:36 -0400)]
tracing/x86/xen: Remove zero data size trace events trace_xen_mmu_flush_tlb{_all}
Doing an audit of trace events, I discovered two trace events in the xen
subsystem that use a hack to create zero data size trace events. This is not
what trace events are for. Trace events add memory footprint overhead, and
if all you need to do is see if a function is hit or not, simply make that
function noinline and use function tracer filtering.
Worse yet, the hack used was:
__array(char, x, 0)
Which creates a static string of zero in length. There's assumptions about
such constructs in ftrace that this is a dynamic string that is nul
terminated. This is not the case with these tracepoints and can cause
problems in various parts of ftrace.
Nuke the trace events!
Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes:
95a7d76897c1e ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Miquel Raynal [Tue, 24 Apr 2018 15:45:06 +0000 (17:45 +0200)]
cpufreq: armada-37xx: driver relies on cpufreq-dt
Armada-37xx driver registers a cpufreq-dt driver. Not having
CONFIG_CPUFREQ_DT selected leads to a silent abort during the probe.
Prevent that situation by having the former depending on the latter.
Fixes:
92ce45fb875d7 (cpufreq: Add DVFS support for Armada 37xx)
Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Bob Moore [Tue, 8 May 2018 21:06:15 +0000 (14:06 -0700)]
ACPICA: Add deferred package support for the Load and loadTable operators
Completes the support and fixes a regression introduced in
version
20180209.
The regression caused package objects that were loaded by the Load and
loadTable operators. This created an error message like the following:
[ 0.251922] ACPI Error: No pointer back to namespace node in package
00000000fd2a44cd (
20180313/dsargs-303)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199413
Fixes:
5a8361f7ecce (ACPICA: Integrate package handling with module-level code)
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Alexey Kodanev [Fri, 11 May 2018 17:15:13 +0000 (20:15 +0300)]
selinux: correctly handle sa_family cases in selinux_sctp_bind_connect()
Allow to pass the socket address structure with AF_UNSPEC family for
compatibility purposes. selinux_socket_bind() will further check it
for INADDR_ANY and selinux_socket_connect_helper() should return
EINVAL.
For a bad address family return EINVAL instead of AFNOSUPPORT error,
i.e. what is expected from SCTP protocol in such case.
Fixes:
d452930fd3b9 ("selinux: Add SCTP support")
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Alexey Kodanev [Fri, 11 May 2018 17:15:12 +0000 (20:15 +0300)]
selinux: fix address family in bind() and connect() to match address/port
Since sctp_bindx() and sctp_connectx() can have multiple addresses,
sk_family can differ from sa_family. Therefore, selinux_socket_bind()
and selinux_socket_connect_helper(), which process sockaddr structure
(address and port), should use the address family from that structure
too, and not from the socket one.
The initialization of the data for the audit record is moved above,
in selinux_socket_bind(), so that there is no duplicate changes and
code.
Fixes:
d452930fd3b9 ("selinux: Add SCTP support")
Suggested-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Alexey Kodanev [Fri, 11 May 2018 17:15:11 +0000 (20:15 +0300)]
selinux: add AF_UNSPEC and INADDR_ANY checks to selinux_socket_bind()
Commit
d452930fd3b9 ("selinux: Add SCTP support") breaks compatibility
with the old programs that can pass sockaddr_in structure with AF_UNSPEC
and INADDR_ANY to bind(). As a result, bind() returns EAFNOSUPPORT error.
This was found with LTP/asapi_01 test.
Similar to commit
29c486df6a20 ("net: ipv4: relax AF_INET check in
bind()"), which relaxed AF_INET check for compatibility, add AF_UNSPEC
case to AF_INET and make sure that the address is INADDR_ANY.
Fixes:
d452930fd3b9 ("selinux: Add SCTP support")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Olof Johansson [Mon, 14 May 2018 16:27:33 +0000 (09:27 -0700)]
Merge tag 'reset-fixes-for-4.17' of git://git.pengutronix.de/pza/linux into fixes
Reset controller fixes for v4.17
Fix the USB3 reset (offset 0x200c, bit 5) on Uniphier LD20. It was
incorrectly labeled as GIO reset. This reset line is not yet used in
uniphier-ld20.dtsi.
* tag 'reset-fixes-for-4.17' of git://git.pengutronix.de/pza/linux:
reset: uniphier: fix USB clock line for LD20
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Mon, 14 May 2018 16:25:14 +0000 (09:25 -0700)]
Merge tag 'mvebu-fixes-4.17-1' of git://git.infradead.org/linux-mvebu into fixes
mvebu fixes for 4.17 (part 1)
Declare missing clocks needed for network on Armada 8040 base boards
(such as the McBin)
* tag 'mvebu-fixes-4.17-1' of git://git.infradead.org/linux-mvebu:
ARM64: dts: marvell: armada-cp110: Add mg_core_clk for ethernet node
ARM64: dts: marvell: armada-cp110: Add clocks for the xmdio node
Signed-off-by: Olof Johansson <olof@lixom.net>
Russell King [Thu, 10 May 2018 13:24:20 +0000 (14:24 +0100)]
ARM: keystone: fix platform_domain_notifier array overrun
platform_domain_notifier contains a variable sized array, which the
pm_clk_notify() notifier treats as a NULL terminated array:
for (con_id = clknb->con_ids; *con_id; con_id++)
pm_clk_add(dev, *con_id);
Omitting the initialiser for con_ids means that the array is zero
sized, and there is no NULL terminator. This leads to pm_clk_notify()
overrunning into what ever structure follows, which may not be NULL.
This leads to an oops:
Unable to handle kernel NULL pointer dereference at virtual address
0000008c
pgd =
c0003000
[
0000008c] *pgd=
80000800004003c, *pmd=
00000000c
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
Modules linked in:c
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #9
Hardware name: Keystone
PC is at strlen+0x0/0x34
LR is at kstrdup+0x18/0x54
pc : [<
c0623340>] lr : [<
c0111d6c>] psr:
20000013
sp :
eec73dc0 ip :
eed780c0 fp :
00000001
r10:
00000000 r9 :
00000000 r8 :
eed71e10
r7 :
0000008c r6 :
0000008c r5 :
014000c0 r4 :
c03a6ff4
r3 :
c09445d0 r2 :
00000000 r1 :
014000c0 r0 :
0000008c
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control:
30c5387d Table:
00003000 DAC:
fffffffd
Process swapper/0 (pid: 1, stack limit = 0xeec72210)
Stack: (0xeec73dc0 to 0xeec74000)
...
[<
c0623340>] (strlen) from [<
c0111d6c>] (kstrdup+0x18/0x54)
[<
c0111d6c>] (kstrdup) from [<
c03a6ff4>] (__pm_clk_add+0x58/0x120)
[<
c03a6ff4>] (__pm_clk_add) from [<
c03a731c>] (pm_clk_notify+0x64/0xa8)
[<
c03a731c>] (pm_clk_notify) from [<
c004614c>] (notifier_call_chain+0x44/0x84)
[<
c004614c>] (notifier_call_chain) from [<
c0046320>] (__blocking_notifier_call_chain+0x48/0x60)
[<
c0046320>] (__blocking_notifier_call_chain) from [<
c0046350>] (blocking_notifier_call_chain+0x18/0x20)
[<
c0046350>] (blocking_notifier_call_chain) from [<
c0390234>] (device_add+0x36c/0x534)
[<
c0390234>] (device_add) from [<
c047fc00>] (of_platform_device_create_pdata+0x70/0xa4)
[<
c047fc00>] (of_platform_device_create_pdata) from [<
c047fea0>] (of_platform_bus_create+0xf0/0x1ec)
[<
c047fea0>] (of_platform_bus_create) from [<
c047fff8>] (of_platform_populate+0x5c/0xac)
[<
c047fff8>] (of_platform_populate) from [<
c08b1f04>] (of_platform_default_populate_init+0x8c/0xa8)
[<
c08b1f04>] (of_platform_default_populate_init) from [<
c000a78c>] (do_one_initcall+0x3c/0x164)
[<
c000a78c>] (do_one_initcall) from [<
c087bd9c>] (kernel_init_freeable+0x10c/0x1d0)
[<
c087bd9c>] (kernel_init_freeable) from [<
c0628db0>] (kernel_init+0x8/0xf0)
[<
c0628db0>] (kernel_init) from [<
c00090d8>] (ret_from_fork+0x14/0x3c)
Exception stack(0xeec73fb0 to 0xeec73ff8)
3fa0:
00000000 00000000 00000000 00000000
3fc0:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0:
00000000 00000000 00000000 00000000 00000013 00000000
Code:
e3520000 1afffff7 e12fff1e c0801730 (
e5d02000)
---[ end trace
cafa8f148e262e80 ]---
Fix this by adding the necessary initialiser.
Fixes:
fc20ffe1213b ("ARM: keystone: add PM domain support for clock management")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Daniel Glöckner [Mon, 14 May 2018 14:40:05 +0000 (09:40 -0500)]
usb: musb: fix remote wakeup racing with suspend
It has been observed that writing 0xF2 to the power register while it
reads as 0xF4 results in the register having the value 0xF0, i.e. clearing
RESUME and setting SUSPENDM in one go does not work. It might also violate
the USB spec to transition directly from resume to suspend, especially
when not taking T_DRSMDN into account. But this is what happens when a
remote wakeup occurs between SetPortFeature USB_PORT_FEAT_SUSPEND on the
root hub and musb_bus_suspend being called.
This commit returns -EBUSY when musb_bus_suspend is called while remote
wakeup is signalled and thus avoids to reset the RESUME bit. Ignoring
this error when musb_port_suspend is called from musb_hub_control is ok.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Filipe Manana [Fri, 11 May 2018 15:42:42 +0000 (16:42 +0100)]
Btrfs: fix xattr loss after power failure
If a file has xattrs, we fsync it, to ensure we clear the flags
BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
inode, the current transaction commits and then we fsync it (without
either of those bits being set in its inode), we end up not logging
all its xattrs. This results in deleting all xattrs when replying the
log after a power failure.
Trivial reproducer
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ touch /mnt/foobar
$ setfattr -n user.xa -v qwerty /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
$ sync
$ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar
$ xfs_io -c "fsync" /mnt/foobar
<power failure>
$ mount /dev/sdb /mnt
$ getfattr --absolute-names --dump /mnt/foobar
<empty output>
$
So fix this by making sure all xattrs are logged if we log a file's inode
item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor
BTRFS_INODE_COPY_EVERYTHING were set in the inode.
Fixes:
36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path")
Cc: <stable@vger.kernel.org> # 4.2+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Robbie Ko [Mon, 14 May 2018 02:51:34 +0000 (10:51 +0800)]
Btrfs: send, fix invalid access to commit roots due to concurrent snapshotting
[BUG]
btrfs incremental send BUG happens when creating a snapshot of snapshot
that is being used by send.
[REASON]
The problem can happen if while we are doing a send one of the snapshots
used (parent or send) is snapshotted, because snapshoting implies COWing
the root of the source subvolume/snapshot.
1. When doing an incremental send, the send process will get the commit
roots from the parent and send snapshots, and add references to them
through extent_buffer_get().
2. When a snapshot/subvolume is snapshotted, its root node is COWed
(transaction.c:create_pending_snapshot()).
3. COWing releases the space used by the node immediately, through:
__btrfs_cow_block()
--btrfs_free_tree_block()
----btrfs_add_free_space(bytenr of node)
4. Because send doesn't hold a transaction open, it's possible that
the transaction used to create the snapshot commits, switches the
commit root and the old space used by the previous root node gets
assigned to some other node allocation. Allocation of a new node will
use the existing extent buffer found in memory, which we previously
got a reference through extent_buffer_get(), and allow the extent
buffer's content (pages) to be modified:
btrfs_alloc_tree_block
--btrfs_reserve_extent
----find_free_extent (get bytenr of old node)
--btrfs_init_new_buffer (use bytenr of old node)
----btrfs_find_create_tree_block
------alloc_extent_buffer
--------find_extent_buffer (get old node)
5. So send can access invalid memory content and have unpredictable
behaviour.
[FIX]
So we fix the problem by copying the commit roots of the send and
parent snapshots and use those copies.
CallTrace looks like this:
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.c:1861!
invalid opcode: 0000 [#1] SMP
CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721
ffff88046652d680 ti:
ffff88041b720000 task.ti:
ffff88041b720000
RIP: 0010:[<
ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs]
RSP: 0018:
ffff88041b723b68 EFLAGS:
00010246
RAX:
ffff88043ca6b000 RBX:
ffff88041b723c50 RCX:
ffff880000000000
RDX:
000000000000004c RSI:
ffff880314b133f8 RDI:
ffff880458b24000
RBP:
0000000000000000 R08:
0000000000000001 R09:
ffff88041b723c66
R10:
0000000000000001 R11:
0000000000001000 R12:
ffff8803f3e48890
R13:
ffff8803f3e48880 R14:
ffff880466351800 R15:
0000000000000001
FS:
00007f8c321dc8c0(0000) GS:
ffff88047fcc0000(0000)
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
R2:
00007efd1006d000 CR3:
0000000213a24000 CR4:
00000000003407e0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Stack:
ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880
ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50
ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400
Call Trace:
[<
ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs]
[<
ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs]
[<
ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs]
[<
ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs]
[<
ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs]
[<
ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs]
[<
ffffffff81111133>] ? handle_pte_fault+0x373/0x990
[<
ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20
[<
ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0
[<
ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0
[<
ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500
[<
ffffffff81062f07>] ? check_preempt_curr+0x57/0x90
[<
ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990
[<
ffffffff81034f83>] ? do_fork+0x113/0x3b0
[<
ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[<
ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0
[<
ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b
---[ end trace
29576629ee80b2e1 ]---
Fixes:
7069830a9e38 ("Btrfs: add btrfs_compare_trees function")
CC: stable@vger.kernel.org # 3.6+
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
David Howells [Thu, 10 May 2018 22:10:40 +0000 (23:10 +0100)]
afs: Fix the non-encryption of calls
Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
with kAFS. Set the AF_RXRPC security level to encrypt client calls to deal
with this.
Note that incoming service calls are set by the remote client and so aren't
affected by this.
This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
begun by the kernel.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 11 May 2018 23:28:58 +0000 (00:28 +0100)]
afs: Fix CB.CallBack handling
The handling of CB.CallBack messages sent by the fileserver to the client
is broken in that they are currently being processed after the reply has
been transmitted.
This is not what the fileserver expects, however. It holds up change
visibility until the reply comes so as to maintain cache coherency, and so
expects the client to have to refetch the state on the affected files.
Fix CB.CallBack handling to perform the callback break before sending the
reply.
The fileserver is free to hold up status fetches issued by other threads on
the same client that occur in reponse to the callback until any pending
changes have been committed.
Fixes:
d001648ec7cf ("rxrpc: Don't expose skbs to in-kernel users [ver #2]")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Sat, 12 May 2018 21:31:33 +0000 (22:31 +0100)]
afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken. This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.
Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.
Fixes:
c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
Marc Dionne [Sat, 12 May 2018 00:35:06 +0000 (21:35 -0300)]
afs: Fix afs_find_server search loop
The code that looks up servers by addresses makes the assumption
that the list of addresses for a server is sorted. It exits the
loop if it finds that the target address is larger than the
current candidate. As the list is not currently sorted, this
can lead to a failure to find a matching server, which can cause
callbacks from that server to be ignored.
Remove the early exit case so that the complete list is searched.
Fixes:
d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 11 May 2018 22:45:40 +0000 (23:45 +0100)]
afs: Fix the handling of an unfound server in CM operations
If the client cache manager operations that need the server record
(CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find
the server record, they abort the call from the file server with
RX_CALL_DEAD when they should return okay.
Fixes:
c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 11 May 2018 21:59:42 +0000 (22:59 +0100)]
afs: Add a tracepoint to record callbacks from unlisted servers
Add a tracepoint to record callbacks from servers for which we don't have a
record.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 11 May 2018 22:21:35 +0000 (23:21 +0100)]
afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
Fix the handling of the CB.InitCallBackState3 service call to find the
record of a server that we're using by looking it up by the UUID passed as
the parameter rather than by its address (of which it might have many, and
which may change).
Fixes:
c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 11 May 2018 21:55:59 +0000 (22:55 +0100)]
afs: Fix VNOVOL handling in address rotation
If a volume location record lists multiple file servers for a volume, then
it's possible that due to a misconfiguration or a changing configuration
that one of the file servers doesn't know about it yet and will abort
VNOVOL. Currently, the rotation algorithm will stop with EREMOTEIO.
Fix this by moving on to try the next server if VNOVOL is returned. Once
all the servers have been tried and the record rechecked, the algorithm
will stop with EREMOTEIO or ENOMEDIUM.
Fixes:
d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 20:51:47 +0000 (21:51 +0100)]
afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug
whereby if an error occurs on one of the vnodes being queried, then the
errorCode field is set correctly in the corresponding status, but the
interfaceVersion field is left unset.
Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against
the following cases when called from FS.InlineBulkStatus delivery:
(1) If InterfaceVersion == 0 then:
(a) If errorCode != 0 then it indicates the abort code for the
corresponding vnode.
(b) If errorCode == 0 then the status record is invalid.
(2) If InterfaceVersion == 1 then:
(a) If errorCode != 0 then it indicates the abort code for the
corresponding vnode.
(b) If errorCode == 0 then the status record is valid and can be
parsed.
(3) If InterfaceVersion is anything else then the status record is
invalid.
Fixes:
dd9fbcb8e103 ("afs: Rearrange status mapping")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Boris Brezillon [Wed, 9 May 2018 07:13:58 +0000 (09:13 +0200)]
mtd: rawnand: marvell: Fix read logic for layouts with ->nchunks > 2
The code is doing monolithic reads for all chunks except the last one
which is wrong since a monolithic read will issue the
READ0+ADDRS+READ_START sequence. It not only takes longer because it
forces the NAND chip to reload the page content into its internal
cache, but by doing that we also reset the column pointer to 0, which
means we'll always read the first chunk instead of moving to the next
one.
Rework the code to do a monolithic read only for the first chunk,
then switch to naked reads for all intermediate chunks and finally
issue a last naked read for the last chunk.
Fixes:
02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver
Cc: stable@vger.kernel.org
Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Ben Hutchings [Thu, 10 May 2018 18:20:54 +0000 (19:20 +0100)]
mtd: Fix comparison in map_word_andequal()
Commit
9e343e87d2c4 ("mtd: cfi: convert inline functions to macros")
changed map_word_andequal() into a macro, but also changed the right
hand side of the comparison from val3 to val2. Change it back to use
val3 on the right hand side.
Thankfully this did not cause a regression because all callers
currently pass the same argument for val2 and val3.
Fixes:
9e343e87d2c4 ("mtd: cfi: convert inline functions to macros")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
David Howells [Thu, 10 May 2018 13:22:38 +0000 (14:22 +0100)]
afs: Fix server rotation's handling of fileserver probe failure
The server rotation algorithm just gives up if it fails to probe a
fileserver. Fix this by rotating to the next fileserver instead.
Fixes:
d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 07:43:04 +0000 (08:43 +0100)]
afs: Fix refcounting in callback registration
The refcounting on afs_cb_interest struct objects in
afs_register_server_cb_interest() is wrong as it uses the server list
entry's call back interest pointer without regard for the fact that it
might be replaced at any time and the object thrown away.
Fix this by:
(1) Put a lock on the afs_server_list struct that can be used to
mediate access to the callback interest pointers in the servers array.
(2) Keep a ref on the callback interest that we get from the entry.
(3) Dropping the old reference held by vnode->cb_interest if we replace
the pointer.
Fixes:
c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Thu, 10 May 2018 13:12:50 +0000 (14:12 +0100)]
afs: Fix giving up callbacks on server destruction
When a server record is destroyed, we want to send a message to the server
telling it that we're giving up all the callbacks it has promised us.
Apply two fixes to this:
(1) Only send the FS.GiveUpAllCallBacks message if we actually got a
callback from that server. We assume this to be the case if we
performed at least one successful FS operation on that server.
(2) Send it to the address last used for that server rather than always
picking the first address in the list (which might be unreachable).
Fixes:
d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Wed, 9 May 2018 21:03:18 +0000 (22:03 +0100)]
afs: Fix address list parsing
The parsing of port specifiers in the address list obtained from the DNS
resolution upcall doesn't work as in4_pton() and in6_pton() will fail on
encountering an unexpected delimiter (in this case, the '+' marking the
port number). However, in*_pton() can't be given multiple specifiers.
Fix this by finding the delimiter in advance and not relying on in*_pton()
to find the end of the address for us.
Fixes:
8b2a464ced77 ("afs: Add an address list concept")
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 27 Apr 2018 19:46:22 +0000 (20:46 +0100)]
afs: Fix directory page locking
The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data. As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.
Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data. Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.
The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.
Without this patch, the:
if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
...
}
part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:
page:
ffffe38b405b4300 count:3 mapcount:0 mapping:
ffff98156c83a978 index:0x0
flags: 0xfffe000001004(referenced|private)
raw:
000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
raw:
dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
page->mem_cgroup:
ffff98156b27c000
------------[ cut here ]------------
kernel BUG at mm/filemap.c:1205!
...
RIP: 0010:unlock_page+0x43/0x50
...
Call Trace:
afs_dir_iterate+0x789/0x8f0 [kafs]
? _cond_resched+0x15/0x30
? kmem_cache_alloc_trace+0x166/0x1d0
? afs_do_lookup+0x69/0x490 [kafs]
? afs_do_lookup+0x101/0x490 [kafs]
? key_default_cmp+0x20/0x20
? request_key+0x3c/0x80
? afs_lookup+0xf1/0x340 [kafs]
? __lookup_slow+0x97/0x150
? lookup_slow+0x35/0x50
? walk_component+0x1bf/0x490
? path_lookupat.isra.52+0x75/0x200
? filename_lookup.part.66+0xa0/0x170
? afs_end_vnode_operation+0x41/0x60 [kafs]
? __check_object_size+0x9c/0x171
? strncpy_from_user+0x4a/0x170
? vfs_statx+0x73/0xe0
? __do_sys_newlstat+0x39/0x70
? __x64_sys_getdents+0xc9/0x140
? __x64_sys_getdents+0x140/0x140
? do_syscall_64+0x5b/0x160
? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes:
f3ddee8dc4e2 ("afs: Fix directory handling")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Chris Wilson [Fri, 11 May 2018 12:11:45 +0000 (13:11 +0100)]
drm/i915/execlists: Use rmb() to order CSB reads
We assume that the CSB is written using the normal ringbuffer
coherency protocols, as outlined in kernel/events/ring_buffer.c:
* (HW) (DRIVER)
*
* if (LOAD ->data_tail) { LOAD ->data_head
* (A) smp_rmb() (C)
* STORE $data LOAD $data
* smp_wmb() (B) smp_mb() (D)
* STORE ->data_head STORE ->data_tail
* }
So we assume that the HW fulfils its ordering requirements (B), and so
we should use a complimentary rmb (C) to ensure that our read of its
WRITE pointer is completed before we start accessing the data.
The final mb (D) is implied by the uncached mmio we perform to inform
the HW of our READ pointer.
References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
Fixes:
767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
References:
61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Acked-by: Michel Thierry <michel.thierry@intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180511121147.31915-1-chris@chris-wilson.co.uk
(cherry picked from commit
77dfedb5be03779f9a5d83e323a1b36e32090105)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Matthew Auld [Wed, 2 May 2018 19:50:21 +0000 (20:50 +0100)]
drm/i915/userptr: reject zero user_size
Operating on a zero sized GEM userptr object will lead to explosions.
Fixes:
5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
Testcase: igt/gem_userptr_blits/input-checking
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502195021.30900-1-matthew.auld@intel.com
(cherry picked from commit
c11c7bfd213495784b22ef82a69b6489f8d0092f)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>