linux-2.6-block.git
4 years agopowerpc/64s/exception: clean up system call entry
Nicholas Piggin [Fri, 28 Jun 2019 05:33:20 +0000 (15:33 +1000)]
powerpc/64s/exception: clean up system call entry

syscall / hcall entry unnecessarily differs between KVM and non-KVM
builds. Move the SMT priority instruction to the same location
(after INTERRUPT_TO_KERNEL).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: move paca save area offsets into exception-64s.S
Nicholas Piggin [Sat, 22 Jun 2019 13:15:35 +0000 (23:15 +1000)]
powerpc/64s/exception: move paca save area offsets into exception-64s.S

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: remove pointless EXCEPTION_PROLOG macro indirection
Nicholas Piggin [Sat, 22 Jun 2019 13:15:34 +0000 (23:15 +1000)]
powerpc/64s/exception: remove pointless EXCEPTION_PROLOG macro indirection

No generated code change. Final vmlinux is changed only due to change
in bug table line numbers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: generate regs clear instructions using .rept
Nicholas Piggin [Sat, 22 Jun 2019 13:15:33 +0000 (23:15 +1000)]
powerpc/64s/exception: generate regs clear instructions using .rept

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: fix indenting irregularities
Nicholas Piggin [Sat, 22 Jun 2019 13:15:32 +0000 (23:15 +1000)]
powerpc/64s/exception: fix indenting irregularities

Generally, macros that result in instructions being expanded are
indented by a tab, and those that don't have no indent. Fix the
obvious cases that go contrary to style.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: use a gas macro for system call handler code
Nicholas Piggin [Sat, 22 Jun 2019 13:15:31 +0000 (23:15 +1000)]
powerpc/64s/exception: use a gas macro for system call handler code

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: remove unused BRANCH_TO_COMMON
Nicholas Piggin [Sat, 22 Jun 2019 13:15:30 +0000 (23:15 +1000)]
powerpc/64s/exception: remove unused BRANCH_TO_COMMON

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: remove __BRANCH_TO_KVM
Nicholas Piggin [Sat, 22 Jun 2019 13:15:29 +0000 (23:15 +1000)]
powerpc/64s/exception: remove __BRANCH_TO_KVM

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: move head-64.h code to exception-64s.S where it is used
Nicholas Piggin [Sat, 22 Jun 2019 13:15:28 +0000 (23:15 +1000)]
powerpc/64s/exception: move head-64.h code to exception-64s.S where it is used

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: move exception-64s.h code to exception-64s.S where it is used
Nicholas Piggin [Sat, 22 Jun 2019 13:15:27 +0000 (23:15 +1000)]
powerpc/64s/exception: move exception-64s.h code to exception-64s.S where it is used

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: move KVM related code together
Nicholas Piggin [Sat, 22 Jun 2019 13:15:26 +0000 (23:15 +1000)]
powerpc/64s/exception: move KVM related code together

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: remove STD_EXCEPTION_COMMON variants
Nicholas Piggin [Sat, 22 Jun 2019 13:15:25 +0000 (23:15 +1000)]
powerpc/64s/exception: remove STD_EXCEPTION_COMMON variants

These are only called in one place each.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical place
Nicholas Piggin [Sat, 22 Jun 2019 13:15:24 +0000 (23:15 +1000)]
powerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical place

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: improve 0x500 handler code
Nicholas Piggin [Sat, 22 Jun 2019 13:15:23 +0000 (23:15 +1000)]
powerpc/64s/exception: improve 0x500 handler code

After the previous cleanup, it becomes possible to consolidate some
common code outside the runtime alternate patching. Also remove
unused labels.

This results in some code change, but unchanged runtime instruction
sequence.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: unwind exception-64s.h macros
Nicholas Piggin [Sat, 22 Jun 2019 13:15:22 +0000 (23:15 +1000)]
powerpc/64s/exception: unwind exception-64s.h macros

Many of these macros just specify 1-4 lines which are only called a
few times each at most, and often just once. Remove this indirection.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: Move EXCEPTION_COMMON additions into callers
Nicholas Piggin [Sat, 22 Jun 2019 13:15:21 +0000 (23:15 +1000)]
powerpc/64s/exception: Move EXCEPTION_COMMON additions into callers

More cases of code insertion via macros that does not add a great
deal. All the additions have to be specified in the macro arguments,
so they can just as well go after the macro.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: Move EXCEPTION_COMMON handler and return branches into callers
Nicholas Piggin [Sat, 22 Jun 2019 13:15:20 +0000 (23:15 +1000)]
powerpc/64s/exception: Move EXCEPTION_COMMON handler and return branches into callers

The aim is to reduce the amount of indirection it takes to get through
the exception handler macros, particularly where it provides little
code sharing.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for consistency with others
Nicholas Piggin [Sat, 22 Jun 2019 13:15:19 +0000 (23:15 +1000)]
powerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for consistency with others

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: KVM handler can set the HSRR trap bit
Nicholas Piggin [Sat, 22 Jun 2019 13:15:18 +0000 (23:15 +1000)]
powerpc/64s/exception: KVM handler can set the HSRR trap bit

Move the KVM trap HSRR bit into the KVM handler, which can be
conditionally applied when hsrr parameter is set.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: merge KVM handler and skip variants
Nicholas Piggin [Sat, 22 Jun 2019 13:15:17 +0000 (23:15 +1000)]
powerpc/64s/exception: merge KVM handler and skip variants

Conditionally expand the skip case if it is specified.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: consolidate maskable and non-maskable prologs
Nicholas Piggin [Sat, 22 Jun 2019 13:15:16 +0000 (23:15 +1000)]
powerpc/64s/exception: consolidate maskable and non-maskable prologs

Conditionally expand the soft-masking test if a mask is passed in.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: remove the "extra" macro parameter
Nicholas Piggin [Sat, 22 Jun 2019 13:15:15 +0000 (23:15 +1000)]
powerpc/64s/exception: remove the "extra" macro parameter

Rather than pass in the soft-masking and KVM tests via macro that is
passed to another macro to expand it, switch to usig gas macros and
conditionally expand the soft-masking and KVM tests.

The system reset with its idle test is open coded as it is a one-off.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: fix sreset KVM test code
Nicholas Piggin [Sat, 22 Jun 2019 13:15:14 +0000 (23:15 +1000)]
powerpc/64s/exception: fix sreset KVM test code

The sreset handler KVM test theoretically should not depend on P7.
In practice KVM now only supports P7 and up so no real bug fix, but
this change is made now so the quirk is not propagated through
cleanup patches.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: move and tidy EXCEPTION_PROLOG_2 variants
Nicholas Piggin [Sat, 22 Jun 2019 13:15:13 +0000 (23:15 +1000)]
powerpc/64s/exception: move and tidy EXCEPTION_PROLOG_2 variants

- Re-name the macros to _REAL and _VIRT suffixes rather than no and
  _RELON suffix.

- Move the macro definitions together in the file.

- Move RELOCATABLE ifdef inside the _VIRT macro.

Further consolidation between variants does not buy much here.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: consolidate EXCEPTION_PROLOG_2 with _NORI variant
Nicholas Piggin [Sat, 22 Jun 2019 13:15:12 +0000 (23:15 +1000)]
powerpc/64s/exception: consolidate EXCEPTION_PROLOG_2 with _NORI variant

Switch to a gas macro that conditionally expands the RI clearing
instruction.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: remove H concatenation for EXC_HV variants
Nicholas Piggin [Sat, 22 Jun 2019 13:15:11 +0000 (23:15 +1000)]
powerpc/64s/exception: remove H concatenation for EXC_HV variants

Replace all instances of this with gas macros that test the hsrr
parameter and use the appropriate register names / labels.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Remove extraneous 2nd check for 0xea0 in SOFTEN_TEST]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: Remove unused SOFTEN_VALUE_0x980
Michael Ellerman [Tue, 2 Jul 2019 08:20:52 +0000 (18:20 +1000)]
powerpc/64s/exception: Remove unused SOFTEN_VALUE_0x980

Remove SOFTEN_VALUE_0x980, it's been unused since commit
dabe859ec636 ("powerpc: Give hypervisor decrementer interrupts their
own handler") (Sep 2012).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/exception: fix line wrap and semicolon inconsistencies in macros
Nicholas Piggin [Tue, 11 Jun 2019 14:30:13 +0000 (00:30 +1000)]
powerpc/64s/exception: fix line wrap and semicolon inconsistencies in macros

By convention, all lines should be separated by a semicolons. Last line
should have neither semicolon or line wrap.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/powernv: remove the unused vas_win_paste_addr and vas_win_id functions
Christoph Hellwig [Tue, 25 Jun 2019 14:52:39 +0000 (16:52 +0200)]
powerpc/powernv: remove the unused vas_win_paste_addr and vas_win_id functions

These two function have never been used anywhere in the kernel tree
since they were added to the kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/powernv: remove unused NPU DMA code
Christoph Hellwig [Tue, 25 Jun 2019 14:52:38 +0000 (16:52 +0200)]
powerpc/powernv: remove unused NPU DMA code

None of these routines were ever used anywhere in the kernel tree
since they were added to the kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/powernv: remove the unused tunneling exports
Christoph Hellwig [Tue, 25 Jun 2019 14:52:37 +0000 (16:52 +0200)]
powerpc/powernv: remove the unused tunneling exports

These have been unused anywhere in the kernel tree ever since they've
been added to the kernel.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/powernv: remove the unused pnv_pci_set_p2p function
Christoph Hellwig [Tue, 25 Jun 2019 14:52:36 +0000 (16:52 +0200)]
powerpc/powernv: remove the unused pnv_pci_set_p2p function

This function has never been used anywhere in the kernel tree since it
was added to the tree.  We also now have proper PCIe P2P APIs in the core
kernel, and any new P2P support should be using those.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/xmon: Fix disabling tracing while in xmon
Naveen N. Rao [Thu, 27 Jun 2019 09:59:40 +0000 (15:29 +0530)]
powerpc/xmon: Fix disabling tracing while in xmon

Commit ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering
xmon") added code to disable recording trace entries while in xmon. The
commit introduced a variable 'tracing_enabled' to record if tracing was
enabled on xmon entry, and used this to conditionally enable tracing
during exit from xmon.

However, we are not checking the value of 'fromipi' variable in
xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
cpus enter xmon, they will see tracing as being disabled already and
tracing won't be re-enabled on exit. Fix the same.

Fixes: ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agoselftests/powerpc: ppc_asm.h: typo in the header guard
Denis Efremov [Sun, 23 Jun 2019 15:52:00 +0000 (18:52 +0300)]
selftests/powerpc: ppc_asm.h: typo in the header guard

The guard macro __PPC_ASM_H in the header ppc_asm.h
doesn't match the #ifndef macro _PPC_ASM_H. The patch
makes them the same.

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/cacheflush: fix variable set but not used
Qian Cai [Thu, 6 Jun 2019 13:58:13 +0000 (09:58 -0400)]
powerpc/cacheflush: fix variable set but not used

The powerpc's flush_cache_vmap() is defined as a macro and never use
both of its arguments, so it will generate a compilation warning,

lib/ioremap.c: In function 'ioremap_page_range':
lib/ioremap.c:203:16: warning: variable 'start' set but not used
[-Wunused-but-set-variable]

Fix it by making it an inline function.

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/eeh_cache: fix a W=1 kernel-doc warning
Qian Cai [Wed, 5 Jun 2019 20:46:19 +0000 (16:46 -0400)]
powerpc/eeh_cache: fix a W=1 kernel-doc warning

The opening comment mark "/**" is reserved for kernel-doc comments, so
it will generate a warning with "make W=1".

arch/powerpc/kernel/eeh_cache.c:37: warning: cannot understand function
prototype: 'struct pci_io_addr_range

Since this is not a kernel-doc for the struct below, but rather an
overview of this source eeh_cache.c, just use the free-form comments
kernel-doc syntax instead.

Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/ftrace: Enable C Version of recordmcount
Christophe Leroy [Tue, 7 May 2019 13:31:38 +0000 (13:31 +0000)]
powerpc/ftrace: Enable C Version of recordmcount

Selects HAVE_C_RECORDMCOUNT to use the C version of the recordmcount
intead of the old Perl Version of recordmcount.

This should improve build time. It also seems like the old Perl Version
misses some calls to _mcount that the C version finds.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agorecordmcount: Fix spurious mcount entries on powerpc
Naveen N. Rao [Wed, 26 Jun 2019 18:38:01 +0000 (00:08 +0530)]
recordmcount: Fix spurious mcount entries on powerpc

An impending change to enable HAVE_C_RECORDMCOUNT on powerpc leads to
warnings such as the following:

  # modprobe kprobe_example
  ftrace-powerpc: Not expected bl: opcode is 3c4c0001
  WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001 ftrace_bug+0x90/0x318
  Modules linked in:
  CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 #2
  NIP:  c000000000264318 LR: c00000000025d694 CTR: c000000000f5cd30
  REGS: c000000001f2b7b0 TRAP: 0700   Not tainted  (5.2.0-rc6-00678-g1c329100b942)
  MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28228222  XER: 00000000
  CFAR: c0000000002642fc IRQMASK: 0
  <snip>
  NIP [c000000000264318] ftrace_bug+0x90/0x318
  LR [c00000000025d694] ftrace_process_locs+0x4f4/0x5e0
  Call Trace:
  [c000000001f2ba40] [0000000000000004] 0x4 (unreliable)
  [c000000001f2bad0] [c00000000025d694] ftrace_process_locs+0x4f4/0x5e0
  [c000000001f2bb90] [c00000000020ff10] load_module+0x25b0/0x30c0
  [c000000001f2bd00] [c000000000210cb0] sys_finit_module+0xc0/0x130
  [c000000001f2be20] [c00000000000bda4] system_call+0x5c/0x70
  Instruction dump:
  419e0018 2f83ffff 419e00bc 2f83ffea 409e00cc 4800001c 0fe00000 3c62ff96
  39000001 39400000 386386d0 480000c4 <0fe000003ce20003 39000001 3c62ff96
  ---[ end trace 4c438d5cebf78381 ]---
  ftrace failed to modify
  [<c0080000012a0008>] 0xc0080000012a0008
   actual:   01:00:4c:3c
  Initializing ftrace call sites
  ftrace record flags: 2000000
   (0)
   expected tramp: c00000000006af4c

Looking at the relocation records in __mcount_loc shows a few spurious
entries:

  RELOCATION RECORDS FOR [__mcount_loc]:
  OFFSET           TYPE              VALUE
  0000000000000000 R_PPC64_ADDR64    .text.unlikely+0x0000000000000008
  0000000000000008 R_PPC64_ADDR64    .text.unlikely+0x0000000000000014
  0000000000000010 R_PPC64_ADDR64    .text.unlikely+0x0000000000000060
  0000000000000018 R_PPC64_ADDR64    .text.unlikely+0x00000000000000b4
  0000000000000020 R_PPC64_ADDR64    .init.text+0x0000000000000008
  0000000000000028 R_PPC64_ADDR64    .init.text+0x0000000000000014

The first entry in each section is incorrect. Looking at the
relocation records, the spurious entries correspond to the
R_PPC64_ENTRY records:

  RELOCATION RECORDS FOR [.text.unlikely]:
  OFFSET           TYPE              VALUE
  0000000000000000 R_PPC64_REL64     .TOC.-0x0000000000000008
  0000000000000008 R_PPC64_ENTRY     *ABS*
  0000000000000014 R_PPC64_REL24     _mcount
  <snip>

The problem is that we are not validating the return value from
get_mcountsym() in sift_rel_mcount(). With this entry, mcountsym is 0,
but Elf_r_sym(relp) also ends up being 0. Fix this by ensuring
mcountsym is valid before processing the entry.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/rtas: retry when cpu offline races with suspend/migration
Nathan Lynch [Fri, 21 Jun 2019 06:05:18 +0000 (01:05 -0500)]
powerpc/rtas: retry when cpu offline races with suspend/migration

The protocol for suspending or migrating an LPAR requires all present
processor threads to enter H_JOIN. So if we have threads offline, we
have to temporarily bring them up. This can race with administrator
actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas:
Fix a potential race between CPU-Offline & Migration"),
rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY
for what almost certainly is a transient condition in any reasonable
scenario.

Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is
returned, and it is typical during a migration for that to happen
repeatedly for several minutes polling the H_VASI_STATE hcall result
before proceeding to the next stage.

So return -EAGAIN instead of -EBUSY when this race is
encountered. Additionally: logging this event is still appropriate but
use pr_info instead of pr_err; and remove use of unlikely() while here
as this is not a hot path at all.

Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc: Document xive=off option
Michael Neuling [Mon, 13 May 2019 05:39:10 +0000 (15:39 +1000)]
powerpc: Document xive=off option

commit 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE
interrupt controller") added an option to turn off Linux native XIVE
usage via the xive=off kernel command line option.

This documents this option.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agoMerge branch 'fixes' into next
Michael Ellerman [Mon, 1 Jul 2019 04:04:39 +0000 (14:04 +1000)]
Merge branch 'fixes' into next

Merge our fixes branch into next, this brings in a number of commits
that fix bugs we don't want to hit in next, in particular the fix for
CVE-2019-12817.

4 years agoMerge tag 'powerpc-5.2-6' into fixes
Michael Ellerman [Mon, 1 Jul 2019 03:59:40 +0000 (13:59 +1000)]
Merge tag 'powerpc-5.2-6' into fixes

This merges the commits that were the fix for CVE-2019-12817, which was
developed under embargo. They have already been merged by Linus

Merge them into fixes now so that this branch contains all the fixes for
this release.

4 years agopowerpc/64s/exception: Fix machine check early corrupting AMR
Nicholas Piggin [Fri, 21 Jun 2019 22:55:54 +0000 (08:55 +1000)]
powerpc/64s/exception: Fix machine check early corrupting AMR

The early machine check runs in real mode, so locking is unnecessary.
Worse, the windup does not restore AMR, so this can result in a false
KUAP fault after a recoverable machine check hits inside a user copy
operation.

Fix this similarly to HMI by just avoiding the kuap lock in the
early machine check handler (it will be set by the late handler that
runs in virtual mode if that runs). If the virtual mode handler is
reached, it will lock and restore the AMR.

Fixes: 890274c2dc4c0 ("powerpc/64s: Implement KUAP for Radix MMU")
Cc: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agoKVM: PPC: Book3S HV: Clear pending decrementer exceptions on nested guest entry
Suraj Jitindar Singh [Thu, 20 Jun 2019 01:46:51 +0000 (11:46 +1000)]
KVM: PPC: Book3S HV: Clear pending decrementer exceptions on nested guest entry

If we enter an L1 guest with a pending decrementer exception then this
is cleared on guest exit if the guest has writtien a positive value
into the decrementer (indicating that it handled the decrementer
exception) since there is no other way to detect that the guest has
handled the pending exception and that it should be dequeued. In the
event that the L1 guest tries to run a nested (L2) guest immediately
after this and the L2 guest decrementer is negative (which is loaded
by L1 before making the H_ENTER_NESTED hcall), then the pending
decrementer exception isn't cleared and the L2 entry is blocked since
L1 has a pending exception, even though L1 may have already handled
the exception and written a positive value for it's decrementer. This
results in a loop of L1 trying to enter the L2 guest and L0 blocking
the entry since L1 has an interrupt pending with the outcome being
that L2 never gets to run and hangs.

Fix this by clearing any pending decrementer exceptions when L1 makes
the H_ENTER_NESTED hcall since it won't do this if it's decrementer
has gone negative, and anyway it's decrementer has been communicated
to L0 in the hdec_expires field and L0 will return control to L1 when
this goes negative by delivering an H_DECREMENTER exception.

Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agoKVM: PPC: Book3S HV: Signed extend decrementer value if not using large decrementer
Suraj Jitindar Singh [Thu, 20 Jun 2019 01:46:50 +0000 (11:46 +1000)]
KVM: PPC: Book3S HV: Signed extend decrementer value if not using large decrementer

On POWER9 the decrementer can operate in large decrementer mode where
the decrementer is 56 bits and signed extended to 64 bits. When not
operating in this mode the decrementer behaves as a 32 bit decrementer
which is NOT signed extended (as on POWER8).

Currently when reading a guest decrementer value we don't take into
account whether the large decrementer is enabled or not, and this
means the value will be incorrect when the guest is not using the
large decrementer. Fix this by sign extending the value read when the
guest isn't using the large decrementer.

Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agoKVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
Suraj Jitindar Singh [Thu, 20 Jun 2019 01:46:49 +0000 (11:46 +1000)]
KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries

When a guest vcpu moves from one physical thread to another it is
necessary for the host to perform a tlb flush on the previous core if
another vcpu from the same guest is going to run there. This is because the
guest may use the local form of the tlb invalidation instruction meaning
stale tlb entries would persist where it previously ran. This is handled
on guest entry in kvmppc_check_need_tlb_flush() which calls
flush_guest_tlb() to perform the tlb flush.

Previously the generic radix__local_flush_tlb_lpid_guest() function was
used, however the functionality was reimplemented in flush_guest_tlb()
to avoid the trace_tlbie() call as the flushing may be done in real
mode. The reimplementation in flush_guest_tlb() was missing an erat
invalidation after flushing the tlb.

This lead to observable memory corruption in the guest due to the
caching of stale translations. Fix this by adding the erat invalidation.

Fixes: 70ea13f6e609 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/pci/of: Fix OF flags parsing for 64bit BARs
Alexey Kardashevskiy [Wed, 5 Jun 2019 03:38:14 +0000 (13:38 +1000)]
powerpc/pci/of: Fix OF flags parsing for 64bit BARs

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but we skip
on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Shawn Anastasio <shawn@anastas.io>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
Christoph Hellwig [Thu, 13 Jun 2019 08:24:46 +0000 (10:24 +0200)]
powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac

With the strict dma mask checking introduced with the switch to
the generic DMA direct code common wifi chips on 32-bit powerbooks
stopped working.  Add a 30-bit ZONE_DMA to the 32-bit pmac builds
to allow them to reliably allocate dma coherent memory.

Fixes: 65a21b71f948 ("powerpc/dma: remove dma_nommu_dma_supported")
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP
Nicholas Piggin [Mon, 10 Jun 2019 03:08:18 +0000 (13:08 +1000)]
powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP

This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
page table functions.

This enables huge (2MB and 1GB) ioremap mappings. I don't have a
benchmark for this change, but huge vmap will be used by a later core
kernel change to enable huge vmalloc memory mappings. This improves
cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB
size dentry cache hash.

  Profiling git diff dTLB misses with a vanilla kernel:

  81.75%  git      [kernel.vmlinux]    [k] __d_lookup_rcu
   7.21%  git      [kernel.vmlinux]    [k] strncpy_from_user
   1.77%  git      [kernel.vmlinux]    [k] find_get_entry
   1.59%  git      [kernel.vmlinux]    [k] kmem_cache_free

            40,168      dTLB-miss
       0.100342754 seconds time elapsed

  With powerpc huge vmalloc:

             2,987      dTLB-miss
       0.095933138 seconds time elapsed

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s/radix: ioremap use ioremap_page_range
Nicholas Piggin [Mon, 10 Jun 2019 03:08:17 +0000 (13:08 +1000)]
powerpc/64s/radix: ioremap use ioremap_page_range

Radix can use ioremap_page_range for ioremap, after slab is available.
This makes it possible to enable huge ioremap mapping support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64: __ioremap_at clean up in the error case
Nicholas Piggin [Mon, 10 Jun 2019 03:08:16 +0000 (13:08 +1000)]
powerpc/64: __ioremap_at clean up in the error case

__ioremap_at error handling is wonky, it requires caller to clean up
after it. Implement a helper that does the map and error cleanup and
remove the requirement from the caller.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units.
Anju T Sudhakar [Mon, 10 Jun 2019 06:32:29 +0000 (12:02 +0530)]
powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units.

Nest and core IMC (In-Memory Collection counters) assigns a particular
cpu as the designated target for counter data collection. During
system boot, the first online cpu in a chip gets assigned as the
designated cpu for that chip(for nest-imc) and the first online cpu in
a core gets assigned as the designated cpu for that core(for
core-imc).

If the designated cpu goes offline, the next online cpu from the same
chip(for nest-imc)/core(for core-imc) is assigned as the next target,
and the event context is migrated to the target cpu. Currently,
cpumask_any_but() function is used to find the target cpu. Though this
function is expected to return a `random` cpu, this always returns the
next online cpu.

If all cpus in a chip/core is offlined in a sequential manner,
starting from the first cpu, the event migration has to happen for all
the cpus which goes offline. Since the migration process involves a
grace period, the total time taken to offline all the cpus will be
significantly high.

Example:
  In a system which has 2 sockets, with
  NUMA node0 CPU(s):     0-87
  NUMA node8 CPU(s):     88-175

  Time taken to offline cpu 88-175:
  real    2m56.099s
  user    0m0.191s
  sys     0m0.000s

Use cpumask_last() to choose the target cpu, when the designated cpu
goes online, so the migration will happen only when the last_cpu in
the mask goes offline. This way the time taken to offline all cpus in
a chip/core can be reduced.

With the patch:

  Time taken  to offline cpu 88-175:
  real    0m12.207s
  user    0m0.171s
  sys     0m0.000s

Offlining all cpus in reverse order is also taken care because,
cpumask_any_but() is used to find the designated cpu if the last cpu
in the mask goes offline. Since cpumask_any_but() always return the
first cpu in the mask, that becomes the designated cpu and migration
will happen only when the first_cpu in the mask goes offline.

Example: With the patch,

  Time taken to offline cpu from 175-88:
  real    0m9.330s
  user    0m0.110s
  sys     0m0.000s

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/64s: Fix misleading SPR and timebase information
Shaokun Zhang [Wed, 29 May 2019 09:21:51 +0000 (17:21 +0800)]
powerpc/64s: Fix misleading SPR and timebase information

pr_info shows SPR and timebase as a decimal value with a '0x'
prefix, which is somewhat misleading.

Fix it to print hexadecimal, as was intended.

Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/pseries: avoid blocking in irq when queuing hotplug events
Nathan Lynch [Tue, 28 May 2019 23:28:01 +0000 (18:28 -0500)]
powerpc/pseries: avoid blocking in irq when queuing hotplug events

A couple of bugs in queue_hotplug_event():

1. Unchecked kmalloc result which could lead to an oops.
2. Use of GFP_KERNEL allocations in interrupt context (this code's
   only caller is ras_hotplug_interrupt()).

Use kmemdup to avoid open-coding the allocation+copy and check for
failure; use GFP_ATOMIC for both allocations.

Ultimately it probably would be better to avoid or reduce allocations
in this path if possible.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/watchpoint: Restore NV GPRs while returning from exception
Ravi Bangoria [Thu, 13 Jun 2019 03:30:14 +0000 (09:00 +0530)]
powerpc/watchpoint: Restore NV GPRs while returning from exception

powerpc hardware triggers watchpoint before executing the instruction.
To make trigger-after-execute behavior, kernel emulates the
instruction. If the instruction is 'load something into non-volatile
register', exception handler should restore emulated register state
while returning back, otherwise there will be register state
corruption. eg, adding a watchpoint on a list can corrput the list:

  # cat /proc/kallsyms | grep kthread_create_list
  c00000000121c8b8 d kthread_create_list

Add watchpoint on kthread_create_list->prev:

  # perf record -e mem:0xc00000000121c8c0

Run some workload such that new kthread gets invoked. eg, I just
logged out from console:

  list_add corruption. next->prev should be prev (c000000001214e00), \
but was c00000000121c8b8. (next=c00000000121c8b8).
  WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0
  CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69
  ...
  NIP __list_add_valid+0xb4/0xc0
  LR __list_add_valid+0xb0/0xc0
  Call Trace:
  __list_add_valid+0xb0/0xc0 (unreliable)
  __kthread_create_on_node+0xe0/0x260
  kthread_create_on_node+0x34/0x50
  create_worker+0xe8/0x260
  worker_thread+0x444/0x560
  kthread+0x160/0x1a0
  ret_from_kernel_thread+0x5c/0x70

List corruption happened because it uses 'load into non-volatile
register' instruction:

Snippet from __kthread_create_on_node:

  c000000000136be8:     addis   r29,r2,-19
  c000000000136bec:     ld      r29,31424(r29)
        if (!__list_add_valid(new, prev, next))
  c000000000136bf0:     mr      r3,r30
  c000000000136bf4:     mr      r5,r28
  c000000000136bf8:     mr      r4,r29
  c000000000136bfc:     bl      c00000000059a2f8 <__list_add_valid+0x8>

Register state from WARN_ON():

  GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075
  GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000
  GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa
  GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00
  GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370
  GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000
  GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628
  GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0

Watchpoint hit at 0xc000000000136bec.

  addis   r29,r2,-19
   => r29 = 0xc000000001344e00 + (-19 << 16)
   => r29 = 0xc000000001214e00

  ld      r29,31424(r29)
   => r29 = *(0xc000000001214e00 + 31424)
   => r29 = *(0xc00000000121c8c0)

0xc00000000121c8c0 is where we placed a watchpoint and thus this
instruction was emulated by emulate_step. But because handle_dabr_fault
did not restore emulated register state, r29 still contains stale
value in above register state.

Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: stable@vger.kernel.org # 2.6.36+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agocxl: no need to check return value of debugfs_create functions
Greg Kroah-Hartman [Wed, 12 Jun 2019 15:54:18 +0000 (17:54 +0200)]
cxl: no need to check return value of debugfs_create functions

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Because there's no need to check, also make the return value of the
local debugfs_create_io_x64() call void, as no one ever did anything
with the return value (as they did not need to.)

And make the cxl_debugfs_* calls return void as no one was even checking
their return value at all.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/ps3: Use [] to denote a flexible array member
Geert Uytterhoeven [Mon, 17 Jun 2019 09:07:13 +0000 (11:07 +0200)]
powerpc/ps3: Use [] to denote a flexible array member

Flexible array members should be denoted using [] instead of [0], else
gcc will not warn when they are no longer at the end of the structure.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/mm/32s: fix condition that is always true
Andreas Schwab [Mon, 17 Jun 2019 21:22:20 +0000 (23:22 +0200)]
powerpc/mm/32s: fix condition that is always true

Move a misplaced paren that makes the condition always true.

Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agopowerpc/32s: fix suspend/resume when IBATs 4-7 are used
Christophe Leroy [Mon, 17 Jun 2019 21:42:14 +0000 (21:42 +0000)]
powerpc/32s: fix suspend/resume when IBATs 4-7 are used

Previously, only IBAT1 and IBAT2 were used to map kernel linear mem.
Since commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping
kernel text. But the suspend/restore functions only save/restore
BATs 0 to 3, and clears BATs 4 to 7.

Make suspend and restore functions respectively save and reload
the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
4 years agoselftests/powerpc: Fix earlyclobber in tm-vmxcopy
Gustavo Romero [Mon, 17 Jun 2019 21:24:58 +0000 (17:24 -0400)]
selftests/powerpc: Fix earlyclobber in tm-vmxcopy

In some cases, compiler can allocate the same register for operand 'res'
and 'vecoutptr', resulting in segfault at 'stxvd2x 40,0,%[vecoutptr]'
because base register will contain 1, yielding a false-positive.

This is because output 'res' must be marked as an earlyclobber operand so
it may not overlap an input operand ('vecoutptr').

Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoKVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
Suraj Jitindar Singh [Mon, 17 Jun 2019 07:16:19 +0000 (17:16 +1000)]
KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode

The hcall H_SET_DAWR is used by a guest to set the data address
watchpoint register (DAWR). This hcall is handled in the host in
kvmppc_h_set_dawr() which can be called in either real mode on the
guest exit path from hcall_try_real_mode() in book3s_hv_rmhandlers.S,
or in virtual mode when called from kvmppc_pseries_do_hcall() in
book3s_hv.c.

The function kvmppc_h_set_dawr() updates the dawr and dawrx fields in
the vcpu struct accordingly and then also writes the respective values
into the DAWR and DAWRX registers directly. It is necessary to write
the registers directly here when calling the function in real mode
since the path to re-enter the guest won't do this. However when in
virtual mode the host DAWR and DAWRX values have already been
restored, and so writing the registers would overwrite these.
Additionally there is no reason to write the guest values here as
these will be read from the vcpu struct and written to the registers
appropriately the next time the vcpu is run.

This also avoids the case when handling h_set_dawr for a nested guest
where the guest hypervisor isn't able to write the DAWR and DAWRX
registers directly and must rely on the real hypervisor to do this for
it when it calls H_ENTER_NESTED.

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoKVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
Michael Neuling [Mon, 17 Jun 2019 07:16:18 +0000 (17:16 +1000)]
KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()

Commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
screwed up some assembler and corrupted a pointer in r3. This resulted
in crashes like the below:

  BUG: Kernel NULL pointer dereference at 0x000013bf
  Faulting instruction address: 0xc00000000010b044
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3
  NIP:  c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4
  REGS: c00000179b397710 TRAP: 0300   Not tainted  (5.2.0-rc4+)
  MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 42244842  XER: 00000000
  CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0
  GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff
  GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0
  ...
  NIP kvmppc_h_set_dabr+0x50/0x68
  LR  kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv]
  Call Trace:
    0xc0000017f11c0000 (unreliable)
    kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv]
    kvmppc_vcpu_run+0x34/0x48 [kvm]
    kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm]
    kvm_vcpu_ioctl+0x460/0x850 [kvm]
    do_vfs_ioctl+0xe4/0xb40
    ksys_ioctl+0xc4/0x110
    sys_ioctl+0x28/0x80
    system_call+0x5c/0x70
  Instruction dump:
  4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff
  4d820020 50852e74 508516f6 78840724 <f88313c0f8a313c8 7c942ba6 7cbc2ba6

Fix the bug by only changing r3 when we are returning immediately.

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: fix build failure on book3e with KVM
Christophe Leroy [Thu, 23 May 2019 08:39:27 +0000 (08:39 +0000)]
powerpc/32: fix build failure on book3e with KVM

Build failure was introduced by the commit identified below,
due to missed macro expension leading to wrong called function's name.

arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall':
arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1'
Makefile:1052: recipe for target 'vmlinux' failed

The called function should be kvmppc_handler_8_0x01B(). This patch fixes it.

Reported-by: Paul Mackerras <paulus@ozlabs.org>
Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64: mark start_here_multiplatform as __ref
Christophe Leroy [Fri, 10 May 2019 06:31:28 +0000 (06:31 +0000)]
powerpc/64: mark start_here_multiplatform as __ref

Otherwise, the following warning is encountered:

WARNING: vmlinux.o(.text+0x3dc6): Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup()
The function start_here_multiplatform() references
the function __init .early_setup().
This is often because start_here_multiplatform lacks a __init
annotation or the annotation of .early_setup is wrong.

Fixes: 56c46bba9bbf ("powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX")
Cc: Russell Currey <ruscur@russell.cc>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/booke: fix fast syscall entry on SMP
Christophe Leroy [Thu, 13 Jun 2019 13:52:30 +0000 (13:52 +0000)]
powerpc/booke: fix fast syscall entry on SMP

Use r10 instead of r9 to calculate CPU offset as r9 contains
the value from SRR1 which is used later.

Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32s: fix initial setup of segment registers on secondary CPU
Christophe Leroy [Tue, 11 Jun 2019 15:47:20 +0000 (15:47 +0000)]
powerpc/32s: fix initial setup of segment registers on secondary CPU

The patch referenced below moved the loading of segment registers
out of load_up_mmu() in order to do it earlier in the boot sequence.
However, the secondary CPU still needs it to be done when loading up
the MMU.

Reported-by: Erhard F. <erhard_f@mailbox.org>
Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration
Nathan Lynch [Wed, 12 Jun 2019 04:45:06 +0000 (23:45 -0500)]
powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migration

It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it
never drops its references to the source system's cache nodes, causing
it to wind up in an inconsistent state resulting in warnings and oopses
as soon as CPU online/offline occurs after the migration, e.g.

  cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
  WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
  [...]
  NIP release_cache+0x1bc/0x1d0
  LR  release_cache+0x1b8/0x1d0
  Call Trace:
    release_cache+0x1b8/0x1d0 (unreliable)
    cacheinfo_cpu_offline+0x1c4/0x2c0
    unregister_cpu_online+0x1b8/0x260
    cpuhp_invoke_callback+0x114/0xf40
    cpuhp_thread_fun+0x270/0x310
    smpboot_thread_fn+0x2c8/0x390
    kthread+0x1b8/0x1c0
    ret_from_kernel_thread+0x5c/0x68

Using device tree notifiers won't work since we want to rebuild the
hierarchy only after all the removals and additions have occurred and
the device tree is in a consistent state. Call cacheinfo_teardown()
before processing device tree updates, and rebuild the hierarchy
afterward.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries/mobility: prevent cpu hotplug during DT update
Nathan Lynch [Wed, 12 Jun 2019 04:45:05 +0000 (23:45 -0500)]
powerpc/pseries/mobility: prevent cpu hotplug during DT update

CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.

Prevent CPU hotplug while the device tree potentially is inconsistent.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild
Nathan Lynch [Wed, 12 Jun 2019 04:45:04 +0000 (23:45 -0500)]
powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuild

Allow external callers to force the cacheinfo code to release all its
references to cache nodes, e.g. before processing device tree updates
post-migration, and to rebuild the hierarchy afterward.

CPU online/offline must be blocked by callers; enforce this.

Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries: Fix oops in hotplug memory notifier
Nathan Lynch [Fri, 7 Jun 2019 05:04:07 +0000 (00:04 -0500)]
powerpc/pseries: Fix oops in hotplug memory notifier

During post-migration device tree updates, we can oops in
pseries_update_drconf_memory() if the source device tree has an
ibm,dynamic-memory-v2 property and the destination has a
ibm,dynamic_memory (v1) property. The notifier processes an "update"
for the ibm,dynamic-memory property but it's really an add in this
scenario. So make sure the old property object is there before
dereferencing it.

Fixes: 2b31e3aec1db ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries/hvconsole: Fix stack overread via udbg
Daniel Axtens [Mon, 3 Jun 2019 06:56:57 +0000 (16:56 +1000)]
powerpc/pseries/hvconsole: Fix stack overread via udbg

While developing KASAN for 64-bit book3s, I hit the following stack
over-read.

It occurs because the hypercall to put characters onto the terminal
takes 2 longs (128 bits/16 bytes) of characters at a time, and so
hvc_put_chars() would unconditionally copy 16 bytes from the argument
buffer, regardless of supplied length. However, udbg_hvc_putc() can
call hvc_put_chars() with a single-byte buffer, leading to the error.

  ==================================================================
  BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110
  Read of size 8 at addr c0000000023e7a90 by task swapper/0

  CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113
  Call Trace:
    dump_stack+0x104/0x154 (unreliable)
    print_address_description+0xa0/0x30c
    __kasan_report+0x20c/0x224
    kasan_report+0x18/0x30
    __asan_report_load8_noabort+0x24/0x40
    hvc_put_chars+0xdc/0x110
    hvterm_raw_put_chars+0x9c/0x110
    udbg_hvc_putc+0x154/0x200
    udbg_write+0xf0/0x240
    console_unlock+0x868/0xd30
    register_console+0x970/0xe90
    register_early_udbg_console+0xf8/0x114
    setup_arch+0x108/0x790
    start_kernel+0x104/0x784
    start_here_common+0x1c/0x534

  Memory state around the buggy address:
   c0000000023e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0000000023e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  >c0000000023e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
                           ^
   c0000000023e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   c0000000023e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ==================================================================

Document that a 16-byte buffer is requred, and provide it in udbg.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoocxl: do not use C++ style comments in uapi header
Masahiro Yamada [Tue, 4 Jun 2019 11:16:32 +0000 (20:16 +0900)]
ocxl: do not use C++ style comments in uapi header

Linux kernel tolerates C++ style comments these days. Actually, the
SPDX License tags for .c files start with //.

On the other hand, uapi headers are written in more strict C, where
the C++ comment style is forbidden.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoMerge branch 'context-id-fix' into fixes
Michael Ellerman [Thu, 13 Jun 2019 05:00:34 +0000 (15:00 +1000)]
Merge branch 'context-id-fix' into fixes

This merges a fix for a bug in our context id handling on 64-bit hash
CPUs.

The fix was written against v5.1 to ease backporting to stable
releases. Here we are merging it up to a v5.2-rc2 base, which involves
a bit of manual resolution.

It also adds a test case for the bug.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoselftests/powerpc: Add test of fork with mapping above 512TB
Michael Ellerman [Thu, 13 Jun 2019 02:07:59 +0000 (12:07 +1000)]
selftests/powerpc: Add test of fork with mapping above 512TB

This tests that when a process with a mapping above 512TB forks we
correctly separate the parent and child address spaces. This exercises
the bug in the context id handling fixed in the previous commit.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/64s/hash: Reallocate context ids on fork
Michael Ellerman [Wed, 12 Jun 2019 13:35:07 +0000 (23:35 +1000)]
powerpc/mm/64s/hash: Reallocate context ids on fork

When using the Hash Page Table (HPT) MMU, userspace memory mappings
are managed at two levels. Firstly in the Linux page tables, much like
other architectures, and secondly in the SLB (Segment Lookaside
Buffer) and HPT. It's the SLB and HPT that are actually used by the
hardware to do translations.

As part of the series adding support for 4PB user virtual address
space using the hash MMU, we added support for allocating multiple
"context ids" per process, one for each 512TB chunk of address space.
These are tracked in an array called extended_id in the mm_context_t
of a process that has done a mapping above 512TB.

If such a process forks (ie. clone(2) without CLONE_VM set) it's mm is
copied, including the mm_context_t, and then init_new_context() is
called to reinitialise parts of the mm_context_t as appropriate to
separate the address spaces of the two processes.

The key step in ensuring the two processes have separate address
spaces is to allocate a new context id for the process, this is done
at the beginning of hash__init_new_context(). If we didn't allocate a
new context id then the two processes would share mappings as far as
the SLB and HPT are concerned, even though their Linux page tables
would be separate.

For mappings above 512TB, which use the extended_id array, we
neglected to allocate new context ids on fork, meaning the parent and
child use the same ids and therefore share those mappings even though
they're supposed to be separate. This can lead to the parent seeing
writes done by the child, which is essentially memory corruption.

There is an additional exposure which is that if the child process
exits, all its context ids are freed, including the context ids that
are still in use by the parent for mappings above 512TB. One or more
of those ids can then be reallocated to a third process, that process
can then read/write to the parent's mappings above 512TB. Additionally
if the freed id is used for the third process's primary context id,
then the parent is able to read/write to the third process's mappings
*below* 512TB.

All of these are fundamental failures to enforce separation between
processes. The only mitigating factor is that the bug only occurs if a
process creates mappings above 512TB, and most applications still do
not create such mappings.

Only machines using the hash page table MMU are affected, eg. PowerPC
970 (G5), PA6T, Power5/6/7/8/9. By default Power9 bare metal machines
(powernv) use the Radix MMU and are not affected, unless the machine
has been explicitly booted in HPT mode (using disable_radix on the
kernel command line). KVM guests on Power9 may be affected if the host
or guest is configured to use the HPT MMU. LPARs under PowerVM on
Power9 are affected as they always use the HPT MMU. Kernels built with
PAGE_SIZE=4K are not affected.

The fix is relatively simple, we need to reallocate context ids for
all extended mappings on fork.

Fixes: f384796c40dc ("powerpc/mm: Add support for handling > 512TB address in SLB miss")
Cc: stable@vger.kernel.org # v4.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
Christophe Leroy [Mon, 3 Jun 2019 13:00:51 +0000 (13:00 +0000)]
powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX

When booting through OF, setup_disp_bat() does nothing because
disp_BAT are not set. By change, it used to work because BOOTX
buffer is mapped 1:1 at address 0x81000000 by the bootloader, and
btext_setup_display() sets virt addr same as phys addr.

But since commit 215b823707ce ("powerpc/32s: set up an early static
hash table for KASAN."), a temporary page table overrides the
bootloader mapping.

This 0x81000000 is also problematic with the newly implemented
Kernel Userspace Access Protection (KUAP) because it is within user
address space.

This patch fixes those issues by properly setting disp_BAT through
a call to btext_prepare_BAT(), allowing setup_disp_bat() to
properly setup BAT3 for early bootx screen buffer access.

Reported-by: Mathieu Malaterre <malat@debian.org>
Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN.")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()
Nicholas Piggin [Fri, 7 Jun 2019 03:56:36 +0000 (13:56 +1000)]
powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()

The change to pmdp_invalidate() to mark the pmd with _PAGE_INVALID
broke the synchronisation against lock free lookups,
__find_linux_pte()'s pmd_none() check no longer returns true for such
cases.

Fix this by adding a check for this condition as well.

Fixes: da7ad366b497 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: stable@vger.kernel.org # v4.20+
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Fix THP PMD collapse serialisation
Nicholas Piggin [Fri, 7 Jun 2019 03:56:35 +0000 (13:56 +1000)]
powerpc/64s: Fix THP PMD collapse serialisation

Commit 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian
conversion in pte helpers") changed the actual bitwise tests in
pte_access_permitted by using pte_write() and pte_present() helpers
rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits.

The pte_present() change now returns true for PTEs which are
!_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by
pmdp_invalidate() to synchronize access from lock-free lookups.
pte_access_permitted() is used by pmd_access_permitted(), so allowing
GUP lock free access to proceed with such PTEs breaks this
synchronisation.

This bug has been observed on a host using the hash page table MMU,
with random crashes and corruption in guests, usually together with
bad PMD messages in the host.

Fix this by adding an explicit check in pmd_access_permitted(), and
documenting the condition explicitly.

The pte_write() change should be okay, and would prevent GUP from
falling back to the slow path when encountering savedwrite PTEs, which
matches what x86 (that does not implement savedwrite) does.

Fixes: 1b2443a547f9 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Fix kexec failure on book3s/32
Christophe Leroy [Mon, 3 Jun 2019 08:20:28 +0000 (08:20 +0000)]
powerpc: Fix kexec failure on book3s/32

In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
Therefore, allthough __mapin_ram_chunk() was already mapping kernel
text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
memory was executable. Part of the memory (first 512kbytes) was
mapped with BATs instead of page table, but it was also entirely
mapped as executable.

In commit 385e89d5b20f ("powerpc/mm: add exec protection on
powerpc 603"), we started adding exec protection to some 6xx, namely
the 603, for pages mapped via pagetables.

Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
memory, so that really only the kernel text could be executed.

The problem here is that kexec is based on copying some code into
upper part of memory then executing it from there in order to install
a fresh new kernel at its definitive location.

However, the code is position independant and first part of it is
just there to deactivate the MMU and jump to the second part. So it
is possible to run this first part inplace instead of running the
copy. Once the MMU is off, there is no protection anymore and the
second part of the code will just run as before.

Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries: Fix xive=off command line
Greg Kurz [Wed, 15 May 2019 10:05:01 +0000 (12:05 +0200)]
powerpc/pseries: Fix xive=off command line

On POWER9, if the hypervisor supports XIVE exploitation mode, the
guest OS will unconditionally requests for the XIVE interrupt mode
even if XIVE was deactivated with the kernel command line xive=off.
Later on, when the spapr XIVE init code handles xive=off, it disables
XIVE and tries to fall back on the legacy mode XICS.

This discrepency causes a kernel panic because the hypervisor is
configured to provide the XIVE interrupt mode to the guest :

  kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
  ...
  NIP xics_smp_probe+0x38/0x98
  LR  xics_smp_probe+0x2c/0x98
  Call Trace:
    xics_smp_probe+0x2c/0x98 (unreliable)
    pSeries_smp_probe+0x40/0xa0
    smp_prepare_cpus+0x62c/0x6ec
    kernel_init_freeable+0x148/0x448
    kernel_init+0x2c/0x148
    ret_from_kernel_thread+0x5c/0x68

Look for xive=off during prom_init and don't ask for XIVE in this
case. One exception though: if the host only supports XIVE, we still
want to boot so we ignore xive=off.

Similarly, have the spapr XIVE init code to looking at the interrupt
mode negotiated during CAS, and ignore xive=off if the hypervisor only
supports XIVE.

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.20
Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv/npu: Fix reference leak
Greg Kurz [Fri, 19 Apr 2019 15:34:13 +0000 (17:34 +0200)]
powerpc/powernv/npu: Fix reference leak

Since 902bdc57451c, get_pci_dev() calls pci_get_domain_bus_and_slot(). This
has the effect of incrementing the reference count of the PCI device, as
explained in drivers/pci/search.c:

 * Given a PCI domain, bus, and slot/function number, the desired PCI
 * device is located in the list of PCI devices. If the device is
 * found, its reference count is increased and this function returns a
 * pointer to its data structure.  The caller must decrement the
 * reference count by calling pci_dev_put().  If no device is found,
 * %NULL is returned.

Nothing was done to call pci_dev_put() and the reference count of GPU and
NPU PCI devices rockets up.

A natural way to fix this would be to teach the callers about the change,
so that they call pci_dev_put() when done with the pointer. This turns
out to be quite intrusive, as it affects many paths in npu-dma.c,
pci-ioda.c and vfio_pci_nvlink2.c. Also, the issue appeared in 4.16 and
some affected code got moved around since then: it would be problematic
to backport the fix to stable releases.

All that code never cared for reference counting anyway. Call pci_dev_put()
from get_pci_dev() to revert to the previous behavior.

Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
Cc: stable@vger.kernel.org # v4.16
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Remove variable ‘path’ since not used
Mathieu Malaterre [Thu, 23 May 2019 10:25:20 +0000 (12:25 +0200)]
powerpc: Remove variable ‘path’ since not used

In commit eab00a208eb6 ("powerpc: Move `path` variable inside
DEBUG_PROM") DEBUG_PROM sentinels were added to silence a warning
(treated as error with W=1):

  arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]

Rework the original patch and simplify the code, by removing the
variable ‘path’ completely. Fix line over 90 characters.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: Show checkstop reason for NPU2 HMIs
Frederic Barrat [Thu, 23 May 2019 12:28:04 +0000 (14:28 +0200)]
powerpc/powernv: Show checkstop reason for NPU2 HMIs

If the kernel is notified of an HMI caused by the NPU2, it's currently
not being recognized and it logs the default message:

    Unknown Malfunction Alert of type 3

The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
possible causes, but we should at least log that it's an NPU problem
and report which FIR and which bit were raised if opal gave us the
information.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI
Stewart Smith [Fri, 24 May 2019 05:09:56 +0000 (15:09 +1000)]
powerpc/powernv: Update firmware archaeology around OPAL_HANDLE_HMI

The first machines to ship with OPAL firmware all got firmware updates
that have the new call, but just in case someone is foolish enough to
believe the first 4 months of firmware is the best, we keep this code
around.

Comment is updated to not refer to late 2014 as recent or the future.

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()
Gen Zhang [Sun, 26 May 2019 02:42:40 +0000 (10:42 +0800)]
powerpc/pseries/dlpar: Fix a missing check in dlpar_parse_cc_property()

In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
kstrdup() may return NULL, so it should be checked and handle error.
And prop should be freed if 'prop->name' is NULL.

Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/lib: only build ldstfp.o when CONFIG_PPC_FPU is set
Christophe Leroy [Mon, 13 May 2019 10:00:15 +0000 (10:00 +0000)]
powerpc/lib: only build ldstfp.o when CONFIG_PPC_FPU is set

The entire code in ldstfp.o is enclosed into #ifdef CONFIG_PPC_FPU,
so there is no point in building it when this config is not selected.

Fixes: cd64d1697cf0 ("powerpc: mtmsrd not defined")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/lib: fix redundant inclusion of quad.o
Christophe Leroy [Mon, 13 May 2019 10:00:14 +0000 (10:00 +0000)]
powerpc/lib: fix redundant inclusion of quad.o

quad.o is only for PPC64, and already included in obj64-y,
so it doesn't have to be in obj-y

Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoocxl: Make ocxl_remove() static
YueHaibing [Sat, 4 May 2019 10:27:20 +0000 (18:27 +0800)]
ocxl: Make ocxl_remove() static

Fix sparse warning:

drivers/misc/ocxl/pci.c:44:6: warning:
 symbol 'ocxl_remove' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Make some symbols static that can be
YueHaibing [Sat, 4 May 2019 10:24:27 +0000 (18:24 +0800)]
powerpc/mm: Make some symbols static that can be

Noticed by sparse.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoLinux 5.2-rc2 v5.2-rc2
Linus Torvalds [Sun, 26 May 2019 23:49:19 +0000 (16:49 -0700)]
Linux 5.2-rc2

5 years agoMerge tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Sun, 26 May 2019 20:49:40 +0000 (13:49 -0700)]
Merge tag 'trace-v5.2-rc1-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing warning fix from Steven Rostedt:
 "Make the GCC 9 warning for sub struct memset go away.

  GCC 9 now warns about calling memset() on partial structures when it
  goes across multiple fields. This adds a helper for the place in
  tracing that does this type of clearing of a structure"

* tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Silence GCC 9 array bounds warning

5 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 26 May 2019 20:45:15 +0000 (13:45 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "The usual smattering of fixes and tunings that came in too late for
  the merge window, but should not wait four months before they appear
  in a release.

  I also travelled a bit more than usual in the first part of May, which
  didn't help with picking up patches and reports promptly"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (33 commits)
  KVM: x86: fix return value for reserved EFER
  tools/kvm_stat: fix fields filter for child events
  KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard
  kvm: selftests: aarch64: compile with warnings on
  kvm: selftests: aarch64: fix default vm mode
  kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size
  KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
  KVM: x86/pmu: do not mask the value that is written to fixed PMUs
  KVM: x86/pmu: mask the result of rdpmc according to the width of the counters
  x86/kvm/pmu: Set AMD's virt PMU version to 1
  KVM: x86: do not spam dmesg with VMCS/VMCB dumps
  kvm: Check irqchip mode before assign irqfd
  kvm: svm/avic: fix off-by-one in checking host APIC ID
  KVM: selftests: do not blindly clobber registers in guest asm
  KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c
  KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
  KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
  kvm: vmx: Fix -Wmissing-prototypes warnings
  KVM: nVMX: Fix using __this_cpu_read() in preemptible context
  kvm: fix compilation on s390
  ...

5 years agoMerge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 26 May 2019 15:30:16 +0000 (08:30 -0700)]
Merge tag 'random_for_linus_stable' of git://git./linux/kernel/git/tytso/random

Pull /dev/random fix from Ted Ts'o:
 "Fix a soft lockup regression when reading from /dev/random in early
  boot"

* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: fix soft lockup when trying to read from an uninitialized blocking pool

5 years agorandom: fix soft lockup when trying to read from an uninitialized blocking pool
Theodore Ts'o [Wed, 22 May 2019 16:02:16 +0000 (12:02 -0400)]
random: fix soft lockup when trying to read from an uninitialized blocking pool

Fixes: eb9d1bf079bb: "random: only read from /dev/random after its pool has received 128 bits"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 years agotracing: Silence GCC 9 array bounds warning
Miguel Ojeda [Thu, 23 May 2019 12:45:35 +0000 (14:45 +0200)]
tracing: Silence GCC 9 array bounds warning

Starting with GCC 9, -Warray-bounds detects cases when memset is called
starting on a member of a struct but the size to be cleared ends up
writing over further members.

Such a call happens in the trace code to clear, at once, all members
after and including `seq` on struct trace_iterator:

    In function 'memset',
        inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3:
    ./include/linux/string.h:344:9: warning: '__builtin_memset' offset
    [8505, 8560] from the object at 'iter' is out of the bounds of
    referenced subobject 'seq' with type 'struct trace_seq' at offset
    4368 [-Warray-bounds]
      344 |  return __builtin_memset(p, c, size);
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to avoid GCC complaining about it, we compute the address
ourselves by adding the offsetof distance instead of referring
directly to the member.

Since there are two places doing this clear (trace.c and trace_kdb.c),
take the chance to move the workaround into a single place in
the internal header.

Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
[ Removed unnecessary parenthesis around "iter" ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
5 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 25 May 2019 22:03:12 +0000 (15:03 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug fixes (including a regression fix) for ext4"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix dcache lookup of !casefolded directories
  ext4: do not delete unlinked inode from orphan list on failed truncate
  ext4: wait for outstanding dio during truncate in nojournal mode
  ext4: don't perform block validity checks on the journal inode

5 years agoMerge tag 'libnvdimm-fixes-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 25 May 2019 17:11:23 +0000 (10:11 -0700)]
Merge tag 'libnvdimm-fixes-5.2-rc2' of git://git./linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:

 - Fix a regression that disabled device-mapper dax support

 - Remove unnecessary hardened-user-copy overhead (>30%) for dax
   read(2)/write(2).

 - Fix some compilation warnings.

* tag 'libnvdimm-fixes-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead
  dax: Arrange for dax_supported check to span multiple devices
  libnvdimm: Fix compilation warnings with W=1

5 years agoMerge tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Sat, 25 May 2019 17:08:14 +0000 (10:08 -0700)]
Merge tag 'trace-v5.2-rc1' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Tom Zanussi sent me some small fixes and cleanups to the histogram
  code and I forgot to incorporate them.

  I also added a small clean up patch that was sent to me a while ago
  and I just noticed it"

* tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  kernel/trace/trace.h: Remove duplicate header of trace_seq.h
  tracing: Add a check_val() check before updating cond_snapshot() track_val
  tracing: Check keys for variable references in expressions too
  tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts

5 years agoext4: fix dcache lookup of !casefolded directories
Gabriel Krisman Bertazi [Sat, 25 May 2019 03:48:23 +0000 (23:48 -0400)]
ext4: fix dcache lookup of !casefolded directories

Found by visual inspection, this wasn't caught by my xfstest, since it's
effect is ignoring positive dentries in the cache the fallback just goes
to the disk.  it was introduced in the last iteration of the
case-insensitive patch.

d_compare should return 0 when the entries match, so make sure we are
correctly comparing the entire string if the encoding feature is set and
we are on a case-INsensitive directory.

Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups")
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 25 May 2019 00:30:28 +0000 (17:30 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is the same set of patches sent in the merge window as the final
  pull except that Martin's read only rework is replaced with a simple
  revert of the original change that caused the regression.

  Everything else is an obvious fix or small cleanup"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  Revert "scsi: sd: Keep disk read-only when re-reading partition"
  scsi: bnx2fc: fix incorrect cast to u64 on shift operation
  scsi: smartpqi: Reporting unhandled SCSI errors
  scsi: myrs: Fix uninitialized variable
  scsi: lpfc: Update lpfc version to 12.2.0.2
  scsi: lpfc: add check for loss of ndlp when sending RRQ
  scsi: lpfc: correct rcu unlock issue in lpfc_nvme_info_show
  scsi: lpfc: resolve lockdep warnings
  scsi: qedi: remove set but not used variables 'cdev' and 'udev'
  scsi: qedi: remove memset/memcpy to nfunc and use func instead
  scsi: qla2xxx: Add cleanup for PCI EEH recovery