linux-2.6-block.git
5 years agopowerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration
Aneesh Kumar K.V [Fri, 22 Feb 2019 17:25:31 +0000 (22:55 +0530)]
powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration

We added runtime allocation of 16G pages in commit 4ae279c2c96a
("powerpc/mm/hugetlb: Allow runtime allocation of 16G.") That was done
to enable 16G allocation on PowerNV and KVM config. In case of KVM
config, we mostly would have the entire guest RAM backed by 16G
hugetlb pages for this to work. PAPR do support partial backing of
guest RAM with hugepages via ibm,expected#pages node of memory node in
the device tree. This means rest of the guest RAM won't be backed by
16G contiguous pages in the host and hence a hash page table insertion
can fail in such case.

An example error message will look like

  hash-mmu: mm: Hashing failure ! EA=0x7efc00000000 access=0x8000000000000006 current=readback
  hash-mmu:     trap=0x300 vsid=0x67af789 ssize=1 base psize=14 psize 14 pte=0xc000000400000386
  readback[12260]: unhandled signal 7 at 00007efc00000000 nip 00000000100012d0 lr 000000001000127c code 2

This patch address that by preventing runtime allocation of 16G
hugepages in LPAR config. To allocate 16G hugetlb one need to kernel
command line hugepagesz=16G hugepages=<number of 16G pages>

With radix translation mode we don't run into this issue.

This change will prevent runtime allocation of 16G hugetlb pages on
kvm with hash translation mode. However, with the current upstream it
was observed that 16G hugetlbfs backed guest doesn't boot at all.

We observe boot failure with the below message:
  [131354.647546] KVM: map_vrma at 0 failed, ret=-4

That means this patch is not resulting in an observable regression.
Once we fix the boot issue with 16G hugetlb backed memory, we need to
use ibm,expected#pages memory node attribute to indicate 16G page
reservation to the guest. This will also enable partial backing of
guest RAM with 16G pages.

Fixes: 4ae279c2c96a ("powerpc/mm/hugetlb: Allow runtime allocation of 16G.")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Clear on-stack exception marker upon exception return
Christophe Leroy [Wed, 27 Feb 2019 11:45:30 +0000 (11:45 +0000)]
powerpc/32: Clear on-stack exception marker upon exception return

Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order
to avoid confusing stacktrace like the one below.

  Call Trace:
  [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895130] memchr+0x24/0x74
  [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
  --- interrupt: c0e9df00 at 0x400f330
      LR = init_stack+0x1f00/0x2000
  [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

With this patch the trace becomes:

  Call Trace:
  [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895150] memchr+0x24/0x74
  [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
  [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Remove export of save_stack_trace_tsk_reliable()
Joe Lawrence [Fri, 1 Mar 2019 19:17:21 +0000 (14:17 -0500)]
powerpc: Remove export of save_stack_trace_tsk_reliable()

As tglx points out, there are no in-tree module users of
save_stack_trace_tsk_reliable() and its x86 counterpart is not
exported, so remove the powerpc symbol export.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: fix "section_base" set but not used
Qian Cai [Fri, 1 Mar 2019 14:20:40 +0000 (09:20 -0500)]
powerpc/mm: fix "section_base" set but not used

The commit 24b6d4164348 ("mm: pass the vmem_altmap to vmemmap_free")
removed a line in vmemmap_free(),

altmap = to_vmem_altmap((unsigned long) section_base);

but left a variable no longer used.

arch/powerpc/mm/init_64.c: In function 'vmemmap_free':
arch/powerpc/mm/init_64.c:277:16: error: variable 'section_base' set but
not used [-Werror=unused-but-set-variable]

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Fix "sz" set but not used warning
Qian Cai [Thu, 28 Feb 2019 02:35:05 +0000 (21:35 -0500)]
powerpc/mm: Fix "sz" set but not used warning

Fix compiler warning:
  arch/powerpc/mm/hugetlbpage-hash64.c: In function '__hash_page_huge':
  arch/powerpc/mm/hugetlbpage-hash64.c:29:28: warning: variable 'sz' set
  but not used [-Wunused-but-set-variable]

mpe: The last usage of sz was removed in 0895ecda7942 ("powerpc/mm:
Bring hugepage PTE accessor functions back into sync with normal
accessors").

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm: Check secondary hash page table
Rashmica Gupta [Tue, 12 Feb 2019 23:29:49 +0000 (10:29 +1100)]
powerpc/mm: Check secondary hash page table

We were always calling base_hpte_find() with primary = true,
even when we wanted to check the secondary table.

mpe: I broke this when refactoring Rashmica's original patch.

Fixes: 1515ab932156 ("powerpc/mm: Dump hash table")
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: remove nargs from __SYSCALL
Firoz Khan [Wed, 2 Jan 2019 15:02:03 +0000 (20:32 +0530)]
powerpc: remove nargs from __SYSCALL

The __SYSCALL macro's arguments are system call number,
system call entry name and number of arguments for the
system call.

Argument- nargs in __SYSCALL(nr, entry, nargs) is neither
calculated nor used anywhere. So it would be better to
keep the implementaion as  __SYSCALL(nr, entry). This will
unifies the implementation with some other architetures
too.

Signed-off-by: Firoz Khan <firoz.khan@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Sat, 2 Mar 2019 03:42:28 +0000 (14:42 +1100)]
Merge branch 'topic/ppc-kvm' into next

Merge another commit in the topic/ppc-kvm branch we're sharing with
kvm-ppc.

5 years agopowerpc/64s: Fix unrelocated interrupt trampoline address test
Nicholas Piggin [Fri, 1 Mar 2019 12:56:36 +0000 (22:56 +1000)]
powerpc/64s: Fix unrelocated interrupt trampoline address test

The recent commit got this test wrong, it declared the assembler
symbols the wrong way, and also used the wrong symbol name
(xxx_start rather than start_xxx, see asm/head-64.h).

Fixes: ccd477028a ("powerpc/64s: Fix HV NMI vs HV interrupt recoverability test")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
Alexey Kardashevskiy [Wed, 13 Feb 2019 03:38:18 +0000 (14:38 +1100)]
powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables

We store 2 multilevel tables in iommu_table - one for the hardware and
one with the corresponding userspace addresses. Before allocating
the tables, the iommu_table_group_ops::get_table_size() hook returns
the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts
the locked_vm counter correctly. When the table is actually allocated,
the amount of allocated memory is stored in iommu_table::it_allocated_size
and used to decrement the locked_vm counter when we release the memory
used by the table; .get_table_size() and .create_table() calculate it
independently but the result is expected to be the same.

However the allocator does not add the userspace table size to
.it_allocated_size so when we destroy the table because of VFIO PCI
unplug (i.e. VFIO container is gone but the userspace keeps running),
we decrement locked_vm by just a half of size of memory we are
releasing.

To make things worse, since we enabled on-demand allocation of
indirect levels, it_allocated_size contains only the amount of memory
actually allocated at the table creation time which can just be a
fraction. It is not a problem with incrementing locked_vm (as
get_table_size() value is used) but it is with decrementing.

As the result, we leak locked_vm and may not be able to allocate more
IOMMU tables after few iterations of hotplug/unplug.

This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table()
hook to what pnv_pci_ioda2_get_table_size() returns so from now on we
have a single place which calculates the maximum memory a table can
occupy. The original meaning of it_allocated_size is somewhat lost now
though.

We do not ditch it_allocated_size whatsoever here and we do not call
get_table_size() from vfio_iommu_spapr_tce.c when decrementing
locked_vm as we may have multiple IOMMU groups per container and even
though they all are supposed to have the same get_table_size()
implementation, there is a small chance for failure or confusion.

Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/fsl: Fix the flush of branch predictor.
Christophe Leroy [Tue, 26 Feb 2019 18:18:48 +0000 (18:18 +0000)]
powerpc/fsl: Fix the flush of branch predictor.

The commit identified below adds MC_BTB_FLUSH macro only when
CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
on some configs (seen several times with kisskb randconfig_defconfig)

arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
make[1]: *** [Makefile:1043: arch/powerpc] Error 2
make: *** [Makefile:152: sub-make] Error 2

This patch adds a blank definition of MC_BTB_FLUSH for other cases.

Fixes: 10c5e83afd4a ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
Cc: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: Make opal log only readable by root
Jordan Niethe [Wed, 27 Feb 2019 03:02:29 +0000 (14:02 +1100)]
powerpc/powernv: Make opal log only readable by root

Currently the opal log is globally readable. It is kernel policy to
limit the visibility of physical addresses / kernel pointers to root.
Given this and the fact the opal log may contain this information it
would be better to limit the readability to root.

Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
Nathan Chancellor [Tue, 26 Feb 2019 05:38:55 +0000 (22:38 -0700)]
powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc

When building with -Wsometimes-uninitialized, Clang warns:

  arch/powerpc/xmon/ppc-dis.c:157:7: warning: variable 'opcode' is used
  uninitialized whenever 'if' condition is false
  [-Wsometimes-uninitialized]
    if (cpu_has_feature(CPU_FTRS_POWER9))
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/powerpc/xmon/ppc-dis.c:167:7: note: uninitialized use occurs here
    if (opcode == NULL)
        ^~~~~~
  arch/powerpc/xmon/ppc-dis.c:157:3: note: remove the 'if' if its
  condition is always true
    if (cpu_has_feature(CPU_FTRS_POWER9))
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  arch/powerpc/xmon/ppc-dis.c:132:38: note: initialize the variable
  'opcode' to silence this warning
    const struct powerpc_opcode *opcode;
                                       ^
                                        = NULL
  1 warning generated.

This warning seems to make no sense on the surface because opcode is set
to NULL right below this statement. However, there is a comma instead of
semicolon to end the dialect assignment, meaning that the opcode
assignment only happens in the if statement. Properly terminate that
line so that Clang no longer warns.

Fixes: 5b102782c7f4 ("powerpc/xmon: Enable disassembly files (compilation changes)")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C
Nicholas Piggin [Tue, 26 Feb 2019 09:30:35 +0000 (19:30 +1000)]
powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C

The OPAL call wrapper gets interrupt disabling wrong. It disables
interrupts just by clearing MSR[EE], which has two problems:

- It doesn't call into the IRQ tracing subsystem, which means tracing
  across OPAL calls does not always notice IRQs have been disabled.

- It doesn't go through the IRQ soft-mask code, which causes a minor
  bug. MSR[EE] can not be restored by saving the MSR then clearing
  MSR[EE], because a racing interrupt while soft-masked could clear
  MSR[EE] between the two steps. This can cause MSR[EE] to be
  incorrectly enabled when the OPAL call returns. Fortunately that
  should only result in another masked interrupt being taken to
  disable MSR[EE] again, but it's a bit sloppy.

The existing code also saves MSR to PACA, which is not re-entrant if
there is a nested OPAL call from different MSR contexts, which can
happen these days with SRESET interrupts on bare metal.

To fix these issues, move the tracing and IRQ handling code to C, and
call into asm just for the low level call when everything is ready to
go. Save the MSR on stack rather than PACA.

Performance cost is kept to a minimum with a few optimisations:

- The endian switch upon return is combined with the MSR restore,
  which avoids an expensive context synchronizing operation for LE
  kernels. This makes up for the additional mtmsrd to enable
  interrupts with local_irq_enable().

- blr is now used to return from the opal_* functions that are called
  as C functions, to avoid link stack corruption. This requires a
  skiboot fix as well to keep the call stack balanced.

A NULL call is more costly after this, (410ns->430ns on POWER9), but
OPAL calls are generally not performance critical at this scale.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Fix data interrupts vs d-side MCE reentrancy
Nicholas Piggin [Tue, 26 Feb 2019 08:51:10 +0000 (18:51 +1000)]
powerpc/64s: Fix data interrupts vs d-side MCE reentrancy

Handlers for interrupts that set DAR / DSISR, set MSR[RI] before those
SPRs are read. If a d-side machine check hits in this window, DAR /
DSISR will be clobbered silently, leading to random corruption.

Fix this by having handlers save those registers before setting MSR[RI].

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
Nicholas Piggin [Tue, 26 Feb 2019 08:51:09 +0000 (18:51 +1000)]
powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy

A subsequent fix for data interrupts (those that set DAR / DSISR)
requires some interrupt macros to be open-coded, and also requires
the 0x300 interrupt handler to be moved out-of-line.

This patch does that without changing behaviour, which makes the later
fix a smaller change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: system reset interrupt preserve HSRRs
Nicholas Piggin [Tue, 26 Feb 2019 08:51:08 +0000 (18:51 +1000)]
powerpc/64s: system reset interrupt preserve HSRRs

Code that uses HSRR registers is not required to clear MSR[RI] by
convention, however the system reset NMI itself may use HSRR
registers (e.g., to call OPAL) and clobber them.

Rather than introduce the requirement to clear RI in order to use
HSRRs, have system reset interrupt save and restore HSRRs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Fix HV NMI vs HV interrupt recoverability test
Nicholas Piggin [Tue, 26 Feb 2019 08:51:07 +0000 (18:51 +1000)]
powerpc/64s: Fix HV NMI vs HV interrupt recoverability test

HV interrupts that use HSRR registers do not enter with MSR[RI] clear,
but their entry code is not recoverable vs NMI, due to shared use of
HSPRG1 as a scratch register to save r13.

This means that a system reset or machine check that hits in HSRR
interrupt entry can cause r13 to be silently corrupted.

Fix this by marking NMIs non-recoverable if they land in HV interrupt
ranges.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search
Aneesh Kumar K.V [Tue, 26 Feb 2019 04:39:35 +0000 (10:09 +0530)]
powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search

When doing top-down search the low_limit is not PAGE_SIZE but rather
max(PAGE_SIZE, mmap_min_addr). This handle cases in which mmap_min_addr >
PAGE_SIZE.

Fixes: fba2369e6ceb ("mm: use vm_unmapped_area() on powerpc architecture")
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
Aneesh Kumar K.V [Tue, 26 Feb 2019 04:39:34 +0000 (10:09 +0530)]
powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback

After we ALIGN up the address we need to make sure we didn't overflow
and resulted in zero address. In that case, we need to make sure that
the returned address is greater than mmap_min_addr.

This fixes selftest va_128TBswitch --run-hugetlb reporting failures when
run as non root user for

mmap(-1, MAP_HUGETLB)

The bug is that a non-root user requesting address -1 will be given address 0
which will then fail, whereas they should have been given something else that
would have succeeded.

We also avoid the first mmap(-1, MAP_HUGETLB) returning NULL address as mmap address
with this change. So we think this is not a security issue, because it only affects
whether we choose an address below mmap_min_addr, not whether we
actually allow that address to be mapped. ie. there are existing capability
checks to prevent a user mapping below mmap_min_addr and those will still be
honoured even without this fix.

Fixes: 484837601d4d ("powerpc/mm: Add radix support for hugetlb")
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoselftests/powerpc: Remove duplicate header
Michael Ellerman [Tue, 26 Feb 2019 01:32:06 +0000 (12:32 +1100)]
selftests/powerpc: Remove duplicate header

Remove duplicate headers which are included twice.

Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com>
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc sstep: Add support for modsd, modud instructions
Sandipan Das [Fri, 22 Feb 2019 06:53:32 +0000 (12:23 +0530)]
powerpc sstep: Add support for modsd, modud instructions

This adds emulation support for the following integer instructions:
  * Modulo Signed Doubleword (modsd)
  * Modulo Unsigned Doubleword (modud)

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc sstep: Add support for modsw, moduw instructions
PrasannaKumar Muralidharan [Fri, 22 Feb 2019 06:53:31 +0000 (12:23 +0530)]
powerpc sstep: Add support for modsw, moduw instructions

This adds emulation support for the following integer instructions:
  * Modulo Signed Word (modsw)
  * Modulo Unsigned Word (moduw)

Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc sstep: Add support for extswsli instruction
Sandipan Das [Fri, 22 Feb 2019 06:53:30 +0000 (12:23 +0530)]
powerpc sstep: Add support for extswsli instruction

This adds emulation support for the following integer instructions:
  * Extend-Sign Word and Shift Left Immediate (extswsli[.])

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc sstep: Add support for cnttzw, cnttzd instructions
Sandipan Das [Fri, 22 Feb 2019 06:53:29 +0000 (12:23 +0530)]
powerpc sstep: Add support for cnttzw, cnttzd instructions

This adds emulation support for the following integer instructions:
  * Count Trailing Zeros Word (cnttzw[.])
  * Count Trailing Zeros Doubleword (cnttzd[.])

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add support for darn instruction
Sandipan Das [Fri, 22 Feb 2019 06:53:28 +0000 (12:23 +0530)]
powerpc: sstep: Add support for darn instruction

This adds emulation support for the following integer instructions:
  * Deliver A Random Number (darn)

As suggested by Michael, this uses a raw .long for specifying the
instruction word when using inline assembly to retain compatibility
with older binutils.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add support for maddhd, maddhdu, maddld instructions
Sandipan Das [Fri, 22 Feb 2019 06:53:27 +0000 (12:23 +0530)]
powerpc: sstep: Add support for maddhd, maddhdu, maddld instructions

This adds emulation support for the following integer instructions:
  * Multiply-Add High Doubleword (maddhd)
  * Multiply-Add High Doubleword Unsigned (maddhdu)
  * Multiply-Add Low Doubleword (maddld)

As suggested by Michael, this uses a raw .long for specifying the
instruction word when using inline assembly to retain compatibility
with older binutils.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: clean stack pointers naming
Christophe Leroy [Sat, 12 Jan 2019 09:55:53 +0000 (09:55 +0000)]
powerpc: clean stack pointers naming

Some stack pointers used to also be thread_info pointers
and were called tp. Now that they are only stack pointers,
rename them sp.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64: Replace CURRENT_THREAD_INFO with PACA_THREAD_INFO
Christophe Leroy [Sat, 12 Jan 2019 09:55:50 +0000 (09:55 +0000)]
powerpc/64: Replace CURRENT_THREAD_INFO with PACA_THREAD_INFO

Now that current_thread_info is located at the beginning of 'current'
task struct, CURRENT_THREAD_INFO macro is not really needed any more.

This patch replaces it by loads of the value at PACA_THREAD_INFO(r13).

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Add PACA_THREAD_INFO rather than using PACACURRENT]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU
Christophe Leroy [Thu, 31 Jan 2019 10:09:04 +0000 (10:09 +0000)]
powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU

Now that thread_info is similar to task_struct, its address is in r2
so CURRENT_THREAD_INFO() macro is useless. This patch removes it.

This patch also moves the 'tovirt(r2, r2)' down just before the
reactivation of MMU translation, so that we keep the physical address
of 'current' in r2 until then. It avoids a few calls to tophys().

At the same time, as the 'cpu' field is not anymore in thread_info,
TI_CPU is renamed TASK_CPU by this patch.

It also allows to get rid of a couple of
'#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE' as ACCOUNT_CPU_USER_ENTRY()
and ACCOUNT_CPU_USER_EXIT() are empty when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not defined.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fix a missed conversion of TI_CPU idle_6xx.S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: 'current_set' is now a table of task_struct pointers
Christophe Leroy [Thu, 31 Jan 2019 10:09:02 +0000 (10:09 +0000)]
powerpc: 'current_set' is now a table of task_struct pointers

The table of pointers 'current_set' has been used for retrieving
the stack and current. They used to be thread_info pointers as
they were pointing to the stack and current was taken from the
'task' field of the thread_info.

Now, the pointers of 'current_set' table are now both pointers
to task_struct and pointers to thread_info.

As they are used to get current, and the stack pointer is
retrieved from current's stack field, this patch changes
their type to task_struct, and renames secondary_ti to
secondary_current.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: regain entire stack space
Christophe Leroy [Thu, 31 Jan 2019 10:09:00 +0000 (10:09 +0000)]
powerpc: regain entire stack space

thread_info is not anymore in the stack, so the entire stack
can now be used.

There is also no risk anymore of corrupting task_cpu(p) with a
stack overflow so the patch removes the test.

When doing this, an explicit test for NULL stack pointer is
needed in validate_sp() as it is not anymore implicitely covered
by the sizeof(thread_info) gap.

In the meantime, with the previous patch all pointers to the stacks
are not anymore pointers to thread_info so this patch changes them
to void*

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Activate CONFIG_THREAD_INFO_IN_TASK
Christophe Leroy [Thu, 31 Jan 2019 10:08:58 +0000 (10:08 +0000)]
powerpc: Activate CONFIG_THREAD_INFO_IN_TASK

This patch activates CONFIG_THREAD_INFO_IN_TASK which
moves the thread_info into task_struct.

Moving thread_info into task_struct has the following advantages:
  - It protects thread_info from corruption in the case of stack
    overflows.
  - Its address is harder to determine if stack addresses are leaked,
    making a number of attacks more difficult.

This has the following consequences:
  - thread_info is now located at the beginning of task_struct.
  - The 'cpu' field is now in task_struct, and only exists when
    CONFIG_SMP is active.
  - thread_info doesn't have anymore the 'task' field.

This patch:
  - Removes all recopy of thread_info struct when the stack changes.
  - Changes the CURRENT_THREAD_INFO() macro to point to current.
  - Selects CONFIG_THREAD_INFO_IN_TASK.
  - Modifies raw_smp_processor_id() to get ->cpu from current without
    including linux/sched.h to avoid circular inclusion and without
    including asm/asm-offsets.h to avoid symbol names duplication
    between ASM constants and C constants.
  - Modifies klp_init_thread_info() to take a task_struct pointer
    argument.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add task_stack.h to livepatch.h to fix build fails]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/idle/6xx: Use r1 with CURRENT_THREAD_INFO()
Christophe Leroy [Mon, 4 Feb 2019 11:16:48 +0000 (22:16 +1100)]
powerpc/idle/6xx: Use r1 with CURRENT_THREAD_INFO()

Make sure CURRENT_THREAD_INFO() is used with r1 which is the virtual
address of the stack, in order to ease the switch to r2 when we enable
THREAD_INFO_IN_TASK, as we have no register having the phys address of
current.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Use task_stack_page() in current_pt_regs()
Christophe Leroy [Fri, 18 Jan 2019 07:40:34 +0000 (18:40 +1100)]
powerpc: Use task_stack_page() in current_pt_regs()

Change current_pt_regs() to use task_stack_page() rather than
current_thread_info() so that it keeps working once we enable
THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Split out of large patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Use linux/thread_info.h in processor.h
Christophe Leroy [Thu, 17 Jan 2019 12:27:28 +0000 (23:27 +1100)]
powerpc: Use linux/thread_info.h in processor.h

When we enable THREAD_INFO_IN_TASK we will remove our definition of
current_thread_info(). Instead it will come from linux/thread_info.h

So switch processor.h to include the latter, so that it can continue
to find current_thread_info().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT
Christophe Leroy [Thu, 17 Jan 2019 12:27:40 +0000 (23:27 +1100)]
powerpc: Use sizeof(struct thread_info) in INIT_SP_LIMIT

Currently INIT_SP_LIMIT uses sizeof(init_thread_info), but that symbol
won't exist when we enable THREAD_INFO_IN_TASK. So just use the sizeof
the type which is the same value but will continue to work.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64: Use task_stack_page() to initialise paca->kstack
Christophe Leroy [Thu, 17 Jan 2019 12:26:56 +0000 (23:26 +1100)]
powerpc/64: Use task_stack_page() to initialise paca->kstack

Rather than using the thread info use task_stack_page() to initialise
paca->kstack, that way it will work with THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Update comments in preparation for THREAD_INFO_IN_TASK
Christophe Leroy [Thu, 17 Jan 2019 12:25:53 +0000 (23:25 +1100)]
powerpc: Update comments in preparation for THREAD_INFO_IN_TASK

Update a few comments that talk about current_thread_info() in
preparation for THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Replace current_thread_info()->task with current
Christophe Leroy [Thu, 17 Jan 2019 12:25:12 +0000 (23:25 +1100)]
powerpc: Replace current_thread_info()->task with current

We have a few places that use current_thread_info()->task to access
current. This won't work with THREAD_INFO_IN_TASK so fix them now.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Don't use CURRENT_THREAD_INFO to find the stack
Christophe Leroy [Thu, 17 Jan 2019 12:23:57 +0000 (23:23 +1100)]
powerpc: Don't use CURRENT_THREAD_INFO to find the stack

A few places use CURRENT_THREAD_INFO, or the C version, to find the
stack. This will no longer work with THREAD_INFO_IN_TASK so change
them to find the stack in other ways.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: call_do_[soft]irq() takes a pointer to the stack
Christophe Leroy [Thu, 17 Jan 2019 12:17:56 +0000 (23:17 +1100)]
powerpc: call_do_[soft]irq() takes a pointer to the stack

The purpose of the pointer given to call_do_softirq() and
call_do_irq() is to point the new stack. Currently that's the same
thing as the thread_info, but won't be with THREAD_INFO_IN_TASK.

So change the parameter to void* and rename it 'sp'.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Rename THREAD_INFO to TASK_STACK
Christophe Leroy [Thu, 31 Jan 2019 10:08:54 +0000 (10:08 +0000)]
powerpc: Rename THREAD_INFO to TASK_STACK

This patch renames THREAD_INFO to TASK_STACK, because it is in fact
the offset of the pointer to the stack in task_struct so this pointer
will not be impacted by the move of THREAD_INFO.

Also make it available on 64-bit, as we'll need it there when we
activate THREAD_INFO_IN_TASK.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make available on 64-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: prep stack walkers for THREAD_INFO_IN_TASK
Christophe Leroy [Thu, 31 Jan 2019 10:08:52 +0000 (10:08 +0000)]
powerpc: prep stack walkers for THREAD_INFO_IN_TASK

[text copied from commit 9bbd4c56b0b6
("arm64: prep stack walkers for THREAD_INFO_IN_TASK")]

When CONFIG_THREAD_INFO_IN_TASK is selected, task stacks may be freed
before a task is destroyed. To account for this, the stacks are
refcounted, and when manipulating the stack of another task, it is
necessary to get/put the stack to ensure it isn't freed and/or re-used
while we do so.

This patch reworks the powerpc stack walking code to account for this.
When CONFIG_THREAD_INFO_IN_TASK is not selected these perform no
refcounting, and this should only be a structural change that does not
affect behaviour.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move try_get_task_stack() below tsk == NULL check in show_stack()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Only use task_struct 'cpu' field on SMP
Christophe Leroy [Thu, 31 Jan 2019 10:08:50 +0000 (10:08 +0000)]
powerpc: Only use task_struct 'cpu' field on SMP

When moving to CONFIG_THREAD_INFO_IN_TASK, the thread_info 'cpu' field
gets moved into task_struct and only defined when CONFIG_SMP is set.

This patch ensures that TI_CPU is only used when CONFIG_SMP is set and
that task_struct 'cpu' field is not used directly out of SMP code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Avoid circular header inclusion in mmu-hash.h
Christophe Leroy [Thu, 31 Jan 2019 10:08:48 +0000 (10:08 +0000)]
powerpc: Avoid circular header inclusion in mmu-hash.h

When activating CONFIG_THREAD_INFO_IN_TASK, linux/sched.h includes
asm/current.h. This generates a circular dependency. To avoid that,
asm/processor.h shall not be included in mmu-hash.h.

In order to do that, this patch moves into a new header called
asm/task_size_64/32.h all the TASK_SIZE related constants, which can
then be included in mmu-hash.h directly.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out all the TASK_SIZE constants not just 64-bit ones]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/irq: use memblock functions returning virtual address
Christophe Leroy [Thu, 31 Jan 2019 10:08:44 +0000 (10:08 +0000)]
powerpc/irq: use memblock functions returning virtual address

Since only the virtual address of allocated blocks is used,
lets use functions returning directly virtual address.

Those functions have the advantage of also zeroing the block.

Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64: Simplify __secondary_start paca->kstack handling
Michael Ellerman [Fri, 22 Feb 2019 12:58:21 +0000 (23:58 +1100)]
powerpc/64: Simplify __secondary_start paca->kstack handling

In __secondary_start() we load the thread_info of the idle task of the
secondary CPU from current_set[cpu], and then convert it into a stack
pointer before storing that back to paca->kstack.

As pointed out in commit f761622e5943 ("powerpc: Initialise
paca->kstack before early_setup_secondary") it's important that we
initialise paca->kstack before calling the MMU setup code, in
particular slb_initialize(), because it will bolt the SLB entry for
the kstack into the SLB.

However we have already setup paca->kstack in cpu_idle_thread_init(),
since commit 3b5750644b2f ("[POWERPC] Bolt in SLB entry for kernel
stack on secondary cpus") (May 2008).

It's also in cpu_idle_thread_init() that we initialise current_set[cpu]
with the thread_info pointer, so there is no issue of the timing being
different between the two.

Therefore the initialisation of paca->kstack in __setup_secondary() is
completely redundant, so remove it.

This has the added benefit of removing code that runs in real mode,
and is therefore restricted by the RMO, and so opens the way for us to
enable THREAD_INFO_IN_TASK.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Remove MSR_RI optimisation in system_call_exit()
Michael Ellerman [Thu, 29 Nov 2018 06:42:24 +0000 (17:42 +1100)]
powerpc/64s: Remove MSR_RI optimisation in system_call_exit()

Currently in system_call_exit() we have an optimisation where we
disable MSR_RI (recoverable interrupt) and MSR_EE (external interrupt
enable) in a single mtmsrd instruction.

Unfortunately this will no longer work with THREAD_INFO_IN_TASK,
because then the load of TI_FLAGS might fault and faulting with MSR_RI
clear is treated as an unrecoverable exception which leads to a
panic().

So change the code to only clear MSR_EE prior to loading TI_FLAGS,
leaving the clear of MSR_RI until later. We have some latitude in
where do the clear of MSR_RI. A bit of experimentation has shown that
this location gives the least slow down.

This still causes a noticeable slow down in our null_syscall
performance. On a Power9 DD2.2:

  Before        After         Delta     Delta %
  955 cycles    999 cycles    -44 -4.6%

On the plus side this does simplify the code somewhat, because we
don't have to reenable MSR_RI on the restore_math() or
syscall_exit_work() paths which was necessitated previously by the
optimisation.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Enable kcov
Andrew Donnellan [Fri, 22 Feb 2019 00:40:46 +0000 (11:40 +1100)]
powerpc: Enable kcov

kcov provides kernel coverage data that's useful for fuzzing tools like
syzkaller.

Wire up kcov support on powerpc. Disable kcov instrumentation on the same
files where we currently disable gcov and UBSan instrumentation, plus some
additional exclusions which appear necessary to boot on book3e machines.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Daniel Axtens <dja@axtens.net> # e6500
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/kconfig: make _etext and data areas alignment configurable on 8xx
Christophe Leroy [Thu, 21 Feb 2019 19:08:52 +0000 (19:08 +0000)]
powerpc/kconfig: make _etext and data areas alignment configurable on 8xx

On 8xx, large pages (512kb or 8M) are used to map kernel linear
memory. Aligning to 8M reduces TLB misses as only 8M pages are used
in that case. We make 8M the default for data.

This patchs allows the user to do it via Kconfig.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX
Christophe Leroy [Thu, 21 Feb 2019 19:08:51 +0000 (19:08 +0000)]
powerpc/8xx: don't disable large TLBs with CONFIG_STRICT_KERNEL_RWX

This patch implements handling of STRICT_KERNEL_RWX with
large TLBs directly in the TLB miss handlers.

To do so, etext and sinittext are aligned on 512kB boundaries
and the miss handlers use 512kB pages instead of 8Mb pages for
addresses close to the boundaries.

It sets RO PP flags for addresses under sinittext.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32
Christophe Leroy [Thu, 21 Feb 2019 19:08:50 +0000 (19:08 +0000)]
powerpc/kconfig: make _etext and data areas alignment configurable on Book3s 32

Depending on the number of available BATs for mapping the different
kernel areas, it might be needed to increase the alignment of _etext
and/or of data areas.

This patchs allows the user to do it via Kconfig.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX
Christophe Leroy [Thu, 21 Feb 2019 19:08:49 +0000 (19:08 +0000)]
powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX

Today, STRICT_KERNEL_RWX is based on the use of regular pages
to map kernel pages.

On Book3s 32, it has three consequences:
- Using pages instead of BAT for mapping kernel linear memory severely
impacts performance.
- Exec protection is not effective because no-execute cannot be set at
page level (except on 603 which doesn't have hash tables)
- Write protection is not effective because PP bits do not provide RO
mode for kernel-only pages (except on 603 which handles it in software
via PAGE_DIRTY)

On the 603+, we have:
- Independent IBAT and DBAT allowing limitation of exec parts.
- NX bit can be set in segment registers to forbit execution on memory
mapped by pages.
- RO mode on DBATs even for kernel-only blocks.

On the 601, there is nothing much we can do other than warn the user
about it, because:
- BATs are common to instructions and data.
- BAT do not provide RO mode for kernel-only blocks.
- segment registers don't have the NX bit.

In order to use IBAT for exec protection, this patch:
- Aligns _etext to BAT block sizes (128kb)
- Set NX bit in kernel segment register (Except on vmalloc area when
CONFIG_MODULES is selected)
- Maps kernel text with IBATs.

In order to use DBAT for exec protection, this patch:
- Aligns RW DATA to BAT block sizes (4M)
- Maps kernel RO area with write prohibited DBATs
- Maps remaining memory with remaining DBATs

Here is what we get with this patch on a 832x when activating
STRICT_KERNEL_RWX:

Symbols:
c0000000 T _stext
c0680000 R __start_rodata
c0680000 R _etext
c0800000 T __init_begin
c0800000 T _sinittext

~# cat /sys/kernel/debug/block_address_translation
---[ Instruction Block Address Translation ]---
0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent
1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent
2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent
3:         -
4:         -
5:         -
6:         -
7:         -

---[ Data Block Address Translation ]---
0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent
1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent
2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent
3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent
4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent
5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent
6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
7:         -

~# cat /sys/kernel/debug/segment_registers
---[ User Segments ]---
0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0
0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1
0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2
0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903
0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14
0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25
0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36
0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47
0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58
0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69
0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a
0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b

---[ Kernel Segments ]---
0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc
0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd
0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee
0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff

Aligning _etext to 128kb allows to map up to 32Mb text with 8 IBATs:
16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb
(A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block)

Aligning data to 4M allows to map up to 512Mb data with 8 DBATs:
16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb

Because some processors only have 4 BATs and because some targets need
DBATs for mapping other areas, the following patch will allow to
modify _etext and data alignment.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: add setibat() clearibat() and update_bats()
Christophe Leroy [Thu, 21 Feb 2019 19:08:48 +0000 (19:08 +0000)]
powerpc/mm/32s: add setibat() clearibat() and update_bats()

setibat() and clearibat() allows to manipulate IBATs independently
of DBATs.

update_bats() allows to update bats after init. This is done
with MMU off.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT
Christophe Leroy [Thu, 21 Feb 2019 19:08:47 +0000 (19:08 +0000)]
powerpc/kconfig: define CONFIG_DATA_SHIFT and CONFIG_ETEXT_SHIFT

CONFIG_STRICT_KERNEL_RWX requires a special alignment
for DATA for some subarches. Today it is just defined
as an #ifdef in vmlinux.lds.S

In order to get more flexibility, this patch moves the
definition of this alignment in Kconfig

On some subarches, CONFIG_STRICT_KERNEL_RWX will
require a special alignment of _etext.

This patch also adds a configuration item for it in Kconfig

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/kconfig: define PAGE_SHIFT inside Kconfig
Christophe Leroy [Thu, 21 Feb 2019 19:08:46 +0000 (19:08 +0000)]
powerpc/kconfig: define PAGE_SHIFT inside Kconfig

This patch defined CONFIG_PPC_PAGE_SHIFT in order
to be able to use PAGE_SHIFT value inside Kconfig.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mmu: add is_strict_kernel_rwx() helper
Christophe Leroy [Thu, 21 Feb 2019 19:08:45 +0000 (19:08 +0000)]
powerpc/mmu: add is_strict_kernel_rwx() helper

Add a helper to know whether STRICT_KERNEL_RWX is enabled.

This is based on rodata_enabled flag which is defined only
when CONFIG_STRICT_KERNEL_RWX is selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: add helper to write into segment registers
Christophe Leroy [Thu, 21 Feb 2019 19:08:44 +0000 (19:08 +0000)]
powerpc/32: add helper to write into segment registers

This patch add an helper which wraps 'mtsrin' instruction
to write into segment registers.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: use _PAGE_EXEC in setbat()
Christophe Leroy [Thu, 21 Feb 2019 19:08:43 +0000 (19:08 +0000)]
powerpc/mm/32s: use _PAGE_EXEC in setbat()

Do not set IBAT when setbat() is called without _PAGE_EXEC

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/wii: remove wii_mmu_mapin_mem2()
Christophe Leroy [Thu, 21 Feb 2019 19:08:42 +0000 (19:08 +0000)]
powerpc/wii: remove wii_mmu_mapin_mem2()

wii_mmu_mapin_mem2() is not used anymore, remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: always populate page tables for Abatron BDI.
Christophe Leroy [Thu, 21 Feb 2019 19:08:41 +0000 (19:08 +0000)]
powerpc/32: always populate page tables for Abatron BDI.

When CONFIG_BDI_SWITCH is set, the page tables have to be populated
allthough large TLBs are used, because the BDI switch knows nothing
about those large TLBs which are handled directly in TLB miss logic.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.
Christophe Leroy [Thu, 21 Feb 2019 19:08:40 +0000 (19:08 +0000)]
powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.

Now that mmu_mapin_ram() is able to handle other blocks
than the one starting at 0, the WII can use it for all
its blocks.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32s: rework mmu_mapin_ram()
Christophe Leroy [Thu, 21 Feb 2019 19:08:39 +0000 (19:08 +0000)]
powerpc/mm/32s: rework mmu_mapin_ram()

This patch reworks mmu_mapin_ram() to be more generic and map as much
blocks as possible. It now supports blocks not starting at address 0.

It scans DBATs array to find free ones instead of forcing the use of
BAT2 and BAT3.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/mm/32: add base address to mmu_mapin_ram()
Christophe Leroy [Thu, 21 Feb 2019 19:08:38 +0000 (19:08 +0000)]
powerpc/mm/32: add base address to mmu_mapin_ram()

At the time being, mmu_mapin_ram() always maps RAM from the beginning.
But some platforms like the WII have to map a second block of RAM.

This patch adds to mmu_mapin_ram() the base address of the block.
At the moment, only base address 0 is supported.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/wii: properly disable use of BATs when requested.
Christophe Leroy [Thu, 21 Feb 2019 19:08:37 +0000 (19:08 +0000)]
powerpc/wii: properly disable use of BATs when requested.

'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC
deny the use of BATS for mapping memory.

This patch makes sure that the specific wii RAM mapping function
takes it into account as well.

Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram")
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: Map 32Mb of RAM at init.
Christophe Leroy [Wed, 13 Feb 2019 16:06:21 +0000 (16:06 +0000)]
powerpc/8xx: Map 32Mb of RAM at init.

At the time being, initial MMU setup allows 24 Mbytes
of DATA and 8 Mbytes of code.

Some debug setup like CONFIG_KASAN generate huge
kernels with text size over the 8M limit and data over the
24 Mbytes limit.

Here is an 8xx kernel compiled with CONFIG_KASAN_INLINE for
one of my boards:

[root@po16846vm linux-powerpc]# size -x vmlinux
   text    data     bss     dec     hex filename
0x111019c 0x41b0d4 0x490de0 26984528 19bc050 vmlinux

This patch maps up to 32 Mbytes code based on _einittext symbol
and allows 32 Mbytes of memory instead of 24.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: replace most #ifdef by IS_ENABLED() in 8xx_mmu.c
Christophe Leroy [Wed, 13 Feb 2019 16:06:19 +0000 (16:06 +0000)]
powerpc/8xx: replace most #ifdef by IS_ENABLED() in 8xx_mmu.c

This patch replaces most #ifdef mess by IS_ENABLED() in 8xx_mmu.c
This has the advantage of allowing syntax verification at compile
time regardless of selected options.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add tests for addc[.] instruction
Sandipan Das [Wed, 20 Feb 2019 06:57:00 +0000 (12:27 +0530)]
powerpc: sstep: Add tests for addc[.] instruction

This adds test cases for the addc[.] instruction.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add tests for add[.] instruction
Sandipan Das [Wed, 20 Feb 2019 06:56:59 +0000 (12:26 +0530)]
powerpc: sstep: Add tests for add[.] instruction

This adds test cases for the add[.] instruction.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: sstep: Add tests for compute type instructions
Sandipan Das [Wed, 20 Feb 2019 06:56:58 +0000 (12:26 +0530)]
powerpc: sstep: Add tests for compute type instructions

This enhances the current selftest framework for validating
the in-kernel instruction emulation infrastructure by adding
support for compute type instructions i.e. integer ALU-based
instructions. Originally, this framework was limited to only
testing load and store instructions.

While most of the GPRs can be validated, support for SPRs is
limited to LR, CR and XER for now.

When writing the test cases, one must ensure that the Stack
Pointer (GPR1) or the Thread Pointer (GPR13) are not touched
by any means as these are vital non-volatile registers.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
[mpe: Use patch_site for the code patching]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agoRevert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling"
Michael Ellerman [Sat, 23 Feb 2019 09:30:50 +0000 (20:30 +1100)]
Revert "powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling"

This reverts commit 78ca1108b10927b3d068c8da91352b0f4cd01fc5.

It is causing boot failures with qemu mac99 in at least some
configurations.

5 years agopowerpc: Move page table dump files in a dedicated subdirectory
Christophe Leroy [Mon, 18 Feb 2019 12:28:36 +0000 (12:28 +0000)]
powerpc: Move page table dump files in a dedicated subdirectory

This patch moves the files related to page table dump in a
dedicated subdirectory.

The purpose is to clean a bit arch/powerpc/mm by regrouping
multiple files handling a dedicated function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Shorten the file names while we're at it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/kvm: Save and restore host AMR/IAMR/UAMOR
Michael Ellerman [Fri, 22 Feb 2019 02:22:08 +0000 (13:22 +1100)]
powerpc/kvm: Save and restore host AMR/IAMR/UAMOR

When the hash MMU is active the AMR, IAMR and UAMOR are used for
pkeys. The AMR is directly writable by user space, and the UAMOR masks
those writes, meaning both registers are effectively user register
state. The IAMR is used to create an execute only key.

Also we must maintain the value of at least the AMR when running in
process context, so that any memory accesses done by the kernel on
behalf of the process are correctly controlled by the AMR.

Although we are correctly switching all registers when going into a
guest, on returning to the host we just write 0 into all regs, except
on Power9 where we restore the IAMR correctly.

This could be observed by a user process if it writes the AMR, then
runs a guest and we then return immediately to it without
rescheduling. Because we have written 0 to the AMR that would have the
effect of granting read/write permission to pages that the process was
trying to protect.

In addition, when using the Radix MMU, the AMR can prevent inadvertent
kernel access to userspace data, writing 0 to the AMR disables that
protection.

So save and restore AMR, IAMR and UAMOR.

Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
5 years agopowerpc: dump as a single line areas mapping a single physical page.
Christophe Leroy [Mon, 18 Feb 2019 12:25:20 +0000 (12:25 +0000)]
powerpc: dump as a single line areas mapping a single physical page.

When using KASAN, there are parts of the shadow area where all
pages are mapped to the kasan_early_shadow_page. It is pointless
to dump one line for each of those pages (in the example below there
are 7168 entries pointing to the same physical page).

~# cat /sys/kernel/debug/kernel_page_tables
 ...
---[ kasan shadow mem start ]---
0xf7c00000-0xf8bfffff 0x06fac000       16M        rw       present           dirty  accessed
0xf8c00000-0xf8c03fff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c04000-0xf8c07fff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c08000-0xf8c0bfff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c0c000-0xf8c0ffff 0x00cd0000       16K        r        present           dirty  accessed
0xf8c10000-0xf8c13fff 0x00cd0000       16K        r        present           dirty  accessed
 ... 7168 identical lines
0xffbfc000-0xffbfffff 0x00cd0000       16K        r        present           dirty  accessed
---[ kasan shadow mem end ]---
 ...

This patch modifies linux table dump to dump as a single line areas
where all addresses points to the same physical page. That physical
address is put inside [] to show that all virt pages points to the
same phys page.

~# cat /sys/kernel/debug/kernel_page_tables
 ...
---[ kasan shadow mem start ]---
0xf7c00000-0xf8bfffff  0x06fac000        16M        rw       present           dirty  accessed
0xf8c00000-0xffbfffff [0x00cd0000]       16K        r        present           dirty  accessed
---[ kasan shadow mem end ]---
 ...

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agotools/selftest/vm: allow choosing mem size and page size in map_hugetlb
Christophe Leroy [Fri, 8 Feb 2019 15:02:55 +0000 (15:02 +0000)]
tools/selftest/vm: allow choosing mem size and page size in map_hugetlb

map_hugetlb maps 256Mbytes of memory with default hugepage size.

This patch allows the user to pass the size and page shift as an
argument in order to use different size and page size.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke
Christophe Leroy [Thu, 31 Jan 2019 10:10:31 +0000 (10:10 +0000)]
powerpc/32: Fix CONFIG_VIRT_CPU_ACCOUNTING_NATIVE for 40x/booke

40x/booke have another path to reach 3f from transfer_to_handler,
make sure it also calls ACCOUNT_CPU_USER_ENTRY() when
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling
Christophe Leroy [Fri, 25 Jan 2019 12:34:20 +0000 (12:34 +0000)]
powerpc/book3s32: Reorder _PAGE_XXX flags to simplify TLB handling

For pages without _PAGE_USER, PP field is 00
For pages with _PAGE_USER, PP field is 10 for RW and 11 for RO.

This patch sets _PAGE_USER to 0x002 and _PAGE_RW to 0x001
is order to simplify TLB handling by reducing amount of shifts.

The location of _PAGE_PRESENT and _PAGE_HASHPTE doesn't matter
as they are only SW related flags.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.
Christophe Leroy [Thu, 21 Feb 2019 10:38:02 +0000 (10:38 +0000)]
powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.

PAGE_ACCESSED is only needed for CONFIG_SWAP. When CONFIG_SWAP
is not set, just ignore it. If CONFIG_SWAP is set and PAGE_ACCESSED
is not, let's take a minor fault.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: Don't worry about _PAGE_USER in TLB miss handlers
Christophe Leroy [Thu, 21 Feb 2019 10:38:01 +0000 (10:38 +0000)]
powerpc/603: Don't worry about _PAGE_USER in TLB miss handlers

PP bits take user access into account, so no need to check _PAGE_USER
here. A DSI or ISI will be generated if needed.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: let's handle PAGE_DIRTY directly
Christophe Leroy [Thu, 21 Feb 2019 10:38:00 +0000 (10:38 +0000)]
powerpc/603: let's handle PAGE_DIRTY directly

PAGE_DIRTY corresponds to the C bit. If writing on
a page for which the C bit is not set, a DataStoreTLBMiss
is generated. No need to check it in DataLoadTLBMiss.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: Don't handle _PAGE_RW and _PAGE_DIRTY on ITLB misses
Christophe Leroy [Thu, 21 Feb 2019 10:37:59 +0000 (10:37 +0000)]
powerpc/603: Don't handle _PAGE_RW and _PAGE_DIRTY on ITLB misses

_PAGE_RW and _PAGE_DIRTY do not matter for ITLB misses.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: Don't handle kernel page TLB misses when not need
Christophe Leroy [Thu, 21 Feb 2019 10:37:58 +0000 (10:37 +0000)]
powerpc/603: Don't handle kernel page TLB misses when not need

ITLB miss on kernel pages only occur with CONFIG_MODULES and
CONFIG_DEBUG_PAGEALLOC.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/hash32: use physical address directly in hash handlers.
Christophe Leroy [Thu, 21 Feb 2019 10:37:57 +0000 (10:37 +0000)]
powerpc/hash32: use physical address directly in hash handlers.

Since commit c62ce9ef97ba ("powerpc: remove remaining bits from
CONFIG_APUS"), tophys() has become a pure constant operation.
PAGE_OFFSET is known at compile time so the physical address
can be builtin directly.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/603: use physical address directly in TLB miss handlers.
Christophe Leroy [Thu, 21 Feb 2019 10:37:56 +0000 (10:37 +0000)]
powerpc/603: use physical address directly in TLB miss handlers.

Since commit c62ce9ef97ba ("powerpc: remove remaining bits from
CONFIG_APUS"), tophys() has become a pure constant operation.
PAGE_OFFSET is known at compile time so the physical address
can be builtin directly.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/6xx: Store PGDIR physical address in a SPRG
Christophe Leroy [Thu, 21 Feb 2019 10:37:55 +0000 (10:37 +0000)]
powerpc/6xx: Store PGDIR physical address in a SPRG

Use SPRN_SPRG2 to store the current thread PGDIR and
avoid reading thread_struct.pgdir at every TLB miss.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS
Christophe Leroy [Thu, 21 Feb 2019 10:37:54 +0000 (10:37 +0000)]
powerpc/6xx: Don't use SPRN_SPRG2 for storing stack pointer while in RTAS

When calling RTAS, the stack pointer is stored in SPRN_SPRG2
in order to be able to restore it in case of machine check in RTAS.

As machine check is not a perfomance critical path, this patch
frees SPRN_SPRG2 by using a field in thread struct instead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: simplify BDI switch
Christophe Leroy [Thu, 21 Feb 2019 10:37:53 +0000 (10:37 +0000)]
powerpc: simplify BDI switch

There is no reason to re-read each time the pointer at
location 0xf0 as it is fixed and known.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/83xx: Also save/restore SPRG4-7 during suspend
Christophe Leroy [Fri, 25 Jan 2019 12:03:55 +0000 (12:03 +0000)]
powerpc/83xx: Also save/restore SPRG4-7 during suspend

The 83xx has 8 SPRG registers and uses at least SPRG4
for DTLB handling LRU.

Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/traps: fix recoverability of machine check handling on book3s/32
Christophe Leroy [Tue, 22 Jan 2019 14:11:24 +0000 (14:11 +0000)]
powerpc/traps: fix recoverability of machine check handling on book3s/32

Looks like book3s/32 doesn't set RI on machine check, so
checking RI before calling die() will always be fatal
allthought this is not an issue in most cases.

Fixes: b96672dd840f ("powerpc: Machine check interrupt is a non-maskable interrupt")
Fixes: daf00ae71dad ("powerpc/traps: restore recoverability of machine_check interrupts")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/32: Remove unneccessary MSR[RI] clearing for 8xx
Christophe Leroy [Tue, 22 Jan 2019 13:52:04 +0000 (13:52 +0000)]
powerpc/32: Remove unneccessary MSR[RI] clearing for 8xx

MSR[RI] has already been cleared a few lines above.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/setup: display reason for not booting
Christophe Leroy [Tue, 18 Dec 2018 06:53:41 +0000 (06:53 +0000)]
powerpc/setup: display reason for not booting

When no machine description matches, display it clearly
before looping forever.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/8xx: hide itlbie and dtlbie symbols
Christophe Leroy [Thu, 13 Dec 2018 08:08:11 +0000 (08:08 +0000)]
powerpc/8xx: hide itlbie and dtlbie symbols

When disassembling InstructionTLBError we get the following messy code:

c000138c:       7d 84 63 78     mr      r4,r12
c0001390:       75 25 58 00     andis.  r5,r9,22528
c0001394:       75 2a 40 00     andis.  r10,r9,16384
c0001398:       41 a2 00 08     beq     c00013a0 <itlbie>
c000139c:       7c 00 22 64     tlbie   r4,r0

c00013a0 <itlbie>:
c00013a0:       39 40 04 01     li      r10,1025
c00013a4:       91 4b 00 b0     stw     r10,176(r11)
c00013a8:       39 40 10 32     li      r10,4146
c00013ac:       48 00 cc 59     bl      c000e004 <transfer_to_handler>

For a cleaner code dump, this patch replaces itlbie and dtlbie
symbols by local symbols.

c000138c:       7d 84 63 78     mr      r4,r12
c0001390:       75 25 58 00     andis.  r5,r9,22528
c0001394:       75 2a 40 00     andis.  r10,r9,16384
c0001398:       41 a2 00 08     beq     c00013a0 <InstructionTLBError+0xa0>
c000139c:       7c 00 22 64     tlbie   r4,r0
c00013a0:       39 40 04 01     li      r10,1025
c00013a4:       91 4b 00 b0     stw     r10,176(r11)
c00013a8:       39 40 10 32     li      r10,4146
c00013ac:       48 00 cc 59     bl      c000e004 <transfer_to_handler>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/selftest: fix type of mftb() in null_syscall
Christophe Leroy [Tue, 22 Jan 2019 13:54:57 +0000 (13:54 +0000)]
powerpc/selftest: fix type of mftb() in null_syscall

All callers of mftb() expect 'unsigned long', and the function itself
only returns lower part of the TB so it really is 'unsigned long'
not 'unsigned long long'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit
Paul Mackerras [Tue, 12 Feb 2019 00:58:29 +0000 (11:58 +1100)]
powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exit

Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api
only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg()
inside pnv_cpu_offline(), with the aim of changing the LPCR value in
the SLW image to disable wakeups from the decrementer while a CPU is
offline.  However, pnv_cpu_offline() gets called each time a secondary
CPU thread is woken up to participate in running a KVM guest, that is,
not just when a CPU is offlined.

Since opal_slw_set_reg() is a very slow operation (with observed
execution times around 20 milliseconds), this means that an offline
secondary CPU can often be busy doing the opal_slw_set_reg() call
when the primary CPU wants to grab all the secondary threads so that
it can run a KVM guest.  This leads to messages like "KVM: couldn't
grab CPU n" being printed and guest execution failing.

There is no need to reprogram the SLW image on every KVM guest entry
and exit.  So that we do it only when a CPU is really transitioning
between online and offline, this moves the calls to
pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().

Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/64s: Fix logic when handling unknown CPU features
Michael Ellerman [Mon, 11 Feb 2019 00:20:01 +0000 (11:20 +1100)]
powerpc/64s: Fix logic when handling unknown CPU features

In cpufeatures_process_feature(), if a provided CPU feature is unknown and
enable_unknown is false, we erroneously print that the feature is being
enabled and return true, even though no feature has been enabled, and
may also set feature bits based on the last entry in the match table.

Fix this so that we only set feature bits from the match table if we have
actually enabled a feature from that table, and when failing to enable an
unknown feature, always print the "not enabling" message and return false.

Coincidentally, some older gccs (<GCC 7), when invoked with
-fsanitize-coverage=trace-pc, cause a spurious uninitialised variable
warning in this function:

  arch/powerpc/kernel/dt_cpu_ftrs.c: In function ‘cpufeatures_process_feature’:
  arch/powerpc/kernel/dt_cpu_ftrs.c:686:7: warning: ‘m’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (m->cpu_ftr_bit_mask)

An upcoming patch will enable support for kcov, which requires this option.
This patch avoids the warning.

Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Reported-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ajd: add commit message]
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
5 years agopowerpc/smp: Make __smp_send_nmi_ipi() static
Nicholas Piggin [Mon, 26 Nov 2018 02:01:07 +0000 (12:01 +1000)]
powerpc/smp: Make __smp_send_nmi_ipi() static

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/smp: Fix NMI IPI xmon timeout
Nicholas Piggin [Mon, 26 Nov 2018 02:01:06 +0000 (12:01 +1000)]
powerpc/smp: Fix NMI IPI xmon timeout

The xmon debugger IPI handler waits in the callback function while
xmon is still active. This means they don't complete the IPI, and the
initiator always times out waiting for them.

Things manage to work after the timeout because there is some fallback
logic to keep NMI IPI state sane in case of the timeout, but this is a
bit ugly.

This patch changes NMI IPI back to half-asynchronous (i.e., wait for
everyone to call in, do not wait for IPI function to complete), but
the complexity is avoided by going one step further and allowing new
IPIs to be issued before the IPI functions to all complete.

If synchronization against that is required, it is left up to the
caller, but current callers don't require that. In fact with the
timeout handling, callers must be able to cope with this already.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc/smp: Fix NMI IPI timeout
Nicholas Piggin [Mon, 26 Nov 2018 02:01:05 +0000 (12:01 +1000)]
powerpc/smp: Fix NMI IPI timeout

The NMI IPI timeout logic is broken, if __smp_send_nmi_ipi() times out
on the first condition, delay_us will be zero which will send it into
the second spin loop with no timeout so it will spin forever.

Fixes: 5b73151fff63 ("powerpc: NMI IPI make NMI IPIs fully sychronous")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
5 years agopowerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64
Michael Ellerman [Fri, 8 Feb 2019 12:34:16 +0000 (23:34 +1100)]
powerpc: Make PPC_64K_PAGES depend on only 44x or PPC_BOOK3S_64

In commit 7820856a4fcd ("powerpc/mm/book3e/64: Remove unsupported
64Kpage size from 64bit booke") we dropped the 64K page size support
from the 64-bit nohash (Book3E) code.

But we didn't update the dependencies of the PPC_64K_PAGES option,
meaning a randconfig can still trigger this code and cause a build
breakage, eg:
  arch/powerpc/include/asm/nohash/64/pgtable.h:14:2: error: #error "Page size not supported"
  arch/powerpc/include/asm/nohash/mmu-book3e.h:275:2: error: #error Unsupported page size

So remove PPC_BOOK3E_64 from the dependencies. This also means we
don't need to worry about PPC_FSL_BOOK3E, because that was just trying
to prevent the PPC_BOOK3E_64=y && PPC_FSL_BOOK3E=y case.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>