linux-2.6-block.git
2 years ago selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch"
Colin Ian King [Sat, 19 Mar 2022 23:20:25 +0000 (23:20 +0000)]
selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch"

There are a few spelling mistakes in error messages. Fix them.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220319232025.22067-1-colin.i.king@gmail.com
2 years ago powerpc: Enable the DAWR on POWER9 DD2.3 and above
Reza Arbab [Tue, 3 May 2022 17:01:52 +0000 (12:01 -0500)]
powerpc: Enable the DAWR on POWER9 DD2.3 and above

The hardware bug in POWER9 preventing use of the DAWR was fixed in
DD2.3. Set the CPU_FTR_DAWR feature bit on these newer systems to start
using it again, and update the documentation accordingly.

The CPU features for DD2.3 are currently determined by "DD2.2 or later"
logic. In adding DD2.3 as a discrete case for the first time here, I'm
carrying the quirks of DD2.2 forward to keep all behavior outside of
this DAWR change the same. This leaves the assessment and potential
removal of those quirks on DD2.3 for later.

Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220503170152.23412-1-arbab@linux.ibm.com
2 years ago powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask
Michael Ellerman [Thu, 19 May 2022 05:03:57 +0000 (15:03 +1000)]
powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask

CPU_FTRS_POWER10 is missing from the CPU_FTRS_ALWAYS mask.

Currently that doesn't cause any bug, because it is a superset of the
POWER9 mask, with the exception of CPU_FTR_TM, but POWER7 doesn't have
CPU_FTR_TM, so CPU_FTR_TM is not in the ALWAYS mask to begin with.

However for consistency, and to be robust against future changes, it
should be included in the ALWAYS mask.

Fixes: a3ea40d5c736 ("powerpc: Add POWER10 architected mode")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220519122205.746276-2-mpe@ellerman.id.au
2 years ago powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask
Michael Ellerman [Thu, 19 May 2022 04:30:56 +0000 (14:30 +1000)]
powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask

CPU_FTRS_POWER9_DD2_2 is missing from CPU_FTRS_ALWAYS.

That doesn't cause any bug, because CPU_FTRS_POWER9_DD2_2 adds new bits
that don't appear in other values, so when anded with the other masks
the result is the same.

But for consistency we should have all values in the CPU_FTRS_ALWAYS
mask, so that the logic is robust against the values being changed in
future.

Fixes: b5af4f279323 ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220519122205.746276-1-mpe@ellerman.id.au
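
The reasoning in the two CPU_FTRS_ALWAYS commits above can be illustrated with a small userspace sketch. It assumes the "always" mask is the bitwise AND of every per-CPU feature mask. All bit names and values below are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not the kernel's values). */
#define FTR_A        (UINT64_C(1) << 0)
#define FTR_B        (UINT64_C(1) << 1)
#define FTR_TM       (UINT64_C(1) << 2)
#define FTR_P9_QUIRK (UINT64_C(1) << 3)
#define FTR_P10_NEW  (UINT64_C(1) << 4)

/* Hypothetical per-CPU feature sets. */
#define FTRS_P7      (FTR_A)
#define FTRS_P9      (FTR_A | FTR_B | FTR_TM)
#define FTRS_P9_DD22 (FTRS_P9 | FTR_P9_QUIRK)          /* only adds bits */
#define FTRS_P10     (FTR_A | FTR_B | FTR_P10_NEW)     /* P9 minus TM, plus new bits */

/* "Always" mask: the features present on every CPU in the list. */
uint64_t ftrs_always(const uint64_t *masks, int n)
{
	uint64_t always = ~UINT64_C(0);

	for (int i = 0; i < n; i++)
		always &= masks[i];
	return always;
}
```

With these made-up values, leaving FTRS_P9_DD22 and FTRS_P10 out of the list gives the same result as including them, which is why the omission was harmless. Including every mask keeps the result correct if one of the definitions later loses a bit.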
2 years ago powerpc: Fix all occurences of "the the"
Michael Ellerman [Wed, 18 May 2022 14:26:29 +0000 (00:26 +1000)]
powerpc: Fix all occurences of "the the"

Rather than waiting for the bots to fix these one-by-one, fix all
occurrences of "the the" throughout arch/powerpc.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220518142629.513007-1-mpe@ellerman.id.au
2 years ago selftests/powerpc/pmu/ebb: remove fixed_instruction.S
Madhavan Srinivasan [Tue, 22 Mar 2022 04:56:38 +0000 (10:26 +0530)]
selftests/powerpc/pmu/ebb: remove fixed_instruction.S

Commit 3752e453f6ba ("selftests/powerpc: Add tests of PMU EBBs") added
selftests to verify the EBB interface. The instruction_count_test.c
testcase needs a fixed loop function to count overhead. Instead of using
the thirty_two_instruction_loop() in fixed_instruction_loop.S in the ebb
folder, the test is linked with thirty_two_instruction_loop() in loop.S
from the top folder. Since fixed_instruction_loop.S is not used, remove
the file.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220322045638.10443-1-maddy@linux.ibm.com
2 years ago powerpc/platforms/83xx: Use of_device_get_match_data()
Minghao Chi (CGEL ZTE) [Fri, 25 Feb 2022 01:07:37 +0000 (01:07 +0000)]
powerpc/platforms/83xx: Use of_device_get_match_data()

Use of_device_get_match_data() to simplify the code.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi (CGEL ZTE) <chi.minghao@zte.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220225010737.2038781-1-chi.minghao@zte.com.cn
2 years ago powerpc/eeh: Drop redundant spinlock initialization
Haowen Bai [Wed, 11 May 2022 01:27:56 +0000 (09:27 +0800)]
powerpc/eeh: Drop redundant spinlock initialization

slot_errbuf_lock is already declared and initialized by DEFINE_SPINLOCK,
so there is no need to call spin_lock_init() on it again. Drop the call.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1652232476-9696-1-git-send-email-baihaowen@meizu.com
2 years ago powerpc/iommu: Add missing of_node_put in iommu_init_early_dart
Peng Wu [Mon, 25 Apr 2022 08:12:45 +0000 (08:12 +0000)]
powerpc/iommu: Add missing of_node_put in iommu_init_early_dart

The device_node pointer is returned by of_find_compatible_node
with refcount incremented. We should use of_node_put() to avoid
the refcount leak.

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220425081245.21705-1-wupeng58@huawei.com
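
The get/put pairing that this commit (and the other of_node_put() fixes in this log) restores can be sketched in userspace. The struct and helper names below are hypothetical stand-ins, not the kernel's device-tree API. Only the refcount behaviour is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_node; only the refcount matters. */
struct node {
	int refcount;
};

/* Mimics a lookup that returns the node with its refcount incremented. */
struct node *node_find(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Mimics of_node_put(): drop one reference; NULL is a no-op. */
void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Leaky version: forgets to drop the reference taken by the lookup. */
int init_leaky(struct node *n)
{
	struct node *dn = node_find(n);

	if (!dn)
		return -1;
	/* ... use dn ... */
	return 0;
}

/* Fixed version: every successful lookup is paired with a put. */
int init_fixed(struct node *n)
{
	struct node *dn = node_find(n);

	if (!dn)
		return -1;
	/* ... use dn ... */
	node_put(dn);
	return 0;
}
```

In the leaky variant the count stays one higher after every call, so the node can never be released. The fixed variant leaves the count where it started.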
2 years ago powerpc/pseries/vas: Call misc_deregister if sysfs init fails
Zheng Bin [Wed, 11 May 2022 03:35:07 +0000 (11:35 +0800)]
powerpc/pseries/vas: Call misc_deregister if sysfs init fails

Undo effects of misc_register if sysfs init fails after
misc_register.

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220511033507.2745992-1-zhengbin13@huawei.com
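
A minimal sketch of the unwind pattern this fix applies, assuming a two-step init where the second step can fail. The fake_* functions are hypothetical placeholders for misc_register(), misc_deregister() and the sysfs init step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical registration state, standing in for the misc device. */
static bool registered;

int fake_register(void)
{
	registered = true;
	return 0;
}

void fake_deregister(void)
{
	registered = false;
}

/* Hypothetical second init step that can be forced to fail. */
int fake_sysfs_init(bool fail)
{
	return fail ? -1 : 0;
}

bool fake_is_registered(void)
{
	return registered;
}

int fake_init(bool sysfs_fails)
{
	int rc;

	rc = fake_register();
	if (rc)
		return rc;

	rc = fake_sysfs_init(sysfs_fails);
	if (rc)
		goto err_deregister;

	return 0;

err_deregister:
	/* Undo the first step so a failed init leaves nothing behind. */
	fake_deregister();
	return rc;
}
```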
2 years ago powerpc/papr_scm: Fix leaking nvdimm_events_map elements
Vaibhav Jain [Wed, 11 May 2022 08:26:36 +0000 (13:56 +0530)]
powerpc/papr_scm: Fix leaking nvdimm_events_map elements

Right now the 'char *' elements allocated for individual 'stat_id' entries in
'papr_scm_priv.nvdimm_events_map[]' during papr_scm_pmu_check_events() get
leaked in papr_scm_remove() and in the papr_scm_pmu_register() and
papr_scm_pmu_check_events() error paths.

Also, individual 'stat_id' entries aren't NULL terminated 'char *'; they are
fixed 8-byte sized identifiers. However, papr_scm_pmu_register() assumes a
'stat_id' to be a NULL terminated 'char *', while other places assume it to be
a 'papr_scm_perf_stat.stat_id' sized string, which is 8 bytes in size.

Fix this by allocating the memory for papr_scm_priv.nvdimm_events_map to also
include space for the 'stat_id' entries. This is possible since the number of
available events/stat_ids is known upfront. This saves some memory and one
extra level of indirection from 'nvdimm_events_map' to 'stat_id'. Also, the
rest of the code can continue to call 'kfree(papr_scm_priv.nvdimm_events_map)'
without needing to iterate over the array and free up individual elements.

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220511082637.646714-1-vaibhav@linux.ibm.com
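
The allocation scheme described in this commit can be sketched in userspace as one block that holds every fixed 8-byte identifier, freed with a single call. The struct layout, helper names and example identifiers are assumptions for illustration and do not reproduce the driver's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STAT_ID_LEN 8	/* fixed-size identifier, not NUL-terminated */

/* Hypothetical map: one allocation holding 'count' 8-byte identifiers. */
struct events_map {
	size_t count;
	char ids[][STAT_ID_LEN];
};

struct events_map *events_map_alloc(size_t count)
{
	struct events_map *map;

	map = calloc(1, sizeof(*map) + count * sizeof(map->ids[0]));
	if (map)
		map->count = count;
	return map;
}

/* Store at most 8 bytes; shorter names are zero-padded, none is terminated. */
void events_map_set(struct events_map *map, size_t idx, const char *name)
{
	size_t len = strlen(name);

	if (len > STAT_ID_LEN)
		len = STAT_ID_LEN;
	memset(map->ids[idx], 0, STAT_ID_LEN);
	memcpy(map->ids[idx], name, len);
}

/* Compare by fixed length; 'id8' must point to at least 8 bytes. */
int events_map_find(const struct events_map *map, const char *id8)
{
	for (size_t i = 0; i < map->count; i++)
		if (memcmp(map->ids[i], id8, STAT_ID_LEN) == 0)
			return (int)i;
	return -1;
}
```

Because the identifiers live inside the map itself, one free() releases everything, and lookups never depend on a terminator that is not there.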
2 years ago powerpc/fsl_rio: Fix refcount leak in fsl_rio_setup
Miaoqian Lin [Thu, 12 May 2022 12:37:18 +0000 (16:37 +0400)]
powerpc/fsl_rio: Fix refcount leak in fsl_rio_setup

of_parse_phandle() returns a node pointer with its refcount
incremented; we should use of_node_put() on it when it is no longer
needed. Add the missing of_node_put() to avoid a refcount leak.

Fixes: abc3aeae3aaa ("fsl-rio: Add two ports and rapidio message units support")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220512123724.62931-1-linmq006@gmail.com
2 years ago powerpc/xive: Fix refcount leak in xive_spapr_init
Miaoqian Lin [Thu, 12 May 2022 09:05:33 +0000 (13:05 +0400)]
powerpc/xive: Fix refcount leak in xive_spapr_init

of_find_compatible_node() returns a node pointer with its refcount
incremented; we should use of_node_put() on it when done.
Add the missing of_node_put() to avoid a refcount leak.

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt controller")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220512090535.33397-1-linmq006@gmail.com
2 years ago powerpc/numa: Associate numa node to its cpu earlier
Oscar Salvador [Mon, 11 Apr 2022 07:49:34 +0000 (09:49 +0200)]
powerpc/numa: Associate numa node to its cpu earlier

powerpc is the only platform that does not rely on
cpu_up()->try_online_node() to bring up a numa node,
and instead special-cases it deep in its own machinery:

dlpar_online_cpu
 find_and_online_cpu_nid
  try_online_node

This should not be needed, but the try_online_node() call
from cpu_up() does not act on the right node, because cpu_to_node()
returns the old numa<->cpu mapping that is set at boot
for all possible cpus.

That can be seen easily if we try to print out the numa node passed
to try_online_node() in cpu_up().

The thing is that the numa<->cpu mapping does not get updated till a much
later stage in start_secondary:

start_secondary:
 set_numa_node(numa_cpu_lookup_table[cpu])

But we do not really care, as we already know the
CPU <-> NUMA associativity back in find_and_online_cpu_nid(),
so let us make use of that and set the proper numa<->cpu mapping,
so cpu_to_node() in cpu_up() returns the right node and
try_online_node() can do its work.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220411074934.4632-1-osalvador@suse.de
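
The ordering problem described above can be modelled with a toy simulation. The sim_* functions and both tables are hypothetical. They only mirror the roles of cpu_to_node(), find_and_online_cpu_nid(), cpu_up() and start_secondary() named in the commit message.

```c
#include <assert.h>

#define SIM_NR_CPUS 4

/* Hypothetical state: every CPU starts on node 0, as if set at boot. */
static int sim_cpu_node[SIM_NR_CPUS];		/* what cpu_to_node() returns */
static int sim_lookup_table[SIM_NR_CPUS];	/* node found by the hotplug path */

int sim_cpu_to_node(int cpu)
{
	return sim_cpu_node[cpu];
}

/*
 * Models find_and_online_cpu_nid(): learns the right node. With
 * 'set_early' it also updates the mapping immediately (the fix).
 */
void sim_find_cpu_nid(int cpu, int nid, int set_early)
{
	sim_lookup_table[cpu] = nid;
	if (set_early)
		sim_cpu_node[cpu] = nid;
}

/* Models cpu_up(): returns the node it would hand to try_online_node(). */
int sim_cpu_up(int cpu)
{
	return sim_cpu_to_node(cpu);
}

/* Models the late update in start_secondary(). */
void sim_start_secondary(int cpu)
{
	sim_cpu_node[cpu] = sim_lookup_table[cpu];
}
```

Without the early update, sim_cpu_up() still sees the boot-time node, and the mapping only becomes correct once sim_start_secondary() has run.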
2 years ago macintosh: via-pmu and via-cuda need RTC_LIB
Randy Dunlap [Sun, 10 Apr 2022 16:10:35 +0000 (09:10 -0700)]
macintosh: via-pmu and via-cuda need RTC_LIB

Fix build when RTC_LIB is not set/enabled.
Eliminates these build errors:

m68k-linux-ld: drivers/macintosh/via-pmu.o: in function `pmu_set_rtc_time':
drivers/macintosh/via-pmu.c:1769: undefined reference to `rtc_tm_to_time64'
m68k-linux-ld: drivers/macintosh/via-cuda.o: in function `cuda_set_rtc_time':
drivers/macintosh/via-cuda.c:797: undefined reference to `rtc_tm_to_time64'

Fixes: 0792a2c8e0bb ("macintosh: Use common code to access RTC")
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220410161035.592-1-rdunlap@infradead.org
2 years ago macintosh/via-pmu: Fix build failure when CONFIG_INPUT is disabled
Finn Thain [Thu, 7 Apr 2022 10:11:32 +0000 (20:11 +1000)]
macintosh/via-pmu: Fix build failure when CONFIG_INPUT is disabled

drivers/macintosh/via-pmu-event.o: In function `via_pmu_event':
via-pmu-event.c:(.text+0x44): undefined reference to `input_event'
via-pmu-event.c:(.text+0x68): undefined reference to `input_event'
via-pmu-event.c:(.text+0x94): undefined reference to `input_event'
via-pmu-event.c:(.text+0xb8): undefined reference to `input_event'
drivers/macintosh/via-pmu-event.o: In function `via_pmu_event_init':
via-pmu-event.c:(.init.text+0x20): undefined reference to `input_allocate_device'
via-pmu-event.c:(.init.text+0xc4): undefined reference to `input_register_device'
via-pmu-event.c:(.init.text+0xd4): undefined reference to `input_free_device'
make[1]: *** [Makefile:1155: vmlinux] Error 1
make: *** [Makefile:350: __build_one_by_one] Error 2

Don't call into the input subsystem unless CONFIG_INPUT is built-in.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Finn Thain <fthain@linux-m68k.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5edbe76ce68227f71e09af4614cc4c1bd61c7ec8.1649326292.git.fthain@linux-m68k.org
2 years ago powerpc/powernv: fix missing of_node_put in uv_init()
Lv Ruyi [Thu, 7 Apr 2022 09:00:43 +0000 (09:00 +0000)]
powerpc/powernv: fix missing of_node_put in uv_init()

of_find_compatible_node() returns a node pointer with its refcount
incremented; use of_node_put() on it when done.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220407090043.2491854-1-lv.ruyi@zte.com.cn
2 years ago powerpc/85xx: Remove FSL_85XX_CACHE_SRAM
Christophe Leroy [Thu, 31 Mar 2022 10:03:06 +0000 (12:03 +0200)]
powerpc/85xx: Remove FSL_85XX_CACHE_SRAM

CONFIG_FSL_85XX_CACHE_SRAM is an option that is not
user selectable and which is not selected by any driver
nor any defconfig.

Remove it and all associated code.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9949813a6b758903b7bee910f798ba2ca82ff8ee.1648720908.git.christophe.leroy@csgroup.eu
2 years ago powerpc/xics: fix refcount leak in icp_opal_init()
Lv Ruyi [Sat, 2 Apr 2022 01:34:19 +0000 (01:34 +0000)]
powerpc/xics: fix refcount leak in icp_opal_init()

The of_find_compatible_node() function returns a node pointer with its
refcount incremented; use of_node_put() on it when done.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220402013419.2410298-1-lv.ruyi@zte.com.cn
2 years ago powerpc/perf: Fix the threshold compare group constraint for power9
Kajol Jain [Fri, 6 May 2022 06:10:15 +0000 (11:40 +0530)]
powerpc/perf: Fix the threshold compare group constraint for power9

Thresh compare bits for an event are used to program the thresh compare
field in Monitor Mode Control Register A (MMCRA: 9-18 bits for power9).
When scheduling events as a group, all events in that group should
match in their threshold bits (like thresh compare, thresh control,
thresh select). Otherwise event open for the sibling events should fail.
But in the current code, in case the thresh compare bits are not valid,
we are not failing in the group_constraint function, which can result
in invalid group scheduling.

Fix the issue by returning -1 in case the event is a threshold event and
the threshold compare value is not valid.

Thresh control bits in the event code are used to program the thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In the example
below, the scheduling of group events PM_MRK_INST_CMPL (8735340401e0) and
PM_THRESH_MET (8734340101ec) is expected to fail as the two events
request different thresh control bits and have an invalid thresh compare
value.

Result before the patch changes:

[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1

 Performance counter stats for 'sleep 1':

            11,048      r8735340401e0
             1,967      r8734340101ec

       1.001354036 seconds time elapsed

       0.001421000 seconds user
       0.000000000 seconds sys

Result after the patch changes:

[command]# perf stat -e "{r8735340401e0,r8734340101ec}" sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (r8735340401e0).
/bin/dmesg | grep -i perf may provide additional information.

Fixes: 78a16d9fc1206 ("powerpc/perf: Avoid FAB_*_MATCH checks for power9")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506061015.43916-2-kjain@linux.ibm.com
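
The check this patch adds can be sketched as follows. The field position (10 bits at bit 40 of the raw event code), the exponent/mantissa split and the validity rule are assumptions made for this sketch, not definitions taken from the commit. The function names are hypothetical as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout: a 10-bit thresh compare field at bit 40 of the raw
 * event code, split into a 3-bit exponent and a 7-bit mantissa.
 */
#define THR_CMP_SHIFT 40
#define THR_CMP_MASK  UINT64_C(0x3ff)

bool thresh_cmp_valid(uint64_t event)
{
	unsigned int cmp = (unsigned int)((event >> THR_CMP_SHIFT) & THR_CMP_MASK);
	unsigned int exp = cmp >> 7;

	/* Assumed rule: a non-zero exponent needs non-zero top mantissa bits. */
	if (exp && (cmp & 0x60) == 0)
		return false;
	return true;
}

/* Returns -1 to reject the event, 0 to accept it. */
int group_constraint_check(uint64_t event, bool is_threshold_event)
{
	if (is_threshold_event && !thresh_cmp_valid(event))
		return -1;
	return 0;
}
```

Under these assumptions, the raw code 0x8735340401e0 from the perf example is rejected when it is treated as a threshold event, which matches the "Invalid argument" result shown after the patch.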
2 years ago powerpc/perf: Fix the threshold compare group constraint for power10
Kajol Jain [Fri, 6 May 2022 06:10:14 +0000 (11:40 +0530)]
powerpc/perf: Fix the threshold compare group constraint for power10

Thresh compare bits for an event are used to program the thresh compare
field in Monitor Mode Control Register A (MMCRA: 8-18 bits for power10).
When scheduling events as a group, all events in that group should
match in their threshold bits. Otherwise event open for the sibling
events should fail. But in the current code, in case the thresh compare
bits are not valid, we are not failing in the group_constraint function,
which can result in invalid group scheduling.

Fix the issue by returning -1 from the group_constraint function in case
the event is a threshold event and the threshold compare value is not
valid.

The patch also fixes the p10_thresh_cmp_val function to return -1 in case
the threshold bits are not valid, and changes the corresponding check in
the is_thresh_cmp_valid function to return false only when the thresh_cmp
value is less than 0.

Thresh control bits in the event code are used to program the thresh_ctl
field in Monitor Mode Control Register A (MMCRA: 48-55). In the example
below, the scheduling of group events PM_MRK_INST_CMPL (35340401e0) and
PM_THRESH_MET (34340101ec) is expected to fail as the two events
request different thresh control bits.

Result before the patch changes:

[command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1

 Performance counter stats for 'sleep 1':

             8,482      r35340401e0
                 0      r34340101ec

       1.001474838 seconds time elapsed

       0.001145000 seconds user
       0.000000000 seconds sys

Result after the patch changes:

[command]# perf stat -e "{r35340401e0,r34340101ec}" sleep 1

 Performance counter stats for 'sleep 1':

     <not counted>      r35340401e0
   <not supported>      r34340101ec

       1.001499607 seconds time elapsed

       0.000204000 seconds user
       0.000760000 seconds sys

Fixes: 82d2c16b350f7 ("powerpc/perf: Adds support for programming of Thresholding in P10")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506061015.43916-1-kjain@linux.ibm.com
2 years ago powerpc/kaslr_booke: Fix build error
YueHaibing [Tue, 17 May 2022 09:49:00 +0000 (17:49 +0800)]
powerpc/kaslr_booke: Fix build error

arch/powerpc/mm/nohash/kaslr_booke.c: In function ‘kaslr_get_cmdline’:
arch/powerpc/mm/nohash/kaslr_booke.c:46:2: error: implicit declaration of function ‘early_init_dt_scan_chosen’
  early_init_dt_scan_chosen(boot_command_line);
  ^~~~~~~~~~~~~~~~~~~~~~~~~

arch/powerpc/mm/nohash/kaslr_booke.c: In function ‘get_initrd_range’:
arch/powerpc/mm/nohash/kaslr_booke.c:210:10: error: implicit declaration of function ‘of_read_number’
  start = of_read_number(prop, len / 4);
          ^~~~~~~~~~~~~~

Add missing include files to fix this.

Fixes: 86c38fec69a4 ("powerpc: Remove asm/prom.h from all files that don't need it")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220517094900.14900-1-yuehaibing@huawei.com
2 years ago powerpc/book3e: Fix build error
YueHaibing [Tue, 17 May 2022 09:48:30 +0000 (17:48 +0800)]
powerpc/book3e: Fix build error

arch/powerpc/mm/nohash/fsl_book3e.c: In function ‘relocate_init’:
arch/powerpc/mm/nohash/fsl_book3e.c:348:2: error: implicit declaration of function ‘early_get_first_memblock_info’
  early_get_first_memblock_info(__va(dt_ptr), &size);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add missing include file linux/of_fdt.h to fix this.

Fixes: 86c38fec69a4 ("powerpc: Remove asm/prom.h from all files that don't need it")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220517094830.27560-1-yuehaibing@huawei.com
2 years ago powerpc: Book3S 64-bit outline-only KASAN support
Daniel Axtens [Wed, 18 May 2022 10:05:31 +0000 (20:05 +1000)]
powerpc: Book3S 64-bit outline-only KASAN support

Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Document KASAN in powerpc docs.

Background
----------

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

    [ppc64 mm note: The kernel installs a linear mapping at effective
    address c000...-c008.... This is a one-to-one mapping with physical
    memory from 0000... onward. Because of how memory accesses work on
    powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
    same memory both with translations on (accessing as an 'effective
    address'), and with translations off (accessing as a 'real
    address'). This works in both guests and the hypervisor. For more
    details, see s5.7 of Book III of version 3 of the ISA, in particular
    the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
    KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

[paulus@ozlabs.org - Rebased onto 5.17.  Note that a kernel with
 CONFIG_KASAN=y will crash during boot on a machine using HPT
 translation because not all the entry points to the generic
 KASAN code are protected with a call to kasan_arch_is_ready().]

Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Update copyright year and comment formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
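
A rough userspace sketch of the shadow lookup that outline instrumentation performs, assuming the generic KASAN scheme in which one shadow byte describes 8 bytes of memory. The offset constant is an assumption chosen for illustration, and sketch_load1_ok() is a hypothetical stand-in for the out-of-line check the compiler calls before an access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3	/* one shadow byte covers 8 bytes of memory */

/* Assumed offset, for illustration only. */
#define SHADOW_OFFSET UINT64_C(0xa80e000000000000)

/* Map an address to the address of its shadow byte. */
uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/*
 * Outline mode: the check is a function call made before the access,
 * rather than code inlined at the access site. 'shadow' is a toy shadow
 * region indexed by the offset into a toy memory region.
 * Shadow byte 0: all 8 bytes valid; 1..7: only the first k bytes valid;
 * negative: the whole 8-byte granule is poisoned.
 */
bool sketch_load1_ok(const int8_t *shadow, uint64_t offset)
{
	int8_t s = shadow[offset >> SHADOW_SCALE_SHIFT];

	return s == 0 || (s > 0 && (int8_t)(offset & 7) < s);
}
```

Because the check is an ordinary function, it can be skipped for code that runs with translations off, which is the trade-off the commit message describes for outline mode.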
2 years ago powerpc/kasan: Disable address sanitization in kexec paths
Daniel Axtens [Wed, 18 May 2022 10:07:05 +0000 (20:07 +1000)]
powerpc/kasan: Disable address sanitization in kexec paths

The kexec code paths involve code that necessarily runs in real mode, as
CPUs are disabled and control is transferred to the new kernel. Disable
address sanitization for the kexec code and the functions called in real
mode on CPUs being disabled.

[paulus@ozlabs.org: combined a few work-in-progress commits of
 Daniel's and wrote the commit message.]

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Move pseries_machine_kexec() into kexec.c so setup.c can be instrumented]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTFSQ2TUSEaDdVC@cleo
2 years ago powerpc/kasan: Don't instrument non-maskable or raw interrupts
Daniel Axtens [Wed, 18 May 2022 10:06:17 +0000 (20:06 +1000)]
powerpc/kasan: Don't instrument non-maskable or raw interrupts

Disable address sanitization for raw and non-maskable interrupt
handlers, because they can run in real mode, where we cannot access
the shadow memory.  (Note that kasan_arch_is_ready() doesn't test for
real mode, since it is a static branch for speed, and in any case not
all the entry points to the generic KASAN code are protected by
kasan_arch_is_ready guards.)

The changes to interrupt_nmi_enter/exit_prepare() look larger than
they actually are.  The changes are equivalent to adding
!IS_ENABLED(CONFIG_KASAN) to the conditions for calling nmi_enter() or
nmi_exit() in real mode.  That is, the code is equivalent to using the
following condition for calling nmi_enter/exit:

if (((!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
      !firmware_has_feature(FW_FEATURE_LPAR) ||
      radix_enabled()) &&
     !IS_ENABLED(CONFIG_KASAN)) ||
    (mfmsr() & MSR_DR))

That unwieldy condition has been split into several statements with
comments, for easier reading.

The nmi_ipi_lock functions that call atomic functions (i.e.,
nmi_ipi_lock_start(), nmi_ipi_lock() and nmi_ipi_unlock()), besides
being marked noinstr, now call arch_atomic_* functions instead of
atomic_* functions because with KASAN enabled, the atomic_* functions
are wrappers which explicitly do address sanitization on their
arguments.  Since we are trying to avoid address sanitization, we have
to use the lower-level arch_atomic_* versions.

In hv_nmi_check_nonrecoverable(), the regs_set_unrecoverable() call
has been open-coded so as to avoid having to either trust the inlining
or mark regs_set_unrecoverable() as noinstr.

[paulus@ozlabs.org: combined a few work-in-progress commits of
 Daniel's and wrote the commit message.]

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTFGaKM8Pd46PIK@cleo
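
The equivalence claimed for the nmi_enter/exit condition can be checked with a small userspace model. It reads the quoted condition as ((A || B || C) && !KASAN) || MSR_DR and models each kernel predicate as a plain boolean. The struct, the function names and the comments on each early return are assumptions of this sketch, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Inputs modelled as booleans; in the kernel they come from IS_ENABLED(),
 * firmware_has_feature(), radix_enabled() and mfmsr().
 */
struct nmi_ctx {
	bool book3s_64;
	bool lpar;
	bool radix;
	bool kasan;
	bool msr_dr;	/* data translation on, i.e. not in real mode */
};

/* The single combined condition quoted in the commit message. */
bool use_nmi_enter_combined(const struct nmi_ctx *c)
{
	return ((!c->book3s_64 || !c->lpar || c->radix) && !c->kasan) ||
	       c->msr_dr;
}

/* The same logic split into separate statements with early returns. */
bool use_nmi_enter_split(const struct nmi_ctx *c)
{
	/* Translation is on: fine in every configuration. */
	if (c->msr_dr)
		return true;
	/* Real mode on Book3S 64 under an LPAR without radix: skip. */
	if (c->book3s_64 && c->lpar && !c->radix)
		return false;
	/* Real mode with KASAN: shadow memory cannot be accessed. */
	if (c->kasan)
		return false;
	return true;
}

/* Exhaustively compare both forms over all 32 input combinations. */
bool nmi_forms_agree(void)
{
	for (unsigned int bits = 0; bits < 32; bits++) {
		struct nmi_ctx c = {
			.book3s_64 = (bits & 1) != 0,
			.lpar = (bits & 2) != 0,
			.radix = (bits & 4) != 0,
			.kasan = (bits & 8) != 0,
			.msr_dr = (bits & 16) != 0,
		};

		if (use_nmi_enter_combined(&c) != use_nmi_enter_split(&c))
			return false;
	}
	return true;
}
```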
2 years ago powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
Daniel Axtens [Wed, 18 May 2022 10:04:58 +0000 (20:04 +1000)]
powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

kasan is already implied by the directory name; we don't need to
repeat it.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTEyoi+xu9brJYe@cleo
2 years ago kasan: Document support on 32-bit powerpc
Daniel Axtens [Wed, 18 May 2022 10:04:12 +0000 (20:04 +1000)]
kasan: Document support on 32-bit powerpc

KASAN is supported on 32-bit powerpc and the docs should reflect this.

Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTEnMLrnd64j0w5@cleo
2 years ago powerpc/ftrace: Remove ftrace init tramp once kernel init is complete
Naveen N. Rao [Mon, 16 May 2022 07:14:22 +0000 (12:44 +0530)]
powerpc/ftrace: Remove ftrace init tramp once kernel init is complete

Stop using the ftrace trampoline for init section once kernel init is
complete.

Fixes: 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220516071422.463738-1-naveen.n.rao@linux.vnet.ibm.com
2 years ago powerpc/irq: Remove arch_local_irq_restore() for !CONFIG_CC_HAS_ASM_GOTO
Christophe Leroy [Mon, 16 May 2022 15:36:04 +0000 (17:36 +0200)]
powerpc/irq: Remove arch_local_irq_restore() for !CONFIG_CC_HAS_ASM_GOTO

All supported versions of GCC & clang support asm goto.

Remove the !CONFIG_CC_HAS_ASM_GOTO version of arch_local_irq_restore().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/58df50c9e77e2ed945bacdead30412770578886b.1652715336.git.christophe.leroy@csgroup.eu
2 years ago selftests/powerpc: Better reporting in spectre_v2
Russell Currey [Tue, 8 Jun 2021 06:48:09 +0000 (16:48 +1000)]
selftests/powerpc: Better reporting in spectre_v2

In commit f3054ffd71b5 ("selftests/powerpc: Return skip code for
spectre_v2"), the spectre_v2 selftest is updated to be aware of cases
where the vulnerability status reported in sysfs is incorrect, skipping
the test instead.

This happens because qemu can misrepresent the mitigation status of the
host to the guest. If the count cache is disabled in the host, and this
is correctly reported to the guest, then the guest won't apply
mitigations. If the guest is then migrated to a new host where
mitigations are necessary, it is now vulnerable because it has not
applied mitigations.

Update the selftest to report when we see excessive misses, indicative of
the count cache being disabled. If software flushing is enabled, also
warn that these flushes are just wasting performance.

Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rebase and update change log appropriately]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210608064809.199116-1-ruscur@russell.cc
2 years ago powerpc/powernv: Get STF barrier requirements from device-tree
Russell Currey [Mon, 4 Apr 2022 10:15:36 +0000 (20:15 +1000)]
powerpc/powernv: Get STF barrier requirements from device-tree

The device-tree property no-need-store-drain-on-priv-state-switch is
equivalent to H_CPU_BEHAV_NO_STF_BARRIER from the
H_GET_CPU_CHARACTERISTICS hcall on pseries.

Since commit 84ed26fd00c5 ("powerpc/security: Add a security feature for
STF barrier") powernv systems with this device-tree property have been
enabling the STF barrier when they have no need for it.  This patch
fixes this by clearing the STF barrier feature on those systems.

Fixes: 84ed26fd00c5 ("powerpc/security: Add a security feature for STF barrier")
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220404101536.104794-2-ruscur@russell.cc
2 years ago powerpc/powernv: Get L1D flush requirements from device-tree
Russell Currey [Mon, 4 Apr 2022 10:15:35 +0000 (20:15 +1000)]
powerpc/powernv: Get L1D flush requirements from device-tree

The device-tree properties no-need-l1d-flush-msr-pr-1-to-0 and
no-need-l1d-flush-kernel-on-user-access are the equivalents of
H_CPU_BEHAV_NO_L1D_FLUSH_ENTRY and H_CPU_BEHAV_NO_L1D_FLUSH_UACCESS
from the H_GET_CPU_CHARACTERISTICS hcall on pseries respectively.

In commit d02fa40d759f ("powerpc/powernv: Remove POWER9 PVR version
check for entry and uaccess flushes") the condition for disabling the
L1D flush on kernel entry and user access was changed from any non-P9
CPU to only checking P7 and P8.  Without the appropriate device-tree
checks for newer processors on powernv, these flushes are unnecessarily
enabled on those systems.  This patch corrects this.

Fixes: d02fa40d759f ("powerpc/powernv: Remove POWER9 PVR version check for entry and uaccess flushes")
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220404101536.104794-1-ruscur@russell.cc
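
The approach in this commit and in the STF barrier commit above it can be sketched as a table that maps each "no-need-*" device-tree property to the mitigation it turns off. The property strings are the ones quoted in the commit messages. The bit values, the table and the function are hypothetical and only show the idea of starting from a conservative default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; the names mirror the features discussed above. */
#define FTR_STF_BARRIER       (1u << 0)
#define FTR_L1D_FLUSH_ENTRY   (1u << 1)
#define FTR_L1D_FLUSH_UACCESS (1u << 2)
#define FTR_DEFAULT \
	(FTR_STF_BARRIER | FTR_L1D_FLUSH_ENTRY | FTR_L1D_FLUSH_UACCESS)

struct prop_map {
	const char *name;	/* device-tree property */
	unsigned int clears;	/* feature bit to clear when it is present */
};

static const struct prop_map prop_table[] = {
	{ "no-need-store-drain-on-priv-state-switch", FTR_STF_BARRIER },
	{ "no-need-l1d-flush-msr-pr-1-to-0",          FTR_L1D_FLUSH_ENTRY },
	{ "no-need-l1d-flush-kernel-on-user-access",  FTR_L1D_FLUSH_UACCESS },
};

/* Start with every mitigation on, then clear what firmware says is unneeded. */
unsigned int security_features(const char *const *props, size_t nprops)
{
	const size_t ntable = sizeof(prop_table) / sizeof(prop_table[0]);
	unsigned int ftrs = FTR_DEFAULT;

	for (size_t i = 0; i < nprops; i++)
		for (size_t j = 0; j < ntable; j++)
			if (strcmp(props[i], prop_table[j].name) == 0)
				ftrs &= ~prop_table[j].clears;
	return ftrs;
}
```

A system that reports none of these properties keeps every mitigation enabled, so a missing property errs on the safe side.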
2 years ago powerpc/85xx/p2020: Add fsl,mpc8548-pmc node
Pali Rohár [Fri, 6 May 2022 20:36:21 +0000 (22:36 +0200)]
powerpc/85xx/p2020: Add fsl,mpc8548-pmc node

P2020 also contains a Power Management Controller whose registers, at
offset 0xe0070, are compatible with mpc8548. So add a PMC node to the
DTS include file fsl/p2020si-post.dtsi.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506203621.26314-1-pali@kernel.org
2 years agopowerpc/64: Only WARN if __pa()/__va() called with bad addresses
Michael Ellerman [Wed, 6 Apr 2022 14:58:01 +0000 (00:58 +1000)]
powerpc/64: Only WARN if __pa()/__va() called with bad addresses

We added checks to __pa() / __va() to ensure they're only called with
appropriate addresses. But using BUG_ON() is too strong: it means
virt_addr_valid() will BUG when DEBUG_VIRTUAL is enabled.

Instead switch them to warnings, as arm64 does.

Fixes: 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va and __pa addresses")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406145802.538416-5-mpe@ellerman.id.au
2 years agoarch/Kconfig: Drop references to powerpc PAGE_SIZE symbols
Michael Ellerman [Thu, 5 May 2022 12:51:23 +0000 (22:51 +1000)]
arch/Kconfig: Drop references to powerpc PAGE_SIZE symbols

In the previous commit powerpc added PAGE_SIZE related config symbols
using the generic names.

So there's no need to refer to them in the definition of
PAGE_SIZE_LESS_THAN_64KB etc, the negative dependency on the generic
symbol is sufficient (in this case !PAGE_SIZE_64KB).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220505125123.2088143-2-mpe@ellerman.id.au
2 years agopowerpc: Add generic PAGE_SIZE config symbols
Michael Ellerman [Thu, 5 May 2022 12:51:22 +0000 (22:51 +1000)]
powerpc: Add generic PAGE_SIZE config symbols

Other arches (sh, mips, hexagon) use standard names for PAGE_SIZE
related config symbols.

Add matching symbols for powerpc, which are enabled by default but
depend on our architecture specific PAGE_SIZE symbols.

This allows generic/driver code to express dependencies on the PAGE_SIZE
without needing to refer to architecture specific config symbols.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220505125123.2088143-1-mpe@ellerman.id.au
2 years agopowerpc/pseries/vas: sysfs comments with the correct entries
Haren Myneni [Sat, 9 Apr 2022 08:46:15 +0000 (01:46 -0700)]
powerpc/pseries/vas: sysfs comments with the correct entries

The VAS entry is created as a misc device, and the sysfs comments
should list the proper entries.

Reported-by: Matheus Castanho <mscastanho@ibm.com>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6dee950c7b72a4965c102208041f14a063cf5a8c.camel@linux.ibm.com
2 years agopowerpc/powernv/vas: Assign real address to rx_fifo in vas_rx_win_attr
Haren Myneni [Sat, 9 Apr 2022 08:44:16 +0000 (01:44 -0700)]
powerpc/powernv/vas: Assign real address to rx_fifo in vas_rx_win_attr

In init_winctx_regs(), __pa() is called on winctx->rx_fifo, and this
function is called to initialize registers for both receive and fault
windows. But winctx->rx_fifo holds the real address for receive
windows and the virtual address for fault windows, which causes
errors with DEBUG_VIRTUAL enabled. Fix this issue by assigning only
the real address to rx_fifo in the vas_rx_win_attr struct for both
receive and fault windows.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/338e958c7ab8f3b266fa794a1f80f99b9671829e.camel@linux.ibm.com
2 years agopowerpc/opcodes: Remove unused PPC_INST_XXX macros
Christophe Leroy [Mon, 9 May 2022 05:36:23 +0000 (07:36 +0200)]
powerpc/opcodes: Remove unused PPC_INST_XXX macros

The following PPC_INST_XXX macros are not used anymore
outside ppc-opcode.h:
- PPC_INST_LD
- PPC_INST_STD
- PPC_INST_ADDIS
- PPC_INST_ADD
- PPC_INST_DIVD

Remove them.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8c28636126f69141419953b5638b4a908c184dc1.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/inst: Remove PPC_INST_BL
Christophe Leroy [Mon, 9 May 2022 05:36:22 +0000 (07:36 +0200)]
powerpc/inst: Remove PPC_INST_BL

Convert last users of PPC_INST_BL to PPC_RAW_BL()

And remove PPC_INST_BL.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d9eacb758e7ae7cf224211ebe3f6f7d409a333be.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/modules: Use PPC_LI macros instead of opencoding
Christophe Leroy [Mon, 9 May 2022 05:36:21 +0000 (07:36 +0200)]
powerpc/modules: Use PPC_LI macros instead of opencoding

Use PPC_LI_MASK and PPC_LI() instead of opencoding.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3d56d7bc3200403773d54e62659d0e01292a055d.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/inst: Remove PPC_INST_BRANCH
Christophe Leroy [Mon, 9 May 2022 05:36:20 +0000 (07:36 +0200)]
powerpc/inst: Remove PPC_INST_BRANCH

Convert last users of PPC_INST_BRANCH to PPC_RAW_BRANCH()

And remove PPC_INST_BRANCH.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fa8807108a2ef2287a2c9651d6e1ff7c051923d9.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Don't use copy_from_kernel_nofault() in module_trampoline_target()
Christophe Leroy [Mon, 9 May 2022 05:36:19 +0000 (07:36 +0200)]
powerpc/ftrace: Don't use copy_from_kernel_nofault() in module_trampoline_target()

module_trampoline_target() is quite a hot path, used when
activating/deactivating the function tracer.

Avoid the heavy copy_from_kernel_nofault() by doing four calls
to copy_inst_from_kernel_nofault().

Use __copy_inst_from_kernel_nofault() for the last three calls. The
first call is made to copy_inst_from_kernel_nofault() to check that the
address is within kernel space. There is no risk of wrapping past the
top of kernel space because the last page is never mapped, so if the
address is in the last page the first copy will fail and the others
will never be performed.

Also make it notrace, just like all the functions that call it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c55559103e014b7863161559d340e8e9484eaaa6.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/inst: Add __copy_inst_from_kernel_nofault()
Christophe Leroy [Mon, 9 May 2022 05:36:18 +0000 (07:36 +0200)]
powerpc/inst: Add __copy_inst_from_kernel_nofault()

Following the same model as get_user() versus __get_user(),
introduce __copy_inst_from_kernel_nofault(), which doesn't
check the address.

It is to be used by callers that have already checked that the address
is a kernel address.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1f3702890d6dbd64702b61834753bcc96851c18c.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Minimise number of #ifdefs
Christophe Leroy [Mon, 9 May 2022 05:36:17 +0000 (07:36 +0200)]
powerpc/ftrace: Minimise number of #ifdefs

A lot of #ifdefs can be replaced by IS_ENABLED()

Do so.
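
As a rough illustration of why IS_ENABLED() can stand in for many
#ifdefs, the sketch below models a simplified form of the macro in plain
userspace C. The helper macro names and the CONFIG_DEMO_* symbols are
hypothetical and exist only for this example; the kernel's own
definition lives in include/linux/kconfig.h and also covers options
built as modules, which this sketch ignores.

```c
/* Simplified userspace model of the kernel's IS_ENABLED() trick. */
#define CONFIG_DEMO_ON 1	/* hypothetical option, as if set to "y" */
/* CONFIG_DEMO_OFF is deliberately left undefined, as if "not set". */

#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
/* A defined option injects "0," so the second argument becomes 1. */
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED(option) IS_DEFINED_2(option)

/* Both branches are always parsed; the compiler drops the dead one. */
static int demo_tracer_mode(void)
{
	if (IS_ENABLED(CONFIG_DEMO_OFF))
		return 2;
	if (IS_ENABLED(CONFIG_DEMO_ON))
		return 1;
	return 0;
}
```

Unlike an #ifdef block, the disabled branch is still seen by the
compiler, so it keeps being type-checked in every configuration.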

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fold in changes suggested by Naveen and Christophe on list]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/18ce6708d6f8c71d87436f9c6019f04df4125128.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Simplify expected_nop_sequence()
Christophe Leroy [Mon, 9 May 2022 05:36:16 +0000 (07:36 +0200)]
powerpc/ftrace: Simplify expected_nop_sequence()

Avoid ifdefs around expected_nop_sequence().

While at it make it a bool.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/305d22472f1f92127fba09692df6bb5d079a8cd0.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use size macro instead of opencoding
Christophe Leroy [Mon, 9 May 2022 05:36:15 +0000 (07:36 +0200)]
powerpc/ftrace: Use size macro instead of opencoding

0x80000000 is SZ_2G. Use it.
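
The bracketed note below about comparing against an unsigned -SZ_2G
points at a C pitfall that a small userspace sketch can show. The
function name offset_within_2g() is hypothetical, a 32-bit int is
assumed, and the actual expression used in ftrace.c is not reproduced
here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2G 0x80000000	/* same value as in the kernel's linux/sizes.h */

/*
 * Pitfall: with a 32-bit int the literal does not fit in int, so SZ_2G
 * is unsigned and -SZ_2G wraps back to 0x80000000 instead of becoming
 * negative. Casting to a signed 64-bit type first gives the intended
 * lower bound.
 */
static bool offset_within_2g(int64_t offset)
{
	return offset >= -(int64_t)SZ_2G && offset < (int64_t)SZ_2G;
}
```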

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix comparison against unsigned -SZ_2G]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bb6626e884acffe87b58736291df57db3deaa9b9.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.
Christophe Leroy [Mon, 9 May 2022 05:36:14 +0000 (07:36 +0200)]
powerpc/ftrace: Use PPC_RAW_xxx() macros instead of opencoding.

PPC_RAW_xxx() macros are self explanatory and less error prone
than open coding.

Use them in ftrace.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9292094c9a69cef6d29ee83f435a557b59c45065.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use BRANCH_SET_LINK instead of value 1
Christophe Leroy [Mon, 9 May 2022 05:36:13 +0000 (07:36 +0200)]
powerpc/ftrace: Use BRANCH_SET_LINK instead of value 1

To make it explicit, use BRANCH_SET_LINK instead of value 1
when calling create_branch().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d57847063ac93660a5af620d4df1847f10edf61a.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Remove ftrace_plt_tramps[]
Christophe Leroy [Mon, 9 May 2022 05:36:12 +0000 (07:36 +0200)]
powerpc/ftrace: Remove ftrace_plt_tramps[]

ftrace_plt_tramps table is never filled so it is useless.

Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daeeb618a6619e3a7e3f82f1bd83ca7c25af6330.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE
Christophe Leroy [Mon, 9 May 2022 05:36:11 +0000 (07:36 +0200)]
powerpc/ftrace: Use CONFIG_FUNCTION_TRACER instead of CONFIG_DYNAMIC_FTRACE

Since commit 0c0c52306f47 ("powerpc: Only support DYNAMIC_FTRACE not
static"), CONFIG_DYNAMIC_FTRACE is always selected when
CONFIG_FUNCTION_TRACER is selected.

To avoid confusion and having the reader wonder what happens when
CONFIG_FUNCTION_TRACER is selected and CONFIG_DYNAMIC_FTRACE is not,
use CONFIG_FUNCTION_TRACER in ifdefs instead of CONFIG_DYNAMIC_FTRACE.

As CONFIG_FUNCTION_GRAPH_TRACER depends on CONFIG_FUNCTION_TRACER,
ftrace.o doesn't need to appear for both symbols in Makefile.

Then as ftrace.o is built only when CONFIG_FUNCTION_TRACER is selected
ifdef CONFIG_FUNCTION_TRACER is not needed in ftrace.c, and since it
implies CONFIG_DYNAMIC_FTRACE, CONFIG_DYNAMIC_FTRACE is not needed
in ftrace.c

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/628d357503eb90b4a034f99b7df516caaff4d279.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS
Christophe Leroy [Mon, 9 May 2022 05:36:10 +0000 (07:36 +0200)]
powerpc/ftrace: Don't include ftrace.o for CONFIG_FTRACE_SYSCALLS

Since commit 7bea7ac0ca01 ("powerpc/syscalls: Fix syscall tracing")
ftrace.o is not needed anymore for CONFIG_FTRACE_SYSCALLS.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/275932a5d61543b825ff9a64f61abed6da5d4a2a.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64
Christophe Leroy [Mon, 9 May 2022 05:36:09 +0000 (07:36 +0200)]
powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64

Since c93d4f6ecf4b ("powerpc/ftrace: Add module_trampoline_target()
for PPC32"), __ftrace_make_nop() for PPC32 is very similar to the
one for PPC64.

Same for __ftrace_make_call().

Make them common.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/96f53c237316dab4b1b8c682685266faa92da816.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Finalise cleanup around ABI use
Christophe Leroy [Mon, 9 May 2022 05:36:08 +0000 (07:36 +0200)]
powerpc: Finalise cleanup around ABI use

Now that we have CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2,
get rid of all indirect detection of ABI version.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/709d9d69523c14c8a9fba4486395dca0f2d675b1.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}
Christophe Leroy [Mon, 9 May 2022 05:36:07 +0000 (07:36 +0200)]
powerpc: Replace PPC64_ELF_ABI_v{1/2} by CONFIG_PPC64_ELF_ABI_V{1/2}

Replace all uses of PPC64_ELF_ABI_v1 and PPC64_ELF_ABI_v2 by
CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2 respectively.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ba13d59e8c50bc9aa6328f1c7f0c0d0278e0a3a7.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2
Christophe Leroy [Mon, 9 May 2022 05:36:06 +0000 (07:36 +0200)]
powerpc: Add CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2

At present, we use CONFIG_CPU_LITTLE_ENDIAN and
CONFIG_CPU_BIG_ENDIAN to pass -mabi=elfv1 or elfv2 to the
compiler, then define a PPC64_ELF_ABI_v1 or PPC64_ELF_ABI_v2
macro in asm/types.h based on the _CALL_ELF define set by the compiler.

Make it more straightforward with a CONFIG option that
is directly usable.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1eca1addbc550167da9841c7340a010d0c4b2200.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use patch_instruction() return directly
Christophe Leroy [Mon, 9 May 2022 05:36:05 +0000 (07:36 +0200)]
powerpc/ftrace: Use patch_instruction() return directly

Instead of returning -EPERM when patch_instruction() fails,
just return what patch_instruction returns.

That simplifies ftrace_modify_code():

   0: 94 21 ff c0  stwu    r1,-64(r1)
   4: 93 e1 00 3c  stw     r31,60(r1)
   8: 7c 7f 1b 79  mr.     r31,r3
   c: 40 80 00 30  bge     3c <ftrace_modify_code+0x3c>
  10: 93 c1 00 38  stw     r30,56(r1)
  14: 7c 9e 23 78  mr      r30,r4
  18: 7c a4 2b 78  mr      r4,r5
  1c: 80 bf 00 00  lwz     r5,0(r31)
  20: 7c 1e 28 40  cmplw   r30,r5
  24: 40 82 00 34  bne     58 <ftrace_modify_code+0x58>
  28: 83 c1 00 38  lwz     r30,56(r1)
  2c: 7f e3 fb 78  mr      r3,r31
  30: 83 e1 00 3c  lwz     r31,60(r1)
  34: 38 21 00 40  addi    r1,r1,64
  38: 48 00 00 00  b       38 <ftrace_modify_code+0x38>
                        38: R_PPC_REL24 patch_instruction

Before:

   0: 94 21 ff c0  stwu    r1,-64(r1)
   4: 93 e1 00 3c  stw     r31,60(r1)
   8: 7c 7f 1b 79  mr.     r31,r3
   c: 40 80 00 4c  bge     58 <ftrace_modify_code+0x58>
  10: 93 c1 00 38  stw     r30,56(r1)
  14: 7c 9e 23 78  mr      r30,r4
  18: 7c a4 2b 78  mr      r4,r5
  1c: 80 bf 00 00  lwz     r5,0(r31)
  20: 7c 08 02 a6  mflr    r0
  24: 90 01 00 44  stw     r0,68(r1)
  28: 7c 1e 28 40  cmplw   r30,r5
  2c: 40 82 00 48  bne     74 <ftrace_modify_code+0x74>
  30: 7f e3 fb 78  mr      r3,r31
  34: 48 00 00 01  bl      34 <ftrace_modify_code+0x34>
                        34: R_PPC_REL24 patch_instruction
  38: 80 01 00 44  lwz     r0,68(r1)
  3c: 20 63 00 00  subfic  r3,r3,0
  40: 83 c1 00 38  lwz     r30,56(r1)
  44: 7c 63 19 10  subfe   r3,r3,r3
  48: 7c 08 03 a6  mtlr    r0
  4c: 83 e1 00 3c  lwz     r31,60(r1)
  50: 38 21 00 40  addi    r1,r1,64
  54: 4e 80 00 20  blr

It improves ftrace activation/deactivation duration by about 3%.

Modify patch_instruction() return on failure to -EPERM in order to
match with ftrace expectations. Other users of patch_instruction()
do not care about the exact error value returned.
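
In C terms, the change discussed above can be sketched roughly as
follows. This is a userspace model, not the kernel source: the demo_*
names are hypothetical, the stub stands in for patch_instruction(), and
the read-and-compare step that precedes the patching is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for patch_instruction(): fails with -EPERM on a NULL address. */
static int demo_patch_instruction(uint32_t *addr, uint32_t insn)
{
	if (addr == NULL)
		return -EPERM;
	*addr = insn;
	return 0;
}

/* Old shape: the result is translated, so the call cannot be a tail call. */
static int demo_modify_before(uint32_t *addr, uint32_t insn)
{
	if (demo_patch_instruction(addr, insn))
		return -EPERM;
	return 0;
}

/* New shape: the callee's return value is passed straight through. */
static int demo_modify_after(uint32_t *addr, uint32_t insn)
{
	return demo_patch_instruction(addr, insn);
}
```

This matches the listings above: the new code ends in a plain "b" to
patch_instruction, while the old code needs "bl" plus the link register
save and restore around it.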

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/49a8597230713e2633e7d9d7b56140787c4a7e20.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Inline ftrace_modify_code()
Christophe Leroy [Mon, 9 May 2022 05:36:04 +0000 (07:36 +0200)]
powerpc/ftrace: Inline ftrace_modify_code()

Inlining ftrace_modify_code() slightly increases the size of the
ftrace code but brings a 5% improvement in ftrace activation.

Usually in C files we let gcc decide what to do, but here it really
helps to nudge gcc towards inlining, though we don't want to force it
with an __always_inline, which would be too much for
CONFIG_CC_OPTIMIZE_FOR_SIZE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1597a06d57cfc80e6853838c4066e799bf6c7977.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/code-patching: Inline create_branch()
Christophe Leroy [Mon, 9 May 2022 05:36:03 +0000 (07:36 +0200)]
powerpc/code-patching: Inline create_branch()

create_branch() is a good candidate for inlining because:
- Flags can be folded in.
- Range tests are likely to be already done.

This reduces create_branch() to just a handful of instructions.

So inline it.

It improves ftrace activation by 10%.
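
To make the folding argument concrete, here is a userspace sketch
modelled on create_branch(). The demo_ prefix, the use of uint32_t in
place of the kernel's instruction type, and passing the address as an
integer are assumptions made for this example.

```c
#include <stdint.h>

#define BRANCH_SET_LINK	0x1	/* emit bl instead of b */
#define BRANCH_ABSOLUTE	0x2	/* target is an absolute address */

/* Returns 0 and fills *instr on success, 1 if the target is out of range. */
static inline int demo_create_branch(uint32_t *instr, unsigned long addr,
				     unsigned long target, int flags)
{
	long offset = (long)target;

	*instr = 0;
	if (!(flags & BRANCH_ABSOLUTE))
		offset -= (long)addr;

	/* I-form branches hold a signed, word-aligned 26-bit displacement. */
	if (offset < -0x2000000 || offset > 0x1fffffc || (offset & 0x3))
		return 1;

	/* 0x48000000 is primary opcode 18; mask flags and offset apart. */
	*instr = 0x48000000u | (flags & 0x3) | (uint32_t)(offset & 0x03fffffc);
	return 0;
}
```

When flags is a compile-time constant such as BRANCH_SET_LINK, an
inlined call lets the compiler drop the BRANCH_ABSOLUTE test and merge
the flag bits into the opcode constant.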

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/69851cc9a7bf8f03d025e6d29e165f2d0bd3bb6e.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Use is_offset_in_branch_range()
Christophe Leroy [Mon, 9 May 2022 05:36:02 +0000 (07:36 +0200)]
powerpc/ftrace: Use is_offset_in_branch_range()

Use is_offset_in_branch_range() instead of create_branch()
to check if a target is within branch range.

This patch, together with the previous one, improves
ftrace activation time by 7%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/912ae51782f5a53c44e435497c8c3fb5cc632387.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/code-patching: Inline is_offset_in_{cond}_branch_range()
Christophe Leroy [Mon, 9 May 2022 05:36:01 +0000 (07:36 +0200)]
powerpc/code-patching: Inline is_offset_in_{cond}_branch_range()

The tests in is_offset_in_branch_range() and
is_offset_in_cond_branch_range() are simple and worth inlining.
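
For reference, the two range tests amount to a few comparisons each. The
sketch below is a userspace rendering under assumed names (the demo_
prefix is hypothetical); the bounds follow the displacement fields of
the unconditional and conditional branch encodings.

```c
#include <stdbool.h>

/* Unconditional (I-form) branch: about +/-32MB, word aligned. */
static inline bool demo_offset_in_branch_range(long offset)
{
	return offset >= -0x2000000 && offset <= 0x1fffffc && !(offset & 0x3);
}

/* Conditional (B-form) branch: about +/-32KB, word aligned. */
static inline bool demo_offset_in_cond_branch_range(long offset)
{
	return offset >= -0x8000 && offset <= 0x7ffc && !(offset & 0x3);
}
```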

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a05be0ccb7373e6a9789a1988fcd0c810f5f9269.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Remove redundant create_branch() calls
Christophe Leroy [Mon, 9 May 2022 05:36:00 +0000 (07:36 +0200)]
powerpc/ftrace: Remove redundant create_branch() calls

Since commit d5937db114e4 ("powerpc/code-patching: Fix patch_branch()
return on out-of-range failure") patch_branch() fails with -ERANGE
when trying to branch out of range.

No need to perform the test twice. Remove redundant create_branch()
calls.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aa45fbad0b4b7493080835d8276c0cb4ce146503.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/ftrace: Refactor prepare_ftrace_return()
Christophe Leroy [Mon, 9 May 2022 05:35:59 +0000 (07:35 +0200)]
powerpc/ftrace: Refactor prepare_ftrace_return()

When we have CONFIG_DYNAMIC_FTRACE_WITH_ARGS,
prepare_ftrace_return() is called by ftrace_graph_func();
otherwise prepare_ftrace_return() is called from assembly.

Refactor prepare_ftrace_return() into a static
__prepare_ftrace_return() that will be called by both
prepare_ftrace_return() and ftrace_graph_func().

It will allow GCC to fold __prepare_ftrace_return() inside
ftrace_graph_func().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0d42deafe353980c66cf19d3132638c05ba9f4a9.1652074503.git.christophe.leroy@csgroup.eu
2 years agopowerpc/rtas: enture rtas_call is called with MMU enabled
Nicholas Piggin [Tue, 8 Mar 2022 13:50:46 +0000 (23:50 +1000)]
powerpc/rtas: enture rtas_call is called with MMU enabled

rtas_call must not be called with the MMU disabled because, in case
of an RTAS error, log_error is called, which requires the MMU to be
enabled. Add a test and warning for this.
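
The kind of test meant here can be modelled in userspace as a check on
two MSR bits. This is only a sketch: the function name is hypothetical,
the bit positions are assumed to match the kernel's asm/reg.h, and in
the kernel the value would come from the MSR itself, with a failed check
raising a warning, rather than being passed in as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IR (UINT64_C(1) << 5)	/* instruction relocation (MMU) on */
#define MSR_DR (UINT64_C(1) << 4)	/* data relocation (MMU) on */

/* True only when both instruction and data translation are enabled. */
static bool demo_msr_mmu_enabled(uint64_t msr)
{
	return (msr & (MSR_IR | MSR_DR)) == (MSR_IR | MSR_DR);
}
```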

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-14-npiggin@gmail.com
2 years agopowerpc/rtas: Leave MSR[RI] enabled over RTAS call
Nicholas Piggin [Tue, 8 Mar 2022 13:50:42 +0000 (23:50 +1000)]
powerpc/rtas: Leave MSR[RI] enabled over RTAS call

PAPR specifies that RTAS may be called with MSR[RI] enabled if the
calling context is recoverable, and RTAS will manage RI as necessary.
Call the rtas entry point with RI enabled, and add a check to ensure
the caller has RI enabled.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-10-npiggin@gmail.com
2 years agopowerpc/rtas: PACA can be restored directly from SPRG
Nicholas Piggin [Tue, 8 Mar 2022 13:50:40 +0000 (23:50 +1000)]
powerpc/rtas: PACA can be restored directly from SPRG

On 64-bit, PACA is saved in a SPRG so it does not need to be saved on
stack. We also don't need to mask off the top bits for real mode
addresses because the architecture does this for us.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-8-npiggin@gmail.com
2 years agopowerpc/rtas: Call enter_rtas with MSR[EE] disabled
Nicholas Piggin [Tue, 8 Mar 2022 13:50:37 +0000 (23:50 +1000)]
powerpc/rtas: Call enter_rtas with MSR[EE] disabled

Disable MSR[EE] in C code rather than asm.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-5-npiggin@gmail.com
2 years agopowerpc/rtas: Fix whitespace in rtas_entry.S
Nicholas Piggin [Tue, 8 Mar 2022 13:50:36 +0000 (23:50 +1000)]
powerpc/rtas: Fix whitespace in rtas_entry.S

The code was moved verbatim including whitespace cruft. Fix that.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-4-npiggin@gmail.com
2 years agopowerpc/rtas: Make enter_rtas a nokprobe symbol on 64-bit
Nicholas Piggin [Tue, 8 Mar 2022 13:50:35 +0000 (23:50 +1000)]
powerpc/rtas: Make enter_rtas a nokprobe symbol on 64-bit

This symbol is marked nokprobe on 32-bit but not on 64-bit. Add it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-3-npiggin@gmail.com
2 years agopowerpc/rtas: Move rtas entry assembly into its own file
Nicholas Piggin [Tue, 8 Mar 2022 13:50:34 +0000 (23:50 +1000)]
powerpc/rtas: Move rtas entry assembly into its own file

This makes working on the code a bit easier.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220308135047.478297-2-npiggin@gmail.com
2 years agopowerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZ
Nicholas Piggin [Mon, 7 Mar 2022 18:27:34 +0000 (04:27 +1000)]
powerpc/signal: Report minimum signal frame size to userspace via AT_MINSIGSTKSZ

Implement the AT_MINSIGSTKSZ AUXV entry, allowing userspace to
dynamically size stack allocations in a manner forward-compatible with
new processor state saved in the signal frame.

For now these statically find the maximum signal frame size rather than
doing any runtime testing of features to minimise the size.

glibc 2.34 will take advantage of this, as will applications that use
_SC_MINSIGSTKSZ and _SC_SIGSTKSZ.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
References: 94b07c1f8c39 ("arm64: signal: Report signal frame size to userspace via auxv")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220307182734.289289-2-npiggin@gmail.com
2 years agopowerpc/64: Bump SIGSTKSZ and MINSIGSTKSZ
Nicholas Piggin [Mon, 7 Mar 2022 18:27:33 +0000 (04:27 +1000)]
powerpc/64: Bump SIGSTKSZ and MINSIGSTKSZ

The sad tale of SIGSTKSZ and MINSIGSTKSZ is documented in glibc.git
commit f7c399cff5bd ("PowerPC SIGSTKSZ"), which explains why glibc
does not use the kernel defines for these constants.

Since then in fact there has been a further expansion of the signal
stack frame size on little-endian with linux commit
573ebfa6601f ("powerpc: Increase stack redzone for 64-bit userspace to
512 bytes"), which has caused it to exceed even the glibc defines.

See kernel commit 63dee5df43a3 ("powerpc: Allow 4224 bytes of stack
expansion for the signal frame") for more details of the history of the
expansion.

Increase MINSIGSTKSZ to 8192, which is double the current glibc value
and fits the current stack frame with room to grow. SIGSTKSZ is set to
4x the minimum, by convention.

glibc will have to be updated as well.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220307182734.289289-1-npiggin@gmail.com
2 years agopowerpc/vdso: Link with ld.lld when requested
Nathan Chancellor [Wed, 11 May 2022 18:50:01 +0000 (11:50 -0700)]
powerpc/vdso: Link with ld.lld when requested

The PowerPC vDSO uses $(CC) to link, which differs from the rest of the
kernel, which uses $(LD) directly. As a result, the default linker of
the compiler is used, which may differ from the linker requested by the
builder. For example:

  $ make ARCH=powerpc LLVM=1 mrproper defconfig arch/powerpc/kernel/vdso/
  ...

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [     0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [     0] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LLVM=1 sets LD=ld.lld, but ld.lld is not used to link the vDSO; GNU ld
is, because "ld" is the default linker for clang on most Linux platforms.

This is a problem for Clang's Link Time Optimization as implemented in
the kernel because use of GNU ld with LTO requires the LLVMgold plugin,
which is not technically supported for ld.bfd per
https://llvm.org/docs/GoldPlugin.html. Furthermore, if LLVMgold.so is
missing from a user's system, the build will fail, even though LTO as it
is implemented in the kernel requires ld.lld to avoid this dependency in
the first place.

Ultimately, the PowerPC vDSO should be converted to compiling and
linking with $(CC) and $(LD) respectively but there were issues last
time this was tried, potentially due to older but supported tool
versions. To avoid regressing GCC + binutils, use the compiler option
'-fuse-ld', which tells the compiler which linker to use when it is
invoked as both the compiler and linker. Use '-fuse-ld=lld' when
LD=ld.lld has been specified (CONFIG_LD_IS_LLD) so that the vDSO is
linked with the same linker as the rest of the kernel.

  $ llvm-readelf -p .comment arch/powerpc/kernel/vdso/vdso{32,64}.so.dbg

  File: arch/powerpc/kernel/vdso/vdso32.so.dbg
  String dump of section '.comment':
  [     0] Linker: LLD 14.0.0
  [    14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: arch/powerpc/kernel/vdso/vdso64.so.dbg
  String dump of section '.comment':
  [     0] Linker: LLD 14.0.0
  [    14] clang version 14.0.0 (Fedora 14.0.0-1.fc37)

LD can be a full path to ld.lld, which will not be handled properly by
'-fuse-ld=lld' if the full path to ld.lld is outside of the compiler's
search path. '-fuse-ld' can take a path to the linker but it is
deprecated in clang 12.0.0; '--ld-path' is preferred for this scenario.

Use '--ld-path' if it is supported, as it will handle a full path or
just 'ld.lld' properly. See the LLVM commit below for the full details
of '--ld-path'.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/774
Link: https://github.com/llvm/llvm-project/commit/1bc5c84710a8c73ef21295e63c19d10a8c71f2f5
Link: https://lore.kernel.org/r/20220511185001.3269404-3-nathan@kernel.org
2 years agopowerpc/vdso: Remove unused ENTRY in linker scripts
Nathan Chancellor [Wed, 11 May 2022 18:50:00 +0000 (11:50 -0700)]
powerpc/vdso: Remove unused ENTRY in linker scripts

When linking vdso{32,64}.so.dbg with ld.lld, there is a warning about
not finding _start for the starting address:

  ld.lld: warning: cannot find entry symbol _start; not setting start address
  ld.lld: warning: cannot find entry symbol _start; not setting start address

Looking at GCC + GNU ld, the entry point address is 0x0:

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
    Entry point address:               0x0
  File: vdso64.so.dbg
    Entry point address:               0x0

This matches what ld.lld emits:

  $ powerpc64le-linux-gnu-readelf -p .comment vdso{32,64}.so.dbg

  File: vdso32.so.dbg

  String dump of section '.comment':
    [     0]  Linker: LLD 14.0.0
    [    14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  File: vdso64.so.dbg

  String dump of section '.comment':
    [     0]  Linker: LLD 14.0.0
    [    14]  clang version 14.0.0 (Fedora 14.0.0-1.fc37)

  $ llvm-readelf -h vdso{32,64}.so.dbg &| rg "(File|Entry point address):"
  File: vdso32.so.dbg
    Entry point address:               0x0
  File: vdso64.so.dbg
    Entry point address:               0x0

Remove ENTRY to remove the warning, as it is unnecessary for the vDSO to
function correctly.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220511185001.3269404-2-nathan@kernel.org
2 years agopowerpc: Export mmu_feature_keys[] as non-GPL
Kevin Hao [Tue, 29 Mar 2022 08:57:09 +0000 (16:57 +0800)]
powerpc: Export mmu_feature_keys[] as non-GPL

When mmu_feature_keys[] was introduced in commit c12e6f24d413
("powerpc: Add option to use jump label for mmu_has_feature()"),
it seemed unlikely that it would be used either directly or indirectly
by out-of-tree modules. So we exported it as GPL only.

But as the code has evolved, especially with PPC_KUAP support, it
may now be referenced indirectly by primitive macros or inline functions
such as get_user() or __copy_from_user_inatomic(). This makes it
impossible to build many non-GPL modules (such as ZFS) on the ppc
architecture. Fix this by exposing mmu_feature_keys[] to non-GPL
modules too.

Fixes: 7613f5a66bec ("powerpc/64s/kuap: Use mmu_has_feature()")
Reported-by: Nathaniel Filardo <nwfilardo@gmail.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220329085709.4132729-1-haokexin@gmail.com
2 years agopowerpc/setup: Refactor/untangle panic notifiers
Guilherme G. Piccoli [Wed, 27 Apr 2022 22:49:02 +0000 (19:49 -0300)]
powerpc/setup: Refactor/untangle panic notifiers

The panic notifiers infrastructure is a bit limited in the scope of
the callbacks - basically every kind of functionality is dropped
in a list that runs in the same point during the kernel panic path.
This is not really on par with the complexities and particularities
of architecture / hypervisors' needs, and a refactor is ongoing.

As part of this refactor, it was observed that powerpc has 2 notifiers,
with mixed goals: one is just a KASLR offset dumper, whereas the other
aims to hard-disable IRQs (necessary on panic path), warn firmware of
the panic event (fadump) and run low-level platform-specific machinery
that might stop kernel execution and never come back.

Clearly, the 2nd notifier has opposing goals: disabling IRQs / fadump
should run early, while low-level platform actions should
run late since they might not even return. Hence, this patch decouples
the notifiers, splitting them in three:

- The first one is responsible for hard-disabling IRQs and fadump,
and should run early;

- The kernel KASLR offset dumper is really an informative notifier,
harmless and may run at any moment in the panic path;

- The last notifier should run last, since it aims to perform
low-level actions for specific platforms, and might never return.
It is also only registered for 2 platforms, pseries and ps3.
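
The ordering described in the list above can be sketched in userspace C.
This is only an illustration, assuming that a higher priority value means
an earlier run, as with kernel notifier chains. The struct, the function
names and the priority values are hypothetical and are not taken from the
patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a notifier block. */
struct notifier {
	const char *name;
	int priority;	/* higher priority runs earlier */
};

/* Illustrative priorities only; the patch's actual values are not shown here. */
static struct notifier example_chain[] = {
	{ "platform_stop",      INT_MIN },	/* may never return: last */
	{ "kaslr_offset_dump",  0 },		/* informative: any time */
	{ "irq_disable_fadump", INT_MAX },	/* must run early */
};

/* qsort() comparator: sort by descending priority. */
static int by_priority_desc(const void *a, const void *b)
{
	const struct notifier *x = a, *y = b;

	return (x->priority < y->priority) - (x->priority > y->priority);
}

/* Reorder the chain in place so it is visited from highest to lowest priority. */
static void order_chain(struct notifier *chain, size_t n)
{
	assert(chain != NULL);
	qsort(chain, n, sizeof(*chain), by_priority_desc);
}
```

After order_chain(example_chain, 3), the entries are visited as
irq_disable_fadump, kaslr_offset_dump, platform_stop, which matches the
early / any time / last split described above.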

The patch also documents the notifiers better and cleans up the code,
removing a useless header.

Currently no functionality change should be observed, but after
the planned panic refactor we should expect more panic reliability
with this patch.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220427224924.592546-9-gpiccoli@igalia.com
2 years agoMerge branch 'topic/ppc-kvm' into next
Michael Ellerman [Thu, 19 May 2022 13:10:42 +0000 (23:10 +1000)]
Merge branch 'topic/ppc-kvm' into next

Merge our KVM topic branch.

2 years agoKVM: PPC: Book3S HV: Fix vcore_blocked tracepoint
Fabiano Rosas [Mon, 28 Mar 2022 21:58:31 +0000 (18:58 -0300)]
KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint

We removed most of the vcore logic from the P9 path but there's still
a tracepoint that tries to dereference vc->runner.

Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
2 years agoKVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers
Alexey Kardashevskiy [Mon, 9 May 2022 07:11:50 +0000 (17:11 +1000)]
KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers

Currently we have 2 sets of interrupt controller hypercall handlers,
for real and virtual modes. This dates from POWER8 times, when switching
the MMU on was considered an expensive operation.

POWER9 however does not have dependent threads and the MMU is enabled for
handling hcalls, so the XIVE native or XICS-on-XIVE real mode handlers
never execute on real P9 and later CPUs.

This untemplates the handlers, keeps the real mode handlers only for
XICS native (up to POWER8) and removes the rest of the dead code. Changes
in functions are mechanical except for a few missing empty lines added to
make checkpatch.pl happy.

The default implemented hcalls list already contains XICS hcalls so
no change there.

This should not cause any behavioral change.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
2 years agoKVM: PPC: Book3s: PR: Enable default TCE hypercalls
Alexey Kardashevskiy [Fri, 6 May 2022 07:37:37 +0000 (17:37 +1000)]
KVM: PPC: Book3s: PR: Enable default TCE hypercalls

When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE
were already implemented and enabled by default; however H_GET_TCE
was missed out on PR KVM (probably because the handler was in
the real mode code at the time).

This enables H_GET_TCE by default. While at it, this wraps
the checks in ifdef CONFIG_SPAPR_TCE_IOMMU just like HV KVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506073737.3823347-1-aik@ozlabs.ru
2 years agoKVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers
Alexey Kardashevskiy [Fri, 6 May 2022 05:37:55 +0000 (15:37 +1000)]
KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers

LoPAPR defines a guest-visible IOMMU with hypercalls to use it
(H_PUT_TCE etc). It was first implemented on POWER7, where hypercalls
would trap into KVM in real mode (with the MMU off). The problem with
real mode is that some memory is not available and some API usage crashes
the host, but enabling the MMU was an expensive operation.

The problems with the real mode handlers are:
1. Occasionally these cannot complete the request so the code is
copied+modified to work in the virtual mode, very little is shared;
2. The real mode handlers have to be linked into vmlinux to work;
3. An exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better
performance. However since POWER8, there has always been a bigger DMA
window which VMs use to map the entire VM memory to avoid calling
H_PUT_TCE. Such a 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT
(a bulk version of H_PUT_TCE), whose virtual mode handler is even closer
to its real mode version.

On POWER9 hypercalls trap straight to the virtual mode so the real mode
handlers never execute on POWER9 and later CPUs.

So with the current use of the DMA windows and MMU improvements in
POWER9 and later, there is no point in duplicating the code.
32-bit passed-through devices may slow down, but we do not have many
of these in practice. For example, with this applied, a 1Gbit ethernet
adapter still demonstrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from
the powernv platform.

This updates the list of implemented hcalls in KVM-HV as the realmode
handlers are removed.

This changes ABI - kvmppc_h_get_tce() moves to the KVM module and
kvmppc_find_table() is static now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2 years agoMerge branch 'fixes' into topic/ppc-kvm
Michael Ellerman [Wed, 18 May 2022 14:43:04 +0000 (00:43 +1000)]
Merge branch 'fixes' into topic/ppc-kvm

Merge our fixes branch. In particular this brings in the KVM TCE handling
fix, which is a prerequisite for a subsequent patch.

2 years agoKVM: PPC: Book3S HV: Initialize AMOR in nested entry
Fabiano Rosas [Mon, 25 Apr 2022 14:21:51 +0000 (11:21 -0300)]
KVM: PPC: Book3S HV: Initialize AMOR in nested entry

The hypervisor always sets AMOR to ~0, but let's ensure we're not
passing stale values around.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
2 years agoMerge branch 'fixes' into next
Michael Ellerman [Wed, 18 May 2022 14:11:51 +0000 (00:11 +1000)]
Merge branch 'fixes' into next

Merge our fixes branch from this cycle. In particular this brings in a
papr_scm.c change which a subsequent patch has a dependency on.

2 years agoKVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()
Bo Liu [Fri, 1 Apr 2022 06:52:52 +0000 (02:52 -0400)]
KVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()

The return value type defined in the function kvm_age_rmapp() is
"bool", but the return value type defined in the implementation of the
function kvm_age_rmapp() is "int".

Change the return value type to "bool".

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220401065252.36472-1-liubo03@inspur.com
2 years agoKVM: PPC: Book3S HV: fix incorrect NULL check on list iterator
Xiaomeng Tong [Thu, 14 Apr 2022 06:21:03 +0000 (14:21 +0800)]
KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator

The bug is here:
  if (!p)
          return ret;

The list iterator value 'p' will *always* be set and non-NULL by
list_for_each_entry(), so it is incorrect to assume that the iterator
value will be NULL if the list is empty or no element is found.

To fix the bug, use a new variable 'iter' as the list iterator, and use
the old variable 'p' as a dedicated pointer to the found element.
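
The pattern can be sketched in self-contained userspace C. This is a
simplified illustration, not the patched function: the macros below only
mimic the kernel ones (the kernel's list_for_each_entry() infers the entry
type itself, whereas this version takes it as an explicit argument), and
struct item and find_item() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Simplified iterator. When the loop ends without a break, 'pos' is
 * computed from the list head itself, so it is never NULL and must not
 * be used as an entry.
 */
#define list_for_each_entry(pos, head, type, member)		\
	for (pos = container_of((head)->next, type, member);	\
	     &pos->member != (head);				\
	     pos = container_of(pos->member.next, type, member))

struct item {
	int key;
	struct list_head node;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Returns the matching item, or NULL when no entry has the given key. */
static struct item *find_item(struct list_head *head, int key)
{
	struct item *p = NULL, *iter;

	list_for_each_entry(iter, head, struct item, node) {
		if (iter->key == key) {
			p = iter;	/* only set when an element is found */
			break;
		}
	}
	return p;	/* safe to test with if (!p) */
}
```

Because 'p' is assigned only inside the loop body, a NULL check on it is
meaningful for both an empty list and a list with no match, which is not
true of the iterator itself.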

Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com
2 years agoKVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() comment
Bagas Sanjaya [Fri, 6 May 2022 07:07:47 +0000 (14:07 +0700)]
KVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() comment

The kernel test robot reported a kernel-doc warning for rm_host_ipi_action():

   arch/powerpc/kvm/book3s_hv_rm_xics.c:887: warning: This comment starts with '/**', but isn't a kernel-doc comment.
    * Host Operations poked by RM KVM

Since the function is static, remove the extraneous (second) asterisk at
the head of function comment.

Fixes: 0c2a66062470cd ("KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linux-doc/202204252334.Cd2IsiII-lkp@intel.com/
Link: https://lore.kernel.org/r/20220506070747.16309-1-bagasdotme@gmail.com
2 years agoKVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting
Nicholas Piggin [Thu, 3 Mar 2022 05:33:15 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting

The L1 should not be able to adjust LPES mode for the L2. Setting LPES
if the L0 needs it clear would cause external interrupts to be sent to
L2 and missed by the L0.

Clearing LPES when it may be set, as typically happens with XIVE enabled,
could cause a performance issue despite having no native XIVE support in
the guest, because it will cause mediated interrupts for the L2 to be
taken in HV mode, which then have to be injected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context
Nicholas Piggin [Thu, 3 Mar 2022 05:33:14 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context

The PowerNV L0 currently pushes the OS xive context when running a vCPU,
regardless of whether it is running a nested guest. The problem is that
xive OS ring interrupts will be delivered while the L2 is running.

At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which
actually makes external interrupts go to the L0. That causes the L2 to
exit and the interrupt to be taken or injected into the L1, so in some
respects this behaves like an escalation. It's not clear whether this was
deliberate; there's no comment about it and the L1 is actually
allowed to clear LPES in the L2, so it's confusing at best.

When the L2 is running, the L1 is essentially in a ceded state with
respect to external interrupts (it can't respond to them directly and
won't get scheduled again absent some additional event). So the natural
way to solve this is, when the L0 handles an H_ENTER_NESTED hypercall to
run the L2, to have it arm the escalation interrupt and not push the L1
context while running the L2.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-6-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Split !nested case out from guest entry
Nicholas Piggin [Thu, 3 Mar 2022 05:33:13 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV P9: Split !nested case out from guest entry

The differences between nested and !nested will become larger in
later changes so split them out for readability.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-5-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming
Nicholas Piggin [Thu, 3 Mar 2022 05:33:12 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming

Move the cede abort logic out of xive escalation rearming and into
the caller to prepare for handling a similar case with nested guest
entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-4-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry
Nicholas Piggin [Thu, 3 Mar 2022 05:33:11 +0000 (15:33 +1000)]
KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry

If there is a pending xive interrupt, inject it at guest entry (if
MSR[EE] is enabled) rather than take another interrupt when the guest
is entered. If xive is enabled then LPCR[LPES] is set so this behaviour
should be expected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-3-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS
Nicholas Piggin [Sun, 23 Jan 2022 12:00:43 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS

KVMPPC_NR_LPIDS no longer represents any size restriction on the
LPID space and can be removed. A CPU with more than 12 LPID bits
implemented will now be able to create more than 4095 guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-7-npiggin@gmail.com
2 years agoKVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum
Nicholas Piggin [Sun, 23 Jan 2022 12:00:42 +0000 (22:00 +1000)]
KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum

Rather than tie this to KVMPPC_NR_LPIDS which is becoming more dynamic,
fix it to 4096 (12-bits) explicitly for now.

kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS
because the L1 partition table registration hcall already did that, and
it checks against the partition table size.

This patch also puts all the partition table size calculations into the
same form, using 12 for the architected size field shift and 4 for the
shift corresponding to the partition table entry size.
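
The arithmetic behind that form can be sketched as below. This is a
minimal illustration, assuming the table occupies 2^(size field + 12)
bytes and each entry is 2^4 = 16 bytes; the macro and function names are
hypothetical, not the kernel's.

```c
#include <assert.h>

#define PT_SIZE_SHIFT_BASE	12	/* architected size field shift */
#define PT_ENTRY_SHIFT		4	/* log2 of the entry size (16 bytes) */

/* Number of partition table entries for a given size field value. */
static unsigned long pt_nr_entries(unsigned int size_field)
{
	return 1UL << (size_field + PT_SIZE_SHIFT_BASE - PT_ENTRY_SHIFT);
}

/* Size field value needed for a table holding 2^lpid_bits entries. */
static unsigned int pt_size_field(unsigned int lpid_bits)
{
	/* Smaller tables cannot be expressed with a non-negative size field. */
	assert(lpid_bits + PT_ENTRY_SHIFT >= PT_SIZE_SHIFT_BASE);
	return lpid_bits + PT_ENTRY_SHIFT - PT_SIZE_SHIFT_BASE;
}
```

Under these assumptions, a 12-bit LPID space needs a size field of 4,
and a size field of 4 gives 4096 entries, which lines up with the
explicit 4096 maximum.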

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr
Nicholas Piggin [Sun, 23 Jan 2022 12:00:41 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr

This removes the fixed sized kvm->arch.nested_guests array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-5-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: Use IDA allocator for LPID allocator
Nicholas Piggin [Sun, 23 Jan 2022 12:00:40 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator

This removes the fixed-size lpid_inuse array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-4-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested
Nicholas Piggin [Sun, 23 Jan 2022 12:00:39 +0000 (22:00 +1000)]
KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested

The LPID allocator init is changed to:
- use mmu_lpid_bits rather than hard-coding;
- use KVM_MAX_NESTED_GUESTS for nested hypervisors;
- not reserve the top LPID on POWER9 and newer CPUs.

The reserved LPID is made a POWER7/8-specific detail.
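
As a rough sketch of the three rules above, the size of the range handed
to the allocator could be computed as follows. This is an illustration
only: the function name, its parameters and the 4096 value used for the
nested limit are assumptions, not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed nested guest limit, corresponding to a 12-bit LPID space. */
#define MAX_NESTED_GUESTS	4096UL

/*
 * Size of the LPID range given to the allocator (hypothetical helper).
 * lpid_bits stands in for mmu_lpid_bits.
 */
static unsigned long lpid_range_size(unsigned int lpid_bits,
				     bool nested_hv, bool power9_or_later)
{
	unsigned long nr;

	assert(lpid_bits < 32);

	if (nested_hv)
		return MAX_NESTED_GUESTS;

	nr = 1UL << lpid_bits;
	if (!power9_or_later)
		nr -= 1;	/* top LPID stays reserved on POWER7/8 */

	return nr;
}
```

With 12 LPID bits this gives 4096 on POWER9 and newer, and 4095 on
older CPUs where the top LPID is kept back.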

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com
2 years agoKVM: PPC: Remove kvmppc_claim_lpid
Nicholas Piggin [Sun, 23 Jan 2022 12:00:38 +0000 (22:00 +1000)]
KVM: PPC: Remove kvmppc_claim_lpid

Removing kvmppc_claim_lpid simplifies the lpid allocator API, which makes
it easier to change the underlying implementation in a future patch.

The host LPID is always 0, so that can be a detail of the allocator. If
the allocator range is restricted, that can reserve LPIDs at the top of
the range. This allows kvmppc_claim_lpid to be removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
2 years agoKVM: PPC: Book3S HV P9: Optimise loads around context switch
Nicholas Piggin [Sun, 23 Jan 2022 11:47:25 +0000 (21:47 +1000)]
KVM: PPC: Book3S HV P9: Optimise loads around context switch

It is better to get all loads for the register values in flight
before starting to switch LPID, PID, and LPCR because those
mtSPRs are expensive and serialising.

This also just tidies up the code for a potential future change
to the context switching sequence.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123114725.3549202-1-npiggin@gmail.com