linux-block.git
10 months agoRISC-V: Probe misaligned access speed in parallel
Evan Green [Mon, 6 Nov 2023 22:58:55 +0000 (14:58 -0800)]
RISC-V: Probe misaligned access speed in parallel

Probing for misaligned access speed takes about 0.06 seconds. On a
system with 64 cores, doing this in smp_callin() means it's done
serially, extending boot time by 3.8 seconds. That's a lot of boot time.

Instead of measuring each CPU serially, let's do the measurements on
all CPUs in parallel. If we disable preemption on all CPUs, the
jiffies stop ticking, so we can do this in stages of 1) everybody
except core 0, then 2) core 0. The allocations are all done outside of
on_each_cpu() to avoid calling alloc_pages() with interrupts disabled.

For hotplugged CPUs that come in after the boot time measurement,
register CPU hotplug callbacks, and do the measurement there. Interrupts
are enabled in those callbacks, so they're fine to do alloc_pages() in.

Reported-by: Jisheng Zhang <jszhang@kernel.org>
Closes: https://lore.kernel.org/all/mhng-9359993d-6872-4134-83ce-c97debe1cf9a@palmer-ri-x1c9/T/#mae9b8f40016f9df428829d33360144dc5026bcbf
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231106225855.3121724-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: Remove __init on unaligned_emulation_finish()
Evan Green [Mon, 6 Nov 2023 23:11:05 +0000 (15:11 -0800)]
RISC-V: Remove __init on unaligned_emulation_finish()

This function shouldn't be __init, since it's called during hotplug. The
warning says it well enough:

WARNING: modpost: vmlinux: section mismatch in reference:
check_unaligned_access_all_cpus+0x13a (section: .text) ->
unaligned_emulation_finish (section: .init.text)

Signed-off-by: Evan Green <evan@rivosinc.com>
Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe")
Link: https://lore.kernel.org/r/20231106231105.3141413-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: Show accurate per-hart isa in /proc/cpuinfo
Evan Green [Mon, 6 Nov 2023 23:24:39 +0000 (15:24 -0800)]
RISC-V: Show accurate per-hart isa in /proc/cpuinfo

In /proc/cpuinfo, most of the information we show for each processor is
specific to that hart: marchid, mvendorid, mimpid, processor, hart,
compatible, and the mmu size. But the ISA string gets filtered through a
lowest common denominator mask, so that if one CPU is missing an ISA
extension, no CPUs will show it.

Now that we track the ISA extensions for each hart, let's report ISA
extension info accurately per-hart in /proc/cpuinfo. We cannot change
the "isa:" line, as usermode may be relying on that line to show only
the common set of extensions supported across all harts. Add a new "hart
isa" line instead, which reports the true set of extensions for that
hart.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231106232439.3176268-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: Don't rely on positional structure initialization
Palmer Dabbelt [Tue, 7 Nov 2023 15:55:29 +0000 (07:55 -0800)]
RISC-V: Don't rely on positional structure initialization

Without this I get a bunch of warnings along the lines of

    arch/riscv/kernel/module.c:535:26: error: positional initialization of field in 'struct' declared with 'designated_init' attribute [-Werror=designated-init]
      535 |         [R_RISCV_32] = { apply_r_riscv_32_rela },

This just mades the member initializers explicit instead of positional.
I also aligned some of the table, but mostly just to make the batch
editing go faster.

Fixes: b51fc88cb35e ("Merge patch series "riscv: Add remaining module relocations and tests"")
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231107155529.8368-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "riscv: Add remaining module relocations and tests"
Palmer Dabbelt [Tue, 7 Nov 2023 22:59:35 +0000 (14:59 -0800)]
Merge patch series "riscv: Add remaining module relocations and tests"

Charlie Jenkins <charlie@rivosinc.com> says:

A handful of module relocations were missing, this patch includes the
remaining ones. I also wrote some test cases to ensure that module
loading works properly. Some relocations cannot be supported in the
kernel, these include the ones that rely on thread local storage and
dynamic linking.

This patch also overhauls the implementation of ADD/SUB/SET/ULEB128
relocations to handle overflow. "Overflow" is different for ULEB128
since it is a variable-length encoding that the compiler can be expected
to generate enough space for. Instead of overflowing, ULEB128 will
expand into the next 8-bit segment of the location.

A psABI proposal [1] was merged that mandates that SET_ULEB128 and
SUB_ULEB128 are paired, however the discussion following the merging of
the pull request revealed that while the pull request was valid, it
would be better for linkers to properly handle this overflow. This patch
proactively implements this methodology for future compatibility.

This can be tested by enabling KUNIT, RUNTIME_KERNEL_TESTING_MENU, and
RISCV_MODULE_LINKING_KUNIT.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/403

* b4-shazam-merge:
  riscv: Add tests for riscv module loading
  riscv: Add remaining module relocations
  riscv: Avoid unaligned access when relocating modules

Link: https://lore.kernel.org/r/20231101-module_relocations-v9-0-8dfa3483c400@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Add tests for riscv module loading
Charlie Jenkins [Wed, 1 Nov 2023 18:33:01 +0000 (11:33 -0700)]
riscv: Add tests for riscv module loading

Add test cases for the two main groups of relocations added: SUB and
SET, along with uleb128.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231101-module_relocations-v9-3-8dfa3483c400@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Add remaining module relocations
Charlie Jenkins [Wed, 1 Nov 2023 18:33:00 +0000 (11:33 -0700)]
riscv: Add remaining module relocations

Add all final module relocations and add error logs explaining the ones
that are not supported. Implement overflow checks for
ADD/SUB/SET/ULEB128 relocations.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231101-module_relocations-v9-2-8dfa3483c400@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Avoid unaligned access when relocating modules
Emil Renner Berthing [Wed, 1 Nov 2023 18:32:59 +0000 (11:32 -0700)]
riscv: Avoid unaligned access when relocating modules

With the C-extension regular 32bit instructions are not
necessarily aligned on 4-byte boundaries. RISC-V instructions
are in fact an ordered list of 16bit little-endian
"parcels", so access the instruction as such.

This should also make the code work in case someone builds
a big-endian RISC-V machine.

Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20231101-module_relocations-v9-1-8dfa3483c400@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: split cache ops out of dma-noncoherent.c
Christoph Hellwig [Sat, 28 Oct 2023 15:51:01 +0000 (17:51 +0200)]
riscv: split cache ops out of dma-noncoherent.c

The cache ops are also used by the pmem code which is unconditionally
built into the kernel.  Move them into a separate file that is built
based on the correct config option.

Fixes: fd962781270e ("riscv: RISCV_NONSTANDARD_CACHE_OPS shouldn't depend on RISCV_DMA_NONCOHERENT")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> #
Link: https://lore.kernel.org/r/20231028155101.1039049-1-hch@lst.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "riscv: tlb flush improvements"
Palmer Dabbelt [Tue, 7 Nov 2023 06:49:30 +0000 (22:49 -0800)]
Merge patch series "riscv: tlb flush improvements"

Alexandre Ghiti <alexghiti@rivosinc.com> says:

This series optimizes the tlb flushes on riscv which used to simply
flush the whole tlb whatever the size of the range to flush or the size
of the stride.

Patch 3 introduces a threshold that is microarchitecture specific and
will very likely be modified by vendors, not sure though which mechanism
we'll use to do that (dt? alternatives? vendor initialization code?).

* b4-shazam-merge:
  riscv: Improve flush_tlb_kernel_range()
  riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
  riscv: Improve flush_tlb_range() for hugetlb pages
  riscv: Improve tlb_flush()

Link: https://lore.kernel.org/r/20231030133027.19542-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve flush_tlb_kernel_range()
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:28 +0000 (14:30 +0100)]
riscv: Improve flush_tlb_kernel_range()

This function used to simply flush the whole tlb of all harts, be more
subtile and try to only flush the range.

The problem is that we can only use PAGE_SIZE as stride since we don't know
the size of the underlying mapping and then this function will be improved
only if the size of the region to flush is < threshold * PAGE_SIZE.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231030133027.19542-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:27 +0000 (14:30 +0100)]
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb

Currently, when the range to flush covers more than one page (a 4K page or
a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
tlb comes with a greater cost than flushing a single entry so we should
flush single entries up to a certain threshold so that:
threshold * cost of flushing a single entry < cost of flushing the whole
tlb.

Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231030133027.19542-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve flush_tlb_range() for hugetlb pages
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:26 +0000 (14:30 +0100)]
riscv: Improve flush_tlb_range() for hugetlb pages

flush_tlb_range() uses a fixed stride of PAGE_SIZE and in its current form,
when a hugetlb mapping needs to be flushed, flush_tlb_range() flushes the
whole tlb: so set a stride of the size of the hugetlb mapping in order to
only flush the hugetlb mapping. However, if the hugepage is a NAPOT region,
all PTEs that constitute this mapping must be invalidated, so the stride
size must actually be the size of the PTE.

Note that THPs are directly handled by flush_pmd_tlb_range().

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Link: https://lore.kernel.org/r/20231030133027.19542-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve tlb_flush()
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:25 +0000 (14:30 +0100)]
riscv: Improve tlb_flush()

For now, tlb_flush() simply calls flush_tlb_mm() which results in a
flush of the whole TLB. So let's use mmu_gather fields to provide a more
fine-grained flush of the TLB.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Link: https://lore.kernel.org/r/20231030133027.19542-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: select ARCH_PROC_KCORE_TEXT
Andreas Schwab [Tue, 31 Oct 2023 11:40:47 +0000 (12:40 +0100)]
riscv: select ARCH_PROC_KCORE_TEXT

This adds a separate segment for kernel text in /proc/kcore, which has a
different address than the direct linear map.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Link: https://lore.kernel.org/r/mvmh6m758ao.fsf@suse.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: kernel: Use correct SYM_DATA_*() macro for data
Clément Léger [Tue, 24 Oct 2023 13:26:53 +0000 (15:26 +0200)]
riscv: kernel: Use correct SYM_DATA_*() macro for data

Some data were incorrectly annotated with SYM_FUNC_*() instead of
SYM_DATA_*() ones. Use the correct ones.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231024132655.730417-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Use SYM_*() assembly macros instead of deprecated ones
Clément Léger [Tue, 24 Oct 2023 13:26:52 +0000 (15:26 +0200)]
riscv: Use SYM_*() assembly macros instead of deprecated ones

ENTRY()/END()/WEAK() macros are deprecated and we should make use of the
new SYM_*() macros [1] for better annotation of symbols. Replace the
deprecated ones with the new ones and fix wrong usage of END()/ENDPROC()
to correctly describe the symbols.

[1] https://docs.kernel.org/core-api/asm-annotations.html

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231024132655.730417-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: use ".L" local labels in assembly when applicable
Clément Léger [Tue, 24 Oct 2023 13:26:51 +0000 (15:26 +0200)]
riscv: use ".L" local labels in assembly when applicable

For the sake of coherency, use local labels in assembly when
applicable. This also avoid kprobes being confused when applying a
kprobe since the size of function is computed by checking where the
next visible symbol is located. This might end up in computing some
function size to be way shorter than expected and thus failing to apply
kprobes to the specified offset.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231024132655.730417-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: boot: Fix creation of loader.bin
Geert Uytterhoeven [Tue, 24 Oct 2023 14:53:18 +0000 (16:53 +0200)]
riscv: boot: Fix creation of loader.bin

When flashing loader.bin for K210 using kflash:

    [ERROR] This is an ELF file and cannot be programmed to flash directly: arch/riscv/boot/loader.bin

Before, loader.bin relied on "OBJCOPYFLAGS := -O binary" in the main
RISC-V Makefile to create a boot image with the right format.  With this
removed, the image is now created in the wrong (ELF) format.

Fix this by adding an explicit rule.

Fixes: 505b02957e74f0c5 ("riscv: Remove duplicate objcopy flag")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/1086025809583809538dfecaa899892218f44e7e.1698159066.git.geert+renesas@glider.be
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "riscv: tlb flush improvements"
Palmer Dabbelt [Mon, 6 Nov 2023 15:20:54 +0000 (07:20 -0800)]
Merge patch series "riscv: tlb flush improvements"

Alexandre Ghiti <alexghiti@rivosinc.com> says:

This series optimizes the tlb flushes on riscv which used to simply
flush the whole tlb whatever the size of the range to flush or the size
of the stride.

Patch 3 introduces a threshold that is microarchitecture specific and
will very likely be modified by vendors, not sure though which mechanism
we'll use to do that (dt? alternatives? vendor initialization code?).

* b4-shazam-merge:
  riscv: Improve flush_tlb_kernel_range()
  riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
  riscv: Improve flush_tlb_range() for hugetlb pages
  riscv: Improve tlb_flush()

Link: https://lore.kernel.org/r/20231030133027.19542-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve flush_tlb_kernel_range()
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:28 +0000 (14:30 +0100)]
riscv: Improve flush_tlb_kernel_range()

This function used to simply flush the whole tlb of all harts, be more
subtile and try to only flush the range.

The problem is that we can only use PAGE_SIZE as stride since we don't know
the size of the underlying mapping and then this function will be improved
only if the size of the region to flush is < threshold * PAGE_SIZE.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231030133027.19542-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:27 +0000 (14:30 +0100)]
riscv: Make __flush_tlb_range() loop over pte instead of flushing the whole tlb

Currently, when the range to flush covers more than one page (a 4K page or
a hugepage), __flush_tlb_range() flushes the whole tlb. Flushing the whole
tlb comes with a greater cost than flushing a single entry so we should
flush single entries up to a certain threshold so that:
threshold * cost of flushing a single entry < cost of flushing the whole
tlb.

Co-developed-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231030133027.19542-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve flush_tlb_range() for hugetlb pages
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:26 +0000 (14:30 +0100)]
riscv: Improve flush_tlb_range() for hugetlb pages

flush_tlb_range() uses a fixed stride of PAGE_SIZE and in its current form,
when a hugetlb mapping needs to be flushed, flush_tlb_range() flushes the
whole tlb: so set a stride of the size of the hugetlb mapping in order to
only flush the hugetlb mapping. However, if the hugepage is a NAPOT region,
all PTEs that constitute this mapping must be invalidated, so the stride
size must actually be the size of the PTE.

Note that THPs are directly handled by flush_pmd_tlb_range().

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Link: https://lore.kernel.org/r/20231030133027.19542-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve tlb_flush()
Alexandre Ghiti [Mon, 30 Oct 2023 13:30:25 +0000 (14:30 +0100)]
riscv: Improve tlb_flush()

For now, tlb_flush() simply calls flush_tlb_mm() which results in a
flush of the whole TLB. So let's use mmu_gather fields to provide a more
fine-grained flush of the TLB.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/Five SMARC
Link: https://lore.kernel.org/r/20231030133027.19542-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: mm: update T-Head memory type definitions
Jisheng Zhang [Tue, 12 Sep 2023 07:25:10 +0000 (15:25 +0800)]
riscv: mm: update T-Head memory type definitions

Update T-Head memory type definitions according to C910 doc [1]
For NC and IO, SH property isn't configurable, hardcoded as SH,
so set SH for NOCACHE and IO.

And also set bit[61](Bufferable) for NOCACHE according to the
table 6.1 in the doc [1].

Link: https://github.com/T-head-Semi/openc910
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Drew Fustini <dfustini@baylibre.com>
Link: https://lore.kernel.org/r/20230912072510.2510-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "riscv: vdso.lds.S: some improvement"
Palmer Dabbelt [Sun, 5 Nov 2023 22:15:17 +0000 (14:15 -0800)]
Merge patch series "riscv: vdso.lds.S: some improvement"

Jisheng Zhang <jszhang@kernel.org> says:

This series renews one of my last year RFC patch[1], tries to improve
the vdso layout a bit.

patch1 removes useless symbols
patch2 merges .data section of vdso into .rodata because they are
readonly
patch3 is the real renew patch, it removes hardcoded 0x800 .text start
addr. But I rewrite the commit msg per Andrew's suggestions and move
move .note, .eh_frame_hdr, and .eh_frame between .rodata and .text to
keep the actual code well away from the non-instruction data.

* b4-shazam-merge:
  riscv: vdso.lds.S: remove hardcoded 0x800 .text start addr
  riscv: vdso.lds.S: merge .data section into .rodata section
  riscv: vdso.lds.S: drop __alt_start and __alt_end symbols

Link: https://lore.kernel.org/linux-riscv/20221123161805.1579-1-jszhang@kernel.org/
Link: https://lore.kernel.org/r/20230912072015.2424-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: vdso.lds.S: remove hardcoded 0x800 .text start addr
Jisheng Zhang [Tue, 12 Sep 2023 07:20:15 +0000 (15:20 +0800)]
riscv: vdso.lds.S: remove hardcoded 0x800 .text start addr

I believe the hardcoded 0x800 and related comments come from the long
history VDSO_TEXT_OFFSET in x86 vdso code, but commit 5b9304933730
("x86 vDSO: generate vdso-syms.lds") and commit f6b46ebf904f ("x86
vDSO: new layout") removes the comment and hard coding for x86.

Similar as x86 and other arch, riscv doesn't need the rigid layout
using VDSO_TEXT_OFFSET since it "no longer matters to the kernel".
so we could remove the hard coding now, and removing it brings a
small vdso.so and aligns with other architectures.

Also, having enough separation between data and text is important for
I-cache, so similar as x86, move .note, .eh_frame_hdr, and .eh_frame
between .rodata and .text.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20230912072015.2424-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: vdso.lds.S: merge .data section into .rodata section
Jisheng Zhang [Tue, 12 Sep 2023 07:20:14 +0000 (15:20 +0800)]
riscv: vdso.lds.S: merge .data section into .rodata section

The .data section doesn't need to be separate from .rodata section,
they are both readonly.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20230912072015.2424-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: vdso.lds.S: drop __alt_start and __alt_end symbols
Jisheng Zhang [Tue, 12 Sep 2023 07:20:13 +0000 (15:20 +0800)]
riscv: vdso.lds.S: drop __alt_start and __alt_end symbols

These two symbols are not used, remove them.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20230912072015.2424-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: add userland instruction dump to RISC-V splats
Yunhui Cui [Tue, 12 Sep 2023 02:13:49 +0000 (10:13 +0800)]
riscv: add userland instruction dump to RISC-V splats

Add userland instruction dump and rename dump_kernel_instr()
to dump_instr().

An example:
[    0.822439] Freeing unused kernel image (initmem) memory: 6916K
[    0.823817] Run /init as init process
[    0.839411] init[1]: unhandled signal 4 code 0x1 at 0x000000000005be18 in bb[10000+5fb000]
[    0.840751] CPU: 0 PID: 1 Comm: init Not tainted 5.14.0-rc4-00049-gbd644290aa72-dirty #187
[    0.841373] Hardware name:  , BIOS
[    0.841743] epc : 000000000005be18 ra : 0000000000079e74 sp : 0000003fffcafda0
[    0.842271]  gp : ffffffff816e9dc8 tp : 0000000000000000 t0 : 0000000000000000
[    0.842947]  t1 : 0000003fffc9fdf0 t2 : 0000000000000000 s0 : 0000000000000000
[    0.843434]  s1 : 0000000000000000 a0 : 0000003fffca0190 a1 : 0000003fffcafe18
[    0.843891]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[    0.844357]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
[    0.844803]  s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
[    0.845253]  s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[    0.845722]  s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[    0.846180]  s11: 0000000000d144e0 t3 : 0000000000000000 t4 : 0000000000000000
[    0.846616]  t5 : 0000000000000000 t6 : 0000000000000000
[    0.847204] status: 0000000200000020 badaddr: 00000000f0028053 cause: 0000000000000002
[    0.848219] Code: f06f ff5f 3823 fa11 0113 fb01 2e23 0201 0293 0000 (8053) f002
[    0.851016] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004

Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230912021349.28302-1-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: kprobes: allow writing to x0
Nam Cao [Tue, 29 Aug 2023 18:25:00 +0000 (20:25 +0200)]
riscv: kprobes: allow writing to x0

Instructions can write to x0, so we should simulate these instructions
normally.

Currently, the kernel hangs if an instruction who writes to x0 is
simulated.

Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported")
Cc: stable@vger.kernel.org
Signed-off-by: Nam Cao <namcaov@gmail.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Acked-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230829182500.61875-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: provide riscv-specific is_trap_insn()
Nam Cao [Tue, 29 Aug 2023 08:36:15 +0000 (10:36 +0200)]
riscv: provide riscv-specific is_trap_insn()

uprobes expects is_trap_insn() to return true for any trap instructions,
not just the one used for installing uprobe. The current default
implementation only returns true for 16-bit c.ebreak if C extension is
enabled. This can confuse uprobes if a 32-bit ebreak generates a trap
exception from userspace: uprobes asks is_trap_insn() who says there is no
trap, so uprobes assume a probe was there before but has been removed, and
return to the trap instruction. This causes an infinite loop of entering
and exiting trap handler.

Instead of using the default implementation, implement this function
speficially for riscv with checks for both ebreak and c.ebreak.

Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Signed-off-by: Nam Cao <namcaov@gmail.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230829083614.117748-1-namcaov@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "Improve PTDUMP and introduce new fields"
Palmer Dabbelt [Sun, 5 Nov 2023 17:41:57 +0000 (09:41 -0800)]
Merge patch series "Improve PTDUMP and introduce new fields"

Yu Chien Peter Lin <peterlin@andestech.com> says:

This patchset enhances PTDUMP by providing additional information
from pagetable entries.

The first patch fixes the RSW field, while the second and third
patches introduce the PBMT and NAPOT fields, respectively, for
RV64 systems.

* b4-shazam-merge:
  riscv: Introduce NAPOT field to PTDUMP
  riscv: Introduce PBMT field to PTDUMP
  riscv: Improve PTDUMP to show RSW with non-zero value

Link: https://lore.kernel.org/r/20230921025022.3989723-1-peterlin@andestech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Introduce NAPOT field to PTDUMP
Yu Chien Peter Lin [Thu, 21 Sep 2023 02:50:22 +0000 (10:50 +0800)]
riscv: Introduce NAPOT field to PTDUMP

This patch introduces the NAPOT field to PTDUMP, allowing it
to display the letter "N" for pages that have the 63rd bit set.

Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230921025022.3989723-4-peterlin@andestech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Introduce PBMT field to PTDUMP
Yu Chien Peter Lin [Thu, 21 Sep 2023 02:50:21 +0000 (10:50 +0800)]
riscv: Introduce PBMT field to PTDUMP

This patch introduces the PBMT field to the PTDUMP, so it can
display the memory attributes for NC or IO.

Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230921025022.3989723-3-peterlin@andestech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Improve PTDUMP to show RSW with non-zero value
Yu Chien Peter Lin [Thu, 21 Sep 2023 02:50:20 +0000 (10:50 +0800)]
riscv: Improve PTDUMP to show RSW with non-zero value

RSW field can be used to encode 2 bits of software
defined information. Currently, PTDUMP only prints
"RSW" when its value is 1 or 3.

To fix this issue and improve the debugging experience
with PTDUMP, we redefine _PAGE_SPECIAL to its original
value and use _PAGE_SOFT as the RSW mask, allow it to
print the RSW with any non-zero value.

This patch also removes the val from the struct prot_bits
as it is no longer needed.

Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230921025022.3989723-2-peterlin@andestech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: capitalise CMO op macros
Conor Dooley [Fri, 15 Sep 2023 15:40:44 +0000 (16:40 +0100)]
RISC-V: capitalise CMO op macros

The CMO op macros initially used lower case, as the original iteration
of the ALT_CMO_OP alternative stringified the first parameter to
finalise the assembly for the standard variant.
As a knock-on, the T-Head versions of these CMOs had to use mixed case
defines. Commit dd23e9535889 ("RISC-V: replace cbom instructions with
an insn-def") removed the asm construction with stringify, replacing it
an insn-def macro, rending the lower-case surplus to requirements.
As far as I can tell from a brief check, CBO_zero does not see similar
use and didn't require the mixed case define in the first place.
Replace the lower case characters now for consistency with other
insn-def macros in the standard and T-Head forms, and adjust the
callsites.

Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230915-aloe-dollar-994937477776@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: don't probe unaligned access speed if already done
Jisheng Zhang [Tue, 12 Sep 2023 15:40:40 +0000 (23:40 +0800)]
riscv: don't probe unaligned access speed if already done

If misaligned_access_speed percpu var isn't so called "HWPROBE
MISALIGNED UNKNOWN", it means the probe has happened(this is possible
for example, hotplug off then hotplug on one cpu), and the percpu var
has been set, don't probe again in this case.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: 584ea6564bca ("RISC-V: Probe for unaligned access speed")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230912154040.3306-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: defconfig : add CONFIG_MMC_DW for starfive
Jinyu Tang [Tue, 12 Sep 2023 13:31:28 +0000 (21:31 +0800)]
riscv: defconfig : add CONFIG_MMC_DW for starfive

If these config not set, mmc can't run for jh7110, rootfs can't
be found when using SD card. So set CONFIG_MMC_DW=y like arm64
defconfig, and set CONFIG_MMC_DW_STARFIVE=y for starfive. Then
starfive vf2 board can start SD card rootfs with mainline defconfig
and dtb.

Signed-off-by: Jinyu Tang <tangjinyu@tinylab.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230912133128.5247-1-tangjinyu@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: signal: handle syscall restart before get_signal
Haorong Lu [Thu, 3 Aug 2023 22:44:54 +0000 (15:44 -0700)]
riscv: signal: handle syscall restart before get_signal

In the current riscv implementation, blocking syscalls like read() may
not correctly restart after being interrupted by ptrace. This problem
arises when the syscall restart process in arch_do_signal_or_restart()
is bypassed due to changes to the regs->cause register, such as an
ebreak instruction.

Steps to reproduce:
1. Interrupt the tracee process with PTRACE_SEIZE & PTRACE_INTERRUPT.
2. Backup original registers and instruction at new_pc.
3. Change pc to new_pc, and inject an instruction (like ebreak) to this
   address.
4. Resume with PTRACE_CONT and wait for the process to stop again after
   executing ebreak.
5. Restore original registers and instructions, and detach from the
   tracee process.
6. Now the read() syscall in tracee will return -1 with errno set to
   ERESTARTSYS.

Specifically, during an interrupt, the regs->cause changes from
EXC_SYSCALL to EXC_BREAKPOINT due to the injected ebreak, which is
inaccessible via ptrace so we cannot restore it. This alteration breaks
the syscall restart condition and ends the read() syscall with an
ERESTARTSYS error. According to include/linux/errno.h, it should never
be seen by user programs. X86 can avoid this issue as it checks the
syscall condition using a register (orig_ax) exposed to user space.
Arm64 handles syscall restart before calling get_signal, where it could
be paused and inspected by ptrace/debugger.

This patch adjusts the riscv implementation to arm64 style, which also
checks syscall using a kernel register (syscallno). It ensures the
syscall restart process is not bypassed when changes to the cause
register occur, providing more consistent behavior across various
architectures.

For a simplified reproduction program, feel free to visit:
https://github.com/ancientmodern/riscv-ptrace-bug-demo.

Signed-off-by: Haorong Lu <ancientmodern4@gmail.com>
Link: https://lore.kernel.org/r/20230803224458.4156006-1-ancientmodern4@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "Add support to handle misaligned accesses in S-mode"
Palmer Dabbelt [Wed, 1 Nov 2023 16:02:09 +0000 (09:02 -0700)]
Merge patch series "Add support to handle misaligned accesses in S-mode"

Clément Léger <cleger@rivosinc.com> says:

Since commit 61cadb9 ("Provide new description of misaligned load/store
behavior compatible with privileged architecture.") in the RISC-V ISA
manual, it is stated that misaligned load/store might not be supported.
However, the RISC-V kernel uABI describes that misaligned accesses are
supported. In order to support that, this series adds support for S-mode
handling of misaligned accesses as well support for prctl(PR_UNALIGN).

Handling misaligned access in kernel allows for a finer grain control
of the misaligned accesses behavior, and thanks to the prctl() call,
can allow disabling misaligned access emulation to generate SIGBUS. User
space can then optimize its software by removing such access based on
SIGBUS generation.

This series is useful when using a SBI implementation that does not
handle misaligned traps as well as detecting misaligned accesses
generated by userspace application using the prctrl(PR_SET_UNALIGN)
feature.

This series can be tested using the spike simulator[1] and a modified
openSBI version[2] which allows to always delegate misaligned load/store to
S-mode. A test[3] that exercise various instructions/registers can be
executed to verify the unaligned access support.

[1] https://github.com/riscv-software-src/riscv-isa-sim
[2] https://github.com/rivosinc/opensbi/tree/dev/cleger/no_misaligned
[3] https://github.com/clementleger/unaligned_test

* b4-shazam-merge:
  riscv: add support for PR_SET_UNALIGN and PR_GET_UNALIGN
  riscv: report misaligned accesses emulation to hwprobe
  riscv: annotate check_unaligned_access_boot_cpu() with __init
  riscv: add support for sysctl unaligned_enabled control
  riscv: add floating point insn support to misaligned access emulation
  riscv: report perf event for misaligned fault
  riscv: add support for misaligned trap handling in S-mode
  riscv: remove unused functions in traps_misaligned.c

Link: https://lore.kernel.org/r/20231004151405.521596-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: hwprobe: Fix vDSO SIGSEGV
Andrew Jones [Tue, 10 Oct 2023 16:51:02 +0000 (18:51 +0200)]
RISC-V: hwprobe: Fix vDSO SIGSEGV

A hwprobe pair key is signed, but the hwprobe vDSO function was
only checking that the upper bound was valid. In order to help
avoid this type of problem in the future, and in anticipation of
this check becoming more complicated with sparse keys, introduce
and use a "key is valid" predicate function for the check.

Fixes: aa5af0aa90ba ("RISC-V: Add hwprobe vDSO function and data")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231010165101.14942-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: configs: defconfig: Enable configs required for RZ/Five SoC
Lad Prabhakar [Fri, 29 Sep 2023 00:07:04 +0000 (01:07 +0100)]
riscv: configs: defconfig: Enable configs required for RZ/Five SoC

Enable the configs required by the below IP blocks which are
present on RZ/Five SoC:
* ADC
* CANFD
* DMAC
* eMMC/SDHI
* OSTM
* RAVB (+ Micrel PHY)
* RIIC
* RSPI
* SSI (Sound+WM8978 codec)
* Thermal
* USB (PHY/RESET/OTG)

Along with the above some core configs are enabled too,
-> CPU frequency scaling as RZ/Five does support this.
-> MTD is enabled as RSPI can be connected to flash chips
-> Enabled I2C chardev so that it enables userspace to read/write
   i2c devices (similar to arm64)
-> Thermal configs as RZ/Five SoC does have thermal unit
-> GPIO regulator as we might have IP blocks for which voltage
   levels are controlled by GPIOs
-> OTG configs as RZ/Five USB can support host/function
-> Gadget configs so that we can test USB function (as done in arm64
   all the gadget configs are enabled)

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20230929000704.53217-6-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "riscv: SCS support"
Palmer Dabbelt [Thu, 2 Nov 2023 21:05:23 +0000 (14:05 -0700)]
Merge patch series "riscv: SCS support"

Sami Tolvanen <samitolvanen@google.com> says:

This series adds Shadow Call Stack (SCS) support for RISC-V. SCS
uses compiler instrumentation to store return addresses in a
separate shadow stack to protect them against accidental or
malicious overwrites. More information about SCS can be found
here:

  https://clang.llvm.org/docs/ShadowCallStack.html

Patch 1 is from Deepak, and it simplifies VMAP_STACK overflow
handling by adding support for accessing per-CPU variables
directly in assembly. The patch is included in this series to
make IRQ stack switching cleaner with SCS, and I've simply
rebased it and fixed a couple of minor issues. Patch 2 uses this
functionality to clean up the stack switching by moving duplicate
code into a single function. On RISC-V, the compiler uses the
gp register for storing the current shadow call stack pointer,
which is incompatible with global pointer relaxation. Patch 3
moves global pointer loading into a macro that can be easily
disabled with SCS. Patch 4 implements SCS register loading and
switching, and allows the feature to be enabled, and patch 5 adds
separate per-CPU IRQ shadow call stacks when CONFIG_IRQ_STACKS is
enabled. Patch 6 fixes the backward-edge CFI test in lkdtm for
RISC-V.

Note that this series requires Clang 17. Earlier Clang versions
support SCS on RISC-V, but use the x18 register instead of gp,
which isn't ideal. gcc has SCS support for arm64, but I'm not
aware of plans to support RISC-V. Once the Zicfiss extension is
ratified, it's probably preferable to use hardware-backed shadow
stacks instead of SCS on hardware that supports the extension,
and we may want to consider implementing CONFIG_DYNAMIC_SCS to
patch between the implementation at runtime (similarly to the
arm64 implementation, which switches to SCS when hardware PAC
support isn't available).

* b4-shazam-merge:
  lkdtm: Fix CFI_BACKWARD on RISC-V
  riscv: Use separate IRQ shadow call stacks
  riscv: Implement Shadow Call Stack
  riscv: Move global pointer loading to a macro
  riscv: Deduplicate IRQ stack switching
  riscv: VMAP_STACK overflow detection thread-safe

Link: https://lore.kernel.org/r/20230927224757.1154247-8-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch "riscv: errata: improve T-Head CMO"
Palmer Dabbelt [Fri, 27 Oct 2023 13:22:52 +0000 (06:22 -0700)]
Merge patch "riscv: errata: improve T-Head CMO"

This fixes an encoding issue with T-Head's dcache.cva and fixes the
comment about the T-Head encodings.  The first of these was a fix and
got picked up earlier, I'm merging the second on top of it as they touch
the same comment.

* b4-shazam-merge:
  riscv: errata: prefix T-Head mnemonics with th.
  riscv: errata: fix T-Head dcache.cva encoding

Link: https://lore.kernel.org/r/20230827090813.1353-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: errata: prefix T-Head mnemonics with th.
Icenowy Zheng [Sun, 27 Aug 2023 09:08:13 +0000 (17:08 +0800)]
riscv: errata: prefix T-Head mnemonics with th.

T-Head now maintains some specification for their extended instructions
at [1], in which all instructions are prefixed "th.".

Follow this practice in the kernel comments.

Link: https://github.com/T-head-Semi/thead-extension-spec
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: add support for PR_SET_UNALIGN and PR_GET_UNALIGN
Clément Léger [Wed, 4 Oct 2023 15:14:05 +0000 (17:14 +0200)]
riscv: add support for PR_SET_UNALIGN and PR_GET_UNALIGN

Now that trap support is ready to handle misalignment errors in S-mode,
allow the user to control the behavior of misaligned accesses using
prctl(PR_SET_UNALIGN). Add an align_ctl flag in thread_struct which
will be used to determine if we should SIGBUS the process or not on
such fault.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-9-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: report misaligned accesses emulation to hwprobe
Clément Léger [Wed, 4 Oct 2023 15:14:04 +0000 (17:14 +0200)]
riscv: report misaligned accesses emulation to hwprobe

hwprobe provides a way to report if misaligned access are emulated. In
order to correctly populate that feature, we can check if it actually
traps when doing a misaligned access. This can be checked using an
exception table entry which will actually be used when a misaligned
access is done from kernel mode.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-8-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: annotate check_unaligned_access_boot_cpu() with __init
Clément Léger [Wed, 4 Oct 2023 15:14:03 +0000 (17:14 +0200)]
riscv: annotate check_unaligned_access_boot_cpu() with __init

This function is solely called as an initcall, thus annotate it with
__init.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-7-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: add support for sysctl unaligned_enabled control
Clément Léger [Wed, 4 Oct 2023 15:14:02 +0000 (17:14 +0200)]
riscv: add support for sysctl unaligned_enabled control

This sysctl tuning option allows the user to disable misaligned access
handling globally on the system. This will also be used by misaligned
detection code to temporarily disable misaligned access handling.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-6-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: add floating point insn support to misaligned access emulation
Clément Léger [Wed, 4 Oct 2023 15:14:01 +0000 (17:14 +0200)]
riscv: add floating point insn support to misaligned access emulation

This support is partially based of openSBI misaligned emulation floating
point instruction support. It provides support for the existing
floating point instructions (both for 32/64 bits as well as compressed
ones). Since floating point registers are not part of the pt_regs
struct, we need to modify them directly using some assembly. We also
dirty the pt_regs status in case we modify them to be sure context
switch will save FP state. With this support, Linux is on par with
openSBI support.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-5-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: report perf event for misaligned fault
Clément Léger [Wed, 4 Oct 2023 15:14:00 +0000 (17:14 +0200)]
riscv: report perf event for misaligned fault

Add missing calls to account for misaligned fault event using
perf_sw_event().

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-4-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: add support for misaligned trap handling in S-mode
Clément Léger [Wed, 4 Oct 2023 15:13:59 +0000 (17:13 +0200)]
riscv: add support for misaligned trap handling in S-mode

Misalignment trap handling is only supported for M-mode and uses direct
accesses to user memory. In S-mode, when handling usermode fault, this
requires to use the get_user()/put_user() accessors. Implement
load_u8(), store_u8() and get_insn() using these accessors for
userspace and direct text access for kernel.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-3-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: remove unused functions in traps_misaligned.c
Clément Léger [Wed, 4 Oct 2023 15:13:58 +0000 (17:13 +0200)]
riscv: remove unused functions in traps_misaligned.c

Replace macros by the only two function calls that are done from this
file, store_u8() and load_u8().

Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231004151405.521596-2-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "RISC-V: ACPI improvements"
Palmer Dabbelt [Thu, 26 Oct 2023 16:40:36 +0000 (09:40 -0700)]
Merge patch series "RISC-V: ACPI improvements"

Sunil V L <sunilvl@ventanamicro.com> says:

This series is a set of patches which were originally part of RFC v1 series
[1] to add ACPI support in RISC-V interrupt controllers. Since these
patches are independent of the interrupt controllers, creating this new
series which helps to merge instead of waiting for big series.

This set of patches primarily adds support below ECR [2] which is approved
by the ASWG and adds below features.

- Get CBO block sizes from RHCT on ACPI based systems.

Additionally, the series contains a patch to improve acpi_os_ioremap().

[1] - https://lore.kernel.org/lkml/20230803175202.3173957-1-sunilvl@ventanamicro.com/
[2] - https://drive.google.com/file/d/1sKbOa8m1UZw1JkquZYe3F1zQBN1xXsaf/view?usp=sharing

* b4-shazam-merge:
  RISC-V: cacheflush: Initialize CBO variables on ACPI systems
  RISC-V: ACPI: RHCT: Add function to get CBO block sizes
  RISC-V: ACPI: Update the return value of acpi_get_rhct()
  RISC-V: ACPI: Enhance acpi_os_ioremap with MMIO remapping

Link: https://lore.kernel.org/r/20231018124007.1306159-1-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: put interrupt entries into .irqentry.text
Nam Cao [Mon, 21 Aug 2023 14:57:09 +0000 (16:57 +0200)]
riscv: put interrupt entries into .irqentry.text

The interrupt entries are expected to be in the .irqentry.text section.
For example, for kprobes to work properly, exception code cannot be
probed; this is ensured by blacklisting addresses in the .irqentry.text
section.

Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Signed-off-by: Nam Cao <namcaov@gmail.com>
Link: https://lore.kernel.org/r/20230821145708.21270-1-namcaov@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: mm: Update the comment of CONFIG_PAGE_OFFSET
Song Shuai [Wed, 9 Aug 2023 03:10:23 +0000 (11:10 +0800)]
riscv: mm: Update the comment of CONFIG_PAGE_OFFSET

Since the commit 011f09d12052 set sv57 as default for CONFIG_64BIT,
the comment of CONFIG_PAGE_OFFSET should be updated too.

Fixes: 011f09d12052 ("riscv: mm: Set sv57 on defaultly")
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Link: https://lore.kernel.org/r/20230809031023.3575407-1-songshuaishuai@tinylab.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause
Minda Chen [Wed, 2 Aug 2023 06:42:15 +0000 (14:42 +0800)]
riscv: Using TOOLCHAIN_HAS_ZIHINTPAUSE marco replace zihintpause

Actually it is a part of Conor's
commit aae538cd03bc ("riscv: fix detection of toolchain
Zihintpause support").
It is looks like a merge issue. Samuel's
commit 0b1d60d6dd9e ("riscv: Fix build with
CONFIG_CC_OPTIMIZE_FOR_SIZE=y") do not base on Conor's commit and
revert to __riscv_zihintpause. So this patch can fix it.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
Fixes: 3c349eacc559 ("Merge patch "riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y"")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230802064215.31111-1-minda.chen@starfivetech.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv/mm: Fix the comment for swap pte format
Xiao Wang [Thu, 21 Sep 2023 14:16:52 +0000 (22:16 +0800)]
riscv/mm: Fix the comment for swap pte format

Swap type takes bits 7-11 and swap offset should start from bit 12.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20230921141652.2657054-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: clarify the QEMU workaround in ISA parser
Tsukasa OI [Wed, 26 Jul 2023 05:44:16 +0000 (05:44 +0000)]
RISC-V: clarify the QEMU workaround in ISA parser

Extensions prefixed with "Su" won't corrupt the workaround in many
cases.  The only exception is when the first multi-letter extension in the
ISA string begins with "Su" and is not prefixed with an underscore.

For instance, following ISA string can confuse this QEMU workaround.

*   "rv64imacsuclic" (RV64I + M + A + C + "Suclic")

However, this case is very unlikely because extensions prefixed by either
"Z", "Sm" or "Ss" will most likely precede first.

For instance, the "Suclic" extension (draft as of now) will be placed after
related "Smclic" and "Ssclic" extensions.  It's also highly likely that
other unprivileged extensions like "Zba" will precede.

It's also possible to suppress the issue in the QEMU workaround with an
underscore.  Following ISA string won't confuse the QEMU workaround.

*   "rv64imac_suclic" (RV64I + M + A + C + delimited "Suclic")

This fix is to tell kernel developers the nature of this workaround
precisely.  There are some "Su*" extensions to be ratified but don't worry
about this workaround too much.

This commit comes with other minor editorial fixes (for minor wording and
spacing issues, without changing the meaning).

Signed-off-by: Tsukasa OI <research_trasio@irq.a4lg.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/8a127608cf6194a6d288289f2520bd1744b81437.1690350252.git.research_trasio@irq.a4lg.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: correct pt_level name via pgtable_l5/4_enabled
Song Shuai [Wed, 30 Aug 2023 04:39:20 +0000 (21:39 -0700)]
riscv: correct pt_level name via pgtable_l5/4_enabled

The pt_level uses CONFIG_PGTABLE_LEVELS to display page table names.
But if page mode is downgraded from kernel cmdline or restricted by
the hardware in 64BIT, it will give a wrong name.

Like, using no4lvl for sv39, ptdump named the 1G-mapping as "PUD"
that should be "PGD":

0xffffffd840000000-0xffffffd900000000    0x00000000c0000000         3G PUD     D A G . . W R V

So select "P4D/PUD" or "PGD" via pgtable_l5/4_enabled to correct it.

Fixes: e8a62cc26ddf ("riscv: Implement sv48 support")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Link: https://lore.kernel.org/r/20230712115740.943324-1-suagrfillet@gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230830044129.11481-3-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: Provide pgtable_l5_enabled on rv32
Palmer Dabbelt [Wed, 30 Aug 2023 04:39:19 +0000 (21:39 -0700)]
RISC-V: Provide pgtable_l5_enabled on rv32

A few of the other page table level helpers are defined on rv32, but not
pgtable_l5_enabled.  This adds the definition as a constant and converts
pgtable_l4_enabled to a constant as well.

Link: https://lore.kernel.org/r/20230830044129.11481-2-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoclocksource: timer-riscv: Increase rating of clock_event_device for Sstc
Anup Patel [Mon, 10 Jul 2023 13:19:02 +0000 (18:49 +0530)]
clocksource: timer-riscv: Increase rating of clock_event_device for Sstc

When Sstc is available the RISC-V timer clock_event_device should be
the preferred clock_event_device hence we increase clock_event_device
rating for Sstc.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230710131902.1459180-3-apatel@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoclocksource: timer-riscv: Don't enable/disable timer interrupt
Anup Patel [Mon, 10 Jul 2023 13:19:01 +0000 (18:49 +0530)]
clocksource: timer-riscv: Don't enable/disable timer interrupt

Currently, we enable/disable timer interrupt at runtime to start/stop
timer events. This makes timer interrupt state go out-of-sync with
the Linux interrupt subsystem.

To address the above issue, we can stop a per-HART timer interrupt
by setting U64_MAX in timecmp CSR (or sbi_set_timer()) at the time
of handling timer interrupt.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230710131902.1459180-2-apatel@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "RISC-V: Enable cbo.zero in usermode"
Palmer Dabbelt [Thu, 21 Sep 2023 11:22:30 +0000 (04:22 -0700)]
Merge patch series "RISC-V: Enable cbo.zero in usermode"

Andrew Jones <ajones@ventanamicro.com> says:

In order for usermode to issue cbo.zero, it needs privilege granted to
issue the extension instruction (patch 2) and to know that the extension
is available and its block size (patch 3). Patch 1 could be separate from
this series (it just fixes up some error messages), patches 4-5 convert
the hwprobe selftest to a statically-linked, TAP test and patch 6 adds a
new hwprobe test for the new information as well as testing CBO
instructions can or cannot be issued as appropriate.

* b4-shazam-merge:
  RISC-V: selftests: Add CBO tests
  RISC-V: selftests: Convert hwprobe test to kselftest API
  RISC-V: selftests: Statically link hwprobe test
  RISC-V: hwprobe: Expose Zicboz extension and its block size
  RISC-V: Enable cbo.zero in usermode
  RISC-V: Make zicbom/zicboz errors consistent

Link: https://lore.kernel.org/r/20230918131518.56803-8-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoMerge patch series "riscv: kexec: cleanup and fixups"
Palmer Dabbelt [Wed, 1 Nov 2023 02:15:41 +0000 (19:15 -0700)]
Merge patch series "riscv: kexec: cleanup and fixups"

Song Shuai <songshuaishuai@tinylab.org> says:

This series contains a cleanup for riscv_kexec_relocate() and two fixups
for KEXEC_FILE and had passed the basic kexec test in my 64bit Qemu-virt.

You can use this kexec-tools[3] to test the kexec-file-syscall and these patches.

riscv: kexec: Cleanup riscv_kexec_relocate (patch1)
==================================================

For readability and simplicity, cleanup the riscv_kexec_relocate code:

 - Re-sort the first 4 `mv` instructions against `riscv_kexec_method()`
 - Eliminate registers for debugging (s9,s10,s11) and storing const-value (s5,s6)
 - Replace `jalr` with `jr` for no-link jump

riscv: kexec: Align the kexeced kernel entry (patch2)
==================================================

The current riscv boot protocol requires 2MB alignment for RV64
and 4MB alignment for RV32.

In KEXEC_FILE path, the elf_find_pbase() function should align
the kexeced kernel entry according to the requirement, otherwise
the kexeced kernel would silently BUG at the setup_vm().

riscv: kexec: Remove -fPIE for PURGATORY_CFLAGS (patch3)
==================================================

With CONFIG_RELOCATABLE enabled, KBUILD_CFLAGS had a -fPIE option
and then the purgatory/string.o was built to reference _ctype symbol
via R_RISCV_GOT_HI20 relocations which can't be handled by purgatory.

As a consequence, the kernel failed kexec_load_file() with:

[  880.386562] kexec_image: The entry point of kernel at 0x80200000
[  880.388650] kexec_image: Unknown rela relocation: 20
[  880.389173] kexec_image: Error loading purgatory ret=-8

So remove the -fPIE option for PURGATORY_CFLAGS to generate
R_RISCV_PCREL_HI20 relocations type making puragtory work as it was.

 arch/riscv/kernel/elf_kexec.c      |  8 ++++-
 arch/riscv/kernel/kexec_relocate.S | 52 +++++++++++++-----------------
 arch/riscv/purgatory/Makefile      |  4 +++
 3 files changed, 34 insertions(+), 30 deletions(-)

* b4-shazam-merge:
  riscv: kexec: Remove -fPIE for PURGATORY_CFLAGS
  riscv: kexec: Align the kexeced kernel entry
  riscv: kexec: Cleanup riscv_kexec_relocate

Link: https://lore.kernel.org/r/20230907103304.590739-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agolkdtm: Fix CFI_BACKWARD on RISC-V
Sami Tolvanen [Wed, 27 Sep 2023 22:48:04 +0000 (22:48 +0000)]
lkdtm: Fix CFI_BACKWARD on RISC-V

On RISC-V, the return address is before the current frame pointer,
unlike on most other architectures. Use the correct offset on RISC-V
to fix the CFI_BACKWARD test.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-14-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Use separate IRQ shadow call stacks
Sami Tolvanen [Wed, 27 Sep 2023 22:48:03 +0000 (22:48 +0000)]
riscv: Use separate IRQ shadow call stacks

When both CONFIG_IRQ_STACKS and SCS are enabled, also use a separate
per-CPU shadow call stack.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-13-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Implement Shadow Call Stack
Sami Tolvanen [Wed, 27 Sep 2023 22:48:02 +0000 (22:48 +0000)]
riscv: Implement Shadow Call Stack

Implement CONFIG_SHADOW_CALL_STACK for RISC-V. When enabled, the
compiler injects instructions to all non-leaf C functions to
store the return address to the shadow stack and unconditionally
load it again before returning, which makes it harder to corrupt
the return address through a stack overflow, for example.

The active shadow call stack pointer is stored in the gp
register, which makes SCS incompatible with gp relaxation. Use
--no-relax-gp to ensure gp relaxation is disabled and disable
global pointer loading.  Add SCS pointers to struct thread_info,
implement SCS initialization, and task switching

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-12-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Move global pointer loading to a macro
Sami Tolvanen [Wed, 27 Sep 2023 22:48:01 +0000 (22:48 +0000)]
riscv: Move global pointer loading to a macro

In Clang 17, -fsanitize=shadow-call-stack uses the newly declared
platform register gp for storing shadow call stack pointers. As
this is obviously incompatible with gp relaxation, in preparation
for CONFIG_SHADOW_CALL_STACK support, move global pointer loading
to a single macro, which we can cleanly disable when SCS is used
instead.

Link: https://reviews.llvm.org/rGaa1d2693c256
Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/commit/a484e843e6eeb51f0cb7b8819e50da6d2444d769
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-11-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: Deduplicate IRQ stack switching
Sami Tolvanen [Wed, 27 Sep 2023 22:48:00 +0000 (22:48 +0000)]
riscv: Deduplicate IRQ stack switching

With CONFIG_IRQ_STACKS, we switch to a separate per-CPU IRQ stack
before calling handle_riscv_irq or __do_softirq. We currently
have duplicate inline assembly snippets for stack switching in
both code paths. Now that we can access per-CPU variables in
assembly, implement call_on_irq_stack in assembly, and use that
instead of redundant inline assembly.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-10-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoriscv: VMAP_STACK overflow detection thread-safe
Deepak Gupta [Wed, 27 Sep 2023 22:47:59 +0000 (22:47 +0000)]
riscv: VMAP_STACK overflow detection thread-safe

commit 31da94c25aea ("riscv: add VMAP_STACK overflow detection") added
support for CONFIG_VMAP_STACK. If overflow is detected, CPU switches to
`shadow_stack` temporarily before switching finally to per-cpu
`overflow_stack`.

If two CPUs/harts are racing and end up in over flowing kernel stack, one
or both will end up corrupting each other state because `shadow_stack` is
not per-cpu. This patch optimizes per-cpu overflow stack switch by
directly picking per-cpu `overflow_stack` and gets rid of `shadow_stack`.

Following are the changes in this patch

 - Defines an asm macro to obtain per-cpu symbols in destination
   register.
 - In entry.S, when overflow is detected, per-cpu overflow stack is
   located using per-cpu asm macro. Computing per-cpu symbol requires
   a temporary register. x31 is saved away into CSR_SCRATCH
   (CSR_SCRATCH is anyways zero since we're in kernel).

Please see Links for additional relevant disccussion and alternative
solution.

Tested by `echo EXHAUST_STACK > /sys/kernel/debug/provoke-crash/DIRECT`
Kernel crash log below

 Insufficient stack space to handle exception!/debug/provoke-crash/DIRECT
 Task stack:     [0xff20000010a98000..0xff20000010a9c000]
 Overflow stack: [0xff600001f7d98370..0xff600001f7d99370]
 CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
 Hardware name: riscv-virtio,qemu (DT)
 epc : __memset+0x60/0xfc
  ra : recursive_loop+0x48/0xc6 [lkdtm]
 epc : ffffffff808de0e4 ra : ffffffff0163a752 sp : ff20000010a97e80
  gp : ffffffff815c0330 tp : ff600000820ea280 t0 : ff20000010a97e88
  t1 : 000000000000002e t2 : 3233206874706564 s0 : ff20000010a982b0
  s1 : 0000000000000012 a0 : ff20000010a97e88 a1 : 0000000000000000
  a2 : 0000000000000400 a3 : ff20000010a98288 a4 : 0000000000000000
  a5 : 0000000000000000 a6 : fffffffffffe43f0 a7 : 00007fffffffffff
  s2 : ff20000010a97e88 s3 : ffffffff01644680 s4 : ff20000010a9be90
  s5 : ff600000842ba6c0 s6 : 00aaaaaac29e42b0 s7 : 00fffffff0aa3684
  s8 : 00aaaaaac2978040 s9 : 0000000000000065 s10: 00ffffff8a7cad10
  s11: 00ffffff8a76a4e0 t3 : ffffffff815dbaf4 t4 : ffffffff815dbaf4
  t5 : ffffffff815dbab8 t6 : ff20000010a9bb48
 status: 0000000200000120 badaddr: ff20000010a97e88 cause: 000000000000000f
 Kernel panic - not syncing: Kernel stack overflow
 CPU: 1 PID: 205 Comm: bash Not tainted 6.1.0-rc2-00001-g328a1f96f7b9 #34
 Hardware name: riscv-virtio,qemu (DT)
 Call Trace:
 [<ffffffff80006754>] dump_backtrace+0x30/0x38
 [<ffffffff808de798>] show_stack+0x40/0x4c
 [<ffffffff808ea2a8>] dump_stack_lvl+0x44/0x5c
 [<ffffffff808ea2d8>] dump_stack+0x18/0x20
 [<ffffffff808dec06>] panic+0x126/0x2fe
 [<ffffffff800065ea>] walk_stackframe+0x0/0xf0
 [<ffffffff0163a752>] recursive_loop+0x48/0xc6 [lkdtm]
 SMP: stopping secondary CPUs
 ---[ end Kernel panic - not syncing: Kernel stack overflow ]---

Cc: Guo Ren <guoren@kernel.org>
Cc: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/linux-riscv/Y347B0x4VUNOd6V7@xhacker/T/#t
Link: https://lore.kernel.org/lkml/20221124094845.1907443-1-debug@rivosinc.com/
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Co-developed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Guo Ren <guoren@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230927224757.1154247-9-samitolvanen@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: cacheflush: Initialize CBO variables on ACPI systems
Sunil V L [Wed, 18 Oct 2023 12:40:07 +0000 (18:10 +0530)]
RISC-V: cacheflush: Initialize CBO variables on ACPI systems

Initialize the CBO variables on ACPI based systems using information in
RHCT.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20231018124007.1306159-5-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: ACPI: RHCT: Add function to get CBO block sizes
Sunil V L [Wed, 18 Oct 2023 12:40:06 +0000 (18:10 +0530)]
RISC-V: ACPI: RHCT: Add function to get CBO block sizes

Cache Block Operation (CBO) related block size in ACPI is provided by RHCT.
Add support to read the CMO node in RHCT to get this information.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231018124007.1306159-4-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: ACPI: Update the return value of acpi_get_rhct()
Sunil V L [Wed, 18 Oct 2023 12:40:05 +0000 (18:10 +0530)]
RISC-V: ACPI: Update the return value of acpi_get_rhct()

acpi_get_rhct() currently returns pointer to acpi_table_header
structure. But since this is specific to RHCT, return pointer to
acpi_table_rhct structure itself.

Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20231018124007.1306159-3-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
10 months agoRISC-V: ACPI: Enhance acpi_os_ioremap with MMIO remapping
Sunil V L [Wed, 18 Oct 2023 12:40:04 +0000 (18:10 +0530)]
RISC-V: ACPI: Enhance acpi_os_ioremap with MMIO remapping

Enhance the acpi_os_ioremap() to support opregions in MMIO space. Also,
have strict checks using EFI memory map to allow remapping the RAM similar
to arm64.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20231018124007.1306159-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoRISC-V: selftests: Add CBO tests
Andrew Jones [Mon, 18 Sep 2023 13:15:25 +0000 (15:15 +0200)]
RISC-V: selftests: Add CBO tests

Add hwprobe test for Zicboz and its block size. Also, when Zicboz is
present, test that cbo.zero may be issued and works. Additionally
provide a command line option that enables testing that the Zicbom
instructions cause SIGILL and also that cbo.zero causes SIGILL when
Zicboz it's not present. The SIGILL tests require "opt-in" with a
command line option because the RISC-V ISA does not require
unimplemented standard opcodes to issue illegal-instruction
exceptions (but hopefully most platforms do).

Pinning the test to a subset of cpus with taskset will also restrict
the hwprobe calls to that set.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Xiao Wang <xiao.w.wang@intel.com>
Link: https://lore.kernel.org/r/20230918131518.56803-14-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoRISC-V: selftests: Convert hwprobe test to kselftest API
Andrew Jones [Mon, 18 Sep 2023 13:15:24 +0000 (15:15 +0200)]
RISC-V: selftests: Convert hwprobe test to kselftest API

Returning (exiting with) negative exit codes isn't user friendly,
because the user must output the exit code with the shell, convert it
from its unsigned 8-bit value back to the negative value, and then
look up where that comes from in the code (which may be multiple
places). Use the kselftests TAP interface, instead.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230918131518.56803-13-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoRISC-V: selftests: Statically link hwprobe test
Andrew Jones [Mon, 18 Sep 2023 13:15:23 +0000 (15:15 +0200)]
RISC-V: selftests: Statically link hwprobe test

Statically linking makes it more convenient to copy the test to a
minimal busybox environment.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230918131518.56803-12-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoRISC-V: hwprobe: Expose Zicboz extension and its block size
Andrew Jones [Mon, 18 Sep 2023 13:15:22 +0000 (15:15 +0200)]
RISC-V: hwprobe: Expose Zicboz extension and its block size

Expose Zicboz through hwprobe and also provide a key to extract its
respective block size. Opportunistically add a macro and apply it to
current extensions in order to avoid duplicating code.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230918131518.56803-11-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoRISC-V: Enable cbo.zero in usermode
Andrew Jones [Mon, 18 Sep 2023 13:15:21 +0000 (15:15 +0200)]
RISC-V: Enable cbo.zero in usermode

When Zicboz is present, enable its instruction (cbo.zero) in
usermode by setting its respective senvcfg bit. We don't bother
trying to set this bit per-task, which would also require an
interface for tasks to request enabling and/or disabling. Instead,
permanently set the bit for each hart which has the extension when
bringing it online.

This patch also introduces riscv_cpu_has_extension_[un]likely()
functions to check a specific hart's ISA bitmap for extensions.
Prior to checking the specific hart's bitmap in these functions
we try the bitmap which represents the LCD of extensions, but only
when we know it will use its optimized, alternatives path by gating
its call on CONFIG_RISCV_ALTERNATIVE. When alternatives are used, the
compiler ensures that the invocation of the LCD search becomes a
constant true or false. When it's true, even the new functions will
completely vanish from their callsites. OTOH, when the LCD check is
false, we need to do a search of the hart's ISA bitmap. Had we also
checked the LCD bitmap without the use of alternatives, then we would
have ended up with two bitmap searches instead of one.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230918131518.56803-10-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoRISC-V: Make zicbom/zicboz errors consistent
Andrew Jones [Mon, 18 Sep 2023 13:15:20 +0000 (15:15 +0200)]
RISC-V: Make zicbom/zicboz errors consistent

commit c818fea83de4 ("riscv: say disabling zicbom if no or bad
riscv,cbom-block-size found") improved the error messages for
zicbom but zicboz was missed since its patches were in flight
at the same time. Get 'em now.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230918131518.56803-9-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoriscv: kexec: Remove -fPIE for PURGATORY_CFLAGS
Song Shuai [Thu, 7 Sep 2023 10:33:04 +0000 (18:33 +0800)]
riscv: kexec: Remove -fPIE for PURGATORY_CFLAGS

With CONFIG_RELOCATABLE enabled, KBUILD_CFLAGS had a -fPIE option
and then the purgatory/string.o was built to reference _ctype symbol
via R_RISCV_GOT_HI20 relocations which can't be handled by purgatory.

As a consequence, the kernel failed kexec_load_file() with:

[  880.386562] kexec_image: The entry point of kernel at 0x80200000
[  880.388650] kexec_image: Unknown rela relocation: 20
[  880.389173] kexec_image: Error loading purgatory ret=-8

So remove the -fPIE option for PURGATORY_CFLAGS to generate
R_RISCV_PCREL_HI20 relocations type making puragtory work as it was.

Fixes: 39b33072941f ("riscv: Introduce CONFIG_RELOCATABLE")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230907103304.590739-4-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoriscv: kexec: Align the kexeced kernel entry
Song Shuai [Thu, 7 Sep 2023 10:33:03 +0000 (18:33 +0800)]
riscv: kexec: Align the kexeced kernel entry

The current riscv boot protocol requires 2MB alignment for RV64
and 4MB alignment for RV32.

In KEXEC_FILE path, the elf_find_pbase() function should align
the kexeced kernel entry according to the requirement, otherwise
the kexeced kernel would silently BUG at the setup_vm().

Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230907103304.590739-3-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoriscv: kexec: Cleanup riscv_kexec_relocate
Song Shuai [Thu, 7 Sep 2023 10:33:02 +0000 (18:33 +0800)]
riscv: kexec: Cleanup riscv_kexec_relocate

For readability and simplicity, cleanup the riscv_kexec_relocate code:

- Re-sort the first 4 `mv` instructions against `riscv_kexec_method()`
- Eliminate registers for debugging (s9,s10,s11) and storing const-value (s5,s6)
- Replace `jalr` with `jr` for no-link jump

I tested this on Qemu virt machine and works as it was.

Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230907103304.590739-2-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoriscv: errata: fix T-Head dcache.cva encoding
Icenowy Zheng [Tue, 12 Sep 2023 07:24:10 +0000 (15:24 +0800)]
riscv: errata: fix T-Head dcache.cva encoding

The dcache.cva encoding shown in the comments are wrong, it's for
dcache.cval1 (which is restricted to L1) instead.

Fix this in the comment and in the hardcoded instruction.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Tested-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Drew Fustini <dfustini@baylibre.com>
Link: https://lore.kernel.org/r/20230912072410.2481-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoriscv: kexec: Align the kexeced kernel entry
Song Shuai [Wed, 6 Sep 2023 09:58:17 +0000 (17:58 +0800)]
riscv: kexec: Align the kexeced kernel entry

The current riscv boot protocol requires 2MB alignment for RV64
and 4MB alignment for RV32.

In KEXEC_FILE path, the elf_find_pbase() function should align
the kexeced kernel entry according to the requirement, otherwise
the kexeced kernel would silently BUG at the setup_vm().

Fixes: 8acea455fafa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230906095817.364390-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
12 months agoLinux 6.6-rc1 v6.6-rc1
Linus Torvalds [Sun, 10 Sep 2023 23:28:41 +0000 (16:28 -0700)]
Linux 6.6-rc1

12 months agoMerge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Sun, 10 Sep 2023 18:55:26 +0000 (11:55 -0700)]
Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm

Pull drm ci scripts from Dave Airlie:
 "This is a bunch of ci integration for the freedesktop gitlab instance
  where we currently do upstream userspace testing on diverse sets of
  GPU hardware. From my perspective I think it's an experiment worth
  going with and seeing how the benefits/noise playout keeping these
  files useful.

  Ideally I'd like to get this so we can do pre-merge testing on PRs
  eventually.

  Below is some info from danvet on why we've ended up making the
  decision and how we can roll it back if we decide it was a bad plan.

  Why in upstream?

   - like documentation, testcases, tools CI integration is one of these
     things where you can waste endless amounts of time if you
     accidentally have a version that doesn't match your source code

   - but also like the above, there's a balance, this is the initial cut
     of what we think makes sense to keep in sync vs out-of-tree,
     probably needs adjustment

   - gitlab supports out-of-repo gitlab integration and that's what's
     been used for the kernel in drm, but it results in per-driver
     fragmentation and lots of duplicated effort. the simple act of
     smashing an arbitrary winner into a topic branch already started
     surfacing patches on dri-devel and sparking good cross driver team
     discussions

  Why gitlab?

   - it's not any more shit than any of the other CI

   - drm userspace uses it extensively for everything in userspace, we
     have a lot of people and experience with this, including
     integration of hw testing labs

   - media userspace like gstreamer is also on gitlab.fd.o, and there's
     discussion to extend this to the media subsystem in some fashion

  Can this be shared?

   - there's definitely a pile of code that could move to scripts/ if
     other subsystem adopt ci integration in upstream kernel git. other
     bits are more drm/gpu specific like the igt-gpu-tests/tools
     integration

   - docker images can be run locally or in other CI runners

  Will we regret this?

   - it's all in one directory, intentionally, for easy deletion

   - probably 1-2 years in upstream to see whether this is worth it or a
     Big Mistake. that's roughly what it took to _really_ roll out solid
     CI in the bigger userspace projects we have on gitlab.fd.o like
     mesa3d"

* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
  drm: ci: docs: fix build warning - add missing escape
  drm: Add initial ci/ subdirectory

12 months agoMerge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 10 Sep 2023 17:39:31 +0000 (10:39 -0700)]
Merge tag 'x86-urgent-2023-09-10' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Fix preemption delays in the SGX code, remove unnecessarily
  UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
  make the x86 SMP init code a bit more conservative to fix kexec()
  lockups"

* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
  x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
  x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
  x86/smp: Don't send INIT to non-present and non-booted CPUs

12 months agoMerge tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 10 Sep 2023 17:34:46 +0000 (10:34 -0700)]
Merge tag 'perf-urgent-2023-09-10' of git://git./linux/kernel/git/tip/tip

Pull x86 perf event fix from Ingo Molnar:
 "Work around a firmware bug in the uncore PMU driver, affecting certain
  Intel systems"

* tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Correct the number of CHAs on EMR

12 months agoMerge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 10 Sep 2023 03:06:17 +0000 (20:06 -0700)]
Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git./linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "perf tools maintainership:

   - Add git information for perf-tools and perf-tools-next trees and
     branches to the MAINTAINERS file. That is where development now
     takes place and myself and Namhyung Kim have write access, more
     people to come as we emulate other maintainer groups.

  perf record:

   - Record kernel data maps when 'perf record --data' is used, so that
     global variables can be resolved and used in tools that do data
     profiling.

  perf trace:

   - Remove the old, experimental support for BPF events in which a .c
     file was passed as an event: "perf trace -e hello.c" to then get
     compiled and loaded.

     The only known usage for that, that shipped with the kernel as an
     example for such events, augmented the raw_syscalls tracepoints and
     was converted to a libbpf skeleton, reusing all the user space
     components and the BPF code connected to the syscalls.

     In the end just the way to glue the BPF part and the user space
     type beautifiers changed, now being performed by libbpf skeletons.

     The next step is to use BTF to do pretty printing of all syscall
     types, as discussed with Alan Maguire and others.

     Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
     path/filenames/strings, some of the networking data structures,
     perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
     and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
     seconds:

      # perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
         0.000 (   9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
         9.039 (   0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
             ? (           ): gpm/991  ... [continued]: clock_nanosleep())               = 0
        10.133 (           ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
             ? (           ): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
        30.276 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
       223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
        30.276 (2000.394 ms): gpm/991  ... [continued]: clock_nanosleep())               = 0
      1230.814 (           ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
      1230.814 (1000.404 ms): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
      2030.886 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
      2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
             ? (           ): crond/1172  ... [continued]: clock_nanosleep())            = 0
      3242.699 (           ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
      2030.886 (2000.385 ms): gpm/991  ... [continued]: clock_nanosleep())               = 0
      3728.078 (           ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
      3242.699 (1000.158 ms): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
      4031.409 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
        10.133 (5000.375 ms): sleep/327642  ... [continued]: clock_nanosleep())          = 0

      Performance counter stats for 'sleep 5':

             2,617,347      cycles
             1,855,997      instructions                     #    0.71  insn per cycle

           5.002282128 seconds time elapsed

           0.000855000 seconds user
           0.000852000 seconds sys

  perf annotate:

   - Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
     for licensing reasons, and we missed a build test on
     tools/perf/tests makefile.

     Since we now default to NDEBUG=1, we ended up segfaulting when
     building with BUILD_NONDISTRO=1 because a needed initialization
     routine was being "error checked" via an assert.

     Fix it by explicitly checking the result and aborting instead if it
     fails.

     We better back propagate the error, but at least 'perf annotate' on
     samples collected for a BPF program is back working when perf is
     built with BUILD_NONDISTRO=1.

  perf report/top:

   - Add back TUI hierarchy mode header, that is seen when using 'perf
     report/top --hierarchy'.

   - Fix the number of entries for 'e' key in the TUI that was
     preventing navigation of lines when expanding an entry.

  perf report/script:

   - Support cross platform register handling, allowing a perf.data file
     collected on one architecture to have registers sampled correctly
     displayed when analysis tools such as 'perf report' and 'perf
     script' are used on a different architecture.

   - Fix handling of event attributes in pipe mode, i.e. when one uses:

   perf record -o - | perf report -i -

     When no perf.data files are used.

   - Handle files generated via pipe mode with a version of perf and
     then read also via pipe mode with a different version of perf,
     where the event attr record may have changed, use the record size
     field to properly support this version mismatch.

  perf probe:

   - Accessing global variables from uprobes isn't supported, make the
     error message state that instead of stating that some minimal
     kernel version is needed to have that feature. This seems just a
     tool limitation, the kernel probably has all that is needed.

  perf tests:

   - Fix a reference count related leak in the dlfilter v0 API where the
     result of a thread__find_symbol_fb() is not matched with an
     addr_location__exit() to drop the reference counts of the resolved
     components (machine, thread, map, symbol, etc). Add a dlfilter test
     to make sure that doesn't regresses.

   - Lots of fixes for the 'perf test' written in shell script related
     to problems found with the shellcheck utility.

   - Fixes for 'perf test' shell scripts testing features enabled when
     perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
     counters.

   - Add perf record sample filtering test, things like the following
     example, that gets implemented as a BPF filter attached to the
     event:

       # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'

   - Improve the way the task_analyzer test checks if libtraceevent is
     linked, using 'perf version --build-options' instead of the more
     expensinve 'perf record -e "sched:sched_switch"'.

   - Add support for riscv in the mmap-basic test. (This went as well
     via the RiscV tree, same contents).

  libperf:

   - Implement riscv mmap support (This went as well via the RiscV tree,
     same contents).

  perf script:

   - New tool that converts perf.data files to the firefox profiler
     format so that one can use the visualizer at
     https://profiler.firefox.com/. Done by Anup Sharma as part of this
     year's Google Summer of Code.

     One can generate the output and upload it to the web interface but
     Anup also automated everything:

       perf script gecko -F 99 -a sleep 60

   - Support syscall name parsing on arm64.

   - Print "cgroup" field on the same line as "comm".

  perf bench:

   - Add new 'uprobe' benchmark to measure the overhead of uprobes
     with/without BPF programs attached to it.

   - breakpoints are not available on power9, skip that test.

  perf stat:

   - Add #num_cpus_online literal to be used in 'perf stat' metrics, and
     add this extra 'perf test' check that exemplifies its purpose:

   TEST_ASSERT_VAL("#num_cpus_online",
                         expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
   TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
   TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);

  Miscellaneous:

   - Improve tool startup time by lazily reading PMU, JSON, sysfs data.

   - Improve error reporting in the parsing of events, passing YYLTYPE
     to error routines, so that the output can show were the parsing
     error was found.

   - Add 'perf test' entries to check the parsing of events
     improvements.

   - Fix various leak for things detected by -fsanitize=address, mostly
     things that would be freed at tool exit, including:

       - Free evsel->filter on the destructor.

       - Allow tools to register a thread->priv destructor and use it in
         'perf trace'.

       - Free evsel->priv in 'perf trace'.

       - Free string returned by synthesize_perf_probe_point() when the
         caller fails to do all it needs.

   - Adjust various compiler options to not consider errors some
     warnings when building with broken headers found in things like
     python, flex, bison, as we otherwise build with -Werror. Some for
     gcc, some for clang, some for some specific version of those, some
     for some specific version of flex or bison, or some specific
     combination of these components, bah.

   - Allow customization of clang options for BPF target, this helps
     building on gentoo where there are other oddities where BPF targets
     gets passed some compiler options intended for the native build, so
     building with WERROR=0 helps while these oddities are fixed.

   - Dont pass ERR_PTR() values to perf_session__delete() in 'perf top'
     and 'perf lock', fixing some segfaults when handling some odd
     failures.

   - Add LTO build option.

   - Fix format of unordered lists in the perf docs
     (tools/perf/Documentation)

   - Overhaul the bison files, using constructs such as YYNOMEM.

   - Remove unused tokens from the bison .y files.

   - Add more comments to various structs.

   - A few LoongArch enablement patches.

  Vendor events (JSON):

   - Add JSON metrics for Yitian 710 DDR (aarch64). Things like:

   EventName, BriefDescription
   visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
   visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
   op_is_dqsosc_mpc        , "A DQS Oscillator MPC command to DRAM.",
   op_is_dqsosc_mrr        , "A DQS Oscillator MRR command to DRAM.",
   op_is_tcr_mrr        , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",

   - Add AmpereOne metrics (aarch64).

   - Update N2 and V2 metrics (aarch64) and events using Arm telemetry
     repo.

   - Update scale units and descriptions of common topdown metrics on
     aarch64. Things like:
       - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
       - "BriefDescription": "Frontend bound L1 topdown metric",
       + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
       + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",

   - Update events for intel: meteorlake to 1.04, sapphirerapids to
     1.15, Icelake+ metric constraints.

   - Update files for the power10 platform"

* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
  perf parse-events: Fix driver config term
  perf parse-events: Fixes relating to no_value terms
  perf parse-events: Fix propagation of term's no_value when cloning
  perf parse-events: Name the two term enums
  perf list: Don't print Unit for "default_core"
  perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
  perf dlfilter: Avoid leak in v0 API test use of resolve_address()
  perf metric: Add #num_cpus_online literal
  perf pmu: Remove str from perf_pmu_alias
  perf parse-events: Make common term list to strbuf helper
  perf parse-events: Minor help message improvements
  perf pmu: Avoid uninitialized use of alias->str
  perf jevents: Use "default_core" for events with no Unit
  perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
  perf test shell stat_bpf_counters: Fix test on Intel
  perf test shell record_bpf_filter: Skip 6.2 kernel
  libperf: Get rid of attr.id field
  perf tools: Convert to perf_record_header_attr_id()
  libperf: Add perf_record_header_attr_id()
  perf tools: Handle old data in PERF_RECORD_ATTR
  ...

12 months agoMerge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 10 Sep 2023 02:56:23 +0000 (19:56 -0700)]
Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - six smb3 client fixes including ones to allow controlling smb3
   directory caching timeout and limits, and one debugging improvement

 - one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)

 - one minor spnego registry update

* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  spnego: add missing OID to oid registry
  smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
  cifs: update internal module version number for cifs.ko
  smb3: allow controlling maximum number of cached directories
  smb3: add trace point for queryfs (statfs)
  nls: Hide new NLS_UCS2_UTILS
  smb3: allow controlling length of time directory entries are cached with dir leases
  smb: propagate error code of extract_sharename()

12 months agoiov_iter: Kunit tests for page extraction
David Howells [Fri, 8 Sep 2023 16:03:22 +0000 (17:03 +0100)]
iov_iter: Kunit tests for page extraction

Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators.  ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction.  ITER_DISCARD isn't dealt with
either as that can't be extracted.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 months agoiov_iter: Kunit tests for copying to/from an iterator
David Howells [Fri, 8 Sep 2023 16:03:21 +0000 (17:03 +0100)]
iov_iter: Kunit tests for copying to/from an iterator

Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators.  ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction.  ITER_DISCARD isn't dealt with
either as that does nothing.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 months agoiov_iter: Fix iov_iter_extract_pages() with zero-sized entries
David Howells [Fri, 8 Sep 2023 16:03:20 +0000 (17:03 +0100)]
iov_iter: Fix iov_iter_extract_pages() with zero-sized entries

iov_iter_extract_pages() doesn't correctly handle skipping over initial
zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.

The problem is that it accidentally reduces maxsize to 0 when it
skipping and thus runs to the end of the array and returns 0.

Fix this by sticking the calculated size-to-copy in a new variable
rather than back in maxsize.

Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 months agoMerge tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubit...
Linus Torvalds [Sat, 9 Sep 2023 21:46:57 +0000 (14:46 -0700)]
Merge tag 'sh-for-v6.6-tag1' of git://git./linux/kernel/git/glaubitz/sh-linux

Pull sh updates from Adrian Glaubitz:

 - Fix a use-after-free bug in the push-switch driver (Duoming Zhou)

 - Fix calls to dma_declare_coherent_memory() that incorrectly passed
   the buffer end address instead of the buffer size as the size
   parameter

* tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
  sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()

12 months agoMerge tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 9 Sep 2023 21:25:11 +0000 (14:25 -0700)]
Merge tag 'riscv-for-linus-6.6-mw2-2' of git://git./linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - The kernel now dynamically probes for misaligned access speed, as
   opposed to relying on a table of known implementations.

 - Support for non-coherent devices on systems using the Andes AX45MP
   core, including the RZ/Five SoCs.

 - Support for the V extension in ptrace(), again.

 - Support for KASLR.

 - Support for the BPF prog pack allocator in RISC-V.

 - A handful of bug fixes and cleanups.

* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
  soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
  riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
  riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
  riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
  bpf, riscv: use prog pack allocator in the BPF JIT
  riscv: implement a memset like function for text
  riscv: extend patch_text_nosync() for multiple pages
  bpf: make bpf_prog_pack allocator portable
  riscv: libstub: Implement KASLR by using generic functions
  libstub: Fix compilation warning for rv32
  arm64: libstub: Move KASLR handling functions to kaslr.c
  riscv: Dump out kernel offset information on panic
  riscv: Introduce virtual kernel mapping KASLR
  RISC-V: Add ptrace support for vectors
  soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
  cache: Add L2 cache management for Andes AX45MP RISC-V core
  dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
  riscv: mm: dma-noncoherent: nonstandard cache operations support
  riscv: errata: Add Andes alternative ports
  riscv: asm: vendorid_list: Add Andes Technology to the vendors list
  ...

12 months agosh: push-switch: Reorder cleanup operations to avoid use-after-free bug
Duoming Zhou [Wed, 2 Aug 2023 03:37:37 +0000 (11:37 +0800)]
sh: push-switch: Reorder cleanup operations to avoid use-after-free bug

The original code puts flush_work() before timer_shutdown_sync()
in switch_drv_remove(). Although we use flush_work() to stop
the worker, it could be rescheduled in switch_timer(). As a result,
a use-after-free bug can occur. The details are shown below:

      (cpu 0)                    |      (cpu 1)
switch_drv_remove()              |
 flush_work()                    |
  ...                            |  switch_timer // timer
                                 |   schedule_work(&psw->work)
 timer_shutdown_sync()           |
 ...                             |  switch_work_handler // worker
 kfree(psw) // free              |
                                 |   psw->state = 0 // use

This patch puts timer_shutdown_sync() before flush_work() to
mitigate the bugs. As a result, the worker and timer will be
stopped safely before the deallocate operations.

Fixes: 9f5e8eee5cfe ("sh: generic push-switch framework.")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Link: https://lore.kernel.org/r/20230802033737.9738-1-duoming@zju.edu.cn
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
12 months agosh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
Petr Tesarik [Mon, 24 Jul 2023 12:07:42 +0000 (14:07 +0200)]
sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()

In all these cases, the last argument to dma_declare_coherent_memory() is
the buffer end address, but the expected value should be the size of the
reserved region.

Fixes: 39fb993038e1 ("media: arch: sh: ap325rxa: Use new renesas-ceu camera driver")
Fixes: c2f9b05fd5c1 ("media: arch: sh: ecovec: Use new renesas-ceu camera driver")
Fixes: f3590dc32974 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver")
Fixes: 186c446f4b84 ("media: arch: sh: migor: Use new renesas-ceu camera driver")
Fixes: 1a3c230b4151 ("media: arch: sh: ms7724se: Use new renesas-ceu camera driver")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://lore.kernel.org/r/20230724120742.2187-1-petrtesarik@huaweicloud.com
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>