riscv: Stop emitting preventive sfence.vma for new vmalloc mappings
author Alexandre Ghiti <alexghiti@rivosinc.com>
Wed, 17 Jul 2024 06:01:24 +0000 (08:01 +0200)
committer Palmer Dabbelt <palmer@rivosinc.com>
Sun, 15 Sep 2024 07:11:04 +0000 (00:11 -0700)
commit 503638e0babf364061bc50fca5103b00a56cc50a
tree 400ef258cc2d3ce9f8aaa45b07bdefff6a14290a
parent d25599b5933fb5f89d4b4c720564d613a795f502

In 6.5, we removed the vmalloc fault path because it cannot work (see
[1] [2]). Then, to make sure that new page table entries were seen by
the page table walker, we had to preventively emit an sfence.vma on all
harts [3], but this solution is very costly since it relies on IPIs.

And even then, we could end up in a loop of vmalloc faults if a vmalloc
allocation is done in the IPI path (for example if it is traced, see
[4]), which could result in a kernel stack overflow.

Those preventive sfence.vma needed to be emitted because:

- if the uarch caches invalid entries, the new mapping may not be
  observed by the page table walker and an invalidation may be needed.
- if the uarch does not cache invalid entries, a reordered access
  could "miss" the new mapping and trap: in that case, we would actually
  only need to retry the access, no sfence.vma is required.

So this patch removes those preventive sfence.vma and instead handles
the possible (and unlikely) exceptions when they occur. And since the
kernel stack mappings lie in the vmalloc area, this handling must be
done very early when the trap is taken, at the very beginning of
handle_exception: this also rules out any vmalloc allocation in the
fault path.
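
The decision described above can be modelled in plain C. This is only a
user-space sketch under stated assumptions, not the code added to
entry.S: the per-hart "new mapping" bitmask and every name below
(new_vmalloc_mask, mark_new_vmalloc_mapping, early_trap_check,
NR_HARTS) are hypothetical and chosen for illustration. Only the page
fault cause numbers (12, 13, 15) come from the RISC-V privileged
specification. The local fence is counted rather than executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_HARTS 64                 /* hypothetical: one mask bit per hart */

/* Page fault exception codes from the RISC-V privileged specification. */
#define EXC_INST_PAGE_FAULT  12UL
#define EXC_LOAD_PAGE_FAULT  13UL
#define EXC_STORE_PAGE_FAULT 15UL

enum trap_action { TRAP_RETRY, TRAP_NORMAL_PATH };

/* Hypothetical state: bit N set means hart N has not yet checked for
 * vmalloc mappings created since its last trap. */
static uint64_t new_vmalloc_mask;

/* Called when a new vmalloc mapping is installed: no IPI and no remote
 * sfence.vma, every hart is simply flagged. */
static void mark_new_vmalloc_mapping(void)
{
    new_vmalloc_mask = ~UINT64_C(0);
}

/* Model of the check made at the very start of the trap handler. It
 * touches no stack-allocated kernel data and allocates nothing. */
static enum trap_action early_trap_check(unsigned hart, unsigned long cause,
                                         bool uarch_caches_invalid,
                                         unsigned *local_fences)
{
    assert(hart < NR_HARTS);

    if (cause != EXC_INST_PAGE_FAULT && cause != EXC_LOAD_PAGE_FAULT &&
        cause != EXC_STORE_PAGE_FAULT)
        return TRAP_NORMAL_PATH;    /* not a page fault */

    uint64_t bit = UINT64_C(1) << hart;
    if (!(new_vmalloc_mask & bit))
        return TRAP_NORMAL_PATH;    /* no new mapping: a genuine fault */

    new_vmalloc_mask &= ~bit;       /* consume the flag for this hart */

    /* Case 1: invalid entries may be cached, so a local invalidation is
     * needed. Case 2: nothing is cached, retrying is enough. */
    if (uarch_caches_invalid)
        (*local_fences)++;          /* stands in for a local sfence.vma */

    return TRAP_RETRY;              /* return and re-execute the access */
}
```

In this model a hart retries at most once per flagged mapping: a second
fault with the flag already cleared falls through to the normal fault
path, so a genuinely bad access is still reported.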

Link: https://lore.kernel.org/linux-riscv/20230531093817.665799-1-bjorn@kernel.org/ [1]
Link: https://lore.kernel.org/linux-riscv/20230801090927.2018653-1-dylan@andestech.com [2]
Link: https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivosinc.com/ [3]
Link: https://lore.kernel.org/lkml/20200508144043.13893-1-joro@8bytes.org/ [4]
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20240717060125.139416-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
arch/riscv/include/asm/cacheflush.h
arch/riscv/include/asm/thread_info.h
arch/riscv/kernel/asm-offsets.c
arch/riscv/kernel/entry.S
arch/riscv/mm/init.c