linux-2.6-block.git
3 weeks agoKVM: arm64: selftests: Sync ID_AA64MMFR3_EL1 in set_id_regs
Mark Brown [Mon, 18 Aug 2025 16:41:00 +0000 (17:41 +0100)]
KVM: arm64: selftests: Sync ID_AA64MMFR3_EL1 in set_id_regs

When we added coverage for ID_AA64MMFR3_EL1 we didn't add it to the list
of registers we read in the guest, do so.

Fixes: 0b593ef12afc ("KVM: arm64: selftests: Catch up set_id_regs with the kernel")
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20250818-kvm-arm64-selftests-mmfr3-idreg-v1-1-2f85114d0163@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Get rid of ARM64_FEATURE_MASK()
Marc Zyngier [Sun, 17 Aug 2025 20:21:58 +0000 (21:21 +0100)]
KVM: arm64: Get rid of ARM64_FEATURE_MASK()

The ARM64_FEATURE_MASK() macro was a hack introduce whilst the
automatic generation of sysreg encoding was introduced, and was
too unreliable to be entirely trusted.

We are in a better place now, and we could really do without this
macro. Get rid of it altogether.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250817202158.395078-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writable
Marc Zyngier [Sun, 17 Aug 2025 20:21:57 +0000 (21:21 +0100)]
KVM: arm64: Make ID_AA64PFR1_EL1.RAS_frac writable

Allow userspace to write to RAS_frac, under the condition that
the host supports RASv1p1 with RAS_frac==1. Other configurations
will result in RAS_frac being exposed as 0, and therefore implicitly
not writable.

To avoid the clutter, the ID_AA64PFR1_EL1 sanitisation is moved to
its own function.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20250817202158.395078-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Make ID_AA64PFR0_EL1.RAS writable
Marc Zyngier [Sun, 17 Aug 2025 20:21:56 +0000 (21:21 +0100)]
KVM: arm64: Make ID_AA64PFR0_EL1.RAS writable

Make ID_AA64PFR0_EL1.RAS writable so that we can restore a VM from
a system without RAS to a RAS-equipped machine (or disable RAS
in the guest).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20250817202158.395078-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2
Marc Zyngier [Sun, 17 Aug 2025 20:21:55 +0000 (21:21 +0100)]
KVM: arm64: Ignore HCR_EL2.FIEN set by L1 guest's EL2

An EL2 guest can set HCR_EL2.FIEN, which gives access to the RASv1p1
fault injection mechanism. This would allow an EL1 guest to inject
error records into the system, which does sound like a terrible idea.

Prevent this situation by added FIEN to the list of bits we silently
exclude from being inserted into the host configuration.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250817202158.395078-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Handle RASv1p1 registers
Marc Zyngier [Sun, 17 Aug 2025 20:21:54 +0000 (21:21 +0100)]
KVM: arm64: Handle RASv1p1 registers

FEAT_RASv1p1 system registeres are not handled at all so far.
KVM will give an embarassed warning on the console and inject
an UNDEF, despite RASv1p1 being exposed to the guest on suitable HW.

Handle these registers similarly to FEAT_RAS, with the added fun
that there are *two* way to indicate the presence of FEAT_RASv1p1.

Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20250817202158.395078-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoarm64: Add capability denoting FEAT_RASv1p1
Marc Zyngier [Sun, 17 Aug 2025 20:21:53 +0000 (21:21 +0100)]
arm64: Add capability denoting FEAT_RASv1p1

Detecting FEAT_RASv1p1 is rather complicated, as there are two
ways for the architecture to advertise the same thing (always a
delight...).

Add a capability that will advertise this in a synthetic way to
the rest of the kernel.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20250817202158.395078-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
Raghavendra Rao Ananta [Wed, 20 Aug 2025 16:22:42 +0000 (16:22 +0000)]
KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables

When a large VM, specifically one that holds a significant number of PTEs,
gets abruptly destroyed, the following warning is seen during the
page-table walk:

 sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
 CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
 Tainted: [O]=OOT_MODULE
 Call trace:
  show_stack+0x20/0x38 (C)
  dump_stack_lvl+0x3c/0xb8
  dump_stack+0x18/0x30
  resched_latency_warn+0x7c/0x88
  sched_tick+0x1c4/0x268
  update_process_times+0xa8/0xd8
  tick_nohz_handler+0xc8/0x168
  __hrtimer_run_queues+0x11c/0x338
  hrtimer_interrupt+0x104/0x308
  arch_timer_handler_phys+0x40/0x58
  handle_percpu_devid_irq+0x8c/0x1b0
  generic_handle_domain_irq+0x48/0x78
  gic_handle_irq+0x1b8/0x408
  call_on_irq_stack+0x24/0x30
  do_interrupt_handler+0x54/0x78
  el1_interrupt+0x44/0x88
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x84/0x88
  stage2_free_walker+0x30/0xa0 (P)
  __kvm_pgtable_walk+0x11c/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  kvm_pgtable_walk+0xc4/0x140
  kvm_pgtable_stage2_destroy+0x5c/0xf0
  kvm_free_stage2_pgd+0x6c/0xe8
  kvm_uninit_stage2_mmu+0x24/0x48
  kvm_arch_flush_shadow_all+0x80/0xa0
  kvm_mmu_notifier_release+0x38/0x78
  __mmu_notifier_release+0x15c/0x250
  exit_mmap+0x68/0x400
  __mmput+0x38/0x1c8
  mmput+0x30/0x68
  exit_mm+0xd4/0x198
  do_exit+0x1a4/0xb00
  do_group_exit+0x8c/0x120
  get_signal+0x6d4/0x778
  do_signal+0x90/0x718
  do_notify_resume+0x70/0x170
  el0_svc+0x74/0xd8
  el0t_64_sync_handler+0x60/0xc8
  el0t_64_sync+0x1b0/0x1b8

The warning is seen majorly on the host kernels that are configured
not to force-preempt, such as CONFIG_PREEMPT_NONE=y. To avoid this,
instead of walking the entire page-table in one go, split it into
smaller ranges, by checking for cond_resched() between each range.
Since the path is executed during VM destruction, after the
page-table structure is unlinked from the KVM MMU, relying on
cond_resched_rwlock_write() isn't necessary.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250820162242.2624752-3-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
3 weeks agoKVM: arm64: Split kvm_pgtable_stage2_destroy()
Raghavendra Rao Ananta [Wed, 20 Aug 2025 16:22:41 +0000 (16:22 +0000)]
KVM: arm64: Split kvm_pgtable_stage2_destroy()

Split kvm_pgtable_stage2_destroy() into two:
  - kvm_pgtable_stage2_destroy_range(), that performs the
    page-table walk and free the entries over a range of addresses.
  - kvm_pgtable_stage2_destroy_pgd(), that frees the PGD.

This refactoring enables subsequent patches to free large page-tables
in chunks, calling cond_resched() between each chunk, to yield the
CPU as necessary.

Existing callers of kvm_pgtable_stage2_destroy(), that probably cannot
take advantage of this (such as nVMHE), will continue to function as is.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250820162242.2624752-2-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
4 weeks agoKVM: arm64: Correctly populate FAR_EL2 on nested SEA injection
Marc Zyngier [Wed, 13 Aug 2025 16:37:47 +0000 (17:37 +0100)]
KVM: arm64: Correctly populate FAR_EL2 on nested SEA injection

vcpu_write_sys_reg()'s signature is not totally obvious, and it
is rather easy to write something that looks correct, except that...
Oh wait...

Swap addr and FAR_EL2 to restore some sanity in the nested SEA
department.

Fixes: 9aba641b9ec2a ("KVM: arm64: nv: Respect exception routing rules for SEAs")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250813163747.2591317-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agoKVM: arm64: selftest: Add standalone test checking for KVM's own UUID
Marc Zyngier [Wed, 6 Aug 2025 17:13:41 +0000 (18:13 +0100)]
KVM: arm64: selftest: Add standalone test checking for KVM's own UUID

Tinkering with UUIDs is a perilious task, and the KVM UUID gets
broken at times. In order to spot this early enough, add a selftest
that will shout if the expected value isn't found.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250721130558.50823-1-jackabt.amazon@gmail.com
Link: https://lore.kernel.org/r/20250806171341.1521210-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agoarm64: vgic-v2: Fix guest endianness check in hVHE mode
Fuad Tabba [Thu, 7 Aug 2025 12:01:33 +0000 (13:01 +0100)]
arm64: vgic-v2: Fix guest endianness check in hVHE mode

In hVHE when running at the hypervisor, SCTLR_EL1 refers to the
hypervisor's System Control Register rather than the guest's. Make sure
to access the guest's register to determine its endianness.

Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250807120133.871892-4-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agoKVM: arm64: Sync protected guest VBAR_EL1 on injecting an undef exception
Fuad Tabba [Thu, 7 Aug 2025 12:01:32 +0000 (13:01 +0100)]
KVM: arm64: Sync protected guest VBAR_EL1 on injecting an undef exception

In pKVM, a race condition can occur if a guest updates its VBAR_EL1
register and, before a vCPU exit synchronizes this change, the
hypervisor needs to inject an undefined exception into a protected
guest.

In this scenario, the vCPU still holds the stale VBAR_EL1 value from
before the guest's update. When pKVM injects the exception, it ends up
using the stale value.

Explicitly read the live value of VBAR_EL1 from the guest and update the
vCPU value immediately before pending the exception. This ensures the
vCPU's value is the same as the guest's and that the exception will be
handled at the correct address upon resuming the guest.

Reported-by: Keir Fraser <keirf@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250807120133.871892-3-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agoKVM: arm64: Handle AIDR_EL1 and REVIDR_EL1 in host for protected VMs
Fuad Tabba [Thu, 7 Aug 2025 12:01:31 +0000 (13:01 +0100)]
KVM: arm64: Handle AIDR_EL1 and REVIDR_EL1 in host for protected VMs

Since commit 17efc1acee62 ("arm64: Expose AIDR_EL1 via sysfs"), AIDR_EL1
is read early during boot. Therefore, a guest running as a protected VM
will fail to boot because when it attempts to access AIDR_EL1, access to
that register is restricted in pKVM for protected guests.

Similar to how MIDR_EL1 is handled by the host for protected VMs, let
the host handle accesses to AIDR_EL1 as well as REVIDR_EL1. However note
that, unlike MIDR_EL1, AIDR_EL1 and REVIDR_EL1 are trapped by
HCR_EL2.TID1. Therefore, explicitly mark them as handled by the host for
protected VMs. TID1 is always set in pKVM, because it needs to restrict
access to SMIDR_EL1, which is also trapped by that bit.

Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20250807120133.871892-2-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agokvm: arm64: use BUG() instead of BUG_ON(1)
Arnd Bergmann [Thu, 7 Aug 2025 07:21:28 +0000 (09:21 +0200)]
kvm: arm64: use BUG() instead of BUG_ON(1)

The BUG_ON() macro adds a little bit of complexity over BUG(), and in
some cases this ends up confusing the compiler's control flow analysis
in a way that results in a warning. This one now shows up with clang-21:

arch/arm64/kvm/vgic/vgic-mmio.c:1094:3: error: variable 'len' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
 1094 |                 BUG_ON(1);

Change both instances of BUG_ON(1) to a plain BUG() in the arm64 kvm
code, to avoid the false-positive warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250807072132.4170088-1-arnd@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agoKVM: arm64: nv: Handle SEAs due to VNCR redirection
Oliver Upton [Tue, 29 Jul 2025 18:23:42 +0000 (11:23 -0700)]
KVM: arm64: nv: Handle SEAs due to VNCR redirection

System register accesses redirected to the VNCR page can also generate
external aborts just like any other form of memory access. Route to
kvm_handle_guest_sea() for potential APEI handling, falling back to a
vSError if the kernel didn't handle the abort.

Take the opportunity to throw out the useless kvm_ras.h which provided a
helper with a single callsite...

Cc: Jiaqi Yan <jiaqiyan@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250729182342.3281742-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
5 weeks agoKVM: arm64: nv: Properly check ESR_EL2.VNCR on taking a VNCR_EL2 related fault
Marc Zyngier [Wed, 30 Jul 2025 10:18:28 +0000 (11:18 +0100)]
KVM: arm64: nv: Properly check ESR_EL2.VNCR on taking a VNCR_EL2 related fault

Instead of checking for the ESR_EL2.VNCR bit being set (the only case
we should be here), we are actually testing random bits in ESR_EL2.DFSC.

13 obviously being a lucky number, it matches both permission and
translation fault status codes, which explains why we never saw it
failing. This was found by inspection, while reviewing a vaguely
related patch.

Whilst we're at it, turn the BUG_ON() into a WARN_ON_ONCE(), as
exploding here is just silly.

Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250730101828.1168707-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
6 weeks agoKVM: arm64: Don't attempt vLPI mappings when vPE allocation is disabled
Raghavendra Rao Ananta [Tue, 29 Jul 2025 21:06:44 +0000 (21:06 +0000)]
KVM: arm64: Don't attempt vLPI mappings when vPE allocation is disabled

commit c652887a9288 ("KVM: arm64: vgic-v3: Allow userspace to write
GICD_TYPER2.nASSGIcap") makes the allocation of vPEs depend on nASSGIcap
for GICv4.1 hosts. While the vGIC v4 initialization and teardown is
handled correctly, it erroneously attempts to establish a vLPI mapping
to a VM that has no vPEs allocated:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a8
   Mem abort info:
     ESR = 0x0000000096000044
     EC = 0x25: DABT (current EL), IL = 32 bits
     SET = 0, FnV = 0
     EA = 0, S1PTW = 0
     FSC = 0x04: level 0 translation fault
   Data abort info:
     ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
     CM = 0, WnR = 1, TnD = 0, TagAccess = 0
     GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
   user pgtable: 4k pages, 48-bit VAs, pgdp=00000073a453b000
   [00000000000000a8] pgd=0000000000000000, p4d=0000000000000000
   Internal error: Oops: 0000000096000044 [#1] SMP
   pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
   pc : its_irq_set_vcpu_affinity+0x58c/0x95c
   lr : its_irq_set_vcpu_affinity+0x1e0/0x95c
   sp : ffff8001029bb9e0
   pmr_save: 00000060
   x29: ffff8001029bba20 x28: ffff0001ca5e28c0 x27: 0000000000000000
   x26: 0000000000000000 x25: ffff00019eee9f80 x24: ffff0001992b3f00
   x23: ffff8001029bbab8 x22: ffff00001159fb80 x21: 00000000000024a7
   x20: 00000000000024a7 x19: ffff00019eee9fb4 x18: 0000000000000494
   x17: 000000000000000e x16: 0000000000000494 x15: 0000000000000002
   x14: ffff0001a7f34600 x13: ffffccaad1203000 x12: 0000000000000018
   x11: ffff000011991000 x10: 0000000000000000 x9 : 00000000000000a2
   x8 : 00000000000020a8 x7 : 0000000000000000 x6 : 000000000000003f
   x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000004
   x2 : 0000000000000000 x1 : ffff8001029bbab8 x0 : 00000000000000a8
   Call trace:
    its_irq_set_vcpu_affinity+0x58c/0x95c
    irq_set_vcpu_affinity+0x74/0xc8
    its_map_vlpi+0x4c/0x94
    kvm_vgic_v4_set_forwarding+0x134/0x298
    kvm_arch_irq_bypass_add_producer+0x28/0x34
    irq_bypass_register_producer+0xf8/0x1d8
    vfio_msi_set_vector_signal+0x2c8/0x308
    vfio_pci_set_msi_trigger+0x198/0x2d4
    vfio_pci_set_irqs_ioctl+0xf0/0x104
    vfio_pci_core_ioctl+0x6ac/0xc5c
    vfio_device_fops_unl_ioctl+0x128/0x370
    __arm64_sys_ioctl+0x98/0xd0
    el0_svc_common+0xd8/0x1d8
    do_el0_svc+0x28/0x34
    el0_svc+0x40/0xb8
    el0t_64_sync_handler+0x70/0xbc
    el0t_64_sync+0x1a8/0x1ac
   Code: 321f0129 f940094a 8b080148 d1400900 (39000009)
   ---[ end trace 0000000000000000 ]---

Fix it by moving the GICv4.1 special-casing to
vgic_supports_direct_msis(), returning false if the user explicitly
disabled nASSGIcap for the VM.

Fixes: c652887a9288 ("KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap")
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20250729210644.830364-1-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
6 weeks agoKVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list
Oliver Upton [Mon, 28 Jul 2025 15:26:03 +0000 (08:26 -0700)]
KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list

VDISR_EL2 and VSESR_EL2 are now visible to userspace for nested VMs. Add
them to get-reg-list.

Link: https://lore.kernel.org/r/20250728152603.2823699-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
6 weeks agoMerge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next
Oliver Upton [Mon, 28 Jul 2025 15:11:34 +0000 (08:11 -0700)]
Merge branch 'kvm-arm64/vgic-v4-ctl' into kvmarm/next

* kvm-arm64/vgic-v4-ctl:
  : Userspace control of nASSGIcap, courtesy of Raghavendra Rao Ananta
  :
  : Allow userspace to decide if support for SGIs without an active state is
  : advertised to the guest, allowing VMs from GICv3-only hardware to be
  : migrated to to GICv4.1 capable machines.
  Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init
  KVM: arm64: selftests: Add test for nASSGIcap attribute
  KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
  KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
  KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
  KVM: arm64: Disambiguate support for vSGIs v. vLPIs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
6 weeks agoMerge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/next
Oliver Upton [Mon, 28 Jul 2025 15:06:27 +0000 (08:06 -0700)]
Merge branch 'kvm-arm64/el2-reg-visibility' into kvmarm/next

* kvm-arm64/el2-reg-visibility:
  : Fixes to EL2 register visibility, courtesy of Marc Zyngier
  :
  :  - Expose EL2 VGICv3 registers via the VGIC attributes accessor, not the
  :    KVM_{GET,SET}_ONE_REG ioctls
  :
  :  - Condition visibility of FGT registers on the presence of FEAT_FGT in
  :    the VM
  KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
  KVM: arm64: Enforce the sorting of the GICv3 system register table
  KVM: arm64: Clarify the check for reset callback in check_sysreg_table()
  KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
  KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: selftests: get-reg-list: Add base EL2 registers
  KVM: arm64: selftests: get-reg-list: Simplify feature dependency
  KVM: arm64: Advertise FGT2 registers to userspace
  KVM: arm64: Condition FGT registers on feature availability
  KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
  KVM: arm64: Let GICv3 save/restore honor visibility attribute
  KVM: arm64: Define helper for ICH_VTR_EL2
  KVM: arm64: Define constant value for ICC_SRE_EL2
  KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
  KVM: arm64: Make RVBAR_EL2 accesses UNDEF

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
6 weeks agoMerge branch 'kvm-arm64/config-masks' into kvmarm/next
Oliver Upton [Mon, 28 Jul 2025 15:03:03 +0000 (08:03 -0700)]
Merge branch 'kvm-arm64/config-masks' into kvmarm/next

* kvm-arm64/config-masks:
  : More config-driven mask computation, courtesy of Marc Zyngier
  :
  : Converts more system registers to the config-driven computation of RESx
  : masks based on the advertised feature set
  KVM: arm64: Tighten the definition of FEAT_PMUv3p9
  KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
  KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
  KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
  arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoMerge branch 'kvm-arm64/misc' into kvmarm/next
Oliver Upton [Sat, 26 Jul 2025 15:54:47 +0000 (08:54 -0700)]
Merge branch 'kvm-arm64/misc' into kvmarm/next

* kvm-arm64/misc:
  : Miscellaneous fixes/cleanups for KVM/arm64
  :
  :  - Fixes for computing POE output permissions
  :
  :  - Return ENXIO for invalid VGIC device attribute
  :
  :  - String helper conversions
  arm64: kvm: trace_handle_exit: use string choices helper
  arm64: kvm: sys_regs: use string choices helper
  KVM: arm64: Follow specification when implementing WXN
  KVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusion
  KVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrs

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoMerge branch 'kvm-arm64/gcie-legacy' into kvmarm/next
Oliver Upton [Sat, 26 Jul 2025 15:50:06 +0000 (08:50 -0700)]
Merge branch 'kvm-arm64/gcie-legacy' into kvmarm/next

* kvm-arm64/gcie-legacy:
  : Support for GICv3 emulation on GICv5, courtesy of Sascha Bischoff
  :
  : FEAT_GCIE_LEGACY adds the necessary hardware for GICv5 systems to
  : support the legacy GICv3 for VMs, including a backwards-compatible VGIC
  : implementation that we all know and love.
  :
  : As a starting point for GICv5 enablement in KVM, enable + use the
  : GICv3-compatible feature when running VMs on GICv5 hardware.
  KVM: arm64: gic-v5: Probe for GICv5
  KVM: arm64: gic-v5: Support GICv3 compat
  arm64/sysreg: Add ICH_VCTLR_EL2
  irqchip/gic-v5: Populate struct gic_kvm_info
  irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoMerge tag 'irqchip-gic-v5-host' into kvmarm/next
Oliver Upton [Sat, 26 Jul 2025 15:49:15 +0000 (08:49 -0700)]
Merge tag 'irqchip-gic-v5-host' into kvmarm/next

GICv5 initial host support

Add host kernel support for the new arm64 GICv5 architecture, which is
quite a departure from the previous ones.

Include support for the full gamut of the architecture (interrupt
routing and delivery to CPUs, wired interrupts, MSIs, and interrupt
translation).

* tag 'irqchip-gic-v5-host': (32 commits)
  arm64: smp: Fix pNMI setup after GICv5 rework
  arm64: Kconfig: Enable GICv5
  docs: arm64: gic-v5: Document booting requirements for GICv5
  irqchip/gic-v5: Add GICv5 IWB support
  irqchip/gic-v5: Add GICv5 ITS support
  irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling
  irqchip/gic-v3: Rename GICv3 ITS MSI parent
  PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function
  of/irq: Add of_msi_xlate() helper function
  irqchip/gic-v5: Enable GICv5 SMP booting
  irqchip/gic-v5: Add GICv5 LPI/IPI support
  irqchip/gic-v5: Add GICv5 IRS/SPI support
  irqchip/gic-v5: Add GICv5 PPI support
  arm64: Add support for GICv5 GSB barriers
  arm64: smp: Support non-SGIs for IPIs
  arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability
  arm64: cpucaps: Rename GICv3 CPU interface capability
  arm64: Disable GICv5 read/write/instruction traps
  arm64/sysreg: Add ICH_HFGITR_EL2
  arm64/sysreg: Add ICH_HFGWTR_EL2
  ...

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoMerge branch 'kvm-arm64/doublefault2' into kvmarm/next
Oliver Upton [Sat, 26 Jul 2025 15:47:22 +0000 (08:47 -0700)]
Merge branch 'kvm-arm64/doublefault2' into kvmarm/next

* kvm-arm64/doublefault2: (33 commits)
  : NV Support for FEAT_RAS + DoubleFault2
  :
  : Delegate the vSError context to the guest hypervisor when in a nested
  : state, including registers related to ESR propagation. Additionally,
  : catch up KVM's external abort infrastructure to the architecture,
  : implementing the effects of FEAT_DoubleFault2.
  :
  : This has some impact on non-nested guests, as SErrors deemed unmasked at
  : the time they're made pending are now immediately injected with an
  : emulated exception entry rather than using the VSE bit.
  KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
  KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
  KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
  KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
  KVM: arm64: selftests: Test ESR propagation for vSError injection
  KVM: arm64: Populate ESR_ELx.EC for emulated SError injection
  KVM: arm64: selftests: Catch up set_id_regs with the kernel
  KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
  KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
  KVM: arm64: selftests: Add basic SError injection test
  KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
  KVM: arm64: Advertise support for FEAT_DoubleFault2
  KVM: arm64: Advertise support for FEAT_SCTLR2
  KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
  KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
  KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
  KVM: arm64: Route SEAs to the SError vector when EASE is set
  KVM: arm64: nv: Ensure Address size faults affect correct ESR
  KVM: arm64: Factor out helper for selecting exception target EL
  KVM: arm64: Describe SCTLR2_ELx RESx masks
  ...

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoMerge branch 'kvm-arm64/cacheable-pfnmap' into kvmarm/next
Oliver Upton [Sat, 26 Jul 2025 15:47:02 +0000 (08:47 -0700)]
Merge branch 'kvm-arm64/cacheable-pfnmap' into kvmarm/next

* kvm-arm64/cacheable-pfnmap:
  : Cacheable PFNMAP support at stage-2, courtesy of Ankit Agrawal
  :
  : For historical reasons, KVM only allows cacheable mappings at stage-2
  : when a kernel alias exists in the direct map for the memory region. On
  : hardware without FEAT_S2FWB, this is necessary as KVM must do cache
  : maintenance to keep guest/host accesses coherent.
  :
  : This is unnecessarily restrictive on systems with FEAT_S2FWB and
  : CTR_EL0.DIC, as KVM no longer needs to perform cache maintenance to
  : maintain correctness.
  :
  : Allow cacheable mappings at stage-2 on supporting hardware when the
  : corresponding VMA has cacheable memory attributes and advertise a
  : capability to userspace such that a VMM can determine if a stage-2
  : mapping can be established (e.g. VFIO device).
  KVM: arm64: Expose new KVM cap for cacheable PFNMAP
  KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
  KVM: arm64: Block cacheable PFNMAP mapping
  KVM: arm64: Assume non-PFNMAP/MIXEDMAP VMAs can be mapped cacheable
  KVM: arm64: Rename the device variable to s2_force_noncacheable

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoDocumentation: KVM: arm64: Describe VGICv3 registers writable pre-init
Oliver Upton [Thu, 24 Jul 2025 06:28:05 +0000 (23:28 -0700)]
Documentation: KVM: arm64: Describe VGICv3 registers writable pre-init

KVM allows userspace to control GICD_IIDR.Revision and
GICD_TYPER2.nASSGIcap prior to initialization for the sake of
provisioning the guest-visible feature set. Document the userspace
expectations surrounding accesses to these registers.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: selftests: Add test for nASSGIcap attribute
Raghavendra Rao Ananta [Thu, 24 Jul 2025 06:28:04 +0000 (23:28 -0700)]
KVM: arm64: selftests: Add test for nASSGIcap attribute

Extend vgic_init to test the nASSGIcap attribute, asserting that it is
configurable (within reason) prior to initializing the VGIC.
Additionally, check that userspace cannot set the attribute after the
VGIC has been initialized.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap
Raghavendra Rao Ananta [Thu, 24 Jul 2025 06:28:03 +0000 (23:28 -0700)]
KVM: arm64: vgic-v3: Allow userspace to write GICD_TYPER2.nASSGIcap

KVM unconditionally advertises GICD_TYPER2.nASSGIcap (which internally
implies vSGIs) on GICv4.1 systems. Allow userspace to change whether a
VM supports the feature. Only allow changes prior to VGIC initialization
as at that point vPEs need to be allocated for the VM.

For convenience, bundle support for vLPIs and vSGIs behind this feature,
allowing userspace to control vPE allocation for VMs in environments
that may be constrained on vPE IDs.

Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization
Oliver Upton [Thu, 24 Jul 2025 06:28:02 +0000 (23:28 -0700)]
KVM: arm64: vgic-v3: Allow access to GICD_IIDR prior to initialization

KVM allows userspace to write GICD_IIDR for backwards-compatibility with
older kernels, where new implementation revisions have new features.
Unfortunately this is allowed to happen at runtime, and ripping features
out from underneath a running guest is a terrible idea.

While we can't do anything about the ABI, prepare for more ID-like
registers by allowing access to GICD_IIDR prior to VGIC initialization.
Hoist initializaiton of the default value to kvm_vgic_create() and
discard the incorrect comment that assumed userspace could access the
register before initialization (until now).

Subsequent changes will allow the VMM to further provision the GIC
feature set, e.g. the presence of nASSGIcap.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling
Oliver Upton [Thu, 24 Jul 2025 06:28:01 +0000 (23:28 -0700)]
KVM: arm64: vgic-v3: Consolidate MAINT_IRQ handling

Consolidate the duplicated handling of the VGICv3 maintenance IRQ
attribute as a regular GICv3 attribute, as it is neither a register nor
a common attribute. As this is now handled separately from the VGIC
registers, the locking is relaxed to only acquire the intended
config_lock.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Disambiguate support for vSGIs v. vLPIs
Oliver Upton [Thu, 24 Jul 2025 06:28:00 +0000 (23:28 -0700)]
KVM: arm64: Disambiguate support for vSGIs v. vLPIs

vgic_supports_direct_msis() is a bit of a misnomer, as it returns true
if either vSGIs or vLPIs are supported. Pick it apart into a few
predicates and replace some open-coded checks for vSGIs, including an
opportunistic fix to always check if the CPUIF is capable of handling
vSGIs.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250724062805.2658919-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test
Marc Zyngier [Fri, 18 Jul 2025 11:11:54 +0000 (12:11 +0100)]
KVM: arm64: selftest: vgic-v3: Add basic GICv3 sysreg userspace access test

We have a lot of more or less useful vgic tests, but none of them
tracks the availability of GICv3 system registers, which is a bit
annoying.

Add one such test, which covers both EL1 and EL2 registers.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Enforce the sorting of the GICv3 system register table
Marc Zyngier [Fri, 18 Jul 2025 11:11:53 +0000 (12:11 +0100)]
KVM: arm64: Enforce the sorting of the GICv3 system register table

In order to avoid further embarassing bugs, enforce that the GICv3
sysreg table is actually sorted, just like all the other tables.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Clarify the check for reset callback in check_sysreg_table()
Marc Zyngier [Fri, 18 Jul 2025 11:11:52 +0000 (12:11 +0100)]
KVM: arm64: Clarify the check for reset callback in check_sysreg_table()

check_sysreg_table() has a wonky 'is_32" parameter, which is really
an indication that we should enforce the presence of a reset helper.

Clean this up by naming the variable accordingly and inverting the
condition. Contrary to popular belief, system instructions don't
have a reset value (duh!), and therefore do not need to be checked
for reset (they escaped the check through luck...).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2
Marc Zyngier [Fri, 18 Jul 2025 11:11:51 +0000 (12:11 +0100)]
KVM: arm64: vgic-v3: Fix ordering of ICH_HCR_EL2

The sysreg tables are supposed to be sorted so that a binary search
can easily find them. However, ICH_HCR_EL2 is obviously at the wrong
spot.

Move it where it belongs.

Fixes: 9fe9663e47e21 ("KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Sebastian Ott <sebott@redhat.com>
Link: https://lore.kernel.org/r/20250718111154.104029-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoarm64: kvm: trace_handle_exit: use string choices helper
Kuninori Morimoto [Fri, 18 Jul 2025 01:50:38 +0000 (01:50 +0000)]
arm64: kvm: trace_handle_exit: use string choices helper

We can use string choices helper, let's use it.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87o6ti5ksx.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoarm64: kvm: sys_regs: use string choices helper
Kuninori Morimoto [Fri, 18 Jul 2025 01:50:24 +0000 (01:50 +0000)]
arm64: kvm: sys_regs: use string choices helper

We can use string choices helper, let's use it.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Link: https://lore.kernel.org/r/87pldy5ktb.wl-kuninori.morimoto.gx@renesas.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Follow specification when implementing WXN
Marc Zyngier [Tue, 1 Jul 2025 15:16:48 +0000 (16:16 +0100)]
KVM: arm64: Follow specification when implementing WXN

The R_QXXPC and R_NPBXC rules have some interesting (and pretty
sharp) corners when defining the behaviour of of WXN at S1:

- when S1 overlay is enabled, WXN applies to the overlay and
  will remove W

- when S1 overlay is disabled, WXN applies to the base permissions
  and will remove X.

Today, we lumb the two together in a way that doesn't really match
the rules, making things awkward to follow what is happening, in
particular when overlays are enabled.

Split these two rules over two distinct paths, which makes things
a lot easier to read and validate against the architecture rules.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250701151648.754785-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusion
Marc Zyngier [Tue, 1 Jul 2025 15:16:47 +0000 (16:16 +0100)]
KVM: arm64: Remove the wi->{e0,}poe vs wr->{p,u}ov confusion

Some of the POE computation is a bit confused. Specifically, there
is an element of confusion between what wi->{e0,}poe an wr->{p,u}ov
actually represent.

- wi->{e0,}poe is an *input* to the walk, and indicates whether
  POE is enabled at EL0 or EL{1,2}

- wr->{p,u}ov is a *result* of the walk, and indicates whether
  overlays are enabled. Crutially, it is possible to have POE
  enabled, and yet overlays disabled, while the converse isn't
  true

What this all means is that once the base permissions have been
established, checking for wi->{e0,}poe makes little sense, because
the truth about overlays resides in wr->{p,u}ov. So constructs
checking for (wi->poe && wr->pov) only add perplexity.

Refactor compute_s1_overlay_permissions() and the way it is
called according to the above principles. Take the opportunity
to avoid reading registers that are not strictly required.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250701151648.754785-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrs
David Woodhouse [Mon, 23 Jun 2025 13:22:52 +0000 (15:22 +0200)]
KVM: arm64: vgic-its: Return -ENXIO to invalid KVM_DEV_ARM_VGIC_GRP_CTRL attrs

A preliminary version of a hack to invoke unmap_all_vpes() from an ioctl
didn't work very well. We eventually determined this was because we were
invoking it on the wrong file descriptor, but not getting an error.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/bbbddd56135399baf699bc46ffb6e7f08d9f8c9f.camel@infradead.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Make RAS registers UNDEF when RAS isn't advertised
Marc Zyngier [Mon, 21 Jul 2025 10:19:51 +0000 (11:19 +0100)]
KVM: arm64: Make RAS registers UNDEF when RAS isn't advertised

We currently always expose FEAT_RAS when available on the host.

As we are about to make this feature selectable from userspace,
check for it being present before emulating register accesses
as RAZ/WI, and inject an UNDEF otherwise.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250721101955.535159-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context
Marc Zyngier [Mon, 21 Jul 2025 10:19:50 +0000 (11:19 +0100)]
KVM: arm64: Filter out HCR_EL2 bits when running in hypervisor context

Most HCR_EL2 bits are not supposed to affect EL2 at all, but only
the guest. However, we gladly merge these bits with the host's
HCR_EL2 configuration, irrespective of entering L1 or L2.

This leads to some funky behaviour, such as L1 trying to inject
a virtual SError for L2, and getting a taste of its own medecine.
Not quite what the architecture anticipated.

In the end, the only bits that matter are those we have defined as
invariants, either because we've made them RESx (E2H, HCD...), or
that we actively refuse to merge because the mess with KVM's own
logic.

Use the sanitisation infrastructure to get the RES1 bits, and let
things rip in a safer way.

Fixes: 04ab519bb86df ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
7 weeks agoKVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state
Marc Zyngier [Sun, 20 Jul 2025 10:22:29 +0000 (11:22 +0100)]
KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU state

Mark Brown reports that since we commit to making exceptions
visible without the vcpu being loaded, the external abort selftest
fails.

Upon investigation, it turns out that the code that makes registers
affected by an exception visible to the guest is completely broken
on VHE, as we don't check whether the system registers are loaded
on the CPU at this point. We managed to get away with this so far,
but that's obviously as bad as it gets,

Add the required checksm and document the absolute need to check
for the SYSREGS_ON_CPU flag before calling into any of the
__vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk
Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Tighten the definition of FEAT_PMUv3p9
Marc Zyngier [Mon, 14 Jul 2025 11:55:03 +0000 (12:55 +0100)]
KVM: arm64: Tighten the definition of FEAT_PMUv3p9

The current definition of FEAT_PMUv3p9 doesn't check for the lack
of an IMPDEF PMU, which is encoded as 0b1111, but considered unsigned.

Use the recently introduced helper to address the issue (which is
harmless, as KVM never advertises an IMPDEF PMU).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Convert MDCR_EL2 to config-driven sanitisation
Marc Zyngier [Mon, 14 Jul 2025 11:55:02 +0000 (12:55 +0100)]
KVM: arm64: Convert MDCR_EL2 to config-driven sanitisation

As for other registers, convert the determination of the RES0 bits
affecting MDCR_EL2 to be driven by a table extracted from the 2025-06
JSON drop

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation
Marc Zyngier [Mon, 14 Jul 2025 11:55:01 +0000 (12:55 +0100)]
KVM: arm64: Convert SCTLR_EL1 to config-driven sanitisation

As for other registers, convert the determination of the RES0 bits
affecting SCTLR_EL1 to be driven by a table extracted from the 2025-06
JSON drop

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Convert TCR2_EL2 to config-driven sanitisation
Marc Zyngier [Mon, 14 Jul 2025 11:55:00 +0000 (12:55 +0100)]
KVM: arm64: Convert TCR2_EL2 to config-driven sanitisation

As for other registers, convert the determination of the RES0 bits
affecting TCR2_EL2 to be driven by a table extracted from the 2025-06
JSON drop.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoarm64: sysreg: Add THE/ASID2 controls to TCR2_ELx
Marc Zyngier [Mon, 14 Jul 2025 11:54:59 +0000 (12:54 +0100)]
arm64: sysreg: Add THE/ASID2 controls to TCR2_ELx

FEAT_THE and FEAT_ASID2 add new controls to the TCR2_ELx registers.

Add them to the register descriptions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714115503.3334242-2-maz@kernel.org
[ fix whitespace ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
Marc Zyngier [Mon, 14 Jul 2025 12:26:34 +0000 (13:26 +0100)]
KVM: arm64: Document registers exposed via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS

We never documented which GICv3 registers are available for save/restore
via the KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS interface.

Let's take the opportunity of adding the EL2 registers to document the whole
thing in one go.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-12-maz@kernel.org
[ oliver: fix trailing whitespace ]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: selftests: get-reg-list: Add base EL2 registers
Marc Zyngier [Mon, 14 Jul 2025 12:26:33 +0000 (13:26 +0100)]
KVM: arm64: selftests: get-reg-list: Add base EL2 registers

Add the EL2 registers and the eventual dependencies, effectively
doubling the number of test vectors. Oh well.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-11-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: selftests: get-reg-list: Simplify feature dependency
Marc Zyngier [Mon, 14 Jul 2025 12:26:32 +0000 (13:26 +0100)]
KVM: arm64: selftests: get-reg-list: Simplify feature dependency

Describing the dependencies between registers and features is on
the masochistic side of things, with hard-coded values that would
be better taken from the existing description.

Add a couple of helpers to that effect, and repaint the dependency
array. More could be done to improve this test, but my interest is
wearing  thin...

Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Itaru Kitayama <itaru.kitayama@fujitsu.com>
Link: https://lore.kernel.org/r/20250714122634.3334816-10-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Advertise FGT2 registers to userspace
Marc Zyngier [Mon, 14 Jul 2025 12:26:31 +0000 (13:26 +0100)]
KVM: arm64: Advertise FGT2 registers to userspace

While a guest is able to use the FEAT_FGT2 registers, we're missing
them being exposed to userspace. Add them to the (very long) list.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-9-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Condition FGT registers on feature availability
Marc Zyngier [Mon, 14 Jul 2025 12:26:30 +0000 (13:26 +0100)]
KVM: arm64: Condition FGT registers on feature availability

We shouldn't expose the FEAT_FGT registers unconditionally. Make
them dependent on FEAT_FGT being actually advertised to the guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-8-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
Marc Zyngier [Mon, 14 Jul 2025 12:26:29 +0000 (13:26 +0100)]
KVM: arm64: Expose GICv3 EL2 registers via KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS

Expose all the GICv3 EL2 registers through the usual GICv3 save/restore
interface, making it possible for a VMM to access the EL2 state.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-7-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Let GICv3 save/restore honor visibility attribute
Marc Zyngier [Mon, 14 Jul 2025 12:26:28 +0000 (13:26 +0100)]
KVM: arm64: Let GICv3 save/restore honor visibility attribute

The GICv3 save/restore code never needed any visibility attribute,
but that's about to change. Make vgic_v3_has_cpu_sysregs_attr()
check the visibility in case a register is hidden.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Define helper for ICH_VTR_EL2
Marc Zyngier [Mon, 14 Jul 2025 12:26:27 +0000 (13:26 +0100)]
KVM: arm64: Define helper for ICH_VTR_EL2

Move the computation of the ICH_VTR_EL2 value to a common location,
so that it can be reused by the save/restore code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-5-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Define constant value for ICC_SRE_EL2
Marc Zyngier [Mon, 14 Jul 2025 12:26:26 +0000 (13:26 +0100)]
KVM: arm64: Define constant value for ICC_SRE_EL2

Move the bag of bits defining the value of ICC_SRE_EL2 to a common
spot so that it can be reused by the save/restore code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-4-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG
Marc Zyngier [Mon, 14 Jul 2025 12:26:25 +0000 (13:26 +0100)]
KVM: arm64: Don't advertise ICH_*_EL2 registers through GET_ONE_REG

It appears that exposing the GICv3 EL2 registers through the usual
sysreg interface is not consistent with the way we expose the EL1
registers. The latter are exposed via the GICv3 device interface
instead, and there is no reason why the EL2 registers should get
a different treatement.

Hide the registers from userspace until the GICv3 code grows the
required infrastructure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Make RVBAR_EL2 accesses UNDEF
Marc Zyngier [Mon, 14 Jul 2025 12:26:24 +0000 (13:26 +0100)]
KVM: arm64: Make RVBAR_EL2 accesses UNDEF

We always expose a virtual CPU that has EL3 when NV is enabled,
irrespective of EL3 being actually implemented in HW.

Therefore, as per the architecture, RVBAR_EL2 must UNDEF, since
EL2 is not the highest implemented exception level. This is
consistent with RMR_EL2 also triggering an UNDEF.

Adjust the handling of RVBAR_EL2 accordingly.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250714122634.3334816-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
8 weeks agoKVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately
Oliver Upton [Tue, 15 Jul 2025 06:25:07 +0000 (23:25 -0700)]
KVM: arm64: Commit exceptions from KVM_SET_VCPU_EVENTS immediately

syzkaller has found that it can trip a warning in KVM's exception
emulation infrastructure by repeatedly injecting exceptions into the
guest.

While it's unlikely that a reasonable VMM will do this, further
investigation of the issue reveals that KVM can potentially discard the
"pending" SEA state. While the handling of KVM_GET_VCPU_EVENTS presumes
that userspace-injected SEAs are realized immediately, in reality the
emulated exception entry is deferred until the next call to KVM_RUN.

Hack-a-fix the immediate issues by committing the pending exceptions to
the vCPU's architectural state immediately in KVM_SET_VCPU_EVENTS. This
is no different to the way KVM-injected exceptions are handled in
KVM_RUN where we potentially call __kvm_adjust_pc() before returning to
userspace.

Reported-by: syzbot+4e09b1432de3774b86ae@syzkaller.appspotmail.com
Reported-by: syzbot+1f6f096afda6f4f8f565@syzkaller.appspotmail.com
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoarm64: smp: Fix pNMI setup after GICv5 rework
Marc Zyngier [Tue, 15 Jul 2025 17:11:12 +0000 (18:11 +0100)]
arm64: smp: Fix pNMI setup after GICv5 rework

Breno reports that pNMIs are not behaving the way they should since
they were reworked for GICv5. Turns out we feed the IRQ number to
the pNMI helper instead of the IPI number -- not a good idea.

Fix it by providing the correct number (duh).

Fixes: ba1004f861d16 ("arm64: smp: Support non-SGIs for IPIs")
Reported-by: Breno Leitao <leitao@debian.org>
Suggested-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoKVM: arm64: selftests: Test ESR propagation for vSError injection
Oliver Upton [Tue, 8 Jul 2025 23:06:32 +0000 (16:06 -0700)]
KVM: arm64: selftests: Test ESR propagation for vSError injection

Ensure that vSErrors taken in the guest have an appropriate ESR_ELx
value for the expected exception. Additionally, switch the EASE test to
install the SEA handler at the SError offset, as the ESR is still
expected to match an SEA in that case.

Link: https://lore.kernel.org/r/20250708230632.1954240-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Populate ESR_ELx.EC for emulated SError injection
Oliver Upton [Tue, 8 Jul 2025 23:06:31 +0000 (16:06 -0700)]
KVM: arm64: Populate ESR_ELx.EC for emulated SError injection

The hardware vSError injection mechanism populates ESR_ELx.EC as part of
ESR propagation and the contents of VSESR_EL2 populate the ISS field. Of
course, this means our emulated injection needs to set up the EC
correctly for an SError too.

Fixes: ce66109cec86 ("KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set")
Link: https://lore.kernel.org/r/20250708230632.1954240-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: gic-v5: Probe for GICv5
Sascha Bischoff [Fri, 27 Jun 2025 10:09:02 +0000 (10:09 +0000)]
KVM: arm64: gic-v5: Probe for GICv5

Add in a probe function for GICv5 which enables support for GICv3
guests on a GICv5 host, if FEAT_GCIE_LEGACY is supported by the
hardware.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-6-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: gic-v5: Support GICv3 compat
Sascha Bischoff [Fri, 27 Jun 2025 10:09:02 +0000 (10:09 +0000)]
KVM: arm64: gic-v5: Support GICv3 compat

Add support for GICv3 compat mode (FEAT_GCIE_LEGACY) which allows a
GICv5 host to run GICv3-based VMs. This change enables the
VHE/nVHE/hVHE/protected modes, but does not support nested
virtualization.

A lazy-disable approach is taken for compat mode; it is enabled on the
vgic_v3_load path but not disabled on the vgic_v3_put path. A
non-GICv3 VM, i.e., one based on GICv5, is responsible for disabling
compat mode on the corresponding vgic_v5_load path. Currently, GICv5
is not supported, and hence compat mode is not disabled again once it
is enabled, and this function is intentionally omitted from the code.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-5-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoarm64/sysreg: Add ICH_VCTLR_EL2
Sascha Bischoff [Fri, 27 Jun 2025 10:09:01 +0000 (10:09 +0000)]
arm64/sysreg: Add ICH_VCTLR_EL2

This system register is required to enable/disable V3 legacy mode when
running on a GICv5 host.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-4-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoirqchip/gic-v5: Populate struct gic_kvm_info
Sascha Bischoff [Fri, 27 Jun 2025 10:09:01 +0000 (10:09 +0000)]
irqchip/gic-v5: Populate struct gic_kvm_info

Populate the gic_kvm_info struct based on support for
FEAT_GCIE_LEGACY.  The struct is used by KVM to probe for a compatible
GIC.

Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20250627100847.1022515-3-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoirqchip/gic-v5: Skip deactivate for forwarded PPI interrupts
Sascha Bischoff [Fri, 27 Jun 2025 10:09:01 +0000 (10:09 +0000)]
irqchip/gic-v5: Skip deactivate for forwarded PPI interrupts

If a PPI interrupt is forwarded to a guest, skip the deactivate and
only EOI. Rely on the guest deactivating both the virtual and physical
interrupts (due to ICH_LRx_EL2.HW being set) later on as part of
handling the injected interrupt. This mimics the behaviour seen on
native GICv3.

This is part of adding support for the GICv3 compatibility mode on a
GICv5 host.

Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Co-authored-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20250627100847.1022515-2-sascha.bischoff@arm.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: selftests: Catch up set_id_regs with the kernel
Oliver Upton [Tue, 8 Jul 2025 17:25:32 +0000 (10:25 -0700)]
KVM: arm64: selftests: Catch up set_id_regs with the kernel

Add test coverage for ID_AA64MMFR3_EL1 and the recently added
FEAT_DoubleFault2.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-28-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list
Oliver Upton [Tue, 8 Jul 2025 17:25:31 +0000 (10:25 -0700)]
KVM: arm64: selftests: Add SCTLR2_EL1 to get-reg-list

Handle SCTLR2_EL1 specially as it is only visible to userspace when
FEAT_SCTLR2 is implemented for the VM.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-27-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1
Oliver Upton [Tue, 8 Jul 2025 17:25:30 +0000 (10:25 -0700)]
KVM: arm64: selftests: Test SEAs are taken to SError vector when EASE=1

Ensure KVM routes SEAs to the correct vector depending on
SCTLR2_EL1.EASE.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-26-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: selftests: Add basic SError injection test
Oliver Upton [Tue, 8 Jul 2025 17:25:29 +0000 (10:25 -0700)]
KVM: arm64: selftests: Add basic SError injection test

Add tests for SError injection considering KVM is more directly involved
in delivery:

 - Pending SErrors are taken at the first CSE after SErrors are unmasked

 - Pending SErrors aren't taken and remain pending if SErrors are masked

 - Unmasked SErrors are taken immediately when injected (implementation
   detail)

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-25-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError
Oliver Upton [Tue, 8 Jul 2025 17:25:28 +0000 (10:25 -0700)]
KVM: arm64: Don't retire MMIO instruction w/ pending (emulated) SError

KVM might have an emulated SError queued for the guest if userspace
returned an abort for MMIO. Better yet, it could actually be a
*synchronous* exception in disguise if SCTLR2_ELx.EASE is set.

Don't advance PC if KVM owes an emulated SError, just like the handling
of emulated SEA injection.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-24-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Advertise support for FEAT_DoubleFault2
Oliver Upton [Tue, 8 Jul 2025 17:25:27 +0000 (10:25 -0700)]
KVM: arm64: Advertise support for FEAT_DoubleFault2

KVM's external abort injection now respects the exception routing
wreckage due to FEAT_DoubleFault2. Advertise the feature.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-23-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Advertise support for FEAT_SCTLR2
Oliver Upton [Tue, 8 Jul 2025 17:25:26 +0000 (10:25 -0700)]
KVM: arm64: Advertise support for FEAT_SCTLR2

Everything is in place to handle the additional state for SCTLR2_ELx,
which is all that FEAT_SCTLR2 implies.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-22-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set
Oliver Upton [Tue, 8 Jul 2025 17:25:25 +0000 (10:25 -0700)]
KVM: arm64: nv: Enable vSErrors when HCRX_EL2.TMEA is set

Per R_CDCKC, vSErrors are enabled if HCRX_EL2.TMEA is set, regardless of
HCR_EL2.AMO.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-21-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA
Oliver Upton [Tue, 8 Jul 2025 17:25:24 +0000 (10:25 -0700)]
KVM: arm64: nv: Honor SError routing effects of SCTLR2_ELx.NMEA

As the name might imply, when NMEA is set SErrors are non-maskable and
can be taken regardless of PSTATE.A. As is the recurring theme with
DoubleFault2, the effects on SError routing are entirely backwards to
this.

If at EL1, NMEA is *not* considered for SError routing when TMEA is set
and the exception is taken to EL2 when PSTATE.A is set.

Link: https://lore.kernel.org/r/20250708172532.1699409-20-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
Oliver Upton [Tue, 8 Jul 2025 17:25:23 +0000 (10:25 -0700)]
KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set

HCRX_EL2.TMEA further modifies the external abort behavior where
unmasked aborts are taken to EL1 and masked aborts are taken to EL2.
It's rather weird when you consider that SEAs are, well, *synchronous*
and therefore not actually maskable. However, for the purposes of
exception routing, they're considered "masked" if the A flag is set.

This gets a bit hairier when considering the fact that TMEA
also enables vSErrors, i.e. KVM has delegated the HW vSError context to
the guest hypervisor.  We can keep the vSError context delegation as-is
by taking advantage of a couple properties:

 - If SErrors are unmasked, the 'physical' SError can be taken
   in-context immediately. In other words, KVM can emulate the EL1
   SError while preserving vEL2's ownership of the vSError context.

 - If SErrors are masked, the 'physical' SError is taken to EL2
   immediately and needs the usual nested exception entry.

Note that the new in-context handling has the benign effect where
unmasked SError injections are emulated even for non-nested VMs.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-19-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Route SEAs to the SError vector when EASE is set
Oliver Upton [Tue, 8 Jul 2025 17:25:22 +0000 (10:25 -0700)]
KVM: arm64: Route SEAs to the SError vector when EASE is set

One of the finest additions of FEAT_DoubleFault2 is the ability for
software to request *synchronous* external aborts be taken to the
SError vector, which of coure are *asynchronous* in nature.

Opinions be damned, implement the architecture and send SEAs to the
SError vector if EASE is set for the target context.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-18-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Ensure Address size faults affect correct ESR
Oliver Upton [Tue, 8 Jul 2025 17:25:21 +0000 (10:25 -0700)]
KVM: arm64: nv: Ensure Address size faults affect correct ESR

For historical reasons, Address size faults are first injected into the
guest as an SEA and ESR_EL1 is subsequently modified to reflect the
correct FSC. Of course, when dealing with a vEL2 this should poke
ESR_EL2.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-17-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Factor out helper for selecting exception target EL
Oliver Upton [Tue, 8 Jul 2025 17:25:20 +0000 (10:25 -0700)]
KVM: arm64: Factor out helper for selecting exception target EL

Pull out the exception target selection from pend_sync_exception() for
general use. Use PSR_MODE_ELxh as a shorthand for the target EL, as
SP_ELx selection is handled further along in the hyp's exception
emulation.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-16-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Describe SCTLR2_ELx RESx masks
Oliver Upton [Tue, 8 Jul 2025 17:25:19 +0000 (10:25 -0700)]
KVM: arm64: Describe SCTLR2_ELx RESx masks

External abort injection will soon rely on a sanitised view of
SCTLR2_ELx to determine exception routing. Compute the RESx masks.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-15-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Enable SCTLR2 when advertised to the guest
Oliver Upton [Tue, 8 Jul 2025 17:25:18 +0000 (10:25 -0700)]
KVM: arm64: Enable SCTLR2 when advertised to the guest

HCRX_EL2.SCTLR2En needs to be set for SCTLR2_EL1 to take effect in
hardware (in addition to disabling traps).

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-14-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Context switch SCTLR2_ELx when advertised to the guest
Oliver Upton [Tue, 8 Jul 2025 17:25:17 +0000 (10:25 -0700)]
KVM: arm64: Context switch SCTLR2_ELx when advertised to the guest

Restore SCTLR2_EL1 with the correct value for the given context when
FEAT_SCTLR2 is advertised to the guest.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-13-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Wire up SCTLR2_ELx sysreg descriptors
Oliver Upton [Tue, 8 Jul 2025 17:25:16 +0000 (10:25 -0700)]
KVM: arm64: Wire up SCTLR2_ELx sysreg descriptors

Set up the sysreg descriptors for SCTLR2_ELx, along with the associated
storage and VNCR mapping.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-12-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Describe trap behavior of SCTLR2_EL1
Oliver Upton [Tue, 8 Jul 2025 17:25:15 +0000 (10:25 -0700)]
KVM: arm64: nv: Describe trap behavior of SCTLR2_EL1

Add the complete trap description for SCTLR2_EL1, including FGT and the
inverted HCRX bit.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-11-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Advertise support for FEAT_RAS
Oliver Upton [Tue, 8 Jul 2025 17:25:14 +0000 (10:25 -0700)]
KVM: arm64: nv: Advertise support for FEAT_RAS

Now that the missing bits for vSError injection/deferral have been added
we can merrily claim support for FEAT_RAS.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-10-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Use guest hypervisor's vSError state
Oliver Upton [Tue, 8 Jul 2025 17:25:13 +0000 (10:25 -0700)]
KVM: arm64: nv: Use guest hypervisor's vSError state

When HCR_EL2.AMO is set, physical SErrors are routed to EL2 and virtual
SError injection is enabled for EL1. Conceptually treating
host-initiated SErrors as 'physical', this means we can delegate control
of the vSError injection context to the guest hypervisor when nesting &&
AMO is set.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-9-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Add FEAT_RAS vSError sys regs to table
Oliver Upton [Tue, 8 Jul 2025 17:25:12 +0000 (10:25 -0700)]
KVM: arm64: nv: Add FEAT_RAS vSError sys regs to table

Prepare to implement RAS for NV by adding the missing EL2 sysregs for
the vSError context.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-8-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Honor SError exception routing / masking
Oliver Upton [Tue, 8 Jul 2025 17:25:11 +0000 (10:25 -0700)]
KVM: arm64: nv: Honor SError exception routing / masking

To date KVM has used HCR_EL2.VSE to track the state of a pending SError
for the guest. With this bit set, hardware respects the EL1 exception
routing / masking rules and injects the vSError when appropriate.

This isn't correct for NV guests as hardware is oblivious to vEL2's
intentions for SErrors. Better yet, with FEAT_NV2 the guest can change
the routing behind our back as HCR_EL2 is redirected to memory. Cope
with this mess by:

 - Using a flag (instead of HCR_EL2.VSE) to track the pending SError
   state when SErrors are unconditionally masked for the current context

 - Resampling the routing / masking of a pending SError on every guest
   entry/exit

 - Emulating exception entry when SError routing implies a translation
   regime change

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-7-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: nv: Respect exception routing rules for SEAs
Oliver Upton [Tue, 8 Jul 2025 17:25:10 +0000 (10:25 -0700)]
KVM: arm64: nv: Respect exception routing rules for SEAs

Synchronous external aborts are taken to EL2 if ELIsInHost() or
HCR_EL2.TEA=1. Rework the SEA injection plumbing to respect the imposed
routing of the guest hypervisor and opportunistically rephrase things to
make their function a bit more obvious.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-6-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Treat vCPU with pending SError as runnable
Oliver Upton [Tue, 8 Jul 2025 17:25:09 +0000 (10:25 -0700)]
KVM: arm64: Treat vCPU with pending SError as runnable

Per R_VRLPB, a pending SError is a WFI wakeup event regardless of
PSTATE.A, meaning that the vCPU is runnable. Sample VSE in addition to
the other IRQ lines.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoKVM: arm64: Add helper to identify a nested context
Marc Zyngier [Tue, 8 Jul 2025 17:25:08 +0000 (10:25 -0700)]
KVM: arm64: Add helper to identify a nested context

A common idiom in the KVM code is to check if we are currently
dealing with a "nested" context, defined as having NV enabled,
but being in the EL1&0 translation regime.

This is usually expressed as:

if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu) ... )

which is a mouthful and a bit hard to read, specially when followed
by additional conditions.

Introduce a new helper that encapsulate these two terms, allowing
the above to be written as

if (is_nested_context(vcpu) ... )

which is both shorter and easier to read, and makes more obvious
the potential for simplification on some code paths.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoarm64: Detect FEAT_DoubleFault2
Oliver Upton [Tue, 8 Jul 2025 17:25:07 +0000 (10:25 -0700)]
arm64: Detect FEAT_DoubleFault2

KVM will soon support FEAT_DoubleFault2. Add a descriptor for the
corresponding ID register field.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-3-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoarm64: Detect FEAT_SCTLR2
Oliver Upton [Tue, 8 Jul 2025 17:25:06 +0000 (10:25 -0700)]
arm64: Detect FEAT_SCTLR2

KVM is about to pick up support for SCTLR2. Add cpucap for later use in
the guest/host context switch hot path.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250708172532.1699409-2-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2 months agoarm64: Kconfig: Enable GICv5
Lorenzo Pieralisi [Thu, 3 Jul 2025 10:25:21 +0000 (12:25 +0200)]
arm64: Kconfig: Enable GICv5

Enable GICv5 driver code for the ARM64 architecture.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-31-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agodocs: arm64: gic-v5: Document booting requirements for GICv5
Lorenzo Pieralisi [Thu, 3 Jul 2025 10:25:20 +0000 (12:25 +0200)]
docs: arm64: gic-v5: Document booting requirements for GICv5

Document the requirements for booting a kernel on a system implementing
a GICv5 interrupt controller.

Specifically, other than DT/ACPI providing the required firmware
representation, define what traps must be disabled if the kernel is
booted at EL1 on a system where EL2 is implemented.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-30-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2 months agoirqchip/gic-v5: Add GICv5 IWB support
Lorenzo Pieralisi [Thu, 3 Jul 2025 10:25:19 +0000 (12:25 +0200)]
irqchip/gic-v5: Add GICv5 IWB support

The GICv5 architecture implements the Interrupt Wire Bridge (IWB) in
order to support wired interrupts that cannot be connected directly
to an IRS and instead uses the ITS to translate a wire event into
an IRQ signal.

Add the wired-to-MSI IWB driver to manage IWB wired interrupts.

An IWB is connected to an ITS and it has its own deviceID for all
interrupt wires that it manages; the IWB input wire number must be
exposed to the ITS as an eventID with a 1:1 mapping.

This eventID is not programmable and therefore requires a new
msi_alloc_info_t flag to make sure the ITS driver does not allocate
an eventid for the wire but rather it uses the msi_alloc_info_t.hwirq
number to gather the ITS eventID.

Co-developed-by: Sascha Bischoff <sascha.bischoff@arm.com>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Co-developed-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-29-12e71f1b3528@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>