Marc Zyngier [Fri, 23 May 2025 09:59:43 +0000 (10:59 +0100)]
Merge branch kvm-arm64/misc-6.16 into kvmarm-master/next
* kvm-arm64/misc-6.16:
: .
: Misc changes and improvements for 6.16:
:
: - Add a new selftest for the SVE host state being corrupted by a guest
:
: - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2,
: ensuring that the window for interrupts is slightly bigger, and avoiding
: a pretty bad erratum on the AmpereOne HW
:
: - Replace a couple of open-coded on/off strings with str_on_off()
:
: - Get rid of the pKVM memblock sorting, which now appears to be superflous
:
: - Drop superflous clearing of ICH_LR_EOI in the LR when nesting
:
: - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from
: a pretty bad case of TLB corruption unless accesses to HCR_EL2 are
: heavily synchronised
:
: - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables
: in a human-friendly fashion
: .
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
KVM: arm64: Drop sort_memblock_regions()
KVM: arm64: selftests: Add test for SVE host corruption
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Replace ternary flags with str_on_off() helper
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 23 May 2025 09:58:57 +0000 (10:58 +0100)]
Merge branch kvm-arm64/nv-nv into kvmarm-master/next
* kvm-arm64/nv-nv:
: .
: Flick the switch on the NV support by adding the missing piece
: in the form of the VNCR page management. From the cover letter:
:
: "This is probably the most interesting bit of the whole NV adventure.
: So far, everything else has been a walk in the park, but this one is
: where the real fun takes place.
:
: With FEAT_NV2, most of the NV support revolves around tricking a guest
: into accessing memory while it tries to access system registers. The
: hypervisor's job is to handle the context switch of the actual
: registers with the state in memory as needed."
: .
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
KVM: arm64: Document NV caps and vcpu flags
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
KVM: arm64: nv: Remove dead code from ERET handling
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
KVM: arm64: nv: Move TLBI range decoding to a helper
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
KVM: arm64: nv: Extract translation helper from the AT code
KVM: arm64: nv: Allocate VNCR page when required
arm64: sysreg: Add layout for VNCR_EL2
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 23 May 2025 09:58:34 +0000 (10:58 +0100)]
Merge branch kvm-arm64/at-fixes-6.16 into kvmarm-master/next
* kvm-arm64/at-fixes-6.16:
: .
: Set of fixes for Address Translation (AT) instruction emulation,
: which affect the (not yet upstream) NV support.
:
: From the cover letter:
:
: "Here's a small series of fixes for KVM's implementation of address
: translation (aka the AT S1* instructions), addressing a number of
: issues in increasing levels of severity:
:
: - We misreport PAR_EL1.PTW in a number of occasions, including state
: that is not possible as per the architecture definition
:
: - We don't handle access faults at all, and that doesn't play very
: well with the rest of the VNCR stuff
:
: - AT S1E{0,1} from EL2 with HCR_EL2.{E2H,TGE}={1,1} will absolutely
: take the host down, no questions asked"
: .
KVM: arm64: Don't feed uninitialised data to HCR_EL2
KVM: arm64: Teach address translation about access faults
KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 23 May 2025 09:58:15 +0000 (10:58 +0100)]
Merge branch kvm-arm64/fgt-masks into kvmarm-master/next
* kvm-arm64/fgt-masks: (43 commits)
: .
: Large rework of the way KVM deals with trap bits in conjunction with
: the CPU feature registers. It now draws a direct link between which
: the feature set, the system registers that need to UNDEF to match
: the configuration and bits that need to behave as RES0 or RES1 in
: the trap registers that are visible to the guest.
:
: Best of all, these definitions are mostly automatically generated
: from the JSON description published by ARM under a permissive
: license.
: .
KVM: arm64: Handle TSB CSYNC traps
KVM: arm64: Add FGT descriptors for FEAT_FGT2
KVM: arm64: Allow sysreg ranges for FGT descriptors
KVM: arm64: Add context-switch for FEAT_FGT2 registers
KVM: arm64: Add trap routing for FEAT_FGT2 registers
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
KVM: arm64: Allow kvm_has_feat() to take variable arguments
KVM: arm64: Use FGT feature maps to drive RES0 bits
KVM: arm64: Validate FGT register descriptions against RES0 masks
KVM: arm64: Switch to table-driven FGU configuration
KVM: arm64: Handle PSB CSYNC traps
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
KVM: arm64: Remove hand-crafted masks for FGT registers
KVM: arm64: Use computed FGT masks to setup FGT registers
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
KVM: arm64: Unconditionally configure fine-grain traps
KVM: arm64: Use computed masks as sanitisers for FGT registers
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 23 May 2025 09:57:44 +0000 (10:57 +0100)]
Merge branch kvm-arm64/mte-frac into kvmarm-master/next
* kvm-arm64/mte-frac:
: .
: Prevent FEAT_MTE_ASYNC from being accidently exposed to a guest,
: courtesy of Ben Horgan. From the cover letter:
:
: "The ID_AA64PFR1_EL1.MTE_frac field is currently hidden from KVM.
: However, when ID_AA64PFR1_EL1.MTE==2, ID_AA64PFR1_EL1.MTE_frac==0
: indicates that MTE_ASYNC is supported. On a host with
: ID_AA64PFR1_EL1.MTE==2 but without MTE_ASYNC support a guest with the
: MTE capability enabled will incorrectly see MTE_ASYNC advertised as
: supported. This series fixes that."
: .
KVM: selftests: Confirm exposing MTE_frac does not break migration
KVM: arm64: Make MTE_frac masking conditional on MTE capability
arm64/sysreg: Expose MTE_frac so that it is visible to KVM
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 23 May 2025 09:57:32 +0000 (10:57 +0100)]
Merge branch kvm-arm64/ubsan-el2 into kvmarm-master/next
* kvm-arm64/ubsan-el2:
: .
: Add UBSAN support to the EL2 portion of KVM, reusing most of the
: existing logic provided by CONFIG_IBSAN_TRAP.
:
: Patches courtesy of Mostafa Saleh.
: .
KVM: arm64: Handle UBSAN faults
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
ubsan: Remove regs from report_ubsan_failure()
arm64: Introduce esr_is_ubsan_brk()
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 23 May 2025 09:56:25 +0000 (10:56 +0100)]
Merge branch kvm-arm64/pkvm-np-thp-6.16 into kvmarm-master/next
* kvm-arm64/pkvm-np-thp-6.16: (21 commits)
: .
: Large mapping support for non-protected pKVM guests, courtesy of
: Vincent Donnefort. From the cover letter:
:
: "This series adds support for stage-2 huge mappings (PMD_SIZE) to pKVM
: np-guests, that is installing PMD-level mappings in the stage-2,
: whenever the stage-1 is backed by either Hugetlbfs or THPs."
: .
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Thu, 22 May 2025 08:15:02 +0000 (09:15 +0100)]
KVM: arm64: Fix documentation for vgic_its_iter_next()
As reported by the build robot, the documentation for vgic_its_iter_next()
contains a typo. Fix it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/
202505221421.KAuWlmSr-lkp@intel.com/
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:34 +0000 (13:48 +0100)]
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
With the introduction of stage-2 huge mappings in the pKVM hypervisor,
guest pages CMO is needed for PMD_SIZE size. Fixmap only supports
PAGE_SIZE and iterating over the huge-page is time consuming (mostly due
to TLBI on hyp_fixmap_unmap) which is a problem for EL2 latency.
Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.
On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.
Because of the horrendous private range allocation that would be
necessary, this is disabled for 64KiB pages systems.
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-11-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:33 +0000 (13:48 +0100)]
KVM: arm64: Stage-2 huge mappings for np-guests
Now np-guests hypercalls with range are supported, we can let the
hypervisor to install block mappings whenever the Stage-1 allows it,
that is when backed by either Hugetlbfs or THPs. The size of those block
mappings is limited to PMD_SIZE.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-10-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Quentin Perret [Wed, 21 May 2025 12:48:32 +0000 (13:48 +0100)]
KVM: arm64: Add a range to pkvm_mappings
In preparation for supporting stage-2 huge mappings for np-guest, add a
nr_pages member for pkvm_mappings to allow EL1 to track the size of the
stage-2 mapping.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-9-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Quentin Perret [Wed, 21 May 2025 12:48:31 +0000 (13:48 +0100)]
KVM: arm64: Convert pkvm_mappings to interval tree
In preparation for supporting stage-2 huge mappings for np-guest, let's
convert pgt.pkvm_mappings to an interval tree.
No functional change intended.
Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-8-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:30 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_test_clear_young_guest hypercall.
This range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is
512 on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-7-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:29 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_wrprotect_guest hypercall. This
range supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512
on a 4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-6-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:28 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_unshare_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-5-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:27 +0000 (13:48 +0100)]
KVM: arm64: Add a range to __pkvm_host_share_guest()
In preparation for supporting stage-2 huge mappings for np-guest. Add a
nr_pages argument to the __pkvm_host_share_guest hypercall. This range
supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
4K-pages system).
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-4-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:26 +0000 (13:48 +0100)]
KVM: arm64: Introduce for_each_hyp_page
Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-3-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Vincent Donnefort [Wed, 21 May 2025 12:48:25 +0000 (13:48 +0100)]
KVM: arm64: Handle huge mappings for np-guest CMOs
clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on fixmap, which can only map a
single PAGE_SIZE page.
With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
until the whole range is done.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20250521124834.1070650-2-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 21 May 2025 13:33:43 +0000 (14:33 +0100)]
Merge branch kvm-arm64/pkvm-selftest-6.16 into kvm-arm64/pkvm-np-thp-6.16
* kvm-arm64/pkvm-selftest-6.16:
: .
: pKVM selftests covering the memory ownership transitions by
: Quentin Perret. From the initial cover letter:
:
: "We have recently found a bug [1] in the pKVM memory ownership
: transitions by code inspection, but it could have been caught with a
: test.
:
: Introduce a boot-time selftest exercising all the known pKVM memory
: transitions and importantly checks the rejection of illegal transitions.
:
: The new test is hidden behind a new Kconfig option separate from
: CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
: transition checks ([1] doesn't reproduce with EL2 debug enabled).
:
: [1] https://lore.kernel.org/kvmarm/
20241128154406.602875-1-qperret@google.com/"
: .
KVM: arm64: Extend pKVM selftest for np-guests
KVM: arm64: Selftest for pKVM transitions
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
KVM: arm64: Add .hyp.data section
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 21 May 2025 13:33:39 +0000 (14:33 +0100)]
Merge branch kvm-arm64/pkvm-6.16 into kvm-arm64/pkvm-np-thp-6.16
* kvm-arm64/pkvm-6.16:
: .
: pKVM memory management cleanups, courtesy of Quentin Perret.
: From the cover letter:
:
: "This series moves the hypervisor's ownership state to the hyp_vmemmap,
: as discussed in [1]. The two main benefits are:
:
: 1. much cheaper hyp state lookups, since we can avoid the hyp stage-1
: page-table walk;
:
: 2. de-correlates the hyp state from the presence of a mapping in the
: linear map range of the hypervisor; which enables a bunch of
: clean-ups in the existing code and will simplify the introduction of
: other features in the future (hyp tracing, ...)"
: .
KVM: arm64: Unconditionally cross check hyp state
KVM: arm64: Defer EL2 stage-1 mapping on share
KVM: arm64: Move hyp state to hyp_vmemmap
KVM: arm64: Introduce {get,set}_host_state() helpers
KVM: arm64: Use 0b11 for encoding PKVM_NOPAGE
KVM: arm64: Fix pKVM page-tracking comments
KVM: arm64: Track SVE state in the hypervisor vcpu structure
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 21 May 2025 10:04:11 +0000 (11:04 +0100)]
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
The conversion to kvm_release_faultin_page() missed the requirement
for this to be called within a critical section with mmu_lock held
for write. Move this call up to satisfy this requirement.
Fixes:
069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 21 May 2025 09:58:29 +0000 (10:58 +0100)]
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
Calling invalidate_vncr_va() without the mmu_lock held for write
is a bad idea, and lockdep tells you about that.
Fixes:
4ffa72ad8f37e ("KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 20 May 2025 14:41:16 +0000 (15:41 +0100)]
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
When translating a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.
A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Bad things
can result from this, such as a BUG() in pgshift_level_to_ttl().
Fix it by taking the mmu_lock for write to perform this local
invalidation, and use invalidate_vncr() instead of open-coding
the write to the 'valid' flag.
Fixes:
069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Jing Zhang [Thu, 20 Feb 2025 22:42:46 +0000 (14:42 -0800)]
KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables
This commit introduces a debugfs interface to display the contents of the
VGIC Interrupt Translation Service (ITS) tables.
The ITS tables map Device/Event IDs to Interrupt IDs and target processors.
Exposing this information through debugfs allows for easier inspection and
debugging of the interrupt routing configuration.
The debugfs interface presents the ITS table data in a tabular format:
Device ID: 0x0, Event ID Range: [0 - 31]
EVENT_ID INTID HWINTID TARGET COL_ID HW
-----------------------------------------------
0 8192 0 0 0 0
1 8193 0 0 0 0
2 8194 0 2 2 0
Device ID: 0x18, Event ID Range: [0 - 3]
EVENT_ID INTID HWINTID TARGET COL_ID HW
-----------------------------------------------
0 8225 0 0 0 0
1 8226 0 1 1 0
2 8227 0 3 3 0
Device ID: 0x10, Event ID Range: [0 - 7]
EVENT_ID INTID HWINTID TARGET COL_ID HW
-----------------------------------------------
0 8229 0 3 3 1
1 8230 0 0 0 1
2 8231 0 1 1 1
3 8232 0 2 2 1
4 8233 0 3 3 1
The output is generated using the seq_file interface, allowing for efficient
handling of potentially large ITS tables.
This interface is read-only and does not allow modification of the ITS
tables. It is intended for debugging and informational purposes only.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Link: https://lore.kernel.org/r/20250220224247.2017205-1-jingzhangos@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
D Scott Phillips [Tue, 13 May 2025 18:45:14 +0000 (11:45 -0700)]
arm64: errata: Work around AmpereOne's erratum AC04_CPU_23
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous
translations for data addresses initiated by load/store instructions.
Only instruction initiated translations are vulnerable, not translations
from prefetches for example. A DSB before the store to HCR_EL2 is
sufficient to prevent older instructions from hitting the window for
corruption, and an ISB after is sufficient to prevent younger
instructions from hitting the window for corruption.
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 27 Jan 2025 11:58:38 +0000 (11:58 +0000)]
KVM: arm64: Handle TSB CSYNC traps
The architecture introduces a trap for TSB CSYNC that fits in
the same EC as LS64 and PSB CSYNC. Let's deal with it in a similar
way.
It's not that we expect this to be useful any time soon anyway.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 25 Apr 2025 13:00:01 +0000 (14:00 +0100)]
KVM: arm64: Add FGT descriptors for FEAT_FGT2
Bulk addition of all the FGT2 traps reported with EC == 0x18,
as described in the 2025-03 JSON drop.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 25 Apr 2025 12:53:18 +0000 (13:53 +0100)]
KVM: arm64: Allow sysreg ranges for FGT descriptors
Just like we allow sysreg ranges for Coarse Grained Trap descriptors,
allow them for Fine Grain Traps as well.
This comes with a warning that not all ranges are suitable for this
particular definition of ranges.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 20:20:18 +0000 (21:20 +0100)]
KVM: arm64: Add context-switch for FEAT_FGT2 registers
Just like the rest of the FGT registers, perform a switch of the
FGT2 equivalent. This avoids the host configuration leaking into
the guest...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 25 Apr 2025 16:42:49 +0000 (17:42 +0100)]
KVM: arm64: Add trap routing for FEAT_FGT2 registers
Similarly to the FEAT_FGT registers, pick the correct FEAT_FGT2
register when a sysreg trap indicates they could be responsible
for the exception.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 20:16:34 +0000 (21:16 +0100)]
KVM: arm64: Add sanitisation for FEAT_FGT2 registers
Just like the FEAT_FGT registers, treat the FGT2 variant the same
way. THis is a large update, but a fairly mechanical one.
The config dependencies are extracted from the 2025-03 JSON drop.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 18:21:46 +0000 (19:21 +0100)]
KVM: arm64: Add FEAT_FGT2 registers to the VNCR page
The FEAT_FGT2 registers are part of the VNCR page. Describe the
corresponding offsets and add them to the vcpu sysreg enumeration.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 4 Feb 2025 10:46:41 +0000 (10:46 +0000)]
KVM: arm64: Use HCR_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.
An additional complexity stems from the status of some bits such
as E2H and RW, which do not had a RESx status, but still take
a fixed value due to implementation choices in KVM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 9 Feb 2025 14:51:23 +0000 (14:51 +0000)]
KVM: arm64: Use HCRX_EL2 feature map to drive fixed-value bits
Similarly to other registers, describe which HCR_EL2 bit depends
on which feature, and use this to compute the RES0 status of these
bits.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 9 Feb 2025 13:38:35 +0000 (13:38 +0000)]
KVM: arm64: Allow kvm_has_feat() to take variable arguments
In order to be able to write more compact (and easier to read) code,
let kvm_has_feat() and co take variable arguments. This enables
constructs such as:
#define FEAT_SME ID_AA64PFR1_EL1, SME, IMP
if (kvm_has_feat(kvm, FEAT_SME))
[...]
which is admitedly more readable.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 9 Feb 2025 14:45:29 +0000 (14:45 +0000)]
KVM: arm64: Use FGT feature maps to drive RES0 bits
Another benefit of mapping bits to features is that it becomes trivial
to define which bits should be handled as RES0.
Let's apply this principle to the guest's view of the FGT registers.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:35:00 +0000 (11:35 +0100)]
KVM: arm64: Document NV caps and vcpu flags
Describe the two new vcpu flags that control NV, together with
the capabilities that advertise them.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250514103501.2225951-18-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:59 +0000 (11:34 +0100)]
KVM: arm64: Allow userspace to request KVM_ARM_VCPU_EL2*
Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_EL2* by bumping KVM_VCPU_MAX_FEATURES up.
We also now advertise the features to userspace with new capabilities.
It's going to be great...
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250514103501.2225951-17-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:58 +0000 (11:34 +0100)]
KVM: arm64: nv: Remove dead code from ERET handling
Cleanly, this code cannot trigger, since we filter this from the
caller. Drop it.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-16-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:57 +0000 (11:34 +0100)]
KVM: arm64: nv: Plumb TLBI S1E2 into system instruction dispatch
Now that we have to handle TLBI S1E2 in the core code, plumb the
sysinsn dispatch code into it, so that these instructions don't
just UNDEF anymore.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-15-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:56 +0000 (11:34 +0100)]
KVM: arm64: nv: Add S1 TLB invalidation primitive for VNCR_EL2
A TLBI by VA for S1 must take effect on our pseudo-TLB for VNCR
and potentially knock the fixmap mapping. Even worse, that TLBI
must be able to work cross-vcpu.
For that, we track on a per-VM basis if any VNCR is mapped, using
an atomic counter. Whenever a TLBI S1E2 occurs and that this counter
is non-zero, we take the long road all the way back to the core code.
There, we iterate over all vcpus and check whether this particular
invalidation has any damaging effect. If it does, we nuke the pseudo
TLB and the corresponding fixmap.
Yes, this is costly.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-14-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:55 +0000 (11:34 +0100)]
KVM: arm64: nv: Program host's VNCR_EL2 to the fixmap address
Since we now have a way to map the guest's VNCR_EL2 on the host,
we can point the host's VNCR_EL2 to it and go full circle!
Note that we unconditionally assign the fixmap to VNCR_EL2,
irrespective of the guest's version being mapped or not. We want
to take a fault on first access, so the fixmap either contains
something guranteed to be either invalid or a guest mapping.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-13-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:54 +0000 (11:34 +0100)]
KVM: arm64: nv: Handle VNCR_EL2 invalidation from MMU notifiers
During an invalidation triggered by an MMU notifier, we need to
make sure we can drop the *host* mapping that would have been
translated by the stage-2 mapping being invalidated.
For the moment, the invalidation is pretty brutal, as we nuke
the full IPA range, and therefore any VNCR_EL2 mapping.
At some point, we'll be more light-weight, and the code is able
to deal with something more targetted.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:53 +0000 (11:34 +0100)]
KVM: arm64: nv: Handle mapping of VNCR_EL2 at EL2
Now that we can handle faults triggered through VNCR_EL2, we need
to map the corresponding page at EL2. But where, you'll ask?
Since each CPU in the system can run a vcpu, we need a per-CPU
mapping. For that, we carve a NR_CPUS range in the fixmap, giving
us a per-CPU va at which to map the guest's VNCR's page.
The mapping occurs both on vcpu load and on the back of a fault,
both generating a request that will take care of the mapping.
That mapping will also get dropped on vcpu put.
Yes, this is a bit heavy handed, but it is simple. Eventually,
we may want to have a per-VM, per-CPU mapping, which would avoid
all the TLBI overhead.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:52 +0000 (11:34 +0100)]
KVM: arm64: nv: Handle VNCR_EL2-triggered faults
As VNCR_EL2.BADDR contains a VA, it is bound to trigger faults.
These faults can have multiple source:
- We haven't mapped anything on the host: we need to compute the
resulting translation, populate a TLB, and eventually map
the corresponding page
- The permissions are out of whack: we need to tell the guest about
this state of affairs
Note that the kernel doesn't support S1POE for itself yet, so
the particular case of a VNCR page mapped with no permissions
or with write-only permissions is not correctly handled yet.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:51 +0000 (11:34 +0100)]
KVM: arm64: nv: Add userspace and guest handling of VNCR_EL2
Plug VNCR_EL2 in the vcpu_sysreg enum, define its RES0/RES1 bits,
and make it accessible to userspace when the VM is configured to
support FEAT_NV2.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:50 +0000 (11:34 +0100)]
KVM: arm64: nv: Add pseudo-TLB backing VNCR_EL2
FEAT_NV2 introduces an interesting problem for NV, as VNCR_EL2.BADDR
is a virtual address in the EL2&0 (or EL2, but we thankfully ignore
this) translation regime.
As we need to replicate such mapping in the real EL2, it means that
we need to remember that there is such a translation, and that any
TLBI affecting EL2 can possibly affect this translation.
It also means that any invalidation driven by an MMU notifier must
be able to shoot down any such mapping.
All in all, we need a data structure that represents this mapping,
and that is extremely close to a TLB. Given that we can only use
one of those per vcpu at any given time, we only allocate one.
No effort is made to keep that structure small. If we need to
start caching multiple of them, we may want to revisit that design
point. But for now, it is kept simple so that we can reason about it.
Oh, and add a braindump of how things are supposed to work, because
I will definitely page this out at some point. Yes, pun intended.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:49 +0000 (11:34 +0100)]
KVM: arm64: nv: Don't adjust PSTATE.M when L2 is nesting
We currently check for HCR_EL2.NV being set to decide whether we
need to repaint PSTATE.M to say EL2 instead of EL1 on exit.
However, this isn't correct when L2 is itself a hypervisor, and
that L1 as set its own HCR_EL2.NV. That's because we "flatten"
the state and inherit parts of the guest's own setup. In that case,
we shouldn't adjust PSTATE.M, as this is really EL1 for both us
and the guest.
Instead of trying to try and work out how we ended-up with HCR_EL2.NV
being set by introspecting both the host and guest states, use
a per-CPU flag to remember the context (HYP or not), and use that
information to decide whether PSTATE needs tweaking.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:48 +0000 (11:34 +0100)]
KVM: arm64: nv: Move TLBI range decoding to a helper
As we are about to expand out TLB invalidation capabilities to support
recursive virtualisation, move the decoding of a TLBI by range into
a helper that returns the base, the range and the ASID.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-6-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:47 +0000 (11:34 +0100)]
KVM: arm64: nv: Snapshot S1 ASID tagging information during walk
We currently completely ignore any sort of ASID tagging during a S1
walk, as AT doesn't care about it.
However, such information is required if we are going to create
anything that looks like a TLB from this walk.
Let's capture it both the nG and ASID information while walking
the page tables.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-5-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:46 +0000 (11:34 +0100)]
KVM: arm64: nv: Extract translation helper from the AT code
The address translation infrastructure is currently pretty tied to
the AT emulation.
However, we also need to features that require the use of VAs, such
as VNCR_EL2 (and maybe one of these days SPE), meaning that we need
a slightly more generic infrastructure.
Start this by introducing a new helper (__kvm_translate_va()) that
performs a S1 walk for a given translation regime, EL and PAN
settings.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:45 +0000 (11:34 +0100)]
KVM: arm64: nv: Allocate VNCR page when required
If running a NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.
Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 14 May 2025 10:34:44 +0000 (11:34 +0100)]
arm64: sysreg: Add layout for VNCR_EL2
Now that we're about to emulate VNCR_EL2, we need its full layout.
Add it to the sysreg file.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250514103501.2225951-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Wei-Lin Chang [Mon, 12 May 2025 13:32:23 +0000 (21:32 +0800)]
KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1
In the case of ICH_LR<n>.HW == 1, bit 41 of LR is just a part of pINTID
without EOI meaning, and bit 41 will be zeroed by the subsequent clearing
of ICH_LR_PHYS_ID_MASK anyway.
No functional changes intended.
Signed-off-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw>
Link: https://lore.kernel.org/r/20250512133223.866999-1-r09922117@csie.ntu.edu.tw
Signed-off-by: Marc Zyngier <maz@kernel.org>
Ben Horgan [Mon, 12 May 2025 11:41:12 +0000 (12:41 +0100)]
KVM: selftests: Confirm exposing MTE_frac does not break migration
When MTE is supported but MTE_ASYMM is not (ID_AA64PFR1_EL1.MTE == 2)
ID_AA64PFR1_EL1.MTE_frac == 0xF indicates MTE_ASYNC is unsupported
and MTE_frac == 0 indicates it is supported.
As MTE_frac was previously unconditionally read as 0 from the guest
and user-space, check that using SET_ONE_REG to set it to 0 succeeds
but does not change MTE_frac from unsupported (0xF) to supported (0).
This is required as values originating from KVM from user-space must
be accepted to avoid breaking migration.
Also, to allow this MTE field to be tested, enable KVM_ARM_CAP_MTE
for the set_id_regs test. No effect on existing tests is expected.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-4-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Ben Horgan [Mon, 12 May 2025 11:41:11 +0000 (12:41 +0100)]
KVM: arm64: Make MTE_frac masking conditional on MTE capability
If MTE_frac is masked out unconditionally then the guest will always
see ID_AA64PFR1_EL1_MTE_frac as 0. However, a value of 0 when
ID_AA64PFR1_EL1_MTE is 2 indicates that MTE_ASYNC is supported. Hence, for
a host with ID_AA64PFR1_EL1_MTE==2 and ID_AA64PFR1_EL1_MTE_frac==0xf
(MTE_ASYNC unsupported) the guest would see MTE_ASYNC advertised as
supported whilst the host does not support it. Hence, expose the sanitised
value of MTE_frac to the guest and user-space.
As MTE_frac was previously hidden, always 0, and KVM must accept values
from KVM provided by user-space, when ID_AA64PFR1_EL1.MTE is 2 allow
user-space to set ID_AA64PFR1_EL1.MTE_frac to 0. However, ignore it to
avoid incorrectly claiming hardware support for MTE_ASYNC in the guest.
Note that linux does not check the value of ID_AA64PFR1_EL1_MTE_frac and
wrongly assumes that MTE async faults can be generated even on hardware
that does nto support them. This issue is not addressed here.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-3-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Ben Horgan [Mon, 12 May 2025 11:41:10 +0000 (12:41 +0100)]
arm64/sysreg: Expose MTE_frac so that it is visible to KVM
KVM exposes the sanitised ID registers to guests. Currently these ignore
the ID_AA64PFR1_EL1.MTE_frac field, meaning guests always see a value of
zero.
This is a problem for platforms without the MTE_ASYNC feature where
ID_AA64PFR1_EL1.MTE==0x2 and ID_AA64PFR1_EL1.MTE_frac==0xf. KVM forces
MTE_frac to zero, meaning the guest believes MTE_ASYNC is supported, when
no async fault will ever occur.
Before KVM can fix this, the architecture needs to sanitise the ID
register field for MTE_frac.
Linux itself does not use MTE_frac field and just assumes MTE async faults
can be generated if MTE is supported.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Link: https://lore.kernel.org/r/20250512114112.359087-2-ben.horgan@arm.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 12:26:12 +0000 (13:26 +0100)]
KVM: arm64: Don't feed uninitialised data to HCR_EL2
When the guest executes an AT S1E{0,1} from EL2, and that its
HCR_EL2.{E2H,TGE}=={1,1}, then this is a pure S1 translation
that doesn't involve a guest-supplied S2, and the full S1
context is already in place. This allows us to take a shortcut
and avoid save/restoring a bunch of registers.
However, we set HCR_EL2 to a value suitable for the use of AT
in guest context. And we do so by using the value that we saved.
Or not. In the case described above, we restore whatever junk
was on the stack, and carry on with it until the next entry.
Needless to say, this is completely broken.
But this also triggers the realisation that saving HCR_EL2 is
a bit pointless. We are always in host context at the point where
reach this code, and what we program to enter the guest is a known
value (vcpu->arch.hcr_el2).
Drop the pointless save/restore, and wrap the AT operations with
writes that switch between guest and host values for HCR_EL2.
Reported-by: D Scott Phillips <scott@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20250422122612.2675672-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 12:26:11 +0000 (13:26 +0100)]
KVM: arm64: Teach address translation about access faults
It appears that our S1 PTW is completely oblivious of access faults.
Teach the S1 translation code about it.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20250422122612.2675672-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 12:26:10 +0000 (13:26 +0100)]
KVM: arm64: Fix PAR_EL1.{PTW,S} reporting on AT S1E*
When an AT S1E* operation fails, we need to report whether the
translation failed at S2, and whether this was during a S1 PTW.
But these two bits are not independent. PAR_EL1.PTW can only be
set of PAR_EL1.S is also set, and PAR_EL1.S can only be set on
its own when the full S1 PTW has succeeded, but that the access
itself is reporting a fault at S2.
As a result, it makes no sense to carry both ptw and s2 as parameters
to fail_s1_walk(), and they should be unified.
This fixes a number of cases where we were reporting PTW=1 *and*
S=0, which makes no sense.
Link: https://lore.kernel.org/r/20250422122612.2675672-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 9 Feb 2025 14:19:05 +0000 (14:19 +0000)]
KVM: arm64: Validate FGT register descriptions against RES0 masks
In order to point out to the unsuspecting KVM hacker that they
are missing something somewhere, validate that the known FGT bits
do not intersect with the corresponding RES0 mask, as computed at
boot time.
THis check is also performed at boot time, ensuring that there is
no runtime overhead.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 9 Feb 2025 14:05:01 +0000 (14:05 +0000)]
KVM: arm64: Switch to table-driven FGU configuration
Defining the FGU behaviour is extremely tedious. It relies on matching
each set of bits from FGT registers with am architectural feature, and
adding them to the FGU list if the corresponding feature isn't advertised
to the guest.
It is however relatively easy to dump most of that information from
the architecture JSON description, and use that to control the FGU bits.
Let's introduce a new set of tables descripbing the mapping between
FGT bits and features. Most of the time, this is only a lookup in
an idreg field, with a few more complex exceptions.
While this is obviously many more lines in a new file, this is
mostly generated, and is pretty easy to maintain.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 27 Jan 2025 11:58:38 +0000 (11:58 +0000)]
KVM: arm64: Handle PSB CSYNC traps
The architecture introduces a trap for PSB CSYNC that fits in
the same EC as LS64. Let's deal with it in a similar way as
LS64.
It's not that we expect this to be useful any time soon anyway.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 19:04:26 +0000 (19:04 +0000)]
KVM: arm64: Use KVM-specific HCRX_EL2 RES0 mask
We do not have a computed table for HCRX_EL2, so statically define
the bits we know about. A warning will fire if the architecture
grows bits that are not handled yet.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 17:21:17 +0000 (17:21 +0000)]
KVM: arm64: Remove hand-crafted masks for FGT registers
These masks are now useless, and can be removed.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 17:20:31 +0000 (17:20 +0000)]
KVM: arm64: Use computed FGT masks to setup FGT registers
Flip the hyervisor FGT configuration over to the computed FGT
masks.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Gavin Shan [Tue, 11 Mar 2025 04:37:18 +0000 (14:37 +1000)]
KVM: arm64: Drop sort_memblock_regions()
Drop sort_memblock_regions() and avoid sorting the copied memory
regions to be ascending order on their base addresses, because the
source memory regions should have been sorted correctly when they
are added by memblock_add() or its variants.
This is generally reverting commit
a14307f5310c ("KVM: arm64: Sort
the hypervisor memblocks"). No functional changes intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250311043718.91004-1-gshan@redhat.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Mostafa Saleh [Wed, 30 Apr 2025 16:27:11 +0000 (16:27 +0000)]
KVM: arm64: Handle UBSAN faults
As now UBSAN can be enabled, handle brk64 exits from UBSAN.
Re-use the decoding code from the kernel, and panic with
UBSAN message.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-5-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Mostafa Saleh [Wed, 30 Apr 2025 16:27:10 +0000 (16:27 +0000)]
KVM: arm64: Introduce CONFIG_UBSAN_KVM_EL2
Add a new Kconfig CONFIG_UBSAN_KVM_EL2 for KVM which enables
UBSAN for EL2 code (in protected/nvhe/hvhe) modes.
This will re-use the same checks enabled for the kernel for
the hypervisor. The only difference is that for EL2 it always
emits a "brk" instead of implementing hooks as the hypervisor
can't print reports.
The KVM code will re-use the same code for the kernel
"report_ubsan_failure()" so #ifdefs are changed to also have this
code for CONFIG_UBSAN_KVM_EL2
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-4-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Mostafa Saleh [Wed, 30 Apr 2025 16:27:09 +0000 (16:27 +0000)]
ubsan: Remove regs from report_ubsan_failure()
report_ubsan_failure() doesn't use argument regs, and soon it will
be called from the hypervisor context were regs are not available.
So, remove the unused argument.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-3-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Mostafa Saleh [Wed, 30 Apr 2025 16:27:08 +0000 (16:27 +0000)]
arm64: Introduce esr_is_ubsan_brk()
Soon, KVM is going to use this logic for hypervisor panics,
so add it in a wrapper that can be used by the hypervisor exit
handler to decode hyp panics.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250430162713.1997569-2-smostafa@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 17:17:42 +0000 (17:17 +0000)]
KVM: arm64: Propagate FGT masks to the nVHE hypervisor
The nVHE hypervisor needs to have access to its own view of the FGT
masks, which unfortunately results in a bit of data duplication.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Mark Rutland [Tue, 19 Nov 2024 13:57:22 +0000 (13:57 +0000)]
KVM: arm64: Unconditionally configure fine-grain traps
... otherwise we can inherit the host configuration if this differs from
the KVM configuration.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[maz: simplified a couple of things]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 16:01:47 +0000 (16:01 +0000)]
KVM: arm64: Use computed masks as sanitisers for FGT registers
Now that we have computed RES0 bits, use them to sanitise the
guest view of FGT registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 17:09:27 +0000 (17:09 +0000)]
KVM: arm64: Add description of FGT bits leading to EC!=0x18
The current FTP tables are only concerned with the bits generating
ESR_ELx.EC==0x18. However, we want an exhaustive view of what KVM
really knows about.
So let's add another small table that provides that extra information.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 15:51:12 +0000 (15:51 +0000)]
KVM: arm64: Compute FGT masks from KVM's own FGT tables
In the process of decoupling KVM's view of the FGT bits from the
wider architectural state, use KVM's own FGT tables to build
a synthetic view of what is actually known.
This allows for some checking along the way.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 15:44:32 +0000 (15:44 +0000)]
KVM: arm64: Plug FEAT_GCS handling
We don't seem to be handling the GCS-specific exception class.
Handle it by delivering an UNDEF to the guest, and populate the
relevant trap bits.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 24 Jan 2025 15:36:17 +0000 (15:36 +0000)]
KVM: arm64: Don't treat HCRX_EL2 as a FGT register
Treating HCRX_EL2 as yet another FGT register seems excessive, and
gets in a way of further improvements. It is actually simpler to
just be explicit about the masking, so just to that.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 3 Jul 2024 15:41:47 +0000 (16:41 +0100)]
KVM: arm64: Restrict ACCDATA_EL1 undef to FEAT_LS64_ACCDATA being disabled
We currently unconditionally make ACCDATA_EL1 accesses UNDEF.
As we are about to support it, restrict the UNDEF behaviour to cases
where FEAT_LS64_ACCDATA is not exposed to the guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Thu, 4 Jul 2024 17:58:01 +0000 (18:58 +0100)]
KVM: arm64: Handle trapping of FEAT_LS64* instructions
We generally don't expect FEAT_LS64* instructions to trap, unless
they are trapped by a guest hypervisor.
Otherwise, this is just the guest playing tricks on us by using
an instruction that isn't advertised, which we handle with a well
deserved UNDEF.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sat, 26 Apr 2025 11:05:04 +0000 (12:05 +0100)]
KVM: arm64: Simplify handling of negative FGT bits
check_fgt_bit() and triage_sysreg_trap() implement the same thing
twice for no good reason. We have to lookup the FGT register twice,
as we don't communicate it. Similarly, we extract the register value
at the wrong spot.
Reorganise the code in a more logical way so that things are done
at the correct location, removing a lot of duplication.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sat, 26 Apr 2025 10:42:15 +0000 (11:42 +0100)]
KVM: arm64: Tighten handling of unknown FGT groups
triage_sysreg_trap() assumes that it knows all the possible values
for FGT groups, which won't be the case as we start adding more
FGT registers (unless we add everything in one go, which is obviously
undesirable).
At the same time, it doesn't offer much in terms of debugging info
when things go wrong.
Turn the "__NR_FGT_GROUP_IDS__" case into a default, covering any
unhandled value, and give the kernel hacker a bit of a clue about
what's wrong (system register and full trap descriptor).
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 22 Apr 2025 18:23:41 +0000 (19:23 +0100)]
arm64: Add FEAT_FGT2 capability
As we will eventually have to context-switch the FEAT_FGT2 registers
in KVM (something that has been completely ignored so far), add
a new cap that we will be able to check for.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Thu, 4 Jul 2024 17:45:22 +0000 (18:45 +0100)]
arm64: Add syndrome information for trapped LD64B/ST64B{,V,V0}
Provide the architected EC and ISS values for all the FEAT_LS64*
instructions.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sat, 26 Apr 2025 10:17:13 +0000 (11:17 +0100)]
arm64: tools: Resync sysreg.h
Perform a bulk resync of tools/arch/arm64/include/asm/sysreg.h.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 11 Mar 2025 09:36:28 +0000 (09:36 +0000)]
arm64: Remove duplicated sysreg encodings
A bunch of sysregs are now generated from the sysreg file, so no
need to carry separate definitions.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 25 Apr 2025 12:47:18 +0000 (13:47 +0100)]
arm64: sysreg: Add system instructions trapped by HFGIRT2_EL2
Add the new CMOs trapped by HFGITR2_EL2.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Fri, 25 Apr 2025 12:44:31 +0000 (13:44 +0100)]
arm64: sysreg: Add registers trapped by HDFG{R,W}TR2_EL2
Bulk addition of all the system registers trapped by HDFG{R,W}TR2_EL2.
The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Thu, 24 Apr 2025 18:47:09 +0000 (19:47 +0100)]
arm64: sysreg: Add registers trapped by HFG{R,W}TR2_EL2
Bulk addition of all the system registers trapped by HFG{R,W}TR2_EL2.
The descriptions are extracted from the BSD-licenced JSON file part
of the 2025-03 drop from ARM.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 6 May 2025 16:27:25 +0000 (17:27 +0100)]
arm64: sysreg: Update CPACR_EL1 description
Add the couple of fields introduced with FEAT_NV2p1.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 23 Apr 2025 10:28:05 +0000 (11:28 +0100)]
arm64: sysreg: Update TRBIDR_EL1 description
Add the missing MPAM field.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 23 Apr 2025 10:27:44 +0000 (11:27 +0100)]
arm64: sysreg: Update PMSIDR_EL1 description
Add the missing SME, ALTCLK, FPF, EFT. CRR and FDS fields.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 23 Apr 2025 10:26:42 +0000 (11:26 +0100)]
arm64: sysreg: Update ID_AA64PFR0_EL1 description
Add the missing RASv2 description.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Tue, 11 Mar 2025 14:44:30 +0000 (14:44 +0000)]
arm64: sysreg: Replace HFGxTR_EL2 with HFG{R,W}TR_EL2
Treating HFGRTR_EL2 and HFGWTR_EL2 identically was a mistake.
It makes things hard to reason about, has the potential to
introduce bugs by giving a meaning to bits that are really reserved,
and is in general a bad description of the architecture.
Given that #defines are cheap, let's describe both registers as
intended by the architecture, and repaint all the existing uses.
Yes, this is painful.
The registers themselves are generated from the JSON file in
an automated way.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 3 Feb 2025 17:27:23 +0000 (17:27 +0000)]
arm64: sysreg: Add layout for HCR_EL2
Add HCR_EL2 to the sysreg file, more or less directly generated
from the JSON file.
Since the generated names significantly differ from the existing
naming, express the old names in terms of the new one. One day, we'll
fix this mess, but I'm not in any hurry.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 10 Feb 2025 11:35:31 +0000 (11:35 +0000)]
arm64: sysreg: Update ID_AA64MMFR4_EL1 description
Resync the ID_AA64MMFR4_EL1 with the architectue description.
This results in:
- the new PoPS field
- the new NV2P1 value for the NV_frac field
- the new RMEGDI field
- the new SRMASK field
These fields have been generated from the reference JSON file.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Thu, 31 Oct 2024 08:42:11 +0000 (08:42 +0000)]
arm64: sysreg: Add ID_AA64ISAR1_EL1.LS64 encoding for FEAT_LS64WB
The 2024 extensions are adding yet another variant of LS64
(aptly named FEAT_LS64WB) supporting LS64 accesses to write-back
memory, as well as 32 byte single-copy atomic accesses using pairs
of FP registers.
Add the relevant encoding to ID_AA64ISAR1_EL1.LS64.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Quentin Perret [Wed, 16 Apr 2025 16:09:00 +0000 (16:09 +0000)]
KVM: arm64: Extend pKVM selftest for np-guests
The pKVM selftest intends to test as many memory 'transitions' as
possible, so extend it to cover sharing pages with non-protected guests,
including in the case of multi-sharing.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-5-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Quentin Perret [Wed, 16 Apr 2025 16:08:59 +0000 (16:08 +0000)]
KVM: arm64: Selftest for pKVM transitions
We have recently found a bug [1] in the pKVM memory ownership
transitions by code inspection, but it could have been caught with a
test.
Introduce a boot-time selftest exercising all the known pKVM memory
transitions and importantly checks the rejection of illegal transitions.
The new test is hidden behind a new Kconfig option separate from
CONFIG_EL2_NVHE_DEBUG on purpose as that has side effects on the
transition checks ([1] doesn't reproduce with EL2 debug enabled).
[1] https://lore.kernel.org/kvmarm/
20241128154406.602875-1-qperret@google.com/
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-4-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
Quentin Perret [Wed, 16 Apr 2025 16:08:58 +0000 (16:08 +0000)]
KVM: arm64: Don't WARN from __pkvm_host_share_guest()
We currently WARN() if the host attempts to share a page that is not in
an acceptable state with a guest. This isn't strictly necessary and
makes testing much harder, so drop the WARN and make sure to propage the
error code instead.
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20250416160900.3078417-3-qperret@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>