KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
author Marc Zyngier <maz@kernel.org>
Tue, 20 May 2025 14:41:16 +0000 (15:41 +0100)
committer Marc Zyngier <maz@kernel.org>
Wed, 21 May 2025 08:53:08 +0000 (09:53 +0100)

When handling a VNCR translation fault, we start by marking the
current SW-managed TLB as invalid, so that we can populate it
in place. This is, however, done without the mmu_lock held.

A consequence of this is that another CPU dealing with TLBI
emulation can observe a translation still flagged as valid, but
with invalid walk results (such as pgshift being 0). Acting on
such an entry can trigger a BUG() in pgshift_level_to_ttl().

Fix it by taking the mmu_lock for write to perform this local
invalidation, and by using invalidate_vncr() instead of open-coding
the write to the 'valid' flag.

Fixes: 069a05e535496 ("KVM: arm64: nv: Handle VNCR_EL2-triggered faults")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20250520144116.3667978-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
arch/arm64/kvm/nested.c

index 6a9fd4e0e789cf21722c46304a5251fb16f3182d..56b732003caa7e1cf7af7a337bac833beda2646e 100644
@@ -1179,13 +1179,24 @@ static int kvm_translate_vncr(struct kvm_vcpu *vcpu)
 
        vt = vcpu->arch.vncr_tlb;
 
-       vt->wi = (struct s1_walk_info) {
-               .regime = TR_EL20,
-               .as_el0 = false,
-               .pan    = false,
-       };
-       vt->wr = (struct s1_walk_result){};
-       vt->valid = false;
+       /*
+        * If we're about to walk the EL2 S1 PTs, we must invalidate the
+        * current TLB, as it could be sampled from another vcpu doing a
+        * TLBI *IS. A real CPU wouldn't do that, but we only keep a single
+        * translation, so not much of a choice.
+        *
+        * We also prepare the next walk whilst we're at it.
+        */
+       scoped_guard(write_lock, &vcpu->kvm->mmu_lock) {
+               invalidate_vncr(vt);
+
+               vt->wi = (struct s1_walk_info) {
+                       .regime = TR_EL20,
+                       .as_el0 = false,
+                       .pan    = false,
+               };
+               vt->wr = (struct s1_walk_result){};
+       }
 
        guard(srcu)(&vcpu->kvm->srcu);