mm: add optional close() to struct vm_special_mapping
author Michael Ellerman <mpe@ellerman.id.au>
Mon, 12 Aug 2024 08:26:02 +0000 (18:26 +1000)
committer Andrew Morton <akpm@linux-foundation.org>
Mon, 2 Sep 2024 03:26:12 +0000 (20:26 -0700)
Add an optional close() callback to struct vm_special_mapping.  It will be
used, by powerpc at least, to handle unmapping of the VDSO.
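
As a rough illustration of how an architecture might use the new hook, the
sketch below models the dispatch in userspace. The two structs are cut-down
stand-ins for the kernel types, and demo_vdso_base, demo_vdso_close(),
demo_unmap() and both demo_*_mapping objects are hypothetical names invented
for this example; none of them is the actual powerpc code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct vm_area_struct {
	void *vm_private_data;	/* points at the owning vm_special_mapping */
};

struct vm_special_mapping {
	const char *name;
	void (*close)(const struct vm_special_mapping *sm,
		      struct vm_area_struct *vma);
};

/* Hypothetical per-process record of where the VDSO is mapped. */
static void *demo_vdso_base;

/* Hypothetical arch callback: forget the VDSO address once it is unmapped. */
static void demo_vdso_close(const struct vm_special_mapping *sm,
			    struct vm_area_struct *vma)
{
	(void)sm;
	(void)vma;
	demo_vdso_base = NULL;
}

/* A mapping that opts in to the new callback. */
static const struct vm_special_mapping demo_vdso_mapping = {
	.name = "[vdso]",
	.close = demo_vdso_close,
};

/* A mapping that leaves .close NULL, as existing users do. */
static const struct vm_special_mapping demo_plain_mapping = {
	.name = "[plain]",
};

/* Same dispatch as the mm/mmap.c hunk: call close() only if it is set. */
static void special_mapping_close(struct vm_area_struct *vma)
{
	const struct vm_special_mapping *sm = vma->vm_private_data;

	if (sm->close)
		sm->close(sm, vma);
}

/* Simulate tearing down a vma backed by sm; return the recorded VDSO base. */
static void *demo_unmap(const struct vm_special_mapping *sm)
{
	static int fake_vdso_page;
	struct vm_area_struct vma = { .vm_private_data = (void *)sm };

	demo_vdso_base = &fake_vdso_page;
	special_mapping_close(&vma);
	return demo_vdso_base;
}

/* Exercise both cases: with and without a close() callback. */
static void demo_selfcheck(void)
{
	assert(demo_unmap(&demo_vdso_mapping) == NULL);
	assert(demo_unmap(&demo_plain_mapping) != NULL);
}
```

Because the callback is optional, mappings that do not set .close behave
exactly as before; only mappings that opt in are notified when their vma goes
away.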

Although support for unmapping the VDSO was initially added for CRIU[1],
it is not desirable to guard that support behind
CONFIG_CHECKPOINT_RESTORE.

There are other known users of unmapping the VDSO which are not related to
CRIU, e.g. Valgrind [2] and void-ship [3].

The powerpc arch_unmap() hook has been in place for ~9 years, with no
ifdef, so there may be other unknown users that have come to rely on
unmapping the VDSO.  Even if the code was behind an ifdef, major distros
enable CHECKPOINT_RESTORE so users may not realise unmapping the VDSO
depends on that configuration option.

It's also undesirable to have such core mm behaviour behind a relatively
obscure CONFIG option.

Longer term, the unmap behaviour should be standardised across
architectures; however, that is complicated by the fact that the VDSO
pointer is stored differently across architectures.  There was a previous
attempt to unify that handling [4], which could be revived.

See [5] for further discussion.

[1]: commit 83d3f0e90c6c ("powerpc/mm: tracking vDSO remap")
[2]: https://sourceware.org/git/?p=valgrind.git;a=commit;h=3a004915a2cbdcdebafc1612427576bf3321eef5
[3]: https://github.com/insanitybit/void-ship
[4]: https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/
[5]: https://lore.kernel.org/linuxppc-dev/shiq5v3jrmyi6ncwke7wgl76ojysgbhrchsk32q4lbx2hadqqc@kzyy2igem256

Link: https://lkml.kernel.org/r/20240812082605.743814-1-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
include/linux/mm_types.h
mm/mmap.c

index 003619fab20e57f70d4ec9a02746ce6c307d3df9..2419e60c9a7f285a565b8e92a54540765c491224 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1324,6 +1324,9 @@ struct vm_special_mapping {
 
        int (*mremap)(const struct vm_special_mapping *sm,
                     struct vm_area_struct *new_vma);
+
+       void (*close)(const struct vm_special_mapping *sm,
+                     struct vm_area_struct *vma);
 };
 
 enum tlb_flush_reason {
index 4a9c2329b09a83571160b8c12408272c0003dda7..933fdc1491a7501cd9cf5a2e5b9e1da8e107ae68 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2045,10 +2045,16 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
 static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
 
 /*
+ * Close hook, called for unmap() and on the old vma for mremap().
+ *
  * Having a close hook prevents vma merging regardless of flags.
  */
 static void special_mapping_close(struct vm_area_struct *vma)
 {
+       const struct vm_special_mapping *sm = vma->vm_private_data;
+
+       if (sm->close)
+               sm->close(sm, vma);
 }
 
 static const char *special_mapping_name(struct vm_area_struct *vma)