mm, hugetlb: avoid passing a null nodemask when there is mbind policy
author    Oscar Salvador <osalvador@suse.de>
          Tue, 15 Apr 2025 12:15:03 +0000 (14:15 +0200)
committer Andrew Morton <akpm@linux-foundation.org>
          Mon, 12 May 2025 00:48:30 +0000 (17:48 -0700)
Before trying to allocate a page, gather_surplus_pages() sets up a
nodemask for the nodes we can allocate from. But instead of passing that
nodemask down the road to the page allocator, it iterates over the nodes
within the nodemask right there, so the page allocator receives a
preferred_nid and a NULL nodemask.

This is a problem when a memory policy is in use, because the page
allocator may end up falling back to a node that is not represented in
the policy.

Avoid that by passing the nodemask directly to the page allocator, so it
can filter out fallback nodes that are not part of the nodemask.
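
To make the failure mode concrete, here is a minimal user-space sketch.
It is not kernel code: every toy_* name, the bitmask nodemask and the
wrap-around fallback order are assumptions invented for illustration, and
the old pattern is simplified (it omits the "prioritize current node"
step). It only models the two calling patterns: walking the mask in the
caller with a NULL nodemask versus handing the nodemask to the allocator.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4
#define TOY_NO_NODE   (-1)

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef unsigned int toy_nodemask_t;

/* Free huge pages per node in this toy model. */
static int toy_free_pages[TOY_MAX_NODES];

/* Node the caller is "running on"; stands in for numa_mem_id(). */
static int toy_current_node = 0;

/*
 * Toy allocator: start at preferred_nid, then fall back to the other
 * nodes in ascending, wrap-around order. If nodemask is non-NULL, nodes
 * outside it are skipped; if it is NULL, any node with free pages may be
 * used. Returns the node the page came from, or TOY_NO_NODE on failure.
 */
static int toy_alloc(int preferred_nid, const toy_nodemask_t *nodemask)
{
	if (preferred_nid == TOY_NO_NODE)
		preferred_nid = toy_current_node;
	assert(preferred_nid >= 0 && preferred_nid < TOY_MAX_NODES);

	for (int i = 0; i < TOY_MAX_NODES; i++) {
		int nid = (preferred_nid + i) % TOY_MAX_NODES;

		if (nodemask && !(*nodemask & (1u << nid)))
			continue;
		if (toy_free_pages[nid] > 0) {
			toy_free_pages[nid]--;
			return nid;
		}
	}
	return TOY_NO_NODE;
}

/* Old pattern: walk the mask in the caller, pass a NULL nodemask down. */
static int toy_gather_old(toy_nodemask_t allowed)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		int got;

		if (!(allowed & (1u << nid)))
			continue;
		got = toy_alloc(nid, NULL);
		if (got != TOY_NO_NODE)
			return got;
	}
	return TOY_NO_NODE;
}

/* New pattern: a single call; the allocator filters fallback nodes itself. */
static int toy_gather_new(toy_nodemask_t allowed)
{
	return toy_alloc(TOY_NO_NODE, &allowed);
}
```

With nodes 0 and 1 allowed and free pages only on node 2,
toy_gather_old() returns a page from node 2, which is outside the mask,
while toy_gather_new() fails with TOY_NO_NODE instead of violating it.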

Link: https://lkml.kernel.org/r/20250415121503.376811-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a2c1114478127f78e85e37beab53cd2576528e3b..38738293e6b672e783981e5ae85dab1f9dd14d96 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2419,7 +2419,6 @@ static int gather_surplus_pages(struct hstate *h, long delta)
        long i;
        long needed, allocated;
        bool alloc_ok = true;
-       int node;
        nodemask_t *mbind_nodemask, alloc_nodemask;
 
        mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
@@ -2443,21 +2442,12 @@ retry:
        for (i = 0; i < needed; i++) {
                folio = NULL;
 
-               /* Prioritize current node */
-               if (node_isset(numa_mem_id(), alloc_nodemask))
-                       folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
-                                       numa_mem_id(), NULL);
-
-               if (!folio) {
-                       for_each_node_mask(node, alloc_nodemask) {
-                               if (node == numa_mem_id())
-                                       continue;
-                               folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
-                                               node, NULL);
-                               if (folio)
-                                       break;
-                       }
-               }
+               /*
+                * It is okay to use NUMA_NO_NODE because we use numa_mem_id()
+                * down the road to pick the current node if that is the case.
+                */
+               folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+                                                   NUMA_NO_NODE, &alloc_nodemask);
                if (!folio) {
                        alloc_ok = false;
                        break;