mm/memory-failure: fix redundant updates for already poisoned pages
author Kyle Meyer <kyle.meyer@hpe.com>
Thu, 28 Aug 2025 18:38:20 +0000 (13:38 -0500)
committer Andrew Morton <akpm@linux-foundation.org>
Thu, 4 Sep 2025 00:10:38 +0000 (17:10 -0700)

Duplicate memory errors can be reported by multiple sources.

Passing an already poisoned page to action_result() causes issues:

* The amount of hardware corrupted memory is incorrectly updated.
* Per NUMA node MF stats are incorrectly updated.
* Redundant "already poisoned" messages are printed.

Avoid those issues by:

* Skipping hardware corrupted memory updates for already poisoned pages.
* Skipping per NUMA node MF stats updates for already poisoned pages.
* Dropping redundant "already poisoned" messages.
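
The accounting change can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the enums, strings, and the two plain counters below are simplified stand-ins for the kernel's definitions, and demo_duplicate_report() is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel enums (assumption: trimmed for the sketch). */
enum mf_action_page_type { MF_MSG_BUDDY, MF_MSG_ALREADY_POISONED, MF_MSG_UNKNOWN };
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

static const char * const action_page_types[] = {
	[MF_MSG_BUDDY]            = "free buddy page",
	[MF_MSG_ALREADY_POISONED] = "already poisoned page",
	[MF_MSG_UNKNOWN]          = "unknown page",
};

static const char * const action_name[] = {
	[MF_IGNORED]   = "Ignored",
	[MF_FAILED]    = "Failed",
	[MF_DELAYED]   = "Delayed",
	[MF_RECOVERED] = "Recovered",
};

/* Plain counters modelling the global poisoned-page count and per-node MF stats. */
static long num_poisoned_pages;
static long node_mf_total;

/* Model of action_result(): counters are skipped for already poisoned pages. */
int action_result(unsigned long pfn, enum mf_action_page_type type,
		  enum mf_result result)
{
	if (type != MF_MSG_ALREADY_POISONED) {
		num_poisoned_pages++;
		node_mf_total++;
	}

	printf("%#lx: recovery action for %s: %s\n",
	       pfn, action_page_types[type], action_name[result]);

	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}

/* Hypothetical helper: report the same pfn twice, as two error sources might. */
long demo_duplicate_report(void)
{
	int ret;

	num_poisoned_pages = 0;
	node_mf_total = 0;

	ret = action_result(0x1234, MF_MSG_BUDDY, MF_RECOVERED);
	assert(ret == 0);

	/* Second report of the same page: logged once, but not counted again. */
	ret = action_result(0x1234, MF_MSG_ALREADY_POISONED, MF_FAILED);
	assert(ret == -EBUSY);

	return num_poisoned_pages;
}
```

In this model, calling demo_duplicate_report() leaves both counters at 1 after the two reports, whereas unconditional increments would leave them at 2.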

Make MF_MSG_ALREADY_POISONED consistent with other action_page_types and
make calls to action_result() consistent for already poisoned normal pages
and huge pages.

Link: https://lkml.kernel.org/r/aLCiHMy12Ck3ouwC@hpe.com
Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/memory-failure.c

index fc30ca4804bf4763ed6b14313cbc8b323940c2ee..10b3c281c2aee073ef9644da4a4399db92f119d1 100644
@@ -956,7 +956,7 @@ static const char * const action_page_types[] = {
        [MF_MSG_BUDDY]                  = "free buddy page",
        [MF_MSG_DAX]                    = "dax page",
        [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
-       [MF_MSG_ALREADY_POISONED]       = "already poisoned",
+       [MF_MSG_ALREADY_POISONED]       = "already poisoned page",
        [MF_MSG_UNKNOWN]                = "unknown page",
 };
 
@@ -1349,9 +1349,10 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 {
        trace_memory_failure_event(pfn, type, result);
 
-       num_poisoned_pages_inc(pfn);
-
-       update_per_node_mf_stats(pfn, result);
+       if (type != MF_MSG_ALREADY_POISONED) {
+               num_poisoned_pages_inc(pfn);
+               update_per_node_mf_stats(pfn, result);
+       }
 
        pr_err("%#lx: recovery action for %s: %s\n",
                pfn, action_page_types[type], action_name[result]);
@@ -2094,12 +2095,11 @@ retry:
                *hugetlb = 0;
                return 0;
        } else if (res == -EHWPOISON) {
-               pr_err("%#lx: already hardware poisoned\n", pfn);
                if (flags & MF_ACTION_REQUIRED) {
                        folio = page_folio(p);
                        res = kill_accessing_process(current, folio_pfn(folio), flags);
-                       action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
                }
+               action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
                return res;
        } else if (res == -EBUSY) {
                if (!(flags & MF_NO_RETRY)) {
@@ -2285,7 +2285,6 @@ try_again:
                goto unlock_mutex;
 
        if (TestSetPageHWPoison(p)) {
-               pr_err("%#lx: already hardware poisoned\n", pfn);
                res = -EHWPOISON;
                if (flags & MF_ACTION_REQUIRED)
                        res = kill_accessing_process(current, pfn, flags);