percpu: fix race on alloc failed warning limit
author    Vlad Dumitrescu <vdumitrescu@nvidia.com>
          Fri, 22 Aug 2025 22:55:16 +0000 (15:55 -0700)
committer Andrew Morton <akpm@linux-foundation.org>
          Tue, 9 Sep 2025 06:45:10 +0000 (23:45 -0700)

The 'allocation failed, ...' warning messages can cause unlimited log
spam, contrary to the implementation's intent.

The warn_limit variable is accessed without synchronization.  If more than
<warn_limit> threads enter the warning path at the same time, the variable
will be decremented past 0.  Once it becomes negative, the non-zero check
always passes, leading to unlimited log spam.

Use an atomic operation to access warn_limit and change the condition to
test for a non-negative result (>= 0): once warn_limit has reached 0,
atomic_dec_if_positive returns -1 without decrementing it.  Continue to
print the disable message alongside the last warning.

While the change cited in Fixes only touches adjacent code, the warning
limit implementation was correct before it.  Only non-atomic allocations
were considered for warnings, and those happened to hold pcpu_alloc_mutex
while accessing warn_limit.

[vdumitrescu@nvidia.com: prevent warn_limit from going negative, per Christoph Lameter]
Link: https://lkml.kernel.org/r/ee87cc59-2717-4dbb-8052-1d2692c5aaaa@nvidia.com
Link: https://lkml.kernel.org/r/ab22061a-a62f-4429-945b-744e5cc4ba35@nvidia.com
Fixes: f7d77dfc91f7 ("mm/percpu.c: print error message too if atomic alloc failed")
Signed-off-by: Vlad Dumitrescu <vdumitrescu@nvidia.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mm/percpu.c

index a56f35dcc417e660ac998591a72c54b8bee9ba12..81462ce5866e1665a1ab13a86584610c355890b3 100644
@@ -1734,7 +1734,7 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
        bool is_atomic;
        bool do_warn;
        struct obj_cgroup *objcg = NULL;
-       static int warn_limit = 10;
+       static atomic_t warn_limit = ATOMIC_INIT(10);
        struct pcpu_chunk *chunk, *next;
        const char *err;
        int slot, off, cpu, ret;
@@ -1904,13 +1904,17 @@ fail_unlock:
 fail:
        trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);
 
-       if (do_warn && warn_limit) {
-               pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
-                       size, align, is_atomic, err);
-               if (!is_atomic)
-                       dump_stack();
-               if (!--warn_limit)
-                       pr_info("limit reached, disable warning\n");
+       if (do_warn) {
+               int remaining = atomic_dec_if_positive(&warn_limit);
+
+               if (remaining >= 0) {
+                       pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
+                               size, align, is_atomic, err);
+                       if (!is_atomic)
+                               dump_stack();
+                       if (remaining == 0)
+                               pr_info("limit reached, disable warning\n");
+               }
        }
 
        if (is_atomic) {