bpf: Reuse freed element in free_by_rcu during allocation
authorHou Tao <houtao1@huawei.com>
Fri, 9 Dec 2022 01:09:46 +0000 (09:09 +0800)
committerAlexei Starovoitov <ast@kernel.org>
Fri, 9 Dec 2022 01:50:17 +0000 (17:50 -0800)
When there are batched freeing operations on a specific CPU, part of
the freed elements ((high_watermark - low_watermark) / 2 + 1) will be
indirectly moved into the waiting_for_gp list through the free_by_rcu
list. After call_rcu_in_progress becomes false again, the remaining
elements in the free_by_rcu list will be moved to the waiting_for_gp
list by the next invocation of free_bulk(). However, if the RCU tasks
trace grace period is slow to expire, no element in the free_by_rcu
list will be moved.
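
For reference, a simplified sketch of the free path described above
(names follow the commit message and kernel/bpf/memalloc.c around this
commit; the body is condensed, not the verbatim kernel code):

  /* Sketch: splice free_by_rcu into waiting_for_gp and start a tasks
   * trace grace period. While call_rcu_in_progress stays set (the
   * grace period has not expired yet), later free_bulk() invocations
   * only append to free_by_rcu and skip the splice below, so those
   * freed elements sit idle on free_by_rcu.
   */
  static void do_call_rcu(struct bpf_mem_cache *c)
  {
          struct llist_node *llnode, *t;

          if (atomic_xchg(&c->call_rcu_in_progress, 1))
                  return;

          llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
                  __llist_add(llnode, &c->waiting_for_gp);
          call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
  }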

So instead of invoking __alloc_percpu_gfp() or kmalloc_node() to
allocate a new object, alloc_bulk() now first checks whether there is
a freed element in the free_by_rcu list and reuses it if available.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20221209010947.3130477-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
kernel/bpf/memalloc.c

index 8f0d65f2474ad9fa698313ed472158cd445775c3..04d96d1b98a3dd53f1f0bc23038d71a9ca76e034 100644 (file)
@@ -171,9 +171,24 @@ static void alloc_bulk(struct bpf_mem_cache *c, int cnt, int node)
        memcg = get_memcg(c);
        old_memcg = set_active_memcg(memcg);
        for (i = 0; i < cnt; i++) {
-               obj = __alloc(c, node);
-               if (!obj)
-                       break;
+               /*
+                * free_by_rcu is only manipulated by irq work refill_work().
+                * IRQ works on the same CPU are called sequentially, so it is
+                * safe to use __llist_del_first() here. If alloc_bulk() is
+                * invoked by the initial prefill, there will be no running
+                * refill_work(), so __llist_del_first() is fine as well.
+                *
+                * In most cases, objects on free_by_rcu are from the same CPU.
+                * If some objects come from other CPUs, it doesn't incur any
+                * harm because NUMA_NO_NODE means the preference for current
+                * numa node and it is not a guarantee.
+                */
+               obj = __llist_del_first(&c->free_by_rcu);
+               if (!obj) {
+                       obj = __alloc(c, node);
+                       if (!obj)
+                               break;
+               }
                if (IS_ENABLED(CONFIG_PREEMPT_RT))
                        /* In RT irq_work runs in per-cpu kthread, so disable
                         * interrupts to avoid preemption and interrupts and