selftests/bpf: Extend uprobe/uretprobe triggering benchmarks
authorAndrii Nakryiko <andrii@kernel.org>
Fri, 1 Mar 2024 21:45:51 +0000 (13:45 -0800)
committerDaniel Borkmann <daniel@iogearbox.net>
Mon, 4 Mar 2024 13:40:24 +0000 (14:40 +0100)
commit8f79870ec8a9409983ad5981e1b7d599cbf047bd
tree70171da4dbc7892c20cf522a471ae8fe8baa156f
parent25703adf45f8430ec59effa20920c80139d13cdc
selftests/bpf: Extend uprobe/uretprobe triggering benchmarks

Settle on three "flavors" of uprobe/uretprobe, installed on different
kinds of instruction: nop, push, and ret. All three are testing
different internal code paths emulating or single-stepping instructions,
so are interesting to compare and benchmark separately.

To ensure `push rbp` instruction we ensure that uprobe_target_push() is
not a leaf function by calling (global __weak) noop function and
returning something afterwards (if we don't do that, compiler will just
do a tail call optimization).

Also, we need to make sure that compiler isn't skipping frame pointer
generation, so let's add `-fno-omit-frame-pointers` to Makefile.

Just to give an idea of where we currently stand in terms of relative
performance of different uprobe/uretprobe cases vs a cheap syscall
(getpgid()) baseline, here are results from my local machine:

$ benchs/run_bench_uprobes.sh
base           :    1.561 ± 0.020M/s
uprobe-nop     :    0.947 ± 0.007M/s
uprobe-push    :    0.951 ± 0.004M/s
uprobe-ret     :    0.443 ± 0.007M/s
uretprobe-nop  :    0.471 ± 0.013M/s
uretprobe-push :    0.483 ± 0.004M/s
uretprobe-ret  :    0.306 ± 0.007M/s

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240301214551.1686095-1-andrii@kernel.org
tools/testing/selftests/bpf/Makefile
tools/testing/selftests/bpf/bench.c
tools/testing/selftests/bpf/benchs/bench_trigger.c
tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh [new file with mode: 0755]