net: rfs: add sock_rps_delete_flow() helper
RFS can exhibit lower performance for workloads using short-lived
flows and a small set of 4-tuple.
This is often the case for load-testers, using a pair of hosts,
if the server has a single listener port.
Typical use case :
Server : tcp_crr -T128 -F1000 -6 -U -l30 -R 14250
Client : tcp_crr -T128 -F1000 -6 -U -l30 -c -H server | grep local_throughput
This is because RFS global hash table contains stale information,
when the same RSS key is recycled for another socket and another cpu.
Make sure to undo the changes and go back to initial state when
a flow is disconnected.
Performance of the above test is increased by 22 %,
going from 372604 transactions per second to 457773.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Octavian Purdila <tavip@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20250515100354.3339920-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>