tcp: always use tcp_limit_output_bytes limitation
author Eric Dumazet <edumazet@google.com>
Tue, 13 May 2025 19:39:18 +0000 (19:39 +0000)
committer Jakub Kicinski <kuba@kernel.org>
Thu, 15 May 2025 18:30:09 +0000 (11:30 -0700)
This partially reverts commit c73e5807e4f6 ("tcp: tsq: no longer use
limit_output_bytes for paced flows")

Overriding the tcp_limit_output_bytes sysctl value
for FQ enabled flows has the following problem:

It allows TCP to queue around 2 ms worth of data per flow,
defeating tcp_rcv_rtt_update() accuracy on the receiver,
forcing it to increase sk->sk_rcvbuf even if the real
RTT is around 100 us.

After this change, we keep enough packets in flight to fill
the pipe, and keep receive queues small enough to get
good cache behavior (cpu caches and/or NIC driver page pools).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250513193919.1089692-11-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net/ipv4/tcp_output.c

index 13295a59d22e65305d8c4094313e4aa37306cbff..3ac8d2d17e1ff42aaeb9adf0a9e0c99c13d141a8 100644
@@ -2619,9 +2619,8 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
        limit = max_t(unsigned long,
                      2 * skb->truesize,
                      READ_ONCE(sk->sk_pacing_rate) >> READ_ONCE(sk->sk_pacing_shift));
-       if (sk->sk_pacing_status == SK_PACING_NONE)
-               limit = min_t(unsigned long, limit,
-                             READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes));
+       limit = min_t(unsigned long, limit,
+                     READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes));
        limit <<= factor;
 
        if (static_branch_unlikely(&tcp_tx_delay_enabled) &&