io_uring/net: allow coalescing of mapped segments

For bundles, when multiple buffers are selected, it's likely that some
or all of them will be virtually contiguous. If these segments aren't
big, then nice wins can be reaped by coalescing them into bigger
segments. This makes networking copies more efficient, and reduces the
number of iterations that need to be done over an iovec. Ideally,
multiple segments that would've been mapped as an ITER_IOVEC before can
now be mapped into a single ITER_UBUF iterator.
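
To illustrate the idea, here is a minimal userspace sketch of coalescing
virtually contiguous segments. It is not the code from this patch:
coalesce_iovecs() is a hypothetical helper name, and it assumes the
segments are already in the order they will be consumed, so only
neighbouring entries need to be compared.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, for illustration only (not the kernel code).
 * Merge neighbouring segments in place when one starts exactly where
 * the previous one ends. Returns the new number of segments.
 */
static int coalesce_iovecs(struct iovec *iov, int nr_segs)
{
	int out = 0;	/* index of the last segment kept so far */

	if (nr_segs <= 0)
		return 0;

	for (int i = 1; i < nr_segs; i++) {
		struct iovec *prev = &iov[out];
		char *prev_end = (char *)prev->iov_base + prev->iov_len;

		if (prev_end == (char *)iov[i].iov_base) {
			/* virtually contiguous: grow the previous segment */
			prev->iov_len += iov[i].iov_len;
		} else {
			/* gap: start a new output segment */
			iov[++out] = iov[i];
		}
	}

	return out + 1;
}
```

If the helper returns 1, the whole selection is one contiguous range,
which corresponds to the case where a single ITER_UBUF style mapping
could stand in for an ITER_IOVEC.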

Example from an io_uring network backend receiving data, with various
transfer sizes, over a 100G network link.

recv size  coalesce  threads  bw        cpu usage  bw diff
==========================================================
64         0         1        23GB/sec  100%
64         1         1        46GB/sec   79%       +100%
64         0         4        81GB/sec  370%
64         1         4        96GB/sec  160%       + 20%
256        0         1        44GB/sec   90%
256        1         1        47GB/sec   48%       +  7%
256        0         4        90GB/sec  190%
256        1         4        96GB/sec  120%       +  7%
1024       0         1        49GB/sec   60%
1024       1         1        50GB/sec   53%       +  2%
1024       0         4        94GB/sec  140%
1024       1         4        96GB/sec  120%       +  2%

where small buffer sizes obviously benefit the most, but where an
efficiency gain is seen at higher buffer sizes as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>