ath11k: optimize RX path latency
author    John Crispin <john@phrozen.org>
          Wed, 27 Nov 2019 16:29:56 +0000 (18:29 +0200)
committer Kalle Valo <kvalo@codeaurora.org>
          Fri, 29 Nov 2019 07:35:26 +0000 (09:35 +0200)
commit    293cb5839729b186f951a82289aa5e0257c6d1b8
tree      5fe52887e7ed8322719b1a1c916987f93359aaa1
parent    0f37fbf43c3fc39a6c49c5a27846df937d217356

This patch drops ath11k_hal_rx_parse_dst_ring_desc(). This function was
the single largest consumer of CPU time in the RX path (21.64% of
samples in the profile below), which added significant latency when
processing received data.
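
The commit message does not show the replacement code. As a rough
illustration only, assuming the overhead came from decoding every field
of each ring descriptor into an intermediate structure before the hot
loop used a few of them, the C sketch below contrasts that style with
reading only the needed bits in place. The descriptor layout and every
name in it (rx_dst_desc, rx_parse_full, GET_BITS and so on) are
hypothetical and do not match the real ath11k HAL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Contiguous mask for bits h..l of a 32-bit word (like the kernel's GENMASK). */
#define MASK32(h, l) \
	((UINT32_C(0xFFFFFFFF) << (l)) & (UINT32_C(0xFFFFFFFF) >> (31 - (h))))
/* Bits h..l of w, shifted down to bit 0 (like FIELD_GET on a GENMASK). */
#define GET_BITS(w, h, l) (((w) & MASK32(h, l)) >> (l))

/* HYPOTHETICAL 3-word RX destination-ring descriptor, for illustration
 * only; the real ath11k HAL descriptor has a different layout. */
struct rx_dst_desc {
	uint32_t info0; /* [31:0] buffer cookie */
	uint32_t info1; /* [7:0] push reason, [15:8] error code,
			 * [16] first MSDU, [17] last MSDU */
	uint32_t info2; /* [13:0] MSDU length, [29:14] peer id */
};

/* Fully decoded intermediate view ("parse everything first" style). */
struct rx_dst_info {
	uint32_t cookie;
	uint8_t push_reason;
	uint8_t err_code;
	bool first_msdu;
	bool last_msdu;
	uint16_t msdu_len;
	uint16_t peer_id;
};

/* Style A: decode every field on every descriptor, even the fields the
 * caller never reads. */
void rx_parse_full(const struct rx_dst_desc *d, struct rx_dst_info *out)
{
	assert(d != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	out->cookie = d->info0;
	out->push_reason = (uint8_t)GET_BITS(d->info1, 7, 0);
	out->err_code = (uint8_t)GET_BITS(d->info1, 15, 8);
	out->first_msdu = GET_BITS(d->info1, 16, 16);
	out->last_msdu = GET_BITS(d->info1, 17, 17);
	out->msdu_len = (uint16_t)GET_BITS(d->info2, 13, 0);
	out->peer_id = (uint16_t)GET_BITS(d->info2, 29, 14);
}

/* Style B: the hot path extracts only the fields it uses, straight
 * from the descriptor words. */
uint32_t rx_desc_cookie(const struct rx_dst_desc *d)
{
	return d->info0;
}

uint8_t rx_desc_push_reason(const struct rx_dst_desc *d)
{
	return (uint8_t)GET_BITS(d->info1, 7, 0);
}

uint16_t rx_desc_msdu_len(const struct rx_dst_desc *d)
{
	return (uint16_t)GET_BITS(d->info2, 13, 0);
}

/* Both styles must agree on the fields the hot path consumes. */
bool rx_paths_agree(const struct rx_dst_desc *d)
{
	struct rx_dst_info info;

	rx_parse_full(d, &info);
	return info.cookie == rx_desc_cookie(d) &&
	       info.push_reason == rx_desc_push_reason(d) &&
	       info.msdu_len == rx_desc_msdu_len(d);
}
```

Both styles return the same values for the fields the hot path reads,
which rx_paths_agree() checks; under this assumption the only
difference is how much decoding work is done per received frame.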

Pinning RX processing to a single core and running perf top, we get the
following output for HE80 traffic at a fixed rate of 1 Gbit/s.

with patch
    19.19%  [ath11k]       [k] ath11k_dp_process_rx
     5.02%  [ath11k]       [k] ath11k_dp_rx_tid_del_func
     4.39%  [kernel]       [k] v7_dma_inv_range
     4.15%  [kernel]       [k] __slab_alloc.constprop.1
     4.03%  [kernel]       [k] dev_gro_receive
     3.86%  [kernel]       [k] tcp_gro_receive
     3.07%  [ip_tables]    [k] ipt_do_table
     2.96%  [kernel]       [k] dma_cache_maint_page

without patch
    21.64%  [ath11k]       [k] ath11k_hal_rx_parse_dst_ring_desc
    10.80%  [ath11k]       [k] ath11k_dp_process_rx
     3.77%  [kernel]       [k] v7_dma_inv_range
     3.48%  [kernel]       [k] dev_gro_receive
     3.32%  [ath11k]       [k] ath11k_dp_rx_tid_del_func
     3.17%  [mac80211]     [k] ieee80211_rx_napi
     2.70%  [kernel]       [k] dma_cache_maint_page
     2.65%  [mac80211]     [k] ieee80211_sta_ps_transition
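
Both profiles were taken at the same fixed 1 Gbit/s load, so the ath11k
entries can be compared roughly: without the patch,
ath11k_hal_rx_parse_dst_ring_desc and ath11k_dp_process_rx together
account for 32.44% of samples, against 19.19% for ath11k_dp_process_rx
alone with the patch. The small C snippet below only redoes that
arithmetic on the figures above, stored in hundredths of a percent so
the sums stay exact. perf top percentages are relative to each run's own
sample total, so this is an approximation, not a separately measured
saving.

```c
#include <assert.h>
#include <stddef.h>

/* One line of perf top output; the share is kept in hundredths of a
 * percent (21.64% is stored as 2164) to avoid floating point. */
struct perf_entry {
	const char *symbol;
	int share_centi;
};

/* Entries for the two functions touched by the patch, "without patch". */
static const struct perf_entry without_patch[] = {
	{ "ath11k_hal_rx_parse_dst_ring_desc", 2164 },
	{ "ath11k_dp_process_rx", 1080 },
};

/* The remaining entry for the same work, "with patch". */
static const struct perf_entry with_patch[] = {
	{ "ath11k_dp_process_rx", 1919 },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sum the sample shares of a list of symbols. */
int total_share_centi(const struct perf_entry *e, size_t n)
{
	int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += e[i].share_centi;
	return sum;
}

/* Combined share of the listed functions before the patch. */
int share_before_centi(void)
{
	return total_share_centi(without_patch, ARRAY_LEN(without_patch));
}

/* Share of the same work after the patch. */
int share_after_centi(void)
{
	return total_share_centi(with_patch, ARRAY_LEN(with_patch));
}
```

The difference, 3244 - 1919 = 1325, is a drop of about 13.25 percentage
points in the share of samples spent in these functions.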

With the bandwidth limit removed, rerunning the test shows an overall
throughput improvement of 300-400 Mbit/s for 4x4 HE80.

Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
drivers/net/wireless/ath/ath11k/dp_rx.c
drivers/net/wireless/ath/ath11k/hal_rx.c
drivers/net/wireless/ath/ath11k/hal_rx.h