linux-block.git
21 months agowifi: mac80211: mlme: Fix missing unlock on beacon RX
Rafael Mendonca [Sat, 24 Sep 2022 18:40:41 +0000 (15:40 -0300)]
wifi: mac80211: mlme: Fix missing unlock on beacon RX

Commit 98b0b467466c ("wifi: mac80211: mlme: use correct link_sta")
switched to link station instead of deflink and added some checks to do
that, which are done with the 'sta_mtx' mutex held. However, the error
path of these checks does not unlock 'sta_mtx' before returning.

Fixes: 98b0b467466c ("wifi: mac80211: mlme: use correct link_sta")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20220924184042.778676-1-rafaelmendsr@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
Paweł Lenkow [Mon, 19 Sep 2022 15:01:35 +0000 (17:01 +0200)]
wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()

During our testing of WFM200 module over SDIO on i.MX6Q-based platform,
we discovered a memory corruption on the system, tracing back to the wfx
driver. Using kfence, it was possible to trace it back to the root
cause, which is hw->max_rates set to 8 in wfx_init_common,
while the maximum defined by IEEE80211_TX_TABLE_SIZE is 4.

This causes array out-of-bounds writes during updates of the rate table,
as seen below:

BUG: KFENCE: memory corruption in kfree_rcu_work+0x320/0x36c

Corrupted memory at 0xe0a4ffe0 [ 0x03 0x03 0x03 0x03 0x01 0x00 0x00
0x02 0x02 0x02 0x09 0x00 0x21 0xbb 0xbb 0xbb ] (in kfence-#81):
kfree_rcu_work+0x320/0x36c
process_one_work+0x3ec/0x920
worker_thread+0x60/0x7a4
kthread+0x174/0x1b4
ret_from_fork+0x14/0x2c
0x0

kfence-#81: 0xe0a4ffc0-0xe0a4ffdf, size=32, cache=kmalloc-64

allocated by task 297 on cpu 0 at 631.039555s:
minstrel_ht_update_rates+0x38/0x2b0 [mac80211]
rate_control_tx_status+0xb4/0x148 [mac80211]
ieee80211_tx_status_ext+0x364/0x1030 [mac80211]
ieee80211_tx_status+0xe0/0x118 [mac80211]
ieee80211_tasklet_handler+0xb0/0xe0 [mac80211]
tasklet_action_common.constprop.0+0x11c/0x148
__do_softirq+0x1a4/0x61c
irq_exit+0xcc/0x104
call_with_stack+0x18/0x20
__irq_svc+0x80/0xb0
wq_worker_sleeping+0x10/0x100
wq_worker_sleeping+0x10/0x100
schedule+0x50/0xe0
schedule_timeout+0x2e0/0x474
wait_for_completion+0xdc/0x1ec
mmc_wait_for_req_done+0xc4/0xf8
mmc_io_rw_extended+0x3b4/0x4ec
sdio_io_rw_ext_helper+0x290/0x384
sdio_memcpy_toio+0x30/0x38
wfx_sdio_copy_to_io+0x88/0x108 [wfx]
wfx_data_write+0x88/0x1f0 [wfx]
bh_work+0x1c8/0xcc0 [wfx]
process_one_work+0x3ec/0x920
worker_thread+0x60/0x7a4
kthread+0x174/0x1b4
ret_from_fork+0x14/0x2c 0x0

After discussion on the wireless mailing list it was clarified
that the issue has been introduced by:
commit ee0e16ab756a ("mac80211: minstrel_ht: fill all requested rates")
and fix shall be in minstrel_ht_update_rates in rc80211_minstrel_ht.c.

Fixes: ee0e16ab756a ("mac80211: minstrel_ht: fill all requested rates")
Link: https://lore.kernel.org/all/12e5adcd-8aed-f0f7-70cc-4fb7b656b829@camlingroup.com/
Link: https://lore.kernel.org/linux-wireless/20220915131445.30600-1-lech.perczak@camlingroup.com/
Cc: Jérôme Pouiller <jerome.pouiller@silabs.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Seiderer <ps.report@gmx.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Krzysztof Drobiński <krzysztof.drobinski@camlingroup.com>,
Signed-off-by: Paweł Lenkow <pawel.lenkow@camlingroup.com>
Signed-off-by: Lech Perczak <lech.perczak@camlingroup.com>
Reviewed-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: fix regression with non-QoS drivers
Hans de Goede [Sun, 18 Sep 2022 19:20:52 +0000 (21:20 +0200)]
wifi: mac80211: fix regression with non-QoS drivers

Commit 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port")
changed ieee80211_tx_control_port() to aways call
__ieee80211_select_queue() without checking local->hw.queues.

__ieee80211_select_queue() returns a queue-id between 0 and 3, which means
that now ieee80211_tx_control_port() may end up setting the queue-mapping
for a skb to a value higher then local->hw.queues if local->hw.queues
is less then 4.

Specifically this is a problem for ralink rt2500-pci cards where
local->hw.queues is 2. There this causes rt2x00queue_get_tx_queue() to
return NULL and the following error to be logged: "ieee80211 phy0:
rt2x00mac_tx: Error - Attempt to send packet over invalid queue 2",
after which association with the AP fails.

Other callers of __ieee80211_select_queue() skip calling it when
local->hw.queues < IEEE80211_NUM_ACS, add the same check to
ieee80211_tx_control_port(). This fixes ralink rt2500-pci and
similar cards when less then 4 tx-queues no longer working.

Fixes: 10cb8e617560 ("mac80211: enable QoS support for nl80211 ctrl port")
Cc: Markus Theil <markus.theil@tu-ilmenau.de>
Suggested-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220918192052.443529-1-hdegoede@redhat.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: ensure vif queues are operational after start
Alexander Wetzel [Thu, 15 Sep 2022 13:09:46 +0000 (15:09 +0200)]
wifi: mac80211: ensure vif queues are operational after start

Make sure local->queue_stop_reasons and vif.txqs_stopped stay in sync.

When a new vif is created the queues may end up in an inconsistent state
and be inoperable:
Communication not using iTXQ will work, allowing to e.g. complete the
association. But the 4-way handshake will time out. The sta will not
send out any skbs queued in iTXQs.

All normal attempts to start the queues will fail when reaching this
state.
local->queue_stop_reasons will have marked all queues as operational but
vif.txqs_stopped will still be set, creating an inconsistent internal
state.

In reality this seems to be race between the mac80211 function
ieee80211_do_open() setting SDATA_STATE_RUNNING and the wake_txqs_tasklet:
Depending on the driver and the timing the queues may end up to be
operational or not.

Cc: stable@vger.kernel.org
Fixes: f856373e2f31 ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Acked-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20220915130946.302803-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: don't start TX with fq->lock to fix deadlock
Alexander Wetzel [Thu, 15 Sep 2022 12:41:20 +0000 (14:41 +0200)]
wifi: mac80211: don't start TX with fq->lock to fix deadlock

ieee80211_txq_purge() calls fq_tin_reset() and
ieee80211_purge_tx_queue(); Both are then calling
ieee80211_free_txskb(). Which can decide to TX the skb again.

There are at least two ways to get a deadlock:

1) When we have a TDLS teardown packet queued in either tin or frags
   ieee80211_tdls_td_tx_handle() will call ieee80211_subif_start_xmit()
   while we still hold fq->lock. ieee80211_txq_enqueue() will thus
   deadlock.

2) A variant of the above happens if aggregation is up and running:
   In that case ieee80211_iface_work() will deadlock with the original
   task: The original tasks already holds fq->lock and tries to get
   sta->lock after kicking off ieee80211_iface_work(). But the worker
   can get sta->lock prior to the original task and will then spin for
   fq->lock.

Avoid these deadlocks by not sending out any skbs when called via
ieee80211_free_txskb().

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20220915124120.301918-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: cfg80211: fix MCS divisor value
Tamizh Chelvam Raja [Thu, 8 Sep 2022 18:10:34 +0000 (23:40 +0530)]
wifi: cfg80211: fix MCS divisor value

The Bitrate for HE/EHT MCS6 is calculated wrongly due to the
incorrect MCS divisor value for mcs6. Fix it with the proper
value.

previous mcs_divisor value = (11769/6144) = 1.915527

fixed mcs_divisor value = (11377/6144) = 1.851725

Fixes: 9c97c88d2f4b ("cfg80211: Add support to calculate and report 4096-QAM HE rates")
Signed-off-by: Tamizh Chelvam Raja <quic_tamizhr@quicinc.com>
Link: https://lore.kernel.org/r/20220908181034.9936-1-quic_tamizhr@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2
Felix Fietkau [Wed, 7 Sep 2022 09:52:28 +0000 (11:52 +0200)]
wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2

Some users have reported being unable to connect to MT76x0 APs running mt76
after a commit enabling the VHT extneded NSS BW feature.
Fix this regression by ensuring that this feature only gets enabled on drivers
that support it

Cc: stable@vger.kernel.org
Fixes: d9fcfc1424aa ("mt76: enable the VHT extended NSS BW feature")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220907095228.82072-1-nbd@nbd.name
21 months agowifi: mt76: fix reading current per-tid starting sequence number for aggregation
Felix Fietkau [Fri, 26 Aug 2022 18:23:29 +0000 (20:23 +0200)]
wifi: mt76: fix reading current per-tid starting sequence number for aggregation

The code was accidentally shifting register values down by tid % 32 instead of
(tid * field_size) % 32.

Cc: stable@vger.kernel.org
Fixes: a28bef561a5c ("mt76: mt7615: re-enable offloading of sequence number assignment")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220826182329.18155-1-nbd@nbd.name
21 months agowifi: iwlwifi: Mark IWLMEI as broken
Toke Høiland-Jørgensen [Wed, 7 Sep 2022 13:44:50 +0000 (15:44 +0200)]
wifi: iwlwifi: Mark IWLMEI as broken

The iwlmei driver breaks iwlwifi when returning from suspend. The interface
ends up in the 'down' state after coming back from suspend. And iwd doesn't
touch the interface state, but wpa_supplicant does, so the bug only happens on
iwd.

The bug report[0] has been open for four months now, and no fix seems to be
forthcoming. Since just disabling the iwlmei driver works as a workaround,
let's mark the config option as broken until it can be fixed properly.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=215937

Fixes: 2da4366f9e2c ("iwlwifi: mei: add the driver to allow cooperation with CSME")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220907134450.1183045-1-toke@toke.dk
21 months agowifi: iwlwifi: don't spam logs with NSS>2 messages
Jason A. Donenfeld [Mon, 5 Sep 2022 17:22:46 +0000 (19:22 +0200)]
wifi: iwlwifi: don't spam logs with NSS>2 messages

I get a log line like this every 4 seconds when connected to my AP:

[15650.221468] iwlwifi 0000:09:00.0: Got NSS = 4 - trimming to 2

Looking at the code, this seems to be related to a hardware limitation,
and there's nothing to be done. In an effort to keep my dmesg
manageable, downgrade this error to "debug" rather than "info".

Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220905172246.105383-1-Jason@zx2c4.com
21 months agowifi: use struct_group to copy addresses
Johannes Berg [Mon, 29 Aug 2022 09:46:38 +0000 (11:46 +0200)]
wifi: use struct_group to copy addresses

We sometimes copy all the addresses from the 802.11 header
for the AAD, which may cause complaints from fortify checks.
Use struct_group() to avoid the compiler warnings/errors.

Change-Id: Ic3ea389105e7813b22095b295079eecdabde5045
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211_hwsim: check length for virtio packets
Soenke Huster [Fri, 2 Sep 2022 08:19:58 +0000 (10:19 +0200)]
wifi: mac80211_hwsim: check length for virtio packets

An invalid packet with a length shorter than the specified length in the
netlink header can lead to use-after-frees and slab-out-of-bounds in the
processing of the netlink attributes, such as the following:

  BUG: KASAN: slab-out-of-bounds in __nla_validate_parse+0x1258/0x2010
  Read of size 2 at addr ffff88800ac7952c by task kworker/0:1/12

  Workqueue: events hwsim_virtio_rx_work
  Call Trace:
   <TASK>
   dump_stack_lvl+0x45/0x5d
   print_report.cold+0x5e/0x5e5
   kasan_report+0xb1/0x1c0
   __nla_validate_parse+0x1258/0x2010
   __nla_parse+0x22/0x30
   hwsim_virtio_handle_cmd.isra.0+0x13f/0x2d0
   hwsim_virtio_rx_work+0x1b2/0x370
   process_one_work+0x8df/0x1530
   worker_thread+0x575/0x11a0
   kthread+0x29d/0x340
   ret_from_fork+0x22/0x30
 </TASK>

Discarding packets with an invalid length solves this.
Therefore, skb->len must be set at reception.

Change-Id: Ieaeb9a4c62d3beede274881a7c2722c6c6f477b6
Signed-off-by: Soenke Huster <soenke.huster@eknoes.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: fix locking in auth/assoc timeout
Johannes Berg [Fri, 2 Sep 2022 14:11:14 +0000 (16:11 +0200)]
wifi: mac80211: fix locking in auth/assoc timeout

If we hit an authentication or association timeout, we only
release the chanctx for the deflink, and the other link(s)
are released later by ieee80211_vif_set_links(), but we're
not locking this correctly.

Fix the locking here while releasing the channels and links.

Change-Id: I9e08c1a5434592bdc75253c1abfa6c788f9f39b1
Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: mlme: release deflink channel in error case
Johannes Berg [Fri, 2 Sep 2022 14:11:15 +0000 (16:11 +0200)]
wifi: mac80211: mlme: release deflink channel in error case

In the prep_channel error case we didn't release the deflink
channel leaving it to be left around. Fix that.

Change-Id: If0dfd748125ec46a31fc6045a480dc28e03723d2
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
21 months agowifi: mac80211: fix link warning in RX agg timer expiry
Mukesh Sisodiya [Fri, 2 Sep 2022 14:11:31 +0000 (16:11 +0200)]
wifi: mac80211: fix link warning in RX agg timer expiry

The rx data link pointer isn't set from the RX aggregation timer,
resulting in a later warning. Fix that by setting it to the first
valid link for now, with a FIXME to worry about statistics later,
it's not very important since it's just the timeout case.

Reported-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/498d714c-76be-9d04-26db-a1206878de5e@redhat.com
Fixes: 56057da4569b ("wifi: mac80211: rx: track link in RX data")
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: iwlegacy: 4965: corrected fix for potential off-by-one overflow in il4965_rs_fi...
Stanislaw Gruszka [Mon, 15 Aug 2022 07:37:37 +0000 (09:37 +0200)]
wifi: iwlegacy: 4965: corrected fix for potential off-by-one overflow in il4965_rs_fill_link_cmd()

This reverts commit a8eb8e6f7159c7c20c0ddac428bde3d110890aa7 as
it can cause invalid link quality command sent to the firmware
and address the off-by-one issue by fixing condition of while loop.

Cc: stable@vger.kernel.org
Fixes: a8eb8e6f7159 ("wifi: iwlegacy: 4965: fix potential off-by-one overflow in il4965_rs_fill_link_cmd()")
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220815073737.GA999388@wp.pl
22 months agowifi: wilc1000: fix DMA on stack objects
Ajay.Kathat@microchip.com [Tue, 9 Aug 2022 07:57:56 +0000 (07:57 +0000)]
wifi: wilc1000: fix DMA on stack objects

Sometimes 'wilc_sdio_cmd53' is called with addresses pointing to an
object on the stack. Use dynamically allocated memory for cmd53 instead
of stack address which is not DMA'able.

Fixes: 5625f965d764 ("wilc1000: move wilc driver out of staging")
Reported-by: Michael Walle <mwalle@kernel.org>
Suggested-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Tested-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220809075749.62752-1-ajay.kathat@microchip.com
22 months agowifi: mt76: mt7921e: fix crash in chip reset fail
Deren Wu [Tue, 2 Aug 2022 15:15:07 +0000 (23:15 +0800)]
wifi: mt76: mt7921e: fix crash in chip reset fail

In case of drv own fail in reset, we may need to run mac_reset several
times. The sequence would trigger system crash as the log below.

Because we do not re-enable/schedule "tx_napi" before disable it again,
the process would keep waiting for state change in napi_diable(). To
avoid the problem and keep status synchronize for each run, goto final
resource handling if drv own failed.

[ 5857.353423] mt7921e 0000:3b:00.0: driver own failed
[ 5858.433427] mt7921e 0000:3b:00.0: Timeout for driver own
[ 5859.633430] mt7921e 0000:3b:00.0: driver own failed
[ 5859.633444] ------------[ cut here ]------------
[ 5859.633446] WARNING: CPU: 6 at kernel/kthread.c:659 kthread_park+0x11d
[ 5859.633717] Workqueue: mt76 mt7921_mac_reset_work [mt7921_common]
[ 5859.633728] RIP: 0010:kthread_park+0x11d/0x150
[ 5859.633736] RSP: 0018:ffff8881b676fc68 EFLAGS: 00010202
......
[ 5859.633766] Call Trace:
[ 5859.633768]  <TASK>
[ 5859.633771]  mt7921e_mac_reset+0x176/0x6f0 [mt7921e]
[ 5859.633778]  mt7921_mac_reset_work+0x184/0x3a0 [mt7921_common]
[ 5859.633785]  ? mt7921_mac_set_timing+0x520/0x520 [mt7921_common]
[ 5859.633794]  ? __kasan_check_read+0x11/0x20
[ 5859.633802]  process_one_work+0x7ee/0x1320
[ 5859.633810]  worker_thread+0x53c/0x1240
[ 5859.633818]  kthread+0x2b8/0x370
[ 5859.633824]  ? process_one_work+0x1320/0x1320
[ 5859.633828]  ? kthread_complete_and_exit+0x30/0x30
[ 5859.633834]  ret_from_fork+0x1f/0x30
[ 5859.633842]  </TASK>

Cc: stable@vger.kernel.org
Fixes: 0efaf31dec57 ("mt76: mt7921: fix MT7921E reset failure")
Signed-off-by: Deren Wu <deren.wu@mediatek.com>
Link: https://lore.kernel.org/r/727eb5ffd3c7c805245e512da150ecf0a7154020.1659452909.git.deren.wu@mediatek.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agoMerge tag 'wireless-2022-08-26' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Fri, 26 Aug 2022 10:43:20 +0000 (11:43 +0100)]
Merge tag 'wireless-2022-08-26' of git://git./linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
pull-request: wireless-2022-08-26

Here are a couple of fixes for the current cycle,
see the tag description below.

Just a couple of fixes:
 * two potential leaks
 * use-after-free in certain scan races
 * warning in IBSS code
 * error return from a debugfs file was wrong
 * possible NULL-ptr-deref when station lookup fails

Please pull and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 25 Aug 2022 21:03:58 +0000 (14:03 -0700)]
Merge tag 'net-6.0-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from ipsec and netfilter (with one broken Fixes tag).

  Current release - new code bugs:

   - dsa: don't dereference NULL extack in dsa_slave_changeupper()

   - dpaa: fix <1G ethernet on LS1046ARDB

   - neigh: don't call kfree_skb() under spin_lock_irqsave()

  Previous releases - regressions:

   - r8152: fix the RX FIFO settings when suspending

   - dsa: microchip: keep compatibility with device tree blobs with no
     phy-mode

   - Revert "net: macsec: update SCI upon MAC address change."

   - Revert "xfrm: update SA curlft.use_time", comply with RFC 2367

  Previous releases - always broken:

   - netfilter: conntrack: work around exceeded TCP receive window

   - ipsec: fix a null pointer dereference of dst->dev on a metadata dst
     in xfrm_lookup_with_ifid

   - moxa: get rid of asymmetry in DMA mapping/unmapping

   - dsa: microchip: make learning configurable and keep it off while
     standalone

   - ice: xsk: prohibit usage of non-balanced queue id

   - rxrpc: fix locking in rxrpc's sendmsg

  Misc:

   - another chunk of sysctl data race silencing"

* tag 'net-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
  net: lantiq_xrx200: restore buffer if memory allocation failed
  net: lantiq_xrx200: fix lock under memory pressure
  net: lantiq_xrx200: confirm skb is allocated before using
  net: stmmac: work around sporadic tx issue on link-up
  ionic: VF initial random MAC address if no assigned mac
  ionic: fix up issues with handling EAGAIN on FW cmds
  ionic: clear broken state on generation change
  rxrpc: Fix locking in rxrpc's sendmsg
  net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
  MAINTAINERS: rectify file entry in BONDING DRIVER
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
  net: Fix a data-race around sysctl_somaxconn.
  net: Fix a data-race around netdev_unregister_timeout_secs.
  net: Fix a data-race around gro_normal_batch.
  net: Fix data-races around sysctl_devconf_inherit_init_net.
  net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
  net: Fix a data-race around netdev_budget_usecs.
  net: Fix data-races around sysctl_max_skb_frags.
  net: Fix a data-race around netdev_budget.
  ...

22 months agoMerge branch 'net-lantiq_xrx200-fix-errors-under-memory-pressure'
Jakub Kicinski [Thu, 25 Aug 2022 19:41:41 +0000 (12:41 -0700)]
Merge branch 'net-lantiq_xrx200-fix-errors-under-memory-pressure'

Aleksander Jan Bajkowski says:

====================
net: lantiq_xrx200: fix errors under memory pressure

This series fixes issues that can occur in the driver under memory pressure.
Situations when the system cannot allocate memory are rare, so the mentioned
bugs have been fixed recently. The patches have been tested on a BT Home
router with the Lantiq xRX200 chipset.

Changelog:
  v3: - removed netdev_err() log from the first patch
  v2:
   - the second patch has been changed, so that under memory pressure situation
     the driver will not receive packets indefinitely regardless of the NAPI budget,
   - the third patch has been added.
====================

Link: https://lore.kernel.org/r/20220824215408.4695-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lantiq_xrx200: restore buffer if memory allocation failed
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:08 +0000 (23:54 +0200)]
net: lantiq_xrx200: restore buffer if memory allocation failed

In a situation where memory allocation fails, an invalid buffer address
is stored. When this descriptor is used again, the system panics in the
build_skb() function when accessing memory.

Fixes: 7ea6cd16f159 ("lantiq: net: fix duplicated skb in rx descriptor ring")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lantiq_xrx200: fix lock under memory pressure
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:07 +0000 (23:54 +0200)]
net: lantiq_xrx200: fix lock under memory pressure

When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
function immediately returns an error.
This is incorrect for two reasons:
* the function terminates without enabling interrupts or scheduling NAPI,
* the error code (-ENOMEM) is returned instead of the number of received
packets.

After the first memory allocation failure occurs, packet reception is
locked due to disabled interrupts from DMA..

Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: lantiq_xrx200: confirm skb is allocated before using
Aleksander Jan Bajkowski [Wed, 24 Aug 2022 21:54:06 +0000 (23:54 +0200)]
net: lantiq_xrx200: confirm skb is allocated before using

xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.

Add a check in case build_skb() failed to allocate and return NULL.

Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: stmmac: work around sporadic tx issue on link-up
Heiner Kallweit [Wed, 24 Aug 2022 20:34:49 +0000 (22:34 +0200)]
net: stmmac: work around sporadic tx issue on link-up

This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoC's sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan.
Benefit of this approach is that we can save few register writes, also
on not affected chip versions.

[0] https://www.spinics.net/lists/netdev/msg831526.html

Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Jakub Kicinski [Thu, 25 Aug 2022 19:40:29 +0000 (12:40 -0700)]
Merge branch '10GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-08-24 (ixgbe, i40e)

This series contains updates to ixgbe and i40e drivers.

Jake stops incorrect resetting of SYSTIME registers when starting
cyclecounter for ixgbe.

Sylwester corrects a check on source IP address when validating destination
for i40e.

* '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: Fix incorrect address type for IPv6 flow rules
  ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
====================

Link: https://lore.kernel.org/r/20220824193748.874343-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'ionic-bug-fixes'
Jakub Kicinski [Thu, 25 Aug 2022 19:40:17 +0000 (12:40 -0700)]
Merge branch 'ionic-bug-fixes'

Shannon Nelson says:

====================
ionic: bug fixes

These are a couple of maintenance bug fixes for the Pensando ionic
networking driver.

Mohamed takes care of a "plays well with others" issue where the
VF spec is a bit vague on VF mac addresses, but certain customers
have come to expect behavior based on other vendor drivers.

Shannon addresses a couple of corner cases seen in internal
stress testing.
====================

Link: https://lore.kernel.org/r/20220824165051.6185-1-snelson@pensando.io
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoionic: VF initial random MAC address if no assigned mac
R Mohamed Shah [Wed, 24 Aug 2022 16:50:51 +0000 (09:50 -0700)]
ionic: VF initial random MAC address if no assigned mac

Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers.  Handle the errors
where the older firmware does not allow the VF to set its own
station address.

Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.

Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <mohamed@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoionic: fix up issues with handling EAGAIN on FW cmds
Shannon Nelson [Wed, 24 Aug 2022 16:50:50 +0000 (09:50 -0700)]
ionic: fix up issues with handling EAGAIN on FW cmds

In looping on FW update tests we occasionally see the
FW_ACTIVATE_STATUS command fail while it is in its EAGAIN loop
waiting for the FW activate step to finsh inside the FW.  The
firmware is complaining that the done bit is set when a new
dev_cmd is going to be processed.

Doing a clean on the cmd registers and doorbell before exiting
the wait-for-done and cleaning the done bit before the sleep
prevents this from occurring.

Fixes: fbfb8031533c ("ionic: Add hardware init and device commands")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoionic: clear broken state on generation change
Shannon Nelson [Wed, 24 Aug 2022 16:50:49 +0000 (09:50 -0700)]
ionic: clear broken state on generation change

There is a case found in heavy testing where a link flap happens just
before a firmware Recovery event and the driver gets stuck in the
BROKEN state.  This comes from the driver getting interrupted by a FW
generation change when coming back up from the link flap, and the call
to ionic_start_queues() in ionic_link_status_check() fails.  This can be
addressed by having the fw_up code clear the BROKEN bit if seen, rather
than waiting for a user to manually force the interface down and then
back up.

Fixes: 9e8eaf8427b6 ("ionic: stop watchdog when in broken state")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agorxrpc: Fix locking in rxrpc's sendmsg
David Howells [Wed, 24 Aug 2022 16:35:45 +0000 (17:35 +0100)]
rxrpc: Fix locking in rxrpc's sendmsg

Fix three bugs in the rxrpc's sendmsg implementation:

 (1) rxrpc_new_client_call() should release the socket lock when returning
     an error from rxrpc_get_call_slot().

 (2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
     held in the event that we're interrupted by a signal whilst waiting
     for tx space on the socket or relocking the call mutex afterwards.

     Fix this by: (a) moving the unlock/lock of the call mutex up to
     rxrpc_send_data() such that the lock is not held around all of
     rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
     whether we're return with the lock dropped.  Note that this means
     recvmsg() will not block on this call whilst we're waiting.

 (3) After dropping and regaining the call mutex, rxrpc_send_data() needs
     to go and recheck the state of the tx_pending buffer and the
     tx_total_len check in case we raced with another sendmsg() on the same
     call.

Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg().  There's probably no need to make recvmsg() wait
for sendmsg().  It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.

Without fix (2), something like the following can be induced:

WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!

other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
 __lock_release kernel/locking/lockdep.c:5306 [inline]
 lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
 __mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
 rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
 rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]

Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 25 Aug 2022 17:52:16 +0000 (10:52 -0700)]
Merge tag 'cgroup-for-6.0-rc2-fixes-2' of git://git./linux/kernel/git/tj/cgroup

Pull another cgroup fix from Tejun Heo:
 "Commit 4f7e7236435c ("cgroup: Fix threadgroup_rwsem <->
  cpus_read_lock() deadlock") required the cgroup
  core to grab cpus_read_lock() before invoking ->attach().

  Unfortunately, it missed adding cpus_read_lock() in
  cgroup_attach_task_all(). Fix it"

* tag 'cgroup-for-6.0-rc2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

22 months agocgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()
Tetsuo Handa [Thu, 25 Aug 2022 08:38:38 +0000 (17:38 +0900)]
cgroup: Add missing cpus_read_lock() to cgroup_attach_task_all()

syzbot is hitting percpu_rwsem_assert_held(&cpu_hotplug_lock) warning at
cpuset_attach() [1], for commit 4f7e7236435ca0ab ("cgroup: Fix
threadgroup_rwsem <-> cpus_read_lock() deadlock") missed that
cpuset_attach() is also called from cgroup_attach_task_all().
Add cpus_read_lock() like what cgroup_procs_write_start() does.

Link: https://syzkaller.appspot.com/bug?extid=29d3a3b4d86c8136ad9e
Reported-by: syzbot <syzbot+29d3a3b4d86c8136ad9e@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 4f7e7236435ca0ab ("cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock")
Signed-off-by: Tejun Heo <tj@kernel.org>
22 months agonet: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2
Lorenzo Bianconi [Tue, 23 Aug 2022 12:24:07 +0000 (14:24 +0200)]
net: ethernet: mtk_eth_soc: fix hw hash reporting for MTK_NETSYS_V2

Properly report hw rx hash for mt7986 chipset accroding to the new dma
descriptor layout.

Fixes: 197c9e9b17b11 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/091394ea4e705fbb35f828011d98d0ba33808f69.1661257293.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
22 months agowifi: mac80211: potential NULL dereference in ieee80211_tx_control_port()
Dan Carpenter [Thu, 28 Jul 2022 14:25:16 +0000 (17:25 +0300)]
wifi: mac80211: potential NULL dereference in ieee80211_tx_control_port()

The ieee80211_lookup_ra_sta() function will sometimes set "sta" to NULL
so add this NULL check to prevent an Oops.

Fixes: 9dd1953846c7 ("wifi: nl80211/mac80211: clarify link ID in control port TX")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuKcTAyO94YOy0Bu@kili
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: cfg80211: debugfs: fix return type in ht40allow_map_read()
Dan Carpenter [Thu, 4 Aug 2022 07:03:21 +0000 (10:03 +0300)]
wifi: cfg80211: debugfs: fix return type in ht40allow_map_read()

The return type is supposed to be ssize_t, which is signed long,
but "r" was declared as unsigned int.  This means that on 64 bit systems
we return positive values instead of negative error codes.

Fixes: 80a3511d70e8 ("cfg80211: add debugfs HT40 allow map")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YutvOQeJm0UjLhwU@kili
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected
Siddh Raman Pant [Sun, 14 Aug 2022 15:15:12 +0000 (20:45 +0530)]
wifi: mac80211: Don't finalize CSA in IBSS mode if state is disconnected

When we are not connected to a channel, sending channel "switch"
announcement doesn't make any sense.

The BSS list is empty in that case. This causes the for loop in
cfg80211_get_bss() to be bypassed, so the function returns NULL
(check line 1424 of net/wireless/scan.c), causing the WARN_ON()
in ieee80211_ibss_csa_beacon() to get triggered (check line 500
of net/mac80211/ibss.c), which was consequently reported on the
syzkaller dashboard.

Thus, check if we have an existing connection before generating
the CSA beacon in ieee80211_ibss_finish_csa().

Cc: stable@vger.kernel.org
Fixes: cd7760e62c2a ("mac80211: add support for CSA in IBSS mode")
Link: https://syzkaller.appspot.com/bug?id=05603ef4ae8926761b678d2939a3b2ad28ab9ca6
Reported-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Tested-by: syzbot+b6c9fe29aefe68e4ad34@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20220814151512.9985-1-code@siddh.me
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: fix possible leak in ieee80211_tx_control_port()
Yang Yingliang [Thu, 18 Aug 2022 04:33:49 +0000 (12:33 +0800)]
wifi: mac80211: fix possible leak in ieee80211_tx_control_port()

Add missing dev_kfree_skb() in an error path in
ieee80211_tx_control_port() to avoid a memory leak.

Fixes: dd820ed6336a ("wifi: mac80211: return error from control port TX for drops")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220818043349.4168835-1-yangyingliang@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: always free sta in __sta_info_alloc in case of error
Lorenzo Bianconi [Tue, 23 Aug 2022 13:22:23 +0000 (15:22 +0200)]
wifi: mac80211: always free sta in __sta_info_alloc in case of error

Free sta pointer in __sta_info_alloc routine if sta_info_alloc_link()
fails.

Fixes: 246b39e4a1ba5 ("wifi: mac80211: refactor some sta_info link handling")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/a3d079208684cddbc25289f7f7e0fed795b0cad4.1661260857.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agowifi: mac80211: Fix UAF in ieee80211_scan_rx()
Siddh Raman Pant [Fri, 19 Aug 2022 20:03:40 +0000 (01:33 +0530)]
wifi: mac80211: Fix UAF in ieee80211_scan_rx()

ieee80211_scan_rx() tries to access scan_req->flags after a
null check, but a UAF is observed when the scan is completed
and __ieee80211_scan_completed() executes, which then calls
cfg80211_scan_done() leading to the freeing of scan_req.

Since scan_req is rcu_dereference()'d, prevent the racing in
__ieee80211_scan_completed() by ensuring that from mac80211's
POV it is no longer accessed from an RCU read critical section
before we call cfg80211_scan_done().

Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?extid=f9acff9bf08a845f225d
Reported-by: syzbot+f9acff9bf08a845f225d@syzkaller.appspotmail.com
Suggested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Link: https://lore.kernel.org/r/20220819200340.34826-1-code@siddh.me
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
22 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Jakub Kicinski [Thu, 25 Aug 2022 02:18:09 +0000 (19:18 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
  netfilter: flowtable: fix stuck flows on cleanup due to pending work
  netfilter: flowtable: add function to invoke garbage collection immediately
  netfilter: nf_tables: disallow binding to already bound chain
  netfilter: nft_tunnel: restrict it to netdev family
  netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
  netfilter: nf_tables: do not leave chain stats enabled on error
  netfilter: nft_payload: do not truncate csum_offset and csum_type
  netfilter: nft_payload: report ERANGE for too long offset and length
  netfilter: nf_tables: make table handle allocation per-netns friendly
  netfilter: nf_tables: disallow updates of implicit chain
  netfilter: nft_tproxy: restrict to prerouting hook
  netfilter: conntrack: work around exceeded receive window
  netfilter: ebtables: reject blobs that don't provide all entry points
====================

Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMAINTAINERS: rectify file entry in BONDING DRIVER
Lukas Bulwahn [Wed, 24 Aug 2022 07:29:45 +0000 (09:29 +0200)]
MAINTAINERS: rectify file entry in BONDING DRIVER

Commit c078290a2b76 ("selftests: include bonding tests into the kselftest
infra") adds the bonding tests in the directory:

  tools/testing/selftests/drivers/net/bonding/

The file entry in MAINTAINERS for the BONDING DRIVER however refers to:

  tools/testing/selftests/net/bonding/

Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a
broken file pattern.

Repair this file entry in BONDING DRIVER.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20220824072945.28606-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoi40e: Fix incorrect address type for IPv6 flow rules
Sylwester Dziedziuch [Fri, 19 Aug 2022 10:45:52 +0000 (12:45 +0200)]
i40e: Fix incorrect address type for IPv6 flow rules

It was not possible to create 1-tuple flow director
rule for IPv6 flow type. It was caused by incorrectly
checking for source IP address when validating user provided
destination IP address.

Fix this by changing ip6src to correct ip6dst address
in destination IP address validation for IPv6 flow type.

Fixes: efca91e89b67 ("i40e: Add flow director support for IPv6")
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
22 months agoixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
Jacob Keller [Tue, 2 Aug 2022 00:24:19 +0000 (17:24 -0700)]
ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter

The ixgbe_ptp_start_cyclecounter is intended to be called whenever the
cyclecounter parameters need to be changed.

Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x
devices"), this function has cleared the SYSTIME registers and reset the
TSAUXC DISABLE_SYSTIME bit.

While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear
them during ixgbe_ptp_start_cyclecounter. This function may be called
during both reset and link status change. When link changes, the SYSTIME
counter is still operating normally, but the cyclecounter should be updated
to account for the possibly changed parameters.

Clearing SYSTIME when link changes causes the timecounter to jump because
the cycle counter now reads zero.

Extract the SYSTIME initialization out to a new function and call this
during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids
an unnecessary reset of the current time.

This also restores the original SYSTIME clearing that occurred during
ixgbe_ptp_reset before the commit above.

Reported-by: Steve Payne <spayne@aurora.tech>
Reported-by: Ilya Evenbach <ievenbach@aurora.tech>
Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
22 months agoMerge tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Wed, 24 Aug 2022 17:43:34 +0000 (10:43 -0700)]
Merge tag 'trace-v6.0-rc2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:

 - Fix build warning for when MODULES and FTRACE_WITH_DIRECT_CALLS are
   not set. A warning happens with ops_references_rec() defined but not
   used.

* tag 'trace-v6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix build warning for ops_references_rec() not used

22 months agoMerge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvar...
Linus Torvalds [Wed, 24 Aug 2022 17:19:20 +0000 (10:19 -0700)]
Merge branch 'dmi-for-linus' of git://git./linux/kernel/git/jdelvare/staging

Pull dmi update from Jean Delvare.

Tiny cleanup.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  firmware: dmi: Use the proper accessor for the version field

22 months agoMerge branch 'sysctl-data-races'
David S. Miller [Wed, 24 Aug 2022 12:46:59 +0000 (13:46 +0100)]
Merge branch 'sysctl-data-races'

Kuniyuki Iwashima says:

====================
net: sysctl: Fix data-races around net.core.XXX

This series fixes data-races around all knobs in net_core_table and
netns_core_table except for bpf stuff.

These knobs are skipped:

  - 4 bpf knobs
  - netdev_rss_key: Written only once by net_get_random_once() and
                    read-only knob
  - rps_sock_flow_entries: Protected with sock_flow_mutex
  - flow_limit_cpu_bitmap: Protected with flow_limit_update_mutex
  - flow_limit_table_len: Protected with flow_limit_update_mutex
  - default_qdisc: Protected with qdisc_mod_lock
  - warnings: Unused
  - high_order_alloc_disable: Protected with static_key_mutex
  - skb_defer_max: Already using READ_ONCE()
  - sysctl_txrehash: Already using READ_ONCE()

Note 5th patch fixes net.core.message_cost and net.core.message_burst,
and lib/ratelimit.c does not have an explicit maintainer.

Changes:
  v3:
    * Fix build failures of CONFIG_SYSCTL=n case in 13th & 14th patches

  v2: https://lore.kernel.org/netdev/20220818035227.81567-1-kuniyu@amazon.com/
    * Remove 4 bpf knobs and added 6 knobs

  v1: https://lore.kernel.org/netdev/20220816052347.70042-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around sysctl_somaxconn.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:47:00 +0000 (10:47 -0700)]
net: Fix a data-race around sysctl_somaxconn.

While reading sysctl_somaxconn, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around netdev_unregister_timeout_secs.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:59 +0000 (10:46 -0700)]
net: Fix a data-race around netdev_unregister_timeout_secs.

While reading netdev_unregister_timeout_secs, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 5aa3afe107d9 ("net: make unregister netdev warning timeout configurable")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around gro_normal_batch.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:58 +0000 (10:46 -0700)]
net: Fix a data-race around gro_normal_batch.

While reading gro_normal_batch, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around sysctl_devconf_inherit_init_net.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:57 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_devconf_inherit_init_net.

While reading sysctl_devconf_inherit_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around sysctl_fb_tunnels_only_for_init_net.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:56 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_fb_tunnels_only_for_init_net.

While reading sysctl_fb_tunnels_only_for_init_net, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around netdev_budget_usecs.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:55 +0000 (10:46 -0700)]
net: Fix a data-race around netdev_budget_usecs.

While reading netdev_budget_usecs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around sysctl_max_skb_frags.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:54 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_max_skb_frags.

While reading sysctl_max_skb_frags, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around netdev_budget.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:53 +0000 (10:46 -0700)]
net: Fix a data-race around netdev_budget.

While reading netdev_budget, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around sysctl_net_busy_read.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:52 +0000 (10:46 -0700)]
net: Fix a data-race around sysctl_net_busy_read.

While reading sysctl_net_busy_read, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around sysctl_net_busy_poll.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:51 +0000 (10:46 -0700)]
net: Fix a data-race around sysctl_net_busy_poll.

While reading sysctl_net_busy_poll, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 060212928670 ("net: add low latency socket poll")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix a data-race around sysctl_tstamp_allow_data.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:50 +0000 (10:46 -0700)]
net: Fix a data-race around sysctl_tstamp_allow_data.

While reading sysctl_tstamp_allow_data, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around sysctl_optmem_max.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:49 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_optmem_max.

While reading sysctl_optmem_max, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoratelimit: Fix data-races in ___ratelimit().
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:48 +0000 (10:46 -0700)]
ratelimit: Fix data-races in ___ratelimit().

While reading rs->interval and rs->burst, they can be changed
concurrently via sysctl (e.g. net_ratelimit_state).  Thus, we
need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around netdev_tstamp_prequeue.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:47 +0000 (10:46 -0700)]
net: Fix data-races around netdev_tstamp_prequeue.

While reading netdev_tstamp_prequeue, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around netdev_max_backlog.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:46 +0000 (10:46 -0700)]
net: Fix data-races around netdev_max_backlog.

While reading netdev_max_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

While at it, we remove the unnecessary spaces in the doc.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around weight_p and dev_weight_[rt]x_bias.
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:45 +0000 (10:46 -0700)]
net: Fix data-races around weight_p and dev_weight_[rt]x_bias.

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: Fix data-races around sysctl_[rw]mem_(max|default).
Kuniyuki Iwashima [Tue, 23 Aug 2022 17:46:44 +0000 (10:46 -0700)]
net: Fix data-races around sysctl_[rw]mem_(max|default).

While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet/core/skbuff: Check the return value of skb_copy_bits()
lily [Tue, 23 Aug 2022 05:44:11 +0000 (22:44 -0700)]
net/core/skbuff: Check the return value of skb_copy_bits()

skb_copy_bits() could fail, which requires a check on the return
value.

Signed-off-by: Li Zhong <floridsleeves@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
David S. Miller [Wed, 24 Aug 2022 11:51:50 +0000 (12:51 +0100)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec

Steffen Klassert says:

====================
pull request (net): ipsec 2022-08-24

1) Fix a refcount leak in __xfrm_policy_check.
   From Xin Xiong.

2) Revert "xfrm: update SA curlft.use_time". This
   violates RFC 2367. From Antony Antony.

3) Fix a comment on XFRMA_LASTUSED.
   From Antony Antony.

4) x->lastused is not cloned in xfrm_do_migrate.
   Fix from Antony Antony.

5) Serialize the calls to xfrm_probe_algs.
   From Herbert Xu.

6) Fix a null pointer dereference of dst->dev on a metadata
   dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agofec: Restart PPS after link state change
Csókás Bence [Mon, 22 Aug 2022 08:10:52 +0000 (10:10 +0200)]
fec: Restart PPS after link state change

On link state change, the controller gets reset,
causing PPS to drop out and the PHC to lose its
time and calibration. So we restart it if needed,
restoring calibration and time registers.

Changes since v2:
* Add `fec_ptp_save_state()`/`fec_ptp_restore_state()`
* Use `ktime_get_real_ns()`
* Use `BIT()` macro
Changes since v1:
* More ECR #define's
* Stop PPS in `fec_ptp_stop()`

Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonet: neigh: don't call kfree_skb() under spin_lock_irqsave()
Yang Yingliang [Mon, 22 Aug 2022 02:53:46 +0000 (10:53 +0800)]
net: neigh: don't call kfree_skb() under spin_lock_irqsave()

It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So add all skb to
a tmp list, then free them after spin_unlock_irqrestore() at
once.

Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop")
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
22 months agonetfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
Eric Dumazet [Tue, 23 Aug 2022 23:38:48 +0000 (16:38 -0700)]
netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases

Currently, net.netfilter.nf_conntrack_frag6_high_thresh can only be lowered.

I found this issue while investigating a probable kernel issue
causing flakes in tools/testing/selftests/net/ip_defrag.sh

In particular, these sysctl changes were ignored:
ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_high_thresh=9000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.netfilter.nf_conntrack_frag6_low_thresh=7000000  >/dev/null 2>&1

This change is inline with commit 836196239298 ("net/ipfrag: let ip[6]frag_high_thresh
in ns be higher than in init_net")

Fixes: 8db3d41569bb ("netfilter: nf_defrag_ipv6: use net_generic infra")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: flowtable: fix stuck flows on cleanup due to pending work
Pablo Neira Ayuso [Thu, 18 Nov 2021 21:24:15 +0000 (22:24 +0100)]
netfilter: flowtable: fix stuck flows on cleanup due to pending work

To clear the flow table on flow table free, the following sequence
normally happens in order:

  1) gc_step work is stopped to disable any further stats/del requests.
  2) All flow table entries are set to teardown state.
  3) Run gc_step which will queue HW del work for each flow table entry.
  4) Waiting for the above del work to finish (flush).
  5) Run gc_step again, deleting all entries from the flow table.
  6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.

To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214]  dump_stack+0xbb/0x107
[47773.890818]  print_address_description.constprop.0+0x18/0x140
[47773.892990]  kasan_report.cold+0x7c/0xd8
[47773.894459]  kasan_check_range+0x145/0x1a0
[47773.895174]  down_read+0x99/0x460
[47773.899706]  nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137]  flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372]  process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031]  kasan_save_stack+0x1b/0x40
[47773.922730]  __kasan_kmalloc+0x7a/0x90
[47773.923411]  tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363]  tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207]  tcf_action_init_1+0x45b/0x700
[47773.925987]  tcf_action_init+0x453/0x6b0
[47773.926692]  tcf_exts_validate+0x3d0/0x600
[47773.927419]  fl_change+0x757/0x4a51 [cls_flower]
[47773.928227]  tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303]  kasan_save_stack+0x1b/0x40
[47773.938039]  kasan_set_track+0x1c/0x30
[47773.938731]  kasan_set_free_info+0x20/0x30
[47773.939467]  __kasan_slab_free+0xe7/0x120
[47773.940194]  slab_free_freelist_hook+0x86/0x190
[47773.941038]  kfree+0xce/0x3a0
[47773.941644]  tcf_ct_flow_table_cleanup_work

Original patch description and stack trace by Paul Blakey.

Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <paulb@nvidia.com>
Tested-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: flowtable: add function to invoke garbage collection immediately
Pablo Neira Ayuso [Mon, 22 Aug 2022 21:13:00 +0000 (23:13 +0200)]
netfilter: flowtable: add function to invoke garbage collection immediately

Expose nf_flow_table_gc_run() to force a garbage collector run from the
offload infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nf_tables: disallow binding to already bound chain
Pablo Neira Ayuso [Mon, 22 Aug 2022 09:06:39 +0000 (11:06 +0200)]
netfilter: nf_tables: disallow binding to already bound chain

Update nft_data_init() to report EINVAL if chain is already bound.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Gwangun Jung <exsociety@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nft_tunnel: restrict it to netdev family
Pablo Neira Ayuso [Sun, 21 Aug 2022 14:32:44 +0000 (16:32 +0200)]
netfilter: nft_tunnel: restrict it to netdev family

Only allow to use this expression from NFPROTO_NETDEV family.

Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
Pablo Neira Ayuso [Sun, 21 Aug 2022 14:25:07 +0000 (16:25 +0200)]
netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families

As it was originally intended, restrict extension to supported families.

Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nf_tables: do not leave chain stats enabled on error
Pablo Neira Ayuso [Sun, 21 Aug 2022 10:41:33 +0000 (12:41 +0200)]
netfilter: nf_tables: do not leave chain stats enabled on error

Error might occur later in the nf_tables_addchain() codepath, enable
static key only after transaction has been created.

Fixes: 9f08ea848117 ("netfilter: nf_tables: keep chain counters away from hot path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nft_payload: do not truncate csum_offset and csum_type
Pablo Neira Ayuso [Sun, 21 Aug 2022 09:55:19 +0000 (11:55 +0200)]
netfilter: nft_payload: do not truncate csum_offset and csum_type

Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type
is not support.

Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nft_payload: report ERANGE for too long offset and length
Pablo Neira Ayuso [Sun, 21 Aug 2022 09:47:04 +0000 (11:47 +0200)]
netfilter: nft_payload: report ERANGE for too long offset and length

Instead of offset and length are truncation to u8, report ERANGE.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nf_tables: make table handle allocation per-netns friendly
Pablo Neira Ayuso [Sun, 21 Aug 2022 08:52:48 +0000 (10:52 +0200)]
netfilter: nf_tables: make table handle allocation per-netns friendly

mutex is per-netns, move table_netns to the pernet area.

*read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
 nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
 nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921

Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agonetfilter: nf_tables: disallow updates of implicit chain
Pablo Neira Ayuso [Sun, 21 Aug 2022 08:28:25 +0000 (10:28 +0200)]
netfilter: nf_tables: disallow updates of implicit chain

Updates on existing implicit chain make no sense, disallow this.

Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
22 months agoMerge tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Wed, 24 Aug 2022 02:33:28 +0000 (19:33 -0700)]
Merge tag 'cgroup-for-6.0-rc2-fixes' of git://git./linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:

 - The psi data structure was changed to be allocated dynamically but
   it wasn't being cleared leading to it reporting garbage values and
   triggering spurious oom kills.

 - A deadlock involving cpuset and cpu hotplug.

 - When a controller is moved across cgroup hierarchies,
   css->rstat_css_node didn't get RCU drained properly from the previous
   list.

* tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Fix race condition at rebind_subsystems()
  cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
  sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS
  sched/psi: Remove unused parameter nbytes of psi_trigger_create()
  sched/psi: Zero the memory of struct psi_group

22 months agoMerge tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoor...
Linus Torvalds [Wed, 24 Aug 2022 02:26:48 +0000 (19:26 -0700)]
Merge tag 'audit-pr-20220823' of git://git./linux/kernel/git/pcmoore/audit

Pull audit fix from Paul Moore:
 "A single fix for a potential double-free on a fsnotify error path"

* tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: fix potential double free on error path from fsnotify_add_inode_mark

22 months agoMerge tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs...
Linus Torvalds [Wed, 24 Aug 2022 02:17:26 +0000 (19:17 -0700)]
Merge tag 'fs.fixes.v6.0-rc3' of git://git./linux/kernel/git/vfs/idmapping

Pull file_remove_privs() fix from Christian Brauner:
 "As part of Stefan's and Jens' work to add async buffered write
  support to xfs we refactored file_remove_privs() and added
  __file_remove_privs() to avoid calling __remove_privs() when
  IOCB_NOWAIT is passed.

  While debugging a recent performance regression report I found that
  during review we missed that commit faf99b563558 ("fs: add
  __remove_file_privs() with flags parameter") accidently changed
  behavior when dentry_needs_remove_privs() returns zero.

  Before the commit it would still call inode_has_no_xattr() setting
  the S_NOSEC bit and thereby avoiding even calling into
  dentry_needs_remove_privs() the next time this function is called.
  After that commit inode_has_no_xattr() would only be called if
  __remove_privs() had to be called.

  Restore the old behavior. This is likely the cause of the performance
  regression"

* tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fs: __file_remove_privs(): restore call to inode_has_no_xattr()

22 months agoMerge tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Wed, 24 Aug 2022 00:50:26 +0000 (17:50 -0700)]
Merge tag 'mlx5-fixes-2022-08-22' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2022-08-22

This series provides bug fixes to mlx5 driver.

* tag 'mlx5-fixes-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Unlock on error in mlx5_sriov_enable()
  net/mlx5e: Fix use after free in mlx5e_fs_init()
  net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup()
  net/mlx5: unlock on error path in esw_vfs_changed_event_handler()
  net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off
  net/mlx5e: TC, Add missing policer validation
  net/mlx5e: Fix wrong application of the LRO state
  net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
  net/mlx5: Fix cmd error logging for manage pages cmd
  net/mlx5: Disable irq when locking lag_lock
  net/mlx5: Eswitch, Fix forwarding decision to uplink
  net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY
  net/mlx5e: Properly disable vlan strip on non-UL reps
====================

Link: https://lore.kernel.org/r/20220822195917.216025-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
Jakub Kicinski [Wed, 24 Aug 2022 00:49:02 +0000 (17:49 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
ice: xsk: reduced queue count fixes

Maciej Fijalkowski says:

this small series is supposed to fix the issues around AF_XDP usage with
reduced queue count on interface. Due to the XDP rings setup, some
configurations can result in sockets not seeing traffic flowing. More
about this in description of patch 2.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  ice: xsk: use Rx ring's XDP ring when picking NAPI context
  ice: xsk: prohibit usage of non-balanced queue id
====================

Link: https://lore.kernel.org/r/20220822163257.2382487-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge branch 'bnxt_en-bug-fixes'
Jakub Kicinski [Tue, 23 Aug 2022 22:32:24 +0000 (15:32 -0700)]
Merge branch 'bnxt_en-bug-fixes'

Michael Chan says:

====================
bnxt_en: Bug fixes

This series includes 2 fixes for regressions introduced by the XDP
multi-buffer feature, 1 devlink reload bug fix, and 1 SRIOV resource
accounting bug fix.
====================

Link: https://lore.kernel.org/r/1661180814-19350-1-git-send-email-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agobnxt_en: fix LRO/GRO_HW features in ndo_fix_features callback
Vikas Gupta [Mon, 22 Aug 2022 15:06:54 +0000 (11:06 -0400)]
bnxt_en: fix LRO/GRO_HW features in ndo_fix_features callback

LRO/GRO_HW should be disabled if there is an attached XDP program.
BNXT_FLAG_TPA is the current setting of the LRO/GRO_HW.  Using
BNXT_FLAG_TPA to disable LRO/GRO_HW will cause these features to be
permanently disabled once they are disabled.

Fixes: 1dc4c557bfed ("bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agobnxt_en: fix NQ resource accounting during vf creation on 57500 chips
Vikas Gupta [Mon, 22 Aug 2022 15:06:53 +0000 (11:06 -0400)]
bnxt_en: fix NQ resource accounting during vf creation on 57500 chips

There are 2 issues:

1. We should decrement hw_resc->max_nqs instead of hw_resc->max_irqs
   with the number of NQs assigned to the VFs.  The IRQs are fixed
   on each function and cannot be re-assigned.  Only the NQs are being
   assigned to the VFs.

2. vf_msix is the total number of NQs to be assigned to the VFs.  So
   we should decrement vf_msix from hw_resc->max_nqs.

Fixes: b16b68918674 ("bnxt_en: Add SR-IOV support for 57500 chips.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agobnxt_en: set missing reload flag in devlink features
Vikas Gupta [Mon, 22 Aug 2022 15:06:52 +0000 (11:06 -0400)]
bnxt_en: set missing reload flag in devlink features

Add missing devlink_set_features() API for callbacks reload_down
and reload_up to function.

Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agobnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in use
Pavan Chebbi [Mon, 22 Aug 2022 15:06:51 +0000 (11:06 -0400)]
bnxt_en: Use PAGE_SIZE to init buffer when multi buffer XDP is not in use

Using BNXT_PAGE_MODE_BUF_SIZE + offset as buffer length value is not
sufficient when running single buffer XDP programs doing redirect
operations. The stack will complain on missing skb tail room. Fix it
by using PAGE_SIZE when calling xdp_init_buff() for single buffer
programs.

Fixes: b231c3f3414c ("bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: dsa: microchip: make learning configurable and keep it off while standalone
Vladimir Oltean [Thu, 18 Aug 2022 16:48:09 +0000 (19:48 +0300)]
net: dsa: microchip: make learning configurable and keep it off while standalone

Address learning should initially be turned off by the driver for port
operation in standalone mode, then the DSA core handles changes to it
via ds->ops->port_bridge_flags().

Leaving address learning enabled while ports are standalone breaks any
kind of communication which involves port B receiving what port A has
sent. Notably it breaks the ksz9477 driver used with a (non offloaded,
ports act as if standalone) bonding interface in active-backup mode,
when the ports are connected together through external switches, for
redundancy purposes.

This fixes a major design flaw in the ksz9477 and ksz8795 drivers, which
unconditionally leave address learning enabled even while ports operate
as standalone.

Fixes: b987e98e50ab ("dsa: add DSA switch driver for Microchip KSZ9477")
Link: https://lore.kernel.org/netdev/CAFZh4h-JVWt80CrQWkFji7tZJahMfOToUJQgKS5s0_=9zzpvYQ@mail.gmail.com/
Reported-by: Brian Hutchinson <b.hutchman@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220818164809.3198039-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agoMerge tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Tue, 23 Aug 2022 20:42:31 +0000 (13:42 -0700)]
Merge tag 'parisc-for-6.0-2' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "Some interesting background to the current patchset:

  It turned out that the fldw instruction (which loads a 32-bit word
  from memory into one half of a FP register) failed on unaligned
  addresses and even trashed some other random FP register instead. It's
  a trivial one-liner fix in the exception handler but this failure
  dates back to the very beginnings of the parisc-port. It's strange
  that it was never noticed before.

  Another patch fixes an annoyance noticed by Randy Dunlap. Running
  "make ARCH=parisc64 randconfig" always returned a 32-bit config,
  although one would expect a 64-bit config. Masahiro Yamada suggested
  to mimik sparc Kconfig code, which fixed the issue nicely. This
  allowed to drop some compiler build checks too.

  Third, it's possible to build an optimized 32-bit kernel for PA8X00
  (64-bit) CPUs, which then wouldn't start on 32-bit-only (PA1.x)
  machines. I've added a bootup check which prevents that and which
  prints a message to the console. This can be tested with qemu, which
  currently only supports 32-bit emulation.

  The other patches are usual clean-up stuff like added return value
  checks and typo fixes in comments.

  Summary:

   - Fix emulation of fldw instruction on unaligned addresses

   - Fix "make ARCH=parisc64 randconfig" to return a 64-bit config

   - Prevent boot if trying to boot a 32-bit kernel compiled for PA8X00
     CPUs on 32-bit only machines

   - ccio-dma: Handle kmalloc failure in ccio_init_resources()"

* tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
  parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
  parisc: led: Move from strlcpy with unused retval to strscpy
  parisc: ccio-dma: Fix typo in comment
  Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
  parisc: Make CONFIG_64BIT available for ARCH=parisc64 only
  parisc: Fix exception handler for fldw and fstw instructions

22 months agoMerge tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Tue, 23 Aug 2022 20:33:08 +0000 (13:33 -0700)]
Merge tag 'mm-hotfixes-stable-2022-08-22' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Thirteen fixes, almost all for MM.

  Seven of these are cc:stable and the remainder fix up the changes
  which went into this -rc cycle"

* tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  kprobes: don't call disarm_kprobe() for disabled kprobes
  mm/shmem: shmem_replace_page() remember NR_SHMEM
  mm/shmem: tmpfs fallocate use file_modified()
  mm/shmem: fix chattr fsflags support in tmpfs
  mm/hugetlb: support write-faults in shared mappings
  mm/hugetlb: fix hugetlb not supporting softdirty tracking
  mm/uffd: reset write protection when unregister with wp-mode
  mm/smaps: don't access young/dirty bit if pte unpresent
  mm: add DEVICE_ZONE to FOR_ALL_ZONES
  kernel/sys_ni: add compat entry for fadvise64_64
  mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
  Revert "zram: remove double compression logic"
  get_maintainer: add Alan to .get_maintainer.ignore

22 months agoMerge tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 23 Aug 2022 20:23:07 +0000 (13:23 -0700)]
Merge tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull KUnit fixes from Shuah Khan:
 "Fix for a mmc test and to load .kunit_test_suites section when
  CONFIG_KUNIT=m, and not just when KUnit is built-in"

* tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m
  mmc: sdhci-of-aspeed: test: Fix dependencies when KUNIT=m

22 months agoMerge tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Tue, 23 Aug 2022 20:13:36 +0000 (13:13 -0700)]
Merge tag 'linux-kselftest-fixes-6.0-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:
 "Fixes to vm and sgx test builds"

* tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/vm: fix inability to build any vm tests
  selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning

22 months agonetfilter: nft_tproxy: restrict to prerouting hook
Florian Westphal [Sat, 20 Aug 2022 15:54:06 +0000 (17:54 +0200)]
netfilter: nft_tproxy: restrict to prerouting hook

TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this.
This fixes a crash (null dereference) when using tproxy from e.g. output.

Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Reported-by: Shell Chen <xierch@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months agocgroup: Fix race condition at rebind_subsystems()
Jing-Ting Wu [Tue, 23 Aug 2022 05:41:46 +0000 (13:41 +0800)]
cgroup: Fix race condition at rebind_subsystems()

Root cause:
The rebind_subsystems() is no lock held when move css object from A
list to B list,then let B's head be treated as css node at
list_for_each_entry_rcu().

Solution:
Add grace period before invalidating the removed rstat_css_node.

Reported-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
Tested-by: Jing-Ting Wu <jing-ting.wu@mediatek.com>
Link: https://lore.kernel.org/linux-arm-kernel/d8f0bc5e2fb6ed259f9334c83279b4c011283c41.camel@mediatek.com/T/
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Fixes: a7df69b81aac ("cgroup: rstat: support cgroup1")
Cc: stable@vger.kernel.org # v5.13+
Signed-off-by: Tejun Heo <tj@kernel.org>
22 months agonetfilter: conntrack: work around exceeded receive window
Florian Westphal [Thu, 18 Aug 2022 22:42:31 +0000 (00:42 +0200)]
netfilter: conntrack: work around exceeded receive window

When a TCP sends more bytes than allowed by the receive window, all future
packets can be marked as invalid.
This can clog up the conntrack table because of 5-day default timeout.

Sequence of packets:
 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1]
 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8]
 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0
 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68

Window is 5840 starting from 33212 -> 39052.

 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0
 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318

This is fine, conntrack will flag the connection as having outstanding
data (UNACKED), which lowers the conntrack timeout to 300s.

 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318

Packet 10 is already sending more than permitted, but conntrack doesn't
validate this (only seq is tested vs. maxend, not 'seq+len').

38484 is acceptable, but only up to 39052, so this packet should
not have been sent (or only 568 bytes, not 1318).

At this point, connection is still in '300s' mode.

Next packet however will get flagged:
 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326

nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH ..

Now, a couple of replies/acks comes in:

 12 initiator > responder: [.], ack 34530, win 4368,
[.. irrelevant acks removed ]
 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0

This ack is significant -- this acks the last packet send by the
responder that conntrack considered valid.

This means that ack == td_end.  This will withdraw the
'unacked data' flag, the connection moves back to the 5-day timeout
of established conntracks.

 17 initiator > responder: ack 40128, win 10030, ...

This packet is also flagged as invalid.

Because conntrack only updates state based on packets that are
considered valid, packet 11 'did not exist' and that gets us:

nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG

Because this received and processed by the endpoints, the conntrack entry
remains in a bad state, no packets will ever be considered valid again:

 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, ..
 31 initiator > responder: [.], ack 40433, win 11348, ..
 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 ..

... all trigger 'ACK is over bound' test and we end up with
non-early-evictable 5-day default timeout.

NB: This patch triggers a bunch of checkpatch warnings because of silly
indent.  I will resend the cleanup series linked below to reduce the
indent level once this change has propagated to net-next.

I could route the cleanup via nf but that causes extra backport work for
stable maintainers.

Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months agonetfilter: ebtables: reject blobs that don't provide all entry points
Florian Westphal [Sat, 20 Aug 2022 15:38:37 +0000 (17:38 +0200)]
netfilter: ebtables: reject blobs that don't provide all entry points

Harshit Mogalapalli says:
 In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
 can lead to NULL pointer dereference. [..] Kernel panic:

general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
 nf_hook_slow+0xb1/0x170
 __br_forward+0x289/0x730
 maybe_deliver+0x24b/0x380
 br_flood+0xc6/0x390
 br_dev_xmit+0xa2e/0x12c0

For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.

t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets.  Providing an entry point that is not support is harmless
(never called/used), but the inverse isn't: it results in a crash
because the ebtables traverser doesn't expect a NULL blob for a location
its receiving packets for.

Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
22 months agonet: dsa: don't dereference NULL extack in dsa_slave_changeupper()
Vladimir Oltean [Fri, 19 Aug 2022 17:39:25 +0000 (20:39 +0300)]
net: dsa: don't dereference NULL extack in dsa_slave_changeupper()

When a driver returns -EOPNOTSUPP in dsa_port_bridge_join() but failed
to provide a reason for it, DSA attempts to set the extack to say that
software fallback will kick in.

The problem is, when we use brctl and the legacy bridge ioctls, the
extack will be NULL, and DSA dereferences it in the process of setting
it.

Sergei Antonov proves this using the following stack trace:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at dsa_slave_changeupper+0x5c/0x158

 dsa_slave_changeupper from raw_notifier_call_chain+0x38/0x6c
 raw_notifier_call_chain from __netdev_upper_dev_link+0x198/0x3b4
 __netdev_upper_dev_link from netdev_master_upper_dev_link+0x50/0x78
 netdev_master_upper_dev_link from br_add_if+0x430/0x7f4
 br_add_if from br_ioctl_stub+0x170/0x530
 br_ioctl_stub from br_ioctl_call+0x54/0x7c
 br_ioctl_call from dev_ifsioc+0x4e0/0x6bc
 dev_ifsioc from dev_ioctl+0x2f8/0x758
 dev_ioctl from sock_ioctl+0x5f0/0x674
 sock_ioctl from sys_ioctl+0x518/0xe40
 sys_ioctl from ret_fast_syscall+0x0/0x1c

Fix the problem by only overriding the extack if non-NULL.

Fixes: 1c6e8088d9a7 ("net: dsa: allow port_bridge_join() to override extack message")
Link: https://lore.kernel.org/netdev/CABikg9wx7vB5eRDAYtvAm7fprJ09Ta27a4ZazC=NX5K4wn6pWA@mail.gmail.com/
Reported-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Sergei Antonov <saproj@gmail.com>
Link: https://lore.kernel.org/r/20220819173925.3581871-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
22 months agonet: ipvtap - add __init/__exit annotations to module init/exit funcs
Maciej Żenczykowski [Sun, 21 Aug 2022 13:08:08 +0000 (06:08 -0700)]
net: ipvtap - add __init/__exit annotations to module init/exit funcs

Looks to have been left out in an oversight.

Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Sainath Grandhi <sainath.grandhi@intel.com>
Fixes: 235a9d89da97 ('ipvtap: IP-VLAN based tap driver')
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>