linux-block.git
11 months agowifi: iwlwifi: add support for new ini region types
Miri Korenblit [Wed, 4 Oct 2023 09:36:21 +0000 (12:36 +0300)]
wifi: iwlwifi: add support for new ini region types

YoYo introduces 2 new region types: prph mac and phy blocks.
The data in this regions consists of a list of
(base address, size) pairs.
This way we can set a block of consecutive registers by the
base address and the size, instead of a list of registers.
Add support for parsing and dumping these new region types

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20231004123422.0a10320f4259.I680ef6e16267d95329ee239f05d0999f5a1719ac@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: Extract common prph mac/phy regions data dump logic
Miri Korenblit [Wed, 4 Oct 2023 09:36:20 +0000 (12:36 +0300)]
wifi: iwlwifi: Extract common prph mac/phy regions data dump logic

YoYo (debug data collection mechanism) is introducing 2 new types
of regions: prph mac/phy ranges.
These types will have a common logic of reading the data from the
device as the mac/phy prph has. Put it in a separate function so it can
be reused.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20231004123422.16f06414c65c.Ie911bc83a1e2f8fddb27b4c5bd24f933f8b674b6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: nl80211: fix doc typos
Randy Dunlap [Sun, 1 Oct 2023 19:16:33 +0000 (12:16 -0700)]
wifi: nl80211: fix doc typos

Correct some typos.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231001191633.19090-3-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: fix header kernel-doc typos
Randy Dunlap [Sun, 1 Oct 2023 19:16:32 +0000 (12:16 -0700)]
wifi: mac80211: fix header kernel-doc typos

Correct typos and fix run-on sentences.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231001191633.19090-2-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: cfg80211: fix header kernel-doc typos
Randy Dunlap [Sun, 1 Oct 2023 19:16:31 +0000 (12:16 -0700)]
wifi: cfg80211: fix header kernel-doc typos

Correct spelling of several words.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231001191633.19090-1-rdunlap@infradead.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: add link id to mgd_prepare_tx()
Miri Korenblit [Thu, 28 Sep 2023 14:35:34 +0000 (17:35 +0300)]
wifi: mac80211: add link id to mgd_prepare_tx()

As we are moving to MLO and links terms, also the airtime protection
will be done for a link rather than for a vif. Thus, some
drivers will need to know for which link to protect airtime.
Add link id as a parameter to the mgd_prepare_tx() callback.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.c7fc59a6780b.Ic88a5037d31e184a2dce0b031ece1a0a93a3a9da@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: Check if we had first beacon with relevant links
Miri Korenblit [Wed, 4 Oct 2023 09:12:02 +0000 (12:12 +0300)]
wifi: mac80211: Check if we had first beacon with relevant links

If there is a disassoc before the fisrt beacon we need to protect a
session for the deauth frame. Currently we are checking if we had a
beacon in the default link, which is wrong in a MLO connection and
link id != 0.
Fix this by checking all the active links, if none had a beacon then
protect a session.
If at least one link had a beacon there is no need for session
protection.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20231004120820.d290f0ab77b0.Ic1505cf3d60f74580d31efa7e52046947c490b85@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: flush STA queues on unauthorization
Johannes Berg [Thu, 28 Sep 2023 14:35:39 +0000 (17:35 +0300)]
wifi: mac80211: flush STA queues on unauthorization

When the station is marked as no longer authorized, we shouldn't
transmit to it any longer, but in particular we shouldn't be able
to transmit to it after removing keys, which might lead to frames
being sent out unencrypted depending on the exact hardware offload
mechanism. Thus, instead of flushing only on station destruction,
which covers only some cases, always flush on unauthorization.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.d47f528829e7.I96903652c7ee0c5c66891f8b2364383da8e45a1f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: purge TX queues in flush_queues flow
Miri Korenblit [Thu, 28 Sep 2023 14:35:38 +0000 (17:35 +0300)]
wifi: mac80211: purge TX queues in flush_queues flow

When this flow is invoked with the "drop" parameter as true,
we only drop the frames from the hw queues, but not from the
sw queues.
So when we call wake_queues() after hw queue purging, all the
frames from the sw queues will be TX'ed,
when what we actually want to do is to purge all queues
in order to not TX anything...
This can cause, for example, TXing data frames to the peer
after the deauth frame was sent.
Fix this by purging the sw queues in addition to the hw queues
if the drop parameter is true.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.8fc2ee23e56f.I8b3f6def9c28ea96261e2d31df8786986fb5385b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: cfg80211: wext: convert return value to kernel-doc
Johannes Berg [Thu, 28 Sep 2023 14:35:37 +0000 (17:35 +0300)]
wifi: cfg80211: wext: convert return value to kernel-doc

Since I'm getting a warning here right now, fix the
kernel-doc to be "Returns:" rather than just writing
that out in the doc paragraph.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.ab3b9274bf07.If263f9f6726d6ad4661f8603e6a4485e0385d67f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: fix a expired vs. cancel race in roc
Emmanuel Grumbach [Thu, 28 Sep 2023 14:35:36 +0000 (17:35 +0300)]
wifi: mac80211: fix a expired vs. cancel race in roc

When the remain on channel is removed at the time it should
have expired, we have a race: the driver could be handling
the flow of the expiration while mac80211 is cancelling
that very same remain on channel request.

This wouldn't be problem in itself, but since mac80211
can send the next request to the driver in the cancellation
flow, we can get to the following situation:

           CPU0                             CPU1
expiration of roc in driver
ieee80211_remain_on_channel_expired()
                                         Cancellation of the roc
schedules a worker (hw_roc_done)
                                         Add next roc
hw_roc_done_wk runs and ends
the second roc prematurely.

Since, by design, there is only one single request sent to the
driver at a time, we can safely assume that after the cancel()
request returns from the driver, we should not handle any worker
that handles the expiration of the request.

Cancel the hw_roc_done worker after the cancellation to make
sure we start the next one with a clean slate.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.4e4469be20ac.Iab0525f5cc4698acf23eab98b8b1eec02099cde0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: make mgd_protect_tdls_discover MLO-aware
Miri Korenblit [Thu, 28 Sep 2023 14:35:35 +0000 (17:35 +0300)]
wifi: mac80211: make mgd_protect_tdls_discover MLO-aware

Since userspace can choose now what link to establish the
TDLS on, we should know on what channel to do session protection.
Add a link id parameter to this callback.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.ef12ce3eb835.If864f406cfd9e24f36a2b88fd13a37328633fcf9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: cfg80211: Fix typo in documentation
Ilan Peer [Thu, 28 Sep 2023 14:35:32 +0000 (17:35 +0300)]
wifi: cfg80211: Fix typo in documentation

Fix a small typo in a comment.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.9dce226e393f.I929bfb9371e31c9e8d2bb1c1a96e9b1f3d02f2d0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: Fix setting vif links
Ilan Peer [Thu, 28 Sep 2023 14:35:31 +0000 (17:35 +0300)]
wifi: mac80211: Fix setting vif links

When setting the interface links, ignore the change iff both the
valid links and the dormant links did not change. This is needed
to support cases where the valid links didn't change but the dormant
links did.

Fixes: 6d543b34dbcf ("wifi: mac80211: Support disabled links during association")
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.0357b6306587.I7dbfec347949b629fea680d246a650d6207ff217@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: cfg80211: Handle specific BSSID in 6GHz scanning
Ilan Peer [Thu, 28 Sep 2023 14:35:30 +0000 (17:35 +0300)]
wifi: cfg80211: Handle specific BSSID in 6GHz scanning

When the scan parameters for a 6GHz scan specify a unicast
BSSID address, and the corresponding AP is found in the scan
list, add a corresponding entry in the collocated AP list,
so this AP would be directly probed even if it was not
advertised as a collocated AP.

This is needed for handling a scan request that is intended
for a ML probe flow, where user space can requests a scan
to retrieve information for other links in the AP MLD.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.54b954bc02ad.I1c072793d3d77a4c8fbbc64b4db5cce1bbb00382@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: mesh: fix some kdoc warnings
Benjamin Berg [Thu, 28 Sep 2023 14:35:29 +0000 (17:35 +0300)]
wifi: mac80211: mesh: fix some kdoc warnings

These were mostly missing or incorrectly tagged return values.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.33fea2968c62.I41d197b570370ab7cad1405518512fdd36e08717@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: cfg80211: Include operating class 137 in 6GHz band
Ilan Peer [Thu, 28 Sep 2023 14:35:28 +0000 (17:35 +0300)]
wifi: cfg80211: Include operating class 137 in 6GHz band

Draft P802.11be_D3.1 added operating class to describe 320 MHz
operation in the 6GHz band. Include this new operating class in
ieee80211_operating_class_to_band().

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.bed4a007d81b.I3eb4b8fe39c0c1a988c98a103b11a9f45a92b038@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: Rename and update IEEE80211_VIF_DISABLE_SMPS_OVERRIDE
Ilan Peer [Thu, 28 Sep 2023 14:35:27 +0000 (17:35 +0300)]
wifi: mac80211: Rename and update IEEE80211_VIF_DISABLE_SMPS_OVERRIDE

EMLSR operation and SMPS operation cannot coexist. Thus, when EMLSR is
enabled, all SMPS signaling towards the AP should be stopped (it is
expected that the AP will consider SMPS to be off).

Rename IEEE80211_VIF_DISABLE_SMPS_OVERRIDE to IEEE80211_VIF_EML_ACTIVE
and use the flag as an indication from the driver that EMLSR is enabled.
When EMLSR is enabled SMPS flows towards the AP MLD should be stopped.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.fb2c2f9a0645.If6df5357568abd623a081f0f33b07e63fb8bba99@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: handle debugfs when switching to/from MLO
Miri Korenblit [Thu, 28 Sep 2023 14:35:26 +0000 (17:35 +0300)]
wifi: mac80211: handle debugfs when switching to/from MLO

In MLO, we have a per-link debugfs directory which contains the
per-link files. In case of non-MLO we would like to put the per-link
files in the netdev directory to keep it how it was before MLO.

- Upon interface creation the netdev will be created with the per-link
  files in it.
- Upon switching to MLO: delete the entire netdev directory and then
  recreate it without the per-link files. Then the per-link directories
  with the per-link files in it will be created in ieee80211_link_init()
- Upon switching to non-MLO: delete the entire netdev directory
  (including the per-link directories) and recreate it with the per-link
  files in it.

Note that this also aligns to always call the vif link debugfs
method for the deflink as promised in the documentation, which
wasn't done before.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.082e698caca9.I5bef7b2026e0f58b4a958b3d1f459ac5baeccfc9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: add a driver callback to add vif debugfs
Miri Korenblit [Thu, 28 Sep 2023 14:35:25 +0000 (17:35 +0300)]
wifi: mac80211: add a driver callback to add vif debugfs

Add a callback which the driver can use to add the vif debugfs.
We used to have this back until commit d260ff12e776 ("mac80211:
remove vif debugfs driver callbacks") where we thought that it
will be easier to just add them during interface add/remove.

However, now with multi-link, we want to have proper debugfs
for drivers for multi-link where some files might be in the
netdev for non-MLO connections, and in the links for MLO ones,
so we need to do some reconstruction when switching the mode.

Moving to this new call enables that and MLO drivers will have
to use it for proper debugfs operation.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.ac38913f6ab7.Iee731d746bb08fcc628fa776f337016a12dc62ac@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: don't recreate driver link debugfs in reconfig
Johannes Berg [Thu, 28 Sep 2023 14:35:24 +0000 (17:35 +0300)]
wifi: mac80211: don't recreate driver link debugfs in reconfig

We can delete any that we want to remove, but we can't
recreate the links as they already exist.

Fixes: 170cd6a66d9a ("wifi: mac80211: add netdev per-link debugfs data and driver hook")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.3d0214838421.I512a0ff86f631ff42bf25ea0cb2e8e8616794a94@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: mac80211: cleanup auth_data only if association continues
Benjamin Berg [Thu, 28 Sep 2023 14:35:22 +0000 (17:35 +0300)]
wifi: mac80211: cleanup auth_data only if association continues

If the association command fails then the authentication is still valid
and it makes sense to keep it alive. Otherwise, we would currently get
into an inconsistent state because mac80211 on the one hand is
disconnected but on the other hand the state is not entirely cleared
and a new authentication could not continue.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230928172905.c9855f46ebc8.I7f3dcd4120a186484a91b87560e9b7201d40984f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: bump FW API to 84 for AX/BZ/SC devices
Gregory Greenman [Tue, 26 Sep 2023 08:07:21 +0000 (11:07 +0300)]
wifi: iwlwifi: bump FW API to 84 for AX/BZ/SC devices

Start supporting API version 84 for new devices.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.eae20f9fdc06.Ifa9be6482121ea6df364bddc96ea6a7d101366b6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: mvm: use correct sta ID for IGTK/BIGTK
Johannes Berg [Tue, 26 Sep 2023 08:07:20 +0000 (11:07 +0300)]
wifi: iwlwifi: mvm: use correct sta ID for IGTK/BIGTK

We don't (yet) send the IGTK down to the firmware, but when
we do it needs to be with the broadcast station ID, not the
multicast station ID. Same for the BIGTK, which we may send
already if firmware advertises it (but it doesn't yet.)

Fixes: a5de7de7e78e ("wifi: iwlwifi: mvm: enable TX beacon protection")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.dbc653913353.I82e90c86010f0b9588a180d9835fd11f666f5196@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: mvm: offload IGTK in AP if BIGTK is supported
Johannes Berg [Tue, 26 Sep 2023 08:07:19 +0000 (11:07 +0300)]
wifi: iwlwifi: mvm: offload IGTK in AP if BIGTK is supported

We can't really know easily if a BIGTK will be used, but
in case firmware supports BIGTK it also supports the very
easy IGTK use (nothing to do on the host), and requires
that we program both IGTK and BIGTK to be able to use the
BIGTK. Thus, change the condition here to set the keys in
firmware (both IGTK/BIGTK) if BIGTK is supported.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.425ebc1ce484.If485ec962636c23d463b678e7da86e11b6fa86c9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: mvm: fix removing pasn station for responder
Avraham Stern [Tue, 26 Sep 2023 08:07:18 +0000 (11:07 +0300)]
wifi: iwlwifi: mvm: fix removing pasn station for responder

In case of MLD operation the station should be removed using the
mld api.

Fixes: fd940de72d49 ("wifi: iwlwifi: mvm: FTM responder MLO support")
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.7eb353abb95c.I2b30be09b99f5a2379956e010bafaa465ff053ba@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: pcie: clean up WFPM control bits
Johannes Berg [Tue, 26 Sep 2023 08:07:17 +0000 (11:07 +0300)]
wifi: iwlwifi: pcie: clean up WFPM control bits

We define the same bit twice, remove the less precise
definition and use the better one instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.82d2744690b3.I90c08a27dca26a181dacb069184f39ece77849b5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: fix opmode start/stop race
Johannes Berg [Tue, 26 Sep 2023 08:07:16 +0000 (11:07 +0300)]
wifi: iwlwifi: fix opmode start/stop race

There's a race when the device is unbound (maybe because the
module is unloaded) while the opmode start hasn't finished yet.
The complete(request_firmware_complete) after the opmode start
was meant (and commented accordingly) to prevent this problem,
but it's not sufficient when the opmode module is loaded after
the firmware load already completed, which happens regularly
now because firmware load doesn't require userspace, unlike
module load.

Fix this by using the existing opmode registration mutex to
protected the start/stop flows against each other properly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.85951554fed8.I62f20f40d79d0f136fa05e46d7fc16dc437fa3db@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: skip opmode start retries on dead transport
Johannes Berg [Tue, 26 Sep 2023 08:07:15 +0000 (11:07 +0300)]
wifi: iwlwifi: skip opmode start retries on dead transport

These retries aren't going to succeed if the device was
deemed dead and needs to be unbound/rebound/... to be
recovered; skip the retries in that case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.9f472069d75d.Ib6684c5b2ea8ed98f082c9b0e9bb2b03c3ea4fe3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: pcie: propagate iwl_pcie_gen2_apm_init() error
Johannes Berg [Tue, 26 Sep 2023 08:07:14 +0000 (11:07 +0300)]
wifi: iwlwifi: pcie: propagate iwl_pcie_gen2_apm_init() error

If iwl_pcie_gen2_apm_init() fails, we should propagate the
error code up, rather than ignoring it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.883768afe77b.Ic47cb8ce0a0abba3b4745cc2a721217c33360d6c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: iwlwifi: mvm: update station's MFP flag after association
Avraham Stern [Tue, 26 Sep 2023 08:07:13 +0000 (11:07 +0300)]
wifi: iwlwifi: mvm: update station's MFP flag after association

The management frames protection flag is always set when the station
is not yet authorized. However, it was not cleared after association
even if the association did not use MFP. As a result, all public
action frames are not parsed by fw (which will cause FTM to fail,
for example). Update the station MFP flag after the station is
authorized.

Fixes: 4c8d5c8d079e ("wifi: iwlwifi: mvm: tell firmware about per-STA MFP enablement")
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230926110319.2488cbd01bde.Ic0f08b7d3efcbdce27ec897f84d740fec8d169ef@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
11 months agowifi: wilc1000: use vmm_table as array in wilc struct
Ajay Singh [Tue, 17 Oct 2023 08:43:38 +0000 (10:43 +0200)]
wifi: wilc1000: use vmm_table as array in wilc struct

Enabling KASAN and running some iperf tests raises some memory issues with
vmm_table:

BUG: KASAN: slab-out-of-bounds in wilc_wlan_handle_txq+0x6ac/0xdb4
Write of size 4 at addr c3a61540 by task wlan0-tx/95

KASAN detects that we are writing data beyond range allocated to vmm_table.
There is indeed a mismatch between the size passed to allocator in
wilc_wlan_init, and the range of possible indexes used later: allocation
size is missing a multiplication by sizeof(u32)

Fixes: 40b717bfcefa ("wifi: wilc1000: fix DMA on stack objects")
Cc: stable@vger.kernel.org
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Signed-off-by: Alexis Lothoré <alexis.lothore@bootlin.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231017-wilc1000_tx_oops-v3-1-b2155f1f7bee@bootlin.com
11 months agowifi: rtlwifi: drop chk_switch_dmdp() from HAL interface
Dmitry Antipov [Mon, 16 Oct 2023 13:59:10 +0000 (16:59 +0300)]
wifi: rtlwifi: drop chk_switch_dmdp() from HAL interface

Since there is no chip-specific code behind 'chk_switch_dmdp()',
there is no need to maintain function pointer in 'struct rtl_hal_ops'
and relevant common code may be simplified. Compile tested only.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016135925.129223-3-dmantipov@yandex.ru
11 months agowifi: rtlwifi: drop fill_fake_txdesc() from HAL interface
Dmitry Antipov [Mon, 16 Oct 2023 13:59:09 +0000 (16:59 +0300)]
wifi: rtlwifi: drop fill_fake_txdesc() from HAL interface

Since 'fill_fake_txdesc()' is actually implemented for rtl8192cu
only but never used, there is no need to maintain function pointer
in 'struct rtl_hal_ops' and 'rtl92cu_fill_fake_txdesc()' may be
dropped. Compile tested only.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016135925.129223-2-dmantipov@yandex.ru
11 months agowifi: rtlwifi: drop pre_fill_tx_bd_desc() from HAL interface
Dmitry Antipov [Mon, 16 Oct 2023 13:59:08 +0000 (16:59 +0300)]
wifi: rtlwifi: drop pre_fill_tx_bd_desc() from HAL interface

Since 'pre_fill_tx_bd_desc()' is actually used for rtl8192ee only,
there is no need to maintain function pointer in 'struct rtl_hal_ops',
and 'rtl92ee_pre_fill_tx_bd_desc()' may be converted to static.
Compile tested only.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016135925.129223-1-dmantipov@yandex.ru
11 months agowifi: rtw89: move software DCFO compensation setting to proper position
Cheng-Chieh Hsieh [Mon, 16 Oct 2023 06:51:15 +0000 (14:51 +0800)]
wifi: rtw89: move software DCFO compensation setting to proper position

We need this register setting only for the software DCFO(digital carrier
frequency offset) compensation so we move it to the proper position to
prevent the incorrect setting.

Signed-off-by: Cheng-Chieh Hsieh <cj.hsieh@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-6-pkshih@realtek.com
11 months agowifi: rtw89: correct the DCFO tracking flow to improve CFO compensation
Cheng-Chieh Hsieh [Mon, 16 Oct 2023 06:51:14 +0000 (14:51 +0800)]
wifi: rtw89: correct the DCFO tracking flow to improve CFO compensation

DCFO tracking compensate the CFO (carrier frequency offset) by digital
hardware that provides fine CFO estimation. Although the avg_cfo which
is a coarse information becomes zero, still we need DCFO tracking to
compensate the residual CFO. However, the original flow skips the case
when avg_cfo is zero, so we fix it to have expected performance.

Signed-off-by: Cheng-Chieh Hsieh <cj.hsieh@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-5-pkshih@realtek.com
11 months agowifi: rtw89: modify the register setting and the flow of CFO tracking
Cheng-Chieh Hsieh [Mon, 16 Oct 2023 06:51:13 +0000 (14:51 +0800)]
wifi: rtw89: modify the register setting and the flow of CFO tracking

The register address used for CFO(carrier frequency offset) tracking is
different from WiFi 7 series, so we change the way to access it. And we
refine the flow of CFO tracking to compatible all WiFi 7 and 6 ICs.

Signed-off-by: Cheng-Chieh Hsieh <cj.hsieh@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-4-pkshih@realtek.com
11 months agowifi: rtw89: phy: generalize valid bit of BSS color
Ping-Ke Shih [Mon, 16 Oct 2023 06:51:12 +0000 (14:51 +0800)]
wifi: rtw89: phy: generalize valid bit of BSS color

The register fields of BSS color map and valid bit are in the same register
for existing chips, but coming WiFi 7 chips define another register to
set valid bit, so add a field to chip_info to reuse the code.

Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-3-pkshih@realtek.com
11 months agowifi: rtw89: phy: change naming related BT coexistence functions
Chung-Hsuan Hung [Mon, 16 Oct 2023 06:51:11 +0000 (14:51 +0800)]
wifi: rtw89: phy: change naming related BT coexistence functions

Change naming to disambiguate the functions because their names are common
and not clear about the purpose. Not change logic at all.

These functions are to control baseband AGC while BT coexists with WiFi.
Among these functions, ctrl_btg_bt_rx is used to control AGC related
settings, which is affected by BT RX, while BT shares the same path
with wifi; ctrl_nbtg_bt_tx is used to control AGC settings under
non-shared path condition, which is affected by BT TX.

Signed-off-by: Chung-Hsuan Hung <hsuan8331@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016065115.751662-2-pkshih@realtek.com
11 months agowifi: rtw88: dump firmware debug information in abnormal state
Chin-Yen Lee [Mon, 16 Oct 2023 05:35:54 +0000 (13:35 +0800)]
wifi: rtw88: dump firmware debug information in abnormal state

Sometimes firmware may enter strange state or infinite
loop due to unknown bug, and then it will lead critical
function fail, such as sending H2C command or changing
power mode. In these abnormal states, we add more debug
information, including hardware register status, to help
further investigation.

Signed-off-by: Chin-Yen Lee <timlee@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016053554.744180-3-pkshih@realtek.com
11 months agowifi: rtw88: debug: add to check if debug mask is enabled
Chin-Yen Lee [Mon, 16 Oct 2023 05:35:53 +0000 (13:35 +0800)]
wifi: rtw88: debug: add to check if debug mask is enabled

The coming dump function for FW malfunction will add a function to
dump registers to reflect status. However, if we are not debugging
the mechanism, we don't print anything, so avoid reading registers by
checking debug mask to reduce IO.

Signed-off-by: Chin-Yen Lee <timlee@realtek.com>
Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231016053554.744180-2-pkshih@realtek.com
11 months agowifi: rtlwifi: cleanup struct rtl_ps_ctl
Dmitry Antipov [Fri, 13 Oct 2023 12:45:31 +0000 (15:45 +0300)]
wifi: rtlwifi: cleanup struct rtl_ps_ctl

Remove set but otherwise unused 'sleep_ms', 'last_action', 'state'
and 'last_slept' members of 'struct rtl_ps_ctl' (these seems to be
a leftovers from some older code) and adjust 'rtl_swlps_wq_callback()'
accordingly. Compile tested only.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231013124534.19714-1-dmantipov@yandex.ru
11 months agossb: relax SSB_EMBEDDED dependencies
Randy Dunlap [Thu, 12 Oct 2023 22:08:55 +0000 (15:08 -0700)]
ssb: relax SSB_EMBEDDED dependencies

This is a kconfig warning in a randconfig when CONFIG_PCI is not set:

WARNING: unmet direct dependencies detected for SSB_EMBEDDED
  Depends on [n]: SSB [=y] && SSB_DRIVER_MIPS [=y] && SSB_PCICORE_HOSTMODE [=n]
  Selected by [y]:
  - BCM47XX_SSB [=y] && BCM47XX [=y]

This is caused by arch/mips/bcm47xx/Kconfig's symbol BCM47XX_SSB
selecting SSB_EMBEDDED when CONFIG_PCI is not set.

This warning can be prevented by altering SSB_EMBEDDED to allow for
PCI=n or the former SSB_PCICORE_HOSTMODE.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Michael Büsch <m@bues.ch>
Cc: linux-wireless@vger.kernel.org
Cc: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20231012220856.23260-1-rdunlap@infradead.org
11 months agoMerge tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Wed, 18 Oct 2023 01:27:26 +0000 (18:27 -0700)]
Merge tag 'mlx5-updates-2023-10-10' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2023-10-10

1) Adham Faris, Increase max supported channels number to 256

2) Leon Romanovsky, Allow IPsec soft/hard limits in bytes

3) Shay Drory, Replace global mlx5_intf_lock with
   HCA devcom component lock

4) Wei Zhang, Optimize SF creation flow

During SF creation, HCA state gets changed from INVALID to
IN_USE step by step. Accordingly, FW sends vhca event to
driver to inform about this state change asynchronously.
Each vhca event is critical because all related SW/FW
operations are triggered by it.

Currently there is only a single mlx5 general event handler
which not only handles vhca event but many other events.
This incurs huge bottleneck because all events are forced
to be handled in serial manner.

Moreover, all SFs share same table_lock which inevitably
impacts each other when they are created in parallel.

This series will solve this issue by:

1. A dedicated vhca event handler is introduced to eliminate
   the mutual impact with other mlx5 events.
2. Max FW threads work queues are employed in the vhca event
   handler to fully utilize FW capability.
3. Redesign SF active work logic to completely remove
   table_lock.

With above optimization, SF creation time is reduced by 25%,
i.e. from 80s to 60s when creating 100 SFs.

Patches summary:

Patch 1 - implement dedicated vhca event handler with max FW
          cmd threads of work queues.
Patch 2 - remove table_lock by redesigning SF active work
          logic.

* tag 'mlx5-updates-2023-10-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Allow IPsec soft/hard limits in bytes
  net/mlx5e: Increase max supported channels number to 256
  net/mlx5e: Preparations for supporting larger number of channels
  net/mlx5e: Refactor mlx5e_rss_init() and mlx5e_rss_free() API's
  net/mlx5e: Refactor mlx5e_rss_set_rxfh() and mlx5e_rss_get_rxfh()
  net/mlx5e: Refactor rx_res_init() and rx_res_free() APIs
  net/mlx5e: Use PTR_ERR_OR_ZERO() to simplify code
  net/mlx5: Use PTR_ERR_OR_ZERO() to simplify code
  net/mlx5: fix config name in Kconfig parameter documentation
  net/mlx5: Remove unused declaration
  net/mlx5: Replace global mlx5_intf_lock with HCA devcom component lock
  net/mlx5: Refactor LAG peer device lookout bus logic to mlx5 devcom
  net/mlx5: Avoid false positive lockdep warning by adding lock_class_key
  net/mlx5: Redesign SF active work to remove table_lock
  net/mlx5: Parallelize vhca event handling
====================

Link: https://lore.kernel.org/r/20231014171908.290428-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agohamradio: replace deprecated strncpy with strscpy_pad
Justin Stitt [Mon, 16 Oct 2023 18:42:42 +0000 (18:42 +0000)]
hamradio: replace deprecated strncpy with strscpy_pad

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

We expect both hi.data.modename and hi.data.drivername to be
NUL-terminated based on its usage with sprintf:
|       sprintf(hi.data.modename, "%sclk,%smodem,fclk=%d,bps=%d%s",
|               bc->cfg.intclk ? "int" : "ext",
|               bc->cfg.extmodem ? "ext" : "int", bc->cfg.fclk, bc->cfg.bps,
|               bc->cfg.loopback ? ",loopback" : "");

Note that this data is copied out to userspace with:
|       if (copy_to_user(data, &hi, sizeof(hi)))
... however, the data was also copied FROM the user here:
|       if (copy_from_user(&hi, data, sizeof(hi)))

Considering the above, a suitable replacement is strscpy_pad() as it
guarantees NUL-termination on the destination buffer while also
NUL-padding (which is good+wanted behavior when copying data to
userspace).

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231016-strncpy-drivers-net-hamradio-baycom_epp-c-v2-1-39f72a72de30@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodocs: netlink: clean up after deprecating version
Jakub Kicinski [Mon, 16 Oct 2023 21:45:40 +0000 (14:45 -0700)]
docs: netlink: clean up after deprecating version

Jiri moved version to legacy specs in commit 0f07415ebb78 ("netlink:
specs: don't allow version to be specified for genetlink").
Update the documentation.

Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20231016214540.1822392-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotools: ynl: fix converting flags to names after recent cleanup
Jakub Kicinski [Mon, 16 Oct 2023 21:39:37 +0000 (14:39 -0700)]
tools: ynl: fix converting flags to names after recent cleanup

I recently cleaned up specs to not specify enum-as-flags
when target enum is already defined as flags.
YNL Python library did not convert flags, unfortunately,
so this caused breakage for Stan and Willem.

Note that the nlspec.py abstraction already hides the differences
between flags and enums (value vs user_value), so the changes
are pretty trivial.

Fixes: 0629f22ec130 ("ynl: netdev: drop unnecessary enum-as-flags")
Reported-and-tested-by: Willem de Bruijn <willemb@google.com>
Reported-and-tested-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/all/ZS10NtQgd_BJZ3RU@google.com/
Link: https://lore.kernel.org/r/20231016213937.1820386-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'net-remove-last-of-the-phylink-validate-methods-and-clean-up'
Jakub Kicinski [Wed, 18 Oct 2023 00:51:53 +0000 (17:51 -0700)]
Merge branch 'net-remove-last-of-the-phylink-validate-methods-and-clean-up'

Russell King says:

====================
net: remove last of the phylink validate methods and clean up

This four patch series removes the last of the phylink MAC .validate
methods which can be found in the Freescale fman driver. fman has a
requirement that half duplex may not be supported in RGMII mode,
which is currently handled in its .validate method.

In order to keep this functionality when removing the .validate method,
we need to replace that with equivalent functionality, for which I
propose the optional .mac_get_caps method in the first patch.

The advantage of this approach over the .validate callback is that MAC
drivers only have to deal with the MAC_* capabilities, and don't need
to call back into phylink functions to do the masking of the ethtool
linkmodes etc - which then becomes internal to phylink. This can be
seen in the fourth patch where we make a load of these methods static.
====================

Link: https://lore.kernel.org/r/ZS1Z5DDfHyjMryYu@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: remove a bunch of unused validation methods
Russell King (Oracle) [Mon, 16 Oct 2023 15:43:08 +0000 (16:43 +0100)]
net: phylink: remove a bunch of unused validation methods

Remove exports for phylink_caps_to_linkmodes(),
phylink_get_capabilities(), phylink_validate_mask_caps() and
phylink_generic_validate(). Also, as phylink_generic_validate() is no
longer called, we can remove its implementation as well.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qsPkK-009wip-W9@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: remove .validate() method
Russell King (Oracle) [Mon, 16 Oct 2023 15:43:03 +0000 (16:43 +0100)]
net: phylink: remove .validate() method

The MAC .validate() method is no longer used, so remove it from the
phylink_mac_ops structure, and remove the callsite in
phylink_validate_mac_and_pcs().

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qsPkF-009wij-QM@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: fman: convert to .mac_get_caps()
Russell King (Oracle) [Mon, 16 Oct 2023 15:42:58 +0000 (16:42 +0100)]
net: fman: convert to .mac_get_caps()

Convert fman to use the .mac_get_caps() method rather than the
.validate() method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Link: https://lore.kernel.org/r/E1qsPkA-009wid-Kv@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phylink: provide mac_get_caps() method
Russell King (Oracle) [Mon, 16 Oct 2023 15:42:53 +0000 (16:42 +0100)]
net: phylink: provide mac_get_caps() method

Provide a new method, mac_get_caps() to get the MAC capabilities for
the specified interface mode. This is for MACs which have special
requirements, such as not supporting half-duplex in certain interface
modes, and will replace the validate() method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1qsPk5-009wiX-G5@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoeth: bnxt: fix backward compatibility with older devices
Jakub Kicinski [Mon, 16 Oct 2023 17:16:40 +0000 (10:16 -0700)]
eth: bnxt: fix backward compatibility with older devices

Recent FW interface update bumped the size of struct hwrm_func_cfg_input
above 128B which is the max some devices support.

Probe on Stratus (BCM957452) with FW 20.8.3.11 fails with:

   bnxt_en ...: Unable to reserve tx rings
   bnxt_en ...: 2nd rings reservation failed.
   bnxt_en ...: Not enough rings available.

Once probe is fixed other errors pop up:

   bnxt_en ...: Failed to set async event completion ring.

This is because __hwrm_send() rejects requests larger than
bp->hwrm_max_ext_req_len with -E2BIG. Since the driver doesn't
actually access any of the new fields, yet, trim the length.
It should be safe.

Similar workaround exists for backing_store_cfg_input.
Although that one mins() to a constant of 256, not 128
we'll effectively use here. Michael explains: "the backing
store cfg command is supported by relatively newer firmware
that will accept 256 bytes at least."

To make debugging easier in the future add a warning
for oversized requests.

Fixes: 754fbf604ff6 ("bnxt_en: Update firmware interface to 1.10.2.171")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231016171640.1481493-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'bridge-add-a-limit-on-learned-fdb-entries'
Jakub Kicinski [Wed, 18 Oct 2023 00:39:03 +0000 (17:39 -0700)]
Merge branch 'bridge-add-a-limit-on-learned-fdb-entries'

Johannes Nixdorf says:

====================
bridge: Add a limit on learned FDB entries

Introduce a limit on the amount of learned FDB entries on a bridge,
configured by netlink with a build time default on bridge creation in
the kernel config.

For backwards compatibility the kernel config default is disabling the
limit (0).

Without any limit a malicious actor may OOM a kernel by spamming packets
with changing MAC addresses on their bridge port, so allow the bridge
creator to limit the number of entries.

Currently the manual entries are identified by the bridge flags
BR_FDB_LOCAL or BR_FDB_ADDED_BY_USER, atomically bundled under the new
flag BR_FDB_DYNAMIC_LEARNED. This means the limit also applies to
entries created with BR_FDB_ADDED_BY_EXT_LEARN but none of BR_FDB_LOCAL
or BR_FDB_ADDED_BY_USER, e.g. ones added by SWITCHDEV_FDB_ADD_TO_BRIDGE.

Link to the corresponding iproute2 changes:
https://lore.kernel.org/r/20230919-fdb_limit-v4-1-b4d2dc4df30f@avm.de

v4: https://lore.kernel.org/r/20230919-fdb_limit-v4-0-39f0293807b8@avm.de/
v3: https://lore.kernel.org/r/20230905-fdb_limit-v3-0-7597cd500a82@avm.de/
v2: https://lore.kernel.org/netdev/20230619071444.14625-1-jnixdorf-oss@avm.de/
v1: https://lore.kernel.org/netdev/20230515085046.4457-1-jnixdorf-oss@avm.de/
====================

Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-0-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoselftests: forwarding: bridge_fdb_learning_limit: Add a new selftest
Johannes Nixdorf [Mon, 16 Oct 2023 13:27:24 +0000 (15:27 +0200)]
selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest

Add a suite covering the fdb_n_learned and fdb_max_learned bridge
features, touching all special cases in accounting at least once.

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-5-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: bridge: Set strict_start_type for br_policy
Johannes Nixdorf [Mon, 16 Oct 2023 13:27:23 +0000 (15:27 +0200)]
net: bridge: Set strict_start_type for br_policy

Set any new attributes added to br_policy to be parsed strictly, to
prevent userspace from passing garbage.

Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-4-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: bridge: Add netlink knobs for number / max learned FDB entries
Johannes Nixdorf [Mon, 16 Oct 2023 13:27:22 +0000 (15:27 +0200)]
net: bridge: Add netlink knobs for number / max learned FDB entries

The previous patch added accounting and a limit for the number of
dynamically learned FDB entries per bridge. However it did not provide
means to actually configure those bounds or read back the count. This
patch does that.

Two new netlink attributes are added for the accounting and limit of
dynamically learned FDB entries:
 - IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for
   a single bridge.
 - IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for
   the bridge.

The new attributes are used like this:

 # ip link add name br up type bridge fdb_max_learned 256
 # ip link add name v1 up master br type veth peer v2
 # ip link set up dev v2
 # mausezahn -a rand -c 1024 v2
 0.01 seconds (90877 packets per second
 # bridge fdb | grep -v permanent | wc -l
 256
 # ip -d link show dev br
 13: br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 [...]
     [...] fdb_n_learned 256 fdb_max_learned 256

Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-3-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: bridge: Track and limit dynamically learned FDB entries
Johannes Nixdorf [Mon, 16 Oct 2023 13:27:21 +0000 (15:27 +0200)]
net: bridge: Track and limit dynamically learned FDB entries

A malicious actor behind one bridge port may spam the kernel with packets
with a random source MAC address, each of which will create an FDB entry,
each of which is a dynamic allocation in the kernel.

There are roughly 2^48 different MAC addresses, further limited by the
rhashtable they are stored in to 2^31. Each entry is of the type struct
net_bridge_fdb_entry, which is currently 128 bytes big. This means the
maximum amount of memory allocated for FDB entries is 2^31 * 128B =
256GiB, which is too much for most computers.

Mitigate this by maintaining a per bridge count of those automatically
generated entries in fdb_n_learned, and a limit in fdb_max_learned. If
the limit is hit new entries are not learned anymore.

For backwards compatibility the default setting of 0 disables the limit.

User-added entries by netlink or from bridge or bridge port addresses
are never blocked and do not count towards that limit.

Introduce a new fdb entry flag BR_FDB_DYNAMIC_LEARNED to keep track of
whether an FDB entry is included in the count. The flag is enabled for
dynamically learned entries, and disabled for all other entries. This
should be equivalent to BR_FDB_ADDED_BY_USER and BR_FDB_LOCAL being unset,
but contrary to the two flags it can be toggled atomically.

Atomicity is required here, as there are multiple callers that modify the
flags, but are not under a common lock (br_fdb_update is the exception
for br->hash_lock, br_fdb_external_learn_add for RTNL).

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-2-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: bridge: Set BR_FDB_ADDED_BY_USER early in fdb_add_entry
Johannes Nixdorf [Mon, 16 Oct 2023 13:27:20 +0000 (15:27 +0200)]
net: bridge: Set BR_FDB_ADDED_BY_USER early in fdb_add_entry

In preparation of the following fdb limit for dynamically learned entries,
allow fdb_create to detect that the entry was added by the user. This
way it can skip applying the limit in this case.

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Link: https://lore.kernel.org/r/20231016-fdb_limit-v5-1-32cddff87758@avm.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge tag 'wireless-next-2023-10-16' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Tue, 17 Oct 2023 23:52:53 +0000 (16:52 -0700)]
Merge tag 'wireless-next-2023-10-16' of git://git./linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.7

The second pull request for v6.7, with only driver changes this time.
We have now support for mt7925 PCIe and USB variants, few new features
and of course some fixes.

Major changes:

mt76
 - mt7925 support

ath12k
 - read board data variant name from SMBIOS

wfx
 - Remain-On-Channel (ROC) support

* tag 'wireless-next-2023-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (109 commits)
  wifi: rtw89: mac: do bf_monitor only if WiFi 6 chips
  wifi: rtw89: mac: set bf_assoc capabilities according to chip gen
  wifi: rtw89: mac: set bfee_ctrl() according to chip gen
  wifi: rtw89: mac: add registers of MU-EDCA parameters for WiFi 7 chips
  wifi: rtw89: mac: generalize register of MU-EDCA switch according to chip gen
  wifi: rtw89: mac: update RTS threshold according to chip gen
  wifi: rtlwifi: simplify TX command fill callbacks
  wifi: hostap: remove unused ioctl function
  wifi: atmel: remove unused ioctl function
  wifi: rtw89: coex: add annotation __counted_by() to struct rtw89_btc_btf_set_mon_reg
  wifi: rtw89: coex: add annotation __counted_by() for struct rtw89_btc_btf_set_slot_table
  wifi: rtw89: add EHT radiotap in monitor mode
  wifi: rtw89: show EHT rate in debugfs
  wifi: rtw89: parse TX EHT rate selected by firmware from RA C2H report
  wifi: rtw89: Add EHT rate mask as parameters of RA H2C command
  wifi: rtw89: parse EHT information from RX descriptor and PPDU status packet
  wifi: radiotap: add bandwidth definition of EHT U-SIG
  wifi: rtlwifi: use convenient list_count_nodes()
  wifi: p54: Annotate struct p54_cal_database with __counted_by
  wifi: brcmfmac: fweh: Add __counted_by for struct brcmf_fweh_queue_item and use struct_size()
  ...
====================

Link: https://lore.kernel.org/r/20231016143822.880D8C433C8@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: openvswitch: Annotate struct mask_array with __counted_by
Christophe JAILLET [Sat, 14 Oct 2023 06:34:53 +0000 (08:34 +0200)]
net: openvswitch: Annotate struct mask_array with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/ca5c8049f58bb933f231afd0816e30a5aaa0eddd.1697264974.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: openvswitch: Use struct_size()
Christophe JAILLET [Sat, 14 Oct 2023 06:34:52 +0000 (08:34 +0200)]
net: openvswitch: Use struct_size()

Use struct_size() instead of hand writing it.
This is less verbose and more robust.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/e5122b4ff878cbf3ed72653a395ad5c4da04dc1e.1697264974.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoMerge branch 'i3c-mctp-net-driver'
Paolo Abeni [Tue, 17 Oct 2023 10:47:17 +0000 (12:47 +0200)]
Merge branch 'i3c-mctp-net-driver'

Matt Johnston says:

====================
I3C MCTP net driver

This series adds an I3C transport for the kernel's MCTP network
protocol. MCTP is a communication protocol between system components
(BMCs, drives, NICs etc), with higher level protocols such as NVMe-MI or
PLDM built on top of it (in userspace). It runs over various transports
such as I2C, PCIe, or I3C.

The mctp-i3c driver follows a similar approach to the kernel's existing
mctp-i2c driver, creating a "mctpi3cX" network interface for each
numbered I3C bus. Busses opt in to support by adding a "mctp-controller"
property to the devicetree:

&i3c0 {
        mctp-controller;
}

The driver will bind to MCTP class devices (DCR 0xCC) that are on a
supported I3C bus. Each bus is represented by a `struct mctp_i3c_bus`
that keeps state for the network device. An individual I3C device
(struct mctp_i3c_device) performs operations using the "parent"
mctp_i3c_bus object. The I3C notify/enumeration patch is needed so that
the mctp-i3c driver can handle creating/removing mctp_i3c_bus objects as
required.

The mctp-i3c driver is using the Provisioned ID as an identifier for
target I3C devices (the neighbour address), as that will be more stable
than the I3C dynamic address. The driver internally translates that to a
dynamic address for bus operations.

The driver has been tested using an AST2600 platform. A remote endpoint
has been tested against QEMU, as well as using the target mode support
in Aspeed's vendor tree.

I3C maintainers have acked merging this through net-next tree.
====================

Link: https://lore.kernel.org/r/20231013040628.354323-1-matt@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agomctp i3c: MCTP I3C driver
Matt Johnston [Fri, 13 Oct 2023 04:06:25 +0000 (12:06 +0800)]
mctp i3c: MCTP I3C driver

Provides MCTP network transport over an I3C bus, as specified in
DMTF DSP0233.

Each I3C bus (with "mctp-controller" devicetree property) gets an
"mctpi3cX" net device created. I3C devices are reachable as remote
endpoints through that net device. Link layer addressing uses the
I3C PID as a fixed hardware address for neighbour table entries.

The driver matches I3C devices that have the MIPI assigned DCR 0xCC for
MCTP.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agoi3c: Add support for bus enumeration & notification
Jeremy Kerr [Fri, 13 Oct 2023 04:06:24 +0000 (12:06 +0800)]
i3c: Add support for bus enumeration & notification

This allows other drivers to be notified when new i3c busses are
attached, referring to a whole i3c bus as opposed to individual
devices.

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agodt-bindings: i3c: Add mctp-controller property
Matt Johnston [Fri, 13 Oct 2023 04:06:23 +0000 (12:06 +0800)]
dt-bindings: i3c: Add mctp-controller property

This property is used to describe a I3C bus with attached MCTP I3C
target devices.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 months agonet: gso_test: release each segment individually
Florian Westphal [Thu, 12 Oct 2023 12:02:37 +0000 (14:02 +0200)]
net: gso_test: release each segment individually

consume_skb() doesn't walk the segment list, so segments other than
the first are leaked.

Move this skb_consume call into the loop.

Cc: Willem de Bruijn <willemb@google.com>
Fixes: b3098d32ed6e ("net: add skb_segment kunit test")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Tue, 17 Oct 2023 04:05:32 +0000 (21:05 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-10-16

We've added 90 non-merge commits during the last 25 day(s) which contain
a total of 120 files changed, 3519 insertions(+), 895 deletions(-).

The main changes are:

1) Add missed stats for kprobes to retrieve the number of missed kprobe
   executions and subsequent executions of BPF programs, from Jiri Olsa.

2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is
   for systemd to reimplement the LogNamespace feature which allows
   running multiple instances of systemd-journald to process the logs
   of different services, from Daan De Meyer.

3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich.

4) Improve BPF verifier log output for scalar registers to better
   disambiguate their internal state wrt defaults vs min/max values
   matching, from Andrii Nakryiko.

5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving
   the source IP address with a new BPF_FIB_LOOKUP_SRC flag,
   from Martynas Pumputis.

6) Add support for open-coded task_vma iterator to help with symbolization
   for BPF-collected user stacks, from Dave Marchevsky.

7) Add libbpf getters for accessing individual BPF ring buffers which
   is useful for polling them individually, for example, from Martin Kelly.

8) Extend AF_XDP selftests to validate the SHARED_UMEM feature,
   from Tushar Vyavahare.

9) Improve BPF selftests cross-building support for riscv arch,
   from Björn Töpel.

10) Add the ability to pin a BPF timer to the same calling CPU,
   from David Vernet.

11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic
   implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments,
   from Alexandre Ghiti.

12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen.

13) Fix bpftool's skeleton code generation to guarantee that ELF data
    is 8 byte aligned, from Ian Rogers.

14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4
    security mitigations in BPF verifier, from Yafang Shao.

15) Annotate struct bpf_stack_map with __counted_by attribute to prepare
    BPF side for upcoming __counted_by compiler support, from Kees Cook.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits)
  bpf: Ensure proper register state printing for cond jumps
  bpf: Disambiguate SCALAR register state output in verifier logs
  selftests/bpf: Make align selftests more robust
  selftests/bpf: Improve missed_kprobe_recursion test robustness
  selftests/bpf: Improve percpu_alloc test robustness
  selftests/bpf: Add tests for open-coded task_vma iter
  bpf: Introduce task_vma open-coded iterator kfuncs
  selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c
  bpf: Don't explicitly emit BTF for struct btf_iter_num
  bpf: Change syscall_nr type to int in struct syscall_tp_t
  net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set
  bpf: Avoid unnecessary audit log for CPU security mitigations
  selftests/bpf: Add tests for cgroup unix socket address hooks
  selftests/bpf: Make sure mount directory exists
  documentation/bpf: Document cgroup unix socket address hooks
  bpftool: Add support for cgroup unix socket address hooks
  libbpf: Add support for cgroup unix socket address hooks
  bpf: Implement cgroup sockaddr hooks for unix sockets
  bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
  bpf: Propagate modified uaddrlen from cgroup sockaddr programs
  ...
====================

Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agopage_pool: fragment API support for 32-bit arch with 64-bit DMA
Yunsheng Lin [Fri, 13 Oct 2023 06:48:21 +0000 (14:48 +0800)]
page_pool: fragment API support for 32-bit arch with 64-bit DMA

Currently page_pool_alloc_frag() is not supported in 32-bit
arch with 64-bit DMA because of the overlap issue between
pp_frag_count and dma_addr_upper in 'struct page' for those
arches, which seems to be quite common, see [1], which means
driver may need to handle it when using fragment API.

It is assumed that the combination of the above arch with an
address space >16TB does not exist, as all those arches have
64b equivalent, it seems logical to use the 64b version for a
system with a large address space. It is also assumed that dma
address is page aligned when we are dma mapping a page aligned
buffer, see [2].

That means we're storing 12 bits of 0 at the lower end for a
dma address, we can reuse those bits for the above arches to
support 32b+12b, which is 16TB of memory.

If we make a wrong assumption, a warning is emitted so that
user can report to us.

1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
2. https://lore.kernel.org/all/20230818145145.4b357c89@kernel.org/

Tested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Guillaume Tucker <guillaume.tucker@collabora.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Linux-MM <linux-mm@kvack.org>
Link: https://lore.kernel.org/r/20231013064827.61135-2-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: stub tcp_gro_complete if CONFIG_INET=n
Jacob Keller [Fri, 13 Oct 2023 18:54:50 +0000 (11:54 -0700)]
net: stub tcp_gro_complete if CONFIG_INET=n

A few networking drivers including bnx2x, bnxt, qede, and idpf call
tcp_gro_complete as part of offloading TCP GRO. The function is only
defined if CONFIG_INET is true, since its TCP specific and is meaningless
if the kernel lacks IP networking support.

The combination of trying to use the complex network drivers with
CONFIG_NET but not CONFIG_INET is rather unlikely in practice: most use
cases are going to need IP networking.

The tcp_gro_complete function just sets some data in the socket buffer for
use in processing the TCP packet in the event that the GRO was offloaded to
the device. If the kernel lacks TCP support, such setup will simply go
unused.

The bnx2x, bnxt, and qede drivers wrap their TCP offload support in
CONFIG_INET checks and skip handling on such kernels.

The idpf driver did not check CONFIG_INET and thus fails to link if the
kernel is configured  with CONFIG_NET=y, CONFIG_IDPF=(m|y), and
CONFIG_INET=n.

While checking CONFIG_INET does allow the driver to bypass significantly
more instructions in the event that we know TCP networking isn't supported,
the configuration is unlikely to be used widely.

Rather than require driver authors to care about this, stub the
tcp_gro_complete function when CONFIG_INET=n. This allows drivers to be
left as-is. It does mean the idpf driver will perform slightly more work
than strictly necessary when CONFIG_INET=n, since it will still execute
some of the skb setup in idpf_rx_rsc. However, that work would be performed
in the case where CONFIG_INET=y anyways.

I did not change the existing drivers, since they appear to wrap a
significant portion of code when CONFIG_INET=n. There is little benefit in
trashing these drivers just to unwrap and remove the CONFIG_INET check.

Using a stub for tcp_gro_complete is still beneficial, as it means future
drivers no longer need to worry about this case of CONFIG_NET=y and
CONFIG_INET=n, which should reduce noise from buildbots that check such a
configuration.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20231013185502.1473541-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agodrivers: net: wwan: wwan_core.c: resolved spelling mistake
Muhammad Muzammil [Fri, 13 Oct 2023 04:23:04 +0000 (09:23 +0500)]
drivers: net: wwan: wwan_core.c: resolved spelling mistake

resolved typing mistake from devce to device

Signed-off-by: Muhammad Muzammil <m.muzzammilashraf@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231013042304.7881-1-m.muzzammilashraf@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agocgroup, netclassid: on modifying netclassid in cgroup, only consider the main process.
Liansen Zhai [Thu, 12 Oct 2023 09:03:30 +0000 (17:03 +0800)]
cgroup, netclassid: on modifying netclassid in cgroup, only consider the main process.

When modifying netclassid, the command("echo 0x100001 > net_cls.classid")
will take more time on many threads of one process, because the process
create many fds.
for example, one process exists 28000 fds and 60000 threads, echo command
will task 45 seconds.
Now, we only consider the main process when exec "iterate_fd", and the
time is about 52 milliseconds.

Signed-off-by: Liansen Zhai <zhailiansen@kuaishou.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231012090330.29636-1-zhailiansen@kuaishou.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: usb: replace deprecated strncpy with strscpy
Justin Stitt [Thu, 12 Oct 2023 22:33:34 +0000 (22:33 +0000)]
net: usb: replace deprecated strncpy with strscpy

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

Other implementations of .*get_drvinfo use strscpy so this patch brings
sr_get_drvinfo() in line as well:

igb/igb_ethtool.c +851
static void igb_get_drvinfo(struct net_device *netdev,

igbvf/ethtool.c
167:static void igbvf_get_drvinfo(struct net_device *netdev,

i40e/i40e_ethtool.c
1999:static void i40e_get_drvinfo(struct net_device *netdev,

e1000/e1000_ethtool.c
529:static void e1000_get_drvinfo(struct net_device *netdev,

ixgbevf/ethtool.c
211:static void ixgbevf_get_drvinfo(struct net_device *netdev,

...

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-usb-sr9800-c-v1-1-5540832c8ec2@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agolan78xx: replace deprecated strncpy with strscpy
Justin Stitt [Thu, 12 Oct 2023 22:30:54 +0000 (22:30 +0000)]
lan78xx: replace deprecated strncpy with strscpy

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

Other implementations of .*get_drvinfo use strscpy so this patch brings
lan78xx_get_drvinfo() in line as well:

igb/igb_ethtool.c +851
static void igb_get_drvinfo(struct net_device *netdev,

igbvf/ethtool.c
167:static void igbvf_get_drvinfo(struct net_device *netdev,

i40e/i40e_ethtool.c
1999:static void i40e_get_drvinfo(struct net_device *netdev,

e1000/e1000_ethtool.c
529:static void e1000_get_drvinfo(struct net_device *netdev,

ixgbevf/ethtool.c
211:static void ixgbevf_get_drvinfo(struct net_device *netdev,

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-usb-lan78xx-c-v1-1-99d513061dfc@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: phy: smsc: replace deprecated strncpy with ethtool_sprintf
Justin Stitt [Thu, 12 Oct 2023 22:27:52 +0000 (22:27 +0000)]
net: phy: smsc: replace deprecated strncpy with ethtool_sprintf

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

ethtool_sprintf() is designed specifically for get_strings() usage.
Let's replace strncpy in favor of this dedicated helper function.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-phy-smsc-c-v1-1-00528f7524b3@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet: netcp: replace deprecated strncpy with strscpy
Justin Stitt [Thu, 12 Oct 2023 21:05:40 +0000 (21:05 +0000)]
net: netcp: replace deprecated strncpy with strscpy

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Other implementations of .*get_drvinfo also use strscpy so this patch
brings keystone_get_drvinfo() in line as well:

igb/igb_ethtool.c +851
static void igb_get_drvinfo(struct net_device *netdev,

igbvf/ethtool.c
167:static void igbvf_get_drvinfo(struct net_device *netdev,

i40e/i40e_ethtool.c
1999:static void i40e_get_drvinfo(struct net_device *netdev,

e1000/e1000_ethtool.c
529:static void e1000_get_drvinfo(struct net_device *netdev,

ixgbevf/ethtool.c
211:static void ixgbevf_get_drvinfo(struct net_device *netdev,

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html
Link: https://github.com/KSPP/linux/issues/90
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20231012-strncpy-drivers-net-ethernet-ti-netcp_ethss-c-v1-1-93142e620864@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agotcp: Set pingpong threshold via sysctl
Haiyang Zhang [Wed, 11 Oct 2023 20:30:44 +0000 (13:30 -0700)]
tcp: Set pingpong threshold via sysctl

TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.

The pingpong threshold and related code were changed to 3 in the year
2019 in:
  commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
  commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")

There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/1697056244-21888-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet, sched: Add tcf_set_drop_reason for {__,}tcf_classify
Daniel Borkmann [Mon, 9 Oct 2023 09:26:55 +0000 (11:26 +0200)]
net, sched: Add tcf_set_drop_reason for {__,}tcf_classify

Add an initial user for the newly added tcf_set_drop_reason() helper to set the
drop reason for internal errors leading to TC_ACT_SHOT inside {__,}tcf_classify().

Right now this only adds a very basic SKB_DROP_REASON_TC_ERROR as a generic
fallback indicator to mark drop locations. Where needed, such locations can be
converted to more specific codes, for example, when hitting the reclassification
limit, etc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Victor Nogueira <victor@mojatatu.com>
Link: https://lore.kernel.org/r/20231009092655.22025-2-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agonet, sched: Make tc-related drop reason more flexible
Daniel Borkmann [Mon, 9 Oct 2023 09:26:54 +0000 (11:26 +0200)]
net, sched: Make tc-related drop reason more flexible

Currently, the kfree_skb_reason() in sch_handle_{ingress,egress}() can only
express a basic SKB_DROP_REASON_TC_INGRESS or SKB_DROP_REASON_TC_EGRESS reason.

Victor kicked-off an initial proposal to make this more flexible by disambiguating
verdict from return code by moving the verdict into struct tcf_result and
letting tcf_classify() return a negative error. If hit, then two new drop
reasons were added in the proposal, that is SKB_DROP_REASON_TC_INGRESS_ERROR
as well as SKB_DROP_REASON_TC_EGRESS_ERROR. Further analysis of the actual
error codes would have required to attach to tcf_classify via kprobe/kretprobe
to more deeply debug skb and the returned error.

In order to make the kfree_skb_reason() in sch_handle_{ingress,egress}() more
extensible, it can be addressed in a more straight forward way, that is: Instead
of placing the verdict into struct tcf_result, we can just put the drop reason
in there, which does not require changes throughout various classful schedulers
given the existing verdict logic can stay as is.

Then, SKB_DROP_REASON_TC_ERROR{,_*} can be added to the enum skb_drop_reason
to disambiguate between an error or an intentional drop. New drop reason error
codes can be added successively to the tc code base.

For internal error locations which have not yet been annotated with a
SKB_DROP_REASON_TC_ERROR{,_*}, the fallback is SKB_DROP_REASON_TC_INGRESS and
SKB_DROP_REASON_TC_EGRESS, respectively. Generic errors could be marked with a
SKB_DROP_REASON_TC_ERROR code until they are converted to more specific ones
if it is found that they would be useful for troubleshooting.

While drop reasons have infrastructure for subsystem specific error codes which
are currently used by mac80211 and ovs, Jakub mentioned that it is preferred
for tc to use the enum skb_drop_reason core codes given it is a better fit and
currently the tooling support is better, too.

With regards to the latter:

  [...] I think Alastair (bpftrace) is working on auto-prettifying enums when
  bpftrace outputs maps. So we can do something like:

  $ bpftrace -e 'tracepoint:skb:kfree_skb { @[args->reason] = count(); }'
  Attaching 1 probe...
  ^C

  @[SKB_DROP_REASON_TC_INGRESS]: 2
  @[SKB_CONSUMED]: 34

  ^^^^^^^^^^^^ names!!

  Auto-magically. [...]

Add a small helper tcf_set_drop_reason() which can be used to set the drop reason
into the tcf_result.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Victor Nogueira <victor@mojatatu.com>
Link: https://lore.kernel.org/netdev/20231006063233.74345d36@kernel.org
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20231009092655.22025-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 months agoMerge branch 'bpf-log-improvements'
Daniel Borkmann [Mon, 16 Oct 2023 11:49:18 +0000 (13:49 +0200)]
Merge branch 'bpf-log-improvements'

Andrii Nakryiko says:

====================
This patch set fixes ambiguity in BPF verifier log output of SCALAR register
in the parts that emit umin/umax, smin/smax, etc ranges. See patch #4 for
details.

Also, patch #5 fixes an issue with verifier log missing instruction context
(state) output for conditionals that trigger precision marking. See details in
the patch.

First two patches are just improvements to two selftests that are very flaky
locally when run in parallel mode.

Patch #3 changes 'align' selftest to be less strict about exact verifier log
output (which patch #4 changes, breaking lots of align tests as written). Now
test does more of a register substate checks, mostly around expected var_off()
values. This 'align' selftests is one of the more brittle ones and requires
constant adjustment when verifier log output changes, without really catching
any new issues. So hopefully these changes can minimize future support efforts
for this specific set of tests.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
11 months agobpf: Ensure proper register state printing for cond jumps
Andrii Nakryiko [Wed, 11 Oct 2023 22:37:28 +0000 (15:37 -0700)]
bpf: Ensure proper register state printing for cond jumps

Verifier emits relevant register state involved in any given instruction
next to it after `;` to the right, if possible. Or, worst case, on the
separate line repeating instruction index.

E.g., a nice and simple case would be:

  2: (d5) if r0 s<= 0x0 goto pc+1       ; R0_w=0

But if there is some intervening extra output (e.g., precision
backtracking log) involved, we are supposed to see the state after the
precision backtrack log:

  4: (75) if r0 s>= 0x0 goto pc+1
  mark_precise: frame0: last_idx 4 first_idx 0 subseq_idx -1
  mark_precise: frame0: regs=r0 stack= before 2: (d5) if r0 s<= 0x0 goto pc+1
  mark_precise: frame0: regs=r0 stack= before 1: (b7) r0 = 0
  6: R0_w=0

First off, note that in `6: R0_w=0` instruction index corresponds to the
next instruction, not to the conditional jump instruction itself, which
is wrong and we'll get to that.

But besides that, the above is a happy case that does work today. Yet,
if it so happens that precision backtracking had to traverse some of the
parent states, this `6: R0_w=0` state output would be missing.

This is due to a quirk of print_verifier_state() routine, which performs
mark_verifier_state_clean(env) at the end. This marks all registers as
"non-scratched", which means that subsequent logic to print *relevant*
registers (that is, "scratched ones") fails and doesn't see anything
relevant to print and skips the output altogether.

print_verifier_state() is used both to print instruction context, but
also to print an **entire** verifier state indiscriminately, e.g.,
during precision backtracking (and in a few other situations, like
during entering or exiting subprogram).  Which means if we have to print
entire parent state before getting to printing instruction context
state, instruction context is marked as clean and is omitted.

Long story short, this is definitely not intentional. So we fix this
behavior in this patch by teaching print_verifier_state() to clear
scratch state only if it was used to print instruction state, not the
parent/callback state. This is determined by print_all option, so if
it's not set, we don't clear scratch state. This fixes missing
instruction state for these cases.

As for the mismatched instruction index, we fix that by making sure we
call print_insn_state() early inside check_cond_jmp_op() before we
adjusted insn_idx based on jump branch taken logic. And with that we get
desired correct information:

  9: (16) if w4 == 0x1 goto pc+9
  mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1
  mark_precise: frame0: parent state regs=r4 stack=: R2_w=1944 R4_rw=P1 R10=fp0
  mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx 9
  mark_precise: frame0: regs=r4 stack= before 8: (66) if w4 s> 0x3 goto pc+5
  mark_precise: frame0: regs=r4 stack= before 7: (b7) r4 = 1
  9: R4=1

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-6-andrii@kernel.org
11 months agobpf: Disambiguate SCALAR register state output in verifier logs
Andrii Nakryiko [Wed, 11 Oct 2023 22:37:27 +0000 (15:37 -0700)]
bpf: Disambiguate SCALAR register state output in verifier logs

Currently the way that verifier prints SCALAR_VALUE register state (and
PTR_TO_PACKET, which can have var_off and ranges info as well) is very
ambiguous.

In the name of brevity we are trying to eliminate "unnecessary" output
of umin/umax, smin/smax, u32_min/u32_max, and s32_min/s32_max values, if
possible. Current rules are that if any of those have their default
value (which for mins is the minimal value of its respective types: 0,
S32_MIN, or S64_MIN, while for maxs it's U32_MAX, S32_MAX, S64_MAX, or
U64_MAX) *OR* if there is another min/max value that as matching value.
E.g., if smin=100 and umin=100, we'll emit only umin=10, omitting smin
altogether. This approach has a few problems, being both ambiguous and
sort-of incorrect in some cases.

Ambiguity is due to missing value could be either default value or value
of umin/umax or smin/smax. This is especially confusing when we mix
signed and unsigned ranges. Quite often, umin=0 and smin=0, and so we'll
have only `umin=0` leaving anyone reading verifier log to guess whether
smin is actually 0 or it's actually -9223372036854775808 (S64_MIN). And
often times it's important to know, especially when debugging tricky
issues.

"Sort-of incorrectness" comes from mixing negative and positive values.
E.g., if umin is some large positive number, it can be equal to smin
which is, interpreted as signed value, is actually some negative value.
Currently, that smin will be omitted and only umin will be emitted with
a large positive value, giving an impression that smin is also positive.

Anyway, ambiguity is the biggest issue making it impossible to have an
exact understanding of register state, preventing any sort of automated
testing of verifier state based on verifier log. This patch is
attempting to rectify the situation by removing ambiguity, while
minimizing the verboseness of register state output.

The rules are straightforward:
  - if some of the values are missing, then it definitely has a default
  value. I.e., `umin=0` means that umin is zero, but smin is actually
  S64_MIN;
  - all the various boundaries that happen to have the same value are
  emitted in one equality separated sequence. E.g., if umin and smin are
  both 100, we'll emit `smin=umin=100`, making this explicit;
  - we do not mix negative and positive values together, and even if
  they happen to have the same bit-level value, they will be emitted
  separately with proper sign. I.e., if both umax and smax happen to be
  0xffffffffffffffff, we'll emit them both separately as
  `smax=-1,umax=18446744073709551615`;
  - in the name of a bit more uniformity and consistency,
  {u32,s32}_{min,max} are renamed to {s,u}{min,max}32, which seems to
  improve readability.

The above means that in case of all 4 ranges being, say, [50, 100] range,
we'd previously see hugely ambiguous:

    R1=scalar(umin=50,umax=100)

Now, we'll be more explicit:

    R1=scalar(smin=umin=smin32=umin32=50,smax=umax=smax32=umax32=100)

This is slightly more verbose, but distinct from the case when we don't
know anything about signed boundaries and 32-bit boundaries, which under
new rules will match the old case:

    R1=scalar(umin=50,umax=100)

Also, in the name of simplicity of implementation and consistency, order
for {s,u}32_{min,max} are emitted *before* var_off. Previously they were
emitted afterwards, for unclear reasons.

This patch also includes a few fixes to selftests that expect exact
register state to accommodate slight changes to verifier format. You can
see that the changes are pretty minimal in common cases.

Note, the special case when SCALAR_VALUE register is a known constant
isn't changed, we'll emit constant value once, interpreted as signed
value.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-5-andrii@kernel.org
11 months agoselftests/bpf: Make align selftests more robust
Andrii Nakryiko [Wed, 11 Oct 2023 22:37:26 +0000 (15:37 -0700)]
selftests/bpf: Make align selftests more robust

Align subtest is very specific and finicky about expected verifier log
output and format. This is often completely unnecessary as in a bunch of
situations test actually cares about var_off part of register state. But
given how exact it is right now, any tiny verifier log changes can lead
to align tests failures, requiring constant adjustment.

This patch tries to make this a bit more robust by making logic first
search for specified register and then allowing to match only portion of
register state, not everything exactly. This will come handly with
follow up changes to SCALAR register output disambiguation.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-4-andrii@kernel.org
11 months agoselftests/bpf: Improve missed_kprobe_recursion test robustness
Andrii Nakryiko [Wed, 11 Oct 2023 22:37:25 +0000 (15:37 -0700)]
selftests/bpf: Improve missed_kprobe_recursion test robustness

Given missed_kprobe_recursion is non-serial and uses common testing
kfuncs to count number of recursion misses it's possible that some other
parallel test can trigger extraneous recursion misses. So we can't
expect exactly 1 miss. Relax conditions and expect at least one.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-3-andrii@kernel.org
11 months agoselftests/bpf: Improve percpu_alloc test robustness
Andrii Nakryiko [Wed, 11 Oct 2023 22:37:24 +0000 (15:37 -0700)]
selftests/bpf: Improve percpu_alloc test robustness

Make these non-serial tests filter BPF programs by intended PID of
a test runner process. This makes it isolated from other parallel tests
that might interfere accidentally.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20231011223728.3188086-2-andrii@kernel.org
11 months agotsnep: Inline small fragments within TX descriptor
Gerhard Engleder [Wed, 11 Oct 2023 18:21:54 +0000 (20:21 +0200)]
tsnep: Inline small fragments within TX descriptor

The tsnep network controller is able to extend the descriptor directly
with data to be transmitted. In this case no TX data DMA address is
necessary. Instead of the TX data DMA address the TX data buffer is
placed at the end of the descriptor.

The descriptor is read with a 64 bytes DMA read by the tsnep network
controller. If the sum of descriptor data and TX data is less than or
equal to 64 bytes, then no additional DMA read is necessary to read the
TX data. Therefore, it makes sense to inline small fragments up to this
limit within the descriptor ring.

Inlined fragments need to be copied to the descriptor ring. On the other
hand DMA mapping is not necessary. At most 40 bytes are copied, so
copying should be faster than DMA mapping.

For A53 1.2 GHz copying takes <100ns and DMA mapping takes >200ns. So
inlining small fragments should result in lower CPU load. Performance
improvement is small. Thus, comparision of CPU load with and without
inlining of small fragments did not show any significant difference.
With this optimization less DMA reads will be done, which decreases the
load of the interconnect.

Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoMerge branch 'udp-tunnel-route-lookups'
David S. Miller [Mon, 16 Oct 2023 08:57:52 +0000 (09:57 +0100)]
Merge branch 'udp-tunnel-route-lookups'

Beniamino Galvani says:

====================
net: consolidate IPv4 route lookup for UDP tunnels

At the moment different UDP tunnels rely on different functions for
IPv4 route lookup, and those functions all implement the same
logic. Only bareudp uses the generic ip_route_output_tunnel(), while
geneve and vxlan basically duplicate it slightly differently.

This series first extends the generic lookup function so that it is
suitable for all UDP tunnel implementations. Then, bareudp, geneve and
vxlan are adapted to use them.

This results in code with less duplication and hopefully better
maintainability.

After this series is merged, IPv6 will be converted in a similar way.

Changelog:
v2
 - fix compilation with IPv6 disabled
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agovxlan: use generic function for tunnel IPv4 route lookup
Beniamino Galvani [Mon, 16 Oct 2023 07:15:26 +0000 (09:15 +0200)]
vxlan: use generic function for tunnel IPv4 route lookup

The route lookup can be done now via generic function
udp_tunnel_dst_lookup() to replace the custom implementations in
vxlan_get_route().

Note that this patch only touches IPv4, while IPv6 still uses
vxlan6_get_route(). After IPv6 route lookup gets converted as well,
vxlan_xmit_one() can be simplified by removing local variables that
will be passed via "struct ip_tunnel_key", such as remote_ip,
local_ip, flow_flags, label.

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agogeneve: use generic function for tunnel IPv4 route lookup
Beniamino Galvani [Mon, 16 Oct 2023 07:15:25 +0000 (09:15 +0200)]
geneve: use generic function for tunnel IPv4 route lookup

The route lookup can be done now via generic function
udp_tunnel_dst_lookup() to replace the custom implementation in
geneve_get_v4_rt().

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agogeneve: add dsfield helper function
Beniamino Galvani [Mon, 16 Oct 2023 07:15:24 +0000 (09:15 +0200)]
geneve: add dsfield helper function

Add a helper function to compute the tos/dsfield. In this way, we can
factor out some duplicate code. Also, the helper will be called from
more places in the next commit.

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoipv4: use tunnel flow flags for tunnel route lookups
Beniamino Galvani [Mon, 16 Oct 2023 07:15:23 +0000 (09:15 +0200)]
ipv4: use tunnel flow flags for tunnel route lookups

Commit 451ef36bd229 ("ip_tunnels: Add new flow flags field to
ip_tunnel_key") added a new field to struct ip_tunnel_key to control
route lookups. Currently the flag is used by vxlan and geneve tunnels;
use it also in udp_tunnel_dst_lookup() so that it affects all tunnel
types relying on this function.

Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoipv4: add new arguments to udp_tunnel_dst_lookup()
Beniamino Galvani [Mon, 16 Oct 2023 07:15:22 +0000 (09:15 +0200)]
ipv4: add new arguments to udp_tunnel_dst_lookup()

We want to make the function more generic so that it can be used by
other UDP tunnel implementations such as geneve and vxlan. To do that,
add the following arguments:

 - source and destination UDP port;
 - ifindex of the output interface, needed by vxlan;
 - the tos, because in some cases it is not taken from struct
   ip_tunnel_info (for example, when it's inherited from the inner
   packet);
 - the dst cache, because not all tunnel types (e.g. vxlan) want to
   use the one from struct ip_tunnel_info.

With these parameters, the function no longer needs the full struct
ip_tunnel_info as argument and we can pass only the relevant part of
it (struct ip_tunnel_key).

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoipv4: remove "proto" argument from udp_tunnel_dst_lookup()
Beniamino Galvani [Mon, 16 Oct 2023 07:15:21 +0000 (09:15 +0200)]
ipv4: remove "proto" argument from udp_tunnel_dst_lookup()

The function is now UDP-specific, the protocol is always IPPROTO_UDP.

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoipv4: rename and move ip_route_output_tunnel()
Beniamino Galvani [Mon, 16 Oct 2023 07:15:20 +0000 (09:15 +0200)]
ipv4: rename and move ip_route_output_tunnel()

At the moment ip_route_output_tunnel() is used only by bareudp.
Ideally, other UDP tunnel implementations should use it, but to do so
the function needs to accept new parameters that are specific for UDP
tunnels, such as the ports.

Prepare for these changes by renaming the function to
udp_tunnel_dst_lookup() and move it to file
net/ipv4/udp_tunnel_core.c.

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoselftests: net: remove unused variables
zhujun2 [Mon, 16 Oct 2023 06:30:39 +0000 (23:30 -0700)]
selftests: net: remove unused variables

These variables are never referenced in the code, just remove them

Signed-off-by: zhujun2 <zhujun2@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agonet: cxgb3: simplify logic for rspq_check_napi
Christian Marangi [Thu, 12 Oct 2023 09:14:29 +0000 (11:14 +0200)]
net: cxgb3: simplify logic for rspq_check_napi

Simplify logic for rspq_check_napi.
Drop redundant and wrong napi_is_scheduled call as it's not race free
and directly use the output of napi_schedule to understand if a napi is
pending or not.

rspq_check_napi main logic is to check if is_new_response is true and
check if a napi is not scheduled. The result of this function is then
used to detect if we are missing some interrupt and act on top of
this... With this knowing, we can rework and simplify the logic and make
it less problematic with testing an internal bit for napi.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoMerge branch 'ptp-multiple-readers'
David S. Miller [Sun, 15 Oct 2023 19:07:53 +0000 (20:07 +0100)]
Merge branch 'ptp-multiple-readers'

Xabier Marquiegui says:

====================
ptp: Support for multiple filtered timestamp event queue readers

On systems with multiple timestamp event channels, there can be scenarios
where multiple userspace readers want to access the timestamping data for
various purposes.

One such example is wanting to use a pps out for time synchronization, and
wanting to timestamp external events with the synchronized time base
simultaneously.

Timestmp event consumers on the other hand, are often interested in a
subset of the available timestamp channels. linuxptp ts2phc, for example,
is not happy if more than one timestamping channel is active on the device
it is reading from.

Linked lists are introduced to support multiple timestamp event queue
consumers, and timestamp event channel filters through IOCTLs, as well as
a debugfs interface to do some simple verifications.

Xabier Marquiegui (6):
  posix-clock: introduce posix_clock_context concept
  ptp: Replace timestamp event queue with linked list
  ptp: support multiple timestamp event readers
  ptp: support event queue reader channel masks
  ptp: add debugfs interface to see applied channel masks
  ptp: add testptp mask test

 drivers/ptp/ptp_chardev.c                   | 129 ++++++++++++++++----
 drivers/ptp/ptp_clock.c                     |  45 ++++++-
 drivers/ptp/ptp_private.h                   |  28 +++--
 drivers/ptp/ptp_sysfs.c                     |  13 +-
 include/linux/posix-clock.h                 |  35 ++++--
 include/uapi/linux/ptp_clock.h              |   2 +
 kernel/time/posix-clock.c                   |  36 ++++--
 tools/testing/selftests/ptp/ptpchmaskfmt.sh |  14 +++
 tools/testing/selftests/ptp/testptp.c       |  19 ++-
 9 files changed, 261 insertions(+), 60 deletions(-)
 create mode 100644 tools/testing/selftests/ptp/ptpchmaskfmt.sh

---
v6:
  - correct commit message
  - correct coding style
v5: https://lore.kernel.org/netdev/cover.1696804243.git.reibax@gmail.com/
  - fix spelling on commit message
  - fix memory leak on ptp_open
v4: https://lore.kernel.org/netdev/cover.1696511486.git.reibax@gmail.com/
  - split modifications in different patches for improved organization
  - rename posix_clock_user to posix_clock_context
  - remove unnecessary flush_users clock operation
  - remove unnecessary tests
  - simpler queue clean procedure
  - fix/clean comment lines
  - simplified release procedures
  - filter modifications exclusive to currently open instance for
    simplicity and security
  - expand mask to 2048 channels
  - make more secure and simple: mask is only applied to the testptp
    instance. Use debugfs to verify effects.
v3: https://lore.kernel.org/netdev/20230928133544.3642650-1-reibax@gmail.com/
  - add this patchset overview file
  - fix use of safe and non safe linked lists for loops
  - introduce new posix_clock private_data and ida object ids for better
    dicrimination of timestamp consumers
  - safer resource release procedures
  - filter application by object id, aided by process id
  - friendlier testptp implementation of event queue channel filters
v2: https://lore.kernel.org/netdev/20230912220217.2008895-1-reibax@gmail.com/
  - fix ptp_poll() return value
  - Style changes to comform to checkpatch strict suggestions
  - more coherent ptp_read error exit routines
  - fix testptp compilation error: unknown type name 'pid_t'
  - rename mask variable for easier code traceability
  - more detailed commit message with two examples
v1: https://lore.kernel.org/netdev/20230906104754.1324412-2-reibax@gmail.com/
====================

Signed-off-by: Xabier Marquiegui <reibax@gmail.com>
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoptp: add testptp mask test
Xabier Marquiegui [Wed, 11 Oct 2023 22:39:58 +0000 (00:39 +0200)]
ptp: add testptp mask test

Add option to test timestamp event queue mask manipulation in testptp.

Option -F allows the user to specify a single channel that will be
applied on the mask filter via IOCTL.

The test program will maintain the file open until user input is
received.

This allows checking the effect of the IOCTL in debugfs.

eg:

Console 1:
```
Channel 12 exclusively enabled. Check on debugfs.
Press any key to continue
```

Console 2:
```
0x00000000 0x00000001 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000
0x00000000 0x00000000 0x00000000 0x00000000
```

Signed-off-by: Xabier Marquiegui <reibax@gmail.com>
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 months agoptp: add debugfs interface to see applied channel masks
Xabier Marquiegui [Wed, 11 Oct 2023 22:39:57 +0000 (00:39 +0200)]
ptp: add debugfs interface to see applied channel masks

Use debugfs to be able to view channel mask applied to every timestamp
event queue.

Every time the device is opened, a new entry is created in
`$DEBUGFS_MOUNTPOINT/ptpN/$INSTANCE_ADDRESS/mask`.

The mask value can be viewed grouped in 32bit decimal values using cat,
or converted to hexadecimal with the included `ptpchmaskfmt.sh` script.
32 bit values are listed from least significant to most significant.

Signed-off-by: Xabier Marquiegui <reibax@gmail.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>