Mina Almasry [Tue, 10 Sep 2024 17:14:49 +0000 (17:14 +0000)]
page_pool: devmem support
Convert netmem to be a union of struct page and struct net_iov. Overload
the LSB of struct netmem* to indicate that it's a net_iov, otherwise
it's a page.
Currently these entries in struct page are rented by the page_pool and
used exclusively by the net stack:
struct {
unsigned long pp_magic;
struct page_pool *pp;
unsigned long _pp_mapping_pad;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
Mirror these (and only these) entries into struct net_iov and implement
netmem helpers that can access these common fields regardless of
whether the underlying type is page or net_iov.
Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.
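The LSB tagging described above can be illustrated with a small userspace sketch. Everything prefixed demo_ is hypothetical and only mirrors the idea; it is not the kernel's netmem implementation, and the structs are cut down to a single shared field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct page and struct net_iov. Both hold an
 * unsigned long, so their addresses are at least 2-byte aligned and the
 * lowest pointer bit is always free to carry a tag. */
struct demo_page    { unsigned long pp_magic; };
struct demo_net_iov { unsigned long pp_magic; };

/* Opaque handle: a pointer value whose lowest bit is a type tag. */
typedef uintptr_t demo_netmem_ref;

#define DEMO_NET_IOV_TAG ((uintptr_t)1)

static inline demo_netmem_ref demo_page_to_netmem(struct demo_page *page)
{
	return (uintptr_t)page;	/* LSB stays 0 for a page */
}

static inline demo_netmem_ref demo_net_iov_to_netmem(struct demo_net_iov *niov)
{
	return (uintptr_t)niov | DEMO_NET_IOV_TAG;
}

static inline bool demo_netmem_is_net_iov(demo_netmem_ref netmem)
{
	return netmem & DEMO_NET_IOV_TAG;
}

static inline struct demo_net_iov *demo_netmem_to_net_iov(demo_netmem_ref netmem)
{
	if (!demo_netmem_is_net_iov(netmem))
		return NULL;
	return (struct demo_net_iov *)(netmem & ~DEMO_NET_IOV_TAG);
}

/* Helpers that hand memory to page-only code refuse a net_iov. */
static inline struct demo_page *demo_netmem_to_page(demo_netmem_ref netmem)
{
	if (demo_netmem_is_net_iov(netmem))
		return NULL;
	return (struct demo_page *)netmem;
}

/* Access a mirrored field regardless of the underlying type. */
static inline unsigned long demo_netmem_get_pp_magic(demo_netmem_ref netmem)
{
	if (demo_netmem_is_net_iov(netmem))
		return demo_netmem_to_net_iov(netmem)->pp_magic;
	return demo_netmem_to_page(netmem)->pp_magic;
}
```

A caller holding only a demo_netmem_ref can read pp_magic through the common accessor, while demo_netmem_to_page() returns NULL for a tagged handle, which is the kind of check that keeps a net_iov away from page-only code.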
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mina Almasry [Tue, 10 Sep 2024 17:14:48 +0000 (17:14 +0000)]
netdev: netdevice devmem allocator
Implement netdev devmem allocator. The allocator takes a given struct
netdev_dmabuf_binding as input and allocates net_iov from that
binding.
The allocation simply delegates to the binding's genpool for the
allocation logic and wraps the returned memory region in a net_iov
struct.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-5-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mina Almasry [Tue, 10 Sep 2024 17:14:47 +0000 (17:14 +0000)]
netdev: support binding dma-buf to netdevice
Add a netdev_dmabuf_binding struct which represents the
dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
rx queues on the netdevice. On the binding, the dma_buf_attach
& dma_buf_map_attachment will occur. The entries in the sg_table from
mapping will be inserted into a genpool to make it ready
for allocation.
The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
holds the dma-buf offset of the base of the chunk and the dma_addr of
the chunk. Both are needed to use allocations that come from this chunk.
We create a new type that represents an allocation from the genpool:
net_iov. We set up the net_iov allocation size in the
genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
allocated by the page pool and given to the drivers.
The user can unbind the dmabuf from the netdevice by closing the netlink
socket that established the binding. We do this so that the binding is
automatically unbound even if the userspace process crashes.
The binding and unbinding leave an indicator in struct netdev_rx_queue
that the given queue is bound, and the binding is actuated by resetting
the rx queue using the queue API.
The netdev_dmabuf_binding struct is refcounted, and releases its
resources only when all the refs are released.
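To show why a chunk owner needs both the dma-buf offset and the dma_addr of its chunk, here is a minimal userspace sketch. The struct layout, the demo_ names and the fixed 4096-byte page size are assumptions for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed allocation granularity */

/* Hypothetical per-chunk metadata, modelled on the description above. */
struct demo_chunk_owner {
	unsigned long base_offset;	/* dma-buf offset of the chunk base */
	uint64_t base_dma_addr;		/* DMA address of the chunk base */
	size_t num_iovs;		/* PAGE_SIZE allocations in this chunk */
};

static inline bool demo_iov_index_valid(const struct demo_chunk_owner *owner,
					size_t index)
{
	return index < owner->num_iovs;
}

/* DMA address the device uses for the index-th allocation of the chunk. */
static inline uint64_t demo_iov_dma_addr(const struct demo_chunk_owner *owner,
					 size_t index)
{
	return owner->base_dma_addr + (uint64_t)index * DEMO_PAGE_SIZE;
}

/* Offset into the dma-buf at which the same allocation is found. */
static inline unsigned long demo_iov_dmabuf_offset(const struct demo_chunk_owner *owner,
						   size_t index)
{
	return owner->base_offset + index * DEMO_PAGE_SIZE;
}
```

For a chunk based at dma-buf offset 8192 with DMA address 0x100000, allocation 3 resolves to DMA address 0x103000 and dma-buf offset 20480.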
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Kaiyuan Zhang <kaiyuanz@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> # excluding netlink
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-4-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mina Almasry [Tue, 10 Sep 2024 17:14:46 +0000 (17:14 +0000)]
net: netdev netlink api to bind dma-buf to a net device
API takes the dma-buf fd as input, and binds it to the netdevice. The
user can specify the rx queues to bind the dma-buf to.
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-3-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mina Almasry [Tue, 10 Sep 2024 17:14:45 +0000 (17:14 +0000)]
netdev: add netdev_rx_queue_restart()
Add netdev_rx_queue_restart(), which resets an rx queue using the
queue API recently merged[1].
The queue API was merged to enable the core net stack to reset individual
rx queues to actuate changes in the rx queue's configuration. In later
patches in this series, we will use netdev_rx_queue_restart() to reset
rx queues after binding or unbinding dmabuf configuration, which will
cause reallocation of the page_pool to repopulate its memory using the
new configuration.
[1] https://lore.kernel.org/netdev/20240430231420.699177-1-shailend@google.com/T/
Signed-off-by: David Wei <dw@davidwei.uk>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240910171458.219195-2-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Sep 2024 03:24:43 +0000 (20:24 -0700)]
Merge branch '200GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
idpf: XDP chapter II: convert Tx completion to libeth
Alexander Lobakin says:
XDP for idpf is currently 5 chapters:
* convert Rx to libeth;
* convert Tx completion to libeth (this);
* generic XDP and XSk code changes;
* actual XDP for idpf via libeth_xdp;
* XSk for idpf (^).
Part II does the following:
* adds generic libeth Tx completion routines;
* converts idpf to use generic libeth Tx comp routines;
* fixes Tx queue timeouts and robustifies Tx completion in general;
* fixes Tx event/descriptor flushes (writebacks).
Most idpf patches again remove more lines than they add.
Generic Tx completion helpers and structs are needed as libeth_xdp
(Ch. III) makes use of them. WB_ON_ITR is needed since XDPSQs don't
want to work without it at all. Tx queue timeouts fixes are needed
since without them, it's way easier to catch a Tx timeout event when
WB_ON_ITR is enabled.
* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
idpf: enable WB_ON_ITR
idpf: fix netdev Tx queue stop/wake
idpf: refactor Tx completion routines
netdevice: add netdev_tx_reset_subqueue() shorthand
idpf: convert to libeth Tx buffer completion
libeth: add Tx buffer completion helpers
====================
Link: https://patch.msgid.link/20240909205323.3110312-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Divya Koppera [Mon, 9 Sep 2024 11:43:39 +0000 (17:13 +0530)]
net: phy: microchip_t1: Cable Diagnostics for lan887x
Add support for cable diagnostics in lan887x PHY.
Using this we can diagnose connected/open/short wires and
also the length at which the cable fault occurred.
Signed-off-by: Divya Koppera <divya.koppera@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240909114339.3446-1-divya.koppera@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxime Chevallier [Tue, 10 Sep 2024 17:46:35 +0000 (19:46 +0200)]
net: ethtool: phy: Check the req_info.pdn field for GET commands
When processing the netlink GET requests to get PHY info, the req_info.pdn
pointer is NULL when no PHY matches the requested parameters, such as when
the phy_index is invalid, or there's simply no PHY attached to the
interface.
Therefore, check the req_info.pdn pointer for NULL instead of
dereferencing it.
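The pattern is sketched below in plain C. The struct names, the field layout and the choice of -EOPNOTSUPP as the error are assumptions for illustration; only the "test the pointer before using it" shape comes from the text above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down versions of the request structures. */
struct demo_phy_node { int phy_index; };
struct demo_req_info { struct demo_phy_node *pdn; };

/* Returns 0 and fills *index_out, or a negative errno if no PHY matched. */
static int demo_phy_get_index(const struct demo_req_info *req, int *index_out)
{
	/* pdn is NULL when no PHY matches the request: bail out early */
	if (!req->pdn)
		return -EOPNOTSUPP;

	*index_out = req->pdn->phy_index;
	return 0;
}
```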
Suggested-by: Eric Dumazet <edumazet@google.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iKRW0WpGAh1tKqY345D8WkYCPm3Y9ym--Si42JZrQAu1g@mail.gmail.com/T/#mfced87d607d18ea32b3b4934dfa18d7b36669285
Fixes: 17194be4c8e1 ("net: ethtool: Introduce a command to list PHYs on an interface")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240910174636.857352-1-maxime.chevallier@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Tue, 10 Sep 2024 22:09:13 +0000 (15:09 -0700)]
net: gianfar: fix NVMEM mac address
If nvmem loads after the ethernet driver, mac address assignments will
not take effect. of_get_ethdev_address returns EPROBE_DEFER in such a
case so we need to handle that to avoid eth_hw_addr_random.
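A userspace sketch of the probe-ordering logic follows. The demo_ helpers, the device struct and the fixed "random" address are hypothetical stand-ins for of_get_ethdev_address() and eth_hw_addr_random(); DEMO_EPROBE_DEFER mirrors the in-kernel value 517, which is not exported through the userspace errno.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_EPROBE_DEFER 517	/* in-kernel errno, defined here for the demo */

struct demo_dev {
	bool nvmem_ready;	/* has the nvmem provider probed yet? */
	bool has_stored_mac;	/* is a MAC address available at all? */
	unsigned char mac[6];
	bool mac_is_random;
};

static const unsigned char demo_stored_mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };
/* Placeholder for a randomly generated, locally administered address. */
static const unsigned char demo_fallback_mac[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

/* Hypothetical lookup standing in for of_get_ethdev_address(). */
static int demo_get_ethdev_address(struct demo_dev *dev)
{
	if (!dev->nvmem_ready)
		return -DEMO_EPROBE_DEFER;
	if (!dev->has_stored_mac)
		return -ENODEV;
	memcpy(dev->mac, demo_stored_mac, sizeof(dev->mac));
	return 0;
}

static int demo_probe(struct demo_dev *dev)
{
	int err = demo_get_ethdev_address(dev);

	/* Provider not ready yet: ask to be probed again later instead of
	 * settling for a random address. */
	if (err == -DEMO_EPROBE_DEFER)
		return err;

	/* Any other failure means no address exists, so fall back. */
	if (err) {
		memcpy(dev->mac, demo_fallback_mac, sizeof(dev->mac));
		dev->mac_is_random = true;
	}
	return 0;
}
```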
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910220913.14101-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jonathan Cooper [Tue, 10 Sep 2024 15:30:13 +0000 (16:30 +0100)]
sfc: Add X4 PF support
Add X4 series. Most functionality is the same as previous
EF10 nics but enough is different to warrant a new nic type struct
and revision; for example legacy interrupts and SRIOV are
not supported.
Most removed features will be re-added later as new implementations.
Signed-off-by: Jonathan Cooper <jonathan.s.cooper@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240910153014.12803-1-jonathan.s.cooper@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Tue, 10 Sep 2024 12:06:35 +0000 (13:06 +0100)]
qlcnic: make read-only const array key static
Don't populate the const read-only array key on the stack at
run time, instead make it static.
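As a generic illustration of the change (the function name and the array contents are placeholders, not the qlcnic key), a const array declared static inside a function is emitted once into read-only data instead of being rebuilt on the stack on every call:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer of a read-only lookup table. */
static uint64_t demo_sum_key(void)
{
	/* 'static const': one copy in read-only data, no per-call stores
	 * to initialise a stack array. The values are placeholders. */
	static const uint64_t key[] = { 1, 2, 3, 4 };
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < sizeof(key) / sizeof(key[0]); i++)
		sum += key[i];
	return sum;
}
```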
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240910120635.115266-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 22:57:53 +0000 (15:57 -0700)]
Merge branch 'mptcp-fallback-to-tcp-after-3-mpc-drop-cache'
Matthieu Baerts says:
====================
mptcp: fallback to TCP after 3 MPC drop + cache
The SYN + MPTCP_CAPABLE packets could be explicitly dropped by firewalls
somewhere in the network, e.g. if they decide to drop packets based on
the TCP options, instead of stripping them off.
The idea of this series is to fall back to TCP after 3 SYN+MPC drops
(patch 2). If the connection succeeds after the fallback, it very likely
means a blackhole has been detected. In this case (patch 3), MPTCP can
be disabled for a certain period of time, 1h by default. If after this
period, MPTCP is still blocked, the period is doubled. This technique is
inspired by the one used by TCP FastOpen.
This should help applications which want to use MPTCP by default on the
client side if available.
====================
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-0-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Mon, 9 Sep 2024 20:09:23 +0000 (22:09 +0200)]
mptcp: disable active MPTCP in case of blackhole
An MPTCP firewall blackhole can be detected if the following SYN
retransmission after a fallback to "plain" TCP is accepted.
In case of blackhole, a similar technique to the one in place with TFO
is now used: MPTCP can be disabled for a certain period of time, 1h by
default. This time period will grow exponentially when more blackhole
issues get detected right after MPTCP is re-enabled and will reset to
the initial value when the blackhole issue goes away.
The blackhole period can be modified thanks to a new sysctl knob:
blackhole_timeout. Two new MIB counters help to understand what's
happening:
- 'Blackhole', incremented when a blackhole is detected.
- 'MPCapableSYNTXDisabled', incremented when an MPTCP connection
directly falls back to TCP during the blackhole period.
Because the technique is inspired by the one used by TFO, an important
part of the new code is similar to what can be found in tcp_fastopen.c, with
some adaptations to the MPTCP case.
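The exponential growth of the disable period can be sketched as follows. The function name and the cap of six doublings are assumptions (the cap is borrowed from how TFO bounds its multiplier); the 3600-second default comes from the "1h by default" above.

```c
#include <stdint.h>

#define DEMO_BLACKHOLE_TIMEOUT_SECS 3600u	/* 1h default, per the text */
#define DEMO_MAX_DOUBLINGS 6u			/* assumption: TFO-style cap */

/*
 * Period (in seconds) during which MPTCP stays disabled after the
 * 'times'-th consecutive blackhole detection. 'times' is reset to 0 by
 * the caller once the blackhole issue goes away.
 */
static inline uint64_t demo_blackhole_period_secs(unsigned int times,
						  uint32_t timeout_secs)
{
	unsigned int shift;

	/* Nothing detected, or the feature is turned off via the sysctl. */
	if (times == 0 || timeout_secs == 0)
		return 0;

	shift = times - 1;
	if (shift > DEMO_MAX_DOUBLINGS)
		shift = DEMO_MAX_DOUBLINGS;

	return (uint64_t)timeout_secs << shift;
}
```

With the default timeout, the first detection disables MPTCP for 3600 s, the second for 7200 s and the third for 14400 s; under the assumed cap the period stops growing at 64 times the base value.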
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/57
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-3-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Mon, 9 Sep 2024 20:09:22 +0000 (22:09 +0200)]
mptcp: fallback to TCP after SYN+MPC drops
Some middleboxes might be nasty with MPTCP, and decide to drop packets
with MPTCP options, instead of just dropping the MPTCP options (or
letting them pass...).
In this case, it sounds better to fall back to "plain" TCP after 2
retransmissions, and try again.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/477
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-2-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Mon, 9 Sep 2024 20:09:21 +0000 (22:09 +0200)]
mptcp: export mptcp_subflow_early_fallback()
This helper will be used outside protocol.h in the following commit.
While at it, also add a 'pr_fallback()' debug print, to help identify
fallbacks.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240909-net-next-mptcp-fallback-x-mpc-v1-1-da7ebb4cd2a3@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 22:49:09 +0000 (15:49 -0700)]
Merge branch 'net-hsr-use-the-seqnr-lock-for-frames-received-via-interlink-port'
Sebastian Andrzej Siewior says:
====================
net: hsr: Use the seqnr lock for frames received via interlink port.
This is follow-up to the thread at
https://lore.kernel.org/all/20240904133725.1073963-1-edumazet@google.com/
====================
Link: https://patch.msgid.link/20240906132816.657485-1-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Fri, 6 Sep 2024 13:25:32 +0000 (15:25 +0200)]
net: hsr: Remove interlink_sequence_nr.
Remove interlink_sequence_nr which is unused.
[ bigeasy: split out from Eric's patch ].
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240906132816.657485-3-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sebastian Andrzej Siewior [Fri, 6 Sep 2024 13:25:31 +0000 (15:25 +0200)]
net: hsr: Use the seqnr lock for frames received via interlink port.
syzbot reported that the seqnr_lock is not acquired for frames received
over the interlink port. In the interlink case a new seqnr is generated
and assigned to the frame.
Frames received over the slave port already have a sequence number
assigned, so the lock is not required.
Acquire the hsr_priv::seqnr_lock during the invocation of
hsr_forward_skb() if a packet has been received from the interlink port.
Reported-by: syzbot+3d602af7549af539274e@syzkaller.appspotmail.com
Closes: https://groups.google.com/g/syzkaller-bugs/c/KppVvGviGg4/m/EItSdCZdBAAJ
Fixes: 5055cccfc2d1c ("net: hsr: Provide RedBox support (HSR-SAN)")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Tested-by: Lukasz Majewski <lukma@denx.de>
Link: https://patch.msgid.link/20240906132816.657485-2-bigeasy@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 20:46:56 +0000 (13:46 -0700)]
Merge tag 'wireless-next-2024-09-11' of git://git./linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.12
The last -next "new features" pull request for v6.12. The stack now
supports DFS on MLO but otherwise nothing really standing out.
Major changes:
cfg80211/mac80211
* EHT rate support in AQL airtime
* DFS support for MLO
rtw89
* complete BT-coexistence code for RTL8852BT
* RTL8922A WoWLAN net-detect support
* tag 'wireless-next-2024-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (105 commits)
wifi: brcmfmac: cfg80211: Convert comma to semicolon
wifi: rsi: Remove an unused field in struct rsi_debugfs
wifi: libertas: Cleanup unused declarations
wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_bus_probe()
wifi: wilc1000: Convert using devm_clk_get_optional_enabled() in wilc_sdio_probe()
wifi: wilc1000: fix potential RCU dereference issue in wilc_parse_join_bss_param
wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext()
wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop()
wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors
wifi: cfg80211: fix kernel-doc for per-link data
wifi: mt76: mt7925: replace chan config with extend txpower config for clc
wifi: mt76: mt7925: fix a potential array-index-out-of-bounds issue for clc
wifi: mt76: mt7615: check devm_kasprintf() returned value
wifi: mt76: mt7925: convert comma to semicolon
wifi: mt76: mt7925: fix a potential association failure upon resuming
wifi: mt76: Avoid multiple -Wflex-array-member-not-at-end warnings
wifi: mt76: mt7921: Check devm_kasprintf() returned value
wifi: mt76: mt7915: check devm_kasprintf() returned value
wifi: mt76: mt7915: avoid long MCU command timeouts during SER
wifi: mt76: mt7996: fix uninitialized TLV data
...
====================
Link: https://patch.msgid.link/20240911084147.A205DC4AF0F@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Wed, 11 Sep 2024 10:06:12 +0000 (11:06 +0100)]
Merge branch 'lan743x-phylink'
Raju Lakkaraju says:
====================
Add support to PHYLINK for LAN743x/PCI11x1x chips
This is the follow-up patch series of
https://lkml.iu.edu/hypermail/linux/kernel/2310.2/02078.html
Divide the PHYLINK adaptation and SFP modifications into two separate patch
series.
The current patch series focuses on transitioning the LAN743x driver's PHY
support from phylib to phylink.
Tested on PCI11010 Rev-1 Evaluation board
Change List:
============
V5 -> V6:
- Remove the lan743x_find_max_speed( ) function. Not required
- Add EEE enable check before calling lan743x_mac_eee_enable( ) function
V4 -> V5:
- Remove the fixed_phy_unregister( ) function. Not required
- Remove the "phydev->eee_enabled" check to update the MAC EEE
enable/disable
- Call lan743x_mac_eee_enable() with true after update tx_lpi_timer.
- Add phy_support_eee() to initialize the EEE flags
V3 -> V4:
- Add fixed-link patch along with this series.
Note: This code was developed by Mr.Russell King
Ref:
https://lore.kernel.org/netdev/LV8PR11MB8700C786F5F1C274C73036CC9F8E2@LV8PR11MB8700.namprd11.prod.outlook.com/T/#me943adf54f1ea082edf294aba448fa003a116815
- Change phylink fixed-link function header's string from "Returns" to
"Returns:"
- Remove the EEE private variable from LAN743x adapter structure and fix the
EEE's set/get functions
- Replace setting the individual caps (i.e. _RGMII, _RGMII_ID, _RGMII_RXID and
_RGMII_TXID) with the phy_interface_set_rgmii( ) function
- Change lan743x_set_eee( ) to lan743x_mac_eee_enable( )
V2 -> V3:
- Remove the unwanted parens in each of these if() sub-blocks
- Replace "to_net_dev(config->dev)" with "netdev".
- Add GMII_ID/RGMII_TXID/RGMII_RXID in supported_interfaces
- Fix the lan743x_phy_handle_exists( ) return type
V1 -> V2:
- Fix the Russell King's comments i.e. remove the speed, duplex update in
lan743x_phylink_mac_config( )
- pre-March 2020 legacy support has been removed
V0 -> V1:
- Integrate with Synopsys DesignWare XPCS drivers
- Based on external review comments,
- Changes made to SGMII interface support only 1G/100M/10M bps speed
- Changes made to 2500Base-X interface support only 2.5Gbps speed
- Add check for not is_sgmii_en with is_sfp_support_en support
- Change the "pci11x1x_strap_get_status" function return type from void to
int
- Add ethtool phylink wol, eee, pause get/set functions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Fri, 6 Sep 2024 10:35:11 +0000 (16:05 +0530)]
net: lan743x: Add support to ethtool phylink get and set settings
Add support to ethtool phylink functions:
- get/set settings like speed, duplex etc
- get/set the wake-on-lan (WOL)
- get/set the energy-efficient ethernet (EEE)
- get/set the pause
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Fri, 6 Sep 2024 10:35:10 +0000 (16:05 +0530)]
net: lan743x: Migrate phylib to phylink
Migrate phy support from phylib to phylink.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Fri, 6 Sep 2024 10:35:09 +0000 (16:05 +0530)]
net: lan743x: Create separate Link Speed Duplex state function
Create separate Link Speed Duplex (LSD) update state function from
lan743x_sgmii_config () to use as subroutine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Fri, 6 Sep 2024 10:35:08 +0000 (16:05 +0530)]
net: lan743x: Create separate PCS power reset function
Create separate PCS power reset function from lan743x_sgmii_config () to use
as subroutine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 6 Sep 2024 10:35:07 +0000 (16:05 +0530)]
net: phylink: Add phylink_set_fixed_link() to configure fixed link state in phylink
The function allows for the configuration of a fixed link state for a given
phylink instance. This addition is particularly useful for network devices that
operate with a fixed link configuration, where the link parameters do not change
dynamically. By using `phylink_set_fixed_link()`, drivers can easily set up
the fixed link state during initialization or configuration changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 11 Sep 2024 03:05:09 +0000 (20:05 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice: support devlink subfunction
Michal Swiatkowski says:
Currently ice driver does not allow creating more than one networking
device per physical function. The only way to have more hardware backed
netdev is to use SR-IOV.
Following patchset adds support for devlink port API. For each new
pcisf type port, driver allocates new VSI, configures all resources
needed, including dynamically MSIX vectors, program rules and registers
new netdev.
This series supports only one Tx/Rx queue pair per subfunction.
Example commands:
devlink port add pci/0000:31:00.1 flavour pcisf pfnum 1 sfnum 1000
devlink port function set pci/0000:31:00.1/1 hw_addr 00:00:00:00:03:14
devlink port function set pci/0000:31:00.1/1 state active
devlink port function del pci/0000:31:00.1/1
Make the port representor and eswitch code generic to support
subfunction representor type.
VSI configuration is slightly different between VF and SF. It needs to
be reflected in the code.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: subfunction activation and base devlink ops
ice: basic support for VLAN in subfunctions
ice: support subfunction devlink Tx topology
ice: implement netdevice ops for SF representor
ice: check if SF is ready in ethtool ops
ice: don't set target VSI for subfunction
ice: create port representor for SF
ice: make representor code generic
ice: implement netdev for subfunction
ice: base subfunction aux driver
ice: allocate devlink for subfunction
ice: treat subfunction VSI the same as PF VSI
ice: add basic devlink subfunctions support
ice: export ice ndo_ops functions
ice: add new VSI type for subfunctions
====================
Link: https://patch.msgid.link/20240906223010.2194591-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 03:01:15 +0000 (20:01 -0700)]
Merge tag 'mlx5-updates-2024-09-02' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2024-08-29
HW-Managed Flow Steering in mlx5 driver
Yevgeny Kliteynik says:
=======================
1. Overview
-----------
ConnectX devices support packet matching, modification, and redirection.
This functionality is referred to as Flow Steering.
To configure a steering rule, the rule is written to the device-owned
memory. This memory is accessed and cached by the device when processing
a packet.
The first implementation of Flow Steering was done in FW, and it is
referred to in the mlx5 driver as Device-Managed Flow Steering (DMFS).
Later we introduced SW-managed Flow Steering (SWS or SMFS), where the
driver is writing directly to the device's configuration memory (ICM)
through RC QP using RDMA operations (RDMA-read and RDMA-write), thus
achieving higher rates of rule insertion/deletion.
Now we introduce a new flow steering implementation: HW-Managed Flow
Steering (HWS or HMFS).
In this new approach, the driver is configuring steering rules directly
to the HW using the WQs with a special new type of WQE. This way we can
reach higher rule insertion/deletion rate with much lower CPU utilization
compared to SWS.
The key benefits of HWS as opposed to SWS:
+ HW manages the steering decision tree
- HW calculates CRC for each entry
- HW handles tree hash collisions
- HW & FW manage objects refcount
+ HW keeps cache coherency:
- HW provides tree access locking and synchronization
- HW provides notification on completion
+ Insertion rate isn’t affected by background traffic
- Dedicated HW components that handle insertion
2. Performance
--------------
Measuring Connection Tracking with simple IPv4 flows w/o NAT, we
are able to get ~5 times more flows offloaded per second using HWS.
3. Configuration
----------------
The enablement of HWS mode in eswitch manager is done using the same
devlink param that is already used for switching between FW-managed
steering and SW-managed steering modes:
# devlink dev param set pci/<PCI_ID> name flow_steering_mode cmod runtime value hmfs
4. Upstream Submission
----------------------
HWS support consists of 3 main components:
+ Steering:
- The lower layer that exposes HWS API to upper layers and implements
all the management of flow steering building blocks
+ FS-Core
- Implementation of fs_hws layer to enable fs_core to use HWS instead
of FW or SW steering
- Create HW steering action pools to utilize the ability of HWS to
share steering actions among different rules
- Add support for configuring HWS mode through devlink command,
similar to configuring SWS mode
+ Connection Tracking
- Implementation of CT support for HW steering
- Hooks up the CT ops for the new steering mode and uses the HWS API
to implement connection tracking.
Because of the large number of patches, we need to perform the submission
in several separate patch series. This series is the first submission that
lays the ground work for the next submissions, where an actual user of HWS
will be added.
5. Patches in this series
-------------------------
This patch series contains implementation of the first bullet from above.
=======================
* tag 'mlx5-updates-2024-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: HWS, added API and enabled HWS support
net/mlx5: HWS, added send engine and context handling
net/mlx5: HWS, added debug dump and internal headers
net/mlx5: HWS, added backward-compatible API handling
net/mlx5: HWS, added memory management handling
net/mlx5: HWS, added vport handling
net/mlx5: HWS, added modify header pattern and args handling
net/mlx5: HWS, added FW commands handling
net/mlx5: HWS, added matchers functionality
net/mlx5: HWS, added definers handling
net/mlx5: HWS, added rules handling
net/mlx5: HWS, added tables handling
net/mlx5: HWS, added actions handling
net/mlx5: Added missing definitions in preparation for HW Steering
net/mlx5: Added missing mlx5_ifc definition for HW Steering
====================
Link: https://patch.msgid.link/20240909181250.41596-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 02:00:47 +0000 (19:00 -0700)]
Merge tag 'ipsec-next-2024-09-10' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2024-09-10
1) Remove an unneeded WARN_ON on packet offload.
From Patrisious Haddad.
2) Add a copy from skb_seq_state to buffer function.
This is needed for the upcoming IPTFS patchset.
From Christian Hopps.
3) Spelling fix in xfrm.h.
From Simon Horman.
4) Speed up xfrm policy insertions.
From Florian Westphal.
5) Add and revert a patch to support xfrm interfaces
for packet offload. This patch was just half cooked.
6) Extend usage of the new xfrm_policy_is_dead_or_sk helper.
From Florian Westphal.
7) Update comments on sdb and xfrm_policy.
From Florian Westphal.
8) Fix a null pointer dereference in the new policy insertion
code. From Florian Westphal.
9) Fix an uninitialized variable in the new policy insertion
code. From Nathan Chancellor.
* tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()
xfrm: policy: fix null dereference
Revert "xfrm: add SA information to the offloaded packet"
xfrm: minor update to sdb and xfrm_policy comments
xfrm: policy: use recently added helper in more places
xfrm: add SA information to the offloaded packet
xfrm: policy: remove remaining use of inexact list
xfrm: switch migrate to xfrm_policy_lookup_bytype
xfrm: policy: don't iterate inexact policies twice at insert time
selftests: add xfrm policy insertion speed test script
xfrm: Correct spelling in xfrm.h
net: add copy from skb_seq_state to buffer function
xfrm: Remove documentation WARN_ON to limit return values for offloaded SA
====================
Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 01:42:48 +0000 (18:42 -0700)]
Merge branch 'bnxt_en-msix-improvements'
Michael Chan says:
====================
bnxt_en: MSIX improvements
This patchset makes some improvements related to MSIX. The first
patch adjusts the default MSIX vectors assigned for RoCE. On the
PF, the number of MSIX is increased to 64 from the current 9. The
second patch allocates additional MSIX vectors ahead of time when
changing ethtool channels if dynamic MSIX is supported. The 3rd
patch makes sure that the IRQ name is not truncated.
====================
Link: https://patch.msgid.link/20240909202737.93852-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Edwin Peer [Mon, 9 Sep 2024 20:27:37 +0000 (13:27 -0700)]
bnxt_en: resize bnxt_irq name field to fit format string
The name field of struct bnxt_irq is written using snprintf in
bnxt_setup_msix(). Make the field large enough to fit the maximal
formatted string to prevent truncation. Truncated IRQ names are
less meaningful to the user. For example, "enp4s0f0np0-TxRx-0"
gets truncated to "enp4s0f0np0-TxRx-" with the existing code.
Make sure we have space for the extra characters added to the IRQ
names:
- the characters introduced by the static format string: hyphens
- the maximal static substituted ring type string: "TxRx"
- the maximum length of an integer formatted as a string, even
though reasonable ring numbers would never be as long as this.
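The sizing argument can be reproduced in userspace with snprintf(). The DEMO_ constants are assumptions: the old 18-byte size is chosen so that the sketch reproduces the truncation quoted above, and the new size simply adds up the pieces listed in the bullets.

```c
#include <stdio.h>

#define DEMO_IFNAMSIZ 16	/* interface name size, including the NUL */

/* Assumed old size: enough for the name plus two extra characters. */
#define DEMO_OLD_IRQ_NAME_LEN (DEMO_IFNAMSIZ + 2)

/* "-" + "TxRx" + "-" + up to 11 characters for a formatted int
 * (the longest case is "-2147483648"). */
#define DEMO_IRQ_NAME_EXTRA (1 + 4 + 1 + 11)
#define DEMO_NEW_IRQ_NAME_LEN (DEMO_IFNAMSIZ + DEMO_IRQ_NAME_EXTRA)

/*
 * Formats "<ifname>-<type>-<ring>" into buf. Returns the length the full
 * string needs; a return value >= len means the output was truncated.
 */
static int demo_format_irq_name(char *buf, size_t len, const char *ifname,
				const char *type, int ring)
{
	return snprintf(buf, len, "%s-%s-%d", ifname, type, ring);
}
```

Formatting "enp4s0f0np0", "TxRx" and ring 0 into the 18-byte buffer leaves "enp4s0f0np0-TxRx-", while the larger buffer holds the full "enp4s0f0np0-TxRx-0".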
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 9 Sep 2024 20:27:36 +0000 (13:27 -0700)]
bnxt_en: Add MSIX check in bnxt_check_rings()
bnxt_check_rings() is called to ensure that we have the hardware ring
resources before committing to reinitialize with the new number of
rings. MSIX vectors are never checked at this point, because up
until recently we must first disable MSIX before we can allocate the
new set of MSIX vectors.
Now that we support dynamic MSIX allocation, check to make sure we
can dynamically allocate the new MSIX vectors as the last step in
bnxt_check_rings() if dynamic MSIX is supported.
For example, the IOMMU group may limit the number of MSIX vectors
for the device. With this patch, the ring change will fail more
gracefully when there are not enough MSIX vectors.
It is also better to move bnxt_check_rings() to be called as the last
step when changing ethtool rings.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 9 Sep 2024 20:27:35 +0000 (13:27 -0700)]
bnxt_en: Increase the number of MSIX vectors for RoCE device
If RoCE is supported on the device, set the number of RoCE MSIX vectors
to the number of online CPUs + 1 and capped at these maximums:
VF: 2
NPAR: 5
PF: 64
For the PF, the maximum is now increased from the previous value
of 9 to get better performance for kernel applications.
Remove the unnecessary check for BNXT_FLAG_ROCE_CAP.
bnxt_set_dflt_ulp_msix() will only be called if the flag is set.
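The default described above amounts to a capped "online CPUs + 1". A minimal sketch, assuming hypothetical names (enum fn_type, roce_msix_cap(), dflt_roce_msix()) that are not taken from the driver:

```c
#include <assert.h>

enum fn_type { FN_VF, FN_NPAR, FN_PF };   /* hypothetical function kinds */

/* Per-function maximums as listed in the commit message. */
static int roce_msix_cap(enum fn_type type)
{
    switch (type) {
    case FN_VF:
        return 2;
    case FN_NPAR:
        return 5;
    default:
        return 64;
    }
}

/* Default RoCE MSIX count: online CPUs + 1, capped per function type. */
static int dflt_roce_msix(enum fn_type type, int online_cpus)
{
    int want, cap;

    assert(online_cpus > 0);
    want = online_cpus + 1;
    cap = roce_msix_cap(type);
    return want < cap ? want : cap;
}
```

For example, a PF on an 8-CPU system would get 9 vectors, while a PF on a 128-CPU system would be capped at 64.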
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909202737.93852-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rob Herring (Arm) [Mon, 9 Sep 2024 17:23:42 +0000 (12:23 -0500)]
net: amlogic,meson-dwmac: Fix "amlogic,tx-delay-ns" schema
The "amlogic,tx-delay-ns" property schema has an unnecessary type
reference, as "-ns" is a standard unit suffix, and the constraints are in
freeform text rather than schema.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://patch.msgid.link/20240909172342.487675-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Sep 2024 01:34:54 +0000 (18:34 -0700)]
Merge branch 'net-xilinx-axienet-partial-checksum-offload-improvements'
Sean Anderson says:
====================
net: xilinx: axienet: Partial checksum offload improvements
Partial checksum offload is not always used when it could be.
Enable it in more cases.
====================
Link: https://patch.msgid.link/20240909161016.1149119-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:16 +0000 (12:10 -0400)]
net: xilinx: axienet: Relax partial rx checksum checks
The partial rx checksum feature computes a checksum over the entire
packet, regardless of the L3 protocol. Remove the check for IPv4.
Additionally, testing with csum.py (from kselftests) shows no anomalies
with 64-byte packets, so we can remove that check as well.
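To illustrate why a checksum over the whole packet does not depend on the L3 protocol, here is a minimal sketch of a 16-bit ones' complement sum in the style of RFC 1071. It is not the driver or hardware implementation; csum_add_bytes() and csum_fold_sketch() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add the bytes to a running sum as big-endian 16-bit words. The 32-bit
 * accumulator is sufficient for packet-sized inputs (below 128 KiB).
 */
static uint32_t csum_add_bytes(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    assert(data || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                      /* odd trailing byte, zero padded */
        sum += (uint32_t)data[i] << 8;
    return sum;
}

/* Fold the carries back in to get the 16-bit ones' complement sum. */
static uint16_t csum_fold_sketch(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

Nothing in the loop looks at headers: the same code sums an IPv4 packet, an IPv6 packet or any other payload, and the stack can later subtract the parts it does not want covered.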
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909161016.1149119-5-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:15 +0000 (12:10 -0400)]
net: xilinx: axienet: Set RXCSUM in features
When it is supported by hardware, we enable receive checksum offload
unconditionally. Update features to reflect this.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20240909161016.1149119-4-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:14 +0000 (12:10 -0400)]
net: xilinx: axienet: Enable NETIF_F_HW_CSUM for partial tx checksumming
Partial tx checksumming is completely generic and does not depend on the
L3/L4 protocol. Signal this to the net subsystem by enabling the
more-generic offload feature (instead of restricting ourselves to
TCP/UDP over IPv4 checksumming only, as is necessary with full
checksumming).
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909161016.1149119-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Mon, 9 Sep 2024 16:10:13 +0000 (12:10 -0400)]
net: xilinx: axienet: Remove unused checksum variables
These variables are set but never used. Remove them.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20240909161016.1149119-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Mon, 9 Sep 2024 13:46:12 +0000 (14:46 +0100)]
rtase: Fix spelling mistake: "tx_underun" -> "tx_underrun"
There is a spelling mistake in the struct field tx_underun, rename
it to tx_underrun.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240909134612.63912-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Colin Ian King [Mon, 9 Sep 2024 14:00:21 +0000 (15:00 +0100)]
r8169: Fix spelling mistake: "tx_underun" -> "tx_underrun"
There is a spelling mistake in the struct field tx_underun, rename
it to tx_underrun.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/20240909140021.64884-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Taht [Mon, 9 Sep 2024 09:16:28 +0000 (11:16 +0200)]
sch_cake: constify inverse square root cache
sch_cake uses a cache of the first 16 values of the inverse square root
calculation for the Cobalt AQM to save some cycles on the fast path.
This cache is populated when the qdisc is first loaded, but there's
really no reason why it can't just be pre-populated. So change it to be
pre-populated with constants, which also makes it possible to constify
it.
This gives a modest space saving for the module (not counting debug data):
.text: -224 bytes
.rodata: +80 bytes
.bss: -64 bytes
Total: -192 bytes
Signed-off-by: Dave Taht <dave.taht@gmail.com>
[ fixed up comment, rewrote commit message ]
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20240909091630.22177-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pieter Van Trappen [Mon, 9 Sep 2024 13:42:59 +0000 (15:42 +0200)]
net: dsa: microchip: update tag_ksz masks for KSZ9477 family
Remove magic number 7 by introducing a GENMASK macro instead.
Remove magic number 0x80 by using the BIT macro instead.
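As a rough illustration of the substitution, the sketch below defines user-space equivalents of the kernel's BIT() and GENMASK() macros and shows that they yield the same values as the magic numbers. The field names (TAG_PORT_MASK, TAG_FLAG_BIT) and the mapping of each value to a tag field are assumptions for the example, not the names used in tag_ksz.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel macros. */
#define BIT(n)         (1UL << (n))
#define GENMASK(h, l)  (((~0UL) << (l)) & \
                        (~0UL >> (sizeof(unsigned long) * 8 - 1 - (h))))

#define TAG_PORT_MASK  GENMASK(2, 0)   /* hypothetical name, replaces 7 */
#define TAG_FLAG_BIT   BIT(7)          /* hypothetical name, replaces 0x80 */

static_assert(TAG_PORT_MASK == 7, "mask must match the old magic number");
static_assert(TAG_FLAG_BIT == 0x80, "bit must match the old magic number");

/* Extract the low three bits of a tag byte. */
static unsigned int tag_port(uint8_t tag)
{
    return tag & TAG_PORT_MASK;
}

/* Test whether bit 7 of a tag byte is set. */
static int tag_flag_set(uint8_t tag)
{
    return (tag & TAG_FLAG_BIT) != 0;
}
```

The named masks document which bits are meant, while the generated code stays the same as with the literal constants.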
Signed-off-by: Pieter Van Trappen <pieter.van.trappen@cern.ch>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20240909134301.75448-1-vtpieter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 23:55:24 +0000 (16:55 -0700)]
Merge branch 'net-timestamp-introduce-a-flag-to-filter-out-rx-software-and-hardware-report'
Jason Xing says:
====================
net-timestamp: introduce a flag to filter out rx software and hardware report
When one socket sets SOF_TIMESTAMPING_RX_SOFTWARE, the system-wide
netstamp_needed_key is turned on. Other sockets that only set
SOF_TIMESTAMPING_SOFTWARE are then affected and report rx timestamp
information even though they never set the
SOF_TIMESTAMPING_RX_SOFTWARE generation flag.
How to solve it without breaking users?
We introduce a new flag named SOF_TIMESTAMPING_OPT_RX_FILTER. Using
it together with SOF_TIMESTAMPING_SOFTWARE can stop reporting the
rx software timestamp.
Similarly, we also filter out the hardware case, where one process
enables the rx hardware generation flag and another process that only
passes SOF_TIMESTAMPING_RAW_HARDWARE then gets the timestamp. Setting
both SOF_TIMESTAMPING_RAW_HARDWARE and SOF_TIMESTAMPING_OPT_RX_FILTER
stops rx hardware timestamps from being reported once this patch is
applied.
v6: https://lore.kernel.org/20240906095640.77533-1-kerneljasonxing@gmail.com
v5: https://lore.kernel.org/20240905071738.3725-1-kerneljasonxing@gmail.com
v4: https://lore.kernel.org/20240830153751.86895-1-kerneljasonxing@gmail.com
v3: https://lore.kernel.org/20240828160145.68805-1-kerneljasonxing@gmail.com
v2: https://lore.kernel.org/20240825152440.93054-1-kerneljasonxing@gmail.com
====================
Link: https://patch.msgid.link/20240909015612.3856-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Mon, 9 Sep 2024 01:56:12 +0000 (09:56 +0800)]
net-timestamp: add selftests for SOF_TIMESTAMPING_OPT_RX_FILTER
Test a few possible cases where we use SOF_TIMESTAMPING_OPT_RX_FILTER
with software or hardware report/generation flag.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Mon, 9 Sep 2024 01:56:11 +0000 (09:56 +0800)]
net-timestamp: introduce SOF_TIMESTAMPING_OPT_RX_FILTER flag
Introduce a new flag, SOF_TIMESTAMPING_OPT_RX_FILTER, in the receive
path. Users can set it together with SOF_TIMESTAMPING_SOFTWARE to filter
out rx software timestamp reports, especially after a process turns on
netstamp_needed_key, which timestamps every incoming skb.
Previously, if one application started first and turned on
netstamp_needed_key, another application passing only
SOF_TIMESTAMPING_SOFTWARE could also get rx timestamps. The new flag
handles this case without breaking existing users.
Quoting Willem to explain why we need the flag:
"why a process would want to request software timestamp reporting, but
not receive software timestamp generation. The only use I see is when
the application does request
SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE."
Similarly, this new flag can also be used for the hardware case: when it
is set together with SOF_TIMESTAMPING_RAW_HARDWARE, we won't receive
hardware receive timestamps.
A note on the errqueue handling in this patch:
We need to handle the egress path carefully, or else reporting the tx
timestamp will fail. Both the egress path and the ingress path end up
calling sock_recv_timestamp(), so we have to distinguish them.
The errqueue is a good indicator of the flow direction.
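The reporting rule described above can be written as a small decision function. This is a sketch of one reading of the commit message, not the kernel code: the SK_* names are local stand-ins whose values are assumed to mirror the SOF_TIMESTAMPING_* bits in the uapi header, and both helpers assume a timestamp was already generated for the packet.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins, assumed to mirror include/uapi/linux/net_tstamp.h. */
#define SK_TX_SOFTWARE    (1u << 1)
#define SK_RX_HARDWARE    (1u << 2)
#define SK_RX_SOFTWARE    (1u << 3)
#define SK_SOFTWARE       (1u << 4)
#define SK_RAW_HARDWARE   (1u << 6)
#define SK_OPT_RX_FILTER  (1u << 17)

static_assert((SK_OPT_RX_FILTER & (SK_OPT_RX_FILTER - 1)) == 0,
              "the filter flag must be a single bit");

/* Should a software timestamp be reported to this socket? */
static bool report_sw_tstamp(unsigned int flags, bool from_errqueue)
{
    if (!(flags & SK_SOFTWARE))
        return false;
    /* tx timestamps arrive via the error queue and are never filtered */
    return (flags & SK_RX_SOFTWARE) || from_errqueue ||
           !(flags & SK_OPT_RX_FILTER);
}

/* Same rule for the hardware report flag. */
static bool report_hw_tstamp(unsigned int flags, bool from_errqueue)
{
    if (!(flags & SK_RAW_HARDWARE))
        return false;
    return (flags & SK_RX_HARDWARE) || from_errqueue ||
           !(flags & SK_OPT_RX_FILTER);
}
```

Without the new flag the old behaviour is preserved: a socket with only the report flag still sees rx timestamps generated on behalf of another socket. With the flag, it sees them only if it also requested rx generation itself, while tx timestamps read from the error queue are unaffected.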
Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240909015612.3856-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jason Xing [Sun, 8 Sep 2024 12:41:41 +0000 (20:41 +0800)]
net-timestamp: correct the use of SOF_TIMESTAMPING_RAW_HARDWARE
SOF_TIMESTAMPING_RAW_HARDWARE is a report flag which passes the
timestamps generated by either SOF_TIMESTAMPING_TX_HARDWARE or
SOF_TIMESTAMPING_RX_HARDWARE to the userspace all the time.
So let us revise the doc here.
Link: https://lore.kernel.org/all/66d8c21d3042a_163d93294cb@willemb.c.googlers.com.notmuch/
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://patch.msgid.link/20240908124141.39628-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 23:42:14 +0000 (16:42 -0700)]
Merge branch 'net-stmmac-fpe-via-ethtool-tc'
Furong Xu says:
====================
net: stmmac: FPE via ethtool + tc
Move Frame Preemption (FPE) over to the new standard API, which uses
ethtool-mm/tc-mqprio/tc-taprio.
====================
Link: https://patch.msgid.link/cover.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:12 +0000 (22:30 +0800)]
net: stmmac: silence FPE kernel logs
ethtool --show-mm can get real-time state of FPE.
fpe_irq_status logs should keep quiet.
tc-taprio can always query driver state, delete unbalanced logs.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/39943d7967f291674a97ef0572878aca273087e9.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:11 +0000 (22:30 +0800)]
net: stmmac: support fp parameter of tc-taprio
tc-taprio can select whether traffic classes are express or preemptible.
0) tc qdisc add dev eth1 parent root handle 100 taprio \
num_tc 4 \
map 0 1 2 3 2 2 2 2 2 2 2 2 2 2 2 3 \
queues 1@0 1@1 1@2 1@3 \
base-time 1000000000 \
sched-entry S 03 10000000 \
sched-entry S 0e 10000000 \
flags 0x2 fp P E E E
1) After some traffic tests, MAC merge layer statistics are all good.
Local device:
[ {
"ifname": "eth1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 0,
"MACMergeFragCountRx": 0,
"MACMergeFragCountTx": 17837,
"MACMergeHoldCount": 18639
}
} ]
Remote device:
[ {
"ifname": "end1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 17189,
"MACMergeFragCountRx": 17837,
"MACMergeFragCountTx": 0,
"MACMergeHoldCount": 0
}
} ]
Tested on DWMAC CORE 5.10a
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/0d21ae356fb3cab77337527e87d46748a4852055.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:10 +0000 (22:30 +0800)]
net: stmmac: support fp parameter of tc-mqprio
tc-mqprio can select whether traffic classes are express or preemptible.
After some traffic tests, MAC merge layer statistics are all good.
Local device:
ethtool --include-statistics --json --show-mm eth1
[ {
"ifname": "eth1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 0,
"MACMergeFragCountRx": 0,
"MACMergeFragCountTx": 35105,
"MACMergeHoldCount": 0
}
} ]
Remote device:
ethtool --include-statistics --json --show-mm end1
[ {
"ifname": "end1",
"pmac-enabled": true,
"tx-enabled": true,
"tx-active": true,
"tx-min-frag-size": 60,
"rx-min-frag-size": 60,
"verify-enabled": true,
"verify-time": 100,
"max-verify-time": 128,
"verify-status": "SUCCEEDED",
"statistics": {
"MACMergeFrameAssErrorCount": 0,
"MACMergeFrameSmdErrorCount": 0,
"MACMergeFrameAssOkCount": 35105,
"MACMergeFragCountRx": 35105,
"MACMergeFragCountTx": 0,
"MACMergeHoldCount": 0
}
} ]
Tested on DWMAC CORE 5.10a
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/592965ea93ed8240f0a1b8f6f8ebb8914f69419b.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:09 +0000 (22:30 +0800)]
net: stmmac: configure FPE via ethtool-mm
Implement ethtool --show-mm and --set-mm callbacks.
NIC up/down, link up/down, suspend/resume, kselftest-ethtool_mm,
all tested okay.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/06ed409314fe0ee37b78b800922f2c0cce762532.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:08 +0000 (22:30 +0800)]
net: stmmac: refactor FPE verification process
Drop driver defined stmmac_fpe_state, and switch to common
ethtool_mm_verify_status for local TX verification status.
Local side and remote side verification processes are completely
independent. There is no reason at all to keep a local state and
a remote state.
Add a spinlock to avoid races among ISR, timer, link update
and register configuration.
This patch is based on Vladimir Oltean's proposal.
Vladimir Oltean says:
====================
In the INITIAL state, the timer sends MPACKET_VERIFY. Eventually the
stmmac_fpe_event_status() IRQ fires and advances the state to VERIFYING,
then rearms the timer after verify_time ms. If a subsequent IRQ comes in
and modifies the state to SUCCEEDED after getting MPACKET_RESPONSE, the
timer sees this. It must enable the EFPE bit now. Otherwise, it
decrements the verify_limit counter and tries again. Eventually it
moves the status to FAILED, from which the IRQ cannot move it anywhere
else, except for another stmmac_fpe_apply() call.
====================
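The flow quoted above can be modelled as a small state machine. The sketch below is a user-space simplification for illustration only: the fpe_sim names are hypothetical, timers and interrupts are plain function calls, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

enum verify_status { VS_INITIAL, VS_VERIFYING, VS_SUCCEEDED, VS_FAILED };

struct fpe_sim {
    enum verify_status status;
    int verify_limit;      /* remaining verification attempts */
    int mpackets_sent;     /* stands in for MPACKET_VERIFY transmissions */
    bool efpe_enabled;     /* stands in for the EFPE bit */
};

/* (Re)start verification, as an apply call would. */
static void fpe_sim_apply(struct fpe_sim *s, int verify_limit)
{
    assert(s);
    s->status = VS_INITIAL;
    s->verify_limit = verify_limit;
    s->mpackets_sent = 0;
    s->efpe_enabled = false;
}

/* IRQ: a verify mPacket went out. */
static void fpe_sim_irq_verify_sent(struct fpe_sim *s)
{
    if (s->status == VS_INITIAL)
        s->status = VS_VERIFYING;
}

/* IRQ: a response mPacket came in. FAILED is never left from here. */
static void fpe_sim_irq_response(struct fpe_sim *s)
{
    if (s->status == VS_VERIFYING)
        s->status = VS_SUCCEEDED;
}

/* Timer callback; returns true if the timer should be rearmed. */
static bool fpe_sim_timer(struct fpe_sim *s)
{
    switch (s->status) {
    case VS_INITIAL:
    case VS_VERIFYING:
        if (s->verify_limit > 0) {
            s->mpackets_sent++;
            s->verify_limit--;
            return true;
        }
        s->status = VS_FAILED;
        return false;
    case VS_SUCCEEDED:
        s->efpe_enabled = true;
        return false;
    default:
        return false;
    }
}
```

In this model only the timer enables EFPE or declares failure, and the IRQ side only advances INITIAL to VERIFYING and VERIFYING to SUCCEEDED, which is why a single status variable is enough.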
Co-developed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/151f86c8428eba967039718c6bf90a7d841e703b.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:07 +0000 (22:30 +0800)]
net: stmmac: drop stmmac_fpe_handshake
ethtool --set-mm can trigger the FPE verification process by calling
stmmac_fpe_send_mpacket(), so stmmac_fpe_handshake() is no longer needed
and should be removed.
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/42018b1a15eb3ced567fd6a73798c7cd4e08799a.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Furong Xu [Fri, 6 Sep 2024 14:30:06 +0000 (22:30 +0800)]
net: stmmac: move stmmac_fpe_cfg to stmmac_priv data
By moving the fpe_cfg field to the stmmac_priv data, stmmac_fpe_cfg
becomes platform-data eventually, instead of a run-time config.
Suggested-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://patch.msgid.link/d9b3d7ecb308c5e39778a4c8ae9df288a2754379.1725631883.git.0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alexander Dahl [Fri, 6 Sep 2024 06:22:56 +0000 (08:22 +0200)]
net: mdiobus: Debug print fwnode handle instead of raw pointer
The message was slightly misleading before, because the pointer printed
was to the fwnode, not to the phy device as its placement in the message
suggested. Include the header for the dev_dbg() declaration while at it.
Output before:
[ +0.001247] mdio_bus f802c000.ethernet-ffffffff: registered phy 2612f00a fwnode at address 3
Output after:
[ +0.001229] mdio_bus f802c000.ethernet-ffffffff: registered phy fwnode /ahb/apb/ethernet@f802c000/ethernet-phy@3 at address 3
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Dahl <ada@thorsis.com>
Link: https://patch.msgid.link/20240906062256.11289-1-ada@thorsis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
D. Wythe [Fri, 6 Sep 2024 02:35:35 +0000 (10:35 +0800)]
net/smc: add sysctl for smc_limit_hs
In commit 48b6190a0042 ("net/smc: Limit SMC visits when handshake
workqueue congested"), we introduced a mechanism to constrain SMC
connection visits according to the pressure on the SMC handshake process.
At that time, we believed that controlling the feature through netlink
was sufficient. However, most people have realized now that netlink is
not convenient in container scenarios, and sysctl is a more suitable
approach.
In addition, since commit 462791bbfa35 ("net/smc: add sysctl interface
for SMC") introduced smc_sysctl_net_init(), it is reasonable for us to
initialize limit_smc_hs in it instead of initializing it in
smc_pnet_net_init().
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
Reviewed-by: Jan Karcher <jaka@linux.ibm.com>
Link: https://patch.msgid.link/1725590135-5631-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lee Trager [Thu, 5 Sep 2024 23:37:51 +0000 (16:37 -0700)]
eth: fbnic: Add devlink firmware version info
This adds support for showing firmware version information for both
stored and running firmware. The version and commit are displayed
separately to aid monitoring tools which only care about the version.
Example output:
# devlink dev info
pci/0000:01:00.0:
  driver fbnic
  serial_number 88-25-08-ff-ff-01-50-92
  versions:
      running:
        fw 24.07.15-017
        fw.commit h999784ae9df0
        fw.bootloader 24.07.10-000
        fw.bootloader.commit hfef3ac835ce7
      stored:
        fw 24.07.24-002
        fw.commit hc9d14a68b3f2
        fw.bootloader 24.07.22-000
        fw.bootloader.commit h922f8493eb96
        fw.undi 01.00.03-000
Signed-off-by: Lee Trager <lee@trager.us>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240905233820.1713043-1-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 10 Sep 2024 09:04:18 +0000 (11:04 +0200)]
Merge branch 'net-lan966x-use-the-newly-introduced-fdma-library'
Daniel Machon says:
====================
net: lan966x: use the newly introduced FDMA library
This patch series is the second of a 2-part series [1], that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a common
library with a common implementation. This also has the benefit of
removing a lot of open-coded bookkeeping and duplicate code for the two
drivers.
In this second series, the FDMA library will be taken into use by the
lan966x switch driver.
###################
# Example of use: #
###################
- Initialize the rx and tx fdma structs with values for: number of
DCB's, number of DB's, channel ID, DB size (data buffer size), and
total size of the requested memory. Also provide two callbacks:
nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.
- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().
- Initialize the DCB's with fdma_dcb_init().
- Add new DCB's with fdma_dcb_add().
- Free memory with fdma_free_phys() or fdma_free_coherent().
#####################
# Patch breakdown: #
#####################
Patch #1: select FDMA library for lan966x.
Patch #2: includes the fdma_api.h header and removes old symbols.
Patch #3: replaces old rx and tx variables with equivalent ones from the
fdma struct. Only the variables that can be changed without
breaking traffic are changed in this patch.
Patch #4: uses the library for allocation of rx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #5: uses the library for adding DCB's in the rx path.
Patch #6: uses the library for freeing rx buffers.
Patch #7: uses the library for allocation of tx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #8: uses the library for adding DCB's in the tx path.
Patch #9: uses the library helpers in the tx path.
Patch #10: ditch last_in_use variable and use library instead.
Patch #11: uses library helpers throughout.
Patch #12: refactor lan966x_fdma_reload() function.
[1] https://lore.kernel.org/netdev/20240902-fdma-sparx5-v1-0-1e7d5e5a9f34@microchip.com/
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
====================
Link: https://patch.msgid.link/20240905-fdma-lan966x-v1-0-e083f8620165@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:40 +0000 (10:06 +0200)]
net: lan966x: refactor buffer reload function
Now that we store everything in the fdma structs, refactor
lan966x_fdma_reload() to store and restore the entire struct.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:39 +0000 (10:06 +0200)]
net: lan966x: use a few FDMA helpers throughout
The library provides helpers for a number of DCB and DB operations. Use
these throughout the code and remove the old ones.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:38 +0000 (10:06 +0200)]
net: lan966x: ditch tx->last_in_use variable
This variable is used in the tx path to determine the last used DCB. The
library has the variable last_dcb for the exact same purpose. Ditch the
last_in_use variable throughout.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:37 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing tx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:36 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the tx path
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:35 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of tx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for tx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: tx->dma, tx->dcbs and tx->curr_entry
with the equivalents from the FDMA struct.
- add lan966x_fdma_tx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:34 +0000 (10:06 +0200)]
net: lan966x: use library helper for freeing rx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:33 +0000 (10:06 +0200)]
net: lan966x: use FDMA library for adding DCB's in the rx path
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and the functions
for adding DCB's.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:32 +0000 (10:06 +0200)]
net: lan966x: use the FDMA library for allocation of rx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: rx->dma, rx->dcbs and rx->last_entry
with the equivalents from the FDMA struct.
- make use of fdma->db_size for rx buffer size.
- add lan966x_fdma_rx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:31 +0000 (10:06 +0200)]
net: lan966x: replace a few variables with new equivalent ones
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:30 +0000 (10:06 +0200)]
net: lan966x: use FDMA library symbols
Include and use the new FDMA header, which now provides the required
masks and bit offsets for operating on the DCB's and DB's.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Machon [Thu, 5 Sep 2024 08:06:29 +0000 (10:06 +0200)]
net: lan966x: select FDMA library
Select the newly introduced FDMA library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 10 Sep 2024 02:20:42 +0000 (19:20 -0700)]
Merge branch 'ionic-convert-rx-queue-buffers-to-use-page_pool'
Brett Creeley says:
====================
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need to play
nicely with the page_pool infrastructure. This patchset cleans up some
of our API use and converts the Rx traffic queues to use page_pool.
The first few patches are for tidying up things, then a small XDP
configuration refactor, adding page_pool support, and finally adding
support to hot swap an XDP program without having to reconfigure
anything.
The result is code that more closely follows current patterns, as well as
either a performance boost or equivalent performance, as seen with
iperf testing:
mss   netio      tx_pps      rx_pps   total_pps  tx_bw  rx_bw  total_bw
----  -----  ----------  ----------  ----------  -----  -----  --------
Before:
256   bidir  13,839,293  15,515,227  29,354,520     34     38        71
512   bidir  13,913,249  14,671,693  28,584,942     62     65       127
1024  bidir  13,006,189  13,695,413  26,701,602    109    115       224
1448  bidir  12,489,905  12,791,734  25,281,639    145    149       294
2048  bidir   9,195,622   9,247,649  18,443,271    148    149       297
4096  bidir   5,149,716   5,247,917  10,397,633    160    163       323
8192  bidir   3,029,993   3,008,882   6,038,875    179    179       358
9000  bidir   2,789,358   2,800,744   5,590,102    181    180       361
After:
256   bidir  21,540,037  21,344,644  42,884,681     52     52       104
512   bidir  23,170,014  19,207,260  42,377,274    103     85       188
1024  bidir  17,934,280  17,819,247  35,753,527    150    149       299
1448  bidir  15,242,515  14,907,030  30,149,545    167    174       341
2048  bidir  10,692,542  10,663,023  21,355,565    177    176       353
4096  bidir   6,024,977   6,083,580  12,108,557    187    180       367
8192  bidir   3,090,449   3,048,266   6,138,715    180    176       356
9000  bidir   2,859,146   2,864,226   5,723,372    178    180       358
v2: https://lore.kernel.org/20240826184422.21895-1-brett.creeley@amd.com
v1: https://lore.kernel.org/20240625165658.34598-1-shannon.nelson@amd.com
====================
Link: https://patch.msgid.link/20240906232623.39651-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Brett Creeley [Fri, 6 Sep 2024 23:26:23 +0000 (16:26 -0700)]
ionic: Allow XDP program to be hot swapped
Using examples of other driver(s), add the ability to hot-swap an XDP
program without having to reconfigure the queues. To prevent
q->xdp_prog from being read or written more than once, use READ_ONCE()
and WRITE_ONCE() on q->xdp_prog.
The q->xdp_prog was being checked in multiple different for loops in the
hot path. The change to allow xdp_prog hot swapping created the
possibility for many READ_ONCE(q->xdp_prog) calls during a single napi
callback. Refactor the Rx napi handling to allow a previous
READ_ONCE(q->xdp_prog) (or NULL for hwstamp_rxq) to be passed into the
relevant functions.
Also, move other Rx related hotpath handling into the newly created
ionic_rx_cq_service() function to reduce the scope of the xdp_prog
local variable and put all Rx handling in one function similar to Tx.
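The snapshot-and-pass-down pattern described above can be sketched in userspace C. This is an illustration only, not the driver code: the READ_ONCE()/WRITE_ONCE() macros are simplified stand-ins for the kernel ones (they rely on the GCC/Clang __typeof__ extension), and struct rx_queue, rx_one(), rx_cq_service() and xdp_swap() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros: the volatile
 * access makes the compiler emit exactly one load or store here. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct bpf_prog { int id; };         /* hypothetical stand-in */

struct rx_queue {                    /* hypothetical stand-in */
	struct bpf_prog *xdp_prog;
};

/* Per-packet helper: works on the snapshot handed in by the caller and
 * never re-reads q->xdp_prog. */
int rx_one(struct rx_queue *q, struct bpf_prog *xdp_prog)
{
	(void)q;
	return xdp_prog ? xdp_prog->id : 0;
}

/* One poll pass: read the pointer once, then pass the same value to every
 * helper so a concurrent swap cannot change behaviour mid-poll.
 * Returns the sum of the per-packet results, for demonstration only. */
int rx_cq_service(struct rx_queue *q, int budget)
{
	struct bpf_prog *xdp_prog = READ_ONCE(q->xdp_prog);
	int sum = 0;

	assert(budget >= 0);
	for (int i = 0; i < budget; i++)
		sum += rx_one(q, xdp_prog);
	return sum;
}

/* Control path: publish the new program with a single store. */
void xdp_swap(struct rx_queue *q, struct bpf_prog *new_prog)
{
	WRITE_ONCE(q->xdp_prog, new_prog);
}
```

A swap between two polls changes which program the next poll sees, while a poll already in progress keeps using its snapshot. Program lifetime (reference counting, RCU) is deliberately left out of the sketch.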
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-8-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:22 +0000 (16:26 -0700)]
ionic: convert Rx queue buffers to use page_pool
Our home-grown buffer management needs to go away and we need
to be playing nicely with the page_pool infrastructure. This
converts the Rx traffic queues to use page_pool.
Also, since ionic_rx_buf_size() was removed, redefine
IONIC_PAGE_SIZE to account for IONIC_MAX_BUF_LEN being the
largest allowed buffer to prevent overflowing u16 variables,
which could happen when PAGE_SIZE is defined as >= 64KB.
include/linux/minmax.h:93:37: warning: conversion from 'long unsigned int' to 'u16' {aka 'short unsigned int'} changes value from '65536' to '0' [-Woverflow]
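The truncation in the warning above can be reproduced in a few lines of userspace C. This is a minimal sketch, assuming the largest allowed buffer length is the u16 maximum; DEMO_MAX_BUF_LEN and the demo_* helpers are hypothetical names, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for IONIC_MAX_BUF_LEN: the largest value a 16-bit
 * length field can hold. */
#define DEMO_MAX_BUF_LEN UINT16_MAX

/* Clamp the page size so it always fits in a 16-bit length field. */
uint32_t demo_page_size(uint32_t page_size)
{
	return page_size < DEMO_MAX_BUF_LEN ? page_size : DEMO_MAX_BUF_LEN;
}

/* Unclamped conversion: only the low 16 bits survive, so 65536 becomes 0. */
uint16_t demo_len_unclamped(uint32_t page_size)
{
	return (uint16_t)page_size;
}

/* Clamped conversion: the value is limited before it is narrowed. */
uint16_t demo_len_clamped(uint32_t page_size)
{
	return (uint16_t)demo_page_size(page_size);
}
```

With a 64KB page, demo_len_unclamped(65536) yields 0 while demo_len_clamped(65536) yields 65535; a 4KB page passes through both unchanged.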
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Brett Creeley [Fri, 6 Sep 2024 23:26:21 +0000 (16:26 -0700)]
ionic: Fully reconfigure queues when going to/from a NULL XDP program
Currently when going to/from a NULL XDP program the driver uses
ionic_stop_queues_reconfig() and then ionic_start_queues_reconfig() in
order to re-register the xdp_rxq_info and re-init the queues. This is
fine until page_pool(s) are used in an upcoming patch.
In preparation for adding page_pool support make sure to completely
rebuild the queues when going to/from a NULL XDP program. Without this
change the call to mem_allocator_disconnect() never happens when going
to a NULL XDP program, which eventually results in
xdp_rxq_info_reg_mem_model() failing with -ENOSPC due to the mem_id_pool
ida having no remaining space.
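The failure mode described above (registrations that are never paired with a disconnect eventually exhaust the ID space) can be modelled with a toy ID pool. This is a sketch under stated assumptions: the pool size, struct demo_id_pool and the demo_* functions are hypothetical and only mimic the register/disconnect pairing, not the kernel's ida or xdp_rxq_info code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_ID_MAX 4   /* deliberately tiny so exhaustion is easy to hit */

struct demo_id_pool {
	bool used[DEMO_ID_MAX];
};

/* Stand-in for registering a memory model: take the first free ID,
 * or fail with -ENOSPC when none is left. */
int demo_reg_mem_model(struct demo_id_pool *p)
{
	for (int id = 0; id < DEMO_ID_MAX; id++) {
		if (!p->used[id]) {
			p->used[id] = true;
			return id;
		}
	}
	return -ENOSPC;
}

/* Stand-in for the disconnect step: give the ID back to the pool. */
void demo_mem_disconnect(struct demo_id_pool *p, int id)
{
	if (id >= 0 && id < DEMO_ID_MAX)
		p->used[id] = false;
}

/* Register once, then reconfigure n times. With full_rebuild the old ID is
 * released before re-registering; without it the old ID leaks.
 * Returns the result of the last registration attempt. */
int demo_reconfig_cycles(struct demo_id_pool *p, int n, bool full_rebuild)
{
	int id = demo_reg_mem_model(p);

	for (int i = 0; i < n && id >= 0; i++) {
		if (full_rebuild)
			demo_mem_disconnect(p, id);
		id = demo_reg_mem_model(p);
	}
	return id;
}
```

In this model, ten reconfigurations without the disconnect step end in -ENOSPC, while the same ten with a full rebuild keep reusing the same ID.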
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:20 +0000 (16:26 -0700)]
ionic: always use rxq_info
Instead of setting up and tearing down the rxq_info only when the XDP
program is loaded or unloaded, we will build the rxq_info whether or not
XDP is in use. This is the more common use pattern and better supports
future conversion to page_pool. Since the rxq_info wants the napi_id
we re-order things slightly to tie this into the queue init and deinit
functions where we do the add and delete of napi.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:19 +0000 (16:26 -0700)]
ionic: use per-queue xdp_prog
We originally were using a per-interface xdp_prog variable to track
a loaded XDP program since we knew there would never be support for a
per-queue XDP program. With that, we only built the per-queue rxq_info
struct when an XDP program was loaded and removed it on XDP program unload,
and used the pointer as an indicator in the Rx hotpath to know how to build
the buffers. However, that's really not the model generally used, and
makes a conversion to page_pool Rx buffer caching a little problematic.
This patch converts the driver to use the more common approach of using
a per-queue xdp_prog pointer to work out buffer allocations and need
for bpf_prog_run_xdp(). We jostle a couple of fields in the queue struct
in order to keep the new xdp_prog pointer in a warm cacheline.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:18 +0000 (16:26 -0700)]
ionic: rename ionic_xdp_rx_put_bufs
We aren't "putting" bufs, we're just unlinking them from our tracking in
order to let the XDP_TX and XDP_REDIRECT tx clean paths take care of the
pages when they are done with them. This rename clears up the intent.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shannon Nelson [Fri, 6 Sep 2024 23:26:17 +0000 (16:26 -0700)]
ionic: debug line for Tx completion errors
Here's a little debugging aid in case the device starts throwing
Tx completion errors.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20240906232623.39651-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:44:52 +0000 (17:44 -0700)]
Merge branch 'rx-software-timestamp-for-all-round-3'
Gal Pressman says:
====================
RX software timestamp for all - round 3
Rounds 1 & 2 of drivers conversion were merged [1][2], this round will
complete the work.
[1] https://lore.kernel.org/netdev/20240901112803.212753-1-gal@nvidia.com/
[2] https://lore.kernel.org/netdev/20240904074922.256275-1-gal@nvidia.com/
====================
Link: https://patch.msgid.link/20240906144632.404651-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:32 +0000 (17:46 +0300)]
ptp: ptp_ines: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
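The division of labour behind this series of removals can be sketched as follows: the driver callback reports only what the driver or hardware provides, and the core adds the RX software timestamp bits for every device. This is a userspace illustration, not the ethtool code. The DEMO_* constants use the same bit positions as the SOF_TIMESTAMPING_* flags in <linux/net_tstamp.h>, and struct demo_ts_info and the demo_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit positions as the SOF_TIMESTAMPING_* UAPI flags; the DEMO_
 * names themselves are illustrative. */
enum {
	DEMO_TX_HARDWARE  = 1 << 0,
	DEMO_TX_SOFTWARE  = 1 << 1,
	DEMO_RX_HARDWARE  = 1 << 2,
	DEMO_RX_SOFTWARE  = 1 << 3,
	DEMO_SOFTWARE     = 1 << 4,
	DEMO_RAW_HARDWARE = 1 << 6,
};

struct demo_ts_info {
	uint32_t so_timestamping;
};

/* Driver callback after the conversion: no RX software bits here. */
void demo_driver_get_ts_info(struct demo_ts_info *info)
{
	info->so_timestamping = DEMO_TX_SOFTWARE | DEMO_TX_HARDWARE |
				DEMO_RX_HARDWARE | DEMO_RAW_HARDWARE;
}

/* Core wrapper: call the driver if it has a callback, then add the RX
 * software timestamp bits unconditionally. */
void demo_core_get_ts_info(struct demo_ts_info *info,
			   void (*drv)(struct demo_ts_info *))
{
	info->so_timestamping = 0;
	if (drv)
		drv(info);
	info->so_timestamping |= DEMO_RX_SOFTWARE | DEMO_SOFTWARE;
}
```

Under this model a device with no callback at all still reports RX software timestamping, which is why the per-driver flags became redundant.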
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-17-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:31 +0000 (17:46 +0300)]
ixp4xx_eth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://patch.msgid.link/20240906144632.404651-16-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:30 +0000 (17:46 +0300)]
net: stmmac: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-15-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:29 +0000 (17:46 +0300)]
sfc/siena: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-14-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:28 +0000 (17:46 +0300)]
sfc: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://patch.msgid.link/20240906144632.404651-13-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:27 +0000 (17:46 +0300)]
qede: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-12-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:26 +0000 (17:46 +0300)]
net: mscc: ocelot: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-11-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:25 +0000 (17:46 +0300)]
net/funeth: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-10-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:24 +0000 (17:46 +0300)]
enic: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-9-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:23 +0000 (17:46 +0300)]
net: thunderx: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-8-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:22 +0000 (17:46 +0300)]
liquidio: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-7-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:21 +0000 (17:46 +0300)]
net: macb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://patch.msgid.link/20240906144632.404651-6-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:20 +0000 (17:46 +0300)]
amd-xgbe: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://patch.msgid.link/20240906144632.404651-5-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:19 +0000 (17:46 +0300)]
bonding: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240906144632.404651-4-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:18 +0000 (17:46 +0300)]
tg3: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-3-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Fri, 6 Sep 2024 14:46:17 +0000 (17:46 +0300)]
bnxt_en: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240906144632.404651-2-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MD Danish Anwar [Fri, 6 Sep 2024 09:36:49 +0000 (15:06 +0530)]
net: ti: icssg-prueth: Make pa_stats optional
pa_stats is optional in the DT bindings; make it optional in the driver
as well. Currently, if the pa_stats syscon regmap is not found, the driver
returns -ENODEV. Fix this by not returning an error when pa_stats is not
found, and continue generating ethtool stats without pa_stats.
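The optional-resource handling described above can be sketched like this. It is a simplified userspace model, not the driver code: struct demo_regmap, struct demo_prueth, the demo_* functions and the stat counts are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_PORT_STATS 10   /* hypothetical count of always-present stats */
#define DEMO_NUM_PA_STATS    4   /* hypothetical count of pa_stats entries */

struct demo_regmap { int unused; };

struct demo_prueth {
	struct demo_regmap *pa_stats;   /* NULL when the DT does not provide it */
};

/* Probe step: a missing pa_stats regmap is recorded as NULL and is no
 * longer treated as an error (previously this path returned -ENODEV). */
int demo_probe_pa_stats(struct demo_prueth *p, struct demo_regmap *found)
{
	p->pa_stats = found;
	return 0;
}

/* Stats path: include the PA entries only when the regmap exists. */
int demo_num_stats(const struct demo_prueth *p)
{
	return DEMO_NUM_PORT_STATS + (p->pa_stats ? DEMO_NUM_PA_STATS : 0);
}
```

Every consumer of pa_stats then needs the same NULL check, which is the cost of making the resource optional.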
Fixes: 550ee90ac61c ("net: ti: icssg-prueth: Add support for PA Stats")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20240906093649.870883-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Fri, 6 Sep 2024 07:36:09 +0000 (08:36 +0100)]
net: ibm: emac: Use __iomem annotation for emac_[xg]aht_base
dev->emacp contains an __iomem pointer and values derived
from it are used as __iomem pointers. So use this annotation
in the return type for helpers that derive pointers from dev->emacp.
Flagged by Sparse as:
.../core.c:444:36: warning: incorrect type in argument 1 (different address spaces)
.../core.c:444:36: expected unsigned int volatile [noderef] [usertype] __iomem *addr
.../core.c:444:36: got unsigned int [usertype] *
.../core.c: note: in included file:
.../core.h:416:25: warning: cast removes address space '__iomem' of expression
Compile tested only.
No functional change intended.
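What the annotation change amounts to can be sketched in userspace C. This is an illustration under assumptions: the __iomem and __force definitions below are simplified stand-ins for the kernel's (they expand to nothing outside Sparse), and struct demo_emac_regs, struct demo_emac, demo_emac_gaht_base() and demo_readl() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins: under Sparse (__CHECKER__) the pointer is placed in
 * a separate address space; for a normal compiler they expand to nothing. */
#ifdef __CHECKER__
#define __iomem __attribute__((noderef, address_space(__iomem)))
#define __force __attribute__((force))
#else
#define __iomem
#define __force
#endif

struct demo_emac_regs {              /* hypothetical register block */
	uint32_t mr0;
	uint32_t gaht1;
};

struct demo_emac {
	struct demo_emac_regs __iomem *emacp;
};

/* The helper derives its pointer from dev->emacp, so its return type keeps
 * the __iomem annotation instead of silently dropping it. */
uint32_t __iomem *demo_emac_gaht_base(struct demo_emac *dev)
{
	return &dev->emacp->gaht1;
}

/* Stand-in MMIO accessor: takes an annotated pointer, as real accessors do. */
uint32_t demo_readl(const volatile uint32_t __iomem *addr)
{
	return *(const volatile uint32_t __force *)addr;
}
```

With the annotation carried through the helper, the value can be handed to an accessor that expects an __iomem pointer without a cast, which is the kind of mismatch the warnings above point at.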
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240906-emac-iomem-v1-1-207cc4f3fed0@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 10 Sep 2024 00:38:03 +0000 (17:38 -0700)]
Merge branch 'selftests-net-add-packetdrill'
Willem de Bruijn says:
====================
selftests/net: add packetdrill
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
1/2: add kselftest infra for TEST_PROGS that need an interpreter
2/2: add the specific packetdrill tests
====================
Link: https://patch.msgid.link/20240905231653.2427327-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Thu, 5 Sep 2024 23:15:52 +0000 (19:15 -0400)]
selftests/net: integrate packetdrill with ksft
Lay the groundwork to import into kselftests the over 150 packetdrill
TCP/IP conformance tests on github.com/google/packetdrill.
Florian recently added support for packetdrill tests in nf_conntrack,
in commit a8a388c2aae49 ("selftests: netfilter: add packetdrill based
conntrack tests").
This patch takes a slightly different approach. It relies on
ksft_runner.sh to run every *.pkt file in the directory.
Any future imports of packetdrill tests should require no additional
coding. Just add the *.pkt files.
Initially import only two features/directories from github. One with a
single script, and one with two. This was the only reason to pick
tcp/inq and tcp/md5.
The patch replaces the directory hierarchy in github with a flat space
of files: $(subst /,_,$(wildcard tcp/**/*.pkt)). This is the most
straightforward option to integrate with kselftests. The linked thread
reviewed two ways to maintain the hierarchy: TEST_PROGS_RECURSE and
PRESERVE_TEST_DIRS. But both introduce significant changes to
kselftest infra and with that risk to existing tests.
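The effect of the $(subst /,_,...) flattening on a single path can be shown with a small C helper. This is only an illustration of the name mapping; demo_flatten_path() is a hypothetical function, and the example path is inferred from the tcp/inq directory and the tcp_inq_client.pkt test name mentioned in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy path into out with every '/' replaced by '_'.
 * Returns 0 on success, -1 if out is too small for the result. */
int demo_flatten_path(const char *path, char *out, size_t out_len)
{
	size_t n = strlen(path);

	if (n + 1 > out_len)
		return -1;
	/* i <= n so the terminating NUL is copied as well. */
	for (size_t i = 0; i <= n; i++)
		out[i] = (path[i] == '/') ? '_' : path[i];
	return 0;
}
```

For example, "tcp/inq/client.pkt" maps to "tcp_inq_client.pkt". The mapping is not reversible when file names themselves contain underscores, which is acceptable for a flat test list.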
Implementation notes:
- restore alphabetical order when adding the new directory to
tools/testing/selftests/Makefile
- imported *.pkt files and support verbatim from the github project,
except for
- update `source ./defaults.sh` path (to adjust for flat dir)
- add SPDX headers
- remove one author statement
- Acknowledgment: drop an e (checkpatch)
Tested:
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
run_tests
make -C tools/testing/selftests \
TARGETS=net/packetdrill \
install INSTALL_PATH=$KSFT_INSTALL_PATH
# in virtme-ng
./run_kselftest.sh -c net/packetdrill
./run_kselftest.sh -t net/packetdrill:tcp_inq_client.pkt
Link: https://lore.kernel.org/netdev/20240827193417.2792223-1-willemdebruijn.kernel@gmail.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Thu, 5 Sep 2024 23:15:51 +0000 (19:15 -0400)]
selftests: support interpreted scripts with ksft_runner.sh
Support testcases that are themselves not executable, but need an
interpreter to run them.
If a test file is not executable, but an executable file
ksft_runner.sh exists in the TARGET dir, kselftest will run
./ksft_runner.sh ./$BASENAME_TEST
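The dispatch rule above can be sketched as a pure function. This is a simplified model, not the kselftest runner itself (which is a shell script): demo_ksft_cmd() is a hypothetical name, the two booleans stand in for the executable-bit checks, and the -1 result for the remaining case is a simplification of whatever fallback the real runner applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build the command line for one test into out.
 * Returns the command length, or -1 if the test cannot be run in this
 * model or the buffer is too small. */
int demo_ksft_cmd(const char *basename_test, bool test_is_exec,
		  bool runner_is_exec, char *out, size_t out_len)
{
	int n;

	if (test_is_exec)
		/* Executable tests are run directly, as before. */
		n = snprintf(out, out_len, "./%s", basename_test);
	else if (runner_is_exec)
		/* Non-executable test plus an executable ksft_runner.sh:
		 * hand the test file to the runner as its argument. */
		n = snprintf(out, out_len, "./ksft_runner.sh ./%s",
			     basename_test);
	else
		return -1;

	return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

For a non-executable tcp_inq_client.pkt in a directory with an executable ksft_runner.sh, the resulting command is "./ksft_runner.sh ./tcp_inq_client.pkt".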
Packetdrill may add hundreds of packetdrill scripts for testing. These
scripts must be passed to the packetdrill process.
Have kselftest run each test directly, as it already solves common
runner requirements like parallel execution and isolation (netns).
A previous RFC added a wrapper in between, which would have to
reimplement such functionality.
Link: https://lore.kernel.org/netdev/66d4d97a4cac_3df182941a@willemb.c.googlers.com.notmuch/T/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240905231653.2427327-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>