Paolo Abeni [Tue, 18 Feb 2025 18:36:15 +0000 (19:36 +0100)]
mptcp: cleanup mem accounting
After the previous patch, updating sk_forward_memory is cheap and
we can drop a lot of complexity from the MPTCP memory accounting,
removing the custom fwd mem allocations for rmem.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218-net-next-mptcp-rx-path-refactor-v1-4-4a47d90d7998@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 18 Feb 2025 18:36:14 +0000 (19:36 +0100)]
mptcp: move the whole rx path under msk socket lock protection
After commit
c2e6048fa1cf ("mptcp: fix race in release_cb") we can
move the whole MPTCP rx path under the socket lock leveraging the
release_cb.
We can drop a bunch of spin_lock pairs in the receive functions, use
a single receive queue and invoke __mptcp_move_skbs only when subflows
ask for it.
This will allow more cleanup in the next patch.
Some changes are worth specific mention:
The msk rcvbuf update now always happens under both the msk and the
subflow socket lock: we can drop a bunch of ONCE annotation and
consolidate the checks.
When the skbs move is delayed at msk release callback time, even the
msk rcvbuf update is delayed; additionally take care of such action in
__mptcp_move_skbs().
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218-net-next-mptcp-rx-path-refactor-v1-3-4a47d90d7998@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 18 Feb 2025 18:36:13 +0000 (19:36 +0100)]
mptcp: drop __mptcp_fastopen_gen_msk_ackseq()
When we will move the whole RX path under the msk socket lock, updating
the already queued skb for passive fastopen socket at 3rd ack time will
be extremely painful and race prone
The map_seq for already enqueued skbs is used only to allow correct
coalescing with later data; preventing collapsing to the first skb of
a fastopen connect we can completely remove the
__mptcp_fastopen_gen_msk_ackseq() helper.
Before dropping this helper, a new item had to be added to the
mptcp_skb_cb structure. Because this item will be frequently tested in
the fast path -- almost on every packet -- and because there is free
space there, a single byte is used instead of a bitfield. This micro
optimisation slightly reduces the number of CPU operations to do the
associated check.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218-net-next-mptcp-rx-path-refactor-v1-2-4a47d90d7998@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 18 Feb 2025 18:36:12 +0000 (19:36 +0100)]
mptcp: consolidate subflow cleanup
Consolidate all the cleanup actions requiring the worker in a single
helper and ensure the dummy data fin creation for fallback socket is
performed only when the tcp rx queue is empty.
There are no functional changes intended, but this will simplify the
next patch, when the tcp rx queue spooling could be delayed at release_cb
time.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218-net-next-mptcp-rx-path-refactor-v1-1-4a47d90d7998@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dr. David Alan Gilbert [Wed, 19 Feb 2025 02:02:58 +0000 (02:02 +0000)]
nfc: hci: Remove unused nfc_llc_unregister
nfc_llc_unregister() has been unused since it was added in 2012's
commit
67cccfe17d1b ("NFC: Add an LLC Core layer to HCI")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20250219020258.297995-1-linux@treblig.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Suchit [Tue, 18 Feb 2025 16:59:23 +0000 (22:29 +0530)]
selftests: net: Fix minor typos in MPTCP and psock tests
Fixes minor spelling errors:
- `simult_flows.sh`: "al testcases" -> "all testcases"
- `psock_tpacket.c`: "accross" -> "across"
Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250218165923.20740-1-suchitkarunakaran@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 20 Feb 2025 02:57:31 +0000 (18:57 -0800)]
Merge branch 'net-stmmac-further-cleanups'
Russell King says:
====================
net: stmmac: further cleanups
This small series does further cleanups to the stmmac driver:
1. Name priv->pause to indicate that it's a timeout and clarify the
units of the "pause" module parameter
2. Remove useless priv->flow_ctrl member and deprecate the useless
"flow_ctrl" module parameter
3. Fix long-standing signed-ness issue with "speed" passed around the
driver from the mac_link_up method.
====================
Link: https://patch.msgid.link/Z7Rf2daOaf778TOg@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 18 Feb 2025 10:24:39 +0000 (10:24 +0000)]
net: stmmac: "speed" passed to fix_mac_speed is an int
priv->plat->fix_mac_speed() is called from stmmac_mac_link_up(), which
is passed the speed as an "int". However, fix_mac_speed() implicitly
casts this to an unsigned int. Some platform glue code print this value
using %u, others with %d. Some implicitly cast it back to an int, and
others to u32.
Good practice is to use one type and only one type to represent a value
being passed around a driver.
Switch all of these over to consistently use "int" when dealing with a
speed passed from stmmac_mac_link_up(), even though the speed will
always be positive.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Link: https://patch.msgid.link/E1tkKmN-004ObM-Ge@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 18 Feb 2025 10:24:34 +0000 (10:24 +0000)]
net: stmmac: remove useless priv->flow_ctrl
priv->flow_ctrl is only accessed by stmmac_main.c, and the only place
that it is read is in stmmac_mac_flow_ctrl(). This function is only
called from stmmac_mac_link_up() which always sets priv->flow_ctrl
immediately before calling this function.
Therefore, initialising this at probe time is ineffectual because it
will always be overwritten before it's read. As such, the "flow_ctrl"
module parameter has been useless for some time. Rather than remove
the module parameter, which would risk module load failure, change the
description to indicate that it is obsolete, and warn if it is set by
userspace.
Moreover, storing the value in the stmmac_priv has no benefit as it's
set and then immediately read stmmac_mac_flow_ctrl(). Instead, pass it
as a parameter to this function..
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp>
Link: https://patch.msgid.link/E1tkKmI-004ObG-DL@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Tue, 18 Feb 2025 10:24:29 +0000 (10:24 +0000)]
net: stmmac: clarify priv->pause and pause module parameter
priv->pause corresponds with "pause_time" in the 802.3 specification,
and is also called "pause_time" in the various MAC backends. For
consistency, use the same name in the core stmmac code.
Clarify the units of the "pause" module parameter which sets up this
struct member to indicate that it's in units of the pause_quanta
defined by 802.3, which is 512 bit times.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/E1tkKmD-004ObA-9K@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Erni Sri Satya Vennela [Tue, 18 Feb 2025 01:34:15 +0000 (17:34 -0800)]
net: mana: Add debug logs in MANA network driver
Add more logs to assist in debugging and monitoring
driver behaviour, making it easier to identify potential
issues during development and testing.
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1739842455-23899-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 20 Feb 2025 02:43:41 +0000 (18:43 -0800)]
Merge branch 'net-fib_rules-add-port-mask-support'
Ido Schimmel says:
====================
net: fib_rules: Add port mask support
In some deployments users would like to encode path information into
certain bits of the IPv6 flow label, the UDP source port and the DSCP
field and use this information to route packets accordingly.
Redirecting traffic to a routing table based on specific bits in the UDP
source port is not currently possible. Only exact match and range are
currently supported by FIB rules.
This patchset extends FIB rules to match on layer 4 ports with an
optional mask. The mask is not supported when matching on a range. A
future patchset will add support for matching on the DSCP field with an
optional mask.
Patches #1-#6 gradually extend FIB rules to match on layer 4 ports with
an optional mask.
Patches #7-#8 add test cases for FIB rule port matching.
iproute2 support can be found here [1].
[1] https://github.com/idosch/iproute2/tree/submit/fib_rule_mask_v1
====================
Link: https://patch.msgid.link/20250217134109.311176-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:09 +0000 (15:41 +0200)]
selftests: fib_rule_tests: Add port mask match tests
Add tests for FIB rules that match on source and destination ports with
a mask. Test both good and bad flows.
# ./fib_rule_tests.sh
IPv6 FIB rule tests
[...]
TEST: rule6 check: sport and dport redirect to table [ OK ]
TEST: rule6 check: sport and dport no redirect to table [ OK ]
TEST: rule6 del by pref: sport and dport redirect to table [ OK ]
TEST: rule6 check: sport and dport range redirect to table [ OK ]
TEST: rule6 check: sport and dport range no redirect to table [ OK ]
TEST: rule6 del by pref: sport and dport range redirect to table [ OK ]
TEST: rule6 check: sport and dport masked redirect to table [ OK ]
TEST: rule6 check: sport and dport masked no redirect to table [ OK ]
TEST: rule6 del by pref: sport and dport masked redirect to table [ OK ]
[...]
Tests passed: 292
Tests failed: 0
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-9-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:08 +0000 (15:41 +0200)]
selftests: fib_rule_tests: Add port range match tests
Currently, only matching on specific ports is tested. Add port range
testing to make sure this use case does not regress.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-8-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:07 +0000 (15:41 +0200)]
netlink: specs: Add FIB rule port mask attributes
Add new port mask attributes to the spec. Example:
# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--do newrule \
--json '{"family": 2, "sport-range": { "start": 12345, "end": 12345 }, "sport-mask": 65535, "action": 1, "table": 1}'
None
# ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--do newrule \
--json '{"family": 2, "dport-range": { "start": 54321, "end": 54321 }, "dport-mask": 65535, "action": 1, "table": 2}'
None
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/rt_rule.yaml \
--dump getrule --json '{"family": 2}' --output-json | jq '.[]'
[...]
{
"table": 2,
"suppress-prefixlen": "0xffffffff",
"protocol": 0,
"priority": 32764,
"dport-range": {
"start": 54321,
"end": 54321
},
"dport-mask": "0xffff",
"family": 2,
"dst-len": 0,
"src-len": 0,
"tos": 0,
"action": "to-tbl",
"flags": 0
}
{
"table": 1,
"suppress-prefixlen": "0xffffffff",
"protocol": 0,
"priority": 32765,
"sport-range": {
"start": 12345,
"end": 12345
},
"sport-mask": "0xffff",
"family": 2,
"dst-len": 0,
"src-len": 0,
"tos": 0,
"action": "to-tbl",
"flags": 0
}
[...]
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-7-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:06 +0000 (15:41 +0200)]
net: fib_rules: Enable port mask usage
Allow user space to configure FIB rules that match on the source and
destination ports with a mask, now that support has been added to the
FIB rule core and the IPv4 and IPv6 address families.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-6-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:05 +0000 (15:41 +0200)]
ipv6: fib_rules: Add port mask matching
Extend IPv6 FIB rules to match on source and destination ports using a
mask. Note that the mask is only set when not matching on a range.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-5-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:04 +0000 (15:41 +0200)]
ipv4: fib_rules: Add port mask matching
Extend IPv4 FIB rules to match on source and destination ports using a
mask. Note that the mask is only set when not matching on a range.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-4-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:03 +0000 (15:41 +0200)]
net: fib_rules: Add port mask support
Add support for configuring and deleting rules that match on source and
destination ports using a mask as well as support for dumping such rules
to user space.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Mon, 17 Feb 2025 13:41:02 +0000 (15:41 +0200)]
net: fib_rules: Add port mask attributes
Add attributes that allow matching on source and destination ports with
a mask. Matching on the source port with a mask is needed in deployments
where users encode path information into certain bits of the UDP source
port.
Temporarily set the type of the attributes to 'NLA_REJECT' while support
is being added.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250217134109.311176-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Claus Stovgaard [Mon, 17 Feb 2025 08:05:02 +0000 (09:05 +0100)]
dt-bindings: net: dsa: b53: add BCM53101 support
BCM53101 is a ethernet switch, very similar to the BCM53115.
Signed-off-by: Claus Stovgaard <claus.stovgaard@prevas.dk>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250217080503.1390282-2-claus.stovgaard@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Torben Nielsen [Mon, 17 Feb 2025 08:05:01 +0000 (09:05 +0100)]
net: dsa: b53: mdio: add support for BCM53101
BCM53101 is a ethernet switch, very similar to the BCM53115.
Enable support for it, in the existing b53 dsa driver.
Signed-off-by: Torben Nielsen <torben.nielsen@prevas.dk>
Signed-off-by: Claus Stovgaard <claus.stovgaard@prevas.dk>
Link: https://patch.msgid.link/20250217080503.1390282-1-claus.stovgaard@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Wed, 19 Feb 2025 11:52:44 +0000 (11:52 +0000)]
Merge branch 'am65-cpsw-cleanup'
Roger Quadros says:
====================
net: ethernet: ti: am65-cpsw: drop multiple functions and code cleanup
We have 2 tx completion functions to handle single-port vs multi-port
variants. Merge them into one function to make maintenance easier.
We also have 2 functions to handle TX completion for SKB vs XDP.
Get rid of them too.
Also do some minor cleanups.
====================
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Mon, 17 Feb 2025 07:31:50 +0000 (09:31 +0200)]
net: ethernet: ti am65_cpsw: Drop separate TX completion functions
Drop separate TX completion functions for SKB and XDP. To do that
use the SW_DATA mechanism to store ndev and skb/xdpf for TX packets.
Use BUILD_BUG_ON_MSG() to fail build if SW_DATA size exceeds whats
available. i.e. AM65_CPSW_NAV_SW_DATA_SIZE.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Mon, 17 Feb 2025 07:31:49 +0000 (09:31 +0200)]
net: ethernet: ti: am65_cpsw: move am65_cpsw_put_page() out of am65_cpsw_run_xdp()
This allows us to re-use am65_cpsw_run_xdp() for zero copy
case. Add AM65_CPSW_XDP_TX case for successful XDP_TX so we don't
free the page while in flight.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Mon, 17 Feb 2025 07:31:48 +0000 (09:31 +0200)]
net: ethernet: ti: am65-cpsw: use return instead of goto in am65_cpsw_run_xdp()
In am65_cpsw_run_xdp() instead of goto followed by return, simply return.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Mon, 17 Feb 2025 07:31:47 +0000 (09:31 +0200)]
net: ethernet: ti: am65_cpsw: remove cpu argument am65_cpsw_run_xdp
am65_cpsw_run_xdp() can figure out the cpu id itself.
No need to pass it around 2 functions so drop it.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Mon, 17 Feb 2025 07:31:46 +0000 (09:31 +0200)]
net: ethernet: ti: am65-cpsw: remove am65_cpsw_nuss_tx_compl_packets_2g()
The only difference between am65_cpsw_nuss_tx_compl_packets_2g() and
am65_cpsw_nuss_tx_compl_packets() is the usage of spin_lock() and
netdev_tx_completed_queue() + am65_cpsw_nuss_tx_wake at every packet
in the latter.
Insted of having 2 separate functions for TX completion, merge them
into one. This will reduce code duplication and make maintenance easier.
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Feb 2025 09:42:52 +0000 (09:42 +0000)]
Merge branch 'net-mana-big-tcp'
Shradha Gupta says:
====================
net: Enable Big TCP for MANA devices
Allow the max gso/gro aggregated pkt size to go up to GSO_MAX_SIZE for
MANA NIC. On Azure, this not possible without allowing the same for
netvsc NIC (as the NICs are bonded together).
Therefore, we use netif_set_tso_max_size() to set max aggregated pkt
size
to VF's tso_max_size for netvsc too, when the data path is switched over
to the VF
The first patch allows MANA to configure aggregated pkt size of up-to
GSO_MAX_SIZE
The second patch enables the same on the netvsc NIC, if the data path
for the bonded NIC is switched to the VF
---
Changes in v3
* Add ipv6_hopopt_jumbo_remove() while sending Big TCP packets
---
Changes in v2
* Instead of using 'tcp segment' throughout the patch used the words
'aggregated pkt size'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Gupta [Mon, 17 Feb 2025 03:42:42 +0000 (19:42 -0800)]
hv_netvsc: Use VF's tso_max_size value when data path is VF
On Azure, increasing VF's gso/gro packet size to up-to GSO_MAX_SIZE
is not possible without allowing the same for netvsc NIC
(as the NICs are bonded together). For bonded NICs, the min of the max
aggregated pkt size of the members is propagated in the stack.
Therefore, we use netif_set_tso_max_size() to set max aggregated pkt size
to VF's packet size for netvsc too, when the data path is switched over
to the VF
Tested on azure env with Accelerated Networking enabled and disabled.
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Gupta [Mon, 17 Feb 2025 03:42:26 +0000 (19:42 -0800)]
net: mana: Allow tso_max_size to go up-to GSO_MAX_SIZE
Allow the max aggregated pkt size to go up-to GSO_MAX_SIZE for MANA NIC.
This patch only increases the max allowable gso/gro pkt size for MANA
devices and does not change the defaults.
Following are the perf benefits by increasing the pkt aggregate size from
legacy gso_max_size value(64K) to newer one(up-to 511K
IPv4 tests
for i in {1..10}; do netperf -t TCP_RR -H 10.0.0.5 -p50000 -- -r80000,80000
-O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
min p90 p99 Throughput gso_max_size
93 171 194 6594.25
97 154 180 7183.74
95 165 189 6927.86
96 165 188 6976.04
93 154 185 7338.05 64K
93 168 189 6938.03
94 169 189 6784.93
92 166 189 7117.56
94 179 191 6678.44
95 157 183 7277.81
min p90 p99 Throughput
93 134 146 8448.75
95 134 140 8396.54
94 137 148 8204.12
94 137 148 8244.41
94 128 139 8666.52 80K
94 141 153 8116.86
94 138 149 8163.92
92 135 142 8362.72
92 134 142 8497.57
93 136 148 8393.23
IPv6 Tests
for i in {1..10}; do netperf -t TCP_RR -H fd00:9013:cadd::4 -p50000 --
-r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
min p90 p99 Throughput gso_max_size
108 165 170 6673.2
101 169 189 6451.69
101 165 169 6737.65
102 167 175 6614.64
101 178 189 6247.13 64K
107 163 169 6678.63
106 176 187 6350.86
100 164 169 6617.36
102 163 170 6849.21
102 168 175 6605.7
min p90 p99 Throughput
108 155 166 7183
110 154 163 7268.87
109 152 159 7434.35
107 145 157 7569.15
107 149 164 7496.17 80K
110 154 159 7245.85
108 156 162 7266.24
109 145 158 7526.66
106 145 151 7785.75
111 148 157 7246.65
Tested on azure env with Accelerated Networking enabled and disabled.
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 19 Feb 2025 02:27:23 +0000 (18:27 -0800)]
Merge branch 'net-deduplicate-cookie-logic'
Willem de Bruijn says:
====================
net: deduplicate cookie logic
Reuse standard sk, ip and ipv6 cookie init handlers where possible.
Avoid repeated open coding of the same logic.
Harmonize feature sets across protocols.
Make IPv4 and IPv6 logic more alike.
Simplify adding future new fields with a single init point.
====================
Link: https://patch.msgid.link/20250214222720.3205500-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:27:04 +0000 (17:27 -0500)]
ipv6: initialize inet socket cookies with sockcm_init
Avoid open coding the same logic.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-8-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:27:03 +0000 (17:27 -0500)]
ipv6: replace ipcm6_init calls with ipcm6_init_sk
This initializes tclass and dontfrag before cmsg parsing, removing the
need for explicit checks against -1 in each caller.
Leave hlimit set to -1, because its full initialization
(in ip6_sk_dst_hoplimit) requires more state (dst, flowi6, ..).
This also prepares for calling sockcm_init in a follow-on patch.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-7-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:27:02 +0000 (17:27 -0500)]
icmp: reflect tos through ip cookie rather than updating inet_sk
Do not modify socket fields if it can be avoided.
The current code predates the introduction of ip cookies in commit
aa6615814533 ("ipv4: processing ancillary IP_TOS or IP_TTL"). Now that
cookies exist and support tos, update that field directly.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-6-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:27:01 +0000 (17:27 -0500)]
ipv4: remove get_rttos
Initialize the ip cookie tos field when initializing the cookie, in
ipcm_init_sk.
The existing code inverts the standard pattern for initializing cookie
fields. Default is to initialize the field from the sk, then possibly
overwrite that when parsing cmsgs (the unlikely case).
This field inverts that, setting the field to an illegal value and
after cmsg parsing checking whether the value is still illegal and
thus should be overridden.
Be careful to always apply mask INET_DSCP_MASK, as before.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-5-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:27:00 +0000 (17:27 -0500)]
ipv4: initialize inet socket cookies with sockcm_init
Avoid open coding the same logic.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-4-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:26:59 +0000 (17:26 -0500)]
net: initialize mark in sockcm_init
Avoid open coding initialization of sockcm fields.
Avoid reading the sk_priority field twice.
This ensures all callers, existing and future, will correctly try a
cmsg passed mark before sk_mark.
This patch extends support for cmsg mark to:
packet_spkt and packet_tpacket and net/can/raw.c.
This patch extends support for cmsg priority to:
packet_spkt and packet_tpacket.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Willem de Bruijn [Fri, 14 Feb 2025 22:26:58 +0000 (17:26 -0500)]
tcp: only initialize sockcm tsflags field
TCP only reads the tsflags field. Don't bother initializing others.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250214222720.3205500-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yu-Chun Lin [Mon, 17 Feb 2025 15:58:33 +0000 (23:58 +0800)]
net: stmmac: Use str_enabled_disabled() helper
As kernel test robot reported, the following warning occurs:
cocci warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c:582:6-8: opportunity for str_enabled_disabled(on)
Replace ternary (condition ? "enabled" : "disabled") with
str_enabled_disabled() from string_choices.h to improve readability,
maintain uniform string usage, and reduce binary size through linker
deduplication.
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Link: https://patch.msgid.link/20250217155833.3105775-1-eleanor15x@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Breno Leitao [Mon, 17 Feb 2025 15:48:13 +0000 (07:48 -0800)]
net: Remove redundant variable declaration in __dev_change_flags()
The old_flags variable is declared twice in __dev_change_flags(),
causing a shadow variable warning. This patch fixes the issue by
removing the redundant declaration, reusing the existing old_flags
variable instead.
net/core/dev.c:9225:16: warning: declaration shadows a local variable [-Wshadow]
9225 | unsigned int old_flags = dev->flags;
| ^
net/core/dev.c:9185:15: note: previous declaration is here
9185 | unsigned int old_flags = dev->flags;
| ^
1 warning generated.
Remove the redundant inner declaration and reuse the existing old_flags
variable since its value is not needed outside the if block, and it is
safe to reuse the variable. This eliminates the warning while
maintaining the same functionality.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250217-old_flags-v2-1-4cda3b43a35f@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Chandra Mohan Sundar [Mon, 17 Feb 2025 14:15:16 +0000 (19:45 +0530)]
selftests: net: Fix few spelling mistakes
Fix few spelling mistakes in net selftests
Signed-off-by: Chandra Mohan Sundar <chandru.dav@gmail.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20250217141520.81033-1-chandru.dav@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Qingfang Deng [Mon, 17 Feb 2025 09:40:21 +0000 (17:40 +0800)]
net: ethernet: mediatek: add EEE support
Add EEE support to MediaTek SoC Ethernet. The register fields are
similar to the ones in MT7531, except that the LPI threshold is in
milliseconds.
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250217094022.1065436-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pei Xiao [Mon, 17 Feb 2025 01:29:30 +0000 (09:29 +0800)]
net: freescale: ucc_geth: make ugeth_mac_ops be static const
sparse warning:
sparse: symbol 'ugeth_mac_ops' was not declared. Should it be
static.
Add static to fix sparse warnings and add const. phylink_create() will
accept a const struct.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/
202502141128.9HfxcdIE-lkp@intel.com
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 19 Feb 2025 02:07:10 +0000 (18:07 -0800)]
Merge branch 'net-phy-improve-and-simplify-eee-handling-in-phylib'
Heiner Kallweit says:
====================
net: phy: improve and simplify EEE handling in phylib
This series improves and simplifies phylib's EEE handling.
====================
Link: https://patch.msgid.link/3caa3151-13ac-44a8-9bb6-20f82563f698@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 16 Feb 2025 21:20:07 +0000 (22:20 +0100)]
net: phy: c45: remove local advertisement parameter from genphy_c45_eee_is_active
After the last user has gone, we can remove the local advertisement
parameter from genphy_c45_eee_is_active.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/bd121330-9e28-4bc8-8422-794bd54d561f@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 16 Feb 2025 21:19:23 +0000 (22:19 +0100)]
net: phy: c45: use cached EEE advertisement in genphy_c45_ethtool_get_eee
Now that disabled EEE modes are considered when populating
advertising_eee, we can use this bitmap here instead of reading
the PHY register.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/e57ed3d4-d0bc-4f91-83f6-8f48dfb6d7d7@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 16 Feb 2025 21:18:42 +0000 (22:18 +0100)]
net: phy: c45: Don't silently remove disabled EEE modes any longer when writing advertisement register
advertising_eee is adjusted now whenever an EEE mode gets disabled.
Therefore we can remove the silent removal of disabled EEE modes here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/e95b9dad-24a7-4e3e-9af9-6f0770cf1520@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 16 Feb 2025 21:17:48 +0000 (22:17 +0100)]
net: phy: remove disabled EEE modes from advertising_eee in phy_probe
A PHY driver may populate eee_disabled_modes in its probe or get_features
callback, therefore filter the EEE advertisement read from the PHY.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/493f3e2e-9cfc-445d-adbe-58d9c117a489@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 16 Feb 2025 21:16:34 +0000 (22:16 +0100)]
net: phy: improve phy_disable_eee_mode
If a mode is to be disabled, remove it from advertising_eee.
Disabling EEE modes shall be done before calling phy_start(),
warn if that's not the case.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/92164896-38ff-4474-b98b-e83fc05b9509@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sun, 16 Feb 2025 21:15:42 +0000 (22:15 +0100)]
net: phy: move definition of phy_is_started before phy_disable_eee_mode
In preparation of a follow-up patch, move phy_is_started() to before
phy_disable_eee_mode().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/04d1e7a5-f4c0-42ab-8fa4-88ad26b74813@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Sat, 15 Feb 2025 13:29:15 +0000 (14:29 +0100)]
net: phy: realtek: add defines for shadowed c45 standard registers
Realtek shadows standard c45 registers in VEND2 device register space.
Add defines for these VEND2 registers, based on the names of the
standard c45 registers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/c90bdf76-f8b8-4d06-9656-7a52d5658ee6@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Siddh Raman Pant [Sat, 15 Feb 2025 09:40:51 +0000 (09:40 +0000)]
netlink: Unset cb_running when terminating dump on release
When we terminated the dump, the callback isn't running, so cb_running
should be set to false to be logically consistent.
cb_running signifies whether a dump is ongoing. It is set to true in
cb->start(), and is checked in netlink_dump() to be true initially.
After the dump, it is set to false in the same function.
This is just a cleanup, no path should access this field on a closed
socket.
Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com>
Link: https://patch.msgid.link/aff028e3eb2b768b9895fa6349fa1981ae22f098.camel@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 19 Feb 2025 02:00:10 +0000 (18:00 -0800)]
Merge branch 'net-cadence-macb-modernize-statistics-reporting'
Sean Anderson says:
====================
net: cadence: macb: Modernize statistics reporting
Implement the modern interfaces for statistics reporting.
====================
Link: https://patch.msgid.link/20250214212703.2618652-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Fri, 14 Feb 2025 21:27:03 +0000 (16:27 -0500)]
net: cadence: macb: Report standard stats
Report standard statistics using the dedicated callbacks instead of
get_ethtool_stats.
OCTTX is split over two registers. Accumulating these registers
separately in gem_stats just means we need to combine them again later.
Instead, combine these stats before saving them, like is done for
ethtool_stats.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214212703.2618652-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Fri, 14 Feb 2025 21:27:02 +0000 (16:27 -0500)]
net: cadence: macb: Convert to get_stats64
Convert the existing get_stats implementation to get_stats64. Since we
now report 64-bit values, increase the counters to 64-bits as well.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214212703.2618652-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Fri, 14 Feb 2025 21:12:52 +0000 (16:12 -0500)]
net: xilinx: axienet: Implement BQL
Implement byte queue limits to allow queueing disciplines to account for
packets enqueued in the ring buffers but not yet transmitted.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20250214211252.2615573-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Fri, 14 Feb 2025 20:31:14 +0000 (21:31 +0100)]
net: phy: realtek: add helper RTL822X_VND2_C22_REG
C22 register space is mapped to 0xa400 in MMD VEND2 register space.
Add a helper to access mapped C22 registers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/6344277b-c5c7-449b-ac89-d5425306ca76@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 18 Feb 2025 23:32:22 +0000 (15:32 -0800)]
Merge branch 'eth-mlx4-use-the-page-pool-for-rx-buffers'
Jakub Kicinski says:
====================
eth: mlx4: use the page pool for Rx buffers
Convert mlx4 to page pool. I've been sitting on these patches for
over a year, and Jonathan Lemon had a similar series years before.
We never deployed it or sent upstream because it didn't really show
much perf win under normal load (admittedly I think the real testing
was done before Ilias's work on recycling).
During the v6.9 kernel rollout Meta's CDN team noticed that machines
with CX3 Pro (mlx4) are prone to overloads (double digit % of CPU time
spent mapping buffers in the IOMMU). The problem does not occur with
modern NICs, so I dusted off this series and reportedly it still works.
And it makes the problem go away, no overloads, perf back in line with
older kernels. Something must have changed in IOMMU code, I guess.
This series is very simple, and can very likely be optimized further.
Thing is, I don't have access to any CX3 Pro NICs. They only exist
in CDN locations which haven't had a HW refresh for a while. So I can
say this series survives a week under traffic w/ XDP enabled, but
my ability to iterate and improve is a bit limited.
v2: https://lore.kernel.org/
20250211192141.619024-1-kuba@kernel.org
v1: https://lore.kernel.org/
20250205031213.358973-1-kuba@kernel.org
====================
Link: https://patch.msgid.link/20250213010635.1354034-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 13 Feb 2025 01:06:35 +0000 (17:06 -0800)]
eth: mlx4: use the page pool for Rx buffers
Simple conversion to page pool. Preserve the current fragmentation
logic / page splitting. Each page starts with a single frag reference,
and then we bump that when attaching to skbs. This can likely be
optimized further.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 13 Feb 2025 01:06:34 +0000 (17:06 -0800)]
eth: mlx4: remove the local XDP fast-recycling ring
It will be replaced with page pool's built-in recycling.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 13 Feb 2025 01:06:33 +0000 (17:06 -0800)]
eth: mlx4: don't try to complete XDP frames in netpoll
mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
using page pool until now, so it could run XDP completions
in netpoll (NAPI budget == 0) just fine. Page pool has calling
context requirements, make sure we don't try to call it from
what is potentially HW IRQ context.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 13 Feb 2025 01:06:32 +0000 (17:06 -0800)]
eth: mlx4: create a page pool for Rx
Create a pool per rx queue. Subsequent patches will make use of it.
Move fcs_del to a hole to make space for the pointer.
Per common "wisdom" base the page pool size on the ring size.
Note that the page pool cache size is in full pages, so just
round up the effective buffer size to pages.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niklas Söderlund [Fri, 14 Feb 2025 17:46:50 +0000 (18:46 +0100)]
net: phy: marvell-88q2xxx: Init PHY private structure for mv88q211x
When adding LED support for mv88q222x devices the PHY private data
structure was added to the mv88q211x code path, the data structure is
however only allocated during mv88q222x probe. This results in a nullptr
deference for mv88q2110 devices.
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000001
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[
0000000000000001] user address but active_mm is swapper
Internal error: Oops:
0000000096000004 [#1] PREEMPT SMP
CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.14.0-rc1-arm64-renesas-00342-ga3783dbf2574 #7
Hardware name: Renesas White Hawk Single board based on r8a779g2 (DT)
pstate:
20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mv88q2xxx_config_init+0x28/0x84
lr : mv88q2110_config_init+0x98/0xb0
sp :
ffff8000823eb9d0
x29:
ffff8000823eb9d0 x28:
ffff000440942000 x27:
ffff80008144e400
x26:
0000000000001002 x25:
0000000000000000 x24:
0000000000000000
x23:
0000000000000009 x22:
ffff8000810534f0 x21:
ffff800081053550
x20:
0000000000000000 x19:
ffff0004437d6800 x18:
0000000000000018
x17:
00000000000961c8 x16:
ffff0006bef75ec0 x15:
0000000000000001
x14:
0000000000000001 x13:
ffff000440218080 x12:
071c71c71c71c71c
x11:
ffff000440218080 x10:
0000000000001420 x9 :
ffff8000823eb770
x8 :
ffff8000823eb650 x7 :
ffff8000823eb750 x6 :
ffff8000823eb710
x5 :
0000000000000000 x4 :
0000000000000800 x3 :
0000000000000001
x2 :
0000000000000000 x1 :
00000000ffffffff x0 :
ffff0004437d6800
Call trace:
mv88q2xxx_config_init+0x28/0x84 (P)
mv88q2110_config_init+0x98/0xb0
phy_init_hw+0x64/0x9c
phy_attach_direct+0x118/0x320
phy_connect_direct+0x24/0x80
of_phy_connect+0x5c/0xa0
rtsn_open+0x5bc/0x78c
__dev_open+0xf8/0x1fc
__dev_change_flags+0x198/0x220
dev_change_flags+0x20/0x64
ip_auto_config+0x270/0xefc
do_one_initcall+0xe4/0x22c
kernel_init_freeable+0x2a8/0x308
kernel_init+0x20/0x130
ret_from_fork+0x10/0x20
Code:
b907e404 f9432814 3100083f 540000e3 (
39400680)
---[ end trace
0000000000000000 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x000,
00000070,
00801250,
8200700b
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Fix this by using a generic probe function for both mv88q211x and
mv88q222x devices that allocates the PHY private data structure, while
only the mv88q222x probes for LED support.
Fixes:
a3783dbf2574 ("net: phy: marvell-88q2xxx: Add support for PHY LEDs on 88q2xxx")
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20250214174650.2056949-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Breno Leitao [Fri, 14 Feb 2025 17:07:11 +0000 (09:07 -0800)]
trace: tcp: Add tracepoint for tcp_cwnd_reduction()
Add a lightweight tracepoint to monitor TCP congestion window
adjustments via tcp_cwnd_reduction(). This tracepoint enables tracking
of:
- TCP window size fluctuations
- Active socket behavior
- Congestion window reduction events
Meta has been using BPF programs to monitor this function for years.
Adding a proper tracepoint provides a stable API for all users who need
to monitor TCP congestion window behavior.
Use DECLARE_TRACE instead of TRACE_EVENT to avoid creating trace event
infrastructure and exporting to tracefs, keeping the implementation
minimal. (Thanks Steven Rostedt)
Given that this patch creates a rawtracepoint, you could hook into it
using regular tooling, like bpftrace, using regular rawtracepoint
infrastructure, such as:
rawtracepoint:tcp_cwnd_reduction_tp {
....
}
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250214-cwnd_tracepoint-v2-1-ef8d15162d95@debian.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 18 Feb 2025 12:39:42 +0000 (13:39 +0100)]
Merge branch 'net-phy-marvell-88q2xxx-cleanup'
Dimitri Fedrau says:
====================
net: phy: marvell-88q2xxx: cleanup
- align defines
- order includes alphabetically
- enable temperature sensor in mv88q2xxx_config_init
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
====================
Link: https://patch.msgid.link/20250214-marvell-88q2xxx-cleanup-v1-0-71d67c20f308@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dimitri Fedrau [Fri, 14 Feb 2025 16:32:05 +0000 (17:32 +0100)]
net: phy: marvell-88q2xxx: enable temperature sensor in mv88q2xxx_config_init
Temperature sensor gets enabled for 88Q222X devices in
mv88q222x_config_init. Move enabling to mv88q2xxx_config_init because
all 88Q2XXX devices support the temperature sensor.
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dimitri Fedrau [Fri, 14 Feb 2025 16:32:04 +0000 (17:32 +0100)]
net: phy: marvell-88q2xxx: order includes alphabetically
Order includes alphabetically.
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Dimitri Fedrau [Fri, 14 Feb 2025 16:32:03 +0000 (17:32 +0100)]
net: phy: marvell-88q2xxx: align defines
Align some defines.
Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 18 Feb 2025 12:06:50 +0000 (13:06 +0100)]
Merge branch 'vxlan-join-leave-mc-group-when-reconfigured'
Petr Machata says:
====================
vxlan: Join / leave MC group when reconfigured
When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.
Therefore when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. Similarly when the
interface used for endpoint communication is changed in a situation when
multicast remote is configured. This is currently not done.
Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the
new group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.
One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):
# ip link add name br up type bridge vlan_filtering 1
# ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
# bridge vni add dev vx1 vni 200 group 224.0.0.1
# tcpdump -i lo &
# bridge vni add dev vx1 vni 200 group 224.0.0.2
18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
# bridge vni
dev vni group/remote
vx1 200 224.0.0.2
Having two different modes of operation for conceptually the same interface
is silly, so in this patchset, just do what the vnifilter code does and
deal with the errors by crossing fingers real hard.
====================
Link: https://patch.msgid.link/cover.1739548836.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Fri, 14 Feb 2025 16:18:24 +0000 (17:18 +0100)]
selftests: test_vxlan_fdb_changelink: Add a test for MC remote change
Changes to MC remote need to be reflected in actual group memberships.
Add a test to verify that it is the case.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Fri, 14 Feb 2025 16:18:23 +0000 (17:18 +0100)]
selftests: test_vxlan_fdb_changelink: Convert to lib.sh
Instead of inlining equivalents, use lib.sh-provided primitives.
Use defer to manage vx lifetime.
This will make it easier to extend the test in the next patch.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Fri, 14 Feb 2025 16:18:22 +0000 (17:18 +0100)]
selftests: forwarding: lib: Move require_command to net, generalize
This helper could be useful to more than just forwarding tests.
Move it upstairs and port over to log_test_skip().
Split the function into two parts: the bit that actually checks and
reports skip, which is in a new function check_command(). And a bit
that exits the test script if the check fails. This allows users
consistent checking behavior while giving an option to bail out from
a single test without bailing out of the whole script.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Fri, 14 Feb 2025 16:18:21 +0000 (17:18 +0100)]
vxlan: Join / leave MC group after remote changes
When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.
Therefore when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. Similarly when the
interface used for endpoint communication is changed in a situation when
multicast remote is configured. This is currently not done.
Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the new
group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.
One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):
# ip link add name br up type bridge vlan_filtering 1
# ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
# bridge vni add dev vx1 vni 200 group 224.0.0.1
# tcpdump -i lo &
# bridge vni add dev vx1 vni 200 group 224.0.0.2
18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
# bridge vni
dev vni group/remote
vx1 200 224.0.0.2
Having two different modes of operation for conceptually the same interface
is silly, so in this patch, just do what the vnifilter code does and deal
with the errors by crossing fingers real hard.
The vnifilter code leaves old before joining new, and in case of join /
leave failures does not roll back the configuration changes that have
already been applied, but bails out of joining if it could not leave. Do
the same here: leave before join, apply changes unconditionally and do not
attempt to join if we couldn't leave.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Petr Machata [Fri, 14 Feb 2025 16:18:20 +0000 (17:18 +0100)]
vxlan: Drop 'changelink' parameter from vxlan_dev_configure()
vxlan_dev_configure() only has a single caller that passes false
for the changelink parameter. Drop the parameter and inline the
sole value.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jason Xing [Fri, 14 Feb 2025 06:42:50 +0000 (14:42 +0800)]
page_pool: avoid infinite loop to schedule delayed worker
We noticed the kworker in page_pool_release_retry() was waken
up repeatedly and infinitely in production because of the
buggy driver causing the inflight less than 0 and warning
us in page_pool_inflight()[1].
Since the inflight value goes negative, it means we should
not expect the whole page_pool to get back to work normally.
This patch mitigates the adverse effect by not rescheduling
the kworker when detecting the inflight negative in
page_pool_release_retry().
[1]
[Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------
[Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages
...
[Mon Feb 10 20:36:11 2025] Call Trace:
[Mon Feb 10 20:36:11 2025] page_pool_release_retry+0x23/0x70
[Mon Feb 10 20:36:11 2025] process_one_work+0x1b1/0x370
[Mon Feb 10 20:36:11 2025] worker_thread+0x37/0x3a0
[Mon Feb 10 20:36:11 2025] kthread+0x11a/0x140
[Mon Feb 10 20:36:11 2025] ? process_one_work+0x370/0x370
[Mon Feb 10 20:36:11 2025] ? __kthread_cancel_work+0x40/0x40
[Mon Feb 10 20:36:11 2025] ret_from_fork+0x35/0x40
[Mon Feb 10 20:36:11 2025] ---[ end trace
ebffe800f33e7e34 ]---
Note: before this patch, the above calltrace would flood the
dmesg due to repeated reschedule of release_dw kworker.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250214064250.85987-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 18 Feb 2025 10:36:29 +0000 (11:36 +0100)]
Merge branch 'add-af_xdp-support-for-cn10k'
Suman Ghosh says:
====================
Add af_xdp support for cn10k
This patchset includes changes to support AF_XDP for cn10k chipsets. Both
non-zero copy and zero copy will be supported after these changes. Also,
the RSS will be reconfigured once a particular receive queue is
added/removed to/from AF_XDP support.
Patch #1: octeontx2-pf: use xdp_return_frame() to free xdp buffers
Patch #2: octeontx2-pf: Add AF_XDP non-zero copy support
Patch #3: octeontx2-pf: AF_XDP zero copy receive support
Patch #4: octeontx2-pf: Reconfigure RSS table after enabling AF_XDP
zerocopy on rx queue
Patch #5: octeontx2-pf: Prepare for AF_XDP transmit
Patch #6: octeontx2-pf: AF_XDP zero copy transmit support
====================
Link: https://patch.msgid.link/20250213053141.2833254-1-sumang@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Thu, 13 Feb 2025 05:31:41 +0000 (11:01 +0530)]
octeontx2-pf: AF_XDP zero copy transmit support
This patch implements below changes,
1. To avoid concurrency with normal traffic uses
XDP queues.
2. Since there are chances that XDP and AF_XDP can
fall under same queue uses separate flags to handle
dma buffers.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Thu, 13 Feb 2025 05:31:40 +0000 (11:01 +0530)]
octeontx2-pf: Prepare for AF_XDP
Implement necessary APIs required for AF_XDP transmit.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Thu, 13 Feb 2025 05:31:39 +0000 (11:01 +0530)]
octeontx2-pf: Reconfigure RSS table after enabling AF_XDP zerocopy on rx queue
RSS table needs to be reconfigured once a rx queue is enabled or
disabled for AF_XDP zerocopy support. After enabling UMEM on a rx queue,
that queue should not be part of RSS queue selection algorithm.
Similarly the queue should be considered again after UMEM is disabled.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Thu, 13 Feb 2025 05:31:38 +0000 (11:01 +0530)]
octeontx2-pf: AF_XDP zero copy receive support
This patch adds support to AF_XDP zero copy for CN10K.
This patch specifically adds receive side support. In this approach once
a xdp program with zero copy support on a specific rx queue is enabled,
then that receive quse is disabled/detached from the existing kernel
queue and re-assigned to the umem memory.
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Thu, 13 Feb 2025 05:31:37 +0000 (11:01 +0530)]
octeontx2-pf: Add AF_XDP non-zero copy support
Set xdp rx ring memory type as MEM_TYPE_PAGE_POOL for
af-xdp to work. This is needed since xdp_return_frame
internally will use page pools.
Fixes:
06059a1a9a4a ("octeontx2-pf: Add XDP support to netdev PF")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suman Ghosh [Thu, 13 Feb 2025 05:31:36 +0000 (11:01 +0530)]
octeontx2-pf: use xdp_return_frame() to free xdp buffers
xdp_return_frames() will help to free the xdp frames and their
associated pages back to page pool.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 18 Feb 2025 01:09:44 +0000 (17:09 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
ice, iavf: Add support for Rx timestamping
Mateusz Polchlopek says:
Initially, during VF creation it registers the PTP clock in
the system and negotiates with PF it's capabilities. In the
meantime the PF enables the Flexible Descriptor for VF.
Only this type of descriptor allows to receive Rx timestamps.
Enabling virtual clock would be possible, though it would probably
perform poorly due to the lack of direct time access.
Enable timestamping should be done using userspace tools, e.g.
hwstamp_ctl -i $VF -r 14
In order to report the timestamps to userspace, the VF extends
timestamp to 40b.
To support this feature the flexible descriptors and PTP part
in iavf driver have been introduced.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
iavf: add support for Rx timestamps to hotpath
iavf: handle set and get timestamps ops
iavf: Implement checking DD desc field
iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
iavf: define Rx descriptors as qwords
libeth: move idpf_rx_csum_decoded and idpf_rx_extracted
iavf: periodically cache PHC time
iavf: add support for indirect access to PHC time
iavf: add initial framework for registering PTP clock
iavf: negotiate PTP capabilities
iavf: add support for negotiating flexible RXDID format
virtchnl: add enumeration for the rxdid format
ice: support Rx timestamp on flex descriptor
virtchnl: add support for enabling PTP on iAVF
====================
Link: https://patch.msgid.link/20250214192739.1175740-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 16 Feb 2025 17:41:09 +0000 (09:41 -0800)]
eth: fbnic: support TCP segmentation offload
Add TSO support to the driver. Device can handle unencapsulated or
IPv6-in-IPv6 packets. Any other tunnel stacks are handled with
GSO partial.
Validate that the packet can be offloaded in ndo_features_check.
Main thing we need to check for is that the header geometry can
be expressed in the decriptor fields (offsets aren't too large).
Report number of TSO super-packets via the qstat API.
Link: https://patch.msgid.link/20250216174109.2808351-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 14 Feb 2025 22:46:01 +0000 (14:46 -0800)]
netdev: clarify GSO vs csum in qstats
Could be just me, but I had to pause and double check that the Tx csum
counter in qstat should include GSO'd packets. GSO pretty much implies
csum so one could possibly interpret the csum counter as pure csum offload.
But the counters are based on virtio:
[tx_needs_csum]
The number of packets which require checksum calculation by the device.
[rx_needs_csum]
The number of packets with VIRTIO_NET_HDR_F_NEEDS_CSUM.
and VIRTIO_NET_HDR_F_NEEDS_CSUM gets set on GSO packets virtio sends.
Clarify this in the spec to avoid any confusion.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250214224601.2271201-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 14 Feb 2025 22:43:40 +0000 (14:43 -0800)]
net: move stale comment about ntuple validation
Gal points out that the comment now belongs further down, since
the original if condition was split into two in
commit
de7f7582dff2 ("net: ethtool: prevent flow steering to RSS contexts which don't exist")
Link: https://lore.kernel.org/de4a2a8a-1eb9-4fa8-af87-7526e58218e9@nvidia.com
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20250214224340.2268691-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 18 Feb 2025 00:46:05 +0000 (16:46 -0800)]
Merge branch 'netdev-genl-add-an-xsk-attribute-to-queues'
Joe Damato says:
====================
netdev-genl: Add an xsk attribute to queues
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
[1]: https://lore.kernel.org/netdev/
20250113143109.
60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/
20250129172431.65773-1-jdamato@fastly.com/
====================
Link: https://patch.msgid.link/20250214211255.14194-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 14 Feb 2025 21:12:31 +0000 (21:12 +0000)]
selftests: drv-net: Test queue xsk attribute
Test that queues which are used for AF_XDP have the xsk nest attribute.
The attribute is currently empty, but its existence means the AF_XDP is
being used for the queue. Enable CONFIG_XDP_SOCKETS for
selftests/drivers/net tests, as well.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250214211255.14194-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 14 Feb 2025 21:12:30 +0000 (21:12 +0000)]
netdev-genl: Add an XSK attribute to queues
Expose a new per-queue nest attribute, xsk, which will be present for
queues that are being used for AF_XDP. If the queue is not being used for
AF_XDP, the nest will not be present.
In the future, this attribute can be extended to include more data about
XSK as it is needed.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 14 Feb 2025 21:12:29 +0000 (21:12 +0000)]
netlink: Add nla_put_empty_nest helper
Creating empty nests is helpful when the exact attributes to be exposed
in the future are not known. Encapsulate the logic in a helper.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20250214211255.14194-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Anna Emese Nyiri [Fri, 14 Feb 2025 20:58:28 +0000 (21:58 +0100)]
selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY
Introduce tests to verify the correct functionality of the SO_RCVMARK and
SO_RCVPRIORITY socket options.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Ferenc Fejes <fejes@inf.elte.hu>
Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250214205828.48503-1-annaemesenyiri@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefano Jordhani [Fri, 14 Feb 2025 18:17:51 +0000 (18:17 +0000)]
net: use napi_id_valid helper
In commit
6597e8d35851 ("netdev-genl: Elide napi_id when not present"),
napi_id_valid function was added. Use the helper to refactor open-coded
checks in the source.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Stefano Jordhani <sjordhani@gmail.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk> # for iouring
Link: https://patch.msgid.link/20250214181801.931-1-sjordhani@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 18 Feb 2025 00:40:49 +0000 (16:40 -0800)]
Merge branch 'net-phy-dp83822-add-support-for-changing-the-transmit-amplitude-voltage'
Dimitri Fedrau via says:
====================
net: phy: dp83822: Add support for changing the transmit amplitude voltage
Add support for changing the transmit amplitude voltage in 100BASE-TX mode.
Add support for configuration via DT.
v4: https://lore.kernel.org/
20250211-dp83822-tx-swing-v4-0-
1e8ebd71ad54@liebherr.com
v3: https://lore.kernel.org/
20250204-dp83822-tx-swing-v3-0-
9798e96500d9@liebherr.com
v2: https://lore.kernel.org/
20250120-dp83822-tx-swing-v2-0-
07c99dc42627@liebherr.com
v1: https://lore.kernel.org/
20250113-dp83822-tx-swing-v1-0-
7ed5a9d80010@liebherr.com
====================
Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-0-02ca72620599@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dimitri Fedrau [Fri, 14 Feb 2025 14:14:11 +0000 (15:14 +0100)]
net: phy: dp83822: Add support for changing the transmit amplitude voltage
Add support for changing the transmit amplitude voltage in 100BASE-TX mode.
Modifying it can be necessary to compensate losses on the PCB and
connector, so the voltages measured on the RJ45 pins are conforming.
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-3-02ca72620599@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dimitri Fedrau [Fri, 14 Feb 2025 14:14:10 +0000 (15:14 +0100)]
net: phy: Add helper for getting tx amplitude gain
Add helper which returns the tx amplitude gain defined in device tree.
Modifying it can be necessary to compensate losses on the PCB and
connector, so the voltages measured on the RJ45 pins are conforming.
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-2-02ca72620599@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dimitri Fedrau [Fri, 14 Feb 2025 14:14:09 +0000 (15:14 +0100)]
dt-bindings: net: ethernet-phy: add property tx-amplitude-100base-tx-percent
Add property tx-amplitude-100base-tx-percent in the device tree bindings
for configuring the tx amplitude of 100BASE-TX PHYs. Modifying it can be
necessary to compensate losses on the PCB and connector, so the voltages
measured on the RJ45 pins are conforming.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Dimitri Fedrau <dimitri.fedrau@liebherr.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250214-dp83822-tx-swing-v5-1-02ca72620599@liebherr.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pranav Tyagi [Thu, 13 Feb 2025 15:26:11 +0000 (20:56 +0530)]
selftests: net: fix grammar in reuseaddr_ports_exhausted.c log message
This patch fixes a grammatical error in a test log message
in reuseaddr_ports_exhausted.c for better clarity.
Signed-off-by: Pranav Tyagi <pranav.tyagi03@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250213152612.4434-1-pranav.tyagi03@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 18 Feb 2025 00:27:39 +0000 (16:27 -0800)]
Merge branch 'mlx5-add-sensor-name-in-temperature-message'
Tariq Toukan says:
====================
mlx5: Add sensor name in temperature message
This small series from Shahar adds the sensors names to the temperature
event messages, in addition to the existing bitmap indicators.
This improves human readability.
Series starts with simple refactoring and modifications. The top patch
adds the sensors names.
====================
Link: https://patch.msgid.link/20250213094641.226501-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Shahar Shitrit [Thu, 13 Feb 2025 09:46:41 +0000 (11:46 +0200)]
net/mlx5: Add sensor name to temperature event message
Previously, a temperature event message included a bitmap indicating
which sensors detect high temperatures.
To enhance clarity, we modify the message format to explicitly list
the names of the overheating sensors, alongside the sensors bitmap.
If HWMON is not configured, the event message remains unchanged.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250213094641.226501-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>