Daniel Golle [Thu, 10 Oct 2024 12:55:29 +0000 (13:55 +0100)]
net: phy: intel-xway: add support for PHY LEDs
The intel-xway PHY driver predates the PHY LED framework and currently
initializes all LED pins to equal default values.
Add PHY LED functions to the drivers and don't set default values if
LEDs are defined in device tree.
According the datasheets 3 LEDs are supported on all Intel XWAY PHYs.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/81f4717ab9acf38f3239727a4540ae96fd01109b.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Thu, 10 Oct 2024 12:55:17 +0000 (13:55 +0100)]
net: phy: mxl-gpy: correctly describe LED polarity
According the datasheet covering the LED (0x1b) register:
0B Active High LEDx pin driven high when activated
1B Active Low LEDx pin driven low when activated
Make use of the now available 'active-high' property and correctly
reflect the polarity setting which was previously inverted.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/180ccafa837f09908b852a8a874a3808c5ecd2d0.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Thu, 10 Oct 2024 12:55:00 +0000 (13:55 +0100)]
net: phy: aquantia: correctly describe LED polarity override
Use newly defined 'active-high' property to set the
VEND1_GLOBAL_LED_DRIVE_VDD bit and let 'active-low' clear that bit. This
reflects the technical reality which was inverted in the previous
description in which the 'active-low' property was used to actually set
the VEND1_GLOBAL_LED_DRIVE_VDD bit, which means that VDD (ie. supply
voltage) of the LED is driven rather than GND.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/86a413b4387c42dcb54f587cc2433a06f16aae83.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Golle [Thu, 10 Oct 2024 12:54:19 +0000 (13:54 +0100)]
net: phy: support 'active-high' property for PHY LEDs
In addition to 'active-low' and 'inactive-high-impedance' also
support 'active-high' property for PHY LED pin configuration.
As only either 'active-high' or 'active-low' can be set at the
same time, WARN and return an error in case both are set.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/91598487773d768f254d5faf06cf65b13e972f0e.1728558223.git.daniel@makrotopia.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 15 Oct 2024 08:44:54 +0000 (10:44 +0200)]
Merge branch 'make-phy-output-rmii-reference-clock'
Wei Fang says:
====================
make PHY output RMII reference clock
The TJA11xx PHYs have the capability to provide 50MHz reference clock
in RMII mode and output on REF_CLK pin. Therefore, add the new property
"nxp,rmii-refclk-output" to support this feature. This property is only
available for PHYs which use nxp-c45-tja11xx driver, such as TJA1103,
TJA1104, TJA1120 and TJA1121.
====================
Link: https://patch.msgid.link/20241010061944.266966-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Thu, 10 Oct 2024 06:19:44 +0000 (14:19 +0800)]
net: phy: c45-tja11xx: add support for outputting RMII reference clock
For TJA11xx PHYs, they have the capability to output 50MHz reference
clock on REF_CLK pin in RMII mode, which is called "revRMII" mode in
the PHY data sheet.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Wei Fang [Thu, 10 Oct 2024 06:19:43 +0000 (14:19 +0800)]
dt-bindings: net: tja11xx: add "nxp,rmii-refclk-out" property
Per the RMII specification, the REF_CLK is sourced from MAC to PHY
or from an external source. But for TJA11xx PHYs, they support to
output a 50MHz RMII reference clock on REF_CLK pin. Previously the
"nxp,rmii-refclk-in" was added to indicate that in RMII mode, if
this property present, REF_CLK is input to the PHY, otherwise it
is output. This seems inappropriate now. Because according to the
RMII specification, the REF_CLK is originally input, so there is
no need to add an additional "nxp,rmii-refclk-in" property to
declare that REF_CLK is input.
Unfortunately, because the "nxp,rmii-refclk-in" property has been
added for a while, and we cannot confirm which DTS use the TJA1100
and TJA1101 PHYs, changing it to switch polarity will cause an ABI
break. But fortunately, this property is only valid for TJA1100 and
TJA1101. For TJA1103/TJA1104/TJA1120/TJA1121 PHYs, this property is
invalid because they use the nxp-c45-tja11xx driver, which is a
different driver from TJA1100/TJA1101. Therefore, for PHYs using
nxp-c45-tja11xx driver, add "nxp,rmii-refclk-out" property to
support outputting RMII reference clock on REF_CLK pin.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Fri, 11 Oct 2024 23:03:11 +0000 (16:03 -0700)]
selftests: net: move EXTRA_CLEAN of libynl.a into ynl.mk
Commit
1fd9e4f25782 ("selftests: make kselftest-clean remove libynl outputs")
added EXTRA_CLEAN of YNL generated files to ynl.mk. We already had
a EXTRA_CLEAN in the file including the snippet. Consolidate them.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241011230311.2529760-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 11 Oct 2024 23:03:10 +0000 (16:03 -0700)]
selftests: net: rebuild YNL if dependencies changed
Try to rebuild YNL if either user added a new family or the specs
of the families have changed. Stanislav's ncdevmem cause a false
positive build failure in NIPA because libynl.a isn't rebuilt
after ethtool is added to YNL_GENS.
Note that sha1sum is already used in other parts of the build system.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241011230311.2529760-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Fri, 11 Oct 2024 20:02:25 +0000 (13:02 -0700)]
net: mtk_eth_soc: use ethtool_puts
Allows simplifying get_strings and avoids manual pointer manipulation.
Tested on Belkin RT1800.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20241011200225.7403-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rosen Penev [Fri, 11 Oct 2024 19:59:55 +0000 (12:59 -0700)]
net: mvneta: use ethtool_puts
Allows simplifying get_strings and avoids manual pointer manipulation.
Tested on Turris Omnia.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://patch.msgid.link/20241011195955.7065-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 15 Oct 2024 00:55:08 +0000 (17:55 -0700)]
Merge branch 'add-support-for-per-napi-config-via-netlink'
Joe Damato says:
====================
Add support for per-NAPI config via netlink
Greetings:
Welcome to v6. Minor changes from v5 [1], please see changelog below.
There were no explicit comments from reviewers on the call outs in my
v5, so I'm retaining them from my previous cover letter just in case :)
A few important call outs for reviewers:
1. This revision seems to work (see below for a full walk through). I
think this is the behavior we talked about, but please let me know if
a use case is missing.
2. Re a previous point made by Stanislav regarding "taking over a NAPI
ID" when the channel count changes: mlx5 seems to call napi_disable
followed by netif_napi_del for the old queues and then calls
napi_enable for the new ones. In this RFC, the NAPI ID generation is
deferred to napi_enable. This means we won't end up with two of the
same NAPI IDs added to the hash at the same time.
Can we assume all drivers will napi_disable the old queues before
napi_enable the new ones?
- If yes: we might not need to worry about a NAPI ID takeover
function.
- If no: I'll need to make a change so that the NAPI ID generation
is deferred only for drivers which have opted into the config
space via calls to netif_napi_add_config
3. I made the decision to remove the WARN_ON_ONCE that (I think?)
Jakub previously suggested in alloc_netdev_mqs (WARN_ON_ONCE(txqs
!= rxqs);) because this was triggering on every kernel boot with my
mlx5 NIC.
4. I left the "maxqs = max(txqs, rxqs);" in alloc_netdev_mqs despite
thinking this is a bit strange. I think it's strange that we might
be short some number of NAPI configs, but it seems like most people
are in favor of this approach, so I've left it.
I'd appreciate thoughts from reviewers on the above items, if at all
possible.
Now, on to the implementation.
Firstly, this implementation moves certain settings to napi_struct so that
they are "per-NAPI", while taking care to respect existing sysfs
parameters which are interface wide and affect all NAPIs:
- NAPI ID
- gro_flush_timeout
- defer_hard_irqs
Furthermore:
- NAPI ID generation and addition to the hash is now deferred to
napi_enable, instead of during netif_napi_add
- NAPIs are removed from the hash during napi_disable, instead of
netif_napi_del.
- An array of "struct napi_config" is allocated in net_device.
IMPORTANT: The above changes affect all network drivers.
Optionally, drivers may opt-in to using their config space by calling
netif_napi_add_config instead of netif_napi_add.
If a driver does this, the NAPI being added is linked with an allocated
"struct napi_config" and the per-NAPI settings (including NAPI ID) are
persisted even as hardware queues are destroyed and recreated.
To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
- mlx5 which is the basis of the examples below
- mlx4 which has TX only NAPIs, just to highlight that case. I have
only compile tested this patch; I don't have this hardware.
- bnxt which I have only compiled tested. I don't have this
hardware.
NOTE: I only tested this on mlx5; I have no access to the other hardware
for which I provided patches. Hopefully other folks can help test :)
Here's how it works when I test it on my mlx5 system:
$ ethtool -l eth4 | grep Combined | tail -1
Combined: 2
First, output the current NAPI settings:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now, set the global sysfs parameters:
$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'
Output current NAPI settings again:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now set NAPI ID 345, via its NAPI ID to specific values:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set \
--json='{"id": 345,
"defer-hard-irqs": 111,
"gro-flush-timeout": 11111}'
None
Now output current NAPI settings again to ensure only NAPI ID 345
changed:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 11111,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now, increase gro-flush-timeout only:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 345,
"gro-flush-timeout": 44444}'
None
Now output the current NAPI settings once more:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 44444,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now set NAPI ID 345 to have gro_flush_timeout of 0:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 345,
"gro-flush-timeout": 0}'
None
Check that NAPI ID 345 has a value of 0:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Change the queue count, ensuring that NAPI ID 345 retains its settings:
$ sudo ethtool -L eth4 combined 4
Check that the new queues have the system wide settings but that NAPI ID
345 remains unchanged:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 347,
'ifindex': 7,
'irq': 529},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Now reduce the queue count below where NAPI ID 345 is indexed:
$ sudo ethtool -L eth4 combined 1
Check the output:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Re-increase the queue count to ensure NAPI ID 345 is re-assigned the same
values:
$ sudo ethtool -L eth4 combined 2
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Create new queues to ensure the sysfs globals are used for the new NAPIs
but that NAPI ID 345 is unchanged:
$ sudo ethtool -L eth4 combined 8
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 111,
'gro-flush-timeout': 0,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 100,
'gro-flush-timeout': 20000,
'id': 344,
'ifindex': 7,
'irq': 327}]
Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:
$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'
Check that worked:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 7}'
[...]
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 346,
'ifindex': 7,
'irq': 528},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 345,
'ifindex': 7,
'irq': 527},
{'defer-hard-irqs': 222,
'gro-flush-timeout': 33333,
'id': 344,
'ifindex': 7,
'irq': 327}]
[1]: https://lore.kernel.org/
20241009005525.13651-1-jdamato@fastly.com
v5: https://lore.kernel.org/
20241009005525.13651-1-jdamato@fastly.com
rfcv4: https://lore.kernel.org/lkml/
20241001235302.57609-1-jdamato@fastly.com
rfcv3: https://lore.kernel.org/
20240912100738.16567-8-jdamato@fastly.com
rfcv2: https://lore.kernel.org/
20240908160702.56618-1-jdamato@fastly.com
====================
Link: https://patch.msgid.link/20241011184527.16393-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:45:04 +0000 (18:45 +0000)]
mlx4: Add support for persistent NAPI config to RX CQs
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing RX CQ NAPIs.
Presently, struct napi_config only has support for two fields used for
RX, so there is no need to support them with TX CQs, yet.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-10-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:45:03 +0000 (18:45 +0000)]
mlx5: Add support for persistent NAPI config
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-9-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:45:02 +0000 (18:45 +0000)]
bnxt: Add support for persistent NAPI config
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-8-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:45:01 +0000 (18:45 +0000)]
netdev-genl: Support setting per-NAPI config values
Add support to set per-NAPI defer_hard_irqs and gro_flush_timeout.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-7-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:45:00 +0000 (18:45 +0000)]
net: napi: Add napi_config
Add a persistent NAPI config area for NAPI configuration to the core.
Drivers opt-in to setting the persistent config for a NAPI by passing an
index when calling netif_napi_add_config.
napi_config is allocated in alloc_netdev_mqs, freed in free_netdev
(after the NAPIs are deleted).
Drivers which call netif_napi_add_config will have persistent per-NAPI
settings: NAPI IDs, gro_flush_timeout, and defer_hard_irq settings.
Per-NAPI settings are saved in napi_disable and restored in napi_enable.
Co-developed-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-6-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:44:59 +0000 (18:44 +0000)]
netdev-genl: Dump gro_flush_timeout
Support dumping gro_flush_timeout for a NAPI ID.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241011184527.16393-5-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:44:58 +0000 (18:44 +0000)]
net: napi: Make gro_flush_timeout per-NAPI
Allow per-NAPI gro_flush_timeout setting.
The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device gro_flush_timeout
field. Reads from sysfs will read from the net_device field.
The ability to set gro_flush_timeout on specific NAPI instances will be
added in a later commit, via netdev-genl.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:44:57 +0000 (18:44 +0000)]
netdev-genl: Dump napi_defer_hard_irqs
Support dumping defer_hard_irqs for a NAPI ID.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Fri, 11 Oct 2024 18:44:56 +0000 (18:44 +0000)]
net: napi: Make napi_defer_hard_irqs per-NAPI
Add defer_hard_irqs to napi_struct in preparation for per-NAPI
settings.
The existing sysfs parameter is respected; writes to sysfs will write to
all NAPI structs for the device and the net_device defer_hard_irq field.
Reads from sysfs show the net_device field.
The ability to set defer_hard_irqs on specific NAPI instances will be
added in a later commit, via netdev-genl.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20241011184527.16393-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Fri, 11 Oct 2024 03:40:39 +0000 (04:40 +0100)]
net: phylink: allow half-duplex modes with RATE_MATCH_PAUSE
PHYs performing rate-matching using MAC-side flow-control always
perform duplex-matching as well in case they are supporting
half-duplex modes at all.
No longer remove half-duplex modes from their capabilities.
Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/b157c0c289cfba024039a96e635d037f9d946745.1728617993.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 15 Oct 2024 00:39:38 +0000 (17:39 -0700)]
Merge branch 'tcp-add-skb-sk-to-more-control-packets'
Eric Dumazet says:
====================
tcp: add skb->sk to more control packets
Currently, TCP can set skb->sk for a variety of transmit packets.
However, packets sent on behalf of a TIME_WAIT sockets do not
have an attached socket.
Same issue for RST packets.
We want to change this, in order to increase eBPF program
capabilities.
This is slightly risky, because various layers could
be confused by TIME_WAIT sockets showing up in skb->sk.
v2: audited all sk_to_full_sk() users and addressed Martin feedback.
====================
Link: https://patch.msgid.link/20241010174817.1543642-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Oct 2024 17:48:17 +0000 (17:48 +0000)]
ipv4: tcp: give socket pointer to control skbs
ip_send_unicast_reply() send orphaned 'control packets'.
These are RST packets and also ACK packets sent from TIME_WAIT.
Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.
This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Oct 2024 17:48:16 +0000 (17:48 +0000)]
ipv6: tcp: give socket pointer to control skbs
tcp_v6_send_response() send orphaned 'control packets'.
These are RST packets and also ACK packets sent from TIME_WAIT.
Some eBPF programs would prefer to have a meaningful skb->sk
pointer as much as possible.
This means that TCP can now attach TIME_WAIT sockets to outgoing
skbs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Oct 2024 17:48:15 +0000 (17:48 +0000)]
net: add skb_set_owner_edemux() helper
This can be used to attach a socket to an skb,
taking a reference on sk->sk_refcnt.
This helper might be a NOP if sk->sk_refcnt is zero.
Use it from tcp_make_synack().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Oct 2024 17:48:14 +0000 (17:48 +0000)]
net_sched: sch_fq: prepare for TIME_WAIT sockets
TCP stack is not attaching skb to TIME_WAIT sockets yet,
but we would like to allow this in the future.
Add sk_listener_or_tw() helper to detect the three states
that FQ needs to take care.
Like NEW_SYN_RECV, TIME_WAIT are not full sockets and
do not contain sk->sk_pacing_status, sk->sk_pacing_rate.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Oct 2024 17:48:13 +0000 (17:48 +0000)]
net: add TIME_WAIT logic to sk_to_full_sk()
TCP will soon attach TIME_WAIT sockets to some ACK and RST.
Make sure sk_to_full_sk() detects this and does not return
a non full socket.
v3: also changed sk_const_to_full_sk()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Link: https://patch.msgid.link/20241010174817.1543642-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 9 Oct 2024 09:40:10 +0000 (10:40 +0100)]
tg3: Address byte-order miss-matches
Address byte-order miss-matches flagged by Sparse.
In tg3_load_firmware_cpu() and tg3_get_device_address()
this is done using appropriate types to store big endian values.
In the cases of tg3_test_nvram(), where buf is an array which
contains values of several different types, cast to __le32
before converting values to host byte order.
Reported by Sparse as:
.../tg3.c:3745:34: warning: cast to restricted __be32
.../tg3.c:13096:21: warning: cast to restricted __le32
.../tg3.c:13096:21: warning: cast from restricted __be32
.../tg3.c:13101:21: warning: cast to restricted __le32
.../tg3.c:13101:21: warning: cast from restricted __be32
.../tg3.c:17070:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17070:63: expected restricted __be32 [usertype] *val
.../tg3.c:17070:63: got unsigned int *
dr.../tg3.c:17071:63: warning: incorrect type in argument 3 (different base types)
.../tg3.c:17071:63: expected restricted __be32 [usertype] *val
.../tg3.c:17071:63: got unsigned int *
Also, address white-space issues on lines modified for the above.
And, for consistency, lines adjacent to them.
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-tg3-sparse-v1-1-6af38a7bf4ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Mon, 14 Oct 2024 12:20:41 +0000 (13:20 +0100)]
Merge branch 'net-ti-ethernet-warnings'
Simon Horman says:
====================
net: ethernet: ti: Address some warnings
This patchset addresses some warnings flagged by Sparse, and clang-18 in
TI Ethernet drivers.
Although these changes do not alter the functionality of the code, by
addressing them real problems introduced in future which are flagged by
tooling will stand out more readily.
Compile tested only.
---
Changes in v2:
- Dropped patch to directly address __percpu Sparse warnings and, instead
- Add patch to use tstats
- Added tags
- Thanks to all for the review of v1
- Link to v1: https://lore.kernel.org/r/
20240910-ti-warn-v1-0-
afd1e404abbe@kernel.org
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 10 Oct 2024 11:04:12 +0000 (12:04 +0100)]
net: ethernet: ti: cpsw_ale: Remove unused accessor functions
W=1 builds flag that some accessor functions for ALE fields are unused.
Address this by splitting up the macros used to define these
accessors to allow only those that are used to be declared.
The warnings are verbose, but for example, the mcast_state case is
flagged by clang-18 as:
.../cpsw_ale.c:220:1: warning: unused function 'cpsw_ale_get_mcast_state' [-Wunused-function]
220 | DEFINE_ALE_FIELD(mcast_state, 62, 2)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../cpsw_ale.c:145:19: note: expanded from macro 'DEFINE_ALE_FIELD'
145 | static inline int cpsw_ale_get_##name(u32 *ale_entry) \
| ^~~~~~~~~~~~~~~~~~~
<scratch space>:196:1: note: expanded from here
196 | cpsw_ale_get_mcast_state
| ^~~~~~~~~~~~~~~~~~~~~~~~
Compile tested only.
No functional change intended.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 10 Oct 2024 11:04:11 +0000 (12:04 +0100)]
net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version
Make use of struct pcpu_sw_netstats and related helpers to handle
existing per-cpu stats for this driver - the exact same counters
are maintained.
A side effect of this change is to address __percpu warnings
flagged by Sparse:
.../am65-cpsw-nuss.c:2658:55: warning: incorrect type in initializer (different address spaces)
.../am65-cpsw-nuss.c:2658:55: expected struct am65_cpsw_ndev_stats [noderef] __percpu *stats
.../am65-cpsw-nuss.c:2658:55: got void *data
.../am65-cpsw-nuss.c:2781:15: warning: incorrect type in argument 3 (different address spaces)
.../am65-cpsw-nuss.c:2781:15: expected void *data
.../am65-cpsw-nuss.c:2781:15: got struct am65_cpsw_ndev_stats [noderef] __percpu *stats
Compile tested only.
No functional change intended.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/all/20240911170643.7ecb1bbb@kernel.org/
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Thu, 10 Oct 2024 11:04:10 +0000 (12:04 +0100)]
net: ethernet: ti: am65-cpsw: Use __be64 type for id_temp
The id_temp local variable in am65_cpsw_nuss_probe() is
used to hold a 64-bit big-endian value as it is assigned using
cpu_to_be64().
It is read using memcpy(), where it is written as an identifier into a
byte-array. So this can also be treated as big endian.
As it's type is currently host byte order (u64), sparse flags
an endian mismatch when compiling for little-endian systems:
.../am65-cpsw-nuss.c:3454:17: warning: incorrect type in assignment (different base types)
.../am65-cpsw-nuss.c:3454:17: expected unsigned long long [usertype] id_temp
.../am65-cpsw-nuss.c:3454:17: got restricted __be64 [usertype]
Address this by using __be64 as the type of id_temp.
No functional change intended.
Compile tested only.
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Thu, 10 Oct 2024 10:58:02 +0000 (12:58 +0200)]
r8169: enable SG/TSO on selected chip versions per default
Due to problem reports in the past SG and TSO/TSO6 are disabled per
default. It's not fully clear which chip versions are affected, so we
may impact also users of unaffected chip versions, unless they know
how to use ethtool for enabling SG/TSO/TSO6.
Vendor drivers r8168/r8125 enable SG/TSO/TSO6 for selected chip
versions per default, I'd interpret this as confirmation that these
chip versions are unaffected. So let's do the same here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yu Liao [Thu, 10 Oct 2024 09:27:44 +0000 (17:27 +0800)]
net: hsr: convert to use new timer APIs
del_timer() and del_timer_sync() have been renamed to timer_delete()
and timer_delete_sync().
Inconsistent API usage makes the code a bit confusing, so replace with
the new APIs.
No functional changes intended.
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Oct 2024 17:02:50 +0000 (18:02 +0100)]
Merge branch 'ethtool-write-firmware'
Danielle Ratson says:
====================
ethtool: Add support for writing firmware
In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.
EPL payloads are used for more complex and extensive management functions
that require a larger amount of data, so writing firmware blocks using EPL
is much more efficient.
Currently, only LPL payload is supported for writing firmware blocks to
the module.
Add support for writing firmware block using EPL payload, both to support
modules that support only EPL write mechanism, and to optimize the flashing
process of modules that support LPL and EPL.
Running the flashing command on the same sample module using EPL vs. LPL
showed an improvement of 84%.
Patchset overview:
Patch #1: preparations
Patch #2: Add EPL support
v5: Resending- no changes.
v4: Resending the right version after wrong v3.
No changes from v2.
v2:
* Fix the commit meassges to align the cover letter about the
right meaning of LPL and EPL.
Patch #2:
* Initialize the variable 'bytes_written' before the first
iteration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Wed, 9 Oct 2024 10:53:47 +0000 (13:53 +0300)]
net: ethtool: Add support for writing firmware blocks using EPL payload
In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.
EPL payloads are used for more complex and extensive management
functions that require a larger amount of data, so writing firmware
blocks using EPL is much more efficient.
Currently, only LPL payload is supported for writing firmware blocks to
the module.
Add support for writing firmware block using EPL payload, both to
support modules that supports only EPL write mechanism, and to optimize
the flashing process of modules that support LPL and EPL.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Danielle Ratson [Wed, 9 Oct 2024 10:53:46 +0000 (13:53 +0300)]
net: ethtool: Add new parameters and a function to support EPL
In the CMIS specification for pluggable modules, LPL (Local Payload) and
EPL (Extended Payload) are two types of data payloads used for managing
various functions and features of the module.
EPL payloads are used for more complex and extensive management
functions that require a larger amount of data, so writing firmware
blocks using EPL is much more efficient.
Currently, only LPL payload is supported for writing firmware blocks to
the module.
Add EPL related parameters to the function ethtool_cmis_cdb_compose_args()
and add a specific function for calculating the maximum allowable length
extension for EPL. Both will be used in the next patch to add support for
writing firmware blocks using EPL.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 13 Oct 2024 10:33:10 +0000 (11:33 +0100)]
Merge branch 'vxlan-skb-drop-reasons'
Menglong Dong says:
====================
net: vxlan: add skb drop reasons support
In this series, we add skb drop reasons support to VXLAN, and following
new skb drop reasons are introduced:
SKB_DROP_REASON_VXLAN_INVALID_HDR
SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
SKB_DROP_REASON_VXLAN_ENTRY_EXISTS
SKB_DROP_REASON_VXLAN_NO_REMOTE
SKB_DROP_REASON_MAC_INVALID_SOURCE
SKB_DROP_REASON_IP_TUNNEL_ECN
SKB_DROP_REASON_TUNNEL_TXINFO
SKB_DROP_REASON_LOCAL_MAC
We add some helper functions in this series, who will capture the drop
reasons from pskb_may_pull_reason and return them:
pskb_network_may_pull_reason()
pskb_inet_may_pull_reason()
And we also make the following functions return skb drop reasons:
skb_vlan_inet_prepare()
vxlan_remcsum()
vxlan_snoop()
vxlan_set_mac()
Changes since v6:
- fix some typos in the document for SKB_DROP_REASON_TUNNEL_TXINFO
Changes since v5:
- fix some typos in the document for SKB_DROP_REASON_TUNNEL_TXINFO
Changes since v4:
- make skb_vlan_inet_prepare() return drop reasons, instead of introduce
a wrapper for it in the 3rd patch.
- modify the document for SKB_DROP_REASON_LOCAL_MAC and
SKB_DROP_REASON_TUNNEL_TXINFO.
Changes since v3:
- rename SKB_DROP_REASON_VXLAN_INVALID_SMAC to
SKB_DROP_REASON_MAC_INVALID_SOURCE in the 6th patch
Changes since v2:
- move all the drop reasons of VXLAN to the "core", instead of introducing
the VXLAN drop reason subsystem
- add the 6th patch, which capture the drop reasons from vxlan_snoop()
- move the commits for vxlan_remcsum() and vxlan_set_mac() after
vxlan_rcv() to update the call of them accordingly
- fix some format problems
Changes since v1:
- document all the drop reasons that we introduce
- rename the drop reasons to make them more descriptive, as Ido advised
- remove the 2nd patch, which introduce the SKB_DR_RESET
- add the 4th patch, which adds skb_vlan_inet_prepare_reason() helper
- introduce the 6th patch, which make vxlan_set_mac return drop reasons
- introduce the 10th patch, which uses VXLAN_DROP_NO_REMOTE as the drop
reasons, as Ido advised
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:30 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in encap_bypass_if_local()
Replace kfree_skb() with kfree_skb_reason() in encap_bypass_if_local, and
no new skb drop reason is added in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:29 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in vxlan_encap_bypass()
Replace kfree_skb with kfree_skb_reason in vxlan_encap_bypass, and no new
skb drop reason is added in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:28 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in vxlan_mdb_xmit()
Replace kfree_skb() with kfree_skb_reason() in vxlan_mdb_xmit. No drop
reasons are introduced in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:27 +0000 (10:28 +0800)]
net: vxlan: add drop reasons support to vxlan_xmit_one()
Replace kfree_skb/dev_kfree_skb with kfree_skb_reason in vxlan_xmit_one.
No drop reasons are introduced in this commit.
The only concern of mine is replacing dev_kfree_skb with
kfree_skb_reason. The dev_kfree_skb is equal to consume_skb, and I'm not
sure if we can change it to kfree_skb here. In my option, the skb is
"dropped" here, isn't it?
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:26 +0000 (10:28 +0800)]
net: vxlan: use kfree_skb_reason() in vxlan_xmit()
Replace kfree_skb() with kfree_skb_reason() in vxlan_xmit(). Following
new skb drop reasons are introduced for vxlan:
/* no remote found for xmit */
SKB_DROP_REASON_VXLAN_NO_REMOTE
/* packet without necessary metadata reached a device which is
* in "external" mode
*/
SKB_DROP_REASON_TUNNEL_TXINFO
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:25 +0000 (10:28 +0800)]
net: vxlan: make vxlan_set_mac() return drop reasons
Change the return type of vxlan_set_mac() from bool to enum
skb_drop_reason. In this commit, the drop reason
"SKB_DROP_REASON_LOCAL_MAC" is introduced for the case that the source
mac of the packet is a local mac.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:24 +0000 (10:28 +0800)]
net: vxlan: make vxlan_snoop() return drop reasons
Change the return type of vxlan_snoop() from bool to enum
skb_drop_reason. In this commit, two drop reasons are introduced:
SKB_DROP_REASON_MAC_INVALID_SOURCE
SKB_DROP_REASON_VXLAN_ENTRY_EXISTS
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:23 +0000 (10:28 +0800)]
net: vxlan: make vxlan_remcsum() return drop reasons
Make vxlan_remcsum() support skb drop reasons by changing the return
value type of it from bool to enum skb_drop_reason.
The only drop reason in vxlan_remcsum() comes from pskb_may_pull_reason(),
so we just return it.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:22 +0000 (10:28 +0800)]
net: vxlan: add skb drop reasons to vxlan_rcv()
Introduce skb drop reasons to the function vxlan_rcv(). Following new
drop reasons are added:
SKB_DROP_REASON_VXLAN_INVALID_HDR
SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND
SKB_DROP_REASON_IP_TUNNEL_ECN
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:21 +0000 (10:28 +0800)]
net: tunnel: make skb_vlan_inet_prepare() return drop reasons
Make skb_vlan_inet_prepare return the skb drop reasons, which is just
what pskb_may_pull_reason() returns. Meanwhile, adjust all the call of
it.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:20 +0000 (10:28 +0800)]
net: tunnel: add pskb_inet_may_pull_reason() helper
Introduce the function pskb_inet_may_pull_reason() and make
pskb_inet_may_pull a simple inline call to it. The drop reasons of it just
come from pskb_may_pull_reason().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Wed, 9 Oct 2024 02:28:19 +0000 (10:28 +0800)]
net: skb: add pskb_network_may_pull_reason() helper
Introduce the function pskb_network_may_pull_reason() and make
pskb_network_may_pull() a simple inline call to it. The drop reasons of
it just come from pskb_may_pull_reason.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Justin Chen [Thu, 10 Oct 2024 22:15:06 +0000 (15:15 -0700)]
net: bcmasp: enable SW timestamping
Add skb_tx_timestamp() call and enable support for SW
timestamping.
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241010221506.802730-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Justin Chen [Thu, 10 Oct 2024 19:13:32 +0000 (12:13 -0700)]
net: broadcom: remove select MII from brcmstb Ethernet drivers
The MII driver isn't used by brcmstb Ethernet drivers. Remove it
from the BCMASP, GENET, and SYSTEMPORT drivers.
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241010191332.1074642-1-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 11 Oct 2024 22:54:42 +0000 (15:54 -0700)]
Merge branch 'microchip_t1s-update-on-microchip-10base-t1s-phy-driver'
Parthiban Veerasooran says:
====================
microchip_t1s: Update on Microchip 10BASE-T1S PHY driver
This patch series contain the below updates:
- Restructured lan865x_write_cfg_params() and lan865x_read_cfg_params()
functions arguments to more generic.
- Updated new/improved initial settings of LAN865X Rev.B0 from latest
AN1760.
- Added support for LAN865X Rev.B1 from latest AN1760.
- Moved LAN867X reset handling to a new function for flexibility.
- Added support for LAN867X Rev.C1/C2 from latest AN1699.
- Disabled/enabled collision detection based on PLCA setting.
====================
Link: https://patch.msgid.link/20241010082205.221493-1-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:05 +0000 (13:52 +0530)]
net: phy: microchip_t1s: configure collision detection based on PLCA mode
As per LAN8650/1 Rev.B0/B1 AN1760 (Revision F (DS60001760G - June 2024))
and LAN8670/1/2 Rev.C1/C2 AN1699 (Revision E (DS60001699F - June 2024)),
under normal operation, the device should be operated in PLCA mode.
Disabling collision detection is recommended to allow the device to
operate in noisy environments or when reflections and other inherent
transmission line distortion cause poor signal quality. Collision
detection must be re-enabled if the device is configured to operate in
CSMA/CD mode.
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-8-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:04 +0000 (13:52 +0530)]
net: phy: microchip_t1s: add support for Microchip's LAN867X Rev.C2
Add support for LAN8670/1/2 Rev.C2 as per the latest configuration note
AN1699 released (Revision E (DS60001699F - June 2024)) for Rev.C1 is also
applicable for Rev.C2. Refer hardware revisions list in the latest AN1699
Revision E (DS60001699F - June 2024).
https://www.microchip.com/en-us/application-notes/an1699
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-7-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:03 +0000 (13:52 +0530)]
net: phy: microchip_t1s: add support for Microchip's LAN867X Rev.C1
Add support for LAN8670/1/2 Rev.C1 as per the latest configuration note
AN1699 released (Revision E (DS60001699F - June 2024)).
https://www.microchip.com/en-us/application-notes/an1699
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-6-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:02 +0000 (13:52 +0530)]
net: phy: microchip_t1s: move LAN867X reset handling to a new function
Move LAN867X reset handling code to a new function called
lan867x_check_reset_complete() which will be useful for the next patch
which also uses the same code to handle the reset functionality.
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-5-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:01 +0000 (13:52 +0530)]
net: phy: microchip_t1s: add support for Microchip's LAN865X Rev.B1
Add support for LAN8650/1 Rev.B1. As per the latest configuration note
AN1760 released (Revision F (DS60001760G - June 2024)) for Rev.B0 is also
applicable for Rev.B1. Refer hardware revisions list in the latest AN1760
Revision F (DS60001760G - June 2024).
https://www.microchip.com/en-us/application-notes/an1760
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-4-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:22:00 +0000 (13:52 +0530)]
net: phy: microchip_t1s: update new initial settings for LAN865X Rev.B0
Update the new/improved initial settings from the latest configuration
application note AN1760 released for LAN8650/1 Rev.B0 Revision F
(DS60001760G - June 2024).
https://www.microchip.com/en-us/application-notes/an1760
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-3-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Parthiban Veerasooran [Thu, 10 Oct 2024 08:21:59 +0000 (13:51 +0530)]
net: phy: microchip_t1s: restructure cfg read/write functions arguments
Restructure lan865x_write_cfg_params() and lan865x_read_cfg_params()
functions arguments to more generic which will be useful for the next
patch which updates the improved initial configuration for LAN8650/1
Rev.B0 published in the Configuration Note.
Signed-off-by: Parthiban Veerasooran <parthiban.veerasooran@microchip.com>
Link: https://patch.msgid.link/20241010082205.221493-2-parthiban.veerasooran@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 10 Oct 2024 21:18:57 +0000 (14:18 -0700)]
selftests: drv-net: add missing trailing backslash
Commit
b3ea416419c8 ("testing: net-drv: add basic shaper test")
removed the trailing backslash from the last entry. We have
a terminating comment here to avoid having to modify the last
line when adding at the end.
Reviewed-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20241010211857.2193076-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 11 Oct 2024 22:44:29 +0000 (15:44 -0700)]
Merge branch 'netdevsim-better-ipsec-output-format'
Hangbin Liu says:
====================
netdevsim: better ipsec output format
The first 2 patches improve the netdevsim ipsec debug output with better
format. The 3rd patch update the selftests.
v2: update rtnetlink selftest with new output format (Stanislav Fomichev, Jakub Kicinski)
====================
Link: https://patch.msgid.link/20241010040027.21440-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Thu, 10 Oct 2024 04:00:27 +0000 (04:00 +0000)]
selftests: rtnetlink: update netdevsim ipsec output format
After the netdevsim update to use human-readable IP address formats for
IPsec, we can now use the source and destination IPs directly in testing.
Here is the result:
# ./rtnetlink.sh -t kci_test_ipsec_offload
PASS: ipsec_offload
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20241010040027.21440-4-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Thu, 10 Oct 2024 04:00:26 +0000 (04:00 +0000)]
netdevsim: copy addresses for both in and out paths
The current code only copies the address for the in path, leaving the out
path address set to 0. This patch corrects the issue by copying the addresses
for both the in and out paths. Before this patch:
# cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=2 tx=20
sa[0] tx ipaddr=0.0.0.0
sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a
ca4f1397 43565909 941fa627
sa[1] rx ipaddr=192.168.0.1
sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a
ca4f1397 43565909 941fa627
After this patch:
= cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=2 tx=20
sa[0] tx ipaddr=192.168.0.2
sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a
ca4f1397 43565909 941fa627
sa[1] rx ipaddr=192.168.0.1
sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a
ca4f1397 43565909 941fa627
Fixes:
7699353da875 ("netdevsim: add ipsec offload testing")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241010040027.21440-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hangbin Liu [Thu, 10 Oct 2024 04:00:25 +0000 (04:00 +0000)]
netdevsim: print human readable IP address
Currently, IPSec addresses are printed in hexadecimal format, which is
not user-friendly. e.g.
# cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=2 tx=20
sa[0] rx ipaddr=0x00000000
00000000 00000000 0100a8c0
sa[0] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a
ca4f1397 43565909 941fa627
sa[1] tx ipaddr=0x00000000
00000000 00000000 00000000
sa[1] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a
ca4f1397 43565909 941fa627
This patch updates the code to print the IPSec address in a human-readable
format for easier debug. e.g.
# cat /sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
SA count=4 tx=40
sa[0] tx ipaddr=0.0.0.0
sa[0] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[0] key=0x3167608a
ca4f1397 43565909 941fa627
sa[1] rx ipaddr=192.168.0.1
sa[1] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[1] key=0x3167608a
ca4f1397 43565909 941fa627
sa[2] tx ipaddr=::
sa[2] spi=0x00000100 proto=0x32 salt=0x0adecc3a crypt=1
sa[2] key=0x3167608a
ca4f1397 43565909 941fa627
sa[3] rx ipaddr=2000::1
sa[3] spi=0x00000101 proto=0x32 salt=0x0adecc3a crypt=1
sa[3] key=0x3167608a
ca4f1397 43565909 941fa627
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241010040027.21440-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aryan Srivastava [Wed, 9 Oct 2024 21:23:19 +0000 (10:23 +1300)]
net: dsa: mv88e6xxx: Fix uninitialised err value
The err value in mv88e6xxx_region_atu_snapshot is now potentially
uninitialised on return. Initialise err as 0.
Fixes:
ada5c3229b32 ("net: dsa: mv88e6xxx: Add FID map cache")
Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241009212319.1045176-1-aryan.srivastava@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 11 Oct 2024 22:41:36 +0000 (15:41 -0700)]
Merge branch 'net-xilinx-emaclite-adopt-clock-support'
Radhey Shyam Pandey says:
====================
net: xilinx: emaclite: Adopt clock support
This patchset adds emaclite clock support. AXI Ethernet Lite IP can also
be used on SoC platforms like Zynq UltraScale+ MPSoC which combines
powerful processing system (PS) and user-programmable logic (PL) into
the same device. On these platforms it is mandatory to explicitly enable
IP clocks for proper functionality.
====================
Link: https://patch.msgid.link/1728491303-1456171-1-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Abin Joseph [Wed, 9 Oct 2024 16:28:23 +0000 (21:58 +0530)]
net: emaclite: Adopt clock support
Adapt to use the clock framework. Add s_axi_aclk clock from the processor
bus clock domain and make clk optional to keep DTB backward compatibility.
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1728491303-1456171-4-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Abin Joseph [Wed, 9 Oct 2024 16:28:22 +0000 (21:58 +0530)]
net: emaclite: Replace alloc_etherdev() with devm_alloc_etherdev()
Use device managed ethernet device allocation to simplify the error
handling logic. No functional change.
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1728491303-1456171-3-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Abin Joseph [Wed, 9 Oct 2024 16:28:21 +0000 (21:58 +0530)]
dt-bindings: net: emaclite: Add clock support
Add s_axi_aclk AXI4 clock support. Traditionally this IP was used on
microblaze platforms which had fixed clocks enabled all the time. But
since its a PL IP, it can also be used on SoC platforms like Zynq
UltraScale+ MPSoC which combines processing system (PS) and user
programmable logic (PL) into the same device. On these platforms instead
of fixed enabled clocks it is mandatory to explicitly enable IP clocks
for proper functionality.
So make clock a required property and also define max supported clock
constraints.
Signed-off-by: Abin Joseph <abin.joseph@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://patch.msgid.link/1728491303-1456171-2-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 11 Oct 2024 22:35:07 +0000 (15:35 -0700)]
Merge branch 'net-remove-rtnl-from-fib_seq_sum'
Eric Dumazet says:
====================
net: remove RTNL from fib_seq_sum()
This series is inspired by a syzbot report showing
rtnl contention and one thread blocked in:
7 locks held by syz-executor/10835:
#0:
ffff888033390420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2931 [inline]
#0:
ffff888033390420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x224/0xc90 fs/read_write.c:679
#1:
ffff88806df6bc88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1ea/0x500 fs/kernfs/file.c:325
#2:
ffff888026fcf3c8 (kn->active#50){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20e/0x500 fs/kernfs/file.c:326
#3:
ffffffff8f56f848 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
#4:
ffff88805e0140e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:1014 [inline]
#4:
ffff88805e0140e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x8e/0x520 drivers/base/dd.c:1005
#5:
ffff88805c5fb250 (&devlink->lock_key#55){+.+.}-{3:3}, at: nsim_drv_probe+0xcb/0xb80 drivers/net/netdevsim/dev.c:1534
#6:
ffffffff8fcd1748 (rtnl_mutex){+.+.}-{3:3}, at: fib_seq_sum+0x31/0x290 net/core/fib_notifier.c:46
====================
Link: https://patch.msgid.link/20241009184405.3752829-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 18:44:05 +0000 (18:44 +0000)]
net: do not acquire rtnl in fib_seq_sum()
After we made sure no fib_seq_read() handlers needs RTNL anymore,
we can remove RTNL from fib_seq_sum().
Note that after RTNL was dropped, fib_seq_sum() result was possibly
outdated anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 18:44:04 +0000 (18:44 +0000)]
ipmr: use READ_ONCE() to read net->ipv[46].ipmr_seq
mr_call_vif_notifiers() and mr_call_mfc_notifiers() already
uses WRITE_ONCE() on the write side.
Using RTNL to protect the reads seems a big hammer.
Constify 'struct net' argument of ip6mr_rules_seq_read()
and ipmr_rules_seq_read().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 18:44:03 +0000 (18:44 +0000)]
ipv6: use READ_ONCE()/WRITE_ONCE() on fib6_table->fib_seq
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL.
We can use READ_ONCE() when reading it.
Constify 'struct net' argument of fib6_tables_seq_read() and
fib6_rules_seq_read().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 18:44:02 +0000 (18:44 +0000)]
ipv4: use READ_ONCE()/WRITE_ONCE() on net->ipv4.fib_seq
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL.
We can use READ_ONCE() when reading it.
Constify 'struct net' argument of fib4_rules_seq_read()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 18:44:01 +0000 (18:44 +0000)]
fib: rules: use READ_ONCE()/WRITE_ONCE() on ops->fib_rules_seq
Using RTNL to protect ops->fib_rules_seq reads seems a big hammer.
Writes are protected by RTNL.
We can use READ_ONCE() on readers.
Constify 'struct net' argument of fib_rules_seq_read()
and lookup_rules_ops().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241009184405.3752829-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Thu, 10 Oct 2024 03:41:00 +0000 (03:41 +0000)]
tcp: move sysctl_tcp_l3mdev_accept to netns_ipv4_read_rx
sysctl_tcp_l3mdev_accept is read from TCP receive fast path from
tcp_v6_early_demux(),
__inet6_lookup_established,
inet_request_bound_dev_if().
Move it to netns_ipv4_read_rx.
Remove the '#ifdef CONFIG_NET_L3_MASTER_DEV' that was guarding
its definition.
Note this adds a hole of three bytes that could be filled later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Coco Li <lixiaoyan@google.com>
Link: https://patch.msgid.link/20241010034100.320832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aryan Srivastava [Thu, 10 Oct 2024 00:49:34 +0000 (13:49 +1300)]
net: phy: aquantia: poll status register
The system interface connection status register is not immediately
correct upon line side link up. This results in the status being read as
OFF and then transitioning to the correct host side link mode with a
short delay. This causes the phylink framework passing the OFF status
down to all MAC config drivers, resulting in the host side link being
misconfigured, which in turn can lead to link flapping or complete
packet loss in some cases.
Mitigate this by periodically polling the register until it not showing
the OFF state. This will be done every 1ms for 10ms, using the same
poll/timeout as the processor intensive operation reads.
If the phy is still expressing the OFF state after the timeout, then set
the link to false and pass the NA interface mode onto the phylink
framework.
Signed-off-by: Aryan Srivastava <aryan.srivastava@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20241010004935.1774601-1-aryan.srivastava@alliedtelesis.co.nz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 8 Oct 2024 15:48:24 +0000 (08:48 -0700)]
eth: remove the DLink/Sundance (ST201) driver
Konstantin reports the maintainer's address bounces.
There is no other maintainer and the driver is quite old.
There is a good chance nobody is using this driver any more.
Let's try to remove it completely, we can revert it back in
if someone complains.
Link: https://lore.kernel.org/20240925-bizarre-earwig-from-pluto-1484aa@lemu/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Denis Kirjanov <dkirjanov@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 11 Oct 2024 01:40:34 +0000 (18:40 -0700)]
Merge branch 'tg3-link-irqs-napis-and-queues'
Joe Damato says:
====================
tg3: Link IRQs, NAPIs, and queues
This follows from a previous RFC (wherein I botched the subject lines of
all the messages) [1].
I've taken Michael Chan's suggestion on modifying patch 2 and I've
updated the commit messages of both patches to test and show the output
for the default 1 TX 4 RX queues and the 4 TX and 4 RX queues cases.
Reviewers: please check the commit messages carefully to ensure the
output is correct (or on your own systems to verify, if you like). I am
not a tg3 expert and it's possible that I got something wrong.
[1]: https://lore.kernel.org/all/
20241005145717.302575-3-jdamato@fastly.com/T/
====================
Link: https://patch.msgid.link/20241009175509.31753-1-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 9 Oct 2024 17:55:09 +0000 (17:55 +0000)]
tg3: Link queues to NAPIs
Link queues to NAPIs using the netdev-genl API so this information is
queryable.
First, test with the default setting on my tg3 NIC at boot with 1 TX
queue:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'tx'}]
Now, adjust the number of TX queues to be 4 via ethtool:
$ sudo ethtool -L eth0 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Despite "Combined: n/a" in the ethtool output, /proc/interrupts shows
the tg3 has renamed the IRQs to be combined:
343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4
Now query this via netlink to ensure the queues are linked properly to
their NAPIs:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump queue-get --json='{"ifindex": 2}'
[{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'rx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'rx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'rx'},
{'id': 0, 'ifindex': 2, 'napi-id': 8960, 'type': 'tx'},
{'id': 1, 'ifindex': 2, 'napi-id': 8961, 'type': 'tx'},
{'id': 2, 'ifindex': 2, 'napi-id': 8962, 'type': 'tx'},
{'id': 3, 'ifindex': 2, 'napi-id': 8963, 'type': 'tx'}]
As you can see above, id 0 for both TX and RX share a NAPI, NAPI ID
8960, and so on for each queue index up to 3.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241009175509.31753-3-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joe Damato [Wed, 9 Oct 2024 17:55:08 +0000 (17:55 +0000)]
tg3: Link IRQs to NAPI instances
Link IRQs to NAPI instances with netif_napi_set_irq. This information
can be queried with the netdev-genl API.
Begin by testing my tg3 device in its default state: 1 TX queue and 4 RX
queues.
Compare the output of /proc/interrupts for my tg3 device with the output of
netdev-genl after applying this patch:
$ cat /proc/interrupts | grep eth0
343: [...] eth0-tx-0
344: [...] eth0-rx-1
345: [...] eth0-rx-2
346: [...] eth0-rx-3
347: [...] eth0-rx-4
As you can see above, tg3 has named the IRQs such that there is a
dedicated tx IRQ and 4 dedicated rx IRQs, for a total of 5 IRQs.
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8197, 'ifindex': 2, 'irq': 347},
{'id': 8196, 'ifindex': 2, 'irq': 346},
{'id': 8195, 'ifindex': 2, 'irq': 345},
{'id': 8194, 'ifindex': 2, 'irq': 344},
{'id': 8193, 'ifindex': 2, 'irq': 343}]
Netlink displays the same IRQs as above, noting that each is mapped to a
unique NAPI instance.
Now, reconfigure the NIC to have 4 TX queues and 4 RX queues:
$ sudo ethtool -L eth0 rx 4 tx 4
$ sudo ethtool -l eth0 | tail -5
Current hardware settings:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Examine /proc/interrupts once again, noting that tg3 will now rename the
IRQs to suggest that they are combined tx and rx without allocating
additional IRQs, so the total IRQ count in /proc/interrupts is
unchanged:
343: [...] eth0-0
344: [...] eth0-txrx-1
345: [...] eth0-txrx-2
346: [...] eth0-txrx-3
347: [...] eth0-txrx-4
Check the output from netlink again:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 2}'
[{'id': 8973, 'ifindex': 2, 'irq': 347},
{'id': 8972, 'ifindex': 2, 'irq': 346},
{'id': 8971, 'ifindex': 2, 'irq': 345},
{'id': 8970, 'ifindex': 2, 'irq': 344},
{'id': 8969, 'ifindex': 2, 'irq': 343}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241009175509.31753-2-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 9 Oct 2024 05:48:05 +0000 (07:48 +0200)]
r8169: remove original workaround for RTL8125 broken rx issue
Now that we have
b9c7ac4fe22c ("r8169: disable ALDPS per default for
RTL8125"), the first attempt to fix the issue shouldn't be needed
any longer. So let's effectively revert
621735f59064 ("r8169: fix
rare issue with broken rx after link-down on RTL8125") and see
whether anybody complains.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/382d8c88-cbce-400f-ad62-fda0181c7e38@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Heiner Kallweit [Wed, 9 Oct 2024 05:44:23 +0000 (07:44 +0200)]
r8169: don't apply UDP padding quirk on RTL8126A
Vendor drivers r8125/r8126 indicate that this quirk isn't needed
any longer for RTL8126A. Mimic this in r8169.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d1317187-aa81-4a69-b831-678436e4de62@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Oct 2024 17:05:55 +0000 (10:05 -0700)]
Merge git://git./linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc3).
No conflicts and no adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 10 Oct 2024 19:36:35 +0000 (12:36 -0700)]
Merge tag 'net-6.12-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter.
Current release - regressions:
- dsa: sja1105: fix reception from VLAN-unaware bridges
- Revert "net: stmmac: set PP_FLAG_DMA_SYNC_DEV only if XDP is
enabled"
- eth: fec: don't save PTP state if PTP is unsupported
Current release - new code bugs:
- smc: fix lack of icsk_syn_mss with IPPROTO_SMC, prevent null-deref
- eth: airoha: update Tx CPU DMA ring idx at the end of xmit loop
- phy: aquantia: AQR115c fix up PMA capabilities
Previous releases - regressions:
- tcp: 3 fixes for retrans_stamp and undo logic
Previous releases - always broken:
- net: do not delay dst_entries_add() in dst_release()
- netfilter: restrict xtables extensions to families that are safe,
syzbot found a way to combine ebtables with extensions that are
never used by userspace tools
- sctp: ensure sk_state is set to CLOSED if hashing fails in
sctp_listen_start
- mptcp: handle consistently DSS corruption, and prevent corruption
due to large pmtu xmit"
* tag 'net-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
MAINTAINERS: Add headers and mailing list to UDP section
MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
slip: make slhc_remember() more robust against malicious packets
net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
ppp: fix ppp_async_encode() illegal access
docs: netdev: document guidance on cleanup patches
phonet: Handle error of rtnl_register_module().
mpls: Handle error of rtnl_register_module().
mctp: Handle error of rtnl_register_module().
bridge: Handle error of rtnl_register_module().
vxlan: Handle error of rtnl_register_module().
rtnetlink: Add bulk registration helpers for rtnetlink message handlers.
net: do not delay dst_entries_add() in dst_release()
mptcp: pm: do not remove closing subflows
mptcp: fallback when MPTCP opts are dropped after 1st data
tcp: fix mptcp DSS corruption due to large pmtu xmit
mptcp: handle consistently DSS corruption
net: netconsole: fix wrong warning
net: dsa: refuse cross-chip mirroring operations
net: fec: don't save PTP state if PTP is unsupported
...
Linus Torvalds [Thu, 10 Oct 2024 19:25:32 +0000 (12:25 -0700)]
Merge tag 'trace-ringbuffer-v6.12-rc2' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Ring-buffer fix: do not have boot-mapped buffers use CPU hotplug
callbacks
When a ring buffer is mapped to memory assigned at boot, it also
splits it up evenly between the possible CPUs. But the allocation code
still attached a CPU notifier callback to this ring buffer. When a CPU
is added, the callback will happen and another per-cpu buffer is
created for the ring buffer.
But for boot mapped buffers, there is no room to add another one (as
they were all created already). The result of calling the CPU hotplug
notifier on a boot mapped ring buffer is unpredictable and could lead
to a system crash.
If the ring buffer is boot mapped simply do not attach the CPU
notifier to it"
* tag 'trace-ringbuffer-v6.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Do not have boot mapped buffers hook to CPU hotplug
Linus Torvalds [Thu, 10 Oct 2024 17:02:59 +0000 (10:02 -0700)]
Merge tag 'for-6.12-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- update fstrim loop and add more cancellation points, fix reported
delayed or blocked suspend if there's a huge chunk queued
- fix error handling in recent qgroup xarray conversion
- in zoned mode, fix warning printing device path without RCU
protection
- again fix invalid extent xarray state (
6252690f7e1b), lost due to
refactoring
* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
btrfs: zoned: fix missing RCU locking in error message when loading zone info
btrfs: fix missing error handling when adding delayed ref with qgroups enabled
btrfs: add cancellation points to trim loops
btrfs: split remaining space to discard in chunks
Linus Torvalds [Thu, 10 Oct 2024 16:52:49 +0000 (09:52 -0700)]
Merge tag 'nfsd-6.12-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Fix NFSD bring-up / shutdown
- Fix a UAF when releasing a stateid
* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix possible badness in FREE_STATEID
nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
NFSD: Mark filecache "down" if init fails
Linus Torvalds [Thu, 10 Oct 2024 16:45:45 +0000 (09:45 -0700)]
Merge tag 'xfs-6.12-fixes-3' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Carlos Maiolino:
- A few small typo fixes
- fstests xfs/538 DEBUG-only fix
- Performance fix on blockgc on COW'ed files, by skipping trims on
cowblock inodes currently opened for write
- Prevent cowblocks to be freed under dirty pagecache during unshare
- Update MAINTAINERS file to quote the new maintainer
* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix a typo
xfs: don't free cowblocks from under dirty pagecache on unshare
xfs: skip background cowblock trims on inodes open for write
xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
xfs: don't ifdef around the exact minlen allocations
xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
xfs: return bool from xfs_attr3_leaf_add
xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
xfs: scrub: convert comma to semicolon
xfs: Remove empty declartion in header file
MAINTAINERS: add Carlos Maiolino as XFS release manager
Jakub Kicinski [Thu, 10 Oct 2024 16:35:50 +0000 (09:35 -0700)]
Merge branch 'maintainers-networking-file-coverage-updates'
Simon Horman says:
====================
MAINTAINERS: Networking file coverage updates
The aim of this proposal is to make the handling of some files,
related to Networking and Wireless, more consistently. It does so by:
1. Adding some more headers to the UDP section, making it consistent
with the TCP section.
2. Excluding some files relating to Wireless from NETWORKING [GENERAL],
making their handling consistent with other files related to
Wireless.
The aim of this is to make things more consistent. And for MAINTAINERS
to better reflect the situation on the ground. I am more than happy to
be told that the current state of affairs is fine. Or for other ideas to
be discussed.
v1: https://lore.kernel.org/
20241004-maint-net-hdrs-v1-0-
41fd555aacc5@kernel.org
====================
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-0-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 9 Oct 2024 08:47:23 +0000 (09:47 +0100)]
MAINTAINERS: Add headers and mailing list to UDP section
Add netdev mailing list and some more udp.h headers to the UDP section.
This is now more consistent with the TCP section.
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-2-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 9 Oct 2024 08:47:22 +0000 (09:47 +0100)]
MAINTAINERS: consistently exclude wireless files from NETWORKING [GENERAL]
We already exclude wireless drivers from the netdev@ traffic, to
delegate it to linux-wireless@, and avoid overwhelming netdev@.
Many of the following wireless-related sections MAINTAINERS
are already not included in the NETWORKING [GENERAL] section.
For consistency, exclude those that are.
* 802.11 (including CFG80211/NL80211)
* MAC80211
* RFKILL
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-maint-net-hdrs-v2-1-f2c86e7309c8@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 09:11:32 +0000 (09:11 +0000)]
slip: make slhc_remember() more robust against malicious packets
syzbot found that slhc_remember() was missing checks against
malicious packets [1].
slhc_remember() only checked the size of the packet was at least 20,
which is not good enough.
We need to make sure the packet includes the IPv4 and TCP header
that are supposed to be carried.
Add iph and th pointers to make the code more readable.
[1]
BUG: KMSAN: uninit-value in slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
slhc_remember+0x2e8/0x7b0 drivers/net/slip/slhc.c:666
ppp_receive_nonmp_frame+0xe45/0x35e0 drivers/net/ppp/ppp_generic.c:2455
ppp_receive_frame drivers/net/ppp/ppp_generic.c:2372 [inline]
ppp_do_recv+0x65f/0x40d0 drivers/net/ppp/ppp_generic.c:2212
ppp_input+0x7dc/0xe60 drivers/net/ppp/ppp_generic.c:2327
pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
__release_sock+0x1da/0x330 net/core/sock.c:3072
release_sock+0x6b/0x250 net/core/sock.c:3626
pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4091 [inline]
slab_alloc_node mm/slub.c:4134 [inline]
kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4186
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
__alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1322 [inline]
sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 0 UID: 0 PID: 5460 Comm: syz.2.33 Not tainted
6.12.0-rc2-syzkaller-00006-g87d6aab2389e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Fixes:
b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: syzbot+2ada1bc857496353be5a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
670646db.
050a0220.3f80e.0027.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241009091132.2136321-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 9 Oct 2024 10:05:21 +0000 (11:05 +0100)]
net/smc: Address spelling errors
Address spelling errors flagged by codespell.
This patch is intended to cover all files under drivers/smc
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/20241009-smc-starspell-v1-1-b8b395bbaf82@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
D. Wythe [Wed, 9 Oct 2024 06:55:16 +0000 (14:55 +0800)]
net/smc: fix lacks of icsk_syn_mss with IPPROTO_SMC
Eric report a panic on IPPROTO_SMC, and give the facts
that when INET_PROTOSW_ICSK was set, icsk->icsk_sync_mss must be set too.
Bug: Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
ESR = 0x0000000086000005
EC = 0x21: IABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
user pgtable: 4k pages, 48-bit VAs, pgdp=
00000001195d1000
[
0000000000000000] pgd=
0800000109c46003, p4d=
0800000109c46003,
pud=
0000000000000000
Internal error: Oops:
0000000086000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 UID: 0 PID: 8037 Comm: syz.3.265 Not tainted
6.11.0-rc7-syzkaller-g5f5673607153 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 08/06/2024
pstate:
80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : 0x0
lr : cipso_v4_sock_setattr+0x2a8/0x3c0 net/ipv4/cipso_ipv4.c:1910
sp :
ffff80009b887a90
x29:
ffff80009b887aa0 x28:
ffff80008db94050 x27:
0000000000000000
x26:
1fffe0001aa6f5b3 x25:
dfff800000000000 x24:
ffff0000db75da00
x23:
0000000000000000 x22:
ffff0000d8b78518 x21:
0000000000000000
x20:
ffff0000d537ad80 x19:
ffff0000d8b78000 x18:
1fffe000366d79ee
x17:
ffff8000800614a8 x16:
ffff800080569b84 x15:
0000000000000001
x14:
000000008b336894 x13:
00000000cd96feaa x12:
0000000000000003
x11:
0000000000040000 x10:
00000000000020a3 x9 :
1fffe0001b16f0f1
x8 :
0000000000000000 x7 :
0000000000000000 x6 :
000000000000003f
x5 :
0000000000000040 x4 :
0000000000000001 x3 :
0000000000000000
x2 :
0000000000000002 x1 :
0000000000000000 x0 :
ffff0000d8b78000
Call trace:
0x0
netlbl_sock_setattr+0x2e4/0x338 net/netlabel/netlabel_kapi.c:1000
smack_netlbl_add+0xa4/0x154 security/smack/smack_lsm.c:2593
smack_socket_post_create+0xa8/0x14c security/smack/smack_lsm.c:2973
security_socket_post_create+0x94/0xd4 security/security.c:4425
__sock_create+0x4c8/0x884 net/socket.c:1587
sock_create net/socket.c:1622 [inline]
__sys_socket_create net/socket.c:1659 [inline]
__sys_socket+0x134/0x340 net/socket.c:1706
__do_sys_socket net/socket.c:1720 [inline]
__se_sys_socket net/socket.c:1718 [inline]
__arm64_sys_socket+0x7c/0x94 net/socket.c:1718
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Code: ???????? ???????? ???????? ???????? (????????)
---[ end trace
0000000000000000 ]---
This patch add a toy implementation that performs a simple return to
prevent such panic. This is because MSS can be set in sock_create_kern
or smc_setsockopt, similar to how it's done in AF_SMC. However, for
AF_SMC, there is currently no way to synchronize MSS within
__sys_connect_file. This toy implementation lays the groundwork for us
to support such feature for IPPROTO_SMC in the future.
Fixes:
d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Link: https://patch.msgid.link/1728456916-67035-1-git-send-email-alibuda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 9 Oct 2024 18:58:02 +0000 (18:58 +0000)]
ppp: fix ppp_async_encode() illegal access
syzbot reported an issue in ppp_async_encode() [1]
In this case, pppoe_sendmsg() is called with a zero size.
Then ppp_async_encode() is called with an empty skb.
BUG: KMSAN: uninit-value in ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
BUG: KMSAN: uninit-value in ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
ppp_async_encode drivers/net/ppp/ppp_async.c:545 [inline]
ppp_async_push+0xb4f/0x2660 drivers/net/ppp/ppp_async.c:675
ppp_async_send+0x130/0x1b0 drivers/net/ppp/ppp_async.c:634
ppp_channel_bridge_input drivers/net/ppp/ppp_generic.c:2280 [inline]
ppp_input+0x1f1/0xe60 drivers/net/ppp/ppp_generic.c:2304
pppoe_rcv_core+0x1d3/0x720 drivers/net/ppp/pppoe.c:379
sk_backlog_rcv+0x13b/0x420 include/net/sock.h:1113
__release_sock+0x1da/0x330 net/core/sock.c:3072
release_sock+0x6b/0x250 net/core/sock.c:3626
pppoe_sendmsg+0x2b8/0xb90 drivers/net/ppp/pppoe.c:903
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:4092 [inline]
slab_alloc_node mm/slub.c:4135 [inline]
kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4187
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:587
__alloc_skb+0x363/0x7b0 net/core/skbuff.c:678
alloc_skb include/linux/skbuff.h:1322 [inline]
sock_wmalloc+0xfe/0x1a0 net/core/sock.c:2732
pppoe_sendmsg+0x3a7/0xb90 drivers/net/ppp/pppoe.c:867
sock_sendmsg_nosec net/socket.c:729 [inline]
__sock_sendmsg+0x30f/0x380 net/socket.c:744
____sys_sendmsg+0x903/0xb60 net/socket.c:2602
___sys_sendmsg+0x28d/0x3c0 net/socket.c:2656
__sys_sendmmsg+0x3c1/0x960 net/socket.c:2742
__do_sys_sendmmsg net/socket.c:2771 [inline]
__se_sys_sendmmsg net/socket.c:2768 [inline]
__x64_sys_sendmmsg+0xbc/0x120 net/socket.c:2768
x64_sys_call+0xb6e/0x3ba0 arch/x86/include/generated/asm/syscalls_64.h:308
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 1 UID: 0 PID: 5411 Comm: syz.1.14 Not tainted
6.12.0-rc1-syzkaller-00165-g360c1f1f24c6 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+1d121645899e7692f92a@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009185802.3763282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Wed, 9 Oct 2024 09:12:19 +0000 (10:12 +0100)]
docs: netdev: document guidance on cleanup patches
The purpose of this section is to document what is the current practice
regarding clean-up patches which address checkpatch warnings and similar
problems. I feel there is a value in having this documented so others
can easily refer to it.
Clearly this topic is subjective. And to some extent the current
practice discourages a wider range of patches than is described here.
But I feel it is best to start somewhere, with the most well established
part of the current practice.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241009-doc-mc-clean-v2-1-e637b665fa81@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 10 Oct 2024 15:31:06 +0000 (08:31 -0700)]
Merge branch 'net-introduce-tx-h-w-shaping-api'
Paolo Abeni says:
====================
net: introduce TX H/W shaping API
We have a plurality of shaping-related drivers API, but none flexible
enough to meet existing demand from vendors[1].
This series introduces new device APIs to configure in a flexible way
TX H/W shaping. The new functionalities are exposed via a newly
defined generic netlink interface and include introspection
capabilities. Some self-tests are included, on top of a dummy
netdevsim implementation. Finally a basic implementation for the iavf
driver is provided.
Some usage examples:
* Configure shaping on a given queue:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
--do set --json '{"ifindex": '$IFINDEX',
"shaper": {"handle":
{"scope": "queue", "id":'$QUEUEID'},
"bw-max":
2000000}}'
* Container B/W sharing
The orchestration infrastructure wants to group the
container-related queues under a RR scheduling and limit the aggregate
bandwidth:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID2'},
"weight": '$W2'}],
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope":"node"},
"bw-max":
10000000}'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 0}}
Q1 \
\
Q2 -- node 0 ------- netdev
/ (bw-max: 10M)
Q3 /
* Delegation
A containers wants to limit the aggregate B/W bandwidth of 2 of the 3
queues it owns - the starting configuration is the one from the
previous point:
SPEC=Documentation/netlink/specs/net_shaper.yaml
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID2'},
"weight": '$W2'}],
"handle": {"scope": "node"},
"bw-max":
5000000 }'
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 1}}
Q1 -- node 1 --------\
/ (bw-max: 5M) \
Q2 / node 0 ------- netdev
/(bw-max: 10M)
Q3 ------------------/
In a group operation, when prior to the op itself, the leaves have
different parents, the user must specify the parent handle for the
group. I.e., starting from the previous config:
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope": "node"},
"bw-max":
3000000 }'
Netlink error: Invalid argument
nl_len = 96 (80) nl_flags = 0x300 nl_type = 2
error: -22
extack: {'msg': 'All the leaves shapers must have the same old parent'}
./tools/net/ynl/cli.py --spec $SPEC \
--do group --json '{"ifindex": '$IFINDEX',
"leaves": [
{"handle": {"scope": "queue", "id":'$QID1'},
"weight": '$W1'},
{"handle": {"scope": "queue", "id":'$QID3'},
"weight": '$W3'}],
"handle": {"scope": "node"},
"parent": {"scope": "node", "id": 1},
"bw-max":
3000000 }
{'ifindex': $IFINDEX, 'handle': {'scope': 'node', 'id': 2}}
Q1 -- node 2 ---
/(bw-max:3M)\
Q3 / \
---- node 1 \
/ (bw-max: 5M)\
Q2 node 0 ------- netdev
(bw-max: 10M)
* Cleanup:
Still starting from config 1To delete a single queue shaper
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID3'}}'
Q1 -- node 2 ---
(bw-max:3M)\
\
---- node 1 \
/ (bw-max: 5M)\
Q2 node 0 ------- netdev
(bw-max: 10M)
Deleting a node shaper relinks all its leaves to the node's parent:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id":2}}'
Q1 ---\
\
node 1----- \
/ (bw-max: 5M)\
Q2----/ node 0 ------- netdev
(bw-max: 10M)
Deleting the last shaper under a node shaper deletes the node, too:
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID1'}}'
./tools/net/ynl/cli.py --spec $SPEC --do delete --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "queue", "id":'$QID2'}}'
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id": 1}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}
Such delete recurses on parents that are left over with no leaves:
./tools/net/ynl/cli.py --spec $SPEC --do get --json \
'{"ifindex": '$IFINDEX',
"handle": {"scope": "node", "id": 0}}'
Netlink error: No such file or directory
nl_len = 44 (28) nl_flags = 0x300 nl_type = 2
error: -2
extack: {'bad-attr': '.handle'}
v8: https://lore.kernel.org/cover.
1727704215.git.pabeni@redhat.com
v7: https://lore.kernel.org/cover.
1725919039.git.pabeni@redhat.com
v6: https://lore.kernel.org/cover.
1725457317.git.pabeni@redhat.com
v5: https://lore.kernel.org/cover.
1724944116.git.pabeni@redhat.com
v4: https://lore.kernel.org/cover.
1724165948.git.pabeni@redhat.com
v3: https://lore.kernel.org/cover.
1722357745.git.pabeni@redhat.com
RFC v2: https://lore.kernel.org/cover.
1721851988.git.pabeni@redhat.com
RFC v1: https://lore.kernel.org/cover.
1719518113.git.pabeni@redhat.com
====================
Link: https://patch.msgid.link/cover.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>