linux-2.6-block.git
5 months agoMAINTAINERS: net: Add Oleksij to pse-pd maintainers
Kory Maincent (Dent Project) [Sun, 14 Apr 2024 14:21:50 +0000 (16:21 +0200)]
MAINTAINERS: net: Add Oleksij to pse-pd maintainers

Oleksij was the first to add support for pse-pd net subsystem.
Add himself to the maintainers seems logical.

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240414-feature_poe-v8-1-e4bf1e860da5@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: drv-net: add config for netdevsim
Jakub Kicinski [Tue, 16 Apr 2024 00:45:52 +0000 (17:45 -0700)]
selftests: drv-net: add config for netdevsim

Real driver testing will obviously require enabling more
options, but will require more manual setup in the first
place. For CIs running purely software tests we need
to enable netdevsim.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240416004556.1618804-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: drv-net: add stdout to the command failed exception
Jakub Kicinski [Tue, 16 Apr 2024 00:45:51 +0000 (17:45 -0700)]
selftests: drv-net: add stdout to the command failed exception

ping prints all the info to stdout. To make debug easier capture
stdout in the Exception raised when command unexpectedly fails.

Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240416004556.1618804-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoip6_vti: fix memleak on netns dismantle
Florian Westphal [Mon, 15 Apr 2024 12:23:44 +0000 (14:23 +0200)]
ip6_vti: fix memleak on netns dismantle

kmemleak reports net_device resources are no longer released, restore
needs_free_netdev toggle.  Sample backtrace:

unreferenced object 0xffff88810874f000 (size 4096): [..]
    [<00000000a2b8af8b>] __kmalloc_node+0x209/0x290
    [<0000000040b0a1a9>] alloc_netdev_mqs+0x58/0x470
    [<00000000b4be1e78>] vti6_init_net+0x94/0x230
    [<000000008830c1ea>] ops_init+0x32/0xc0
    [<000000006a26fa8f>] setup_net+0x134/0x2e0
[..]

Cc: Breno Leitao <leitao@debian.org>
Fixes: a9b2d55a8f1e ("ip6_vti: Do not use custom stat allocator")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240415122346.26503-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agodt-bindings: net: nxp,dwmac-imx: allow nvmem cells property
Peng Fan [Mon, 15 Apr 2024 10:36:21 +0000 (18:36 +0800)]
dt-bindings: net: nxp,dwmac-imx: allow nvmem cells property

Allow nvmem-cells and nvmem-cell-names to get mac_address from onchip
fuse.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240415103621.1644735-1-peng.fan@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: ipa: Remove unnecessary print function dev_err()
Jiapeng Chong [Mon, 15 Apr 2024 03:14:56 +0000 (11:14 +0800)]
net: ipa: Remove unnecessary print function dev_err()

The print function dev_err() is redundant because
platform_get_irq_byname() already prints an error.

./drivers/net/ipa/ipa_interrupt.c:300:2-9: line 300 is redundant because platform_get_irq() already prints an error.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=8756
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240415031456.10805-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet/handshake: remove redundant assignment to variable ret
Colin Ian King [Mon, 15 Apr 2024 10:07:13 +0000 (11:07 +0100)]
net/handshake: remove redundant assignment to variable ret

The variable is being assigned an value and then is being re-assigned
a new value in the next statement. The assignment is redundant and can
be removed.

Cleans up clang scan build warning:
net/handshake/tlshd.c:216:2: warning: Value stored to 'ret' is never
read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/20240415100713.483399-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dsa: microchip: drop unneeded MODULE_ALIAS
Krzysztof Kozlowski [Sun, 14 Apr 2024 15:49:29 +0000 (17:49 +0200)]
net: dsa: microchip: drop unneeded MODULE_ALIAS

The ID table already has respective entry and MODULE_DEVICE_TABLE and
creates proper alias for SPI driver.  Having another MODULE_ALIAS causes
the alias to be duplicated.

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240414154929.127045-1-krzk@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoaf_unix: Try not to hold unix_gc_lock during accept().
Kuniyuki Iwashima [Sat, 13 Apr 2024 02:19:28 +0000 (19:19 -0700)]
af_unix: Try not to hold unix_gc_lock during accept().

Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo
socket.") added spin_lock(&unix_gc_lock) in accept() path, and it
caused regression in a stress test as reported by kernel test robot.

If the embryo socket is not part of the inflight graph, we need not
hold the lock.

To decide that in O(1) time and avoid the regression in the normal
use case,

  1. add a new stat unix_sk(sk)->scm_stat.nr_unix_fds

  2. count the number of inflight AF_UNIX sockets in the receive
     queue under unix_state_lock()

  3. move unix_update_edges() call under unix_state_lock()

  4. avoid locking if nr_unix_fds is 0 in unix_update_edges()

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202404101427.92a08551-oliver.sang@intel.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240413021928.20946-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next...
Paolo Abeni [Tue, 16 Apr 2024 11:11:29 +0000 (13:11 +0200)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue

Tony Nguyen says:

====================

This series contains updates to ice driver only.

Lukasz removes unnecessary argument from ice_fdir_comp_rules().

Jakub adds support for ethtool 'ether' flow-type rules.

Jake moves setting of VF MSI-X value to initialization function and adds
tracking of VF relative MSI-X index.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: store VF relative MSI-X index in q_vector->vf_reg_idx
  ice: set vf->num_msix in ice_initialize_vf_entry()
  ice: Implement 'flow-type ether' rules
  ice: Remove unnecessary argument from ice_fdir_comp_rules()
====================

Link: https://lore.kernel.org/r/20240412210534.916756-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch 'selftests-assortment-of-fixes'
Paolo Abeni [Tue, 16 Apr 2024 10:14:44 +0000 (12:14 +0200)]
Merge branch 'selftests-assortment-of-fixes'

Petr Machata says:

====================
selftests: Assortment of fixes

This is a loose follow-up to the Kernel CI patchset posted recently. It
contains various fixes that were supposed to be part of said patchset, but
didn't fit due to its size. The latter 4 patches were written independently
of the CI effort, but again didn't fit in their intended patchsets.

- Patch #1 unifies code of two very similar looking functions, busywait()
  and slowwait().

- Patch #2 adds sanity checks around the setting of NETIFS, which carries
  list of interfaces to run on.

- Patch #3 changes bail_on_lldpad() to SKIP instead of FAILing.

- Patches #4 to #7 fix issues in selftests.

- Patches #8 to #10 add topology diagrams to several selftests.
  This should have been part of the mlxsw leg of NH group stats patches,
  but again, it did not fit in due to size.
====================

Link: https://lore.kernel.org/r/cover.1712940759.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: forwarding: router_nh: Add a diagram
Petr Machata [Fri, 12 Apr 2024 17:03:13 +0000 (19:03 +0200)]
selftests: forwarding: router_nh: Add a diagram

This test lacks a topology diagram, making the setup not obvious.
Add one.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: forwarding: router_mpath_nh_res: Add a diagram
Petr Machata [Fri, 12 Apr 2024 17:03:12 +0000 (19:03 +0200)]
selftests: forwarding: router_mpath_nh_res: Add a diagram

This test lacks a topology diagram, making the setup not obvious.
Add one.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: forwarding: router_mpath_nh: Add a diagram
Petr Machata [Fri, 12 Apr 2024 17:03:11 +0000 (19:03 +0200)]
selftests: forwarding: router_mpath_nh: Add a diagram

This test lacks a topology diagram, making the setup not obvious.
Add one.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: mlxsw: ethtool_lanes: Wait for lanes parameter dump explicitly
Danielle Ratson [Fri, 12 Apr 2024 17:03:10 +0000 (19:03 +0200)]
selftests: mlxsw: ethtool_lanes: Wait for lanes parameter dump explicitly

The ethtool dump includes the lanes parameter only when the port is up.
Therefore, the ethtool_lanes.sh test waits for ports to come before testing
the lanes parameter.

In some cases, the test considers the port as up, but the lanes parameter
is not yet dumped although assumed to be, resulting in ethtool_lanes.sh
test failure.

To avoid that, ensure that the lanes parameter is indeed dumped by waiting
for it explicitly, before preforming the test cases.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: drivers: hw: Include tc_common.sh in hw_stats_l3
Petr Machata [Fri, 12 Apr 2024 17:03:09 +0000 (19:03 +0200)]
selftests: drivers: hw: Include tc_common.sh in hw_stats_l3

The tests use the constant TC_HIT_TIMEOUT when waiting on the counter
values. However it does not include tc_common.sh where the counter is
specified. The test has been robust in our testing, which means the counter
is bumped quickly enough that the updated value is available already on the
first iteration. Nevertheless it's not correct. Include tc_common.sh as
appropriate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: drivers: hw: ethtool.sh: Adjust output
Petr Machata [Fri, 12 Apr 2024 17:03:08 +0000 (19:03 +0200)]
selftests: drivers: hw: ethtool.sh: Adjust output

Some log_test calls are done in a loop, and lead to the same log output.
This might prove tricky to deduplicate for automated tools. Instead, roll
the unique information from log_info to log_test, and drop the log_info.
This also leads to more compact and clearer output.

This change prompts rewording the messages so that they are not excessively
long.

Some check_err messages do not indicate what the issue actually is, so
reword them to say it's a "ping with", like is the case in some other
instances in this test.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: drivers: hw: Fix ethtool_rmon
Petr Machata [Fri, 12 Apr 2024 17:03:07 +0000 (19:03 +0200)]
selftests: drivers: hw: Fix ethtool_rmon

When rx-pktsNtoM reports a range that involves very low-valued range, such
as 0-64, the calculated length of the packet will be -4, because FCS is
subtracted from the value. mausezahn then confuses the value for an option
and bails out. As a result, the test dumps many mausezahn error messages.

Instead, cap the value at 0. mausezahn will use an appropriate minimum
packet length.

Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: forwarding: bail_on_lldpad() should SKIP
Petr Machata [Fri, 12 Apr 2024 17:03:06 +0000 (19:03 +0200)]
selftests: forwarding: bail_on_lldpad() should SKIP

$ksft_skip is used to mark selftests that have tooling issues. The fact
that LLDPad is running, but shouldn't, is one such issue. Therefore have
bail_on_lldpad() bail with $ksft_skip.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: forwarding: lib.sh: Validate NETIFS
Petr Machata [Fri, 12 Apr 2024 17:03:05 +0000 (19:03 +0200)]
selftests: forwarding: lib.sh: Validate NETIFS

The variable should contain at least NUM_NETIFS interfaces, stored
as keys named "p$i", for i in `seq $NUM_NETIFS`.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoselftests: net: Unify code of busywait() and slowwait()
Petr Machata [Fri, 12 Apr 2024 17:03:04 +0000 (19:03 +0200)]
selftests: net: Unify code of busywait() and slowwait()

Bodies of busywait() and slowwait() functions are almost identical. Extract
the common code into a helper, loopy_wait, and convert busywait() and
slowwait() into trivial wrappers.

Moreover, the fact that slowwait() uses seconds for units is really not
intuitive, and the comment does not help much. Instead make the unit part
of the name of the argument to further clarify what units are expected.

Cc: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: dsa: mt7530: provide own phylink MAC operations
Russell King (Oracle) [Fri, 12 Apr 2024 15:15:34 +0000 (16:15 +0100)]
net: dsa: mt7530: provide own phylink MAC operations

Convert mt753x to provide its own phylink MAC operations, thus avoiding
the shim layer in DSA's port.c

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/E1rvIco-006bQu-Fq@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: dsa: lantiq_gswip: provide own phylink MAC operations
Russell King (Oracle) [Fri, 12 Apr 2024 15:15:29 +0000 (16:15 +0100)]
net: dsa: lantiq_gswip: provide own phylink MAC operations

Convert lantiq_gswip to provide its own phylink MAC operations, thus
avoiding the shim layer in DSA's port.c. For lantiq_gswip, it means
we end up with a common instance of phylink MAC operations that are
shared between the different variants, rather than having duplicated
initialisers in dsa_switch_ops.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/E1rvIcj-006bQo-B3@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: dsa: qca8k: provide own phylink MAC operations
Russell King (Oracle) [Fri, 12 Apr 2024 15:15:24 +0000 (16:15 +0100)]
net: dsa: qca8k: provide own phylink MAC operations

Convert qca8k to provide its own phylink MAC operations, thus
avoiding the shim layer in DSA's port.c.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1rvIce-006bQi-58@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: dsa: ar9331: provide own phylink MAC operations
Russell King (Oracle) [Fri, 12 Apr 2024 15:15:19 +0000 (16:15 +0100)]
net: dsa: ar9331: provide own phylink MAC operations

Convert ar9331 to provide its own phylink MAC operations, thus
avoiding the shim layer in DSA's port.c.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1rvIcZ-006bQc-0W@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agonet: dsa: sja1105: provide own phylink MAC operations
Russell King (Oracle) [Fri, 12 Apr 2024 15:15:13 +0000 (16:15 +0100)]
net: dsa: sja1105: provide own phylink MAC operations

Convert sja1105 to provide its own phylink MAC operations, thus
avoiding the shim layer in DSA's port.c

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/E1rvIcT-006bQW-S3@rmk-PC.armlinux.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 months agoMerge branch 'selftests-net-exercise-page-pool-reporting-via-netlink'
Jakub Kicinski [Mon, 15 Apr 2024 18:21:15 +0000 (11:21 -0700)]
Merge branch 'selftests-net-exercise-page-pool-reporting-via-netlink'

Jakub Kicinski says:

====================
selftests: net: exercise page pool reporting via netlink

Add a basic test for page pool netlink reporting.

v1: https://lore.kernel.org/all/20240411012815.174400-1-kuba@kernel.org/
====================

Link: https://lore.kernel.org/r/20240412141436.828666-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: exercise page pool reporting via netlink
Jakub Kicinski [Fri, 12 Apr 2024 14:14:36 +0000 (07:14 -0700)]
selftests: net: exercise page pool reporting via netlink

Add a Python test for the basic ops.

  # ./net/nl_netdev.py
  KTAP version 1
  1..3
  ok 1 nl_netdev.empty_check
  ok 2 nl_netdev.lo_check
  ok 3 nl_netdev.page_pool_check
  # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240412141436.828666-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: support use of NetdevSimDev under "with" in python
Jakub Kicinski [Fri, 12 Apr 2024 14:14:35 +0000 (07:14 -0700)]
selftests: net: support use of NetdevSimDev under "with" in python

Using "with" on an entire driver test env is supported already,
but it's also useful to use "with" on an individual nsim.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240412141436.828666-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: print full exception on failure
Jakub Kicinski [Fri, 12 Apr 2024 14:14:34 +0000 (07:14 -0700)]
selftests: net: print full exception on failure

Instead of a summary line print the full exception.
This makes debugging Python tests much easier.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240412141436.828666-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: net: print report check location in python tests
Jakub Kicinski [Fri, 12 Apr 2024 14:14:33 +0000 (07:14 -0700)]
selftests: net: print report check location in python tests

Developing Python tests is a bit annoying because when test fails
we only print the fail message and no info about which exact check
led to it. Print the location (the first line of this example is new):

  # At /root/ksft-net-drv/./net/nl_netdev.py line 38:
  # Check failed 0 != 10
  not ok 3 nl_netdev.page_pool_check

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/20240412141436.828666-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotools: ynl: don't return None for dumps
Jakub Kicinski [Fri, 12 Apr 2024 14:14:32 +0000 (07:14 -0700)]
tools: ynl: don't return None for dumps

YNL currently reports None for empty dump:

 $ cli.py ...netdev.yaml --dump page-pool-get
 None

This doesn't matter for the CLI but when writing YNL based tests
having to deal with either list or None is annoying. Limit the
None conversion to non-dump ops:

 $ cli.py ...netdev.yaml --dump page-pool-get
 []

Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240412141436.828666-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: netdevsim: add some fake page pool use
Jakub Kicinski [Fri, 12 Apr 2024 14:14:31 +0000 (07:14 -0700)]
net: netdevsim: add some fake page pool use

Add very basic page pool use so that we can exercise
the netlink uAPI in a selftest.

Page pool gets created on open, destroyed on close.
But we control allocating of a single page thru debugfs.
This page may survive past the page pool itself so that
we can test orphaned page pools.

Link: https://lore.kernel.org/r/20240412141436.828666-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'net-dqs-optimize-if-stall-threshold-is-not-set'
Jakub Kicinski [Mon, 15 Apr 2024 18:19:58 +0000 (11:19 -0700)]
Merge branch 'net-dqs-optimize-if-stall-threshold-is-not-set'

Breno Leitao says:

====================
net: dqs: optimize if stall threshold is not set

Here are four patches aimed at enhancing the Dynamic Queue Limit (DQL)
subsystem within the networking stack.

The first two commits involve code refactoring, while the third patch
introduces the actual change. The fourth patch just improves the cache
locality.

Typically, when DQL is enabled, stall information is always populated
through dql_queue_stall(). However, this information is only necessary
if a stall threshold is set, which is stored in struct dql->stall_thrs.

Although dql_queue_stall() is relatively inexpensive, it is not entirely
free due to memory barriers and similar overheads.

To optimize performance, refrain from calling dql_queue_stall() when no
stall threshold is set, thus avoiding the processing of unnecessary
information.

v1: https://lore.kernel.org/all/20240404145939.3601097-1-leitao@debian.org/
====================

Link: https://lore.kernel.org/r/20240411192241.2498631-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dqs: make struct dql more cache efficient
Breno Leitao [Thu, 11 Apr 2024 19:22:32 +0000 (12:22 -0700)]
net: dqs: make struct dql more cache efficient

With the previous change, struct dqs->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.

The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, let's move this field  (stall_thrs) to the
very first cache line, since there is a hole there.

This does not change the structure size, since it moves an short (2
bytes) to 4-bytes whole in the first cache line.

This is the new structure format now:

struct dql {
unsigned int    num_queued;
unsigned int    last_obj_cnt;
...
short unsigned int    stall_thrs;
/* XXX 2 bytes hole, try to pack */
...
/* --- cacheline 1 boundary (64 bytes) --- */
...
  /* Longest stall detected, reported to user */
short unsigned int         stall_max;
/* XXX 2 bytes hole, try to pack */
};

Also, read the stall_thrs (now in the very first cache line) earlier,
together with dql->num_queued (also in the first cache line).

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dql: Optimize stall information population
Breno Leitao [Thu, 11 Apr 2024 19:22:31 +0000 (12:22 -0700)]
net: dql: Optimize stall information population

When Dynamic Queue Limit (DQL) is set, it always populate stall
information through dql_queue_stall().  However, this information is
only necessary if a stall threshold is set, stored in struct
dql->stall_thrs.

dql_queue_stall() is cheap, but not free, since it does have memory
barriers and so forth.

Do not call dql_queue_stall() if there is no stall threshold set, and
save some CPU cycles.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-4-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dql: Separate queue function responsibilities
Breno Leitao [Thu, 11 Apr 2024 19:22:30 +0000 (12:22 -0700)]
net: dql: Separate queue function responsibilities

The dql_queued() function currently handles both queuing object counts
and populating bitmaps for reporting stalls.

This commit splits the bitmap population into a separate function,
allowing for conditional invocation in scenarios where the feature is
disabled.

This refactor maintains functionality while improving code
organization.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: dql: Avoid calling BUG() when WARN() is enough
Breno Leitao [Thu, 11 Apr 2024 19:22:29 +0000 (12:22 -0700)]
net: dql: Avoid calling BUG() when WARN() is enough

If the dql_queued() function receives an invalid argument, WARN about it
and continue, instead of crashing the kernel.

This was raised by checkpatch, when I am refactoring this code (see
following patch/commit)

WARNING: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'cpsw-xdp'
David S. Miller [Mon, 15 Apr 2024 12:18:18 +0000 (13:18 +0100)]
Merge branch 'cpsw-xdp'

Julien Panis says:

====================
Add minimal XDP support to TI AM65 CPSW Ethernet driver

This patch adds XDP support to TI AM65 CPSW Ethernet driver.

The following features are implemented: NETDEV_XDP_ACT_BASIC,
NETDEV_XDP_ACT_REDIRECT, and NETDEV_XDP_ACT_NDO_XMIT.

Zero-copy and non-linear XDP buffer supports are NOT implemented.

Besides, the page pool memory model is used to get better performance.
====================

Signed-off-by: Julien Panis <jpanis@baylibre.com>
5 months agonet: ethernet: ti: am65-cpsw: Add minimal XDP support
Julien Panis [Fri, 12 Apr 2024 15:38:34 +0000 (17:38 +0200)]
net: ethernet: ti: am65-cpsw: Add minimal XDP support

This patch adds XDP (eXpress Data Path) support to TI AM65 CPSW
Ethernet driver. The following features are implemented:
- NETDEV_XDP_ACT_BASIC (XDP_PASS, XDP_TX, XDP_DROP, XDP_ABORTED)
- NETDEV_XDP_ACT_REDIRECT (XDP_REDIRECT)
- NETDEV_XDP_ACT_NDO_XMIT (ndo_xdp_xmit callback)

The page pool memory model is used to get better performance.
Below are benchmark results obtained for the receiver with iperf3 default
parameters:
- Without page pool: 495 Mbits/sec
- With page pool: 605 Mbits/sec (actually 610 Mbits/sec, with a 5 Mbits/sec
loss due to extra processing in the hot path to handle XDP).

Signed-off-by: Julien Panis <jpanis@baylibre.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: ethernet: ti: Add desc_infos member to struct k3_cppi_desc_pool
Julien Panis [Fri, 12 Apr 2024 15:38:33 +0000 (17:38 +0200)]
net: ethernet: ti: Add desc_infos member to struct k3_cppi_desc_pool

This patch introduces a member and the related accessors which can be
used to store descriptor specific additional information. This member
can store, for instance, an ID to differentiate a skb TX buffer type
from a xdpf TX buffer type.

Signed-off-by: Julien Panis <jpanis@baylibre.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: ethernet: ti: Add accessors for struct k3_cppi_desc_pool members
Julien Panis [Fri, 12 Apr 2024 15:38:32 +0000 (17:38 +0200)]
net: ethernet: ti: Add accessors for struct k3_cppi_desc_pool members

This patch adds accessors for desc_size and cpumem members. They may be
used, for instance, to compute a descriptor index.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Julien Panis <jpanis@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agoudp: Avoid call to compute_score on multiple sites
Gabriel Krisman Bertazi [Fri, 12 Apr 2024 21:20:04 +0000 (17:20 -0400)]
udp: Avoid call to compute_score on multiple sites

We've observed a 7-12% performance regression in iperf3 UDP ipv4 and
ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to
commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected
sockets are present").  The failing tests were those that would spawn
UDP sockets per-cpu on systems that have a high number of cpus.

Unsurprisingly, it is not caused by the extra re-scoring of the reused
socket, but due to the compiler no longer inlining compute_score, once
it has the extra call site in udp4_lib_lookup2.  This is augmented by
the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus.

We could just explicitly inline it, but compute_score() is quite a large
function, around 300b.  Inlining in two sites would almost double
udp4_lib_lookup2, which is a silly thing to do just to workaround a
mitigation.  Instead, this patch shuffles the code a bit to avoid the
multiple calls to compute_score.  Since it is a static function used in
one spot, the compiler can safely fold it in, as it did before, without
increasing the text size.

With this patch applied I ran my original iperf3 testcases.  The failing
cases all looked like this (ipv4):
iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2

where $R is either 1G/10G/0 (max, unlimited).  I ran 3 times each.
baseline is v6.9-rc3. harmean == harmonic mean; CV == coefficient of
variation.

ipv4:
                 1G                10G                  MAX
    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
baseline 1743852.66(0.0208) 1725933.02(0.0167) 1705203.78(0.0386)
patched  1968727.61(0.0035) 1962283.22(0.0195) 1923853.50(0.0256)

ipv6:
                 1G                10G                  MAX
    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
baseline 1729020.03(0.0028) 1691704.49(0.0243) 1692251.34(0.0083)
patched  1900422.19(0.0067) 1900968.01(0.0067) 1568532.72(0.1519)

This restores the performance we had before the change above with this
benchmark.  We obviously don't expect any real impact when mitigations
are disabled, but just to be sure it also doesn't regresses:

mitigations=off ipv4:
                 1G                10G                  MAX
    HARMEAN  (CV)      HARMEAN  (CV)    HARMEAN     (CV)
baseline 3230279.97(0.0066) 3229320.91(0.0060) 2605693.19(0.0697)
patched  3242802.36(0.0073) 3239310.71(0.0035) 2502427.19(0.0882)

Cc: Lorenz Bauer <lmb@isovalent.com>
Fixes: f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: ip6_gre: Remove generic .ndo_get_stats64
Breno Leitao [Fri, 12 Apr 2024 15:19:26 +0000 (08:19 -0700)]
net: ip6_gre: Remove generic .ndo_get_stats64

Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: ipv6_gre: Do not use custom stat allocator
Breno Leitao [Fri, 12 Apr 2024 15:19:25 +0000 (08:19 -0700)]
net: ipv6_gre: Do not use custom stat allocator

With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core
instead of in this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Remove the allocation in the ip6_gre and leverage the network
core allocation instead.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: dsa: convert dsa_user_phylink_fixed_state() to use dsa_phylink_to_port()
Russell King (Oracle) [Fri, 12 Apr 2024 15:15:03 +0000 (16:15 +0100)]
net: dsa: convert dsa_user_phylink_fixed_state() to use dsa_phylink_to_port()

Convert dsa_user_phylink_fixed_state() to use the newly introduced
dsa_phylink_to_port() helper.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: constify net_class
Heiner Kallweit [Fri, 12 Apr 2024 10:17:57 +0000 (12:17 +0200)]
net: constify net_class

AFAICS all users of net_class take a const struct class * argument.
Therefore fully constify net_class.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agogve: Correctly report software timestamping capabilities
John Fraker [Fri, 12 Apr 2024 05:32:29 +0000 (22:32 -0700)]
gve: Correctly report software timestamping capabilities

gve has supported software timestamp generation since its inception,
but has not advertised that support via ethtool. This patch correctly
advertises that support.

Signed-off-by: John Fraker <jfraker@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: save some cycles when doing skb_attempt_defer_free()
Jason Xing [Fri, 12 Apr 2024 03:07:18 +0000 (11:07 +0800)]
net: save some cycles when doing skb_attempt_defer_free()

Normally, we don't face these two exceptions very often meanwhile
we have some chance to meet the condition where the current cpu id
is the same as skb->alloc_cpu.

One simple test that can help us see the frequency of this statement
'cpu == raw_smp_processor_id()':
1. running iperf -s and iperf -c [ip] -P [MAX CPU]
2. using BPF to capture skb_attempt_defer_free()

I can see around 4% chance that happens to satisfy the statement.
So moving this statement at the beginning can save some cycles in
most cases.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agoMerge branch 'flower-control-flags'
David S. Miller [Mon, 15 Apr 2024 09:33:15 +0000 (10:33 +0100)]
Merge branch 'flower-control-flags'

Asbjørn Sloth Tønnesen says:

====================
flower: validate control flags

I have reviewed the flower control flags code.
In all, but one (sfc), the flags field wasn't
checked properly for unsupported flags.

In this series I have only included a single example
user for each helper function. Once the helpers are in,
I will submit patches for all other drivers implementing
flower.

After which there will be:
- 6 drivers using flow_rule_is_supp_control_flags()
- 8 drivers using flow_rule_has_control_flags()
- 11 drivers using flow_rule_match_has_control_flags()

---
Changelog:

v3:
- Added Reviewed-by from Louis Peens (first two patches)
- Properly fixed kernel-doc format

v2: https://lore.kernel.org/netdev/20240410093235.5334-1-ast@fiberby.net/
- Squashed the 3 helper functions to one commmit (requested by Baowen Zheng)
- Renamed helper functions to avoid double negatives (suggested by Louis Peens)
- Reverse booleans in some functions and callsites to align with new names
- Fix autodoc format

v1: https://lore.kernel.org/netdev/20240408130927.78594-1-ast@fiberby.net/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: dsa: microchip: ksz9477: flower: validate control flags
Asbjørn Sloth Tønnesen [Thu, 11 Apr 2024 10:52:57 +0000 (10:52 +0000)]
net: dsa: microchip: ksz9477: flower: validate control flags

Add check for unsupported control flags.

Only compile-tested, no access to HW.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: prestera: flower: validate control flags
Asbjørn Sloth Tønnesen [Thu, 11 Apr 2024 10:52:56 +0000 (10:52 +0000)]
net: prestera: flower: validate control flags

Add check for unsupported control flags.

Only compile-tested, no access to HW.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonfp: flower: fix check for unsupported control flags
Asbjørn Sloth Tønnesen [Thu, 11 Apr 2024 10:52:55 +0000 (10:52 +0000)]
nfp: flower: fix check for unsupported control flags

Use flow_rule_is_supp_control_flags()

Check the mask, not the key, for unsupported control flags.

Only compile-tested, no access to HW

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agoflow_offload: add control flag checking helpers
Asbjørn Sloth Tønnesen [Thu, 11 Apr 2024 10:52:54 +0000 (10:52 +0000)]
flow_offload: add control flag checking helpers

These helpers aim to help drivers, with checking
for the presence of unsupported control flags.

For drivers supporting at least one control flag:
  flow_rule_is_supp_control_flags()

For drivers using flow_rule_match_control(), but not using flags:
  flow_rule_has_control_flags()

For drivers not using flow_rule_match_control():
  flow_rule_match_has_control_flags()

While primarily aimed at FLOW_DISSECTOR_KEY_CONTROL
and flow_rule_match_control(), then the first two
can also be used with FLOW_DISSECTOR_KEY_ENC_CONTROL
and flow_rule_match_enc_control().

These helpers mirrors the existing check done in sfc:
  drivers/net/ethernet/sfc/tc.c +276

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: dev_addr_lists: move locking out of init/exit in kunit
Jakub Kicinski [Thu, 11 Apr 2024 18:32:22 +0000 (11:32 -0700)]
net: dev_addr_lists: move locking out of init/exit in kunit

We lock and unlock rtnl in init/exit for convenience,
but it started causing problems if the exit is handled
by a different thread. To avoid having to futz with
disabling locking assertions move the locking into
the test cases. We don't use ASSERTs so it should
be safe.

   ============= dev-addr-list-test (6 subtests) ==============
   [PASSED] dev_addr_test_basic
   [PASSED] dev_addr_test_sync_one
   [PASSED] dev_addr_test_add_del
   [PASSED] dev_addr_test_del_main
   [PASSED] dev_addr_test_add_set
   [PASSED] dev_addr_test_add_excl
   =============== [PASSED] dev-addr-list-test ================

Link: https://lore.kernel.org/all/20240403131936.787234-7-linux@roeck-us.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agodrop_monitor: replace spin_lock by raw_spin_lock
Wander Lairson Costa [Thu, 11 Apr 2024 14:13:46 +0000 (11:13 -0300)]
drop_monitor: replace spin_lock by raw_spin_lock

trace_drop_common() is called with preemption disabled, and it acquires
a spin_lock. This is problematic for RT kernels because spin_locks are
sleeping locks in this configuration, which causes the following splat:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 2
5 locks held by rcuc/47/449:
 #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210
 #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130
 #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210
 #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70
 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290
irq event stamp: 139909
hardirqs last  enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80
hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290
softirqs last  enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170
softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0
Preemption disabled at:
[<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0
CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7
Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022
Call Trace:
 <TASK>
 dump_stack_lvl+0x8c/0xd0
 dump_stack+0x14/0x20
 __might_resched+0x21e/0x2f0
 rt_spin_lock+0x5e/0x130
 ? trace_drop_common.constprop.0+0xb5/0x290
 ? skb_queue_purge_reason.part.0+0x1bf/0x230
 trace_drop_common.constprop.0+0xb5/0x290
 ? preempt_count_sub+0x1c/0xd0
 ? _raw_spin_unlock_irqrestore+0x4a/0x80
 ? __pfx_trace_drop_common.constprop.0+0x10/0x10
 ? rt_mutex_slowunlock+0x26a/0x2e0
 ? skb_queue_purge_reason.part.0+0x1bf/0x230
 ? __pfx_rt_mutex_slowunlock+0x10/0x10
 ? skb_queue_purge_reason.part.0+0x1bf/0x230
 trace_kfree_skb_hit+0x15/0x20
 trace_kfree_skb+0xe9/0x150
 kfree_skb_reason+0x7b/0x110
 skb_queue_purge_reason.part.0+0x1bf/0x230
 ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10
 ? mark_lock.part.0+0x8a/0x520
...

trace_drop_common() also disables interrupts, but this is a minor issue
because we could easily replace it with a local_lock.

Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic
context.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reported-by: Hu Chunyu <chuhu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agofib: rules: no longer hold RTNL in fib_nl_dumprule()
Eric Dumazet [Thu, 11 Apr 2024 13:33:40 +0000 (13:33 +0000)]
fib: rules: no longer hold RTNL in fib_nl_dumprule()

- fib rules are already RCU protected, RTNL is not needed
  to get them.

- Fix return value at the end of a dump,
  so that NLMSG_DONE can be appended to current skb,
  saving one recvmsg() system call.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240411133340.1332796-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotipc: remove redundant assignment to ret, simplify code
Colin Ian King [Thu, 11 Apr 2024 09:17:04 +0000 (10:17 +0100)]
tipc: remove redundant assignment to ret, simplify code

Variable err is being assigned a zero value and it is never read
afterwards in either the break path or continue path, the assignment
is redundant and can be removed. With it removed, the if statement
can also be simplified.

Cleans up clang scan warning:
net/tipc/socket.c:3570:5: warning: Value stored to 'err' is never
read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240411091704.306752-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agotcp: small optimization when TCP_TW_SYN is processed
Eric Dumazet [Thu, 11 Apr 2024 08:25:29 +0000 (08:25 +0000)]
tcp: small optimization when TCP_TW_SYN is processed

When TCP_TW_SYN is processed, we perform a lookup to find
a listener and jump back in tcp_v6_rcv() and tcp_v4_rcv()

Paolo suggested that we do not have to check if the
found socket is a TIME_WAIT or NEW_SYN_RECV one.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/netdev/68085c8a84538cacaac991415e4ccc72f45e76c2.camel@redhat.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Link: https://lore.kernel.org/r/20240411082530.907113-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'support-some-features-for-the-hns3-ethernet-driver'
Jakub Kicinski [Sat, 13 Apr 2024 01:58:58 +0000 (18:58 -0700)]
Merge branch 'support-some-features-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
Support some features for the HNS3 ethernet driver

Currently, the hns3 driver does not have the trace
of the command queue. As a result, it is difficult to
locate the communication between the driver and firmware.
Therefore, the trace function of the command queue is
added in this patch set to facilitate the locating of
communication problems between the driver and firmware.

If a RAS occurs, the driver will automatically reset to attempt
to recover the RAS. Therefore, to locate the cause of the RAS,
it is necessary to save the values of some RAS-related registers
before the reset. So we added a patch in this patch set to
print these information.
====================

Link: https://lore.kernel.org/r/20240410125354.2177067-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: add support to query scc version by devlink info
Hao Chen [Wed, 10 Apr 2024 12:53:54 +0000 (20:53 +0800)]
net: hns3: add support to query scc version by devlink info

Add support to query scc version by devlink info for device V3.

Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20240410125354.2177067-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: dump more reg info based on ras mod
Peiyang Wang [Wed, 10 Apr 2024 12:53:53 +0000 (20:53 +0800)]
net: hns3: dump more reg info based on ras mod

When the driver received an interrupte for hardware error,
it will try to restore by resetting. But the hardware registers
will also be reset at this case, which make it hard to analysis
why the hardware error occurs.

This patch dumps these registers before resetting to help
analyze the hardware error occurs.

Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240410125354.2177067-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: move constants from hclge_debugfs.h to hclge_debugfs.c
Jijie Shao [Wed, 10 Apr 2024 12:53:52 +0000 (20:53 +0800)]
net: hns3: move constants from hclge_debugfs.h to hclge_debugfs.c

some constants are defined in hclge_debugfs.h,
but only used in hclge_debugfs.c.
so move them from hclge_debugfs.h to hclge_debugfs.c.

Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240410125354.2177067-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: hns3: add command queue trace for hns3
Hao Lan [Wed, 10 Apr 2024 12:53:51 +0000 (20:53 +0800)]
net: hns3: add command queue trace for hns3

Add support to dump command queue trace for hns3.

Signed-off-by: Hao Lan <lanhao@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://lore.kernel.org/r/20240410125354.2177067-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agonet: nfc: remove inappropriate attrs check
Lin Ma [Wed, 10 Apr 2024 03:48:46 +0000 (11:48 +0800)]
net: nfc: remove inappropriate attrs check

Revert "NFC: fix attrs checks in netlink interface"
This reverts commit 18917d51472fe3b126a3a8f756c6b18085eb8130.

Our checks found weird attrs present check in function
nfc_genl_dep_link_down() and nfc_genl_llc_get_params(), which are
introduced by commit 18917d51472f ("NFC: fix attrs checks in netlink
interface").

According to its message, it should add checks for functions
nfc_genl_deactivate_target() and nfc_genl_fw_download(). However, it
didn't do that. In fact, the expected checks are added by
(1) commit 385097a36757 ("nfc: Ensure presence of required attributes in
the deactivate_target handler") and
(2) commit 280e3ebdafb8 ("nfc: Ensure presence of NFC_ATTR_FIRMWARE_NAME
attribute in nfc_genl_fw_download()"). Perhaps something went wrong.

Anyway, the attr NFC_ATTR_TARGET_INDEX is never accessed in callback
nfc_genl_dep_link_down() and same for NFC_ATTR_FIRMWARE_NAME and
nfc_genl_llc_get_params(). Thus, remove those checks.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20240410034846.167421-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'ptp-convert-to-platform-remove-callback-returning-void'
Jakub Kicinski [Sat, 13 Apr 2024 01:51:41 +0000 (18:51 -0700)]
Merge branch 'ptp-convert-to-platform-remove-callback-returning-void'

Uwe Kleine-König says:

====================
ptp: Convert to platform remove callback returning void

this series converts all platform drivers below drivers/ptp/ to not use
struct platform_device::remove() any more. See commit 5c5a7680e67b
("platform: Provide a remove callback that returns no value") for an
extended explanation and the eventual goal.

All conversations are trivial, because the driver's .remove() callbacks
returned zero unconditionally.
====================

Link: https://lore.kernel.org/r/cover.1712734365.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ptp_qoriq: Convert to platform remove callback returning void
Uwe Kleine-König [Wed, 10 Apr 2024 07:34:54 +0000 (09:34 +0200)]
ptp: ptp_qoriq: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/477c6995046eee729447d4f88bf042c7577fe100.1712734365.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ptp_ines: Convert to platform remove callback returning void
Uwe Kleine-König [Wed, 10 Apr 2024 07:34:53 +0000 (09:34 +0200)]
ptp: ptp_ines: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/2cc6c137dd43444abb5bdb53693713f7c2c08b71.1712734365.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ptp_idt82p33: Convert to platform remove callback returning void
Uwe Kleine-König [Wed, 10 Apr 2024 07:34:52 +0000 (09:34 +0200)]
ptp: ptp_idt82p33: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/5807d0b11214b35f48908fd35cbb7b31b7655ba6.1712734365.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ptp_dte: Convert to platform remove callback returning void
Uwe Kleine-König [Wed, 10 Apr 2024 07:34:51 +0000 (09:34 +0200)]
ptp: ptp_dte: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/e8a0de7e8e6d642242350360a938132c7ba0488e.1712734365.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoptp: ptp_clockmatrix: Convert to platform remove callback returning void
Uwe Kleine-König [Wed, 10 Apr 2024 07:34:50 +0000 (09:34 +0200)]
ptp: ptp_clockmatrix: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/0f0f5680c1a2a3ef19975935a2c6828a98bc4d25.1712734365.git.u.kleine-koenig@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoMerge branch 'selftests-move-netfilter-tests-to-net'
Jakub Kicinski [Sat, 13 Apr 2024 00:52:04 +0000 (17:52 -0700)]
Merge branch 'selftests-move-netfilter-tests-to-net'

Florian Westphal says:

====================
selftests: move netfilter tests to net

First patch in this series moves selftests/netfilter/
to selftests/net/netfilter/.

Passing this via net-next rather than nf-next for this reason.

Main motivation is that a lot of these scripts only work on my old
development VM, I hope that placing this in net/ will get these
tests to get run in more regular intervals (and tests get more robust).

Changes are:

- make use of existing 'setup_ns' and 'busywait' helpers
- fix shellcheck warnings
- add more SKIP checks to avoid failures
- get rid of netcat in favor of socat, too many test
  failures due to 'wrong' netcat flavor
- do not assume rp_filter sysctl is off

I have more patches that fix up the remaining test scripts,
but the series was too large to send them at once (34 patches).

After all scripts are fixed up, tests pass on both my Debian
and Fedora test machines.

MAINTAINERS is updated to reflect that future updates should be handled
via netfilter-devel@.
====================

Link: https://lore.kernel.org/r/20240411233624.8129-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: nft_nat.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:20 +0000 (01:36 +0200)]
selftests: netfilter: nft_nat.sh: move to lib.sh infra

Use busywait helper to wait until socat listener is up to avoid "sleep" calls.
This reduces script execution time slighty (12s to 7s).

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-16-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: nft_flowtable.sh: move test to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:19 +0000 (01:36 +0200)]
selftests: netfilter: nft_flowtable.sh: move test to lib.sh infra

Use socat, the different nc implementations have too much variance wrt.
supported options.

Avoid sleeping until listener is up, use busywait helper for this,
this also greatly reduces test duration.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-15-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: nft_fib.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:18 +0000 (01:36 +0200)]
selftests: netfilter: nft_fib.sh: move to lib.sh infra

Also lower ping interval, wait times (helpers get called several times)
and set nodad for ipv6 addresses: 20s down to 4s.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-14-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: nft_conntrack_helper.sh: test to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:17 +0000 (01:36 +0200)]
selftests: netfilter: nft_conntrack_helper.sh: test to lib.sh infra

prefer socat over nc, nc has too many incompatible versions around.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-13-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: nf_nat_edemux.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:16 +0000 (01:36 +0200)]
selftests: netfilter: nf_nat_edemux.sh: move to lib.sh infra

While at it, use checktool helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-12-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: ipvs.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:15 +0000 (01:36 +0200)]
selftests: netfilter: ipvs.sh: move to lib.sh infra

The setup_ns helper makes the netns names random, so replace nsX with $nsX
everywhere.

Replace nc with socat, otherwise script fails on my system due to
incompatible nc versions ("nc: cannot use -p and -l").

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-11-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: place checktool helper in lib.sh
Florian Westphal [Thu, 11 Apr 2024 23:36:14 +0000 (01:36 +0200)]
selftests: netfilter: place checktool helper in lib.sh

... so it doesn't have to be repeated everywhere.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-10-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: conntrack_ipip_mtu.sh" move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:13 +0000 (01:36 +0200)]
selftests: netfilter: conntrack_ipip_mtu.sh" move to lib.sh infra

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-9-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: conntrack_vrf.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:12 +0000 (01:36 +0200)]
selftests: netfilter: conntrack_vrf.sh: move to lib.sh infra

swap test for "ip" with "conntrack", former is already accounted for
via setup_ns helper.  Also switch to bash.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-8-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: conntrack_sctp_collision.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:11 +0000 (01:36 +0200)]
selftests: netfilter: conntrack_sctp_collision.sh: move to lib.sh infra

While at it, address warnings generated by shellcheck and fix following
minor issues:

 - some distros place netem in 'extra' modules package, so add a skip check for netem-attach
   failure.
 - tc prints a warning for the 100mbit class:
   "Warning: sch_htb: quantum of class 10001 is big. Consider r2q change."
   Silence this by increasing the divisor.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-7-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: conntrack_tcp_unreplied.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:10 +0000 (01:36 +0200)]
selftests: netfilter: conntrack_tcp_unreplied.sh: move to lib.sh infra

Replace nc with socat. Too many different implementations of nc
are around with incompatible options ("nc: cannot use -p and -l").

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-6-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: conntrack_icmp_related.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:09 +0000 (01:36 +0200)]
selftests: netfilter: conntrack_icmp_related.sh: move to lib.sh infra

Only relevant change is that netns names have random suffix names,
i.e. its safe to run this in parallel with other tests.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-5-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: br_netfilter.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:08 +0000 (01:36 +0200)]
selftests: netfilter: br_netfilter.sh: move to lib.sh infra

Also, fix two issues reported by Pablo Neira:
1. Must modprobe br_netfilter in case its not loaded,
   else sysctl cannot be set.
2. ping for netns4 fails if rp_filter is enabled in bridge netns,
   so set all and default to 0.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-4-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: bridge_brouter.sh: move to lib.sh infra
Florian Westphal [Thu, 11 Apr 2024 23:36:07 +0000 (01:36 +0200)]
selftests: netfilter: bridge_brouter.sh: move to lib.sh infra

Doing so gets us dynamically generated netns names.

Also:
* do not assume rp_filter is disabled, if its on script failed
* reduce timeout (-W) for "expected to fail" ping commands
* don't print PASS line for basic sanity ping
* shellcheck cleanups

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-3-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoselftests: netfilter: move to net subdir
Florian Westphal [Thu, 11 Apr 2024 23:36:06 +0000 (01:36 +0200)]
selftests: netfilter: move to net subdir

.. so this can start re-using existing lib.sh infra in next patches.

Several of these scripts will not work, e.g. because they assume
rp_filter is disabled, or reliance on a particular version/flavor
of "netcat" tool.

Add config settings for them.

nft_trans_stress.sh script is removed, it also exists in the nftables
userspace selftests.  I do not see a reason to keep two versions in
different repositories/projects.

The settings file is removed for now:

It was used to increase the timeout to avoid slow scripts from getting
zapped by the 45s timeout, but some of the slow scripts can be sped up.
Re-add it later for scripts that cannot be sped up easily.

Update MAINTAINERS to reflect that future updates to netfilter
scripts should go through netfilter-devel@.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240411233624.8129-2-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 months agoice: store VF relative MSI-X index in q_vector->vf_reg_idx
Jacob Keller [Fri, 22 Mar 2024 21:44:45 +0000 (14:44 -0700)]
ice: store VF relative MSI-X index in q_vector->vf_reg_idx

The ice physical function driver needs to configure the association of
queues and interrupts on behalf of its virtual functions. This is done over
virtchnl by the VF sending messages during its initialization phase. These
messages contain a vector_id which the VF wants to associate with a given
queue. This ID is relative to the VF space, where 0 indicates the control
IRQ for non-queue interrupts.

When programming the mapping, the PF driver currently passes this vector_id
directly to the low level functions for programming. This works for SR-IOV,
because the hardware uses the VF-based indexing for interrupts.

This won't work for Scalable IOV, which uses PF-based indexing for
programming its VSIs. To handle this, the driver needs to be able to look
up the proper index to use for programming. For typical IRQs, this would be
the q_vector->reg_idx field.

The q_vector->reg_idx can't be set to a VF relative value, because it is
used when the PF needs to control the interrupt, such as when triggering a
software interrupt on stopping the Tx queue. Thus, introduce a new
q_vector->vf_reg_idx which can store the VF relative index for registers
which expect this.

Use this in ice_cfg_interrupt to look up the VF index from the q_vector.
This allows removing the vector ID parameter of ice_cfg_interrupt. Also
notice that this function returns an int, but then is cast to the virtchnl
error enumeration, virtchnl_status_code. Update the return type to indicate
it does not return an integer error code. We can't use normal error codes
here because the return values are passed across the virtchnl interface.

This will allow the future Scalable IOV VFs to correctly look up the index
needed for programming the VF queues without breaking SR-IOV.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoice: set vf->num_msix in ice_initialize_vf_entry()
Jacob Keller [Fri, 22 Mar 2024 21:44:44 +0000 (14:44 -0700)]
ice: set vf->num_msix in ice_initialize_vf_entry()

Commit fe1c5ca2fe76 ("ice: implement num_msix field per VF") updated the
driver to allow for per-VF MSI-X configuration. The initial defaults were
set in ice_create_vf_entries(). This logic is better placed in
ice_initialize_vf_entry(). Indeed, that function already sets
vf->num_vf_qs, as well as initializes the allow list via calling
ice_vc_set_default_allowlist().

Move this logic into ice_initialize_vf_entry(). This makes the code clear,
and ensures that these VF fields will be initialized properly for both
SR-IOV VFs and the upcoming Scalable IOV VFs.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoice: Implement 'flow-type ether' rules
Jakub Buchocki [Wed, 3 Apr 2024 10:24:02 +0000 (12:24 +0200)]
ice: Implement 'flow-type ether' rules

Add support for 'flow-type ether' Flow Director rules via ethtool.

Create packet segment info for filter configuration based on ethtool
command parameters. Reuse infrastructure already created for
ipv4 and ipv6 flows to convert packet segment into
extraction sequence, which is later used to program the filter
inside Flow Director block of the Rx pipeline.

Rules not containing masks are processed by the Flow Director,
and support the following set of input parameters in all combinations:
src, dst, proto, vlan-etype, vlan, action.

It is possible to specify address mask in ethtool parameters but only
00:00:00:00:00 and FF:FF:FF:FF:FF are valid.
The same applies to proto, vlan-etype and vlan masks:
only 0x0000 and 0xffff masks are valid.

Testing:
  (DUT) iperf3 -s
  (DUT) ethtool -U ens785f0np0 flow-type ether dst <ens785f0np0 mac> \
        action 10
  (DUT) watch 'ethtool -S ens785f0np0 | grep rx_queue'
  (LP)  iperf3 -c ${DUT_IP}

  Counters increase only for:
    'rx_queue_10_packets'
    'rx_queue_10_bytes'

Signed-off-by: Jakub Buchocki <jakubx.buchocki@intel.com>
Co-developed-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Lukasz Plachno <lukasz.plachno@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoice: Remove unnecessary argument from ice_fdir_comp_rules()
Lukasz Plachno [Wed, 3 Apr 2024 10:24:01 +0000 (12:24 +0200)]
ice: Remove unnecessary argument from ice_fdir_comp_rules()

Passing v6 argument is unnecessary as flow_type is still
analyzed inside the function.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Lukasz Plachno <lukasz.plachno@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
5 months agoMerge branch 'nfp-minor-improvements'
David S. Miller [Fri, 12 Apr 2024 10:40:09 +0000 (11:40 +0100)]
Merge branch 'nfp-minor-improvements'

Louis Peens says:

====================
nfp: series of minor driver improvements

This short series bundles now only includes a small update to add a
board part number to devlink. Previously some dim patches also formed
part of this series, these were dropped in v5.

Patch1: Add new define for devlink string "board.part_number"
Patch2: Make use of this field in the nfp driver

Changes since V4:
- Dropped the dim patches, as there is a more significant rework in
  progress to make it more flexible, as mentioned in the V4 review:
  https://lore.kernel.org/all/1712547870-112976-2-git-send-email-hengqi@linux.alibaba.com/
- Updated the devlink description of 'board.part_number'

Changes since V3:
- Fixed: Documentation/networking/devlink/devlink-info.rst:150:
    WARNING: Title underline too short.

Changes since V2:
- After some discussion on the previous series it was agreed that only
  the "board.part_number" field makes sense in the common code. The
  "board.model" field which was moved to devlink common code in V1 is
  now kept in the driver. The field is specific to the nfp driver,
  exposing the codename of the board.
- In summary, add "board.part_number" to devlink, and populate it
  in the the nfp driver.

Changes since V1:
- Move nfp local defines to devlink common code as it is quite generic.
- Add new 'dim' profile instead of using driver local overrides, as this
  allows use of the 'dim' helpers.
- This expanded 2 patches to 4, as the common code changes are split
  into seperate patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonfp: update devlink device info output
Fei Qin [Wed, 10 Apr 2024 11:26:36 +0000 (13:26 +0200)]
nfp: update devlink device info output

Newer NIC will introduce a new part number, now add it
into devlink device info.

This patch also updates the information of "board.id" in
nfp.rst to match the devlink-info.rst.

Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agodevlink: add a new info version tag
Fei Qin [Wed, 10 Apr 2024 11:26:35 +0000 (13:26 +0200)]
devlink: add a new info version tag

Add definition and documentation for the new generic
info "board.part_number".

The new one is for part number specific use, and board.id
is modified to match the documentation in devlink-info.

Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agotcp: increase the default TCP scaling ratio
Hechao Li [Tue, 9 Apr 2024 16:43:55 +0000 (09:43 -0700)]
tcp: increase the default TCP scaling ratio

After commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"),
we noticed an application-level timeout due to reduced throughput.

Before the commit, for a client that sets SO_RCVBUF to 65k, it takes
around 22 seconds to transfer 10M data. After the commit, it takes 40
seconds. Because our application has a 30-second timeout, this
regression broke the application.

The reason that it takes longer to transfer data is that
tp->scaling_ratio is initialized to a value that results in ~0.25 of
rcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which
translates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k
initial receive window.

Later, even though the scaling_ratio is updated to a more accurate
skb->len/skb->truesize, which is ~0.66 in our environment, the window
stays at ~0.25 * rcvbuf. This is because tp->window_clamp does not
change together with the tp->scaling_ratio update when autotuning is
disabled due to SO_RCVBUF. As a result, the window size is capped at the
initial window_clamp, which is also ~0.25 * rcvbuf, and never grows
bigger.

Most modern applications let the kernel do autotuning, and benefit from
the increased scaling_ratio. But there are applications such as kafka
that has a default setting of SO_RCVBUF=64k.

This patch increases the initial scaling_ratio from ~25% to 50% in order
to make it backward compatible with the original default
sysctl_tcp_adv_win_scale for applications setting SO_RCVBUF.

Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
Signed-off-by: Hechao Li <hli@netflix.com>
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/netdev/20240402215405.432863-1-hli@netflix.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agoMerge branch 'rtl8226b-serdes-switching'
David S. Miller [Fri, 12 Apr 2024 09:18:19 +0000 (10:18 +0100)]
Merge branch 'rtl8226b-serdes-switching'

Eric Woudstra says:

====================
rtl8226b/8221b add C45 instances and SerDes switching

Based on the comments in [PATCH net-next]
"Realtek RTL822x PHY rework to c45 and SerDes interface switching"

Adds SerDes switching interface between 2500base-x and sgmii for
rtl8221b and rtl8226b.

Add get_rate_matching() for rtl8226b and rtl8221b, reading the serdes
mode from phy.

Driver instances are added for rtl8226b and rtl8221b for Clause 45
access only. The existing code is not touched, they use newly added
functions. They also use the same rtl822xb_config_init() and
rtl822xb_get_rate_matching() as these functions also can be used for
direct Clause 45 access. Also Adds definition of MMC 31 registers,
which cannot be used through C45-over-C22, only when phydev->is_c45
is set.

Change rtlgen_get_speed() so the register value is passed as argument.
Using Clause 45 access, this value is retrieved differently.
Rename it to rtlgen_decode_speed() and add a call to it in
rtl822x_c45_read_status().

Add rtl822x_c45_get_features() to set supported port for rtl8221b.

Then 1 quirk is added for sfp modules known to have a rtl8221b
behind RollBall, Clause 45 only, protocol.

Changed in PATCH v4:
* Changed switch to if statement in rtl822xb_get_rate_matching()
* Removed setting ETHTOOL_LINK_MODE_MII_BIT in rtl822x_c45_get_features()

Changed in PATCH v3:
* Only apply to rtl8221b and rtl8226b phy's
* Set phydev->rate_matching in .config_init()
* Removed OEM SFP fixup for now, as there are modules with the same
  vendor name/PN, but with different PHY's. We found rtl8221b, but
  also the ty8821, which is not yet supported.

Changed in PATCH v2:
* Set author to Marek for the commit of the new C45 instances
* Separate commit for setting supported ports
* Renamed rtlgen_get_speed to rtlgen_decode_speed
* Always fill in possible interfaces
* Renamed sfp_fixup_oem_2_5g to sfp_fixup_oem_2_5gbaset
* Only update phydev->interface when link is up

Alexander Couzens (1):
  net: phy: realtek: configure SerDes mode for rtl822xb PHYs

Eric Woudstra (3):
  net: phy: realtek: add get_rate_matching() for rtl822xb PHYs
  net: phy: realtek: Change rtlgen_get_speed() to rtlgen_decode_speed()
  net: phy: realtek: add rtl822x_c45_get_features() to set supported
    port
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: sfp: add quirk for another multigig RollBall transceiver
Marek Behún [Tue, 9 Apr 2024 07:30:16 +0000 (09:30 +0200)]
net: sfp: add quirk for another multigig RollBall transceiver

Add quirk for another RollBall copper transceiver: Turris RTSFP-2.5G,
containing 2.5g capable RTL8221B PHY.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: phy: realtek: add rtl822x_c45_get_features() to set supported port
Eric Woudstra [Tue, 9 Apr 2024 07:30:15 +0000 (09:30 +0200)]
net: phy: realtek: add rtl822x_c45_get_features() to set supported port

Sets ETHTOOL_LINK_MODE_TP_BIT in phydev->supported.

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: phy: realtek: Change rtlgen_get_speed() to rtlgen_decode_speed()
Eric Woudstra [Tue, 9 Apr 2024 07:30:14 +0000 (09:30 +0200)]
net: phy: realtek: Change rtlgen_get_speed() to rtlgen_decode_speed()

The value of the register to determine the speed, is retrieved
differently when using Clause 45 only. To use the rtlgen_get_speed()
function in this case, pass the value of the register as argument to
rtlgen_get_speed(). The function would then always return 0, so change it
to void. A better name for this function now is rtlgen_decode_speed().

Replace a call to genphy_read_status() followed by rtlgen_get_speed()
with a call to rtlgen_read_status() in rtl822x_read_status().

Add reading speed to rtl822x_c45_read_status().

Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 months agonet: phy: realtek: Add driver instances for rtl8221b via Clause 45
Marek Behún [Tue, 9 Apr 2024 07:30:13 +0000 (09:30 +0200)]
net: phy: realtek: Add driver instances for rtl8221b via Clause 45

Collected from several commits in [PATCH net-next]
"Realtek RTL822x PHY rework to c45 and SerDes interface switching"

The instances are used by Clause 45 only accessible PHY's on several sfp
modules, which are using RollBall protocol.

Signed-off-by: Marek Behún <kabel@kernel.org>
[ Added matching functions to differentiate C45 instances ]
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>