git.kernel.dk Git - linux-block.git/log

Merge branch 'qca8k-cleanup-and-port-isolation'

Matthias Schiffer says:

====================
net: dsa: qca8k: cleanup and port isolation

A small cleanup patch, and basically the same changes that were just
accepted for mt7530 to implement port isolation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: add support for bridge port isolation

Remove a pair of ports from the port matrix when both ports have the
isolated flag set.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: factor out bridge join/leave logic

Most of the logic in qca8k_port_bridge_join() and qca8k_port_bridge_leave()
is the same. Refactor to reduce duplication and prepare for reusing the
code for implementing bridge port isolation.

dsa_port_offloads_bridge_dev() is used instead of
dsa_port_offloads_bridge(), passing the bridge in as a struct netdevice *,
as we won't have a struct dsa_bridge in qca8k_port_bridge_flags().

The error handling is changed slightly in the bridge leave case,
returning early and emitting an error message when a regmap access fails.
This shouldn't matter in practice, as there isn't much we can do if
communication with the switch breaks down in the middle of reconfiguration.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: do not write port mask twice in bridge join/leave

qca8k_port_bridge_join() set QCA8K_PORT_LOOKUP_CTRL() for i == port twice,
once in the loop handling all other port's masks, and finally at the end
with the accumulated port_mask.

The first time it would incorrectly set the port's own bit in the mask,
only to correct the mistake a moment later. qca8k_port_bridge_leave() had
the same issue, but here the regmap_clear_bits() was a no-op rather than
setting an unintended value.

Remove the duplicate assignment by skipping the whole loop iteration for
i == port. The unintended bit setting doesn't seem to have any negative
effects (even when not reverted right away), so the change is submitted
as a simple cleanup rather than a fix.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-mscc-miim-switch-reset'

Herve Codina says:

====================
Handle switch reset in mscc-miim

These two patches were previously sent as part of a bigger series:
  https://lore.kernel.org/lkml/20240527161450.326615-1-herve.codina@bootlin.com/

v1 and v2 iterations were handled during the v1 and v2 reviews of this
bigger series. As theses two patches are now ready to be applied, they
were extracted from the bigger series and sent alone in this current
series.

This current v3 series takes into account feedback received during the
bigger series v2 review.

Changes v2 -> v3
  - patch 1
    Drop one useless sentence.
    Add 'Reviewed-by: Andrew Lunn <andrew@lunn.ch>'
    Add 'Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>'

  - patch 2
   Add 'Reviewed-by: Andrew Lunn <andrew@lunn.ch>'

Changes v1 -> v2 (as part of the bigger series iterations)
  - Patch 1
    Improve the reset property description

  - Patch 2
    Fix a wrong reverse x-mass tree declaration
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mdio: mscc-miim: Handle the switch reset

The mscc-miim device can be impacted by the switch reset, at least when
this device is part of the LAN966x PCI device.

Handle this newly added (optional) resets property.

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: mscc-miim: Add resets property

Add the (optional) resets property.
The mscc-miim device is impacted by the switch reset especially when the
mscc-miim device is used as part of the LAN966x PCI device.

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'l2tp-sk_user_data'

James Chapman says:

====================
l2tp: don't use the tunnel socket's sk_user_data in datapath

This series refactors l2tp to not use the tunnel socket's sk_user_data
in the datapath. The main reasons for doing this are

* to allow for simplifying internal socket cleanup code (to be done
   in a later series)
* to support multiple L2TPv3 UDP tunnels using the same 5-tuple
   address

When handling received UDP frames, l2tp's current approach is to look
up a session in a per-tunnel list. l2tp uses the tunnel socket's
sk_user_data to derive the tunnel context from the receiving socket.

But this results in the socket and tunnel lifetimes being very tightly
coupled and the tunnel/socket cleanup paths being complicated. The
latter has historically been a source of l2tp bugs and makes the code
more difficult to maintain. Also, if sockets are aliased, we can't
trust that the socket's sk_user_data references the right tunnel
anyway. Hence the desire to not use sk_user_data in the datapath.

The new approach is to lookup sessions in a per-net session list
without first deriving the tunnel:

* For L2TPv2, the l2tp header has separate tunnel ID and session ID
   fields which can be trivially combined to make a unique 32-bit key
   for per-net session lookup.

* For L2TPv3, there is no tunnel ID in the packet header, only a
   session ID, which should be unique over all tunnels so can be used
   as a key for per-net session lookup. However, this only works when
   the L2TPv3 session ID really is unique over all tunnels. At least
   one L2TPv3 application is known to use the same session ID in
   different L2TPv3 UDP tunnels, relying on UDP address/ports to
   distinguish them. This worked previously because sessions in UDP
   tunnels were kept in a per-tunnel list. To retain support for this,
   L2TPv3 session ID collisions are managed using a separate per-net
   session hlist, keyed by ID and sk. When looking up a session by ID,
   if there's more than one match, sk is used to find the right one.

L2TPv3 sessions in IP-encap tunnels are already looked up by session
ID in a per-net list. This work has UDP sessions also use the per-net
session list, while allowing for session ID collisions. The existing
per-tunnel hlist becomes a plain list since it is used only in
management and cleanup paths to walk a list of sessions in a given
tunnel.

For better performance, the per-net session lists use IDR. Separate
IDRs are used for L2TPv2 and L2TPv3 sessions to avoid potential key
collisions.

These changes pass l2tp regression tests and improve data forwarding
performance by about 10% in some of my test setups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: replace hlist with simple list for per-tunnel session list

The per-tunnel session list is no longer used by the
datapath. However, we still need a list of sessions in the tunnel for
l2tp_session_get_nth, which is used by management code. (An
alternative might be to walk each session IDR list, matching only
sessions of a given tunnel.)

Replace the per-tunnel hlist with a per-tunnel list. In functions
which walk a list of sessions of a tunnel, walk this list instead.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: drop the now unused l2tp_tunnel_get_session

All users of l2tp_tunnel_get_session are now gone so it can be
removed.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: use IDR for all session lookups

Add generic session getter which uses IDR. Replace all users of
l2tp_tunnel_get_session which uses the per-tunnel session list to use
the generic getter.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: don't use sk_user_data in l2tp_udp_encap_err_recv

If UDP sockets are aliased, sk might be the wrong socket. There's no
benefit to using sk_user_data to do some checks on the associated
tunnel context. Just report the error anyway, like udp core does.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: refactor udp recv to lookup to not use sk_user_data

Modify UDP decap to not use the tunnel pointer which comes from the
sock's sk_user_data when parsing the L2TP header. By looking up the
destination session using only the packet contents we avoid potential
UDP 5-tuple aliasing issues which arise from depending on the socket
that received the packet.

Drop the useless error messages on short packet or on failing to find
a session since the tunnel pointer might point to a different tunnel
if multiple sockets use the same 5-tuple.

Short packets (those not big enough to contain an L2TP header) are no
longer counted in the tunnel's invalid counter because we can't derive
the tunnel until we parse the l2tp header to lookup the session.

l2tp_udp_encap_recv was a small wrapper around l2tp_udp_recv_core which
used sk_user_data to derive a tunnel pointer in an RCU-safe way. But
we no longer need the tunnel pointer, so remove that code and combine
the two functions.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: store l2tpv2 sessions in per-net IDR

L2TPv2 sessions are currently kept in a per-tunnel hashlist, keyed by
16-bit session_id. When handling received L2TPv2 packets, we need to
first derive the tunnel using the 16-bit tunnel_id or sock, then
lookup the session in a per-tunnel hlist using the 16-bit session_id.

We want to avoid using sk_user_data in the datapath and double lookups
on every packet. So instead, use a per-net IDR to hold L2TPv2
sessions, keyed by a 32-bit value derived from the 16-bit tunnel_id
and session_id. This will allow the L2TPv2 UDP receive datapath to
lookup a session with a single lookup without deriving the tunnel
first.

L2TPv2 sessions are held in their own IDR to avoid potential
key collisions with L2TPv3 sessions.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: store l2tpv3 sessions in per-net IDR

L2TPv3 sessions are currently held in one of two fixed-size hash
lists: either a per-net hashlist (IP-encap), or a per-tunnel hashlist
(UDP-encap), keyed by the L2TPv3 32-bit session_id.

In order to lookup L2TPv3 sessions in UDP-encap tunnels efficiently
without finding the tunnel first via sk_user_data, UDP sessions are
now kept in a per-net session list, keyed by session ID. Convert the
existing per-net hashlist to use an IDR for better performance when
there are many sessions and have L2TPv3 UDP sessions use the same IDR.

Although the L2TPv3 RFC states that the session ID alone identifies
the session, our implementation has allowed the same session ID to be
used in different L2TP UDP tunnels. To retain support for this, a new
per-net session hashtable is used, keyed by the sock and session
ID. If on creating a new session, a session already exists with that
ID in the IDR, the colliding sessions are added to the new hashtable
and the existing IDR entry is flagged. When looking up sessions, the
approach is to first check the IDR and if no unflagged match is found,
check the new hashtable. The sock is made available to session getters
where session ID collisions are to be considered. In this way, the new
hashtable is used only for session ID collisions so can be kept small.

For managing session removal, we need a list of colliding sessions
matching a given ID in order to update or remove the IDR entry of the
ID. This is necessary to detect session ID collisions when future
sessions are created. The list head is allocated on first collision
of a given ID and refcounted.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove unused list_head member in l2tp_tunnel

Remove an unused variable in struct l2tp_tunnel which was left behind
by commit c4d48a58f32c5 ("l2tp: convert l2tp_tunnel_list to idr").

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Add ucast filter count configurability via devlink.

The existing method of reserving unicast filter count leads to wasted
MCAM entries if the functionality is not used or fewer entries are used.
Furthermore, the amount of MCAM entries differs amongst Octeon SoCs.
We implemented a means to adjust the UC filter count via devlink,
allowing for better use of MCAM entries across Netdev apps.

commands:

To get the current unicast filter count
# devlink dev param show pci/0002:02:00.0 name unicast_filter_count

To change/set the unicast filter count
# devlink dev param set pci/0002:02:00.0 name unicast_filter_count
value 5 cmode runtime

Signed-off-by: Sai Krishna <saikrishnag@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

docs: net: document guidance of implementing the SR-IOV NDOs

New drivers were prevented from adding ndo_set_vf_* callbacks
over the last few years. This was expected to result in broader
switchdev adoption, but seems to have had little effect.

Based on recent netdev meeting there is broad support for allowing
adding those ops.

There is a problem with the current API supporting a limited number
of VFs (100+, which is less than some modern HW supports).
We can try to solve it by adding similar functionality on devlink
ports, but that'd be another API variation to maintain.
So a netlink attribute reshuffling is a more likely outcome.

Document the guidance, make it clear that the API is frozen.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: ksz_common: Allow only up to two HSR HW offloaded ports for KSZ9477

The KSZ9477 allows HSR in-HW offloading for any of two selected ports.
This patch adds check if one tries to use more than two ports with
HSR offloading enabled.

The problem is with RedBox configuration (HSR-SAN) - when configuring:
ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 interlink lan3 \
supervision 45 version 1

The lan1 (port0) and lan2 (port1) are correctly configured as ports, which
can use HSR offloading on ksz9477.

However, when we do already have two bits set in hsr_ports, we need to
return (-ENOTSUPP), so the interlink port (lan3) would be used with
SW based HSR RedBox support.

Otherwise, I do see some strange network behavior, as some HSR frames are
visible on non-HSR network and vice versa.

This causes the switch connected to interlink port (lan3) to drop frames
and no communication is possible.

Moreover, conceptually - the interlink (i.e. HSR-SAN port - lan3/port2)
shall be only supported in software as it is also possible to use ksz9477
with only SW based HSR (i.e. port0/1 -> hsr0 with offloading, port2 ->
HSR-SAN/interlink, port4/5 -> hsr1 with SW based HSR).

Fixes: 5055cccfc2d1 ("net: hsr: Provide RedBox support (HSR-SAN)")
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: Fix FEC_ECR_EN1588 being cleared on link-down

FEC_ECR_EN1588 bit gets cleared after MAC reset in `fec_stop()`, which
makes all 1588 functionality shut down, and all the extended registers
disappear, on link-down, making the adapter fall back to compatibility
"dumb mode". However, some functionality needs to be retained (e.g. PPS)
even without link.

Fixes: 6605b730c061 ("FEC: Add time stamping code and a PTP hardware clock")
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/netdev/5fa9fadc-a89d-467a-aae9-c65469ff5fe1@lunn.ch/
Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt_en-netdev_queue_mgmt_ops'

David Wei says:

====================
bnxt_en: implement netdev_queue_mgmt_ops

Implement netdev_queue_mgmt_ops for bnxt added in [1]. This will be used
in the io_uring ZC Rx patchset to configure queues with a custom page
pool w/ a special memory provider for zero copy support.

The first two patches prep the driver, while the final patch adds the
implementation.

Any arbitrary Rx queue can be reset without affecting other queues. V2
and prior of this patchset was thought to only support resetting queues
not in the main RSS context. Upon further testing I realised moving
queues out and calling bnxt_hwrm_vnic_update() wasn't necessary.

I didn't include the netdev core API using this netdev_queue_mgmt_ops
because Mina is adding it in his devmem TCP series [2]. But I'm happy to
include it if folks want to include a user with this series.

I tested this series on BCM957504-N1100FY4 with FW 229.1.123.0. I
manually injected failures at all the places that can return an errno
and confirmed that the device/queue is never left in a broken state.

[1]: https://lore.kernel.org/netdev/20240501232549.1327174-2-shailend@google.com/
[2]: https://lore.kernel.org/netdev/20240607005127.3078656-2-almasrymina@google.com/

v3:
- tested w/o bnxt_hwrm_vnic_update() and it works on any queue
- removed unneeded code

v2:
- fix broken build
- remove unused var in bnxt_init_one_rx_ring()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: implement netdev_queue_mgmt_ops

Implement netdev_queue_mgmt_ops for bnxt added in [1].

Two bnxt_rx_ring_info structs are allocated to hold the new/old queue
memory. Queue memory is copied from/to the main bp->rx_ring[idx]
bnxt_rx_ring_info.

Queue memory is pre-allocated in bnxt_queue_mem_alloc() into a clone,
and then copied into bp->rx_ring[idx] in bnxt_queue_mem_start().

Similarly, when bp->rx_ring[idx] is stopped its queue memory is copied
into a clone, and then freed later in bnxt_queue_mem_free().

I tested this patchset with netdev_rx_queue_restart(), including
inducing errors in all places that returns an error code. In all cases,
the queue is left in a good working state.

Rx queues are created/destroyed using bnxt_hwrm_rx_ring_alloc() and
bnxt_hwrm_rx_ring_free(), which issue HWRM_RING_ALLOC and HWRM_RING_FREE
commands respectively to the firmware. By the time a HWRM_RING_FREE
response is received, there won't be any more completions from that
queue.

Thanks to Somnath for helping me with this patch. With their permission
I've added them as Acked-by.

[1]: https://lore.kernel.org/netdev/20240501232549.1327174-2-shailend@google.com/

Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: split rx ring helpers out from ring helpers

To prepare for queue API implementation, split rx ring functions out
from ring helpers. These new helpers will be called from queue API
implementation.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-cleanup-arc-emac'

Johan Jonker says:

====================
cleanup arc emac

The Rockchip emac binding for rk3036/rk3066/rk3188 has been converted to YAML
with the ethernet-phy node in a mdio node. This requires some driver fixes
by someone that can do hardware testing.

In order to make a future fix easier make the driver 'Rockchip only'
by removing the obsolete part of the arc emac driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: remove arc_emac.txt

The last real user nSIM_700 of the "snps,arc-emac" compatible string in
a driver was removed in 2019. The use of this string in the combined DT of
rk3066a/rk3188 as place holder has also been replaced, so
remove arc_emac.txt

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: arc: remove emac_arc driver

The last real user nSIM_700 of the "snps,arc-emac" compatible string in
a driver was removed in 2019. The use of this string in the combined DT of
rk3066a/rk3188 as place holder has also been replaced, so
remove emac_arc.c to clean up some code.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: rockchip: rk3xxx: fix emac node

In the combined DT of rk3066a/rk3188 the emac node uses as place holder
the compatible string "snps,arc-emac". The last real user nSIM_700
of the "snps,arc-emac" compatible string in a driver was removed in 2019.
Rockchip emac nodes don't make use of this common fall back string.
In order to removed unused driver code replace this string with
"rockchip,rk3066-emac".
As we are there remove the blank lines and sort.

Signed-off-by: Johan Jonker <jbx6244@gmail.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dt-bindings-net-convert-fsl-fman-related-file-to-yaml-format'

Frank Li says:

====================
dt-bindings: net: Convert fsl,fman related file to yaml format

Passed dt_binding_check
Run dt_binding_check: fsl,fman-mdio.yaml
  SCHEMA  Documentation/devicetree/bindings/processed-schema.json
  CHKDT   Documentation/devicetree/bindings
  LINT    Documentation/devicetree/bindings
  DTEX    Documentation/devicetree/bindings/net/fsl,fman-mdio.example.dts
  DTC_CHK Documentation/devicetree/bindings/net/fsl,fman-mdio.example.dtb
Run dt_binding_check: fsl,fman-muram.yaml
  SCHEMA  Documentation/devicetree/bindings/processed-schema.json
  CHKDT   Documentation/devicetree/bindings
  LINT    Documentation/devicetree/bindings
  DTEX    Documentation/devicetree/bindings/net/fsl,fman-muram.example.dts
  DTC_CHK Documentation/devicetree/bindings/net/fsl,fman-muram.example.dtb
Run dt_binding_check: fsl,fman-port.yaml
  SCHEMA  Documentation/devicetree/bindings/processed-schema.json
  CHKDT   Documentation/devicetree/bindings
  LINT    Documentation/devicetree/bindings
  DTEX    Documentation/devicetree/bindings/net/fsl,fman-port.example.dts
  DTC_CHK Documentation/devicetree/bindings/net/fsl,fman-port.example.dtb
Run dt_binding_check: fsl,fman.yaml
  SCHEMA  Documentation/devicetree/bindings/processed-schema.json
  CHKDT   Documentation/devicetree/bindings
  LINT    Documentation/devicetree/bindings
  DTEX    Documentation/devicetree/bindings/net/fsl,fman.example.dts
  DTC_CHK Documentation/devicetree/bindings/net/fsl,fman.example.dtb
Run dt_binding_check: ptp-qoriq.yaml
  SCHEMA  Documentation/devicetree/bindings/processed-schema.json
  CHKDT   Documentation/devicetree/bindings
  LINT    Documentation/devicetree/bindings
  DTEX    Documentation/devicetree/bindings/ptp/ptp-qoriq.example.dts
  DTC_CHK Documentation/devicetree/bindings/ptp/ptp-qoriq.example.dtb

v1: https://lore.kernel.org/r/20240614-ls_fman-v1-0-cb33c96dc799@nxp.com
====================

Link: https://patch.msgid.link/20240618-ls_fman-v2-0-f00a82623d8e@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Convert fsl-fman to yaml

Convert fsl-fman from txt to yaml format and split it fsl,fman.yam,
fsl,fman-port.yaml, fsl-muram.yaml, fsl-mdio.yaml.

Addition changes:
fsl,fman.yaml:
  - Fixed interrupts in example.
  - Fixed ethernet@e8000 miss } in example.
  - ptp-timer add label in example.
  - Ref to new fsl,fman*.yaml.
  - Reorder property in example.
  - Keep only one example.
  - Add const for #address-cells and #size-cells.
  - Use defined interrupt type.
  - ptp example use node name phc.

fsl,fman-port:
  - Keep only one example.

fsl,fman-mdio:
  - Add little-endian property.
  - Add ref to mdio.yaml.
  - Remove suppress-preamble.
  - Add #address-cells and #size-cells in example.
  - Remove clock-frequency, which already describe in mmio.yaml.

fsl,muram.yaml:
  - Add reg property.
  - Remove range property.
  - Use reg instead of range in example.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240618-ls_fman-v2-2-f00a82623d8e@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: ptp: Convert ptp-qoirq to yaml format

Convert ptp-qoirq from txt to yaml format.

Additional change:
- Fixed example interrupts proptery. Need only 1 irq by check MPC8313 spec.
- Move Reference clock context under clk,sel.
- Interrupts is not required property.
- Use low case for hex value.
- Check reference manual of MPC8313, p1010 and so on, which dts use more
than 1 irqs. Only 1 irq for each ptp device. Check driver code
(drivers/ptp/ptp_qoriq.c) and only 1 irq used. So original description is
wrong.
- Remove comments for compatible string.

Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240618-ls_fman-v2-1-f00a82623d8e@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

wifi: mt76: un-embedd netdev from mt76_dev

Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].

Un-embed the net_devices from struct mt76_dev by converting them
into pointers, and allocating them dynamically. Use the leverage
alloc_netdev_dummy() to allocate the net_device object at
mt76_dma_init().

The free of the device occurs at mt76_dma_cleanup().

Link: https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20240619105311.3144908-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: unexport stmmac_pltfr_init/exit()

These functions are only used within the compilation unit they're defined
in so there's no reason to export them.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240619140119.26777-1-brgl@bgdev.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfc: Drop explicit initialization of struct i2c_device_id::driver_data to 0

These drivers don't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.

This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240619195631.2545407-2-u.kleine-koenig@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
1e7962114c10 ("bnxt_en: Restore PTP tx_avail count in case of skb_pad() error")
165f87691a89 ("bnxt_en: add timestamping statistics support")

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.10-rc5' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless, bpf and netfilter.

  Happy summer solstice! The line count is a bit inflated by a selftest
  and update to a driver's FW interface header, in reality this is
  slightly below average for us. We are expecting one driver fix from
  Intel, but there are no big known issues.

  Current release - regressions:

   - ipv6: bring NLM_DONE out to a separate recv() again

  Current release - new code bugs:

   - wifi: cfg80211: wext: set ssids=NULL for passive scans via old wext API

  Previous releases - regressions:

   - wifi: mac80211: fix monitor channel setting with chanctx emulation
     (probably most awaited of the fixes in this PR, tracked by Thorsten)

   - usb: ax88179_178a: bring back reset on init, if PHY is disconnected

   - bpf: fix UML x86_64 compile failure with BPF

   - bpf: avoid splat in pskb_pull_reason(), sanity check added can be hit
     with malicious BPF

   - eth: mvpp2: use slab_build_skb() for packets in slab, driver was
     missed during API refactoring

   - wifi: iwlwifi: add missing unlock of mvm mutex

  Previous releases - always broken:

   - ipv6: add a number of missing null-checks for in6_dev_get(), in case
     IPv6 disabling races with the datapath

   - bpf: fix reg_set_min_max corruption of fake_reg

   - sched: act_ct: add netns as part of the key of tcf_ct_flow_table"

* tag 'net-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits)
  net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings
  selftests: virtio_net: add forgotten config options
  bnxt_en: Restore PTP tx_avail count in case of skb_pad() error
  bnxt_en: Set TSO max segs on devices with limits
  bnxt_en: Update firmware interface to 1.10.3.44
  net: stmmac: Assign configured channel value to EXTTS event
  net: do not leave a dangling sk pointer, when socket creation fails
  net/tcp_ao: Don't leak ao_info on error-path
  ice: Fix VSI list rule with ICE_SW_LKUP_LAST type
  ipv6: bring NLM_DONE out to a separate recv() again
  selftests: add selftest for the SRv6 End.DX6 behavior with netfilter
  selftests: add selftest for the SRv6 End.DX4 behavior with netfilter
  netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core
  seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors
  netfilter: ipset: Fix suspicious rcu_dereference_protected()
  selftests: openvswitch: Set value to nla flags.
  octeontx2-pf: Fix linking objects into multiple modules
  octeontx2-pf: Add error handling to VLAN unoffload handling
  virtio_net: fixing XDP for fully checksummed packets handling
  virtio_net: checksum offloading handling fix
  ...

Merge tag 'sound-6.10-rc5' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Lots of small HD-audio quirks and fixes (mostly Realtek codec and
  Cirrus stuff).

  Also a small MIDI 2.0 fix and a fix for missing module description
  are included"

* tag 'sound-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda: cs35l56: Select SERIAL_MULTI_INSTANTIATE
  ALSA: hda/realtek: Add more codec ID to no shutup pins list
  sound/oss/dmasound: add missing MODULE_DESCRIPTION() macro
  ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14ARP8
  ALSA: hda/realtek: Enable headset mic on IdeaPad 330-17IKB 81DM
  ALSA: hda: tas2781: Component should be unbound before deconstruction
  ALSA: hda: cs35l41: Component should be unbound before deconstruction
  ALSA: hda: cs35l56: Component should be unbound before deconstruction
  ALSA/hda: intel-dsp-config: Document AVS as dsp_driver option
  ALSA: hda/realtek: Support Lenovo Thinkbook 13x Gen 4
  ALSA: hda/realtek: Support Lenovo Thinkbook 16P Gen 5
  ALSA: hda: cs35l41: Support Lenovo Thinkbook 13x Gen 4
  ALSA: hda: cs35l41: Support Lenovo Thinkbook 16P Gen 5
  ALSA: hda/realtek: Remove Framework Laptop 16 from quirks
  ALSA: hda/realtek: Limit mic boost on N14AP7
  ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 445/465 G11.
  ALSA: seq: ump: Fix missing System Reset message handling
  ALSA: hda: cs35l41: Possible null pointer dereference in cs35l41_hda_unbind()
  ALSA: hda: cs35l56: Fix lifecycle of codec pointer

Merge tag 'mfd-fixes-6.10' of git://git./linux/kernel/git/lee/mfd

Pull mfd fix from Lee Jones:

- Fix AXP717 PMIC probe and by extension its consumers

* tag 'mfd-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: axp20x: AXP717: Fix missing IRQ status registers range

net: usb: rtl8150 fix unintiatilzed variables in rtl8150_get_link_ksettings

This functions retrieves values by passing a pointer. As the function
that retrieves them can fail before touching the pointers, the variables
must be initialized.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+5186630949e3c55f0799@syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/20240619132816.11526-1-oneukum@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: virtio_net: add forgotten config options

One may use tools/testing/selftests/drivers/net/virtio_net/config
for example for vng build command like this one:
$ vng -v -b -f tools/testing/selftests/drivers/net/virtio_net/config

In that case, the needed kernel config options are not turned on.
Add the missed kernel config options.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240617072614.75fe79e7@kernel.org/
Reported-by: Matthieu Baerts <matttbe@kernel.org>
Closes: https://lore.kernel.org/netdev/1a63f209-b1d4-4809-bc30-295a5cafa296@kernel.org/
Fixes: ccfaed04db5e ("selftests: virtio_net: add initial tests")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240619061748.1869404-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bnxt_en-bug-fixes-for-net'

Michael Chan says:

====================
bnxt_en: Bug fixes for net

The first firmware interface update is needed by the second patch to
limit the number of TSO segments on the 5760X chips. The third patch
fixes the TX error path for PTP packets.
====================

Link: https://lore.kernel.org/r/20240618215313.29631-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Restore PTP tx_avail count in case of skb_pad() error

The current code only restores PTP tx_avail count when we get DMA
mapping errors. Fix it so that the PTP tx_avail count will be
restored for both DMA mapping errors and skb_pad() errors.
Otherwise PTP TX timestamp will not be available after a PTP
packet hits the skb_pad() error.

Fixes: 83bb623c968e ("bnxt_en: Transmit and retrieve packet timestamps")
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618215313.29631-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Set TSO max segs on devices with limits

Firmware will now advertise a non-zero TSO max segments if the
device has a limit. 0 means no limit. The latest 5760X chip
(early revs) has a limit of 2047 that cannot be exceeded. If
exceeded, the chip will send out just a small number of segments.

Call netif_set_tso_max_segs() if the device has a limit.

Fixes: 2012a6abc876 ("bnxt_en: Add 5760X (P7) PCI IDs")
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240618215313.29631-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Update firmware interface to 1.10.3.44

The relevant change is the max_tso_segs value returned by firmware
in the HWRM_FUNC_QCAPS response. This value will be used in the next
patch to cap the TSO segments.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20240618215313.29631-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

igb: Add MII write support

To facilitate running PHY parametric tests, add support for the SIOCSMIIREG
ioctl. This allows a userspace application to write to the PHY registers
to enable the test modes.

Signed-off-by: Jackie Jone <jackie.jone@alliedtelesis.co.nz>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240618213330.982046-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'add-flow-director-for-txgbe'

Jiawen Wu says:

====================
add flow director for txgbe

Add flow director support for Wangxun 10Gb NICs.

v2 -> v3: https://lore.kernel.org/all/20240605020852.24144-1-jiawenwu@trustnetic.com/
- Wrap the code at 80 chars where possible. (Jakub Kicinski)
- Add function description address on kernel-doc. (Jakub Kicinski)
- Correct return code. (Simon Horman)
- Remove redundant size check. (Hariprasad Kelam)

v1 -> v2: https://lore.kernel.org/all/20240529093821.27108-1-jiawenwu@trustnetic.com/
- Fix build warnings reported by kernel test robot.
====================

Link: https://lore.kernel.org/r/20240618101609.3580-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: add FDIR info to ethtool ops

Add flow director filter match and miss statistics to ethtool -S.
And change the number of queues when using flow director for ehtool -l.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: support Flow Director perfect filters

Support the addition and deletion of Flow Director filters.

Supported fields: src-ip, dst-ip, src-port, dst-port
Supported flow-types: tcp4, udp4, sctp4, ipv4

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: add FDIR ATR support

Add flow director ATR filter. ATR mode is enabled by default to filter
TCP packets.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: Assign configured channel value to EXTTS event

Assign the configured channel value to the EXTTS event in the timestamp
interrupt handler. Without assigning the correct channel, applications
like ts2phc will refuse to accept the event, resulting in errors such
as:
...
ts2phc[656.834]: config item end1.ts2phc.pin_index is 0
ts2phc[656.834]: config item end1.ts2phc.channel is 3
ts2phc[656.834]: config item end1.ts2phc.extts_polarity is 2
ts2phc[656.834]: config item end1.ts2phc.extts_correction is 0
...
ts2phc[656.862]: extts on unexpected channel
ts2phc[658.141]: extts on unexpected channel
ts2phc[659.140]: extts on unexpected channel

Fixes: f4da56529da60 ("net: stmmac: Add support for external trigger timestamping")
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Link: https://lore.kernel.org/r/20240618073821.619751-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-24-06-19' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 fixes the suspicious RCU usage warning that resulted from the
recent fix for the race between namespace cleanup and gc in
ipset left out checking the pernet exit phase when calling
rcu_dereference_protected(), from Jozsef Kadlecsik.

Patch #2 fixes incorrect input and output netdevice in SRv6 prerouting
hooks, from Jianguo Wu.

Patch #3 moves nf_hooks_lwtunnel sysctl toggle to the netfilter core.
The connection tracking system is loaded on-demand, this
ensures availability of this knob regardless.

Patch #4-#5 adds selftests for SRv6 netfilter hooks also from Jianguo Wu.

netfilter pull request 24-06-19

* tag 'nf-24-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  selftests: add selftest for the SRv6 End.DX6 behavior with netfilter
  selftests: add selftest for the SRv6 End.DX4 behavior with netfilter
  netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core
  seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors
  netfilter: ipset: Fix suspicious rcu_dereference_protected()
====================

Link: https://lore.kernel.org/r/20240619170537.2846-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: do not leave a dangling sk pointer, when socket creation fails

It is possible to trigger a use-after-free by:
  * attaching an fentry probe to __sock_release() and the probe calling the
    bpf_get_socket_cookie() helper
  * running traceroute -I 1.1.1.1 on a freshly booted VM

A KASAN enabled kernel will log something like below (decoded and stripped):
==================================================================
BUG: KASAN: slab-use-after-free in __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29)
Read of size 8 at addr ffff888007110dd8 by task traceroute/299

CPU: 2 PID: 299 Comm: traceroute Tainted: G            E      6.10.0-rc2+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:117 (discriminator 1))
print_report (mm/kasan/report.c:378 mm/kasan/report.c:488)
? __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29)
kasan_report (mm/kasan/report.c:603)
? __sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29)
kasan_check_range (mm/kasan/generic.c:183 mm/kasan/generic.c:189)
__sock_gen_cookie (./arch/x86/include/asm/atomic64_64.h:15 ./include/linux/atomic/atomic-arch-fallback.h:2583 ./include/linux/atomic/atomic-instrumented.h:1611 net/core/sock_diag.c:29)
bpf_get_socket_ptr_cookie (./arch/x86/include/asm/preempt.h:94 ./include/linux/sock_diag.h:42 net/core/filter.c:5094 net/core/filter.c:5092)
bpf_prog_875642cf11f1d139___sock_release+0x6e/0x8e
bpf_trampoline_6442506592+0x47/0xaf
__sock_release (net/socket.c:652)
__sock_create (net/socket.c:1601)
...
Allocated by task 299 on cpu 2 at 78.328492s:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (mm/kasan/common.c:68)
__kasan_slab_alloc (mm/kasan/common.c:312 mm/kasan/common.c:338)
kmem_cache_alloc_noprof (mm/slub.c:3941 mm/slub.c:4000 mm/slub.c:4007)
sk_prot_alloc (net/core/sock.c:2075)
sk_alloc (net/core/sock.c:2134)
inet_create (net/ipv4/af_inet.c:327 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1572)
__sys_socket (net/socket.c:1660 net/socket.c:1644 net/socket.c:1706)
__x64_sys_socket (net/socket.c:1718)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Freed by task 299 on cpu 2 at 78.328502s:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (mm/kasan/common.c:68)
kasan_save_free_info (mm/kasan/generic.c:582)
poison_slab_object (mm/kasan/common.c:242)
__kasan_slab_free (mm/kasan/common.c:256)
kmem_cache_free (mm/slub.c:4437 mm/slub.c:4511)
__sk_destruct (net/core/sock.c:2117 net/core/sock.c:2208)
inet_create (net/ipv4/af_inet.c:397 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1572)
__sys_socket (net/socket.c:1660 net/socket.c:1644 net/socket.c:1706)
__x64_sys_socket (net/socket.c:1718)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Fix this by clearing the struct socket reference in sk_common_release() to cover
all protocol families create functions, which may already attached the
reference to the sk object with sock_init_data().

Fixes: c5dbb89fc2ac ("bpf: Expose bpf_get_socket_cookie to tracing programs")
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/netdev/20240613194047.36478-1-kuniyu@amazon.com/T/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240617210205.67311-1-ignat@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ALSA: hda: cs35l56: Select SERIAL_MULTI_INSTANTIATE

The ACPI IDs used in the CS35L56 HDA drivers are all handled by the
serial multi-instantiate driver which starts multiple Linux device
instances from a single ACPI Device() node.

As serial multi-instantiate is not an optional part of the system add it
as a dependency in Kconfig so that it is not overlooked.

Signed-off-by: Simon Trimmer <simont@opensource.cirrus.com>
Link: https://lore.kernel.org/20240619161602.117452-1-simont@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'ionic-rework-fix-for-doorbell-miss'

Shannon Nelson says:

====================
ionic: rework fix for doorbell miss

A latency test in a scaled out setting (many VMs with many queues)
has uncovered an issue with our missed doorbell fix from
commit b69585bfcece ("ionic: missed doorbell workaround")

As a refresher, the Elba ASIC has an issue where once in a blue
moon it might miss/drop a queue doorbell notification from
the driver.  This can result in Tx timeouts and potential Rx
buffer misses.

The basic problem with the original solution is that
we're delaying things with a timer for every single queue,
periodically using mod_timer() to reset to reset the alarm, and
mod_timer() becomes a more and more expensive thing as there
are more and more VFs and queues each with their own timer.
A ping-pong latency test tends to exacerbate the effect such
that every napi is doing a mod_timer() in every cycle.

An alternative has been worked out to replace this using
periodic workqueue items outside the napi cycle to request a
napi_schedule driven by a single delayed-workqueue per device
rather than a timer for every queue.  Also, now that newer
firmware is actually reporting its ASIC type, we can restrict
this to the appropriate chip.

The testing scenario used 128 VFs in UP state, 16 queues per
VF, and latency tests were done using TCP_RR with adaptive
interrupt coalescing enabled, running on 1 VF.  We would see
99th percentile latencies of up to 900us range, with some max
fliers as much as 4ms.

With these fixes the 99th percentile latencies are typically well
under 50us with the occasional max under 500us.

v1:
https://lore.kernel.org/netdev/20240610230706.34883-1-shannon.nelson@amd.com/
====================

Link: https://lore.kernel.org/r/20240619003257.6138-1-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: Only run the doorbell workaround for certain asic_type

If the doorbell workaround isn't required for a certain
asic_type then there is no need to run the associated
code. Since newer FW versions are finally reporting their
asic_type we can use a flag to determine whether or not to
do the workaround.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-9-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: Use an u16 for rx_copybreak

We only support (u16)-1 size for rx_copybreak, so we can reduce the
field size and move a couple other fields around to save a little
space in the ionic_lif struct.

Before:
/* size: 17440, cachelines: 273, members: 56 */
/* sum members: 17403, holes: 9, sum holes: 37 */
/* last cacheline: 32 bytes */

After:
/* size: 17424, cachelines: 273, members: 56 */
/* sum members: 17401, holes: 7, sum holes: 23 */
/* last cacheline: 16 bytes */

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-8-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: check for queue deadline in doorbell_napi_work

Check the deadline against the last time run and only
schedule a new napi if we haven't been run recently.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-7-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: add per-queue napi_schedule for doorbell check

Add a work item for each queue that will be run on the queue's
preferred cpu and will schedule another napi. This napi is
run in case the device missed a doorbell and didn't process
a packet. This is a problem for the Elba asic that happens
very rarely.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-6-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: add work item for missed-doorbell check

Add the first queued work for checking on the missed doorbell.
This is a delayed work item that reschedules itself every cycle
starting at probe.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-5-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: add private workqueue per-device

Instead of using the system's default workqueue, add a private
workqueue for the device to use for its little jobs. This is
to better support the new work items we will be adding in the
next patches for PF and VF specific jobs, without inundating
the system workqueue in a couple of customer cases where our
devices get scaled out to 100-200 VFs.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-4-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: Keep interrupt affinity up to date

Currently the driver either sets the initial interrupt affinity for its
adminq and tx/rx queues on probe or resets it on various
down/up/reconfigure flows. If any user and/or user process
(i.e. irqbalance) changes IRQ affinity for any of the driver's interrupts
that will be reset to driver defaults whenever any down/up/reconfigure
operation happens. This is incorrect and is fixed by making 2 changes:

1. Allocate an array of cpumasks that's only allocated on probe and
destroyed on remove.
2. Update the cpumask(s) for interrupts that are in use by registering
for affinity notifiers.

Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ionic: remove missed doorbell per-queue timer

Remove the timer-per-queue mechanics from the missed doorbell
check in preparation for the new missed doorbell fix.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240619003257.6138-2-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mlxsw-use-page-pool-for-rx-buffers-allocation'

Petr Machata says:

====================
mlxsw: Use page pool for Rx buffers allocation

Amit Cohen writes:

After using NAPI to process events from hardware, the next step is to
use page pool for Rx buffers allocation, which is also enhances
performance.

To simplify this change, first use page pool to allocate one continuous
buffer for each packet, later memory consumption can be improved by using
fragmented buffers.

This set significantly enhances mlxsw driver performance, CPU can handle
about 370% of the packets per second it previously handled.

The next planned improvement is using XDP to optimize telemetry.

Patch set overview:
Patches #1-#2 are small preparations for page pool usage
Patch #3 initializes page pool, but do not use it
Patch #4 converts the driver to use page pool for buffers allocations
Patch #5 is an optimization for buffer access
Patch #6 cleans up an unused structure
Patch #7 uses napi_consume_skb() as part of Tx completion
====================

Link: https://lore.kernel.org/r/cover.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Use napi_consume_skb() to free SKB as part of Tx completion

Currently, as part of Tx completion, the driver calls dev_kfree_skb_any()
to free the SKB. For this flow, the correct function is napi_consume_skb().
This function and dev_consume_skb_any() were added to be used for consumed
SKBs, which were not dropped, so the skb:kfree_skb tracepoint is not
triggered, and we can get better diagnostics about dropped packets.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/a9f9f3dc884c0d1be4bd4c9d72030c88c7ac004f.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Do not store SKB for RDQ elements

The previous patch used page pool to allocate buffers for RDQ. With this
change, 'elem_info->u.rdq.skb' is not used anymore, as we do not allocate
SKB before getting the packet, we hold page pointer and build the SKB
around it once packet is received.

Remove the union and store SKB pointer for SDQ only.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/23a531008936dc9a1a298643fb1e4f9a7b8e6eb3.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Optimize data buffer access

Before accessing data buffer, call net_prefetch() to load it into the
cache. This change improves driver performance, CPU can handle about
7.1% more packets per second.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1fa07c510890866a6f201163ab7e78890ba28b3b.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Use page pool for Rx buffers allocation

As part of driver init, all Rx queues are filled with buffers for
hardware usage. Later, when a packet is received, a new buffer should be
allocated to be used by hardware instead of the received buffer.
Packet's processing time includes allocation time, which can be improved
using page pool.

Using page pool, DMA mapping is done only for first allocation of buffers.
As subsequent buffers allocation avoid DMA mapping, it results in
performance improvement. The purpose of page pool is to allocate pages fast
from cache without locking. This lockless guarantee naturally comes from
running under a NAPI.

Use page pool to allocate the data buffer only, so hardware will use it to
fill the packet. At completion time, attach the data buffer (now filled
with packet payload) to new SKB which is allocated around the received
buffer. SKB building at completion time prevents cache miss for each
packet, as now the SKB is allocated right before packets will be handled by
networking stack.

Page pool for each Rx queue enhances Rx side performance by reclaiming
buffers back to each queue specific pool. This change significantly
improves driver performance, CPU can handle about 345% of the packets per
second it previously handled.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/1cf788a8f43c70aae6d526018ef77becb27ad6d3.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Initialize page pool per CQ

Next patch will use page pool to allocate buffers for RDQ. Initialize
page pool for each CQ, which is mapped 1:1 to RDQ. Page pool for each Rx
queue enhances Rx side performance by reclaiming buffers back to each queue
specific pool.

When only one NAPI instance is the consumer of pages from page pool, it is
recommended to pass it as part of 'page_pool_params', then page pool APIs
will be done without special locks. mlxsw driver holds NAPI instance per
CQ, so add page pool per CQ and use the existing NAPI instance.

For now, pages are not allocated from the pool, next patch will use it.

Some notes regarding 'page_pool_params':
* Use PP_FLAG_DMA_MAP to allow page pool handles DMA mapping, for now
  do not use sync flag, as only the device writes to this memory and we
  read it only when it finishes writing there. This will probably be
  changed when we will support XDP.
* Define 'order' according to maximum MTU and take into account software
  overhead. Some round up are done, which means that we allocate more pages
  than we really need. This can be improved later by using fragmented
  buffers.
* Use pool_size = MLXSW_PCI_WQE_COUNT. This will be the size of 'ptr_ring',
  and should be the maximum amount of packets that page pool will allocate
  memory for. In our case, this is the queue size, defined as
  MLXSW_PCI_WQE_COUNT.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/02e5856ae7c572d4293ce6bb92c286ee6cfec800.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Store CQ pointer as part of RDQ structure

Next patches will add support for page pool in mlxsw driver. Page pool will
be used to allocate buffers for RDQ and will use NAPI instance of the
appropriate CQ (RDQ is mapped 1:1 to CQ).

To allow pool initialization as part of CQ init, when NAPI is initialized,
page_pool structure will be as part of CQ structure. Later, the allocations
for RDQ will be done from the pool in the appropriate CQ. To allow access
to the appropriate pool, set CQ pointer as part of RDQ initialization.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/d60918ca1e142a554af1df9c1152cdac83854a3b.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mlxsw: pci: Split NAPI setup/teardown into two steps

mlxsw_pci_cq_napi_setup() includes both NAPI initialization and
enablement, similar to teardown function. Next patches will add support
for page pool in mlxsw driver, then we use NAPI instance for page pool.

Page pool initialization should be done before NAPI enablement, same for
page pool destruction which should be done after NAPI disablement.

As preparation, split NAPI setup/teardown into two steps, then page pool
setup will be done between the phases.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/8dbf37e859f07247498fca17109b8858ff2b0498.1718709196.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: hsr: cosmetic: Remove extra white space

This change just removes extra (i.e. not needed) white space in
prp_drop_frame() function.

No functional changes.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240618125817.1111070-1-lukma@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/tcp_ao: Don't leak ao_info on error-path

It seems I introduced it together with TCP_AO_CMDF_AO_REQUIRED, on
version 5 [1] of TCP-AO patches. Quite frustrative that having all these
selftests that I've written, running kmemtest & kcov was always in todo.

[1]: https://lore.kernel.org/netdev/20230215183335.800122-5-dima@arista.com/

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240617072451.1403e1d2@kernel.org/
Fixes: 0aadc73995d0 ("net/tcp: Prevent TCP-MD5 with TCP-AO being set")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240619-tcp-ao-required-leak-v1-1-6408f3c94247@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

virtio_net: add support for Byte Queue Limits

Add support for Byte Queue Limits (BQL).

Tested on qemu emulated virtio_net device with 1, 2 and 4 queues.
Tested with fq_codel and pfifo_fast. Super netperf with 50 threads is
running in background. Netperf TCP_RR results:

NOBQL FQC 1q:  159.56  159.33  158.50  154.31    agv: 157.925
NOBQL FQC 2q:  184.64  184.96  174.73  174.15    agv: 179.62
NOBQL FQC 4q:  994.46  441.96  416.50  499.56    agv: 588.12
NOBQL PFF 1q:  148.68  148.92  145.95  149.48    agv: 148.2575
NOBQL PFF 2q:  171.86  171.20  170.42  169.42    agv: 170.725
NOBQL PFF 4q: 1505.23 1137.23 2488.70 3507.99    agv: 2159.7875
  BQL FQC 1q: 1332.80 1297.97 1351.41 1147.57    agv: 1282.4375
  BQL FQC 2q:  768.30  817.72  864.43  974.40    agv: 856.2125
  BQL FQC 4q:  945.66  942.68  878.51  822.82    agv: 897.4175
  BQL PFF 1q:  149.69  151.49  149.40  147.47    agv: 149.5125
  BQL PFF 2q: 2059.32  798.74 1844.12  381.80    agv: 1270.995
  BQL PFF 4q: 1871.98 4420.02 4916.59 13268.16   agv: 6119.1875

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240618144456.1688998-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: smc9194: add missing MODULE_DESCRIPTION() macro

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/smsc/smc9194.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-smsc-v1-1-ad3d7200421e@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: mac89x0: add missing MODULE_DESCRIPTION() macro

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/cirrus/mac89x0.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-cirrus-v1-1-07f5bd0b64cb@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: amd: add missing MODULE_DESCRIPTION() macros

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/a2065.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/ariadne.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/atarilance.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/hplance.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/7990.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/mvme147.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/amd/sun3lance.o

Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().

This includes drivers/net/ethernet/amd/lance.c which, although it did
not produce a warning with the m68k allmodconfig configuration, may
cause this warning with other configurations.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-ethernet-amd-v1-1-50ee7a9ad50e@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: arcnet: com20020-isa: add missing MODULE_DESCRIPTION() macro

With ARCH=m68k, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020-isa.o

Add the missing invocation of the MODULE_DESCRIPTION() macro.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240618-md-m68k-drivers-net-arcnet-v1-1-90e42bc58102@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: Fix VSI list rule with ICE_SW_LKUP_LAST type

Adding/updating VSI list rule, as well as allocating/freeing VSI list
resource are called several times with type ICE_SW_LKUP_LAST, which fails
because ice_update_vsi_list_rule() and ice_aq_alloc_free_vsi_list()
consider it invalid. Allow calling these functions with ICE_SW_LKUP_LAST.

This fixes at least one issue in switchdev mode, where the same rule with
different action cannot be added, e.g.:

  tc filter add dev $PF1 ingress protocol arp prio 0 flower skip_sw \
    dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev $VF1_PR
  tc filter add dev $PF1 ingress protocol arp prio 0 flower skip_sw \
    dst_mac ff:ff:ff:ff:ff:ff action mirred egress redirect dev $VF2_PR

Fixes: 0f94570d0cae ("ice: allow adding advanced rules")
Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240618210206.981885-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv6: bring NLM_DONE out to a separate recv() again

Commit under Fixes optimized the number of recv() calls
needed during RTM_GETROUTE dumps, but we got multiple
reports of applications hanging on recv() calls.
Applications expect that a route dump will be terminated
with a recv() reading an individual NLM_DONE message.

Coalescing NLM_DONE is perfectly legal in netlink,
but even tho reporters fixed the code in respective
projects, chances are it will take time for those
applications to get updated. So revert to old behavior
(for now)?

This is an IPv6 version of commit 460b0d33cf10 ("inet: bring
NLM_DONE out to a separate recv() again").

Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Link: https://lore.kernel.org/all/CANP3RGc1RG71oPEBXNx_WZFP9AyphJefdO4paczN92n__ds4ow@mail.gmail.com
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org
Fixes: 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()")
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/r/20240618193914.561782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'probes-fixes-v6.10-rc4' of git://git./linux/kernel/git/trace/linux-trace

Pull probes fix from Masami Hiramatsu:

- Restrict gen-API tests for synthetic and kprobe events to only be
   built as modules, as they generate dynamic events that cannot be
   removed, causing ftracetest and startup selftests to fail

* tag 'probes-fixes-v6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Build event generation tests only as modules

Merge tag 'mips-fixes_6.10_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

- fix for BCM6538 boards

- fix RB532 PCI workaround

* tag 'mips-fixes_6.10_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  Revert "MIPS: pci: lantiq: restore reset gpio polarity"
  mips: bmips: BCM6358: make sure CBR is correctly set
  MIPS: pci: lantiq: restore reset gpio polarity
  MIPS: Routerboard 532: Fix vendor retry check code

selftests: add selftest for the SRv6 End.DX6 behavior with netfilter

this selftest is designed for evaluating the SRv6 End.DX6 behavior
used with netfilter(rpfilter), in this example, for implementing
IPv6 L3 VPN use cases.

Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

selftests: add selftest for the SRv6 End.DX4 behavior with netfilter

this selftest is designed for evaluating the SRv6 End.DX4 behavior
used with netfilter(rpfilter), in this example, for implementing
IPv4 L3 VPN use cases.

Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: move the sysctl nf_hooks_lwtunnel into the netfilter core

Currently, the sysctl net.netfilter.nf_hooks_lwtunnel depends on the
nf_conntrack module, but the nf_conntrack module is not always loaded.
Therefore, accessing net.netfilter.nf_hooks_lwtunnel may have an error.

Move sysctl nf_hooks_lwtunnel into the netfilter core.

Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

seg6: fix parameter passing when calling NF_HOOK() in End.DX4 and End.DX6 behaviors

input_action_end_dx4() and input_action_end_dx6() are called NF_HOOK() for
PREROUTING hook, in PREROUTING hook, we should passing a valid indev,
and a NULL outdev to NF_HOOK(), otherwise may trigger a NULL pointer
dereference, as below:

    [74830.647293] BUG: kernel NULL pointer dereference, address: 0000000000000090
    [74830.655633] #PF: supervisor read access in kernel mode
    [74830.657888] #PF: error_code(0x0000) - not-present page
    [74830.659500] PGD 0 P4D 0
    [74830.660450] Oops: 0000 [#1] PREEMPT SMP PTI
    ...
    [74830.664953] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    [74830.666569] RIP: 0010:rpfilter_mt+0x44/0x15e [ipt_rpfilter]
    ...
    [74830.689725] Call Trace:
    [74830.690402]  <IRQ>
    [74830.690953]  ? show_trace_log_lvl+0x1c4/0x2df
    [74830.692020]  ? show_trace_log_lvl+0x1c4/0x2df
    [74830.693095]  ? ipt_do_table+0x286/0x710 [ip_tables]
    [74830.694275]  ? __die_body.cold+0x8/0xd
    [74830.695205]  ? page_fault_oops+0xac/0x140
    [74830.696244]  ? exc_page_fault+0x62/0x150
    [74830.697225]  ? asm_exc_page_fault+0x22/0x30
    [74830.698344]  ? rpfilter_mt+0x44/0x15e [ipt_rpfilter]
    [74830.699540]  ipt_do_table+0x286/0x710 [ip_tables]
    [74830.700758]  ? ip6_route_input+0x19d/0x240
    [74830.701752]  nf_hook_slow+0x3f/0xb0
    [74830.702678]  input_action_end_dx4+0x19b/0x1e0
    [74830.703735]  ? input_action_end_t+0xe0/0xe0
    [74830.704734]  seg6_local_input_core+0x2d/0x60
    [74830.705782]  lwtunnel_input+0x5b/0xb0
    [74830.706690]  __netif_receive_skb_one_core+0x63/0xa0
    [74830.707825]  process_backlog+0x99/0x140
    [74830.709538]  __napi_poll+0x2c/0x160
    [74830.710673]  net_rx_action+0x296/0x350
    [74830.711860]  __do_softirq+0xcb/0x2ac
    [74830.713049]  do_softirq+0x63/0x90

input_action_end_dx4() passing a NULL indev to NF_HOOK(), and finally
trigger a NULL dereference in rpfilter_mt()->rpfilter_is_loopback():

    static bool
    rpfilter_is_loopback(const struct sk_buff *skb,
                  const struct net_device *in)
    {
            // in is NULL
            return skb->pkt_type == PACKET_LOOPBACK ||
           in->flags & IFF_LOOPBACK;
    }

Fixes: 7a3f5b0de364 ("netfilter: add netfilter hooks to SRv6 data plane")
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: Fix suspicious rcu_dereference_protected()

When destroying all sets, we are either in pernet exit phase or
are executing a "destroy all sets command" from userspace. The latter
was taken into account in ip_set_dereference() (nfnetlink mutex is held),
but the former was not. The patch adds the required check to
rcu_dereference_protected() in ip_set_dereference().

Fixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type")
Reported-by: syzbot+b62c37cdd58103293a5a@syzkaller.appspotmail.com
Reported-by: syzbot+cfbe1da5fdfc39efc293@syzkaller.appspotmail.com
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

selftests: openvswitch: Set value to nla flags.

Netlink flags, although they don't have payload at the netlink level,
are represented as having "True" as value in pyroute2.

Without it, trying to add a flow with a flag-type action (e.g: pop_vlan)
fails with the following traceback:

Traceback (most recent call last):
  File "[...]/ovs-dpctl.py", line 2498, in <module>
    sys.exit(main(sys.argv))
             ^^^^^^^^^^^^^^
  File "[...]/ovs-dpctl.py", line 2487, in main
    ovsflow.add_flow(rep["dpifindex"], flow)
  File "[...]/ovs-dpctl.py", line 2136, in add_flow
    reply = self.nlm_request(
            ^^^^^^^^^^^^^^^^^
  File "[...]/pyroute2/netlink/nlsocket.py", line 822, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/pyroute2/netlink/generic/__init__.py", line 126, in
nlm_request
    return tuple(super().nlm_request(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/pyroute2/netlink/nlsocket.py", line 1124, in nlm_request
    self.put(msg, msg_type, msg_flags, msg_seq=msg_seq)
  File "[...]/pyroute2/netlink/nlsocket.py", line 389, in put
    self.sendto_gate(msg, addr)
  File "[...]/pyroute2/netlink/nlsocket.py", line 1056, in sendto_gate
    msg.encode()
  File "[...]/pyroute2/netlink/__init__.py", line 1245, in encode
    offset = self.encode_nlas(offset)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "[...]/pyroute2/netlink/__init__.py", line 1560, in encode_nlas
    nla_instance.setvalue(cell[1])
  File "[...]/pyroute2/netlink/__init__.py", line 1265, in setvalue
    nlv.setvalue(nla_tuple[1])
                 ~~~~~~~~~^^^
IndexError: list index out of range

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mt7530: add support for bridge port isolation

Remove a pair of ports from the port matrix when both ports have the
isolated flag set.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mt7530: factor out bridge join/leave logic

As preparation for implementing bridge port isolation, move the logic to
add and remove bits in the port matrix into a new helper
mt7530_update_port_member(), which is called from
mt7530_port_bridge_join() and mt7530_port_bridge_leave().

Another part of the preparation is using dsa_port_offloads_bridge_dev()
instead of dsa_port_offloads_bridge() to check for bridge membership, as
we don't have a struct dsa_bridge in mt7530_port_bridge_flags().

The port matrix setting is slightly streamlined, now always first setting
the mt7530_port's pm field and then writing the port matrix from that
field into the hardware register, instead of duplicating the bit
manipulation for both the struct field and the register.

mt7530_port_bridge_join() was previously using |= to update the port
matrix with the port bitmap, which was unnecessary, as pm would only
have the CPU port set before joining a bridge; a simple assignment can
be used for both joining and leaving (and will also work when individual
bits are added/removed in port_bitmap with regard to the previous port
matrix, which is what happens with port isolation).

No functional change intended.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Fix linking objects into multiple modules

This patch fixes the below build warning messages that are
caused due to linking same files to multiple modules by
exporting the required symbols.

"scripts/Makefile.build:244: drivers/net/ethernet/marvell/octeontx2/nic/Makefile:
otx2_devlink.o is added to multiple modules: rvu_nicpf rvu_nicvf

scripts/Makefile.build:244: drivers/net/ethernet/marvell/octeontx2/nic/Makefile:
otx2_dcbnl.o is added to multiple modules: rvu_nicpf rvu_nicvf"

Fixes: 8e67558177f8 ("octeontx2-pf: PFC config support with DCBx").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-drop-rx-socket-tracepoint'

Yan Zhai says:

====================
net: pass receive socket to drop tracepoint

We set up our production packet drop monitoring around the kfree_skb
tracepoint. While this tracepoint is extremely valuable for diagnosing
critical problems, it also has some limitation with drops on the local
receive path: this tracepoint can only inspect the dropped skb itself,
but such skb might not carry enough information to:

1. determine in which netns/container this skb gets dropped
2. determine by which socket/service this skb oughts to be received

The 1st issue is because skb->dev is the only member field with valid
netns reference. But skb->dev can get cleared or reused. For example,
tcp_v4_rcv will clear skb->dev and in later processing it might be reused
for OFO tree.

The 2nd issue is because there is no reference on an skb that reliably
points to a receiving socket. skb->sk usually points to the local
sending socket, and it only points to a receive socket briefly after
early demux stage, yet the socket can get stolen later. For certain drop
reason like TCP OFO_MERGE, Zerowindow, UDP at PROTO_MEM error, etc, it
is hard to infer which receiving socket is impacted. This cannot be
overcome by simply looking at the packet header, because of
complications like sk lookup programs. In the past, single purpose
tracepoints like trace_udp_fail_queue_rcv_skb, trace_sock_rcvqueue_full,
etc are added as needed to provide more visibility. This could be
handled in a more generic way.

In this change set we propose a new 'sk_skb_reason_drop' call as a drop-in
replacement for kfree_skb_reason at various local input path. It accepts
an extra receiving socket argument. Both issues above can be resolved
via this new argument.

V4->V5: rename rx_skaddr to rx_sk to be more clear visually, suggested
by Jesper Dangaard Brouer.

V3->V4: adjusted the TP_STRUCT field order to align better, suggested by
Steven Rostedt.

V2->V3: fixed drop_monitor function signatures; fixed a few uninitialized sks;
Added a few missing report tags from test bots (also noticed by Dan
Carpenter and Simon Horman).

V1->V2: instead of using skb->cb, directly add the needed argument to
trace_kfree_skb tracepoint. Also renamed functions as Eric Dumazet
suggested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011859.Aacus8GV-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011751.NpVN0sSk-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406011539.jhwBd7DX-lkp@intel.com/
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: raw: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ping: use sk_skb_reason_drop to free rx packets

Replace kfree_skb_reason with sk_skb_reason_drop and pass the receiving
socket to the tracepoint.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce sk_skb_reason_drop function

Long used destructors kfree_skb and kfree_skb_reason do not pass
receiving socket to packet drop tracepoints trace_kfree_skb.
This makes it hard to track packet drops of a certain netns (container)
or a socket (user application).

The naming of these destructors are also not consistent with most sk/skb
operating functions, i.e. functions named "sk_xxx" or "skb_xxx".
Introduce a new functions sk_skb_reason_drop as drop-in replacement for
kfree_skb_reason on local receiving path. Callers can now pass receiving
sockets to the tracepoints.

kfree_skb and kfree_skb_reason are still usable but they are now just
inline helpers that call sk_skb_reason_drop.

Note it is not feasible to do the same to consume_skb. Packets not
dropped can flow through multiple receive handlers, and have multiple
receiving sockets. Leave it untouched for now.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add rx_sk to trace_kfree_skb

skb does not include enough information to find out receiving
sockets/services and netns/containers on packet drops. In theory
skb->dev tells about netns, but it can get cleared/reused, e.g. by TCP
stack for OOO packet lookup. Similarly, skb->sk often identifies a local
sender, and tells nothing about a receiver.

Allow passing an extra receiving socket to the tracepoint to improve
the visibility on receiving drops.

Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-pf: Add error handling to VLAN unoffload handling

otx2_sq_append_skb makes used of __vlan_hwaccel_push_inside()
to unoffload VLANs - push them from skb meta data into skb data.
However, it omitts a check for __vlan_hwaccel_push_inside()
returning NULL.

Found by inspection based on [1] and [2].
Compile tested only.

[1] Re: [PATCH net-next v1] net: stmmac: Enable TSO on VLANs
https://lore.kernel.org/all/ZmrN2W8Fye450TKs@shell.armlinux.org.uk/
[2] Re: [PATCH net-next v2] net: stmmac: Enable TSO on VLANs
https://lore.kernel.org/all/CANn89i+11L5=tKsa7V7Aeyxaj6nYGRwy35PAbCRYJ73G+b25sg@mail.gmail.com/

Fixes: fd9d7859db6c ("octeontx2-pf: Implement ingress/egress VLAN offload")
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'am65x-ptp'

Diogo Ivo says

====================
Enable PTP timestamping/PPS for AM65x SR1.0 devices

This patch series enables support for PTP in AM65x SR1.0 devices.

This feature relies heavily on the Industrial Ethernet Peripheral
(IEP) hardware module, which implements a hardware counter through
which time is kept. This hardware block is the basis for exposing
a PTP hardware clock to userspace and for issuing timestamps for
incoming/outgoing packets, allowing for time synchronization.

The IEP also has compare registers that fire an interrupt when the
counter reaches the value stored in a compare register. This feature
allows us to support PPS events in the kernel.

The changes are separated into five patches:
- PATCH 01/05: Register SR1.0 devices with the IEP infrastructure to
expose a PHC clock to userspace, allowing time to be
adjusted using standard PTP tools. The code for issuing/
collecting packet timestamps is already present in the
current state of the driver, so only this needs to be
done.
- PATCH 02/05: Remove unnecessary spinlock synchronization.
- PATCH 03/05: Document IEP interrupt in DT binding.
- PATCH 04/05: Add support for IEP compare event/interrupt handling
to enable PPS events.
- PATCH 05/05: Add the interrupts to the IOT2050 device tree.

Currently every compare event generates two interrupts, the first
corresponding to the actual event and the second being a spurious
but otherwise harmless interrupt. The root cause of this has been
identified and has been solved in the platform's SDK. A forward port
of the SDK's patches also fixes the problem in upstream but is not
included here since it's upstreaming is out of the scope of this
series. If someone from TI would be willing to chime in and help
get the interrupt changes upstream that would be great!

Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
---
Changes in v4:
- Remove unused 'flags' variables in patch 02/05
- Add patch 03/05 describing IEP interrupt in DT binding
- Link to v3: https://lore.kernel.org/r/20240607-iep-v3-0-4824224105bc@siemens.com

Changes in v3:
- Collect Reviewed-by tags
- Add patch 02/04 removing spinlocks from IEP driver
- Use mutex-based synchronization when accessing HW registers
- Link to v2: https://lore.kernel.org/r/20240604-iep-v2-0-ea8e1c0a5686@siemens.com

Changes in v2:
- Collect Reviewed-by tags
- PATCH 01/03: Limit line length to 80 characters
- PATCH 02/03: Proceed with limited functionality if getting IRQ fails,
limit line length to 80 characters
- Link to v1: https://lore.kernel.org/r/20240529-iep-v1-0-7273c07592d3@siemens.com
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

arm64: dts: ti: iot2050: Add IEP interrupts for SR1.0 devices

Add the interrupts needed for PTP Hardware Clock support via IEP
in SR1.0 devices.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>