git.kernel.dk Git - linux-2.6-block.git/log

net: dsa: microchip: lan9371/2: add 100BaseTX PHY support

On the LAN9371 and LAN9372, the 4th internal PHY is a 100BaseTX PHY
instead of a 100BaseT1 PHY. The 100BaseTX PHYs have a different base
register offset.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'device-memory-tcp'

Prep patches for Device Memory TCP

Pick up a couple of prep patches for Device Memory TCP which
stand on their own.

Link: https://patch.msgid.link/20240628003253.1694510-1-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: net: package libynl for use in selftests

Support building the C YNL userspace library into one big static file.
We can then link selftests against it for easy to use C netlink
interface.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240628003253.1694510-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

page_pool: convert to use netmem

Abstract the memory type from the page_pool so we can later add support
for new memory types. Convert the page_pool to use the new netmem type
abstraction, rather than use struct page directly.

As of this patch the netmem type is a no-op abstraction: it's always a
struct page underneath. All the page pool internals are converted to
use struct netmem instead of struct page, and the page pool now exports
2 APIs:

1. The existing struct page API.
2. The new struct netmem API.

Keeping the existing API is transitional; we do not want to refactor all
the current drivers using the page pool at once.

The netmem abstraction is currently a no-op. The page_pool uses
page_to_netmem() to convert allocated pages to netmem, and uses
netmem_to_page() to convert the netmem back to pages to pass to mm APIs,

Follow up patches to this series add non-paged netmem support to the
page_pool. This change is factored out on its own to limit the code
churn to this 1 patch, for ease of code review.

Signed-off-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/20240628003253.1694510-6-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fixes-for-stm32-dwmac-driver-fails-to-probe'

Christophe Roullier says:

====================
Fixes for stm32-dwmac driver fails to probe

Mark Brown found issue during stm32-dwmac probe:

For the past few days networking has been broken on the Avenger 96, a
stm32mp157a based platform.  The stm32-dwmac driver fails to probe:

<6>[    1.894271] stm32-dwmac 5800a000.ethernet: IRQ eth_wake_irq not found
<6>[    1.899694] stm32-dwmac 5800a000.ethernet: IRQ eth_lpi not found
<6>[    1.905849] stm32-dwmac 5800a000.ethernet: IRQ sfty not found
<3>[    1.912304] stm32-dwmac 5800a000.ethernet: Unable to parse OF data
<3>[    1.918393] stm32-dwmac 5800a000.ethernet: probe with driver stm32-dwmac failed with error -75

which looks a bit odd given the commit contents but I didn't look at the
driver code at all.

Full boot log here:

   https://lava.sirena.org.uk/scheduler/job/467150

A working equivalent is here:

   https://lava.sirena.org.uk/scheduler/job/466518

I delivered 2 fixes to solve issue.
====================

Link: https://patch.msgid.link/20240701064838.425521-1-christophe.roullier@foss.st.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: dwmac-stm32: update err status in case different of stm32mp13

The mask parameter of syscfg property is mandatory for MP13 but
optional for all other cases.
The function should not return error code because for non-MP13
the missing syscfg phandle in DT is not considered an error.
So reset err to 0 in that case to support existing DTs without
syscfg phandle.

Fixes: 50bbc0393114 ("net: stmmac: dwmac-stm32: add management of stm32mp13 for stm32")
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: dwmac-stm32: Add test to verify if ETHCK is used before checking clk rate

When we want to use clock from RCC to clock Ethernet PHY (with ETHCK)
we need to check if value of clock rate is authorized.
If ETHCK is unused, the ETHCK frequency is 0Hz and validation fails.
It makes no sense to validate unused ETHCK, so skip the validation.

Fixes: 582ac134963e ("net: stmmac: dwmac-stm32: Separate out external clock rate validation")
Signed-off-by: Christophe Roullier <christophe.roullier@foss.st.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: dwmac: Validate PBL for all IP-cores

Indeed the maximum DMA burst length can be programmed not only for DW
xGMACs, Allwinner EMACs and Spear SoC GMAC, but in accordance with
[1, 2, 3] for Generic DW *MAC IP-cores. Moreover the STMMAC driver parses
the property and then apply the configuration for all supported DW MAC
devices. All of that makes the property being available for all IP-cores
the bindings supports. Let's make sure the PBL-related properties are
validated for all of them by the common DW *MAC DT schema.

[1] DesignWare Cores Ethernet MAC Universal Databook, Revision 3.73a,
    October 2013, p.378.

[2] DesignWare Cores Ethernet Quality-of-Service Databook, Revision 5.10a,
    December 2017, p.1223.

[3] DesignWare Cores XGMAC - 10G Ethernet MAC Databook, Revision 2.11a,
    September 2015, p.469-473.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240628154515.8783-1-fancer.lancer@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-bpf_net_context-cleanups'

Sebastian Andrzej Siewior says:

====================
net: bpf_net_context cleanups.

a small series with bpf_net_context cleanups/ improvements.
Jakub asked for #1 and #2 and while looking around I made #3.
====================

Link: https://patch.msgid.link/20240628103020.1766241-1-bigeasy@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Move flush list retrieval to where it is used.

The bpf_net_ctx_get_.*_flush_list() are used at the top of the function.
This means the variable is always assigned even if unused. By moving the
function to where it is used, it is possible to delay the initialisation
until it is unavoidable.
Not sure how much this gains in reality but by looking at bq_enqueue()
(in devmap.c) gcc pushes one register less to the stack. \o/.

Move flush list retrieval to where it is used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Optimize xdp_do_flush() with bpf_net_context infos.

Every NIC driver utilizing XDP should invoke xdp_do_flush() after
processing all packages. With the introduction of the bpf_net_context
logic the flush lists (for dev, CPU-map and xsk) are lazy initialized
only if used. However xdp_do_flush() tries to flush all three of them so
all three lists are always initialized and the likely empty lists are
"iterated".
Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also
initialized due to xdp_do_check_flushed().

Jakub suggest to utilize the hints in bpf_net_context and avoid invoking
the flush function. This will also avoiding initializing the lists which
are otherwise unused.

Introduce bpf_net_ctx_get_all_used_flush_lists() to return the
individual list if not-empty. Use the logic in xdp_do_flush() and
xdp_do_check_flushed(). Remove the not needed .*_check_flush().

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: Remove task_struct::bpf_net_context init on fork.

There is no clone() invocation within a bpf_net_ctx_…() block. Therefore
the task_struct::bpf_net_context has always to be NULL and an explicit
initialisation is not required.

Remove the NULL assignment in the clone() path.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'page_pool-bnxt_en-unlink-old-page-pool-in-queue-api-using-helper'

David Wei says:

====================
page_pool: bnxt_en: unlink old page pool in queue api using helper

56ef27e3 unexported page_pool_unlink_napi() and renamed it to
page_pool_disable_direct_recycling(). This is because there was no
in-tree user of page_pool_unlink_napi().

Since then Rx queue API and an implementation in bnxt got merged. In the
bnxt implementation, it broadly follows the following steps: allocate
new queue memory + page pool, stop old rx queue, swap, then destroy old
queue memory + page pool.

The existing NAPI instance is re-used so when the old page pool that is
no longer used but still linked to this shared NAPI instance is
destroyed, it will trigger warnings.

In my initial patches I unlinked a page pool from a NAPI instance
directly. Instead, export page_pool_disable_direct_recycling() and call
that instead to avoid having a driver touch a core struct.
====================

Link: https://patch.msgid.link/20240627030200.3647145-1-dw@davidwei.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bnxt_en: unlink page pool when stopping Rx queue

Have bnxt call page_pool_disable_direct_recycling() to unlink the old
page pool when resetting a queue prior to destroying it, instead of
touching a netdev core struct directly.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

page_pool: export page_pool_disable_direct_recycling()

56ef27e3 unexported page_pool_unlink_napi() and renamed it to
page_pool_disable_direct_recycling(). This is because there was no
in-tree user of page_pool_unlink_napi().

Since then Rx queue API and an implementation in bnxt got merged. In the
bnxt implementation, it broadly follows the following steps: allocate
new queue memory + page pool, stop old rx queue, swap, then destroy old
queue memory + page pool.

The existing NAPI instance is re-used so when the old page pool that is
no longer used but still linked to this shared NAPI instance is
destroyed, it will trigger warnings.

In my initial patches I unlinked a page pool from a NAPI instance
directly. Instead, export page_pool_disable_direct_recycling() and call
that instead to avoid having a driver touch a core struct.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'zerocopy-tx-cleanups'

Pavel Begunkov says:

====================
zerocopy tx cleanups

Assorted zerocopy send path cleanups, the main part of which is
moving some net stack specific accounting out of io_uring back
to net/ in Patch 4.
====================

Link: https://patch.msgid.link/cover.1719190216.git.asml.silence@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: limit scope of a skb_zerocopy_iter_stream var

skb_zerocopy_iter_stream() only uses @orig_uarg in the !link_skb path,
and we can move the local variable in the appropriate block.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

io_uring/net: move charging socket out of zc io_uring

Currently, io_uring's io_sg_from_iter() duplicates the part of
__zerocopy_sg_from_iter() charging pages to the socket. It'd be too easy
to miss while changing it in net/, the chunk is not the most
straightforward for outside users and full of internal implementation
details. io_uring is not a good place to keep it, deduplicate it by
moving out of the callback into __zerocopy_sg_from_iter().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: batch zerocopy_fill_skb_from_iter accounting

Instead of accounting every page range against the socket separately, do
it in batch based on the change in skb->truesize. It's also moved into
__zerocopy_sg_from_iter(), so that zerocopy_fill_skb_from_iter() is
simpler and responsible for setting frags but not the accounting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: split __zerocopy_sg_from_iter()

Split a function out of __zerocopy_sg_from_iter() that only cares about
the traditional path with refcounted pages and doesn't need to know
about ->sg_from_iter. A preparation patch, we'll improve on the function
later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: always try to set ubuf in skb_zerocopy_iter_stream

skb_zcopy_set() does nothing if there is already a ubuf_info associated
with an skb, and since ->link_skb should have set it several lines above
the check here essentially does nothing and can be removed. It's also
safer this way, because even if the callback is faulty we'll
have it set.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: phy: fix potential use of NULL pointer in phy_suspend()

phy_suspend() checks the WoL status, and then dereferences
phydrv->flags if (and only if) we decided that WoL has been enabled
on either the PHY or the netdev.

We then check whether phydrv was NULL, but we've potentially already
dereferenced the pointer.

If phydrv is NULL, then phy_ethtool_get_wol() will return an error
and leave wol.wolopts set to zero. However, if netdev->wol_enabled
is true, then we would dereference a NULL pointer.

Checking the PHY drivers, the only place that phydev->wol_enabled is
checked by them is in their suspend/resume callbacks and nowhere else
(which is correct, because phylib only updates this in phy_suspend()).

So, move the NULL pointer check earlier to avoid a NULL pointer
dereference. Leave the check for phydrv->suspend in place as a driver
may populate the .resume method but not the .suspend method.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/E1sN8tn-00GDCZ-Jj@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'linux-can-next-for-6.11-20240629' of git://git./linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2024-06-29

Geert Uytterhoeven contributes 3 patches with small improvements and
cleanups for the rcar_canfd driver.

A patch by Christophe JAILLET constifies the struct m_can_ops in the
m_can driver to reduce the code size.

The last 9 patches are by me an work around erratum DS80000789E 6 of
mcp2518fd.

* tag 'linux-can-next-for-6.11-20240629' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum
  can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd
  can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum
  can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function
  can: mcp251xfd: clarify the meaning of timestamp
  can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()
  can: mcp251xfd: update errata references
  can: mcp251xfd: properly indent labels
  can: gs_usb: add VID/PID for Xylanta SAINT3 product family
  can: m_can: Constify struct m_can_ops
  can: rcar_canfd: Remove superfluous parentheses in address calculations
  can: rcar_canfd: Improve printing of global operational state
  can: rcar_canfd: Simplify clock handling
====================

Link: https://patch.msgid.link/20240629114017.1080160-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: Fix the panic caused by dev being null when dumping coalesce

syzbot reported a general protection fault caused by a null pointer
dereference in coalesce_fill_reply(). The issue occurs when req_base->dev
is null, leading to an invalid memory access.

This panic occurs if dumping coalesce when no device name is specified.

Fixes: f750dfe825b9 ("ethtool: provide customized dim profile management")
Reported-by: syzbot+e77327e34cdc8c36b7d3@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e77327e34cdc8c36b7d3
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/next-queue into main

Tony nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-28 (MAINTAINERS, ice)

This series contains updates to MAINTAINERS file and ice driver.

Jesse replaces himself with Przemek in the maintainers file.

Karthik Sundaravel adds support for VF get/set MAC address via devlink.

Eric checks for errors from ice_vsi_rebuild() during queue
reconfiguration.

Paul adjusts FW API version check for E830 devices.

Piotr adds differentiation of unload type when shutting down AdminQ.

Przemek changes ice_adapter initialization to occur once per physical
card.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnxt_en-ptp' into main

Michael Chan says:

====================
bnxt_en: PTP updates for net-next

The first 5 patches implement the PTP feature on the new BCM5760X
chips.  The main new hardware feature is the new TX timestamp
completion which enables the driver to retrieve the TX timestamp
in NAPI without deferring to the PTP worker.

The last 5 patches increase the number of TX PTP packets in-flight
from 1 to 4 on the older BCM5750X chips.  On these older chips, we
need to call firmware in the PTP worker to retrieve the timestamp.
We use an arry to keep track of the in-flight TX PTP packets.

v2: Patch #2: Fix the unwind of txr->is_ts_pkt when bnxt_start_xmit() aborts.
    Patch #4: Set the SKBTX_IN_PROGRESS flag for timestamp packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove atomic operations on ptp->tx_avail

Now that we require the spinlock to protect ptp->txts_prod, change
ptp->tx_avail to non-atomic and protect it under the same spinlock.
Add a new helper function bnxt_ptp_get_txts_prod() to decrement
ptp->tx_avail under spinlock and return the producer.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Increase the max total outstanding PTP TX packets to 4

Start accepting up to 4 TX TS requests on BCM5750X (P5) chips.
These PTP TX packets will be queued in the ptp->txts_req[] array
waiting for the TX timestamp to complete. The entries in the
array will be managed by a producer and consumer index. The
producer index is updated under spinlock since multiple TX rings
can try to send PTP packets at the same time.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Let bnxt_stamp_tx_skb() return error code

Change the function bnxt_stamp_tx_skb() to return 0 for suceess
or -EAGAIN if the timestamp is still pending in firmware. The
calling PTP aux worker will reschedule based on the return code.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove an impossible condition check for PTP TX pending SKB

In the current 5750X PTP code paths, there is always at most one TX
SKB requested for timestamp and we won't accept another one until we
have retrieved the timestamp or it has timed out. Remove the
unnecessary check in bnxt_get_tx_ts_p5() for a pending SKB and change
the function to void.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Refactor all PTP TX timestamp fields into a struct

On the older 5750X (P5) chips, we currently support only 1 TX PTP
packet in-flight waiting for the timestamp. Refactor the
datastructures to prepare to support up to 4 TX PTP packets.

Combine all fields required for PTP TX timestamp query into one
structure. An array of this structure will be added in follow-on
patches to support multiple outstanding TX timestamps.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add BCM5760X specific PHC registers mapping

BCM5760X firmware will advertise direct 64-bit PHC registers access
for the driver from BAR0.

Make the necessary changes in handling HWRM_PORT_MAC_PTP_QCFG's
response and PHC register mapping for 5760X chips.

Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add TX timestamp completion logic

The new BCM5760X chips will return the timestamp of TX packets in a
new completion. Add logic in __bnxt_poll_work() to handle this
completion type to retrieve the timestamp. This feature eliminates
the limit on the number of in-flight PTP TX packets.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Allow some TX packets to be unprocessed in NAPI

The driver's current logic will always free all the TX SKBs up to
txr->tx_hw_cons within NAPI.  In the next patches, we'll be adding
logic to handle TX timestamp completion and we may need to hold
some remaining TX SKBs if we don't have the timestamp completions
yet.

Modify __bnxt_poll_work_done() to clear each event bit separately to
allow bnapi->tx_int() to decide whether to clear BNXT_TX_CMP_EVENT or
not.  bnapi->tx_int() will not clear BNXT_TX_CMP_EVENT if some TX
SKBs are held waiting for TX timestamps.  Note that legacy chips will
never hold any SKBs this way.  The SKB is always deferred to the PTP
worker slow path to retrieve the timestamp from firmware.  On the new
P7 chips, the timestamp is returned by the hardware directly and we
can retrieve it directly from NAPI.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add is_ts_pkt field to struct bnxt_sw_tx_bd

Remove the unused is_gso field and add the is_ts_pkt field to struct
bnxt_sw_tx_bd.  This field will mark the TX BD that has requested
HW TX timestamp.  The field needs to be cleared if the timestamp packet
is later aborted.  This field will be useful when processing the
new TX timestamp completion from the hardware in the next patches.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Add new TX timestamp completion definitions

The new BCM5760X chips will generate this new TX timestamp completion
when a TX packet's timestamp has been taken right before transmission.
The driver logic to retrieve the timestamp will be added in the next
few patches.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Sync NIX and NPA contexts from NDC to LLC/DRAM

Octeontx2 hardware uses Near Data Cache(NDC) block to cache
contexts in it so that access to LLC/DRAM can be avoided.
It is recommended in HRM to sync the NDC contents before
releasing/resetting LF resources. Hence implement NDC_SYNC
mailbox and sync contexts during driver teardown.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tn40xx: add initial ethtool_ops support

Call phylink_ethtool_ksettings_get() for get_link_ksettings method and
ethtool_op_get_link() for get_link method.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'nf-next-24-06-28' of git://git./linux/kernel/git/netfilter/nf-next into main

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

Patch #1 to #11 to shrink memory consumption for transaction objects:

  struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
  struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
  struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
  struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
  struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
  struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */

  struct nft_trans_elem can now be allocated from kmalloc-96 instead of
  kmalloc-128 slab.

  Series from Florian Westphal. For the record, I have mangled patch #1
  to add nft_trans_container_*() and use if for every transaction object.
   I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
  at the beginning of the container transaction object. And few minor
  cleanups, any new bugs are of my own.

Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno.

Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.

Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
          default conntrack timeouts via nfnetlink_cttimeout API, from
          Lin Ma.

Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
          larger secctx names than the existing 256 bytes length.

Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
          nfnetlink_queue, from Florian Westphal.

Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp_metrics-netlink-specs' into main

Jakub Kicinski says:

====================
tcp_metrics: add netlink protocol spec in YAML

Add a netlink protocol spec for the tcp_metrics generic netlink family.
First patch adjusts the uAPI header guards to make it easier to build
tools/ with non-system headers.

v1: https://lore.kernel.org/all/20240626201133.2572487-1-kuba@kernel.org
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_metrics: add netlink protocol spec in YAML

Add a protocol spec for tcp_metrics, so that it's accessible via YNL.
Useful at the very least for testing fixes.

In this episode of "10,000 ways to complicate netlink" the metric
nest has defines which are off by 1. iproute2 does:

        struct rtattr *m[TCP_METRIC_MAX + 1 + 1];

        parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);

        for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
                // ...
                attr = m[i + 1];

This is too weird to support in YNL, add a new set of defines
with _correct_ values to the official kernel header.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_metrics: add UAPI to the header guard

tcp_metrics' header lacks the customary _UAPI in the header guard.
This makes YNL build rules work less seamlessly.
We can easily fix that on YNL side, but this could also be
problematic if we ever needed to create a kernel-only tcp_metrics.h.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: realtek: Add support for PHY LEDs on RTL8211F

Realtek RTL8211F Ethernet PHY supports 3 LED pins which are used to
indicate link status and activity. Add minimal LED controller driver
supporting the most common uses with the 'netdev' trigger.

Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ethtool-track-custom-rss-contexts-in-the-core'

Edward Cree says:

====================
ethtool: track custom RSS contexts in the core

Make the core responsible for tracking the set of custom RSS contexts,
their IDs, indirection tables, hash keys, and hash functions; this
lets us get rid of duplicative code in drivers, and will allow us to
support netlink dumps later.

This series only moves the sfc EF10 & EF100 driver over to the new API;
other drivers (mvpp2, octeontx2, mlx5, sfc/siena, bnxt_en) can be converted
afterwards and the legacy API removed.
====================

Link: https://patch.msgid.link/cover.1719502239.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: remove get_rxfh_context dead code

The core now always satisfies 'ethtool -x context nonzero' from its own
tracking, so our lookup code for that case is never called. Remove it.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/b426fcc416dedc8f203e52eebef6891eccebe4c1.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: use the tracking array for get_rxfh on custom RSS contexts

On 'ethtool -x' with rss_context != 0, instead of calling the driver to
read the RSS settings for the context, just get the settings from the
rss_ctx xarray, and return them to the user with no driver involvement.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/2d0190fa29638f307ea720f882ebd41f6f867694.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

sfc: use new rxfh_context API

The core is now responsible for allocating IDs and a memory region for
us to store our state (struct efx_rss_context_priv), so we no longer
need efx_alloc_rss_context_entry() and friends.
Since the contexts are now maintained by the core, use the core's lock
(net_dev->ethtool->rss_lock), rather than our own mutex (efx->rss_lock),
to serialise access against changes; and remove the now-unused
efx->rss_lock from struct efx_nic.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/150274740ea8cc137fef5502541ce573d32fb319.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: add a mutex protecting RSS contexts

While this is not needed to serialise the ethtool entry points (which
are all under RTNL), drivers may have cause to asynchronously access
dev->ethtool->rss_ctx; taking dev->ethtool->rss_lock allows them to
do this safely without needing to take the RTNL.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/7f9c15eb7525bf87af62c275dde3a8570ee8bf0a.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: add an extack parameter to new rxfh_context APIs

Currently passed as NULL, but will allow drivers to report back errors
when ethnl support for these ops is added.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/6e0012347d175fdd1280363d7bfa76a2f2777e17.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: let the core choose RSS context IDs

Add a new API to create/modify/remove RSS contexts, that passes in the
newly-chosen context ID (not as a pointer) rather than leaving the
driver to choose it on create. Also pass in the ctx, allowing drivers
to easily use its private data area to store their hardware-specific
state.
Keep the existing .set_rxfh API for now as a fallback, but deprecate it
for custom contexts (rss_context != 0).

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/45f1fe61df2163c091ec394c9f52000c8b16cc3b.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: record custom RSS contexts in the XArray

Since drivers are still choosing the context IDs, we have to force the
XArray to use the ID they've chosen rather than picking one ourselves,
and handle the case where they give us an ID that's already in use.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/801f5faa4cec87c65b2c6e27fb220c944bce593a.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethtool: attach an XArray of custom RSS contexts to a netdevice

Each context stores the RXFH settings (indir, key, and hfunc) as well
as optionally some driver private data.
Delete any still-existing contexts at netdev unregister time.

Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/cbd1c402cec38f2e03124f2ab65b4ae4e08bd90d.1719502240.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: move ethtool-related netdev state into its own struct

net_dev->ethtool is a pointer to new struct ethtool_netdev_state, which
currently contains only the wol_enabled field.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/293a562278371de7534ed1eb17531838ca090633.1719502239.git.ecree.xilinx@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'selftests-drv-net-add-ability-to-schedule-cleanup-with-defer'

Jakub Kicinski says:

====================
selftests: drv-net: add ability to schedule cleanup with defer()

Introduce a defer / cleanup mechanism for driver selftests.
More detailed info in the second patch.
====================

Link: https://patch.msgid.link/20240627185502.3069139-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: rss_ctx: convert to defer()

Use just added defer().

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: add ability to schedule cleanup with defer()

This implements what I was describing in [1]. When writing a test
author can schedule cleanup / undo actions right after the creation
completes, eg:

  cmd("touch /tmp/file")
  defer(cmd, "rm /tmp/file")

defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).

  undo = defer(cmd, "rm /tmp/file")
  # ... some unsafe code ...
  undo.exec()

As a nice safety all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.

There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need..

Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: ksft: avoid continue when handling results

Exception handlers print the result and use continue
to skip the non-exception result printing. This makes
inserting common post-test code hard. Refactor to
avoid the continues and have only one ktap_result() call.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

enic: add ethtool get_channel support

Add .get_channel to enic_ethtool_ops to enable basic ethtool -l
support to get the current channel configuration.

Note that the driver does not support dynamically changing queue
configuration, so .set_channel is intentionally unused. Instead, users
should use Cisco's hardware management tools (UCSM/IMC) to modify
virtual interface card configuration out of band.

Signed-off-by: Jon Kohler <jon@nutanix.com>
Link: https://patch.msgid.link/20240627202013.2398217-1-jon@nutanix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'lift-udp_segment-restriction-for-egress-via-device-w-o-csum-offload'

Jakub Sitnicki says:

====================
Lift UDP_SEGMENT restriction for egress via device w/o csum offload

This is a follow-up to an earlier question [1] if we can make UDP GSO work
with any egress device, even those with no checksum offload capability.
That's the default setup for TUN/TAP.

Because there is a change in behavior - sendmsg() does no longer return
EIO error - I'm submitting through net-next tree, rather than net,
as per Willem's advice.

[1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/

v1: https://lore.kernel.org/r/20240622-linux-udpgso-v1-0-d2344157ab2a@cloudflare.com
====================

Link: https://patch.msgid.link/20240626-linux-udpgso-v2-0-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Add test coverage for UDP GSO software fallback

Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.

Test against a dummy device because it is simpler to set up then TUN/TAP.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-2-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

udp: Allow GSO transmit from devices with no checksum offload

Today sending a UDP GSO packet from a TUN device results in an EIO error:

  import fcntl, os, struct
  from socket import *

  TUNSETIFF = 0x400454CA
  IFF_TUN = 0x0001
  IFF_NO_PI = 0x1000
  UDP_SEGMENT = 103

  tun_fd = os.open("/dev/net/tun", os.O_RDWR)
  ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
  fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

  os.system("ip addr add 192.0.2.1/24 dev tun0")
  os.system("ip link set dev tun0 up")

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
  s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643f1 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge patch series "can: mcp251xfd: workaround for erratum DS80000789E 6 of mcp2518fd"

Marc Kleine-Budde <mkl@pengutronix.de> says:

This patch series tries to work around erratum DS80000789E 6 of the
mcp2518fd, found by Stefan Althöfer, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value.

For the RX FIDO this caused old, already processed CAN frames or new,
incompletely written CAN frames to be (re-)processed.

To work around this issue, keep a per FIFO timestamp of the last valid
received CAN frame and compare against the timestamp of every received
CAN frame.

Further tests showed that this workaround can recognize old CAN
frames, but a small time window remains in which partially written CAN
frames are not recognized but then processed. These CAN frames have
the correct data and time stamps, but the DLC has not yet been
updated.

For the TEF FIFO the original driver already detects the error, update
the error handling with the knowledge that it is causes by this erratum.

Link: https://lore.kernel.org/all/20240628-mcp251xfd-workaround-erratum-6-v4-0-53586f168524@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: tef: update workaround for erratum DS80000789E 6 of mcp2518fd

This patch updates the workaround for a problem similar to erratum
DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

In the bad case, the driver reads a too large head index. As the FIFO
is implemented as a ring buffer, this results in re-handling old CAN
transmit complete events.

Every transmit complete event contains with a sequence number that
equals to the sequence number of the corresponding TX request. This
way old TX complete events can be detected.

If the original driver detects a non matching sequence number, it
prints an info message and tries again later. As wrong sequence
numbers can be explained by the erratum DS80000789E 6, demote the info
message to debug level, streamline the code and update the comments.

Keep the behavior: If an old CAN TX complete event is detected, abort
the iteration and mark the number of valid CAN TX complete events as
processed in the chip by incrementing the FIFO's tail index.

Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum

This is a preparatory patch to work around a problem similar to
erratum DS80000789E 6 of the mcp2518fd, the other variants of the chip
family (mcp2517fd and mcp251863) are probably also affected.

Erratum DS80000789E 6 says "reading of the FIFOCI bits in the FIFOSTA
register for an RX FIFO may be corrupted". However observation shows
that this problem is not limited to RX FIFOs but also effects the TEF
FIFO.

When handling the TEF interrupt, the driver reads the FIFO header
index from the TEF FIFO STA register of the chip.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN transmit complete events that were already processed to be
re-processed.

Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_tef_ring_update() with mcp251xfd_get_tef_len().

The mcp251xfd_handle_tefif() function reads the CAN transmit complete
events from the chip, iterates over them and pushes them into the
network stack. The original driver already contains code to detect old
CAN transmit complete events, that will be updated in the next patch.

Cc: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Cc: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: rx: add workaround for erratum DS80000789E 6 of mcp2518fd

This patch tries to works around erratum DS80000789E 6 of the
mcp2518fd, the other variants of the chip family (mcp2517fd and
mcp251863) are probably also affected.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old, already processed CAN frames or new, incompletely written CAN
frames to be (re-)processed.

To work around this issue, keep a per FIFO timestamp [1] of the last
valid received CAN frame and compare against the timestamp of every
received CAN frame. If an old CAN frame is detected, abort the
iteration and mark the number of valid CAN frames as processed in the
chip by incrementing the FIFO's tail index.

Further tests showed that this workaround can recognize old CAN
frames, but a small time window remains in which partially written CAN
frames [2] are not recognized but then processed. These CAN frames
have the correct data and time stamps, but the DLC has not yet been
updated.

[1] As the raw timestamp overflows every 107 seconds (at the usual
clock rate of 40 MHz) convert it to nanoseconds with the
timecounter framework and use this to detect stale CAN frames.

Link: https://lore.kernel.org/all/BL3PR11MB64844C1C95CA3BDADAE4D8CCFBC99@BL3PR11MB6484.namprd11.prod.outlook.com
Reported-by: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Closes: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@FR0P281MB1966.DEUP281.PROD.OUTLOOK.COM
Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Tested-by: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: rx: prepare to workaround broken RX FIFO head index erratum

This is a preparatory patch to work around erratum DS80000789E 6 of
the mcp2518fd, the other variants of the chip family (mcp2517fd and
mcp251863) are probably also affected.

When handling the RX interrupt, the driver iterates over all pending
FIFOs (which are implemented as ring buffers in hardware) and reads
the FIFO header index from the RX FIFO STA register of the chip.

In the bad case, the driver reads a too large head index. In the
original code, the driver always trusted the read value, which caused
old CAN frames that were already processed, or new, incompletely
written CAN frames to be (re-)processed.

Instead of reading and trusting the head index, read the head index
and calculate the number of CAN frames that were supposedly received -
replace mcp251xfd_rx_ring_update() with mcp251xfd_get_rx_len().

The mcp251xfd_handle_rxif_ring() function reads the received CAN
frames from the chip, iterates over them and pushes them into the
network stack. Prepare that the iteration can be stopped if an old CAN
frame is detected. The actual code to detect old or incomplete frames
and abort will be added in the next patch.

Link: https://lore.kernel.org/all/BL3PR11MB64844C1C95CA3BDADAE4D8CCFBC99@BL3PR11MB6484.namprd11.prod.outlook.com
Reported-by: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Closes: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@FR0P281MB1966.DEUP281.PROD.OUTLOOK.COM
Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Tested-by: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: mcp251xfd_handle_rxif_ring_uinc(): factor out in separate function

This is a preparation patch.

Sending the UINC messages followed by incrementing the tail pointer
will be called in more than one place in upcoming patches, so factor
this out into a separate function.

Also make mcp251xfd_handle_rxif_ring_uinc() safe to be called with a
"len" of 0.

Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Tested-by: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: clarify the meaning of timestamp

The mcp251xfd chip is configured to provide a timestamp with each
received and transmitted CAN frame. The timestamp is derived from the
internal free-running timer, which can also be read from the TBC
register via SPI. The timer is 32 bits wide and is clocked by the
external oscillator (typically 20 or 40 MHz).

To avoid confusion, we call this timestamp "timestamp_raw" or "ts_raw"
for short.

Using the timecounter framework, the "ts_raw" is converted to 64 bit
nanoseconds since the epoch. This is what we call "timestamp".

This is a preparation for the next patches which use the "timestamp"
to work around a bug where so far only the "ts_raw" is used.

Tested-by: Stefan Althöfer <Stefan.Althoefer@janztec.com>
Tested-by: Thomas Kopp <thomas.kopp@microchip.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()

The mcp251xfd wakes up from Low Power or Sleep Mode when SPI activity
is detected. To avoid this, make sure that the timestamp worker is
stopped before shutting down the chip.

Split the starting of the timestamp worker out of
mcp251xfd_timestamp_init() into the separate function
mcp251xfd_timestamp_start().

Call mcp251xfd_timestamp_init() before mcp251xfd_chip_start(), move
mcp251xfd_timestamp_start() to mcp251xfd_chip_start(). In this way,
mcp251xfd_timestamp_stop() can be called unconditionally by
mcp251xfd_chip_stop().

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: update errata references

Since the errata references have been added to the driver, new errata
sheets have been published. Update the references for the mcp2517fd
and mcp2518fd. For completeness add references for the mcp251863.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcp251xfd: properly indent labels

To fix the coding style, remove the whitespace in front of labels.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

ice: do not init struct ice_adapter more times than needed

Allocate and initialize struct ice_adapter object only once per physical
card instead of once per port. This is not a big deal by now, but we want
to extend this struct more and more in the near future. Our plans include
PTP stuff and a devlink instance representing whole-device/physical card.

Transactions requiring to be sleep-able (like those doing user (here ice)
memory allocation) must be performed with an additional (on top of xarray)
mutex. Adding it here removes need to xa_lock() manually.

Since this commit is a reimplementation of ice_adapter_get(), a rather new
scoped_guard() wrapper for locking is used to simplify the logic.

It's worth to mention that xa_insert() use gives us both slot reservation
and checks if it is already filled, what simplifies code a tiny bit.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Distinguish driver reset and removal for AQ shutdown

Admin queue command for shutdown AQ contains a flag to indicate driver
unload. However, the flag is always set in the driver, even for resets. It
can cause the firmware to consider driver as unloaded once the PF reset is
triggered on all ports of device, which could lead to unexpected results.

Add an additional function parameter to functions that shutdown AQ,
indicating whether the driver is actually unloading.

Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Allow different FW API versions based on MAC type

Allow the driver to be compatible with different FW API versions based
on the device's MAC type. Currently, E810 is only compatible with one
FW API version. Now the driver can be compatible with different FW API
versions for both E810 and E830. For example, E810 FW API version is
1.5.0 and E830 is 1.7.0.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Check all ice_vsi_rebuild() errors in function

Check the return value from ice_vsi_rebuild() and prevent the usage of
incorrectly configured VSI.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add get/set hw address for VFs using devlink commands

Changing the MAC address of the VFs is currently unsupported via devlink.
Add the function handlers to set and get the HW address for the VFs.

Signed-off-by: Karthik Sundaravel <ksundara@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

MAINTAINERS: update Intel Ethernet maintainers

Since Jesse has moved to a new role, replace him with a new maintainer
to work with Tony on representing Intel networking drivers in the
kernel.

Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

netfilter: xt_recent: Lift restrictions on max hitcount value

Support tracking of up to 65535 packets per table entry instead of just
255 to better facilitate longer term tracking or higher throughput
scenarios.

Note how this aligns sizes of struct recent_entry's 'nstamps' and
'index' fields when 'nstamps' was larger before. This is unnecessary as
the value of 'nstamps' grows along with that of 'index' after being
initialized to 1 (see recent_entry_update()). Its value will thus never
exceed that of 'index' and therefore does not need to provide space for
larger values.

Requested-by: Fabio <pedretti.fabio@gmail.com>
Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1745
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

selftests: netfilter: nft_queue.sh: add test for disappearing listener

If userspace program exits while the queue its subscribed to has packets
those need to be discarded.

commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
triggered in this case.

Add a test case to cover this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

can: gs_usb: add VID/PID for Xylanta SAINT3 product family

Add support for the Xylanta SAINT3 product family.

Cc: Andy Jackson <andy@xylanta.com>
Cc: Ken Aitchison <ken@xylanta.com>
Tested-by: Andy Jackson <andy@xylanta.com>
Link: https://lore.kernel.org/all/20240625140353.769373-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'net-selftests-mirroring-cleanup' into main

Petr Machata says:

====================
selftest: Clean-up and stabilize mirroring tests

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy.

mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But that only works up to a point, and on
busy systems won't be always enough.

In this patch set, clean up and stabilize the mirroring selftests. The
original intention was to port the tests over to UDP, but the logic of
ICMP ends up being so entangled in the mirroring selftests that the
changes feel overly invasive. Instead, ICMP is kept, but where possible,
we match on ICMP message type, thus filtering out hits by other ICMP
messages.

Where this is not practical (where the counter tap is put on a device
that carries encapsulated packets), switch the counter condition to _at
least_ X observed packets. This is less robust, but barely so --
probably the only scenario that this would not catch is something like
erroneous packet duplication, which would hopefully get caught by the
numerous other tests in this extensive suite.

- Patches #1 to #3 clean up parameters at various helpers.

- Patches #4 to #6 stabilize the mirroring selftests as described above.

- Mirroring tests currently allow testing SW datapath even on HW
  netdevices by trapping traffic to the SW datapath. This complicates
  the tests a bit without a good reason: to test SW datapath, just run
  the selftests on the veth topology. Thus in patch #7, drop support for
  this dual SW/HW testing.

- At this point, some cleanups were either made possible by the previous
  patches, or were always possible. In patches #8 to #11, realize these
  cleanups.

- In patch #12, fix mlxsw mirror_gre selftest to respect setting TESTS.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: mirror_gre: Obey TESTS

This test is unusual in that overriding TESTS does not change the tests to
be run. Split the individual tests into several functions and invoke them
through tests_run() as appropriate.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: libs: Drop unused functions

Nothing calls these.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: libs: Drop slow_path_trap_install()/_uninstall()

These functions are not used anymore.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mirror_gre_lag_lacp: Drop unnecessary code

The selftest does not use functions from mirror_gre_lib, ditch the import.

It does not use arping either, so drop the require_command as well.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: mirror_gre: Simplify

After the previous patch, the function test_span_failable() is always
called with should_fail=1. Drop the argument and streamline the code.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mirror: Drop dual SW/HW testing

The mirroring tests are currently run in a skip_hw and optionally a skip_sw
mode. The former tests the SW datapath, the latter the HW datapath, if
available. In order to be able to test SW datapath on HW loopbacks, traps
are installed on ingress to get traffic from the HW datapath to the SW one.
This adds an unnecessary complexity when it would be much simpler to just
use a veth-based topology to test the SW datapath. Thus drop all the code
that supports this dual testing.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mirror: mirror_test(): Allow exact count of packets

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.

As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.

mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But in the previous patch, where possible,
counter taps were changed to match only on an exact ICMP message. At least
in those cases, we can demand an exact number of packets to match.

Where the tap is installed on a connective netdevice, the exact matching is
not practical (though with u32, anything is possible). In those places,
there should still be some leeway -- and probably bigger than before,
because experience shows that these tests are very noisy.

To that end, change mirror_test() so that it can be either called with an
exact number to expect, or with an expression. Where leeway is needed,
adjust callers to pass a ">= 10" instead of mere 10.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mirror: do_test_span_dir_ips(): Install accurate taps

The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.

The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.

As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.

However, often the counter tap is installed at the remote end of the gretap
tunnel. Since this is a SW-datapath scenario anyway, we can make the filter
arbitrarily accurate.

Thus in this patch, add parameters forward_type and backward_type to
several mirroring test helpers, as some other helpers already have. Then
change do_test_span_dir_ips() to instead of installing one generic tap and
using it for test in both directions, install the tap for each direction
separately, matching on the ICMP type given by these parameters.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mirror_gre_lag_lacp: Check counters at tunnel

The test works by sending packets through a tunnel, whence they are
forwarded to a LAG. One of the LAG children is removed from the LAG prior
to the exercise, and the test then counts how many packets pass through the
other one. The issue with this is that it counts all packets, not just the
encapsulated ones.

So instead add a second gretap endpoint to receive the sent packets, and
check reception counters there.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: lib: tc_rule_stats_get(): Move default to argument definition

The argument $dir has a fallback value of "ingress". Move the fallback from
the usage site to the argument definition block to make the fact clearer.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mirror: Drop direction argument from several functions

The argument is not used by these functions except to propagate it for
ultimately no purpose.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: libs: Expand "$@" where possible

In some functions, argument-forwarding through "$@" without listing the
individual arguments explicitly is fundamental to the operation of a
function. E.g. xfail_on_veth() should be able to run various tests in the
fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction
mechanism. For functions such as simple_if_init(), $@ is a handy way to
pass an array.

In other functions, it's merely a mechanism to save some typing, which
however ends up obscuring the real arguments and makes life hard for those
that end up reading the code.

This patch adds some of the implicit function arguments and correspondingly
expands $@'s. In several cases this will come in handy as following patches
adjust the parameter lists.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-flash-modees-firmware' into main

Danielle Ratson says:

====================
Add ability to flash modules' firmware

CMIS compliant modules such as QSFP-DD might be running a firmware that
can be updated in a vendor-neutral way by exchanging messages between
the host and the module as described in section 7.2.2 of revision
4.0 of the CMIS standard.

According to the CMIS standard, the firmware update process is done
using a CDB commands sequence.

CDB (Command Data Block Message Communication) reads and writes are
performed on memory map pages 9Fh-AFh according to the CMIS standard,
section 8.12 of revision 4.0.

Add a pair of new ethtool messages that allow:

* User space to trigger firmware update of transceiver modules

* The kernel to notify user space about the progress of the process

The user interface is designed to be asynchronous in order to avoid RTNL
being held for too long and to allow several modules to be updated
simultaneously. The interface is designed with CMIS compliant modules in
mind, but kept generic enough to accommodate future use cases, if these
arise.

The kernel interface that will implement the firmware update using CDB
command will include 2 layers that will be added under ethtool:

* The upper layer that will be triggered from the module layer, is
cmis_ fw_update.
* The lower one is cmis_cdb.

In the future there might be more operations to implement using CDB
commands. Therefore, the idea is to keep the cmis_cdb interface clean and
the cmis_fw_update specific to the cdb commands handling it.

The communication between the kernel and the driver will be done using
two ethtool operations that enable reading and writing the transceiver
module EEPROM.
The operation ethtool_ops::get_module_eeprom_by_page, that is already
implemented, will be used for reading from the EEPROM the CDB reply,
e.g. reading module setting, state, etc.
The operation ethtool_ops::set_module_eeprom_by_page, that is added in
the current patchset, will be used for writing to the EEPROM the CDB
command such as start firmware image, run firmware image, etc.

Therefore in order for a driver to implement module flashing, that
driver needs to implement the two functions mentioned above.

Patchset overview:
Patch #1-#2: Implement the EEPROM writing in mlxsw.
Patch #3: Define the interface between the kernel and user space.
Patch #4: Add ability to notify the flashing firmware progress.
Patch #5: Veto operations during flashing.
Patch #6: Add extended compliance codes.
Patch #7: Add the cdb layer.
Patch #8: Add the fw_update layer.
Patch #9: Add ability to flash transceiver modules' firmware.

v8:
Patch #7:
* In the ethtool_cmis_wait_for_cond() evaluate the condition once more
  to decide if the error code should be -ETIMEDOUT or something else.
* s/netdev_err/netdev_err_once.

v7:
Patch #4:
* Return -ENOMEM instead of PTR_ERR(attr) on
  ethnl_module_fw_flash_ntf_put_err().
Patch #9:
* Fix Warning for not unlocking the spin_lock in the error flow
             on module_flash_fw_work_list_add().
* Avoid the fall-through on ethnl_sock_priv_destroy().

v6:
* Squash some of the last patch to patch #5 and patch #9.
Patch #3:
* Add paragraph in .rst file.
Patch #4:
* Reserve '1' more place on SKB for NUL terminator in
  the error message string.
* Add more prints on error flow, re-write the printing
  function and add ethnl_module_fw_flash_ntf_put_err().
* Change the communication method so notification will be
  sent in unicast instead of multicast.
* Add new 'struct ethnl_module_fw_flash_ntf_params' that holds
  the relevant info for unicast communication and use it to
  send notification to the specific socket.
* s/nla_put_u64_64bit/nla_put_uint/
Patch #7:
* In ethtool_cmis_cdb_init(), Use 'const' for the 'params'
  parameter.
Patch #8:
* Add a list field to struct ethtool_module_fw_flash for
  module_fw_flash_work_list that will be presented in the next
  patch.
* Move ethtool_cmis_fw_update() cleaning to a new function that
  will be represented in the next patch.
* Move some of the fields in struct ethtool_module_fw_flash to
  a separate struct, so ethtool_cmis_fw_update() will get only
  the relevant parameters for it.
* Edit the relevant functions to get the relevant params for
  them.
* s/CMIS_MODULE_READY_MAX_DURATION_USEC/CMIS_MODULE_READY_MAX_DURATION_MSEC
Patch #9:
* Add a paragraph in the commit message.
* Rename labels in module_flash_fw_schedule().
* Add info to genl_sk_priv_*() and implement the relevant
  callbacks, in order to handle properly a scenario of closing
  the socket from user space before the work item was ended.
* Add a list the holds all the ethtool_module_fw_flash struct
  that corresponds to the in progress work items.
* Add a new enum for the socket types.
* Use both above to identify a flashing socket, add it to the
  list and when closing socket affect only the flashing type.
* Create a new function that will get the work item instead of
  ethtool_cmis_fw_update().
* Edit the relevant functions to get the relevant params for
  them.
* The new function will call the old ethtool_cmis_fw_update(),
  and do the cleaning, so the existence of the list should be
  completely isolated in module.c.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Add ability to flash transceiver modules' firmware

Add the ability to flash the modules' firmware by implementing the
interface between the user space and the kernel.

Example from a succeeding implementation:

# ethtool --flash-module-firmware swp40 file test.bin

Transceiver module firmware flashing started for device swp40
Transceiver module firmware flashing in progress for device swp40
Progress: 99%
Transceiver module firmware flashing completed for device swp40

In addition, add infrastructure that allows modules to set socket-specific
private data. This ensures that when a socket is closed from user space
during the flashing process, the right socket halts sending notifications
to user space until the work item is completed.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB

According to the CMIS standard, the firmware update process is done using
a CDB commands sequence.

Implement a work that will be triggered from the module layer in the
next patch the will initiate and execute all the CDB commands in order, to
eventually complete the firmware update process.

This flashing process includes, writing the firmware image, running the new
firmware image and committing it after testing, so that it will run upon
reset.

This work will also notify user space about the progress of the firmware
update process.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: cmis_cdb: Add a layer for supporting CDB commands

CDB (Command Data Block Message Communication) reads and writes are
performed on memory map pages 9Fh-AFh according to the CMIS standard,
section 8.20 of revision 5.2.
Page 9Fh is used to specify the CDB command to be executed and also
provides an area for a local payload (LPL).

According to the CMIS standard, the firmware update process is done using
a CDB commands sequence that will be implemented in the next patch.

The kernel interface that will implement the firmware update using CDB
command will include 2 layers that will be added under ethtool:

* The upper layer that will be triggered from the module layer, is
  cmis_fw_update.
* The lower one is cmis_cdb.

In the future there might be more operations to implement using CDB
commands. Therefore, the idea is to keep the CDB interface clean and the
cmis_fw_update specific to the CDB commands handling it.

These two layers will communicate using the API the consists of three
functions:

- struct ethtool_cmis_cdb *
  ethtool_cmis_cdb_init(struct net_device *dev,
struct ethtool_module_fw_flash_params *params);
- void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb *cdb);
- int ethtool_cmis_cdb_execute_cmd(struct net_device *dev,
   struct ethtool_cmis_cdb_cmd_args *args);

Add the CDB layer to support initializing, finishing and executing CDB
commands:

* The initialization process will include creating of an ethtool_cmis_cdb
  instance, querying the module CDB support, entering and validating the
  password from user space (CMD 0x0000) and querying the module features
  (CMD 0x0040).

* The finishing API will simply free the ethtool_cmis_cdb instance.

* The executing process will write the CDB command to EEPROM using
  set_module_eeprom_by_page() that was presented earlier, and will
  process the reply from EEPROM.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sfp: Add more extended compliance codes

SFF-8024 is used to define various constants re-used in several SFF
SFP-related specifications.

Add SFF-8024 extended compliance code definitions for CMIS compliant
modules and use them in the next patch to determine the firmware flashing
work.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Veto some operations during firmware flashing process

Some operations cannot be performed during the firmware flashing
process.

For example:

- Port must be down during the whole flashing process to avoid packet loss
while committing reset for example.

- Writing to EEPROM interrupts the flashing process, so operations like
ethtool dump, module reset, get and set power mode should be vetoed.

- Split port firmware flashing should be vetoed.

In order to veto those scenarios, add a flag in 'struct net_device' that
indicates when a firmware flash is taking place on the module and use it
to prevent interruptions during the process.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Add flashing transceiver modules' firmware notifications ability

Add progress notifications ability to user space while flashing modules'
firmware by implementing the interface between the user space and the
kernel.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>