git.kernel.dk Git - linux-block.git/log

net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ

mlx5e_free_rx_descs is responsible for calling the dealloc_wqe op which
returns pages to the page_pool. This can happen during flush or close.
For XSK, the regular RQ is flushed (when replaced by the XSK RQ) and
also closed later. This is normally not a problem as the wqe list is
empty on a second call to mlx5e_free_rx_descs. However, for striding RQ,
the previously released wqes from the list will appear as missing
and will be released a second time by mlx5e_free_rx_missing_descs.

This patch sets the no release bits on the striding RQ wqes in the
dealloc_wqe op to prevent releasing the pages a second time.

Please note that the bits are set only in the control path during
close and not in the data path.

Fixes: 4c2a13236807 ("net/mlx5e: RX, Defer page release in striding rq for better recycling")
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Add vnic devlink health reporter to representors

Create a new devlink health reporter for representor interface, which
reports the values of representor vnic diagnostic counters when diagnosed.

This patch will allow admins to monitor VF diagnostic counters through
the representor-interface vnic reporter.

Example of usage:
$ devlink health diagnose pci/0000:08:00.0/65537 reporter vnic
  vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: Add vnic devlink health reporter to PFs/VFs

Create a vnic devlink health reporter for PFs/VFs interfaces.
The reporter's diagnose callback displays the values of vNIC/vport
transport debug counters of PFs/VFs, as follows:

$ devlink health diagnose pci/0000:08:00.0 reporter vnic
vNIC env counters:
    total_error_queues: 0 send_queue_priority_update_flow: 0
    comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0
    invalid_command: 0 quota_exceeded_command: 0
    nic_receive_steering_discard: 0

Moreover, add documentation on the reporter functionality and the
counters description.

While at it, expose the vNIC counters diagnose function to be used by
the downstream patch, which will reveal the counters for representor
interfaces.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports"

This reverts commit 606e6a72e29dff9e3341c4cc9b554420e4793f401 which exposes
the vnic diagnostic counters via debugfs. Instead, The upcoming series will
expose the same counters through devlink health reporter.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Revert "net/mlx5: Expose steering dropped packets counter"

This reverts commit 4fe1b3a5f8fe2fdcedcaba9561e5b0ae5cb1d15b, which
exposes the steering dropped packets counter via debugfs. The upcoming
series will expose the counter via devlink health reporter instead
of debugfs.

Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add memory statistics for domain object

Add counters for number of buddies that are currently in use per domain
per buddy type (STE, MODIFY-HEADER, MODIFY-PATTERN).

Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Add more info in domain dbg dump

Add additinal items to domain info dump: Linux version and device name.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Calculate sync threshold of each pool according to its type

When certain ICM chunk is no longer needed, it needs to be freed.
Fully freeing ICM memory involves issuing FW SYNC_STEERING command.
This is very time consuming, and it is impractical to do it for every
freed chunk.
Instead, we manage these 'freed' chunks in hot list (list of chunks
that are not required by SW any more, but HW might still access them).
When size of the hot list reaches certain threshold, we purge it and
issue SYNC_STEERING FW command.
There is one threshold for all the different ICM types, which is not
optimal, as different ICM types require different approach: STEs pool
is very large, and it is very 'dynamic' in its nature, so letting hot
list to become too large will result in a significant perf hiccup when
purging the hot list. Modify action is much smaller and less dynamic,
so we can let the hot list to grow to almost the size of the whole pool.

This patch fixes this problem: instead of having same hot memory
threshold for all the pools, sync operation will be triggered in
accordance with the ICM type.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump

The steering dump parser expects to see 0 as rewrite num of actions
in case pattern/args aren't supported - parsing of legacy modify header
is based on this assumption.
Fix this to align to parser's expectation.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net: libwx: fix memory leak in wx_setup_rx_resources

When wx_alloc_page_pool() failed in wx_setup_rx_resources(), it doesn't
release DMA buffer. Add dma_free_coherent() in the error path to release
the DMA buffer.

Fixes: 850b971110b2 ("net: libwx: Allocate Rx and Tx resources")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230418065450.2268522-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'another-crack-at-a-handshake-upcall-mechanism'

Chuck Lever says:

====================
Another crack at a handshake upcall mechanism

Here is v10 of a series to add generic support for transport layer
security handshake on behalf of kernel socket consumers (user space
consumers use a security library directly, of course). A summary of
the purpose of these patches is archived here:

https://lore.kernel.org/netdev/1DE06BB1-6BA9-4DB4-B2AA-07DE532963D6@oracle.com/

The first patch in the series applies to the top-level .gitignore
file to address the build warnings reported a few days ago. I intend
to submit that separately. I'd like you to consider taking the rest
of this series for v6.4.

The full patch set to support SunRPC with TLSv1.3 is available in
the topic-rpc-with-tls-upcall branch here, based on net-next/main:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git

This patch set includes support for in-transit confidentiality and
peer authentication for both the Linux NFS client and server.

A user space handshake agent for TLSv1.3 to go along with the kernel
patches is available in the "main" branch here:

https://github.com/oracle/ktls-utils
====================

Link: https://lore.kernel.org/r/168174169259.9520.1911007910797225963.stgit@91.116.238.104.host.secureserver.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/handshake: Add Kunit tests for the handshake consumer API

These verify the API contracts and help exercise lifetime rules for
consumer sockets and handshake_req structures.

One way to run these tests:

./tools/testing/kunit/kunit.py run --kunitconfig ./net/handshake/.kunitconfig

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/handshake: Add a kernel API for requesting a TLSv1.3 handshake

To enable kernel consumers of TLS to request a TLS handshake, add
support to net/handshake/ to request a handshake upcall.

This patch also acts as a template for adding handshake upcall
support for other kernel transport layer security providers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/handshake: Create a NETLINK service for handling handshake requests

When a kernel consumer needs a transport layer security session, it
first needs a handshake to negotiate and establish a session. This
negotiation can be done in user space via one of the several
existing library implementations, or it can be done in the kernel.

No in-kernel handshake implementations yet exist. In their absence,
we add a netlink service that can:

a. Notify a user space daemon that a handshake is needed.

b. Once notified, the daemon calls the kernel back via this
   netlink service to get the handshake parameters, including an
   open socket on which to establish the session.

c. Once the handshake is complete, the daemon reports the
   session status and other information via a second netlink
   operation. This operation marks that it is safe for the
   kernel to use the open socket and the security session
   established there.

The notification service uses a multicast group. Each handshake
mechanism (eg, tlshd) adopts its own group number so that the
handshake services are completely independent of one another. The
kernel can then tell via netlink_has_listeners() whether a handshake
service is active and prepared to handle a handshake request.

A new netlink operation, ACCEPT, acts like accept(2) in that it
instantiates a file descriptor in the user space daemon's fd table.
If this operation is successful, the reply carries the fd number,
which can be treated as an open and ready file descriptor.

While user space is performing the handshake, the kernel keeps its
muddy paws off the open socket. A second new netlink operation,
DONE, indicates that the user space daemon is finished with the
socket and it is safe for the kernel to use again. The operation
also indicates whether a session was established successfully.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

.gitignore: Do not ignore .kunitconfig files

Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'ipsec-next-2023-04-19' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
ipsec-next 2023-04-19

1) Remove inner/outer modes from input/output path. These are
   not needed anymore. From Herbert Xu.

* tag 'ipsec-next-2023-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: Remove inner/outer modes from output path
  xfrm: Remove inner/outer modes from input path
====================

Link: https://lore.kernel.org/r/20230419075300.452227-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: ethernet: Fix JSON pointer references

A JSON pointer reference (the part after the "#") must start with a "/".
Conversely, references to the entire document must not have a trailing "/"
and should be just a "#". The existing jsonschema package allows these,
but coming changes make allowed "$ref" URIs stricter and throw errors on
these references.

Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230418150628.1528480-1-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: micrel: Update the list of supported phys

At the beginning of the file micrel.c there is list of supported PHYs.
Extend this list with the following PHYs lan8841, lan8814 and lan8804,
as these PHYs were added but the list was not updated.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230418124713.2221451-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: dwmac-meson8b: Avoid cast to incompatible function type

Rather than casting clk_disable_unprepare to an incompatible function
type provide a trivial wrapper with the correct signature for the
use-case.

Reported by clang-16 with W=1:

drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c:276:6: error: cast from 'void (*)(struct clk *)' to 'void (*)(void *)' converts to incompatible function type [-Werror,-Wcast-function-type-strict]
(void(*)(void *))clk_disable_unprepare,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No functional change intended.
Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20230418-dwmac-meson8b-clk-cb-cast-v1-1-e892b670cbbb@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: mt7530: fix support for MT7531BE

There are two variants of the MT7531 switch IC which got different
features (and pins) regarding port 5:
* MT7531AE: SGMII/1000Base-X/2500Base-X SerDes PCS
* MT7531BE: RGMII

Moving the creation of the SerDes PCS from mt753x_setup to mt7530_probe
with commit 6de285229773 ("net: dsa: mt7530: move SGMII PCS creation
to mt7530_probe function") works fine for MT7531AE which got two
instances of mtk-pcs-lynxi, however, MT7531BE requires mt7531_pll_setup
to setup clocks before the single PCS on port 6 (usually used as CPU
port) starts to work and hence the PCS creation failed on MT7531BE.

Fix this by introducing a pointer to mt7531_create_sgmii function in
struct mt7530_priv and call it again at the end of mt753x_setup like it
was before commit 6de285229773 ("net: dsa: mt7530: move SGMII PCS
creation to mt7530_probe function").

Fixes: 6de285229773 ("net: dsa: mt7530: move SGMII PCS creation to mt7530_probe function")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/ZDvlLhhqheobUvOK@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

page_pool: add DMA_ATTR_WEAK_ORDERING on all mappings

Commit c519fe9a4f0d ("bnxt: add dma mapping attributes") added
DMA_ATTR_WEAK_ORDERING to DMA attrs on bnxt. It has since spread
to a few more drivers (possibly as a copy'n'paste).

DMA_ATTR_WEAK_ORDERING only seems to matter on Sparc and PowerPC/cell,
the rarity of these platforms is likely why we never bothered adding
the attribute in the page pool, even though it should be safe to add.

To make the page pool migration in drivers which set this flag less
of a risk (of regressing the precious sparc database workloads or
whatever needed this) let's add DMA_ATTR_WEAK_ORDERING on all
page pool DMA mappings.

We could make this a driver opt-in but frankly I don't think it's
worth complicating the API. I can't think of a reason why device
accesses to packet memory would have to be ordered.

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20230417152805.331865-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

stmmac: fix changing mac address

Without the IFF_LIVE_ADDR_CHANGE flag being set, the network code
disallows changing the mac address while the interface is UP.

Consequences are, for instance, that the interface can't be used
in a failover bond.

Add the missing flag to net_device priv_flags.

Tested on Intel Elkhart Lake with default settings, as well as with
failover and alb mode bonds.

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'skbuff-bitfields'

Jakub Kicinski says:

====================
net: skbuff: hide some bitfield members

There is a number of protocol or subsystem specific fields
in struct sk_buff which are only accessed by one subsystem.
We can wrap them in ifdefs with minimal code impact.

This gives us a better chance to save a 2B and a 4B holes
resulting with the following savings (assuming a lucky
kernel config):

- /* size: 232, cachelines: 4, members: 28 */
- /* sum members: 227, holes: 1, sum holes: 4 */
- /* sum bitfield members: 8 bits (1 bytes) */
+ /* size: 224, cachelines: 4, members: 28 */
/* forced alignments: 2 */
- /* last cacheline: 40 bytes */
+ /* last cacheline: 32 bytes */

I think that the changes shouldn't be too controversial.
The only one I'm not 100% sure of is the SCTP one,
12 extra LoC for one bit.. But it did fit squarely
in the "this bit has only one user" category.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Simon Horman <horms@kernel.org>

net: skbuff: hide nf_trace and ipvs_property

Accesses to nf_trace and ipvs_property are already wrapped
by ifdefs where necessary. Don't allocate the bits for those
fields at all if possible.

Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: push nf_trace down the bitfield

nf_trace is a debug feature, AFAIU, and yet it sits oddly
high in the sk_buff bitfield. Move it down, pushing up
dst_pending_confirm and inner_protocol_type.

Next change will make nf_trace optional (under Kconfig)
and all optional fields should be placed after 2b fields
to avoid 2b fields straddling bytes.

dst_pending_confirm is L3, so it makes sense next to ignore_df.
inner_protocol_type goes up just to keep the balance.

Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: move alloc_cpu into a potential hole

alloc_cpu is currently between 4 byte fields, so it's almost
guaranteed to create a 2B hole. It has a knock on effect of
creating a 4B hole after @end (and @end and @tail being in
different cachelines).

None of this matters hugely, but for kernel configs which
don't enable all the features there may well be a 2B hole
after the bitfield. Move alloc_cpu there.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: hide csum_not_inet when CONFIG_IP_SCTP not set

SCTP is not universally deployed, allow hiding its bit
from the skb.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: hide wifi_acked when CONFIG_WIRELESS not set

Datacenter kernel builds will very likely not include WIRELESS,
so let them shave 2 bits off the skb by hiding the wifi fields.

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'switch-phy-leds'

Christian Marangi says:

====================
net: Add basic LED support for switch/phy

This is a continue of [1]. It was decided to take a more gradual
approach to implement LEDs support for switch and phy starting with
basic support and then implementing the hw control part when we have all
the prereq done.

This series implements only the brightness_set() and blink_set() ops.
An example of switch implementation is done with qca8k.

For PHY a more generic approach is used with implementing the LED
support in PHY core and with the user (in this case marvell) adding all
the required functions.

Currently we set the default-state as "keep" to not change the default
configuration of the declared LEDs since almost every switch have a
default configuration.

[1] https://lore.kernel.org/lkml/20230216013230.22978-1-ansuelsmth@gmail.com/

Changes in new series v7:
- Drop ethernet-leds schema and add unevaluatedProperties to
  ethernet-controller and ethernet-phy schema
- Drop function-enumerator binding from schema example and DT
- Set devname_mandatory for qca8k leds and assign better name to LEDs
  using the format {slave_mii_bus id}:0{port number}:{color}:{function}
- Add Documentation patch for Correct LEDs naming from Andrew
- Changes in Andrew patch:
  - net: phy: Add a binding for PHY LEDs
    - Convert index from u32 to u8
  - net: phy: phy_device: Call into the PHY driver to set LED brightness
    - Fixup kernel doc
    - Convert index from u32 to u8
  - net: phy: marvell: Add software control of the LEDs
    - Convert index from u32 to u8
  - net: phy: phy_device: Call into the PHY driver to set LED blinking
    - Kernel doc fix
    - Convert index from u32 to u8
  - net: phy: marvell: Implement led_blink_set()
    - Convert index from u32 to u8
Changes in new series v6:
- Add leds-ethernet.yaml to document reg in led node
- Update ethernet-controller and ethernet-phy to follow new leds-ethernet schema
- Fix comments in qca8k-leds.c (at least -> at most)
  (wrong GENMASK for led phy 0 and 4)
- Add review and ack tag from Pavel Machek
- Changes in Andrew patch:
  - leds: Provide stubs for when CLASS_LED & NEW_LEDS are disabled
    - Change LED_CLASS to NEW_LEDS for led_init_default_state_get()
  - net: phy: Add a binding for PHY LEDs
    - Add dependency on LED_CLASS
    - Drop review tag from Michal Kubiak (patch modified)
Changes in new series v5:
- Rebase everything on top of net-next/main
- Add more info on LED probe fail for qca8k
- Drop some additional raw number and move to define in qca8k header
- Add additional info on LED mapping on qca8k regs
- Checks port number in qca8k switch port parse
- Changes in Andrew patch:
  - Add additional patch for stubs when CLASS_LED disabled
  - Drop CLASS_LED dependency for PHYLIB (to fix kbot errors reported)
Changes in new series v4:
- Changes in Andrew patch:
  - net: phy: Add a binding for PHY LEDs:
    - Rename phy_led: led_list to list
    - Rename phy_device: led_list to leds
    - Remove phy_leds_remove() since devm_ should do what is needed
    - Fixup documentation for struct phy_led
    - Fail probe on LED errors
  - net: phy: phy_device: Call into the PHY driver to set LED brightness
    - Moved phy_led::phydev from previous patch to here since it is first
      used here.
  - net: phy: marvell: Implement led_blink_set()
    - Use int instead of unsigned
  - net: phy: marvell: Add software control of the LEDs
    - Use int instead of unsigned
- Add depends on LED_CLASS for qca8k Kconfig
- Fix Makefile for qca8k as suggested
- Move qca8k_setup_led_ctrl to separate header
- Move Documentation from dsa-port to ethernet-controller
- Drop trailing . from Andrew patch fro consistency
Changes in new series v3:
- Move QCA8K_LEDS Kconfig option from tristate to bool
- Use new helper led_init_default_state_get for default-state in qca8k
- Drop cled_qca8k_brightness_get() as there isn't a good way to describe
  the mode the led is currently in
- Rework qca8k_led_brightness_get() to return true only when LED is set
  to always ON
Changes in new series v2:
- Add LEDs node for rb3011
- Fix rb3011 switch node unevaluated properties while running
  make dtbs_check
- Fix a copypaste error in qca8k-leds.c for port 4 required shift
- Drop phy-handle usage for qca8k and use qca8k_port_to_phy()
- Add review tag from Andrew
- Add Christian Marangi SOB in each Andrew patch
- Add extra description for dsa-port stressing that PHY have no access
  and LED are controlled by the related MAC
- Add missing additionalProperties for dsa-port.yaml and ethernet-phy.yaml

Changes from the old v8 series:
- Drop linux,default-trigger set to netdev.
- Dropped every hw control related patch and implement only
  blink_set and brightness_set
- Add default-state to "keep" for each LED node example
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: LEDs: Describe good names for network LEDs

Network LEDs can exist in both the MAC and the PHY. Naming is
difficult because the netdev name is neither stable or unique, do to
commands like ip link set name eth42 dev eth0, and network
namesspaces.

Give some example names where the MAC and the PHY have unique names
based on device tree nodes, or PCI bus addresses.

Since the LED can be used for anything which Linux supports for LEDs,
avoid using names like activity or link, rather describe the location
on the RJ-45, of what the RJ-45 is expected to be used for, WAN/LAN
etc.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arm: mvebu: dt: Add PHY LED support for 370-rd WAN port

The WAN port of the 370-RD has a Marvell PHY, with one LED on
the front panel.y List this LED in the device tree.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: phy: Document support for LEDs node

Document support for LEDs node in phy and add an example for it.
PHY LED will have to match led pattern and should be treated as a
generic led.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: qcom: ipq8064-rb3011: Add Switch LED for each port

Add Switch LED for each port for MikroTik RB3011UiAS-RM.

MikroTik RB3011UiAS-RM is a 10 port device with 2 qca8337 switch chips
connected.

It was discovered that in the hardware design all 3 Switch LED trace of
the related port is connected to the same LED. This was discovered by
setting to 'always on' the related led in the switch regs and noticing
that all 3 LED for the specific port (for example for port 1) cause the
connected LED for port 1 to turn on. As an extra test we tried enabling
2 different LED for the port resulting in the LED turned off only if
every led in the reg was off.

Aside from this funny and strange hardware implementation, the device
itself have one green LED for each port, resulting in 10 green LED one
for each of the 10 supported port.

Cc: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: qcom: ipq8064-rb3011: Drop unevaluated properties in switch nodes

IPQ8064 MikroTik RB3011UiAS-RM DT have currently unevaluted properties
in the 2 switch nodes. The bindings #address-cells and #size-cells are
redundant and cause warning for 'Unevaluated properties are not
allowed'.

Drop these bindings to mute these warning as they should not be there
from the start.

Cc: Jonathan McDowell <noodles@earth.li>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Jonathan McDowell <noodles@earth.li>
Tested-by: Jonathan McDowell <noodles@earth.li>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: qca8k: add LEDs definition example

Add LEDs definition example for qca8k Switch Family to describe how they
should be defined for a correct usage.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: ethernet-controller: Document support for LEDs node

Document support for LEDs node in ethernet-controller.
Ethernet Controller may support different LEDs that can be configured
for different operation like blinking on traffic event or port link.

Also add some Documentation to describe the difference of these nodes
compared to PHY LEDs, since ethernet-controller LEDs are controllable
by the ethernet controller regs and the possible intergated PHY doesn't
have control on them.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: Implement led_blink_set()

The Marvell PHY can blink the LEDs, simple on/off. All LEDs blink at
the same rate, and the reset default is 84ms per blink, which is
around 12Hz.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: phy_device: Call into the PHY driver to set LED blinking

Linux LEDs can be requested to perform hardware accelerated
blinking. Pass this to the PHY driver, if it implements the op.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: Add software control of the LEDs

Add a brightness function, so the LEDs can be controlled from
software using the standard Linux LED infrastructure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: phy_device: Call into the PHY driver to set LED brightness

Linux LEDs can be software controlled via the brightness file in /sys.
LED drivers need to implement a brightness_set function which the core
will call. Implement an intermediary in phy_device, which will call
into the phy driver if it implements the necessary function.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add a binding for PHY LEDs

Define common binding parsing for all PHY drivers with LEDs using
phylib. Parse the DT as part of the phy_probe and add LEDs to the
linux LED class infrastructure. For the moment, provide a dummy
brightness function, which will later be replaced with a call into the
PHY driver. This allows testing since the LED core might otherwise
reject an LED whose brightness cannot be set.

Add a dependency on LED_CLASS. It either needs to be built in, or not
enabled, since a modular build can result in linker errors.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

leds: Provide stubs for when CLASS_LED & NEW_LEDS are disabled

Provide stubs for devm_led_classdev_register_ext() and
led_init_default_state_get() so that LED drivers embedded within other
drivers such as PHYs and Ethernet switches still build when LEDS_CLASS
or NEW_LEDS are disabled. This also helps with Kconfig dependencies,
which are somewhat hairy for phylib and mdio and only get worse when
adding a dependency on LED_CLASS.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: add LEDs blink_set() support

Add LEDs blink_set() support to qca8k Switch Family.
These LEDs support hw accellerated blinking at a fixed rate
of 4Hz.

Reject any other value since not supported by the LEDs switch.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: add LEDs basic support

Add LEDs basic support for qca8k Switch Family by adding basic
brightness_set() support.

Since these LEDs refelect port status, the default label is set to
":port". DT binding should describe the color and function of the
LEDs using standard LEDs api.
Each LED always have the device name as prefix. The device name is
composed from the mii bus id and the PHY addr resulting in example
names like:
- qca8k-0.0:00:amber:lan
- qca8k-0.0:00:white:lan
- qca8k-0.0:01:amber:lan
- qca8k-0.0:01:white:lan

These LEDs supports only blocking variant of the brightness_set()
function since they can sleep during access of the switch leds to set
the brightness.

While at it add to the qca8k header file each mode defined by the Switch
Documentation for future use.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: qca8k: move qca8k_port_to_phy() to header

Move qca8k_port_to_phy() to qca8k header as it's useful for future
reference in Switch LEDs module since the same logic is applied to get
the right index of the switch port.
Make it inline as it's simple function that just decrease the port.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: wwan: Expose secondary AT port on DATA1

Our use-case needs two AT ports available:
One for running a ppp daemon, and another one for management

This patch enables a second AT port on DATA1

Signed-off-by: Jaime Breva <jbreva@nayarsystems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5e-xdp-extend'

Tariq Toukan says:

====================
net/mlx5e: Extend XDP multi-buffer capabilities

This series extends the XDP multi-buffer support in the mlx5e driver.

Patchset breakdown:
- Infrastructural changes and preparations.
- Add XDP multi-buffer support for XDP redirect-in.
- Use TX MPWQE (multi-packet WQE) HW feature for non-linear
  single-segmented XDP frames.
- Add XDP multi-buffer support for striding RQ.

In Striding RQ, we overcome the lack of headroom and tailroom between
the RQ strides by allocating a side page per packet and using it for the
xdp_buff descriptor. We structure the xdp_buff so that it contains
nothing in the linear part, and the whole packet resides in the
fragments.

Performance highlight:

Packet rate test, 64 bytes, 32 channels, MTU 9000 bytes.
CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz.
NIC: ConnectX-6 Dx, at 100 Gbps.

+----------+-------------+-------------+---------+
| Test     | Legacy RQ   | Striding RQ | Speedup |
+----------+-------------+-------------+---------+
| XDP_DROP | 101,615,544 | 117,191,020 | +15%    |
+----------+-------------+-------------+---------+
| XDP_TX   |  95,608,169 | 117,043,422 | +22%    |
+----------+-------------+-------------+---------+

Series generated against net commit:
e61caf04b9f8 Merge branch 'page_pool-allow-caching-from-safely-localized-napi'

I'm submitting this directly as Saeed is traveling.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: RX, Add XDP multi-buffer support in Striding RQ

Here we add support for multi-buffer XDP handling in Striding RQ, which
is our default out-of-the-box RQ type. Before this series, loading such
an XDP program would fail, until you switch to the legacy RQ (by
unsetting the rx_striding_rq priv-flag).

To overcome the lack of headroom and tailroom between the strides, we
allocate a side page to be used for the descriptor (xdp_buff / skb) and
the linear part. When an XDP program is attached, we structure the
xdp_buff so that it contains no data in the linear part, and the whole
packet resides in the fragments.

In case of XDP_PASS, where an SKB still needs to be created, we copy up
to 256 bytes to its linear part, to match the current behavior, and
satisfy functions that assume finding the packet headers in the SKB
linear part (like eth_type_trans).

Performance testing:

Packet rate test, 64 bytes, 32 channels, MTU 9000 bytes.
CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz.
NIC: ConnectX-6 Dx, at 100 Gbps.

+----------+-------------+-------------+---------+
| Test     | Legacy RQ   | Striding RQ | Speedup |
+----------+-------------+-------------+---------+
| XDP_DROP | 101,615,544 | 117,191,020 | +15%    |
+----------+-------------+-------------+---------+
| XDP_TX   |  95,608,169 | 117,043,422 | +22%    |
+----------+-------------+-------------+---------+

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: RX, Prepare non-linear striding RQ for XDP multi-buffer support

In preparation for supporting XDP multi-buffer in striding RQ, use
xdp_buff struct to describe the packet. Make its skb_shared_info collide
the one of the allocated SKB, then add the fragments using the xdp_buff
API.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: RX, Generalize mlx5e_fill_mxbuf()

Make the function more generic. Let it get an additional frame_sz
parameter instead of deriving it from the RQ struct.

No functional change here, just a preparation for a downstream patch.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: RX, Take shared info fragment addition into a function

Introduce mlx5e_add_skb_shared_info_frag(), a function dedicated for
adding a fragment into a struct skb_shared_info object.

Use it in the Legacy RQ flow. Similar usage will be added in a
downstream patch by the corresponding Striding RQ flow.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Allow non-linear single-segment frames in XDP TX MPWQE

Under a few restrictions, TX MPWQE feature can serve multiple TX packets
in a single TX descriptor. It requires each of the packets to have a
single scatter entry / segment.

Today we allow only linear frames to use this feature, although there's
no real problem with non-linear ones where the whole packet reside in
the first fragment.

Expand the XDP TX MPWQE feature support to include such frames. This is
in preparation for the downstream patch, in which we will generate such
non-linear frames.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Remove un-established assumptions on XDP buffer

Remove the assumption of non-zero linear length in the XDP xmit
function, used to serve both internal XDP_TX operations as well as
redirected-in requests.

Do not apply the MLX5E_XDP_MIN_INLINE check unless necessary.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Consider large muti-buffer packets in Striding RQ params calculations

Function mlx5e_rx_get_linear_stride_sz() returns PAGE_SIZE immediately
in case an XDP program is attached. The more accurate formula is
ALIGN(sz, PAGE_SIZE), to prevent two packets from residing on the same
page.

The assumption behind the current code is that sz <= PAGE_SIZE holds for
all cases with XDP program set.

This is true because it is being called from:
- 3 times from Striding RQ flows, in which XDP is not supported for such
large packets.
- 1 time from Legacy RQ flow, under the condition
mlx5e_rx_is_linear_skb().

No functional change here, just removing the implied assumption in
preparation for supporting XDP multi-buffer in Striding RQ.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Let XDP checker function get the params as input

Change mlx5e_xdp_allowed() so it gets the params structure with the
xdp_prog applied, rather than creating a local copy based on the current
params in priv.

This reduces the amount of memory on the stack, and acts on the exact
params instance that's about to be applied.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Improve Striding RQ check with XDP

Non-linear mem scheme of Striding RQ does not yet support XDP at this
point. Take the check where it belongs, inside the params validation
function mlx5e_params_validate_xdp().

Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Add support for multi-buffer XDP redirect-in

Handle multi-buffer XDP redirect-in requests coming through
mlx5e_xdp_xmit.

Extend struct mlx5e_xmit_data_frags with an additional dma_arr field, to
point to the fragments dma mapping, as they cannot be retrieved via the
page_pool_get_dma_addr() function.

Push a dma_addr xdpi instance per each fragment, and use them in the
completion flow to dma_unmap the frags.

Finally, remove the restriction in mlx5e_open_xdpsq, and set the flag in
xdp_features.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Use multiple single-entry objects in xdpi_fifo

Here we fix the current wi->num_pkts abuse, as it was used to indicate
multiple xdpi entries in the xdpi_fifo.

Instead, reduce mlx5e_xdp_info to the size of a single field, making it
a union of unions. Per packet, use as many instances as needed to
provide the information needed at the time of completion.

The sequence of xdpi instances pushed is well defined, derived by the
xmit_mode.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: XDP, Remove doubtful unlikely calls

It is not likely nor unlikely that the xdp buff has fragments, it
depends on the program loaded and size of the packet received.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Introduce extended version for mlx5e_xmit_data

Introduce struct mlx5e_xmit_data_frags to be used for non-linear xmit
buffers. Let it include sinfo pointer.

Take one bit from the len field to indicate if the descriptor has
fragments and can be casted-up into the extended version.

Zero-init to make sure has_frags, and potentially future fields, are
zero when not explicitly assigned.

Another field will be added in a downstream patch to indicate and point
to dma addresses of the different frags, for redirect-in requests.

This simplifies the mlx5e_xmit_xdp_frame/mlx5e_xmit_xdp_frame_mpwqe
functions params.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Move struct mlx5e_xmit_data to datapath header

Move TX datapath struct from the generic en.h to the datapath txrx.h
header, where it belongs.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Move XDP struct and enum to XDP header

Move struct mlx5e_xdp_info and enum mlx5e_xdp_xmit_mode from the generic
en.h to the XDP header, where they belong.

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: stmmac: dwmac-sti: remove stih415/stih416/stid127

Remove no more supported platforms (stih415/stih416 and stid127)

Signed-off-by: Alain Volmat <avolmat@me.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230416195523.61075-1-avolmat@me.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: remove incompatible prototypes

The types for the register argument changed recently, but there are
still incompatible prototypes that got left behind, and gcc-13 warns
about these:

In file included from drivers/net/ethernet/mscc/ocelot.c:13:
drivers/net/ethernet/mscc/ocelot.h:97:5: error: conflicting types for 'ocelot_port_readl' due to enum/integer mismatch; have 'u32(struct ocelot_port *, u32)' {aka 'unsigned int(struct ocelot_port *, unsigned int)'} [-Werror=enum-int-mismatch]
97 | u32 ocelot_port_readl(struct ocelot_port *port, u32 reg);
| ^~~~~~~~~~~~~~~~~

Just remove the two prototypes, and rely on the copy in the global
header.

Fixes: 9ecd05794b8d ("net: mscc: ocelot: strengthen type of "u32 reg" in I/O accessors")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230417205531.1880657-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: propagate feature flags to vlan

stmmac_dev_probe doesn't propagate feature flags to VLANs. So features
like offloading don't correspond with the general features and it's not
possible to manipulate features via ethtool -K to affect VLANs.

Propagate feature flags to vlan features. Drop TSO feature because
it does not work on VLANs yet.

Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Link: https://lore.kernel.org/r/20230417192845.590034-1-vinschen@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: add software tx timestamping support

Currently, bonding only obtain the timestamp (ts) information of
the active slave, which is available only for modes 1, 5, and 6.
For other modes, bonding only has software rx timestamping support.

However, some users who use modes such as LACP also want tx timestamp
support. To address this issue, let's check the ts information of each
slave. If all slaves support tx timestamping, we can enable tx
timestamping support for the bond.

Add a note that the get_ts_info may be called with RCU, or rtnl or
reference on the device in ethtool.h>

Suggested-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/20230418034841.2566262-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-ethernet-driver-for-starfive-jh7110-soc'

Samin Guo says:

====================
Add Ethernet driver for StarFive JH7110 SoC

This series adds ethernet support for the StarFive JH7110 RISC-V SoC,
which includes a dwmac-5.20 MAC driver (from Synopsys DesignWare).
This series has been tested and works fine on VisionFive-2 v1.2A and
v1.3B SBC boards.

For more information and support, you can visit RVspace wiki[1].
You can simply review or test the patches at the link [2].
This patchset should be applied after the patchset [3] [4].

[1]: https://wiki.rvspace.org/
[2]: https://github.com/saminGuo/linux/tree/vf2-6.3rc4-gmac-net-next
[3]: https://patchwork.kernel.org/project/linux-riscv/cover/20230401111934.130844-1-hal.feng@starfivetech.com
[4]: https://patchwork.kernel.org/project/linux-riscv/cover/20230315055813.94740-1-william.qiu@starfivetech.com
====================

Link: https://lore.kernel.org/r/20230417100251.11871-1-samin.guo@starfivetech.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: dwmac-starfive: Add phy interface settings

dwmac supports multiple modess. When working under rmii and rgmii,
you need to set different phy interfaces.

According to the dwmac document, when working in rmii, it needs to be
set to 0x4, and rgmii needs to be set to 0x1.

The phy interface needs to be set in syscon, the format is as follows:
starfive,syscon: <&syscon, offset, shift>

Tested-by: Tommaso Merciai <tomm.merciai@gmail.com>
Signed-off-by: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: Add glue layer for StarFive JH7110 SoC

This adds StarFive dwmac driver support on the StarFive JH7110 SoC.

Tested-by: Tommaso Merciai <tomm.merciai@gmail.com>
Co-developed-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: Add support StarFive dwmac

Add documentation to describe StarFive dwmac driver(GMAC).

Signed-off-by: Yanhong Wang <yanhong.wang@starfivetech.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: snps,dwmac: Add 'ahb' reset/reset-name

According to:
stmmac_platform.c: stmmac_probe_config_dt
stmmac_main.c: stmmac_dvr_probe

dwmac controller may require one (stmmaceth) or two (stmmaceth+ahb)
reset signals, and the maxItems of resets/reset-names is going to be 2.

The gmac of Starfive Jh7110 SOC must have two resets.
it uses snps,dwmac-5.20 IP.

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: platform: Add snps,dwmac-5.20 IP compatible string

Add "snps,dwmac-5.20" compatible string for 5.20 version that can avoid
to define some platform data in the glue layer.

Tested-by: Tommaso Merciai <tomm.merciai@gmail.com>
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: snps,dwmac: Add dwmac-5.20 version

Add dwmac-5.20 IP version to snps.dwmac.yaml

Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Samin Guo <samin.guo@starfivetech.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'r8169-use-new-macros-from-netdev_queues-h'

Heiner Kallweit says:

====================
r8169: use new macros from netdev_queues.h

Add one missing subqueue version of the macros, and use the new macros
in r8169 to simplify the code.
====================

Link: https://lore.kernel.org/r/7147a001-3d9c-a48d-d398-a94c666aa65b@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

r8169: use new macro netif_subqueue_completed_wake in the tx cleanup path

Use new net core macro netif_subqueue_completed_wake to simplify
the code of the tx cleanup path.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

r8169: use new macro netif_subqueue_maybe_stop in rtl8169_start_xmit

Use new net core macro netif_subqueue_maybe_stop in the start_xmit path
to simplify the code. Whilst at it, set the tx queue start threshold to
twice the stop threshold. Before values were the same, resulting in
stopping/starting the queue more often than needed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: add macro netif_subqueue_completed_wake

Add netif_subqueue_completed_wake, complementing the subqueue versions
netif_subqueue_try_stop and netif_subqueue_maybe_stop.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'ocelot-felix-driver-support-for-preemptible-traffic-classes'

Vladimir Oltean says:

====================
Ocelot/Felix driver support for preemptible traffic classes

The series "Add tc-mqprio and tc-taprio support for preemptible traffic
classes" from:
https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/

was eventually submitted in a form without the support for the
Ocelot/Felix switch driver. This patch set picks up that work again,
and presents a fairly modified form compared to the original.
====================

Link: https://lore.kernel.org/r/20230415170551.3939607-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: add support for preemptible traffic classes

In order to not transmit (preemptible) frames which will be received by
the link partner as corrupted (because it doesn't support FP), the
hardware requires the driver to program the QSYS_PREEMPTION_CFG_P_QUEUES
register only after the MAC Merge layer becomes active (verification
succeeds, or was disabled).

There are some cases when FP is known (through experimentation) to be
broken. Give priority to FP over cut-through switching, and disable FP
for known broken link modes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: felix: act upon the mqprio qopt in taprio offload

The mqprio queue configuration can appear either through
TC_SETUP_QDISC_MQPRIO or through TC_SETUP_QDISC_TAPRIO. Make sure both
are treated in the same way.

Code does nothing new for now (except for rejecting multiple TXQs per
TC, which is a useless concept with DSA switches).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: add support for mqprio offload

This doesn't apply anything to hardware and in general doesn't do
anything that the software variant doesn't do, except for checking that
there isn't more than 1 TXQ per TC (TXQs for a DSA switch are a dubious
concept anyway). The reason we add this is to be able to parse one more
field added to struct tc_mqprio_qopt_offload, namely preemptible_tcs.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ferenc Fejes <fejes@inf.elte.hu>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: don't rely on cached verify_status in ocelot_port_get_mm()

ocelot_mm_update_port_status() updates mm->verify_status, but when the
verification state of a port changes, an IRQ isn't emitted, but rather,
only when the verification state reaches one of the final states (like
DISABLED, FAILED, SUCCEEDED) - things that would affect mm->tx_active,
which is what the IRQ *is* actually emitted for.

That is to say, user space may miss reports of an intermediary MAC Merge
verification state (like from INITIAL to VERIFYING), unless there was an
IRQ notifying the driver of the change in mm->tx_active as well.

This is not a huge deal, but for reliable reporting to user space, let's
call ocelot_mm_update_port_status() synchronously from
ocelot_port_get_mm(), which makes user space see the current MM status.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: optimize ocelot_mm_irq()

The MAC Merge IRQ of all ports is shared with the PTP TX timestamp IRQ
of all ports, which means that currently, when a PTP TX timestamp is
generated, felix_irq_handler() also polls for the MAC Merge layer status
of all ports, looking for changes. This makes the kernel do more work,
and under certain circumstances may make ptp4l require a
tx_timestamp_timeout argument higher than before.

Changes to the MAC Merge layer status are only to be expected under
certain conditions - its TX direction needs to be enabled - so we can
check early if that is the case, and omit register access otherwise.

Make ocelot_mm_update_port_status() skip register access if
mm->tx_enabled is unset, and also call it once more, outside IRQ
context, from ocelot_port_set_mm(), when mm->tx_enabled transitions from
true to false, because an IRQ is also expected in that case.

Also, a port may have its MAC Merge layer enabled but it may not have
generated the interrupt. In that case, there's no point in writing to
DEV_MM_STATUS to acknowledge that IRQ. We can reduce the number of
register writes per port with MM enabled by keeping an "ack" variable
which writes the "write-one-to-clear" bits. Those are 3 in number:
PRMPT_ACTIVE_STICKY, UNEXP_RX_PFRM_STICKY and UNEXP_TX_PFRM_STICKY.
The other fields in DEV_MM_STATUS are read-only and it doesn't matter
what is written to them, so writing zero is just fine.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: remove struct ocelot_mm_state :: lock

Unfortunately, the workarounds for the hardware bugs make it pointless
to keep fine-grained locking for the MAC Merge state of each port.

Our vsc9959_cut_through_fwd() implementation requires
ocelot->fwd_domain_lock to be held, in order to serialize with changes
to the bridging domains and to port speed changes (which affect which
ports can be cut-through). Simultaneously, the traffic classes which can
be cut-through cannot be preemptible at the same time, and this will
depend on the MAC Merge layer state (which changes from threaded
interrupt context).

Since vsc9959_cut_through_fwd() would have to hold the mm->lock of all
ports for a correct and race-free implementation with respect to
ocelot_mm_irq(), in practice it means that any time a port's mm->lock is
held, it would potentially block holders of ocelot->fwd_domain_lock.

In the interest of simple locking rules, make all MAC Merge layer state
changes (and preemptible traffic class changes) be serialized by the
ocelot->fwd_domain_lock.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: mscc: ocelot: export a single ocelot_mm_irq()

When the switch emits an IRQ, we don't know what caused it, and we
iterate through all ports to check the MAC Merge status.

Move that iteration inside the ocelot lib; we will change the locking in
a future change and it would be good to encapsulate that lock completely
within the ocelot lib.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'xdp-rx-hwts-metadata-for-stmmac-driver'

Song Yoong Siang says:

====================
XDP Rx HWTS metadata for stmmac driver

Implemented XDP receive hardware timestamp metadata for stmmac driver.

This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata.
Below are the test steps and results.

Command on DUT:
sudo ./xdp_hw_metadata <interface name>

Command on Link Partner:
echo -n xdp | nc -u -q1 <destination IPv4 addr> 9091
echo -n skb | nc -u -q1 <destination IPv4 addr> 9092

Result for port 9091:
poll: 1 (0) skip=1 fail=0 redir=1
xsk_ring_cons__peek: 1
0x55f69f65f6d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000
rx_timestamp: 1677762069053692631
No rx_hash err=-95
0x55f69f65f6d0: complete idx=8 addr=8000

Result for port 9092:
poll: 1 (0) skip=2 fail=0 redir=1
found skb hwtstamp = 1677762071.937207680
====================

Link: https://lore.kernel.org/r/20230415064503.3225835-1-yoong.siang.song@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add Rx HWTS metadata to XDP ZC receive pkt

Add receive hardware timestamp metadata support via kfunc to XDP Zero Copy
receive packets.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: add Rx HWTS metadata to XDP receive pkt

Add receive hardware timestamp metadata support via kfunc to XDP receive
packets.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: introduce wrapper for struct xdp_buff

Introduce struct stmmac_xdp_buff as a preparation to support XDP Rx
metadata via kfuncs.

Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'support-tunnel-mode-in-mlx5-ipsec-packet-offload'

Leon Romanovsky says:

====================
Support tunnel mode in mlx5 IPsec packet offload

This series extends mlx5 to support tunnel mode in its IPsec packet
offload implementation.

v0: https://lore.kernel.org/all/cover.1681106636.git.leonro@nvidia.com
====================

Link: https://lore.kernel.org/r/cover.1681388425.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Accept tunnel mode for IPsec packet offload

Open mlx5 driver to accept IPsec tunnel mode.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Create IPsec table with tunnel support only when encap is disabled

Current hardware doesn't support double encapsulation which is
happening when IPsec packet offload tunnel mode is configured
together with eswitch encap option.

Any user attempt to add new SA/policy after he/she sets encap mode, will
generate the following FW syndrome:

mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 1904): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xa43321), err(-22)

Make sure that we block encap changes before creating flow steering tables.
This is applicable only for packet offload in tunnel mode, while packet
offload in transport mode and crypto offload, don't have such limitation
as they don't perform encapsulation.

Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Allow blocking encap changes in eswitch

Existing eswitch encap option enables header encapsulation. Unfortunately
currently available hardware isn't able to perform double encapsulation,
which can happen once IPsec packet offload tunnel mode is used together
with encap mode set to BASIC.

So as a solution for misconfiguration, provide an option to block encap
changes, which will be used for IPsec packet offload.

Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel mode

In IPsec packet offload mode all header manipulations are performed by
hardware, which is responsible to add/remove L2 header with source and
destinations MACs.

CX-7 devices don't support offload of in-kernel routing functionality,
as such HW needs external help to fill other side MAC as it isn't
available for HW.

As a solution, let's listen to neigh ARP updates and reconfigure IPsec
rules on the fly once new MAC data information arrives.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Support IPsec TX packet offload in tunnel mode

Extend mlx5 driver with logic to support IPsec TX packet offload
in tunnel mode.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Support IPsec RX packet offload in tunnel mode

Extend mlx5 driver with logic to support IPsec RX packet offload
in tunnel mode.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Prepare IPsec packet reformat code for tunnel mode

Refactor setup_pkt_reformat() function to accommodate future extension
to support tunnel mode.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Configure IPsec SA tables to support tunnel mode

Create SA flow steering tables both for RX and TX with tunnel reformat
property. This allows to add and delete extra headers needed for tunnel
mode.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Check IPsec packet offload tunnel capabilities

Validate tunnel mode support for IPsec packet offload.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Add IPsec packet offload tunnel bits

Extend packet reformat types and flow table capabilities with
IPsec packet offload tunnel bits.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>