linux-2.6-block.git
Uri Arev [Fri, 5 Apr 2024 21:42:24 +0000 (00:42 +0300)]
Bluetooth: ath3k: Fix multiple issues reported by checkpatch.pl

This fixes some CHECKs reported by the checkpatch script.

Issues reported in ath3k.c:
-------
ath3k.c
-------
CHECK: Please don't use multiple blank lines
+
+

CHECK: Blank lines aren't necessary after an open brace '{'
+static const struct usb_device_id ath3k_blist_tbl[] = {
+

CHECK: Alignment should match open parenthesis
+static int ath3k_load_firmware(struct usb_device *udev,
+                               const struct firmware *firmware)

CHECK: Alignment should match open parenthesis
+               err = usb_bulk_msg(udev, pipe, send_buf, size,
+                                       &len, 3000);

CHECK: Unnecessary parentheses around 'len != size'
+               if (err || (len != size)) {

CHECK: Alignment should match open parenthesis
+static int ath3k_get_version(struct usb_device *udev,
+                       struct ath3k_version *version)

CHECK: Alignment should match open parenthesis
+static int ath3k_load_fwfile(struct usb_device *udev,
+               const struct firmware *firmware)

CHECK: Alignment should match open parenthesis
+               err = usb_bulk_msg(udev, pipe, send_buf, size,
+                                       &len, 3000);

CHECK: Unnecessary parentheses around 'len != size'
+               if (err || (len != size)) {

CHECK: Blank lines aren't necessary after an open brace '{'
+       switch (fw_version.ref_clock) {
+

CHECK: Alignment should match open parenthesis
+       snprintf(filename, ATH3K_NAME_LEN, "ar3k/ramps_0x%08x_%d%s",
+               le32_to_cpu(fw_version.rom_version), clk_value, ".dfu");

CHECK: Alignment should match open parenthesis
+static int ath3k_probe(struct usb_interface *intf,
+                       const struct usb_device_id *id)

CHECK: Alignment should match open parenthesis
+                       BT_ERR("Firmware file \"%s\" not found",
+                                                       ATH3K_FIRMWARE);

CHECK: Alignment should match open parenthesis
+               BT_ERR("Firmware file \"%s\" request failed (err=%d)",
+                                               ATH3K_FIRMWARE, ret);

total: 0 errors, 0 warnings, 14 checks, 540 lines checked

Signed-off-by: Uri Arev <me@wantyapps.xyz>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Hans de Goede [Sat, 6 Apr 2024 13:51:06 +0000 (15:51 +0200)]
Bluetooth: hci_bcm: Limit bcm43455 baudrate to 2000000

Like the bcm43430a0 the bcm43455 BT does not support the 0xfc45 command
to set the UART clock to 48 MHz and because of this it does not work
at 4000000 baud.

These chips are found on ACPI/x86 devices where the operating baudrate
does not come from the firmware but is hardcoded at 4000000, which does
not work.

Make the driver_data for the "BCM2EA4" ACPI HID which is used for
the bcm43455 BT point to bcm43430_device_data which limits the baudrate
to 2000000.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Gustavo A. R. Silva [Wed, 27 Mar 2024 16:23:51 +0000 (10:23 -0600)]
Bluetooth: L2CAP: Avoid -Wflex-array-member-not-at-end warnings

-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.

There are currently a couple of objects (`req` and `rsp`), in a couple
of structures, that contain flexible structures (`struct l2cap_ecred_conn_req`
and `struct l2cap_ecred_conn_rsp`), for example:

struct l2cap_ecred_rsp_data {
        struct {
                struct l2cap_ecred_conn_rsp rsp;
                __le16 scid[L2CAP_ECRED_MAX_CID];
        } __packed pdu;
        int count;
};

in the struct above, `struct l2cap_ecred_conn_rsp` is a flexible
structure:

struct l2cap_ecred_conn_rsp {
        __le16 mtu;
        __le16 mps;
        __le16 credits;
        __le16 result;
        __le16 dcid[];
};

So, in order to avoid ending up with a flexible-array member in the
middle of another structure, we use the `struct_group_tagged()` (and
`__struct_group()` when the flexible structure is `__packed`) helper
to separate the flexible array from the rest of the members in the
flexible structure:

struct l2cap_ecred_conn_rsp {
        struct_group_tagged(l2cap_ecred_conn_rsp_hdr, hdr,

... the rest of members

        );
        __le16 dcid[];
};

With the change described above, we now declare objects of the type of
the tagged struct, in this example `struct l2cap_ecred_conn_rsp_hdr`,
without embedding flexible arrays in the middle of other structures:

struct l2cap_ecred_rsp_data {
        struct {
                struct l2cap_ecred_conn_rsp_hdr rsp;
                __le16 scid[L2CAP_ECRED_MAX_CID];
        } __packed pdu;
        int count;
};

Also, when the flexible-array member needs to be accessed, we use
`container_of()` to retrieve a pointer to the flexible structure.
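The layout trick described above can be sketched in user space. This is only an approximation under stated assumptions: the anonymous union stands in for what `struct_group_tagged()` generates, `container_of()` is re-implemented locally, and the shortened names (`conn_rsp`, `conn_rsp_hdr`, `rsp_data`, `rsp_dcid`, `MAX_CID`) are illustrative rather than the kernel's. The `__packed` attribute is left out because every member is a 16-bit field.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_CID 5	/* illustrative stand-in for L2CAP_ECRED_MAX_CID */

/* The fixed-size part on its own: safe to embed anywhere. */
struct conn_rsp_hdr {
	uint16_t mtu;
	uint16_t mps;
	uint16_t credits;
	uint16_t result;
};

/*
 * The flexible structure. The anonymous union approximates the grouping:
 * the same members are reachable directly (rsp->mtu) and through the
 * tagged header (rsp->hdr.mtu).
 */
struct conn_rsp {
	union {
		struct {
			uint16_t mtu;
			uint16_t mps;
			uint16_t credits;
			uint16_t result;
		};
		struct conn_rsp_hdr hdr;
	};
	uint16_t dcid[];
};

/* Only the header is embedded, so no flexible array sits mid-structure. */
struct rsp_data {
	struct {
		struct conn_rsp_hdr rsp;
		uint16_t scid[MAX_CID];
	} pdu;
	int count;
};

/* Recover the flexible structure from its header to reach dcid[]. */
static uint16_t *rsp_dcid(struct conn_rsp_hdr *hdr)
{
	struct conn_rsp *rsp = container_of(hdr, struct conn_rsp, hdr);

	return rsp->dcid;
}
```

In this sketch `rsp_dcid(&data.pdu.rsp)` points at `data.pdu.scid`, because the header occupies exactly the bytes that precede the flexible array.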

We also use the `DEFINE_RAW_FLEX()` helper for a couple of on-stack
definitions of a flexible structure where the size of the flexible-array
member is known at compile-time.
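For the on-stack case, the idea behind such a helper can be approximated as follows. This is a simplified user-space sketch, not the kernel macro itself: `FLEX_ON_STACK`, `struct conn_req` and `sum_fixed_cids()` are hypothetical names, and the byte array in the union only reserves room for a compile-time number of trailing elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flexible structure (not the kernel definition). */
struct conn_req {
	uint16_t psm;
	uint16_t mtu;
	uint16_t scid[];	/* flexible-array member */
};

/*
 * Reserve stack storage for `type` plus `count` trailing elements and
 * declare `name` as a pointer to it. `count` must be a constant.
 */
#define FLEX_ON_STACK(type, name, member, count)			\
	union {								\
		unsigned char bytes[sizeof(type) +			\
			(count) * sizeof(((type *)0)->member[0])];	\
		type obj;						\
	} name##_u = { { 0 } };						\
	type *name = &name##_u.obj

/* Fill three channel IDs in the on-stack object and add them up. */
static unsigned int sum_fixed_cids(void)
{
	FLEX_ON_STACK(struct conn_req, req, scid, 3);
	unsigned int i, sum = 0;

	req->psm = 0x0080;
	for (i = 0; i < 3; i++)
		req->scid[i] = 0x0040 + i;
	for (i = 0; i < 3; i++)
		sum += req->scid[i];

	return sum;
}
```

The union is what keeps the flexible structure from being embedded in the middle of another structure: the extra bytes live next to it, not after it in a wrapper struct.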

So, with these changes, fix the following warnings:
net/bluetooth/l2cap_core.c:1260:45: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/l2cap_core.c:3740:45: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/l2cap_core.c:4999:45: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]
net/bluetooth/l2cap_core.c:7116:47: warning: structure containing a
flexible array member is not at the end of another structure
[-Wflex-array-member-not-at-end]

Link: https://github.com/KSPP/linux/issues/202
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Uri Arev [Tue, 2 Apr 2024 18:37:45 +0000 (21:37 +0300)]
Bluetooth: hci_intel: Fix multiple issues reported by checkpatch.pl

This fixes the following CHECKs, WARNINGs, and ERRORs reported in
hci_intel.c

Reported by checkpatch.pl:
-----------
hci_intel.c
-----------
WARNING: Prefer using '"%s...", __func__' to using 'intel_setup', this
        function's name, in a string
+       bt_dev_dbg(hdev, "start intel_setup");

ERROR: code indent should use tabs where possible
+        /* Check for supported iBT hardware variants of this firmware$

ERROR: code indent should use tabs where possible
+         * loading method.$

ERROR: code indent should use tabs where possible
+         *$

ERROR: code indent should use tabs where possible
+         * This check has been put in place to ensure correct forward$

ERROR: code indent should use tabs where possible
+         * compatibility options when newer hardware variants come along.$

ERROR: code indent should use tabs where possible
+         */$

CHECK: No space is necessary after a cast
+       duration = (unsigned long long) ktime_to_ns(delta) >> 10;

CHECK: No space is necessary after a cast
+       duration = (unsigned long long) ktime_to_ns(delta) >> 10;

WARNING: Missing a blank line after declarations
+               int err = PTR_ERR(intel->rx_skb);
+               bt_dev_err(hu->hdev, "Frame reassembly failed (%d)", err);

Signed-off-by: Uri Arev <me@wantyapps.xyz>
Suggested-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Iulia Tanasescu [Tue, 2 Apr 2024 11:39:31 +0000 (14:39 +0300)]
Bluetooth: ISO: Handle PA sync when no BIGInfo reports are generated

In case of a Broadcast Source that has PA enabled but no active BIG,
a Broadcast Sink needs to establish PA sync and parse BASE from PA
reports.

This commit moves the allocation of a PA sync hcon from the BIGInfo
advertising report event to the PA sync established event. After the
first complete PA report, the hcon is notified to the ISO layer. A
child socket is allocated and enqueued in the parent's accept queue.

BIGInfo reports also need to be processed, to extract the encryption
field and inform userspace. After the first BIGInfo report is received,
the PA sync hcon is notified again to the ISO layer. Since a socket will
be found this time, the socket state will transition to BT_CONNECTED and
the userspace will be woken up using sk_state_change.

Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Iulia Tanasescu [Tue, 2 Apr 2024 11:39:30 +0000 (14:39 +0300)]
Bluetooth: ISO: Make iso_get_sock_listen generic

This makes iso_get_sock_listen more generic, so that it returns a
matching socket in the state provided as an argument.

Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Thu, 28 Mar 2024 21:40:53 +0000 (17:40 -0400)]
Bluetooth: hci_event: Set DISCOVERY_FINDING on SCAN_ENABLED

This makes sure that the discovery state is properly synchronized;
otherwise, reports may not generate MGMT DeviceFound events, as it would
be assumed that the scan was not initiated by a discovery session.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Luiz Augusto von Dentz [Thu, 28 Mar 2024 19:46:01 +0000 (15:46 -0400)]
Bluetooth: Add proper definitions for scan interval and window

This adds proper definitions for scan interval and window and then makes
use of them instead of their raw values.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Uwe Kleine-König [Mon, 11 Mar 2024 21:49:54 +0000 (22:49 +0100)]
Bluetooth: hci_intel: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
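The reasoning above can be illustrated with a small user-space model. It is a sketch under stated assumptions, not driver-core code: `toy_driver`, `toy_device`, `toy_unbind()` and the two callbacks are hypothetical names that only mimic how a non-zero return from the legacy callback is warned about and then discarded.

```c
#include <errno.h>
#include <stdio.h>

struct toy_device {
	int bound;
};

struct toy_driver {
	int  (*remove)(struct toy_device *dev);		/* legacy: result is ignored */
	void (*remove_new)(struct toy_device *dev);	/* cannot signal failure */
};

/*
 * Model of the core's unbind path. Returns how many error codes were
 * thrown away; the device is unbound in every case.
 */
static int toy_unbind(const struct toy_driver *drv, struct toy_device *dev)
{
	int ignored = 0;

	if (drv->remove_new) {
		drv->remove_new(dev);
	} else if (drv->remove) {
		int ret = drv->remove(dev);

		if (ret) {
			fprintf(stderr, "remove returned %d; ignored\n", ret);
			ignored = 1;
		}
	}

	dev->bound = 0;
	return ignored;
}

/* A legacy callback that wrongly tries to refuse the removal. */
static int legacy_remove(struct toy_device *dev)
{
	(void)dev;
	return -EBUSY;
}

/* The converted callback: the signature makes the contract explicit. */
static void converted_remove(struct toy_device *dev)
{
	(void)dev;
}
```

With the void variant there is no error code to lose, so any cleanup has to complete inside the callback.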

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Uwe Kleine-König [Mon, 11 Mar 2024 21:49:53 +0000 (22:49 +0100)]
Bluetooth: hci_bcm: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Uwe Kleine-König [Mon, 11 Mar 2024 21:49:52 +0000 (22:49 +0100)]
Bluetooth: btqcomsmd: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Ian W MORRISON [Fri, 15 Mar 2024 07:48:08 +0000 (18:48 +1100)]
Bluetooth: Add support for MediaTek MT7922 device

This patch adds support for the MediaTek MT7922 Bluetooth device.

The information in /sys/kernel/debug/usb/devices about the MT7922
is as follows:

T:  Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480  MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=13d3 ProdID=3585 Rev= 1.00
S:  Manufacturer=MediaTek Inc.
S:  Product=Wireless_Device
S:  SerialNumber=000000000
C:* #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=100mA
A:  FirstIf#= 0 IfCount= 3 Cls=e0(wlcon) Sub=01 Prot=01
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=125us
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
I:  If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
I:  If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
I:  If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
I:  If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
I:  If#= 1 Alt= 6 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
E:  Ad=83(I) Atr=01(Isoc) MxPS=  63 Ivl=1ms
E:  Ad=03(O) Atr=01(Isoc) MxPS=  63 Ivl=1ms
I:* If#= 2 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E:  Ad=8a(I) Atr=03(Int.) MxPS=  64 Ivl=125us
E:  Ad=0a(O) Atr=03(Int.) MxPS=  64 Ivl=125us
I:  If#= 2 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E:  Ad=8a(I) Atr=03(Int.) MxPS= 512 Ivl=125us
E:  Ad=0a(O) Atr=03(Int.) MxPS= 512 Ivl=125us

Signed-off-by: Ian W MORRISON <ianwmorrison@live.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Mon, 11 Mar 2024 08:46:26 +0000 (14:16 +0530)]
Bluetooth: btintel: Add support to download intermediate loader

Some variants of Intel controllers, like BlazarI, support downloading an
Intermediate bootloader (IML) image. The IML gives flexibility to fix issues,
as it is not possible to fix issues in the Primary bootloader once it is flashed
to ROM. This patch adds support for downloading the IML before downloading the
operational firmware image.

dmesg logs:
[13.399003] Bluetooth: Core ver 2.22
[13.399006] Bluetooth: Starting self testing
[13.401194] Bluetooth: ECDH test passed in 2135 usecs
[13.421175] Bluetooth: SMP test passed in 597 usecs
[13.421184] Bluetooth: Finished self testing
[13.422919] Bluetooth: HCI device and connection manager initialized
[13.422923] Bluetooth: HCI socket layer initialized
[13.422925] Bluetooth: L2CAP socket layer initialized
[13.422930] Bluetooth: SCO socket layer initialized
[13.458065] Bluetooth: hci0: Device revision is 0
[13.458071] Bluetooth: hci0: Secure boot is disabled
[13.458072] Bluetooth: hci0: OTP lock is disabled
[13.458072] Bluetooth: hci0: API lock is enabled
[13.458073] Bluetooth: hci0: Debug lock is disabled
[13.458073] Bluetooth: hci0: Minimum firmware build 1 week 10 2014
[13.458075] Bluetooth: hci0: Bootloader timestamp 2022.46 buildtype 1 build 26590
[13.458324] Bluetooth: hci0: DSM reset method type: 0x00
[13.460678] Bluetooth: hci0: Found device firmware: intel/ibt-0090-0291-iml.sfi
[13.460684] Bluetooth: hci0: Boot Address: 0x30099000
[13.460685] Bluetooth: hci0: Firmware Version: 227-11.24
[13.562554] Bluetooth: hci0: Waiting for firmware download to complete
[13.563023] Bluetooth: hci0: Firmware loaded in 99941 usecs
[13.563057] Bluetooth: hci0: Waiting for device to boot
[13.565029] Bluetooth: hci0: Malformed MSFT vendor event: 0x02
[13.565148] Bluetooth: hci0: Device booted in 2064 usecs
[13.567065] Bluetooth: hci0: No device address configured
[13.569010] Bluetooth: hci0: Found device firmware: intel/ibt-0090-0291.sfi
[13.569061] Bluetooth: hci0: Boot Address: 0x10000800
[13.569062] Bluetooth: hci0: Firmware Version: 227-11.24
[13.788891] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[13.788897] Bluetooth: BNEP filters: protocol multicast
[13.788902] Bluetooth: BNEP socket layer initialized
[15.435905] Bluetooth: hci0: Waiting for firmware download to complete
[15.436016] Bluetooth: hci0: Firmware loaded in 1823233 usecs
[15.436258] Bluetooth: hci0: Waiting for device to boot
[15.471140] Bluetooth: hci0: Device booted in 34277 usecs
[15.471201] Bluetooth: hci0: Malformed MSFT vendor event: 0x02
[15.471487] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-0090-0291.ddc
[15.474353] Bluetooth: hci0: Applying Intel DDC parameters completed
[15.474486] Bluetooth: hci0: Found Intel DDC parameters: intel/bdaddress.cfg
[15.475299] Bluetooth: hci0: Applying Intel DDC parameters completed
[15.479381] Bluetooth: hci0: Firmware timestamp 2024.10 buildtype 3 build 58595
[15.479385] Bluetooth: hci0: Firmware SHA1: 0xb4f3cc46
[15.483243] Bluetooth: hci0: Fseq status: Success (0x00)
[15.483246] Bluetooth: hci0: Fseq executed: 00.00.00.00
[15.483247] Bluetooth: hci0: Fseq BT Top: 00.00.00.00
[15.578712] Bluetooth: MGMT ver 1.22
[15.822682] Bluetooth: RFCOMM TTY layer initialized
[15.822690] Bluetooth: RFCOMM socket layer initialized
[15.822695] Bluetooth: RFCOMM ver 1.11

Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Kiran K [Mon, 11 Mar 2024 08:46:25 +0000 (14:16 +0530)]
Bluetooth: btintel: Define macros for image types

Use macros for image types instead of hard-coded numbers.

Signed-off-by: Kiran K <kiran.k@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Jakub Kicinski [Mon, 13 May 2024 15:41:55 +0000 (08:41 -0700)]
net: revert partially applied PHY topology series

The series is causing issues with PHY drivers built as modules.
Since it was only partially applied and the merge window has
opened let's revert and try again for v6.11.

Revert 6916e461e793 ("net: phy: Introduce ethernet link topology representation")
Revert 0ec5ed6c130e ("net: sfp: pass the phy_device when disconnecting an sfp module's PHY")
Revert e75e4e074c44 ("net: phy: add helpers to handle sfp phy connect/disconnect")
Revert fdd353965b52 ("net: sfp: Add helper to return the SFP bus name")
Revert 841942bc6212 ("net: ethtool: Allow passing a phy index for some commands")

Link: https://lore.kernel.org/all/171242462917.4000.9759453824684907063.git-patchwork-notify@kernel.org/
Link: https://lore.kernel.org/all/20240507102822.2023826-1-maxime.chevallier@bootlin.com/
Link: https://lore.kernel.org/r/20240513154156.104281-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 14 May 2024 01:33:14 +0000 (18:33 -0700)]
Merge branch 'move-est-lock-and-est-structure-to-struct-stmmac_priv'

Xiaolei Wang says:

====================
Move EST lock and EST structure to struct stmmac_priv

1. Pull the mutex protecting the EST structure out of it, to avoid
    clearing the mutex during reinit/memset of the EST structure, and
    reacquire the mutex lock when doing this initialization.

2. Move the EST structure to a more logical location
====================

Link: https://lore.kernel.org/r/20240513014346.1718740-1-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xiaolei Wang [Mon, 13 May 2024 01:43:46 +0000 (09:43 +0800)]
net: stmmac: move the EST structure to struct stmmac_priv

Move the EST structure to struct stmmac_priv. The EST
configuration is not really platform configuration: EST is
enabled at runtime, with settings retrieved at runtime for
the TC TAPRIO feature. It is therefore better to keep the
EST data in the driver's private data than in the platform
data storage.

Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240513014346.1718740-3-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xiaolei Wang [Mon, 13 May 2024 01:43:45 +0000 (09:43 +0800)]
net: stmmac: move the EST lock to struct stmmac_priv

Reinitializing the whole EST structure also resets the mutex
lock embedded in the EST structure, which then triggers
the following warning. To address this, move the lock to struct
stmmac_priv. We also need to reacquire the mutex lock when doing
this initialization.

DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 505 at kernel/locking/mutex.c:587 __mutex_lock+0xd84/0x1068
 Modules linked in:
 CPU: 3 PID: 505 Comm: tc Not tainted 6.9.0-rc6-00053-g0106679839f7-dirty #29
 Hardware name: NXP i.MX8MPlus EVK board (DT)
 pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __mutex_lock+0xd84/0x1068
 lr : __mutex_lock+0xd84/0x1068
 sp : ffffffc0864e3570
 x29: ffffffc0864e3570 x28: ffffffc0817bdc78 x27: 0000000000000003
 x26: ffffff80c54f1808 x25: ffffff80c9164080 x24: ffffffc080d723ac
 x23: 0000000000000000 x22: 0000000000000002 x21: 0000000000000000
 x20: 0000000000000000 x19: ffffffc083bc3000 x18: ffffffffffffffff
 x17: ffffffc08117b080 x16: 0000000000000002 x15: ffffff80d2d40000
 x14: 00000000000002da x13: ffffff80d2d404b8 x12: ffffffc082b5a5c8
 x11: ffffffc082bca680 x10: ffffffc082bb2640 x9 : ffffffc082bb2698
 x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
 x5 : ffffff8178fe0d48 x4 : 0000000000000000 x3 : 0000000000000027
 x2 : ffffff8178fe0d50 x1 : 0000000000000000 x0 : 0000000000000000
 Call trace:
  __mutex_lock+0xd84/0x1068
  mutex_lock_nested+0x28/0x34
  tc_setup_taprio+0x118/0x68c
  stmmac_setup_tc+0x50/0xf0
  taprio_change+0x868/0xc9c
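The failure mode and the fix can be modelled in user space. This is a toy sketch, not the stmmac code: `toy_mutex` imitates only the self-pointer check behind `DEBUG_LOCKS_WARN_ON(lock->magic != lock)`, and every structure and function name here is hypothetical.

```c
#include <string.h>

/* Toy lock with a self-pointer, like the kernel's debug "magic" field. */
struct toy_mutex {
	struct toy_mutex *magic;
	int locked;
};

static void toy_mutex_init(struct toy_mutex *m)
{
	m->magic = m;
	m->locked = 0;
}

/* Returns 0 on success, -1 where the kernel would emit the warning. */
static int toy_mutex_lock(struct toy_mutex *m)
{
	if (m->magic != m)
		return -1;
	m->locked = 1;
	return 0;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	m->locked = 0;
}

/* Before: the lock lives inside the structure that gets wiped. */
struct est_old {
	struct toy_mutex lock;
	int enable;
	unsigned int gcl_size;
};

static int reinit_old(struct est_old *est)
{
	memset(est, 0, sizeof(*est));		/* also clears lock.magic */
	return toy_mutex_lock(&est->lock);	/* fails: lock was destroyed */
}

/* After: the lock sits next to the EST data in the private structure. */
struct est_cfg {
	int enable;
	unsigned int gcl_size;
};

struct toy_priv {
	struct toy_mutex est_lock;
	struct est_cfg est;
};

static int reinit_new(struct toy_priv *priv)
{
	if (toy_mutex_lock(&priv->est_lock))
		return -1;
	memset(&priv->est, 0, sizeof(priv->est));	/* lock is untouched */
	toy_mutex_unlock(&priv->est_lock);
	return 0;
}
```

In the second layout the memset() covers only the data, and it runs with the lock held, which mirrors the two points made in the commit message.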

Fixes: b2aae654a479 ("net: stmmac: add mutex lock to protect est parameters")
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240513014346.1718740-2-xiaolei.wang@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 14 May 2024 01:29:25 +0000 (18:29 -0700)]
Merge branch 'mptcp-small-improvements-fix-and-clean-ups'

Mat Martineau says:

====================
mptcp: small improvements, fix and clean-ups

This series contains mostly unrelated patches:

- The first two patches can be seen as "fixes". They are part of this
  series for -next because it looks like the last batch of fixes for
  v6.9 has already been sent. These fixes are not urgent, so they can
  wait if an unlikely v6.9-rc8 is published. About the two patches:
    - Patch 1 fixes getsockopt(SO_KEEPALIVE) support on MPTCP sockets
    - Patch 2 makes sure the full TCP keep-alive feature is supported,
      not just SO_KEEPALIVE.

- Patch 3 is a small optimisation when getsockopt(MPTCP_INFO) is used
  without buffer, just to check if MPTCP is still being used: no
  fallback to TCP.

- Patch 4 adds net.mptcp.available_schedulers sysctl knob to list packet
  schedulers, similar to net.ipv4.tcp_available_congestion_control.

- Patch 5 and 6 fix CheckPatch warnings: "prefer strscpy over strcpy"
  and "else is not generally useful after a break or return".

- Patch 7 and 8 remove and add header includes to avoid unused ones, and
  add missing ones to be self-contained.
====================

Link: https://lore.kernel.org/r/20240514011335.176158-1-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:32 +0000 (18:13 -0700)]
mptcp: include inet_common in mib.h

This makes the file self-contained: it can be compiled on its own by
analysis tools.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-9-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:31 +0000 (18:13 -0700)]
mptcp: move mptcp_pm_gen.h's include

Nothing from protocol.h depends on mptcp_pm_gen.h, only code from
pm_netlink.c and pm_userspace.c depends on it.

So this include can be moved where it is needed to avoid an "unused
includes" warning.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-8-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:30 +0000 (18:13 -0700)]
mptcp: remove unnecessary else statements

The 'else' statements are not needed here, because their previous 'if'
block ends with a 'return'.

This fixes CheckPatch warnings:

  WARNING: else is not generally useful after a break or return

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-7-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:29 +0000 (18:13 -0700)]
mptcp: prefer strscpy over strcpy

strcpy() performs no bounds checking on the destination buffer. This
could result in linear overflows beyond the end of the buffer, leading
to all kinds of misbehaviors. The safe replacement is strscpy() [1].

This is in preparation of a possible future step where all strcpy() uses
will be removed in favour of strscpy() [2].

This fixes CheckPatch warnings:

  WARNING: Prefer strscpy over strcpy
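To make the difference concrete, here is a user-space sketch of the bounded-copy behaviour. It is an illustration only: `sketch_strscpy()` is a hypothetical name, and the code is written from the documented contract (always NUL-terminate when the size is non-zero, return the copied length, or -E2BIG on truncation or a zero size) rather than taken from the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Copy src into dst, writing at most size bytes including the NUL.
 * Returns the number of characters copied (without the NUL), or
 * -E2BIG if size is 0 or the string had to be truncated.
 */
static ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If the source did not end here, part of it was dropped. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Unlike strcpy(), a destination that is too small yields a truncated, still-terminated string and an error the caller can check.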

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy [1]
Link: https://github.com/KSPP/linux/issues/88 [2]
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-6-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gregory Detal [Tue, 14 May 2024 01:13:28 +0000 (18:13 -0700)]
mptcp: add net.mptcp.available_schedulers

The sysctl lists the available schedulers that can be set using
net.mptcp.scheduler similarly to net.ipv4.tcp_available_congestion_control.

Signed-off-by: Gregory Detal <gregory.detal@gmail.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-5-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:27 +0000 (18:13 -0700)]
mptcp: sockopt: info: stop early if no buffer

Until recently, it has been recommended to use getsockopt(MPTCP_INFO) to
check if a fallback to TCP happened, or if the client requested to use
MPTCP.

In this case, the userspace app is only interested in the value returned
by the getsockopt() call, and can then give 0 for the option length, and
NULL for the buffer address. An easy optimisation is then to stop early,
and avoid filling a local buffer -- which now requires two different
locks -- if it is not needed.
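From the application side, the check described above could look like the following sketch. It assumes a Linux system; `socket_uses_mptcp()` is a hypothetical helper name, and the fallback definitions of `SOL_MPTCP` (284) and `MPTCP_INFO` (1) are only there for older headers that do not provide them.

```c
#include <stddef.h>
#include <sys/socket.h>

#ifndef SOL_MPTCP
#define SOL_MPTCP 284	/* fallback for older libc headers */
#endif
#ifndef MPTCP_INFO
#define MPTCP_INFO 1	/* fallback for older kernel headers */
#endif

/*
 * Returns 1 if the socket still speaks MPTCP, 0 if the call fails
 * (plain TCP socket, fallback to TCP, or invalid descriptor).
 * No buffer is passed: only the return value matters.
 */
static int socket_uses_mptcp(int fd)
{
	socklen_t len = 0;

	return getsockopt(fd, SOL_MPTCP, MPTCP_INFO, NULL, &len) == 0;
}
```

Because the option length is 0 and the buffer is NULL, the kernel has nothing to copy back, which is the case this patch lets it answer early.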

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-4-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:26 +0000 (18:13 -0700)]
mptcp: fix full TCP keep-alive support

SO_KEEPALIVE support was added a while ago, as part of a series
"adding SOL_SOCKET" support. To have full control of this keep-alive
feature, it is important to also support TCP_KEEP* socket options at the
SOL_TCP level.

Supporting them on the setsockopt() part is easy, it is just a matter of
remembering each value in the MPTCP sock structure, and calling
tcp_sock_set_keep*() helpers on each subflow. If the value is not
modified (0), calling these helpers will not do anything. For the
getsockopt() part, the corresponding value from the MPTCP sock structure
or the default one is simply returned. All of this is very similar to
other TCP_* socket options supported by MPTCP.
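For reference, this is roughly what an application does when it wants full control of keep-alive. It is a minimal sketch assuming Linux option names (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`); `set_keepalive()` is a hypothetical helper, and `IPPROTO_TCP` is used as the level, which has the same value as `SOL_TCP`.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable keep-alive and tune it: start probing after idle_s seconds of
 * silence, probe every intvl_s seconds, give up after cnt lost probes.
 * Returns 0 on success, -1 if any option is rejected.
 */
static int set_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)))
		return -1;

	return 0;
}
```

An application written this way treats the four options as one feature, so a socket type that accepts the first call but rejects the others makes the whole helper fail.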

Kernels supporting SO_KEEPALIVE should also support the TCP_KEEP*
options: some apps seem to (wrongly) assume that if the former is
supported, the latter will be too. In addition, the lack of this simple
and isolated change is preventing MPTCP support in some apps and
libraries, such as GoLang [1]. This is why this patch is seen as a fix.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/383
Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Link: https://github.com/golang/go/issues/56539 [1]
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-3-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Matthieu Baerts (NGI0) [Tue, 14 May 2024 01:13:25 +0000 (18:13 -0700)]
mptcp: SO_KEEPALIVE: fix getsockopt support

SO_KEEPALIVE support has to be set on each subflow: on each TCP socket,
where sk_prot->keepalive is defined. Technically, nothing has to be done
on the MPTCP socket. That's why mptcp_sol_socket_sync_intval() was
called instead of mptcp_sol_socket_intval().

Except that when nothing is done on the MPTCP socket, the
getsockopt(SO_KEEPALIVE), handled in net/core/sock.c:sk_getsockopt(),
will not know if SO_KEEPALIVE has been set on the different subflows or
not.

The fix is simple: call mptcp_sol_socket_intval() which will end
up calling net/core/sock.c:sk_setsockopt() where the SOCK_KEEPOPEN flag
will be set, the one used in sk_getsockopt().

So now, getsockopt(SO_KEEPALIVE) on an MPTCP socket will return the same
value as the one previously set with setsockopt(SO_KEEPALIVE).
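
The expected round trip can be checked from userspace along these lines. This
is only a sketch: keepalive_round_trip() is a hypothetical helper, and it works
on any stream socket fd, not just an MPTCP one.

```c
#include <sys/socket.h>

/* Hypothetical helper: set SO_KEEPALIVE, then read it back.
 * Returns the value reported by getsockopt(), or -1 on error. */
static int keepalive_round_trip(int fd, int enable)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
		return -1;
	return val;
}
```

With the fix applied, calling this helper on an MPTCP socket should return the
value that was passed in, as it already does for plain TCP sockets.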

Fixes: 1b3e7ede1365 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Link: https://lore.kernel.org/r/20240514011335.176158-2-martineau@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: mana: Enable MANA driver on ARM64 with 4K page size
Haiyang Zhang [Mon, 13 May 2024 20:29:01 +0000 (13:29 -0700)]
net: mana: Enable MANA driver on ARM64 with 4K page size

Change the Kconfig dependency, so this driver can be built and run on ARM64
with 4K page size.
16/64K page sizes are not supported yet.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1715632141-8089-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: prestera: Add flex arrays to some structs
Erick Archer [Sun, 12 May 2024 16:10:27 +0000 (18:10 +0200)]
net: prestera: Add flex arrays to some structs

The "struct prestera_msg_vtcam_rule_add_req" uses a dynamically sized
set of trailing elements. Specifically, it uses an array of structures
of type "prestera_msg_acl_action actions_msg".

The "struct prestera_msg_flood_domain_ports_set_req" also uses a
dynamically sized set of trailing elements. Specifically, it uses an
array of structures of type "prestera_msg_acl_action actions_msg".

So, use the kernel's preferred way of declaring flexible arrays [1].

At the same time, prepare for the coming implementation by GCC and Clang
of the __counted_by attribute. Flexible array members annotated with
__counted_by can have their accesses bounds-checked at run-time via
CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for
strcpy/memcpy-family functions). In this case, it is important to note
that the attribute used is specifically __counted_by_le since the
counters are of type __le32.

The logic does not need to change since the counters for the flexible
arrays are assigned before any access to the arrays.

The order in which the structure prestera_msg_vtcam_rule_add_req and the
structure prestera_msg_flood_domain_ports_set_req are defined must be
changed to avoid incomplete type errors.

Also, avoid the open-coded arithmetic in memory allocator functions [2]
using the "struct_size" macro.

Moreover, the new structure members also allow us to avoid the open-
coded arithmetic on pointers. So, take advantage of this refactoring
accordingly.
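
The pattern can be illustrated in plain userspace C. This is a sketch under
stated assumptions: the demo_* structures are invented and are not the
prestera ones, demo_struct_size() is only a stand-in for the kernel's
struct_size() helper, and the __le32 counter and __counted_by_le annotation
are left out because they are kernel-specific.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message layout; not the real prestera structures. */
struct demo_action {
	uint32_t id;
	uint32_t arg;
};

struct demo_rule_req {
	uint32_t n_act;               /* element count for actions[] */
	struct demo_action actions[]; /* flexible array member */
};

/* Userspace stand-in for struct_size(): header plus n trailing elements,
 * saturating to SIZE_MAX if the arithmetic would overflow. */
static size_t demo_struct_size(size_t n)
{
	if (n > (SIZE_MAX - sizeof(struct demo_rule_req)) /
		sizeof(struct demo_action))
		return SIZE_MAX;
	return sizeof(struct demo_rule_req) + n * sizeof(struct demo_action);
}

/* Allocate a zeroed request with room for n_act actions. The counter is
 * assigned before the caller gets a chance to touch actions[]. */
static struct demo_rule_req *demo_rule_alloc(uint32_t n_act)
{
	size_t size = demo_struct_size(n_act);
	struct demo_rule_req *req;

	if (size == SIZE_MAX)
		return NULL;
	req = calloc(1, size);
	if (req)
		req->n_act = n_act;
	return req;
}
```

Compared with an open-coded sizeof(*req) + n * sizeof(...), the helper keeps
the overflow check in one place and lets callers index req->actions[i]
directly instead of doing pointer arithmetic past the header.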

This code was detected with the help of Coccinelle, and audited and
modified manually.

Link: https://www.kernel.org/doc/html/next/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments
Signed-off-by: Erick Archer <erick.archer@outlook.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/AS8PR02MB7237E8469568A59795F1F0408BE12@AS8PR02MB7237.eurprd02.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'tcp-support-rstreasons-in-the-passive-logic'
Jakub Kicinski [Tue, 14 May 2024 00:34:10 +0000 (17:34 -0700)]
Merge branch 'tcp-support-rstreasons-in-the-passive-logic'

Jason Xing says:

====================
tcp: support rstreasons in the passive logic

In this series, I split all kinds of reasons into five parts which,
I think, can be easily reviewed. I implement the corresponding
rstreasons in each of those functions. After this, we can trace the
whole TCP passive reset with clear reasons.
====================

Link: https://lore.kernel.org/r/20240510122502.27850-1-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: rstreason: fully support in tcp_check_req()
Jason Xing [Fri, 10 May 2024 12:25:02 +0000 (20:25 +0800)]
tcp: rstreason: fully support in tcp_check_req()

We're going to send an RST due to an invalid SYN packet, which has
already been checked for whether 1) it is in sequence and 2) it is a
retransmitted skb.

As RFC 793 says, if the state of the socket is not CLOSED/LISTEN/SYN-SENT,
then we should send an RST when receiving a bad SYN packet:
"fourth, check the SYN bit,...If the SYN is in the window it is an
error, send a reset"

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: rstreason: handle timewait cases in the receive path
Jason Xing [Fri, 10 May 2024 12:25:01 +0000 (20:25 +0800)]
tcp: rstreason: handle timewait cases in the receive path

There are two possible cases where TCP layer can send an RST. Since they
happen in the same place, I think using one independent reason is enough
to identify this special situation.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: rstreason: fully support in tcp_rcv_state_process()
Jason Xing [Fri, 10 May 2024 12:25:00 +0000 (20:25 +0800)]
tcp: rstreason: fully support in tcp_rcv_state_process()

As the previous patch in this series does, finishing the conversion map
is enough to let the rstreason mechanism work in this function.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: rstreason: fully support in tcp_ack()
Jason Xing [Fri, 10 May 2024 12:24:59 +0000 (20:24 +0800)]
tcp: rstreason: fully support in tcp_ack()

Based on the existing skb drop reason, updating the rstreason map can
help us finish the rstreason job in this function.
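
The idea of a drop-reason to reset-reason map can be sketched as below. All
enum and function names are hypothetical and chosen for illustration; the
kernel's real enums are larger and named differently.

```c
/* Hypothetical drop reasons (illustrative subset only). */
enum demo_drop_reason {
	DEMO_DROP_NOT_SPECIFIED,
	DEMO_DROP_TCP_OLD_ACK,
	DEMO_DROP_TCP_ACK_UNSENT_DATA,
	DEMO_DROP_OTHER,
};

/* Hypothetical reset reasons mirroring the drop reasons above. */
enum demo_rst_reason {
	DEMO_RST_NOT_SPECIFIED,
	DEMO_RST_TCP_OLD_ACK,
	DEMO_RST_TCP_ACK_UNSENT_DATA,
};

/* Convert an existing drop reason into a reset reason. Anything the map
 * does not know about falls back to "not specified". */
static enum demo_rst_reason demo_rst_from_drop(enum demo_drop_reason reason)
{
	switch (reason) {
	case DEMO_DROP_TCP_OLD_ACK:
		return DEMO_RST_TCP_OLD_ACK;
	case DEMO_DROP_TCP_ACK_UNSENT_DATA:
		return DEMO_RST_TCP_ACK_UNSENT_DATA;
	default:
		return DEMO_RST_NOT_SPECIFIED;
	}
}
```

Supporting a new function then amounts to adding cases to the map, provided
the corresponding drop reasons already exist.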

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: rstreason: fully support in tcp_rcv_synsent_state_process()
Jason Xing [Fri, 10 May 2024 12:24:58 +0000 (20:24 +0800)]
tcp: rstreason: fully support in tcp_rcv_synsent_state_process()

In this function, only updating the map can finish the job for socket
reset reason because the corresponding drop reasons are ready.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240510122502.27850-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net-stmmac-add-support-for-rzn1-gmac-devices'
Jakub Kicinski [Tue, 14 May 2024 00:20:03 +0000 (17:20 -0700)]
Merge branch 'net-stmmac-add-support-for-rzn1-gmac-devices'

Romain Gantois says:

====================
net: stmmac: Add support for RZN1 GMAC devices

This is version seven of my series that adds support for a Gigabit Ethernet
controller featured in the Renesas r9a06g032 SoC, of the RZ/N1 family. This
GMAC device is based on a Synopsys IP and is compatible with the stmmac driver.

My former colleague Clément Léger originally sent a series for this driver,
but an issue in bringing up the PCS clock had blocked the upstreaming
process. This issue has since been resolved by the following series:

https://lore.kernel.org/all/20240326-rxc_bugfix-v6-0-24a74e5c761f@bootlin.com/

This series consists of a devicetree binding describing the RZN1 GMAC
controller IP, a node for the GMAC1 device in the r9a06g032 SoC device
tree, and the GMAC driver itself which is a glue layer in stmmac.

There are also two patches by Russell that improve pcs initialization handling
in stmmac.
====================

Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-0-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: add support for RZ/N1 GMAC
Clément Léger [Mon, 13 May 2024 07:25:17 +0000 (09:25 +0200)]
net: stmmac: add support for RZ/N1 GMAC

Add support for the Renesas RZ/N1 GMAC. This support can make use of a
custom RZ/N1 PCS which is fetched by parsing the pcs-handle device tree
property.

Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Co-developed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-6-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: dwmac-socfpga: use pcs_init/pcs_exit
Russell King (Oracle) [Mon, 13 May 2024 07:25:16 +0000 (09:25 +0200)]
net: stmmac: dwmac-socfpga: use pcs_init/pcs_exit

Use the newly introduced pcs_init() and pcs_exit() operations to
create and destroy the PCS instance at a more appropriate moment during
the driver lifecycle, thereby avoiding publishing a network device to
userspace that has not yet finished its PCS initialisation.

There are other similar issues with this driver which remain
unaddressed, but these are out of scope for this patch.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
[rgantois: removed second parameters of new callbacks]
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-5-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: introduce pcs_init/pcs_exit stmmac operations
Russell King (Oracle) [Mon, 13 May 2024 07:25:15 +0000 (09:25 +0200)]
net: stmmac: introduce pcs_init/pcs_exit stmmac operations

Introduce a mechanism whereby platforms can create their PCS instances
prior to the network device being published to userspace, but after
some of the core stmmac initialisation has been completed. This means
that the data structures that platforms need will be available.
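
The intended ordering can be modelled in a few lines of plain C. This is a
sketch only: the demo_* types and functions are hypothetical and do not
reproduce the stmmac structures or the exact callback signatures.

```c
#include <stddef.h>

struct demo_priv;

/* Hypothetical pair of optional platform hooks. */
struct demo_plat {
	int (*pcs_init)(struct demo_priv *priv);
	void (*pcs_exit)(struct demo_priv *priv);
};

struct demo_priv {
	const struct demo_plat *plat;
	int core_ready; /* core data structures are initialised */
	int pcs_ready;  /* platform PCS instance exists */
	int published;  /* device is visible to userspace */
};

/* Example platform hook: it relies on core state being available and on
 * the device not having been published yet. */
static int demo_pcs_init(struct demo_priv *priv)
{
	if (!priv->core_ready || priv->published)
		return -1;
	priv->pcs_ready = 1;
	return 0;
}

static void demo_pcs_exit(struct demo_priv *priv)
{
	priv->pcs_ready = 0;
}

static int demo_probe(struct demo_priv *priv)
{
	int err;

	priv->core_ready = 1;           /* 1. core initialisation */
	if (priv->plat->pcs_init) {     /* 2. optional platform PCS setup */
		err = priv->plat->pcs_init(priv);
		if (err) {
			priv->core_ready = 0;
			return err;
		}
	}
	priv->published = 1;            /* 3. only now publish the device */
	return 0;
}

static void demo_remove(struct demo_priv *priv)
{
	priv->published = 0;            /* unpublish first */
	if (priv->plat->pcs_exit)
		priv->plat->pcs_exit(priv); /* then destroy the PCS */
	priv->core_ready = 0;
}
```

The point of the ordering is that userspace never sees a device whose PCS
has not finished initialising, and teardown runs in the reverse order.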

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Co-developed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-4-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: Make stmmac_xpcs_setup() generic to all PCS devices
Serge Semin [Mon, 13 May 2024 07:25:14 +0000 (09:25 +0200)]
net: stmmac: Make stmmac_xpcs_setup() generic to all PCS devices

A pcs_init() callback will be introduced to stmmac in a future patch. This
new function will be called during the hardware initialization phase.
Instead of separately initializing XPCS and PCS components, let's group all
PCS-related hardware initialization logic in the current
stmmac_xpcs_setup() function.

Rename stmmac_xpcs_setup() to stmmac_pcs_setup() and move the conditional
call to stmmac_xpcs_setup() inside the function itself.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Co-developed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-3-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: stmmac: Add dedicated XPCS cleanup method
Serge Semin [Mon, 13 May 2024 07:25:13 +0000 (09:25 +0200)]
net: stmmac: Add dedicated XPCS cleanup method

Currently the XPCS handler destruction is performed in the
stmmac_mdio_unregister() method. It doesn't look good because the handler
isn't originally created in the corresponding counterpart,
stmmac_mdio_register(), but in the stmmac_xpcs_setup() function. In
order to have more coherent MDIO and XPCS setup/cleanup procedures,
let's move the DW XPCS destruction to the dedicated stmmac_pcs_clean()
method.

This method will also be used to clean up PCS hardware using the
pcs_exit() callback that will be introduced to stmmac in a subsequent
patch.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Co-developed-by: Romain Gantois <romain.gantois@bootlin.com>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-2-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agodt-bindings: net: renesas,rzn1-gmac: Document RZ/N1 GMAC support
Clément Léger [Mon, 13 May 2024 07:25:12 +0000 (09:25 +0200)]
dt-bindings: net: renesas,rzn1-gmac: Document RZ/N1 GMAC support

The RZ/N1 series of MPUs feature up to two Gigabit Ethernet controllers.
These controllers are based on Synopsys IPs. They can be connected to
RZ/N1 RGMII/RMII converters.

Add a binding that describes these GMAC devices.

Signed-off-by: Clément Léger <clement.leger@bootlin.com>
[rgantois: commit log]
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Romain Gantois <romain.gantois@bootlin.com>
Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-1-6acf58b5440d@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: qede: flower: validate control flags
Asbjørn Sloth Tønnesen [Sat, 11 May 2024 07:37:03 +0000 (07:37 +0000)]
net: qede: flower: validate control flags

This driver currently doesn't support any control flags.

Use flow_rule_match_has_control_flags() to check for control flags,
such as can be set through `tc flower ... ip_flags frag`.

In case any control flags are masked, flow_rule_match_has_control_flags()
sets a NL extended error message, and we return -EOPNOTSUPP.
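
The shape of such a check can be sketched in userspace C. Everything here is
a hypothetical model: demo_match_control, the flag bit and the message text
are invented and are not the flow offload API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a control match: key plus mask. */
struct demo_match_control {
	uint32_t flags_key;
	uint32_t flags_mask;
};

#define DEMO_CTRL_FLAG_IS_FRAGMENT (1u << 0) /* illustrative bit value */

/* Reject any rule that masks a control flag, and report why through
 * *msg (the model's equivalent of an extended error message). */
static int demo_validate_control(const struct demo_match_control *m,
				 const char **msg)
{
	if (m->flags_mask) {
		if (msg)
			*msg = "control flags are not supported";
		return -EOPNOTSUPP;
	}
	return 0;
}
```

A rule with an empty mask passes through unchanged, so drivers that support
no control flags only need this one early check.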

Only compile-tested.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240511073705.230507-1-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'virtio_net-rx-enable-premapped-mode-by-default'
Jakub Kicinski [Tue, 14 May 2024 00:07:43 +0000 (17:07 -0700)]
Merge branch 'virtio_net-rx-enable-premapped-mode-by-default'

Xuan Zhuo says:

====================
virtio_net: rx enable premapped mode by default

Actually, for the virtio drivers, we can enable premapped mode whatever
the value of use_dma_api, because we provide the virtio DMA APIs.
So the driver can enable premapped mode unconditionally.

This patch set makes the big mode of virtio-net support premapped mode
and enables premapped mode for rx by default.

Based on the following points, we do not use page pool to manage these
    pages:

    1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
       we can only prevent the page pool from performing DMA operations, and
       let the driver perform DMA operations on the allocated pages.
    2. But when the page pool releases the page, we have no chance to
       execute dma unmap.
    3. A solution to #2 is to execute dma unmap every time before putting
       the page back to the page pool. (This is actually a waste, we don't
       execute unmap so frequently.)
    4. But there is another problem, we still need to use page.dma_addr to
       save the dma address. Using page.dma_addr while using page pool is
       unsafe behavior.
    5. And we need space to chain the pages submitted once to virtio core.

    More:
        https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/

Why do we not use the page space to store the DMA address?

    http://lore.kernel.org/all/CACGkMEuyeJ9mMgYnnB42=hw6umNuo=agn7VBqBqYPd7GN=+39Q@mail.gmail.com
====================

Link: https://lore.kernel.org/r/20240511031404.30903-1-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovirtio_net: remove the misleading comment
Xuan Zhuo [Sat, 11 May 2024 03:14:04 +0000 (11:14 +0800)]
virtio_net: remove the misleading comment

We actually call build_skb() without copying data.
The comment is misleading. So remove it.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240511031404.30903-5-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovirtio_net: rx remove premapped failover code
Xuan Zhuo [Sat, 11 May 2024 03:14:03 +0000 (11:14 +0800)]
virtio_net: rx remove premapped failover code

Now, the premapped mode can be enabled unconditionally.

So we can remove the failover code for merge and small mode.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20240511031404.30903-4-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovirtio_net: big mode skip the unmap check
Xuan Zhuo [Sat, 11 May 2024 03:14:02 +0000 (11:14 +0800)]
virtio_net: big mode skip the unmap check

The virtio-net big mode did not enable premapped mode,
so we did not need to check the unmap. The subsequent
commit will remove the failover code for failing to enable
premapped mode for merge and small mode, so we need to remove
the do_dma check in the big mode path.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240511031404.30903-3-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovirtio_ring: enable premapped mode whatever use_dma_api
Xuan Zhuo [Sat, 11 May 2024 03:14:01 +0000 (11:14 +0800)]
virtio_ring: enable premapped mode whatever use_dma_api

Now that we have virtio DMA APIs, the driver can use premapped
mode whether or not the virtio core uses the DMA API.

So remove the limit of checking use_dma_api from
virtqueue_set_dma_premapped().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20240511031404.30903-2-xuanzhuo@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf...
Jakub Kicinski [Mon, 13 May 2024 23:40:22 +0000 (16:40 -0700)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-05-13

We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).

The main changes are:

1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.

2) Add BPF range computation improvements to the verifier in particular
   around XOR and OR operators, refactoring of checks for range computation
   and relaxing MUL range computation so that src_reg can also be an unknown
   scalar, from Cupertino Miranda.

3) Add support to attach kprobe BPF programs through kprobe_multi link in
   a session mode, meaning, a BPF program is attached to both function entry
   and return, the entry program can decide if the return program gets
   executed and the entry program can share u64 cookie value with return
   program. Session mode is a common use-case for tetragon and bpftrace,
   from Jiri Olsa.

4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
   as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.

5) Improvements to BPF selftests in context of BPF gcc backend,
   from Jose E. Marchesi & David Faust.

6) Migrate remaining BPF selftest tests from test_sock_addr.c to
   prog_test-style in order to retire the old test, run it in BPF CI and
   additionally expand test coverage, from Jordan Rife.

7) Big batch for BPF selftest refactoring in order to remove duplicate code
   around common network helpers, from Geliang Tang.

8) Another batch of improvements to BPF selftests to retire obsolete
   bpf_tcp_helpers.h as everything is available in vmlinux.h,
   from Martin KaFai Lau.

9) Fix BPF map tear-down to not walk the map twice on free when both timer
   and wq are used, from Benjamin Tissoires.

10) Fix the BPF verifier's assumption that socket->sk is always non-NULL,
    from Alexei Starovoitov.

11) Change BTF build scripts to use --btf_features for pahole v1.26+,
    from Alan Maguire.

12) Small improvements to BPF reusing struct_size() and krealloc_array(),
    from Andy Shevchenko.

13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
    from Ilya Leoshkevich.

14) Extend TCP ->cong_control() callback in order to feed in ack and
    flag parameters and allow write-access to tp->snd_cwnd_stamp
    from BPF program, from Miao Xu.

15) Add support for internal-only per-CPU instructions to inline
    bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
    from Puranjay Mohan.

16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
    from Tushar Vyavahare.

17) Extend libbpf to support "module:<function>" syntax for tracing
    programs, from Viktor Malik.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
  bpf: make list_for_each_entry portable
  bpf: ignore expected GCC warning in test_global_func10.c
  bpf: disable strict aliasing in test_global_func9.c
  selftests/bpf: Free strdup memory in xdp_hw_metadata
  selftests/bpf: Fix a few tests for GCC related warnings.
  bpf: avoid gcc overflow warning in test_xdp_vlan.c
  tools: remove redundant ethtool.h from tooling infra
  selftests/bpf: Expand ATTACH_REJECT tests
  selftests/bpf: Expand getsockname and getpeername tests
  sefltests/bpf: Expand sockaddr hook deny tests
  selftests/bpf: Expand sockaddr program return value tests
  selftests/bpf: Retire test_sock_addr.(c|sh)
  selftests/bpf: Remove redundant sendmsg test cases
  selftests/bpf: Migrate ATTACH_REJECT test cases
  selftests/bpf: Migrate expected_attach_type tests
  selftests/bpf: Migrate wildcard destination rewrite test
  selftests/bpf: Migrate sendmsg6 v4 mapped address tests
  selftests/bpf: Migrate sendmsg deny test cases
  selftests/bpf: Migrate WILDCARD_IP test
  selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
  ...
====================

Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: pcs: lynx: no need to read LPA in lynx_pcs_get_state_2500basex()
Vladimir Oltean [Mon, 13 May 2024 11:53:45 +0000 (14:53 +0300)]
net: pcs: lynx: no need to read LPA in lynx_pcs_get_state_2500basex()

Nothing useful is done with the LPA variable in lynx_pcs_get_state_2500basex(),
so we can just remove the read.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240513115345.2452799-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'mlx5-misc-patches'
Jakub Kicinski [Mon, 13 May 2024 23:35:49 +0000 (16:35 -0700)]
Merge branch 'mlx5-misc-patches'

Tariq Toukan says:

====================
mlx5 misc patches

This series includes patches for the mlx5 driver.

Patch 1 by Shay enables LAG with HCAs of 8 ports.

Patch 2 by Carolina optimizes the safe switch channels operation for the
TX-only changes.

Patch 3 by Parav cleans up some unused code.
====================

Link: https://lore.kernel.org/r/20240512124306.740898-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: Remove unused msix related exported APIs
Parav Pandit [Sun, 12 May 2024 12:43:05 +0000 (15:43 +0300)]
net/mlx5: Remove unused msix related exported APIs

MSIX irq allocation and free APIs are no longer
in use. Hence, remove the dead code.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20240512124306.740898-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5e: Modifying channels number and updating TX queues
Carolina Jubran [Sun, 12 May 2024 12:43:04 +0000 (15:43 +0300)]
net/mlx5e: Modifying channels number and updating TX queues

It is not appropriate for the mlx5e_num_channels_changed
function to be called solely for updating the TX queues,
even if the channels number has not been changed.

Move the code responsible for updating the TC and TX queues
from mlx5e_num_channels_changed and produce a new function
called mlx5e_update_tc_and_tx_queues. This new function should
only be called when the channels number remains unchanged.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512124306.740898-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet/mlx5: Enable 8 ports LAG
Shay Drory [Sun, 12 May 2024 12:43:03 +0000 (15:43 +0300)]
net/mlx5: Enable 8 ports LAG

This patch adds support for 8-port HCAs to the mlx5 drivers.
Starting with ConnectX-8, HCAs with 8 ports are possible.

As most driver parts aren't affected by such a configuration, most driver
code is unchanged.

Specifically, the only affected areas are:
- Lag
- Multiport E-Switch
- Single FDB E-Switch

All of the above are already factored in a generic way, and LAG and VF LAG
are tested, so all that is left is to change a #define and remove checks
which are no longer needed.
However, Multiport E-Switch is not tested yet, so it is left untouched.

This patch allows creating a hardware LAG/VF LAG when all 8 ports are
added to the same bond device.

For example, in order to activate the hardware LAG, a user can execute
the following:

ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0
ip link set eth6 master bond0
ip link set eth7 master bond0
ip link set eth8 master bond0
ip link set eth9 master bond0

Where eth2, eth3, eth4, eth5, eth6, eth7, eth8 and eth9 are the PFs of
the same HCA.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512124306.740898-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotest: hsr: Extend the hsr_redbox.sh to have more SAN devices connected
Lukasz Majewski [Fri, 10 May 2024 14:37:10 +0000 (16:37 +0200)]
test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected

After this change the single SAN device (ns3eth1) is now replaced with
two SAN devices - respectively ns4eth1 and ns5eth1.

It is possible to extend this script to have more SAN devices connected
by adding them to ns3br1 bridge.

Signed-off-by: Lukasz Majewski <lukma@denx.de>
Link: https://lore.kernel.org/r/20240510143710.3916631-1-lukma@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net-dsa-microchip-dcb-fixes'
Jakub Kicinski [Mon, 13 May 2024 22:52:52 +0000 (15:52 -0700)]
Merge branch 'net-dsa-microchip-dcb-fixes'

Oleksij Rempel says:

====================
net: dsa: microchip: DCB fixes

This patch series addresses the recommendation to rename IPV to IPM to
avoid confusion with the IPV name used in 802.1Qci PSFP. It also restores
the default "PCP only" configuration as the source of priorities to avoid
possible regressions.
====================

Link: https://lore.kernel.org/r/20240510053828.2412516-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: microchip: dcb: set default apptrust to PCP only
Oleksij Rempel [Fri, 10 May 2024 05:38:28 +0000 (07:38 +0200)]
net: dsa: microchip: dcb: set default apptrust to PCP only

Before DCB support, the KSZ driver had only PCP as the source of packet
priority values. To avoid regressions, make PCP only the default value.
Users will need to enable DSCP support manually.

This patch does not affect other KSZ8 related quirks. Users will still be
warned when setting unsupported configurations for port 2.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240510053828.2412516-4-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: microchip: dcb: add comments for DSCP related functions
Oleksij Rempel [Fri, 10 May 2024 05:38:27 +0000 (07:38 +0200)]
net: dsa: microchip: dcb: add comments for DSCP related functions

All other functions are commented. Add missing comments to the following
functions:
ksz_set_global_dscp_entry()
ksz_port_add_dscp_prio()
ksz_port_del_dscp_prio()

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240510053828.2412516-3-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: dsa: microchip: dcb: rename IPV to IPM
Oleksij Rempel [Fri, 10 May 2024 05:38:26 +0000 (07:38 +0200)]
net: dsa: microchip: dcb: rename IPV to IPM

IPV is a term added and used in 802.1Qci PSFP, which was merged into
802.1Q (from 802.1Q-2018), for other functions.

Even though it performs a similar operation, holding a temporary priority
value internally (as the name suggests), the KSZ datasheet doesn't use
the term IPV (Internal Priority Value). To avoid any confusion later,
when PSFP arrives in the Linux world, it is better to rename IPV to IPM
(Internal Priority Mapping).

In addition, LAN937x documentation already uses IPV for 802.1Qci PSFP
related functionality.

Suggested-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Woojung Huh <woojung.huh@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240510053828.2412516-2-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agol2tp: Support different protocol versions with same IP/port quadruple
Samuel Thibault [Thu, 9 May 2024 20:58:12 +0000 (22:58 +0200)]
l2tp: Support different protocol versions with same IP/port quadruple

628bc3e5a1be ("l2tp: Support several sockets with same IP/port quadruple")
added support for several L2TPv2 tunnels using the same IP/port quadruple,
but if an L2TPv3 socket exists it could eat all the traffic. We thus have to
first use the version from the packet to get the proper tunnel, and only
then check that the version matches.
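
A much simplified model of version-aware lookup is sketched below. The
demo_tunnel record and demo_tunnel_lookup() are hypothetical and chosen for
illustration; the real l2tp receive path and data structures differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunnel record. */
struct demo_tunnel {
	uint32_t id;
	int version; /* 2 for L2TPv2, 3 for L2TPv3 */
};

/* Select the tunnel using the version parsed from the packet, so that a
 * tunnel of another protocol version sharing the same id cannot capture
 * the traffic. Returns NULL when no tunnel matches both fields. */
static const struct demo_tunnel *
demo_tunnel_lookup(const struct demo_tunnel *tbl, size_t n,
		   uint32_t id, int pkt_version)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id && tbl[i].version == pkt_version)
			return &tbl[i];
	return NULL;
}
```

In this model a v2 packet is matched against v2 tunnels only, regardless of
whether a v3 tunnel appears earlier in the table.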

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: James Chapman <jchapman@katalix.com>
Link: https://lore.kernel.org/r/20240509205812.4063198-1-samuel.thibault@ens-lyon.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoynl: ensure exact-len value is resolved
Antonio Quartulli [Fri, 10 May 2024 23:22:02 +0000 (01:22 +0200)]
ynl: ensure exact-len value is resolved

For type String and Binary we are currently using the exact-len
limit value as is without attempting any name resolution.
However, the spec may specify the name of a constant rather than an
actual value, which would result in using the constant name as is
and thus break the policy.

Ensure the limit value is passed to get_limit(), which will always
attempt resolving the name before printing the policy rule.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Link: https://lore.kernel.org/r/20240510232202.24051-1-a@unstable.cc
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'add-tx-stop-wake-counters'
Jakub Kicinski [Mon, 13 May 2024 21:58:38 +0000 (14:58 -0700)]
Merge branch 'add-tx-stop-wake-counters'

Daniel Jurgens says:

====================
Add TX stop/wake counters

Several drivers provide TX stop and wake counters via ethtool stats. Add
those to the netdev queue stats, and use them in virtio_net.
====================

Link: https://lore.kernel.org/r/20240510201927.1821109-1-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agovirtio_net: Add TX stopped and wake counters
Daniel Jurgens [Fri, 10 May 2024 20:19:27 +0000 (23:19 +0300)]
virtio_net: Add TX stopped and wake counters

Add TX queue stop and wake counters; they are useful for debugging.

$ ./tools/net/ynl/cli.py --spec netlink/specs/netdev.yaml \
--dump qstats-get --json '{"scope": "queue"}'
...
 {'ifindex': 13,
  'queue-id': 0,
  'queue-type': 'tx',
  'tx-bytes': 14756682850,
  'tx-packets': 226465,
  'tx-stop': 113208,
  'tx-wake': 113208},
 {'ifindex': 13,
  'queue-id': 1,
  'queue-type': 'tx',
  'tx-bytes': 18167675008,
  'tx-packets': 278660,
  'tx-stop': 8632,
  'tx-wake': 8632}]

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-3-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonetdev: Add queue stats for TX stop and wake
Daniel Jurgens [Fri, 10 May 2024 20:19:26 +0000 (23:19 +0300)]
netdev: Add queue stats for TX stop and wake

TX queue stop and wake are counted by some drivers.
Support reporting these via netdev-genl queue stats.

Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agotcp: socket option to check for MPTCP fallback to TCP
Matthieu Baerts (NGI0) [Thu, 9 May 2024 18:10:10 +0000 (20:10 +0200)]
tcp: socket option to check for MPTCP fallback to TCP

A way for an application to know if an MPTCP connection fell back to TCP
is to use getsockopt(MPTCP_INFO) and look for errors. The issue with
this technique is that the same errors -- EOPNOTSUPP (IPv4) and
ENOPROTOOPT (IPv6) -- are returned if there was a fallback, *or* if the
kernel doesn't support this socket option. The userspace then has to
look at the kernel version to understand what the errors mean.

It is not clean, and it doesn't take into account older kernels where
the socket option has been backported. A cleaner way would be to expose
this info to the TCP socket level. In case of MPTCP socket where no
fallback happened, the socket options for the TCP level will be handled
in MPTCP code, in mptcp_getsockopt_sol_tcp(). If not, that will be in
TCP code, in do_tcp_getsockopt(). So MPTCP simply has to set the value
1, while TCP has to set 0.

If the socket option is not supported, one of these two errors will be
reported:
- EOPNOTSUPP (95 - Operation not supported) for MPTCP sockets
- ENOPROTOOPT (92 - Protocol not available) for TCP sockets, e.g. on the
  socket received after an 'accept()', when the client didn't request to
  use MPTCP: this socket will be a TCP one, even if the listen socket
  was an MPTCP one.

With this new option, the kernel can return a clear answer to both "Is
this kernel new enough to tell me the fallback status?" and "If it is
new enough, is it currently a TCP or MPTCP socket?" questions, while not
breaking the previous method.
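
For illustration only, a userspace caller could interpret the answers
roughly as sketched below. The option name TCP_IS_MPTCP and its value 43
are assumptions here, since the text above does not spell them out, and
the demo_* helpers and the enum are hypothetical names.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Assumption: option name and value are not given in the commit text. */
#ifndef TCP_IS_MPTCP
#define TCP_IS_MPTCP 43
#endif

enum demo_mptcp_state {
	DEMO_STATE_UNKNOWN = -1,	/* kernel cannot answer (or other error) */
	DEMO_STATE_TCP = 0,		/* plain TCP socket, or MPTCP fell back */
	DEMO_STATE_MPTCP = 1,		/* still an MPTCP connection */
};

/* Pure helper: map a getsockopt() result to one of the three states. */
enum demo_mptcp_state demo_classify(int ret, int val)
{
	if (ret != 0)
		return DEMO_STATE_UNKNOWN;
	return val ? DEMO_STATE_MPTCP : DEMO_STATE_TCP;
}

/* Query the socket at the TCP level and classify the answer. */
enum demo_mptcp_state demo_query_is_mptcp(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);
	int ret = getsockopt(fd, IPPROTO_TCP, TCP_IS_MPTCP, &val, &len);

	return demo_classify(ret, val);
}
```

A caller that gets DEMO_STATE_UNKNOWN can inspect errno for EOPNOTSUPP
or ENOPROTOOPT and fall back to the older MPTCP_INFO based method.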

Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240509-upstream-net-next-20240509-mptcp-tcp_is_mptcp-v1-1-f846df999202@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'net-gro-remove-network_header-use-move-p-flush-flush_id-calculations...
Jakub Kicinski [Mon, 13 May 2024 21:44:13 +0000 (14:44 -0700)]
Merge branch 'net-gro-remove-network_header-use-move-p-flush-flush_id-calculations-to-l4'

Richard Gobert says:

====================
net: gro: remove network_header use, move p->{flush/flush_id} calculations to L4

The cb fields network_offset and inner_network_offset are used instead of
skb->network_header throughout GRO.

These fields are then leveraged in the next commit to remove flush_id state
from napi_gro_cb, and stateful code in {ipv6,inet}_gro_receive which may be
unnecessarily complicated due to encapsulation support in GRO. These fields
are checked in L4 instead.

3rd patch adds tests for different flush_id flows in GRO.
====================

Link: https://lore.kernel.org/r/20240509190819.2985-1-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests/net: add flush id selftests
Richard Gobert [Thu, 9 May 2024 19:08:19 +0000 (21:08 +0200)]
selftests/net: add flush id selftests

Added flush id selftests to test different cases where DF flag is set or
unset and id value changes in the following packets. All cases where the
packets should coalesce or should not coalesce are tested.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-4-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment
Richard Gobert [Thu, 9 May 2024 19:08:18 +0000 (21:08 +0200)]
net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment

{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
iph->id, ...) against all packets in a loop. These flush checks are used in
all merging UDP and TCP flows.

These checks need to be done only once and only against the found p skb,
since they only affect flush and not same_flow.

This patch leverages correct network header offsets from the cb for both
outer and inner network headers - allowing these checks to be done only
once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
more declarative and contained in inet_gro_flush, thus removing the need
for flush_id in napi_gro_cb.

This results in less parsing code for non-loop flush tests for TCP and UDP
flows.

To make sure results are not within noise range - I've made netfilter drop
all TCP packets, and measured CPU performance in GRO (in this case GRO is
responsible for about 50% of the CPU utilization).

perf top while replaying 64 parallel IP/TCP streams merging in GRO:
(gro_receive_network_flush is compiled inline to tcp_gro_receive)
net-next:
        6.94% [kernel] [k] inet_gro_receive
        3.02% [kernel] [k] tcp_gro_receive

patch applied:
        4.27% [kernel] [k] tcp_gro_receive
        4.22% [kernel] [k] inet_gro_receive

perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
results for any encapsulation, in this case inet_gro_receive is top
offender in net-next)
net-next:
        10.09% [kernel] [k] inet_gro_receive
        2.08% [kernel] [k] tcp_gro_receive

patch applied:
        6.97% [kernel] [k] inet_gro_receive
        3.68% [kernel] [k] tcp_gro_receive

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-3-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: gro: use cb instead of skb->network_header
Richard Gobert [Thu, 9 May 2024 19:08:17 +0000 (21:08 +0200)]
net: gro: use cb instead of skb->network_header

This patch converts references of skb->network_header to napi_gro_cb's
network_offset and inner_network_offset.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-2-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge branch 'ena-driver-changes-may-2024'
Jakub Kicinski [Mon, 13 May 2024 21:42:07 +0000 (14:42 -0700)]
Merge branch 'ena-driver-changes-may-2024'

David Arinzon says:

====================
ENA driver changes May 2024

This patchset contains several misc and minor
changes to the ENA driver.
====================

Link: https://lore.kernel.org/r/20240512134637.25299-1-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ena: Change initial rx_usec interval
David Arinzon [Sun, 12 May 2024 13:46:37 +0000 (13:46 +0000)]
net: ena: Change initial rx_usec interval

For the purpose of obtaining better CPU utilization,
minimum rx moderation interval is set to 20 usec.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-6-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ena: Changes around strscpy calls
David Arinzon [Sun, 12 May 2024 13:46:36 +0000 (13:46 +0000)]
net: ena: Changes around strscpy calls

strscpy copies as much of the string as possible,
meaning that the destination string will be truncated
in case of no space. As this is a non-critical error in
our case, adding a debug level print for indication.

This patch also removes a -1 which was added to ensure
enough space for NUL, but strscpy destination string is
guaranteed to be NUL-terminated, therefore, the -1 is
not needed.
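
The truncation behaviour described above can be sketched in userspace as
follows. demo_strscpy() is a hypothetical stand-in written for this
illustration, not the kernel implementation; it only mirrors the
documented contract (always NUL-terminate, return -E2BIG on truncation).

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace stand-in for illustration only. Copies at most size - 1
 * characters, always NUL-terminates when size > 0, and returns the
 * number of characters copied or -E2BIG if src did not fit.
 */
long demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

Because the terminator is always written inside the size passed in, the
full buffer size can be given, without subtracting one.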

Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-5-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ena: Add validation for completion descriptors consistency
David Arinzon [Sun, 12 May 2024 13:46:35 +0000 (13:46 +0000)]
net: ena: Add validation for completion descriptors consistency

Validate that `first` flag is set only for the first
descriptor in multi-buffer packets.
In case of an invalid descriptor, a reset will occur.
A new reset reason for RX data corruption has been added.
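
A minimal sketch of such a consistency check, assuming a simplified,
hypothetical descriptor layout (struct demo_cdesc is not the ENA
descriptor format). This sketch also requires the flag to be present on
the first descriptor, which is an interpretation rather than something
stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified completion descriptor. */
struct demo_cdesc {
	bool first;	/* set by the device on the first buffer of a packet */
};

/* Return true when descs[0], and only descs[0], carries the flag. */
bool demo_first_flag_consistent(const struct demo_cdesc *descs, size_t n)
{
	size_t i;

	if (n == 0)
		return false;
	for (i = 0; i < n; i++)
		if (descs[i].first != (i == 0))
			return false;
	return true;
}
```

A driver would trigger its reset path when such a check returns false.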

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-4-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ena: Reduce holes in ena_com structures
David Arinzon [Sun, 12 May 2024 13:46:34 +0000 (13:46 +0000)]
net: ena: Reduce holes in ena_com structures

This patch makes two changes in order to fill holes and
reduce the overall size of the structures ena_com_dev
and ena_com_rx_ctx.
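
As a generic illustration of how member order creates or removes holes,
consider the two hypothetical structures below. They are not the ENA
structures; the member names are made up for the example.

```c
#include <stdint.h>

/* Small members on both sides of a 64-bit member force padding. */
struct demo_with_holes {
	uint8_t  flag_a;	/* padding follows, up to addr's alignment */
	uint64_t addr;
	uint8_t  flag_b;	/* tail padding follows */
};

/* Same members, reordered so the small ones sit next to each other. */
struct demo_reordered {
	uint64_t addr;
	uint8_t  flag_a;
	uint8_t  flag_b;
};
```

On x86-64 the first layout takes 24 bytes and the second 16; exact
sizes depend on the ABI.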

Signed-off-by: Shahar Itzko <itzko@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-3-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ena: Add a counter for driver's reset failures
David Arinzon [Sun, 12 May 2024 13:46:33 +0000 (13:46 +0000)]
net: ena: Add a counter for driver's reset failures

This patch adds a counter to the ena_adapter struct in
order to keep track of reset failures.
The counter is incremented every time either ena_restore_device()
or ena_destroy_device() fails.

Signed-off-by: Osama Abboud <osamaabb@amazon.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512134637.25299-2-darinzon@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: netfilter: nft_flowtable.sh: bump socat timeout to 1m
Florian Westphal [Sat, 11 May 2024 06:48:03 +0000 (08:48 +0200)]
selftests: netfilter: nft_flowtable.sh: bump socat timeout to 1m

Now that this test runs in netdev CI it looks like 10s isn't enough
for debug kernels:
  selftests: net/netfilter: nft_flowtable.sh
  2024/05/10 20:33:08 socat[12204] E write(7, 0x563feb16a000, 8192): Broken pipe
  FAIL: file mismatch for ns1 -> ns2
  -rw------- 1 root root 37345280 May 10 20:32 /tmp/tmp.Am0yEHhNqI
 ...

Looks like socat gets zapped too quickly, so increase timeout to 1m.

Could also reduce tx file size for KSFT_MACHINE_SLOW, but it's preferable
to have the same test for both debug and nondebug.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20240511064814.561525-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: use upstream mtools
Vladimir Oltean [Fri, 10 May 2024 11:28:56 +0000 (14:28 +0300)]
selftests: net: use upstream mtools

Joachim kindly merged the IPv6 support in
https://github.com/troglobit/mtools/pull/2, so we can just use his
version now. A few more fixes subsequently came in for IPv6, so even
better.

Check that the deployed mtools version is 3.0 or above. Note that the
version check breaks compatibility with my fork where I didn't bump the
version, but I assume that won't be a problem.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240510112856.1262901-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftest: epoll_busy_poll: Fix spelling mistake "couldnt" -> "couldn't"
Colin Ian King [Fri, 10 May 2024 08:48:11 +0000 (09:48 +0100)]
selftest: epoll_busy_poll: Fix spelling mistake "couldnt" -> "couldn't"

There is a spelling mistake in a TH_LOG message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240510084811.3299685-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: phy: air_en8811h: reset netdev rules when LED is set manually
Daniel Golle [Thu, 9 May 2024 10:00:42 +0000 (11:00 +0100)]
net: phy: air_en8811h: reset netdev rules when LED is set manually

Setting LED_OFF via brightness_set should deactivate hw control, so make
sure netdev trigger rules also get cleared in that case.
This fixes unwanted restoration of the default netdev trigger rules and
matches the behaviour when using the 'netdev' trigger without any
hardware offloading.

Fixes: 71e79430117d ("net: phy: air_en8811h: Add the Airoha EN8811H PHY driver")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/5ed8ea615890a91fa4df59a7ae8311bbdf63cdcf.1715248281.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilt...
Jakub Kicinski [Mon, 13 May 2024 20:12:34 +0000 (13:12 -0700)]
Merge tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

Patch #1 skips transaction if object type provides no .update interface.

Patch #2 skips NETDEV_CHANGENAME which is unused.

Patch #3 enables conntrack to handle Multicast Router Advertisements and
 Multicast Router Solicitations from the Multicast Router Discovery
 protocol (RFC4286) as untracked, as opposed to invalid, packets.
 From Linus Luessing.

Patch #4 updates DCCP conntracker to mark invalid packets as invalid,
 instead of dropping them, from Jason Xing.

Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
 also from Jason.

Patch #6 removes reference in netfilter's sysctl documentation on pickup
 entries which were already removed by Florian Westphal.

Patch #7 removes the check for the IPS_OFFLOAD flag that disabled early
 drop, which allows evicting such entries from the conntrack table,
 also from Florian.

Patches #8 to #16 update nf_tables pipapo set backend to allocate
 the datastructure copy on-demand from preparation phase,
 to better deal with OOM situations where .commit step is too late
 to fail. Series from Florian Westphal.

Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
 transitions, also from Florian.

Patch #18 uses GFP_KERNEL to clone elements from control plane to avoid
 quick atomic reserves exhaustion with large sets, reporter refers
 to million entries magnitude.

* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: allow clone callbacks to sleep
  selftests: netfilter: add packetdrill based conntrack tests
  netfilter: nft_set_pipapo: remove dirty flag
  netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  netfilter: nft_set_pipapo: merge deactivate helper into caller
  netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  netfilter: nft_set_pipapo: move prove_locking helper around
  netfilter: conntrack: remove flowtable early-drop test
  netfilter: conntrack: documentation: remove reference to non-existent sysctl
  netfilter: use NF_DROP instead of -NF_DROP
  netfilter: conntrack: dccp: try not to drop skb in conntrack
  netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
  netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
  netfilter: nf_tables: skip transaction if update object is not implemented
====================

Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agobpf: make list_for_each_entry portable
Jose E. Marchesi [Sat, 11 May 2024 21:22:43 +0000 (23:22 +0200)]
bpf: make list_for_each_entry portable

[Changes from V1:
- The __compat_break has been abandoned in favor of
  a more readable can_loop macro that can be used anywhere, including
  loop conditions.]

The macro list_for_each_entry is defined in bpf_arena_list.h as
follows:

  #define list_for_each_entry(pos, head, member) \
          for (void * ___tmp = (pos = list_entry_safe((head)->first, \
                                                      typeof(*(pos)), member), \
                                (void *)0); \
               pos && ({ ___tmp = (void *)pos->member.next; 1; }); \
               cond_break, \
               pos = list_entry_safe((void __arena *)___tmp, typeof(*(pos)), member))

The macro cond_break, in turn, expands to a statement expression that
contains a `break' statement.  Compound statement expressions, and the
subsequent ability of placing statements in the header of a `for'
loop, are GNU extensions.

Unfortunately, clang implements this GNU extension differently than
GCC:

- In GCC the `break' statement is bound to the containing "breakable"
  context in which the defining `for' appears.  If there is no such
  context, GCC emits a warning: break statement without enclosing `for'
  or `switch' statement.

- In clang the `break' statement is bound to the defining `for'.  If
  the defining `for' is itself inside some breakable construct, then
  clang emits a -Wgcc-compat warning.

This patch adds a new macro can_loop to bpf_experimental, that
implements the same logic as cond_break but evaluates to a boolean
expression.  The patch also changes all the current uses of
cond_break within the header of a loop accordingly.
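
The shape of the change can be sketched in plain ISO C as below. This is
only a userspace analogy under stated assumptions: demo_can_loop and
demo_budget are hypothetical names, and the real can_loop is implemented
differently inside the BPF headers. The point shown is that a guard that
evaluates to a boolean can sit in the loop condition, so no `break' is
needed inside a statement expression.

```c
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
};

/* Hypothetical iteration budget, standing in for the real loop guard. */
int demo_budget;

/* Boolean-valued guard: usable anywhere an expression is allowed. */
#define demo_can_loop (demo_budget-- > 0)

/* Count list nodes, stopping early once the budget is used up. */
int demo_count_nodes(const struct demo_node *head, int budget)
{
	const struct demo_node *pos;
	int n = 0;

	demo_budget = budget;
	for (pos = head; pos && demo_can_loop; pos = pos->next)
		n++;
	return n;
}
```

Since the guard is an ordinary expression, GCC and clang agree on what
it means regardless of any enclosing loop or switch.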

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Link: https://lore.kernel.org/r/20240511212243.23477-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agobpf: ignore expected GCC warning in test_global_func10.c
Jose E. Marchesi [Sat, 11 May 2024 21:23:49 +0000 (23:23 +0200)]
bpf: ignore expected GCC warning in test_global_func10.c

The BPF selftest global_func10 in progs/test_global_func10.c contains:

  struct Small {
   long x;
  };

  struct Big {
   long x;
   long y;
  };

  [...]

  __noinline int foo(const struct Big *big)
  {
          if (!big)
                  return 0;

          return bpf_get_prandom_u32() < big->y;
  }

  [...]

  SEC("cgroup_skb/ingress")
  __failure __msg("invalid indirect access to stack")
  int global_func10(struct __sk_buff *skb)
  {
          const struct Small small = {.x = skb->len };

          return foo((struct Big *)&small) ? 1 : 0;
  }

GCC emits a "maybe uninitialized" warning for the code above, because
it knows `foo' accesses `big->y'.

Since the purpose of this selftest is to check that the verifier will
fail on this sort of invalid memory access, this patch just silences
the compiler warning.

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240511212349.23549-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agobpf: disable strict aliasing in test_global_func9.c
Jose E. Marchesi [Sat, 11 May 2024 21:22:13 +0000 (23:22 +0200)]
bpf: disable strict aliasing in test_global_func9.c

The BPF selftest test_global_func9.c performs type punning and breaks
strict-aliasing rules.

In particular, given:

  int global_func9(struct __sk_buff *skb)
  {
          int result = 0;

          [...]
          {
                  const struct C c = {.x = skb->len, .y = skb->family };

                  result |= foo((const struct S *)&c);
          }
  }

When building with strict-aliasing enabled (the default) the
initialization of `c' gets optimized away in its entirety:

[... no initialization of `c' ...]
r1 = r10
r1 += -40
call foo
w0 |= w6

Since GCC knows that `foo' accesses s->x, we get a "maybe
uninitialized" warning.

On the other hand, when strict-aliasing is disabled GCC only optimizes
away the store to `.y':

r1 = *(u32 *) (r6+0)
*(u32 *) (r10+-40) = r1  ; This is .x = skb->len in `c'
r1 = r10
r1 += -40
call foo
w0 |= w6

In this case the warning is not emitted, because s->x is initialized.

This patch disables strict aliasing in this test when building with
GCC.  clang seems to not optimize this particular code even when
strict aliasing is enabled.

Tested in bpf-next master.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Cc: david.faust@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240511212213.23418-1-jose.marchesi@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agoselftests/bpf: Free strdup memory in xdp_hw_metadata
Geliang Tang [Sat, 11 May 2024 08:50:24 +0000 (16:50 +0800)]
selftests/bpf: Free strdup memory in xdp_hw_metadata

The strdup() function returns a pointer to a new string which is a
duplicate of the string "ifname". Memory for the new string is obtained
with malloc(), and needs to be freed with free().

This patch adds this missing "free(saved_hwtstamp_ifname)" in cleanup()
to avoid a potential memory leak in xdp_hw_metadata.c.
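
The ownership pattern can be sketched as below. The demo_* names are
hypothetical stand-ins; only strdup() and free() are taken from the
description above.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() under strict ISO modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a saved interface name. */
char *demo_saved_ifname;

/* Keep a private copy of ifname; the copy is owned by this module. */
int demo_save_ifname(const char *ifname)
{
	free(demo_saved_ifname);	/* free(NULL) is a no-op */
	demo_saved_ifname = strdup(ifname);
	return demo_saved_ifname ? 0 : -1;
}

/* Release the copy obtained from strdup(). */
void demo_cleanup(void)
{
	free(demo_saved_ifname);
	demo_saved_ifname = NULL;
}
```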

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/af9bcccb96655e82de5ce2b4510b88c9c8ed5ed0.1715417367.git.tanggeliang@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agoselftests/bpf: Fix a few tests for GCC related warnings.
Cupertino Miranda [Fri, 10 May 2024 18:38:50 +0000 (19:38 +0100)]
selftests/bpf: Fix a few tests for GCC related warnings.

This patch corrects a few warnings to allow selftests to compile for
GCC.

-- progs/cpumask_failure.c --

progs/bpf_misc.h:136:22: error: ‘cpumask’ is used uninitialized
[-Werror=uninitialized]
  136 | #define __sink(expr) asm volatile("" : "+g"(expr))
      |                      ^~~
progs/cpumask_failure.c:68:9: note: in expansion of macro ‘__sink’
   68 |         __sink(cpumask);

The macro __sink(cpumask) with the '+' constraint modifier forces the
compiler to expect a read and write from cpumask. GCC detects
that cpumask is never initialized and reports an error.
This patch removes the spurious, unneeded definitions of cpumask.

-- progs/dynptr_fail.c --

progs/dynptr_fail.c:1444:9: error: ‘ptr1’ may be used uninitialized
[-Werror=maybe-uninitialized]
 1444 |         bpf_dynptr_clone(&ptr1, &ptr2);

Many of the tests in the file are related to the detection of
uninitialized pointers by the verifier. GCC is able to detect possible
uninitialized values, and reports this as an error.
The patch initializes all of the previous uninitialized structs.

-- progs/test_tunnel_kern.c --

progs/test_tunnel_kern.c:590:9: error: array subscript 1 is outside
array bounds of ‘struct geneve_opt[1]’ [-Werror=array-bounds=]
  590 |         *(int *) &gopt.opt_data = bpf_htonl(0xdeadbeef);
      |         ^~~~~~~~~~~~~~~~~~~~~~~
progs/test_tunnel_kern.c:575:27: note: at offset 4 into object ‘gopt’ of
size 4
  575 |         struct geneve_opt gopt;

This test accesses beyond the defined data for the struct geneve_opt
which contains as last field "u8 opt_data[0]" which clearly does not get
reserved space (in stack) in the function header. This pattern is
repeated in ip6geneve_set_tunnel and geneve_set_tunnel functions.
GCC is able to see this and emits a warning.
The patch introduces a local struct that allocates enough space to
safely allow the write to opt_data field.
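
One way to reserve such space is sketched below. Note the differences
from the patch: the sketch uses a union rather than a local struct, so
that it stays within ISO C, and struct demo_opt is a hypothetical
header, not the real struct geneve_opt.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical option header ending in a flexible array member. */
struct demo_opt {
	uint16_t opt_class;
	uint8_t  type;
	uint8_t  length;
	uint8_t  opt_data[];
};

/* Storage that reserves room for one 32-bit payload after the header. */
union demo_opt_storage {
	struct demo_opt opt;
	uint8_t raw[sizeof(struct demo_opt) + sizeof(uint32_t)];
};

/* Write the payload into opt_data; the write stays inside the union. */
void demo_set_payload(union demo_opt_storage *s, uint32_t payload)
{
	memset(s, 0, sizeof(*s));
	s->opt.length = sizeof(payload) / 4;	/* 4-byte units (assumed) */
	memcpy(s->opt.opt_data, &payload, sizeof(payload));
}
```

With the payload space declared as part of the object, the compiler can
see that the write is in bounds.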

-- progs/jeq_infer_not_null_fail.c --

progs/jeq_infer_not_null_fail.c:21:40: error: array subscript ‘struct
bpf_map[0]’ is partly outside array bounds of ‘struct <anonymous>[1]’
[-Werror=array-bounds=]
   21 |         struct bpf_map *inner_map = map->inner_map_meta;
      |                                        ^~
progs/jeq_infer_not_null_fail.c:14:3: note: object ‘m_hash’ of size 32
   14 | } m_hash SEC(".maps");

This example defines m_hash in the context of the compilation unit and
casts it to struct bpf_map, although m_hash is much smaller than struct
bpf_map. GCC errors out when the code attempts to access a member of
struct bpf_map that lies outside of the defined limits for m_hash.
This patch disables the warning through a GCC pragma.

These changes were tested in bpf-next master selftests without any
regressions.

Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
Cc: jose.marchesi@oracle.com
Cc: david.faust@oracle.com
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20240510183850.286661-2-cupertino.miranda@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agobpf: avoid gcc overflow warning in test_xdp_vlan.c
David Faust [Wed, 8 May 2024 19:35:12 +0000 (12:35 -0700)]
bpf: avoid gcc overflow warning in test_xdp_vlan.c

This patch fixes an integer overflow warning raised by GCC in
xdp_prognum1 of progs/test_xdp_vlan.c:

  GCC-BPF  [test_maps] test_xdp_vlan.bpf.o
progs/test_xdp_vlan.c: In function 'xdp_prognum1':
progs/test_xdp_vlan.c:163:25: error: integer overflow in expression
 '(short int)(((__builtin_constant_p((int)vlan_hdr->h_vlan_TCI)) != 0
   ? (int)(short unsigned int)((short int)((int)vlan_hdr->h_vlan_TCI
   << 8 >> 8) << 8 | (short int)((int)vlan_hdr->h_vlan_TCI << 0 >> 8
   << 0)) & 61440 : (int)__builtin_bswap16(vlan_hdr->h_vlan_TCI)
   & 61440) << 8 >> 8) << 8' of type 'short int' results in '0' [-Werror=overflow]
  163 |                         bpf_htons((bpf_ntohs(vlan_hdr->h_vlan_TCI) & 0xf000)
      |                         ^~~~~~~~~

The problem lies with the expansion of the bpf_htons macro and the
expression passed into it.  The bpf_htons macro (and similarly the
bpf_ntohs macro) expand to a ternary operation using either
__builtin_bswap16 or ___bpf_swab16 to swap the bytes, depending on
whether the expression is constant.

For an expression, with 'value' as a u16, like:

  bpf_htons (value & 0xf000)

The entire (value & 0xf000) is 'x' in the expansion of ___bpf_swab16
and we get as one part of the expanded swab16:

  ((__u16)(value & 0xf000) << 8 >> 8 << 8)

This will always evaluate to 0, which is intentional since this
subexpression deals with the byte guaranteed to be 0 by the mask.

However, GCC warns because the precise reason this always evaluates to 0
is an overflow.  Specifically, the plain 0xf000 in the expression is a
signed 32-bit integer, which causes 'value' to also be promoted to a
signed 32-bit integer, and the combination of the 8-bit left shift and
down-cast back to __u16 results in a signed overflow (really a 'warning:
overflow in conversion from int to __u16' which is propagated up through
the rest of the expression leading to the ultimate overflow warning
above), which is a valid warning despite being the intended result of
this code.

Clang does not warn on this case, likely because it performs constant
folding later in the compilation process relative to GCC.  It seems that
by the time clang does constant folding for this expression, the side of
the ternary with this overflow has already been discarded.

Fortunately, this warning is easily silenced by simply making the 0xf000
mask explicitly unsigned.  This has no impact on the result.
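
A small userspace sketch of masking with an explicitly unsigned
constant. demo_swab16() and demo_tci_top_bits() are hypothetical helpers
written for this illustration, not the bpf_htons/bpf_ntohs macros, and
the sketch assumes a little-endian host where the conversion is a byte
swap.

```c
#include <stdint.h>

/* Plain 16-bit byte swap, written with unsigned constants only. */
uint16_t demo_swab16(uint16_t x)
{
	return (uint16_t)(((x & 0x00ffU) << 8) | ((x & 0xff00U) >> 8));
}

/*
 * Keep the top four bits of a network-order TCI value and return the
 * result in network order. The mask carries a U suffix so the masking
 * is done in unsigned arithmetic.
 */
uint16_t demo_tci_top_bits(uint16_t tci_be)
{
	uint16_t host = demo_swab16(tci_be);

	return demo_swab16(host & 0xf000U);
}
```

For any 16-bit input, masking with 0xf000U yields the same bits as
masking with 0xf000; only the type of the intermediate value changes.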

Signed-off-by: David Faust <david.faust@oracle.com>
Cc: jose.marchesi@oracle.com
Cc: cupertino.miranda@oracle.com
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240508193512.152759-1-david.faust@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agotools: remove redundant ethtool.h from tooling infra
Tushar Vyavahare [Wed, 8 May 2024 10:41:23 +0000 (10:41 +0000)]
tools: remove redundant ethtool.h from tooling infra

Remove the redundant ethtool.h header file from tools/include/uapi/linux.
The file is unnecessary as the system uses the kernel's
include/uapi/linux/ethtool.h directly.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20240508104123.434769-1-tushar.vyavahare@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months agoMerge branch 'retire-progs-test_sock_addr'
Alexei Starovoitov [Mon, 13 May 2024 00:10:43 +0000 (17:10 -0700)]
Merge branch 'retire-progs-test_sock_addr'

Jordan Rife says:

====================
Retire progs/test_sock_addr.c

This patch series migrates remaining tests from bpf/test_sock_addr.c to
prog_tests/sock_addr.c and progs/verifier_sock_addr.c in order to fully
retire the old-style test program and expands test coverage to test
previously untested scenarios related to sockaddr hooks.

This is a continuation of the work started recently during the expansion
of prog_tests/sock_addr.c.

Link: https://lore.kernel.org/bpf/20240429214529.2644801-1-jrife@google.com/T/#u
=======
Patches
=======
* Patch 1 moves tests that check valid return values for recvmsg hooks
  into progs/verifier_sock_addr.c, a new addition to the verifier test
  suite.
* Patches 2-5 lay the groundwork for test migration, enabling
  prog_tests/sock_addr.c to handle more test dimensions.
* Patches 6-11 move existing tests to prog_tests/sock_addr.c.
* Patch 12 removes some redundant test cases.
* Patches 14-17 expand on existing test coverage.
====================

Link: https://lore.kernel.org/r/20240510190246.3247730-1-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Expand ATTACH_REJECT tests
Jordan Rife [Fri, 10 May 2024 19:02:34 +0000 (14:02 -0500)]
selftests/bpf: Expand ATTACH_REJECT tests

This expands coverage for ATTACH_REJECT tests to include connect_unix,
sendmsg_unix, recvmsg*, getsockname*, and getpeername*.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-18-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Expand getsockname and getpeername tests
Jordan Rife [Fri, 10 May 2024 19:02:33 +0000 (14:02 -0500)]
selftests/bpf: Expand getsockname and getpeername tests

This expands coverage for getsockname and getpeername hooks to include
getsockname4, getsockname6, getpeername4, and getpeername6.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-17-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago sefltests/bpf: Expand sockaddr hook deny tests
Jordan Rife [Fri, 10 May 2024 19:02:32 +0000 (14:02 -0500)]
sefltests/bpf: Expand sockaddr hook deny tests

This patch expands test coverage for EPERM tests to include connect and
bind calls and rounds out the coverage for sendmsg by adding tests for
sendmsg_unix.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-16-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Expand sockaddr program return value tests
Jordan Rife [Fri, 10 May 2024 19:02:31 +0000 (14:02 -0500)]
selftests/bpf: Expand sockaddr program return value tests

This patch expands verifier coverage for program return values to cover
bind, connect, sendmsg, getsockname, and getpeername hooks. It also
rounds out the recvmsg coverage by adding test cases for recvmsg_unix
hooks.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-15-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Retire test_sock_addr.(c|sh)
Jordan Rife [Fri, 10 May 2024 19:02:30 +0000 (14:02 -0500)]
selftests/bpf: Retire test_sock_addr.(c|sh)

Fully remove test_sock_addr.c and test_sock_addr.sh, as test coverage
has been fully moved to prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-14-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Remove redundant sendmsg test cases
Jordan Rife [Fri, 10 May 2024 19:02:29 +0000 (14:02 -0500)]
selftests/bpf: Remove redundant sendmsg test cases

Remove these test cases completely, as the same behavior is already
covered by other sendmsg* test cases in prog_tests/sock_addr.c. This
just rewrites the destination address similar to sendmsg_v4_prog and
sendmsg_v6_prog.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-13-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Migrate ATTACH_REJECT test cases
Jordan Rife [Fri, 10 May 2024 19:02:28 +0000 (14:02 -0500)]
selftests/bpf: Migrate ATTACH_REJECT test cases

Migrate test case from bpf/test_sock_addr.c ensuring that program
attachment fails when using an inappropriate attach type.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-12-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Migrate expected_attach_type tests
Jordan Rife [Fri, 10 May 2024 19:02:27 +0000 (14:02 -0500)]
selftests/bpf: Migrate expected_attach_type tests

Migrates tests from progs/test_sock_addr.c ensuring that programs fail
to load when the expected attach type does not match.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-11-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Migrate wildcard destination rewrite test
Jordan Rife [Fri, 10 May 2024 19:02:26 +0000 (14:02 -0500)]
selftests/bpf: Migrate wildcard destination rewrite test

Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
respects when sendmsg6 hooks rewrite the destination IP with the IPv6
wildcard IP, [::].
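
For reference, the IPv6 wildcard address [::] is the all-zero (unspecified)
address. The sketch below shows one way to recognise it from userspace with
standard socket headers; the helper name is hypothetical and does not come
from the selftest itself.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: returns 1 if text is the IPv6 wildcard address,
 * 0 if it is some other valid IPv6 address, -1 if it does not parse. */
static int is_ipv6_wildcard(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;

	/* IN6_IS_ADDR_UNSPECIFIED is true only for the all-zero address. */
	return IN6_IS_ADDR_UNSPECIFIED(&addr) ? 1 : 0;
}
```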

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-10-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Migrate sendmsg6 v4 mapped address tests
Jordan Rife [Fri, 10 May 2024 19:02:25 +0000 (14:02 -0500)]
selftests/bpf: Migrate sendmsg6 v4 mapped address tests

Migrate test case from bpf/test_sock_addr.c ensuring that sendmsg
returns -ENOTSUPP when sending to an IPv4-mapped IPv6 address to
prog_tests/sock_addr.c.
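
An IPv4-mapped IPv6 address has the form ::ffff:a.b.c.d. As a rough
illustration of the kind of destination this test sends to, the sketch below
classifies an address string with the standard IN6_IS_ADDR_V4MAPPED macro.
The helper name is hypothetical, and the example addresses are documentation
addresses chosen here, not values from the test.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: returns 1 if text is an IPv4-mapped IPv6 address,
 * 0 if it is some other valid IPv6 address, -1 if it does not parse. */
static int is_v4_mapped(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;

	/* True when the first 80 bits are zero and the next 16 are 0xffff. */
	return IN6_IS_ADDR_V4MAPPED(&addr) ? 1 : 0;
}
```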

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-9-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Migrate sendmsg deny test cases
Jordan Rife [Fri, 10 May 2024 19:02:24 +0000 (14:02 -0500)]
selftests/bpf: Migrate sendmsg deny test cases

This set of tests checks that sendmsg calls are rejected (return -EPERM)
when the sendmsg* hook returns 0. Replace those in bpf/test_sock_addr.c
with corresponding tests in prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-8-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
4 months ago selftests/bpf: Migrate WILDCARD_IP test
Jordan Rife [Fri, 10 May 2024 19:02:23 +0000 (14:02 -0500)]
selftests/bpf: Migrate WILDCARD_IP test

Move wildcard IP sendmsg test case out of bpf/test_sock_addr.c into
prog_tests/sock_addr.c.

Signed-off-by: Jordan Rife <jrife@google.com>
Link: https://lore.kernel.org/r/20240510190246.3247730-7-jrife@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>