From: Linus Torvalds Date: Thu, 27 Mar 2025 04:48:21 +0000 (-0700) Subject: Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev... X-Git-Tag: io_uring-6.15-20250403~82 X-Git-Url: https://git.kernel.dk/?a=commitdiff_plain;h=1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95;p=linux-block.git Merge tag 'net-next-6.15' of git://git./linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Continue Netlink conversions to per-namespace RTNL lock (IPv4 routing, routing rules, routing next hops, ARP ioctls) - Continue extending the use of netdev instance locks. As a driver opt-in protect queue operations and (in due course) ethtool operations with the instance lock and not RTNL lock. - Support collecting TCP timestamps (data submitted, sent, acked) in BPF, allowing for transparent (to the application) and lower overhead tracking of TCP RPC performance. - Tweak existing networking Rx zero-copy infra to support zero-copy Rx via io_uring. - Optimize MPTCP performance in single subflow mode by 29%. - Enable GRO on packets which went thru XDP CPU redirect (were queued for processing on a different CPU). Improving TCP stream performance up to 2x. - Improve performance of contended connect() by 200% by searching for an available 4-tuple under RCU rather than a spin lock. Bring an additional 229% improvement by tweaking hash distribution. - Avoid unconditionally touching sk_tsflags on RX, improving performance under UDP flood by as much as 10%. - Avoid skb_clone() dance in ping_rcv() to improve performance under ping flood. - Avoid FIB lookup in netfilter if socket is available, 20% perf win. - Rework network device creation (in-kernel) API to more clearly identify network namespaces and their roles. There are up to 4 namespace roles but we used to have just 2 netns pointer arguments, interpreted differently based on context. - Use sysfs_break_active_protection() instead of trylock to avoid deadlocks between unregistering objects and sysfs access. - Add a new sysctl and sockopt for capping max retransmit timeout in TCP. - Support masking port and DSCP in routing rule matches. - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST. - Support specifying at what time packet should be sent on AF_XDP sockets. - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users. - Add Netlink YAML spec for WiFi (nl80211) and conntrack. - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols which only need to be exported when IPv6 support is built as a module. - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to normal bridging. - Allow users to specify source port range for GENEVE tunnels. - netconsole: allow attaching kernel release, CPU ID and task name to messages as metadata Driver API: - Continue rework / fixing of Energy Efficient Ethernet (EEE) across the SW layers. Delegate the responsibilities to phylink where possible. Improve its handling in phylib. - Support symmetric OR-XOR RSS hashing algorithm. - Support tracking and preserving IRQ affinity by NAPI itself. - Support loopback mode speed selection for interface selftests. Device drivers: - Remove the IBM LCS driver for s390 - Remove the sb1000 cable modem driver - Add support for SFP module access over SMBus - Add MCTP transport driver for MCTP-over-USB - Enable XDP metadata support in multiple drivers - Ethernet high-speed NICs: - Broadcom (bnxt): - add PCIe TLP Processing Hints (TPH) support for new AMD platforms - support dumping RoCE queue state for debug - opt into instance locking - Intel (100G, ice, idpf): - ice: rework MSI-X IRQ management and distribution - ice: support for E830 devices - iavf: add support for Rx timestamping - iavf: opt into instance locking - nVidia/Mellanox: - mlx4: use page pool memory allocator for Rx - mlx5: support for one PTP device per hardware clock - mlx5: support for 200Gbps per-lane link modes - mlx5: move IPSec policy check after decryption - AMD/Solarflare: - support FW flashing via devlink - Cisco (enic): - use page pool memory allocator for Rx - enable 32, 64 byte CQEs - get max rx/tx ring size from the device - Meta (fbnic): - support flow steering and RSS configuration - report queue stats - support TCP segmentation - support IRQ coalescing - support ring size configuration - Marvell/Cavium: - support AF_XDP - Wangxun: - support for PTP clock and timestamping - Huawei (hibmcge): - checksum offload - add more statistics - Ethernet virtual: - VirtIO net: - aggressively suppress Tx completions, improve perf by 96% with 1 CPU and 55% with 2 CPUs - expose NAPI to IRQ mapping and persist NAPI settings - Google (gve): - support XDP in DQO RDA Queue Format - opt into instance locking - Microsoft vNIC: - support BIG TCP - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - cleanup Tx and Tx clock setting and other link-focused cleanups - enable SGMII and 2500BASEX mode switching for Intel platforms - support Sophgo SG2044 - Broadcom switches (b53): - support for BCM53101 - TI: - iep: add perout configuration support - icssg: support XDP - Cadence (macb): - implement BQL - Xilinx (axinet): - support dynamic IRQ moderation and changing coalescing at runtime - implement BQL - report standard stats - MediaTek: - support phylink managed EEE - Intel: - igc: don't restart the interface on every XDP program change - RealTek (r8169): - support reading registers of internal PHYs directly - increase max jumbo packet size on RTL8125/RTL8126 - Airoha: - support for RISC-V NPU packet processing unit - enable scatter-gather and support MTU up to 9kB - Tehuti (tn40xx): - support cards with TN4010 MAC and an Aquantia AQR105 PHY - Ethernet PHYs: - support for TJA1102S, TJA1121 - dp83tg720: add randomized polling intervals for link detection - dp83822: support changing the transmit amplitude voltage - support for LEDs on 88q2xxx - CAN: - canxl: support Remote Request Substitution bit access - flexcan: add S32G2/S32G3 SoC - WiFi: - remove cooked monitor support - strict mode for better AP testing - basic EPCS support - OMI RX bandwidth reduction support - batman-adv: add support for jumbo frames - WiFi drivers: - RealTek (rtw88): - support RTL8814AE and RTL8814AU - RealTek (rtw89): - switch using wiphy_lock and wiphy_work - add BB context to manipulate two PHY as preparation of MLO - improve BT-coexistence mechanism to play A2DP smoothly - Intel (iwlwifi): - add new iwlmld sub-driver for latest HW/FW combinations - MediaTek (mt76): - preparation for mt7996 Multi-Link Operation (MLO) support - Qualcomm/Atheros (ath12k): - continued work on MLO - Silabs (wfx): - Wake-on-WLAN support - Bluetooth: - add support for skb TX SND/COMPLETION timestamping - hci_core: enable buffer flow control for SCO/eSCO - coredump: log devcd dumps into the monitor - Bluetooth drivers: - intel: add support to configure TX power - nxp: handle bootloader error during cmd5 and cmd7" * tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits) unix: fix up for "apparmor: add fine grained af_unix mediation" mctp: Fix incorrect tx flow invalidation condition in mctp-i2c net: usb: asix: ax88772: Increase phy_name size net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string net: libwx: fix Tx L4 checksum net: libwx: fix Tx descriptor content for some tunnel packets atm: Fix NULL pointer dereference net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus net: phy: aquantia: add essential functions to aqr105 driver net: phy: aquantia: search for firmware-name in fwnode net: phy: aquantia: add probe function to aqr105 for firmware loading net: phy: Add swnode support to mdiobus_scan gve: add XDP DROP and PASS support for DQ gve: update XDP allocation path support RX buffer posting gve: merge packet buffer size fields gve: update GQ RX to use buf_size gve: introduce config-based allocation for XDP gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics ... --- 1a9239bb4253f9076b5b4b2a1a4e8d7defd77a95 diff --cc fs/eventpoll.c index 9b06a0ab9c32,2fecf66661e9..100376863a44 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@@ -447,8 -447,8 +447,8 @@@ static bool ep_busy_loop(struct eventpo if (!budget) budget = BUSY_POLL_BUDGET; - if (napi_id >= MIN_NAPI_ID && ep_busy_loop_on(ep)) { + if (napi_id_valid(napi_id) && ep_busy_loop_on(ep)) { - napi_busy_loop(napi_id, nonblock ? NULL : ep_busy_loop_end, + napi_busy_loop(napi_id, ep_busy_loop_end, ep, prefer_busy_poll, budget); if (ep_events_available(ep)) return true; diff --cc lib/tests/Makefile index 498915255860,8e4f42cb9c54..a434c7cb733a --- a/lib/tests/Makefile +++ b/lib/tests/Makefile @@@ -1,44 -1,1 +1,45 @@@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for tests of kernel library functions. + +# KUnit tests +CFLAGS_bitfield_kunit.o := $(DISABLE_STRUCTLEAK_PLUGIN) +obj-$(CONFIG_BITFIELD_KUNIT) += bitfield_kunit.o +obj-$(CONFIG_BITS_TEST) += test_bits.o ++obj-$(CONFIG_BLACKHOLE_DEV_KUNIT_TEST) += blackhole_dev_kunit.o +obj-$(CONFIG_CHECKSUM_KUNIT) += checksum_kunit.o +obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o +obj-$(CONFIG_CPUMASK_KUNIT_TEST) += cpumask_kunit.o +obj-$(CONFIG_CRC_KUNIT_TEST) += crc_kunit.o +CFLAGS_fortify_kunit.o += $(call cc-disable-warning, unsequenced) +CFLAGS_fortify_kunit.o += $(call cc-disable-warning, stringop-overread) +CFLAGS_fortify_kunit.o += $(call cc-disable-warning, stringop-truncation) +CFLAGS_fortify_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN) +obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o +CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) +obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o +obj-$(CONFIG_HASHTABLE_KUNIT_TEST) += hashtable_test.o +obj-$(CONFIG_HASH_KUNIT_TEST) += test_hash.o +obj-$(CONFIG_TEST_IOV_ITER) += kunit_iov_iter.o +obj-$(CONFIG_IS_SIGNED_TYPE_KUNIT_TEST) += is_signed_type_kunit.o +obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o +obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o +obj-$(CONFIG_KFIFO_KUNIT_TEST) += kfifo_kunit.o +obj-$(CONFIG_TEST_LIST_SORT) += test_list_sort.o +obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o +obj-$(CONFIG_MEMCPY_KUNIT_TEST) += memcpy_kunit.o +CFLAGS_overflow_kunit.o = $(call cc-disable-warning, tautological-constant-out-of-range-compare) +obj-$(CONFIG_OVERFLOW_KUNIT_TEST) += overflow_kunit.o +obj-$(CONFIG_PRINTF_KUNIT_TEST) += printf_kunit.o +obj-$(CONFIG_SCANF_KUNIT_TEST) += scanf_kunit.o +obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o +obj-$(CONFIG_SLUB_KUNIT_TEST) += slub_kunit.o +obj-$(CONFIG_TEST_SORT) += test_sort.o +CFLAGS_stackinit_kunit.o += $(call cc-disable-warning, switch-unreachable) +obj-$(CONFIG_STACKINIT_KUNIT_TEST) += stackinit_kunit.o +obj-$(CONFIG_STRING_KUNIT_TEST) += string_kunit.o +obj-$(CONFIG_STRING_HELPERS_KUNIT_TEST) += string_helpers_kunit.o +obj-$(CONFIG_USERCOPY_KUNIT_TEST) += usercopy_kunit.o +obj-$(CONFIG_UTIL_MACROS_KUNIT) += util_macros_kunit.o + obj-$(CONFIG_TEST_RUNTIME_MODULE) += module/ diff --cc lib/tests/blackhole_dev_kunit.c index 000000000000,000000000000..06834ab35f43 new file mode 100644 --- /dev/null +++ b/lib/tests/blackhole_dev_kunit.c @@@ -1,0 -1,0 +1,85 @@@ ++// SPDX-License-Identifier: GPL-2.0 ++/* ++ * This tests the blackhole_dev that is created during the ++ * net subsystem initialization. The test this module performs is ++ * by injecting an skb into the stack with skb->dev as the ++ * blackhole_dev and expects kernel to behave in a sane manner ++ * (in other words, *not crash*)! ++ * ++ * Copyright (c) 2018, Mahesh Bandewar ++ */ ++ ++#include ++#include ++#include ++#include ++#include ++#include ++ ++#include ++ ++#define SKB_SIZE 256 ++#define HEAD_SIZE (14+40+8) /* Ether + IPv6 + UDP */ ++#define TAIL_SIZE 32 /* random tail-room */ ++ ++#define UDP_PORT 1234 ++ ++static void test_blackholedev(struct kunit *test) ++{ ++ struct ipv6hdr *ip6h; ++ struct sk_buff *skb; ++ struct udphdr *uh; ++ int data_len; ++ ++ skb = alloc_skb(SKB_SIZE, GFP_KERNEL); ++ KUNIT_ASSERT_NOT_NULL(test, skb); ++ ++ /* Reserve head-room for the headers */ ++ skb_reserve(skb, HEAD_SIZE); ++ ++ /* Add data to the skb */ ++ data_len = SKB_SIZE - (HEAD_SIZE + TAIL_SIZE); ++ memset(__skb_put(skb, data_len), 0xf, data_len); ++ ++ /* Add protocol data */ ++ /* (Transport) UDP */ ++ uh = (struct udphdr *)skb_push(skb, sizeof(struct udphdr)); ++ skb_set_transport_header(skb, 0); ++ uh->source = uh->dest = htons(UDP_PORT); ++ uh->len = htons(data_len); ++ uh->check = 0; ++ /* (Network) IPv6 */ ++ ip6h = (struct ipv6hdr *)skb_push(skb, sizeof(struct ipv6hdr)); ++ skb_set_network_header(skb, 0); ++ ip6h->hop_limit = 32; ++ ip6h->payload_len = htons(data_len + sizeof(struct udphdr)); ++ ip6h->nexthdr = IPPROTO_UDP; ++ ip6h->saddr = in6addr_loopback; ++ ip6h->daddr = in6addr_loopback; ++ /* Ether */ ++ skb_push(skb, sizeof(struct ethhdr)); ++ skb_set_mac_header(skb, 0); ++ ++ skb->protocol = htons(ETH_P_IPV6); ++ skb->pkt_type = PACKET_HOST; ++ skb->dev = blackhole_netdev; ++ ++ /* Now attempt to send the packet */ ++ KUNIT_EXPECT_EQ(test, dev_queue_xmit(skb), NET_XMIT_SUCCESS); ++} ++ ++static struct kunit_case blackholedev_cases[] = { ++ KUNIT_CASE(test_blackholedev), ++ {}, ++}; ++ ++static struct kunit_suite blackholedev_suite = { ++ .name = "blackholedev", ++ .test_cases = blackholedev_cases, ++}; ++ ++kunit_test_suite(blackholedev_suite); ++ ++MODULE_AUTHOR("Mahesh Bandewar "); ++MODULE_DESCRIPTION("module test of the blackhole_dev"); ++MODULE_LICENSE("GPL"); diff --cc net/core/dev.c index 901514e42d15,b597cc27a115..be17e0660144 --- a/net/core/dev.c +++ b/net/core/dev.c @@@ -7016,11 -7164,10 +7164,9 @@@ void netif_napi_add_weight_locked(struc INIT_LIST_HEAD(&napi->poll_list); INIT_HLIST_NODE(&napi->napi_hash_node); - hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); - napi->timer.function = napi_watchdog; + hrtimer_setup(&napi->timer, napi_watchdog, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); - init_gro_hash(napi); + gro_init(&napi->gro); napi->skb = NULL; - INIT_LIST_HEAD(&napi->rx_list); - napi->rx_count = 0; napi->poll = poll; if (weight > NAPI_POLL_WEIGHT) netdev_err_once(dev, "%s() called with weight %d\n", __func__,