Nikita Yushchenko [Wed, 11 Dec 2024 05:30:12 +0000 (10:30 +0500)]
net: renesas: rswitch: fix initial MPIC register setting
MPIC.PIS must be set per phy interface type.
MPIC.LSC must be set per speed.
Do that strictly per datasheet, instead of hardcoding MPIC.PIS to GMII.
Fixes:
3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241211053012.368914-1-nikita.yoush@cogentembedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 12 Dec 2024 12:11:38 +0000 (13:11 +0100)]
Merge tag 'nf-24-12-11' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix bogus test reports in rpath.sh selftest by adding permanent
neighbor entries, from Phil Sutter.
2) Lockdep reports possible ABBA deadlock in xt_IDLETIMER, fix it by
removing sysfs out of the mutex section, also from Phil Sutter.
3) It is illegal to release basechain via RCU callback, for several
reasons. Keep it simple and safe by calling synchronize_rcu() instead.
This is a partially reverting a botched recent attempt of me to fix
this basechain release path on netdevice removal.
From Florian Westphal.
netfilter pull request 24-12-11
* tag 'nf-24-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: do not defer rule destruction via call_rcu
netfilter: IDLETIMER: Fix for possible ABBA deadlock
selftests: netfilter: Stabilize rpath.sh
====================
Link: https://patch.msgid.link/20241211230130.176937-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:45 +0000 (15:12 +0100)]
team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Similar to bonding driver, add NETIF_F_GSO_ENCAP_ALL to TEAM_VLAN_FEATURES
in order to support slave devices which propagate NETIF_F_GSO_UDP_TUNNEL &
NETIF_F_GSO_UDP_TUNNEL_CSUM as vlan_features.
Fixes:
3625920b62c3 ("teaming: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-5-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:44 +0000 (15:12 +0100)]
team: Fix initial vlan_feature set in __team_compute_features
Similarly as with bonding, fix the calculation of vlan_features to reuse
netdev_base_features() in order derive the set in the same way as
ndo_fix_features before iterating through the slave devices to refine the
feature set.
Fixes:
3625920b62c3 ("teaming: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-4-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:43 +0000 (15:12 +0100)]
bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Drivers like mlx5 expose NIC's vlan_features such as
NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are
later not propagated when the underlying devices are bonded and
a vlan device created on top of the bond.
Right now, the more cumbersome workaround for this is to create
the vlan on top of the mlx5 and then enslave the vlan devices
to a bond.
To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES
such that bond_compute_features() can probe and propagate the
vlan_features from the slave devices up to the vlan device.
Given the following bond:
# ethtool -i enp2s0f{0,1}np{0,1}
driver: mlx5_core
[...]
# ethtool -k enp2s0f0np0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off
# ethtool -k enp2s0f1np1 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: on
rx-udp-gro-forwarding: off
# ethtool -k bond0 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Before:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: off [requested on]
tx-udp_tnl-csum-segmentation: off [requested on]
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
After:
# ethtool -k bond0.100 | grep udp
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-udp-segmentation: on
rx-udp_tunnel-port-offload: off [fixed]
rx-udp-gro-forwarding: off
Various users have run into this reporting performance issues when
configuring Cilium in vxlan tunneling mode and having the combination
of bond & vlan for the core devices connecting the Kubernetes cluster
to the outside world.
Fixes:
a9b3ace44c7d ("bonding: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:42 +0000 (15:12 +0100)]
bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
If a bonding device has slave devices, then the current logic to derive
the feature set for the master bond device is limited in that flags which
are fully supported by the underlying slave devices cannot be propagated
up to vlan devices which sit on top of bond devices. Instead, these get
blindly masked out via current NETIF_F_ALL_FOR_ALL logic.
vlan_features and mpls_features should reuse netdev_base_features() in
order derive the set in the same way as ndo_fix_features before iterating
through the slave devices to refine the feature set.
Fixes:
a9b3ace44c7d ("bonding: fix vlan_features computing")
Fixes:
2e770b507ccd ("net: bonding: Inherit MPLS features from slave devices")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Daniel Borkmann [Tue, 10 Dec 2024 14:12:41 +0000 (15:12 +0100)]
net, team, bonding: Add netdev_base_features helper
Both bonding and team driver have logic to derive the base feature
flags before iterating over their slave devices to refine the set
via netdev_increment_features().
Add a small helper netdev_base_features() so this can be reused
instead of having it open-coded multiple times.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20241210141245.327886-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Martin Ottens [Tue, 10 Dec 2024 13:14:11 +0000 (14:14 +0100)]
net/sched: netem: account for backlog updates from child qdisc
In general, 'qlen' of any classful qdisc should keep track of the
number of packets that the qdisc itself and all of its children holds.
In case of netem, 'qlen' only accounts for the packets in its internal
tfifo. When netem is used with a child qdisc, the child qdisc can use
'qdisc_tree_reduce_backlog' to inform its parent, netem, about created
or dropped SKBs. This function updates 'qlen' and the backlog statistics
of netem, but netem does not account for changes made by a child qdisc.
'qlen' then indicates the wrong number of packets in the tfifo.
If a child qdisc creates new SKBs during enqueue and informs its parent
about this, netem's 'qlen' value is increased. When netem dequeues the
newly created SKBs from the child, the 'qlen' in netem is not updated.
If 'qlen' reaches the configured sch->limit, the enqueue function stops
working, even though the tfifo is not full.
Reproduce the bug:
Ensure that the sender machine has GSO enabled. Configure netem as root
qdisc and tbf as its child on the outgoing interface of the machine
as follows:
$ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
$ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
client on the machine. Check the qdisc statistics:
$ tc -s qdisc show dev <oif>
Statistics after 10s of iPerf3 TCP test before the fix (note that
netem's backlog > limit, netem stopped accepting packets):
qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
Sent
2767766 bytes 1848 pkt (dropped 652, overlimits 0 requeues 0)
backlog
4294528236b 1155p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
Sent
2767766 bytes 1848 pkt (dropped 327, overlimits 7601 requeues 0)
backlog 0b 0p requeues 0
Statistics after the fix:
qdisc netem 1: root refcnt 2 limit 1000 delay 100ms
Sent
37766372 bytes 24974 pkt (dropped 9, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc tbf 10: parent 1:1 rate 50Mbit burst 1537b lat 50ms
Sent
37766372 bytes 24974 pkt (dropped 327, overlimits 96017 requeues 0)
backlog 0b 0p requeues 0
tbf segments the GSO SKBs (tbf_segment) and updates the netem's 'qlen'.
The interface fully stops transferring packets and "locks". In this case,
the child qdisc and tfifo are empty, but 'qlen' indicates the tfifo is at
its limit and no more packets are accepted.
This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
only decreased when a packet is returned by its dequeue function, and not
during enqueuing into the child qdisc. External updates to 'qlen' are thus
accounted for and only the behavior of the backlog statistics changes. As
in other qdiscs, 'qlen' then keeps track of how many packets are held in
netem and all of its children. As before, sch->limit remains as the
maximum number of packets in the tfifo. The same applies to netem's
backlog statistics.
Fixes:
50612537e9ab ("netem: fix classful handling")
Signed-off-by: Martin Ottens <martin.ottens@fau.de>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20241210131412.1837202-1-martin.ottens@fau.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Dec 2024 04:25:59 +0000 (20:25 -0800)]
Merge tag 'batadv-net-pullrequest-
20241210' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- fix TT unitialized data and size limit issues, by Remi Pommarel
(3 patches)
* tag 'batadv-net-pullrequest-
20241210' of git://git.open-mesh.org/linux-merge:
batman-adv: Do not let TT changes list grows indefinitely
batman-adv: Remove uninitialized data in full table TT response
batman-adv: Do not send uninitialized TT changes
====================
Link: https://patch.msgid.link/20241210135024.39068-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 10 Dec 2024 13:26:40 +0000 (15:26 +0200)]
net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
With this port schedule:
tc qdisc replace dev $send_if parent root handle 100 taprio \
num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
map 0 1 2 3 4 5 6 7 \
base-time 0 cycle-time 10000 \
sched-entry S 01 1250 \
sched-entry S 02 1250 \
sched-entry S 04 1250 \
sched-entry S 08 1250 \
sched-entry S 10 1250 \
sched-entry S 20 1250 \
sched-entry S 40 1250 \
sched-entry S 80 1250 \
flags 2
ptp4l would fail to take TX timestamps of Pdelay_Resp messages like this:
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
ptp4l[4134.168]: port 2: send peer delay response failed
It turns out that the driver can't take their TX timestamps because it
can't transmit them in the first place. And there's nothing special
about the Pdelay_Resp packets - they're just regular 68 byte packets.
But with this taprio configuration, the switch would refuse to send even
the ETH_ZLEN minimum packet size.
This should have definitely not been the case. When applying the taprio
config, the driver prints:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 132 octets including FCS
and thus, everything under 132 bytes - ETH_FCS_LEN should have been sent
without problems. Yet it's not.
For the forwarding path, the configuration is fine, yet packets injected
from Linux get stuck with this schedule no matter what.
The first hint that the static guard bands are the cause of the problem
is that reverting Michael Walle's commit
297c4de6f780 ("net: dsa: felix:
re-enable TAS guard band mode") made things work. It must be that the
guard bands are calculated incorrectly.
I remembered that there is a magic constant in the driver, set to 33 ns
for no logical reason other than experimentation, which says "never let
the static guard bands get so large as to leave less than this amount of
remaining space in the time slot, because the queue system will refuse
to schedule packets otherwise, and they will get stuck". I had a hunch
that my previous experimentally-determined value was only good for
packets coming from the forwarding path, and that the CPU injection path
needed more.
I came to the new value of 35 ns through binary search, after seeing
that with 544 ns (the bit time required to send the Pdelay_Resp packet
at gigabit) it works. Again, this is purely experimental, there's no
logic and the manual doesn't say anything.
The new driver prints for this schedule look like this:
mscc_felix 0000:00:00.5: port 0 tc 0 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 1250 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 131 octets including FCS
So yes, the maximum MTU is now even smaller by 1 byte than before.
This is maybe counter-intuitive, but makes more sense with a diagram of
one time slot.
Before:
Gate open Gate close
| |
v 1250 ns total time slot duration v
<---------------------------------------------------->
<----><---------------------------------------------->
33 ns 1217 ns static guard band
useful
Gate open Gate close
| |
v 1250 ns total time slot duration v
<---------------------------------------------------->
<-----><--------------------------------------------->
35 ns 1215 ns static guard band
useful
The static guard band implemented by this switch hardware directly
determines the maximum allowable MTU for that traffic class. The larger
it is, the earlier the switch will stop scheduling frames for
transmission, because otherwise they might overrun the gate close time
(and avoiding that is the entire purpose of Michael's patch).
So, we now have guard bands smaller by 2 ns, thus, in this particular
case, we lose a byte of the maximum MTU.
Fixes:
11afdc6526de ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Michael Walle <mwalle@kernel.org>
Link: https://patch.msgid.link/20241210132640.3426788-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Frederik Deweerdt [Tue, 10 Dec 2024 05:06:48 +0000 (21:06 -0800)]
splice: do not checksum AF_UNIX sockets
When `skb_splice_from_iter` was introduced, it inadvertently added
checksumming for AF_UNIX sockets. This resulted in significant
slowdowns, for example when using sendfile over unix sockets.
Using the test code in [1] in my test setup (2G single core qemu),
the client receives a 1000M file in:
- without the patch: 1482ms (+/- 36ms)
- with the patch: 652.5ms (+/- 22.9ms)
This commit addresses the issue by marking checksumming as unnecessary in
`unix_stream_sendmsg`
Cc: stable@vger.kernel.org
Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com>
Fixes:
2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/Z1fMaHkRf8cfubuE@xiberoa
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniele Palmas [Mon, 9 Dec 2024 15:18:21 +0000 (16:18 +0100)]
net: usb: qmi_wwan: add Telit
FE910C04 compositions
Add the following Telit
FE910C04 compositions:
0x10c0: rmnet + tty (AT/NMEA) + tty (AT) + tty (diag)
T: Bus=02 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 13 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=10c0 Rev=05.15
S: Manufacturer=Telit Cinterion
S: Product=FE910
S: SerialNumber=
f71b8b32
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x10c4: rmnet + tty (AT) + tty (AT) + tty (diag)
T: Bus=02 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 14 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=10c4 Rev=05.15
S: Manufacturer=Telit Cinterion
S: Product=FE910
S: SerialNumber=
f71b8b32
C: #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
0x10c8: rmnet + tty (AT) + tty (diag) + DPL (data packet logging) + adb
T: Bus=02 Lev=01 Prnt=03 Port=06 Cnt=01 Dev#= 17 Spd=480 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1bc7 ProdID=10c8 Rev=05.15
S: Manufacturer=Telit Cinterion
S: Product=FE910
S: SerialNumber=
f71b8b32
C: #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
I: If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=80 Driver=(none)
E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I: If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Link: https://patch.msgid.link/20241209151821.3688829-1-dnlplm@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 12 Dec 2024 04:21:06 +0000 (20:21 -0800)]
Merge branch 'mana-fix-few-memory-leaks-in-mana_gd_setup_irqs'
Maxim Levitsky says:
====================
MANA: Fix few memory leaks in mana_gd_setup_irqs
Fix 2 minor memory leaks in the mana driver,
introduced by commit
====================
Link: https://patch.msgid.link/20241209175751.287738-1-mlevitsk@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Levitsky [Mon, 9 Dec 2024 17:57:51 +0000 (12:57 -0500)]
net: mana: Fix irq_contexts memory leak in mana_gd_setup_irqs
gc->irq_contexts is not freeded if one of the later operations
fail.
Suggested-by: Michael Kelley <mhklinux@outlook.com>
Fixes:
8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Link: https://patch.msgid.link/20241209175751.287738-3-mlevitsk@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Maxim Levitsky [Mon, 9 Dec 2024 17:57:50 +0000 (12:57 -0500)]
net: mana: Fix memory leak in mana_gd_setup_irqs
Commit
8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
added memory allocation in mana_gd_setup_irqs of 'irqs' but the code
doesn't free this temporary array in the success path.
This was caught by kmemleak.
Fixes:
8afefc361209 ("net: mana: Assigning IRQ affinity on HT cores")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Link: https://patch.msgid.link/20241209175751.287738-2-mlevitsk@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Simon Horman [Tue, 10 Dec 2024 13:47:44 +0000 (13:47 +0000)]
MAINTAINERS: Add ethtool.h to NETWORKING [GENERAL]
This is part of an effort to assign a section in MAINTAINERS to header
files related to Networking. In this case the files named ethool.h.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241210-mnt-ethtool-h-v1-1-2a40b567939d@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal [Sat, 7 Dec 2024 11:14:48 +0000 (12:14 +0100)]
netfilter: nf_tables: do not defer rule destruction via call_rcu
nf_tables_chain_destroy can sleep, it can't be used from call_rcu
callbacks.
Moreover, nf_tables_rule_release() is only safe for error unwinding,
while transaction mutex is held and the to-be-desroyed rule was not
exposed to either dataplane or dumps, as it deactives+frees without
the required synchronize_rcu() in-between.
nft_rule_expr_deactivate() callbacks will change ->use counters
of other chains/sets, see e.g. nft_lookup .deactivate callback, these
must be serialized via transaction mutex.
Also add a few lockdep asserts to make this more explicit.
Calling synchronize_rcu() isn't ideal, but fixing this without is hard
and way more intrusive. As-is, we can get:
WARNING: .. net/netfilter/nf_tables_api.c:5515 nft_set_destroy+0x..
Workqueue: events nf_tables_trans_destroy_work
RIP: 0010:nft_set_destroy+0x3fe/0x5c0
Call Trace:
<TASK>
nf_tables_trans_destroy_work+0x6b7/0xad0
process_one_work+0x64a/0xce0
worker_thread+0x613/0x10d0
In case the synchronize_rcu becomes an issue, we can explore alternatives.
One way would be to allocate nft_trans_rule objects + one nft_trans_chain
object, deactivate the rules + the chain and then defer the freeing to the
nft destroy workqueue. We'd still need to keep the synchronize_rcu path as
a fallback to handle -ENOMEM corner cases though.
Reported-by: syzbot+b26935466701e56cfdc2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/
67478d92.
050a0220.253251.0062.GAE@google.com/T/
Fixes:
c03d278fdf35 ("netfilter: nf_tables: wait for rcu grace period on net_device removal")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 6 Dec 2024 18:32:29 +0000 (19:32 +0100)]
netfilter: IDLETIMER: Fix for possible ABBA deadlock
Deletion of the last rule referencing a given idletimer may happen at
the same time as a read of its file in sysfs:
| ======================================================
| WARNING: possible circular locking dependency detected
|
6.12.0-rc7-01692-g5e9a28f41134-dirty #594 Not tainted
| ------------------------------------------------------
| iptables/3303 is trying to acquire lock:
|
ffff8881057e04b8 (kn->active#48){++++}-{0:0}, at: __kernfs_remove+0x20
|
| but task is already holding lock:
|
ffffffffa0249068 (list_mutex){+.+.}-{3:3}, at: idletimer_tg_destroy_v]
|
| which lock already depends on the new lock.
A simple reproducer is:
| #!/bin/bash
|
| while true; do
| iptables -A INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
| iptables -D INPUT -i foo -j IDLETIMER --timeout 10 --label "testme"
| done &
| while true; do
| cat /sys/class/xt_idletimer/timers/testme >/dev/null
| done
Avoid this by freeing list_mutex right after deleting the element from
the list, then continuing with the teardown.
Fixes:
0902b469bd25 ("netfilter: xtables: idletimer target implementation")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Phil Sutter [Fri, 6 Dec 2024 14:08:40 +0000 (15:08 +0100)]
selftests: netfilter: Stabilize rpath.sh
On some systems, neighbor discoveries from ns1 for fec0:42::1 (i.e., the
martian trap address) would happen at the wrong time and cause
false-negative test result.
Problem analysis also discovered that IPv6 martian ping test was broken
in that sent neighbor discoveries, not echo requests were inadvertently
trapped
Avoid the race condition by introducing the neighbors to each other
upfront. Also pin down the firewall rules to matching on echo requests
only.
Fixes:
efb056e5f1f0 ("netfilter: ip6t_rpfilter: Fix regression with VRF interfaces")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Nikita Yushchenko [Mon, 9 Dec 2024 11:32:04 +0000 (16:32 +0500)]
net: renesas: rswitch: handle stop vs interrupt race
Currently the stop routine of rswitch driver does not immediately
prevent hardware from continuing to update descriptors and requesting
interrupts.
It can happen that when rswitch_stop() executes the masking of
interrupts from the queues of the port being closed, napi poll for
that port is already scheduled or running on a different CPU. When
execution of this napi poll completes, it will unmask the interrupts.
And unmasked interrupt can fire after rswitch_stop() returns from
napi_disable() call. Then, the handler won't mask it, because
napi_schedule_prep() will return false, and interrupt storm will
happen.
This can't be fixed by making rswitch_stop() call napi_disable() before
masking interrupts. In this case, the interrupt storm will happen if
interrupt fires between napi_disable() and masking.
Fix this by checking for priv->opened_ports bit when unmasking
interrupts after napi poll. For that to be consistent, move
priv->opened_ports changes into spinlock-protected areas, and reorder
other operations in rswitch_open() and rswitch_stop() accordingly.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Fixes:
3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Link: https://patch.msgid.link/20241209113204.175015-1-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Dec 2024 03:02:49 +0000 (19:02 -0800)]
Merge branch 'net-renesas-rswitch-several-fixes'
Nikita Yushchenko says:
====================
net: renesas: rswitch: several fixes
This series fixes several glitches found in the rswitch driver.
Repost of https://lore.kernel.org/
20241206190015.
4194153-1-nikita.yoush@cogentembedded.com
====================
Link: https://patch.msgid.link/20241208095004.69468-1-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:04 +0000 (14:50 +0500)]
net: renesas: rswitch: avoid use-after-put for a device tree node
The device tree node saved in the rswitch_device structure is used at
several driver locations. So passing this node to of_node_put() after
the first use is wrong.
Move of_node_put() for this node to exit paths.
Fixes:
b46f1e579329 ("net: renesas: rswitch: Simplify struct phy * handling")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241208095004.69468-5-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:03 +0000 (14:50 +0500)]
net: renesas: rswitch: fix leaked pointer on error path
If error path is taken while filling descriptor for a frame, skb
pointer is left in the entry. Later, on the ring entry reuse, the
same entry could be used as a part of a multi-descriptor frame,
and skb for that new frame could be stored in a different entry.
Then, the stale pointer will reach the completion routine, and passed
to the release operation.
Fix that by clearing the saved skb pointer at the error path.
Fixes:
d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241208095004.69468-4-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:02 +0000 (14:50 +0500)]
net: renesas: rswitch: fix race window between tx start and complete
If hardware is already transmitting, it can start handling the
descriptor being written to immediately after it observes updated DT
field, before the queue is kicked by a write to GWTRC.
If the start_xmit() execution is preempted at unfortunate moment, this
transmission can complete, and interrupt handled, before gq->cur gets
updated. With the current implementation of completion, this will cause
the last entry not completed.
Fix that by changing completion loop to check DT values directly, instead
of depending on gq->cur.
Fixes:
3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241208095004.69468-3-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Nikita Yushchenko [Sun, 8 Dec 2024 09:50:01 +0000 (14:50 +0500)]
net: renesas: rswitch: fix possible early skb release
When sending frame split into multiple descriptors, hardware processes
descriptors one by one, including writing back DT values. The first
descriptor could be already marked as completed when processing of
next descriptors for the same frame is still in progress.
Although only the last descriptor is configured to generate interrupt,
completion of the first descriptor could be noticed by the driver when
handling interrupt for the previous frame.
Currently, driver stores skb in the entry that corresponds to the first
descriptor. This results into skb could be unmapped and freed when
hardware did not complete the send yet. This opens a window for
corrupting the data being sent.
Fix this by saving skb in the entry that corresponds to the last
descriptor used to send the frame.
Fixes:
d2c96b9d5f83 ("net: rswitch: Add jumbo frames handling for TX")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://patch.msgid.link/20241208095004.69468-2-nikita.yoush@cogentembedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 11 Dec 2024 02:44:24 +0000 (18:44 -0800)]
Merge tag 'wireless-2024-12-10' of https://git./linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A small set of fixes:
- avoid CSA warnings during link removal
(by changing link bitmap after remove)
- fix # of spatial streams initialisation
- fix queues getting stuck in some CSA cases
and resume failures
- fix interface address when switching monitor mode
- fix MBSS change flags 32-bit stack corruption
- more UBSAN __counted_by "fixes" ...
- fix link ID netlink validation
* tag 'wireless-2024-12-10' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: sme: init n_channels before channels[] access
wifi: mac80211: fix station NSS capability initialization order
wifi: mac80211: fix vif addr when switching from monitor to station
wifi: mac80211: fix a queue stall in certain cases of CSA
wifi: mac80211: wake the queues in case of failure in resume
wifi: cfg80211: clear link ID from bitmap during link delete after clean up
wifi: mac80211: init cnt before accessing elem in ieee80211_copy_mbssid_beacon
wifi: mac80211: fix mbss changed flags corruption on 32 bit systems
wifi: nl80211: fix NL80211_ATTR_MLO_LINK_ID off-by-one
====================
Link: https://patch.msgid.link/20241210130145.28618-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
MoYuanhao [Mon, 9 Dec 2024 12:28:14 +0000 (13:28 +0100)]
tcp: check space before adding MPTCP SYN options
Ensure there is enough space before adding MPTCP options in
tcp_syn_options().
Without this check, 'remaining' could underflow, and causes issues. If
there is not enough space, MPTCP should not be used.
Signed-off-by: MoYuanhao <moyuanhao3676@163.com>
Fixes:
cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Cc: stable@vger.kernel.org
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: Add Fixes, cc Stable, update Description ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20241209-net-mptcp-check-space-syn-v1-1-2da992bb6f74@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Petr Machata [Mon, 9 Dec 2024 11:05:31 +0000 (12:05 +0100)]
Documentation: networking: Add a caveat to nexthop_compat_mode sysctl
net.ipv4.nexthop_compat_mode was added when nexthop objects were added to
provide the view of nexthop objects through the usual lens of the route
UAPI. As nexthop objects evolved, the information provided through this
lens became incomplete. For example, details of resilient nexthop groups
are obviously omitted.
Now that 16-bit nexthop group weights are a thing, the 8-bit UAPI cannot
convey the >8-bit weight accurately. Instead of inventing workarounds for
an obsolete interface, just document the expectations of inaccuracy.
Fixes:
b72a6a7ab957 ("net: nexthop: Increase weight to u16")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/b575e32399ccacd09079b2a218255164535123bd.1733740749.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Mon, 9 Dec 2024 01:54:48 +0000 (17:54 -0800)]
bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
the previous generation (5750X or P5). However, the aggregation ID
fields in the completion structures on P7 have been redefined from
16 bits to 12 bits. The freed up 4 bits are redefined for part of the
metadata such as the VLAN ID. The aggregation ID mask was not modified
when adding support for P7 chips. Including the extra 4 bits for the
aggregation ID can potentially cause the driver to store or fetch the
packet header of GRO/LRO packets in the wrong TPA buffer. It may hit
the BUG() condition in __skb_pull() because the SKB contains no valid
packet header:
kernel BUG at include/linux/skbuff.h:2766!
Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022
RIP: 0010:eth_type_trans+0xda/0x140
Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
RSP: 0018:
ff615003803fcc28 EFLAGS:
00010283
RAX:
00000000000022d2 RBX:
0000000000000003 RCX:
ff2e8c25da334040
RDX:
0000000000000040 RSI:
ff2e8c25c1ce8000 RDI:
ff2e8c25869f9000
RBP:
ff2e8c258c31c000 R08:
ff2e8c25da334000 R09:
0000000000000001
R10:
ff2e8c25da3342c0 R11:
ff2e8c25c1ce89c0 R12:
ff2e8c258e0990b0
R13:
ff2e8c25bb120000 R14:
ff2e8c25c1ce89c0 R15:
ff2e8c25869f9000
FS:
0000000000000000(0000) GS:
ff2e8c34be300000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000055f05317e4c8 CR3:
000000108bac6006 CR4:
0000000000773ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe07f0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
<IRQ>
? die+0x33/0x90
? do_trap+0xd9/0x100
? eth_type_trans+0xda/0x140
? do_error_trap+0x65/0x80
? eth_type_trans+0xda/0x140
? exc_invalid_op+0x4e/0x70
? eth_type_trans+0xda/0x140
? asm_exc_invalid_op+0x16/0x20
? eth_type_trans+0xda/0x140
bnxt_tpa_end+0x10b/0x6b0 [bnxt_en]
? bnxt_tpa_start+0x195/0x320 [bnxt_en]
bnxt_rx_pkt+0x902/0xd90 [bnxt_en]
? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en]
? kmem_cache_free+0x343/0x440
? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en]
__bnxt_poll_work+0x193/0x370 [bnxt_en]
bnxt_poll_p5+0x9a/0x300 [bnxt_en]
? try_to_wake_up+0x209/0x670
__napi_poll+0x29/0x1b0
Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
12 bits. This will work because the maximum aggregation ID is less
than 4096 on all P5_PLUS chips.
Fixes:
13d2d3d381ee ("bnxt_en: Add new P7 hardware interface definitions")
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 6 Dec 2024 15:49:14 +0000 (16:49 +0100)]
udp: fix l4 hash after reconnect
After the blamed commit below, udp_rehash() is supposed to be called
with both local and remote addresses set.
Currently that is already the case for IPv6 sockets, but for IPv4 the
destination address is updated after rehashing.
Address the issue moving the destination address and port initialization
before rehashing.
Fixes:
1b29a730ef8b ("ipv6/udp: Add 4-tuple hash for connected socket")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 10 Dec 2024 10:22:21 +0000 (11:22 +0100)]
Merge branch 'virtio_net-correct-netdev_tx_reset_queue-invocation-points'
Koichiro Den says:
====================
virtio_net: correct netdev_tx_reset_queue() invocation points
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit
b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
scenarios for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
This patch series resolves the issue and also addresses similar existing
problems:
(a). Drop netdev_tx_reset_queue() from open/close path. This eliminates the
BQL crashes due to the problematic open/close path.
(b). As a result of (a), netdev_tx_reset_queue() is now explicitly required
in freeze/restore path. Add netdev_tx_reset_queue() immediately after
free_unused_bufs() invocation.
(c). Fix missing resetting in virtnet_tx_resize().
virtnet_tx_resize() has lacked proper resetting since commit
c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits").
(d). Fix missing resetting in the XDP_SETUP_XSK_POOL path.
Similar to (c), this path lacked proper resetting. Call
netdev_tx_reset_queue() when virtqueue_reset() has actually recycled
unused buffers.
This patch series consists of six commits:
[1/6]: Resolves (a) and (b). # also -stable 6.11.y
[2/6]: Minor fix to make [4/6] streamlined.
[3/6]: Prerequisite for (c). # also -stable 6.11.y
[4/6]: Resolves (c) (incl. Prerequisite for (d)) # also -stable 6.11.y
[5/6]: Preresuisite for (d).
[6/6]: Resolves (d).
Changes for v4:
- move netdev_tx_reset_queue() out of free_unused_bufs()
- submit to net, not net-next
Changes for v3:
- replace 'flushed' argument with 'recycle_done'
Changes for v2:
- add tx queue resetting for (b) to (d) above
v3: https://lore.kernel.org/all/
20241204050724.307544-1-koichiro.den@canonical.com/
v2: https://lore.kernel.org/all/
20241203073025.67065-1-koichiro.den@canonical.com/
v1: https://lore.kernel.org/all/
20241130181744.
3772632-1-koichiro.den@canonical.com/
[1]:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:
ffffc900002b0d08 EFLAGS:
00010297
RAX:
0000000000000000 RBX:
ffff888102398c80 RCX:
0000000080190009
RDX:
0000000000000000 RSI:
000000000000006a RDI:
0000000000000000
RBP:
ffff888102398c00 R08:
0000000000000000 R09:
0000000000000000
R10:
00000000000000ca R11:
0000000000015681 R12:
0000000000000001
R13:
ffffc900002b0d68 R14:
ffff88811115e000 R15:
ffff8881107aca40
FS:
00007f41ded69500(0000) GS:
ffff888667dc0000(0000)
knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000556ccc2dc1a0 CR3:
0000000104fd8003 CR4:
0000000000772ef0
PKRU:
55555554
Call Trace:
<IRQ>
? die+0x32/0x80
? do_trap+0xd9/0x100
? dql_completed+0x26b/0x290
? dql_completed+0x26b/0x290
? do_error_trap+0x6d/0xb0
? dql_completed+0x26b/0x290
? exc_invalid_op+0x4c/0x60
? dql_completed+0x26b/0x290
? asm_exc_invalid_op+0x16/0x20
? dql_completed+0x26b/0x290
__free_old_xmit+0xff/0x170 [virtio_net]
free_old_xmit+0x54/0xc0 [virtio_net]
virtnet_poll+0xf4/0xe30 [virtio_net]
? __update_load_avg_cfs_rq+0x264/0x2d0
? update_curr+0x35/0x260
? reweight_entity+0x1be/0x260
__napi_poll.constprop.0+0x28/0x1c0
net_rx_action+0x329/0x420
? enqueue_hrtimer+0x35/0x90
? trace_hardirqs_on+0x1d/0x80
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? sched_clock_cpu+0xd/0x1a0
handle_softirqs+0x138/0x3e0
do_softirq.part.0+0x89/0xc0
</IRQ>
<TASK>
__local_bh_enable_ip+0xa7/0xb0
virtnet_open+0xc8/0x310 [virtio_net]
__dev_open+0xfa/0x1b0
__dev_change_flags+0x1de/0x250
dev_change_flags+0x22/0x60
do_setlink.isra.0+0x2df/0x10b0
? rtnetlink_rcv_msg+0x34f/0x3f0
? netlink_rcv_skb+0x54/0x100
? netlink_unicast+0x23e/0x390
? netlink_sendmsg+0x21e/0x490
? ____sys_sendmsg+0x31b/0x350
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? __nla_validate_parse+0x5f/0xee0
? __pfx___probestub_irq_enable+0x3/0x10
? __create_object+0x5e/0x90
? security_capable+0x3b/0x7
\e[I0
rtnl_newlink+0x784/0xaf0
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? stack_depot_save_flags+0x24/0x6d0
? __pfx_rtnl_newlink+0x10/0x10
rtnetlink_rcv_msg+0x34f/0x3f0
? do_syscall_64+0x6c/0x180
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
? __pfx_rtnetlink_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x23e/0x390
netlink_sendmsg+0x21e/0x490
____sys_sendmsg+0x31b/0x350
? copy_msghdr_from_user+0x6d/0xa0
___sys_sendmsg+0x86/0xd0
? __pte_offset_map+0x17/0x160
? preempt_count_add+0x69/0xa0
? __call_rcu_common.constprop.0+0x147/0x610
? preempt_count_add+0x69/0xa0
? preempt_count_add+0x69/0xa0
? _raw_spin_trylock+0x13/0x60
? trace_hardirqs_on+0x1d/0x80
__sys_sendmsg+0x66/0xc0
do_syscall_64+0x6c/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:
00007ffe5336ecc8 EFLAGS:
00000202 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
00007f41defe5b34
RDX:
0000000000000000 RSI:
00007ffe5336ed30 RDI:
0000000000000003
RBP:
00007ffe5336eda0 R08:
0000000000000010 R09:
0000000000000001
R10:
00007ffe5336f6f9 R11:
0000000000000202 R12:
0000000000000003
R13:
0000000067452259 R14:
0000556ccc28b040 R15:
0000000000000000
</TASK>
[...]
====================
Link: https://patch.msgid.link/20241206011047.923923-1-koichiro.den@canonical.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:47 +0000 (10:10 +0900)]
virtio_net: ensure netdev_tx_reset_queue is called on bind xsk for tx
virtnet_sq_bind_xsk_pool() flushes tx skbs and then resets tx queue, so
DQL counters need to be reset when flushing has actually occurred, Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_resize()
to handle this.
Fixes:
21a4e3ce6dc7 ("virtio_net: xsk: bind/unbind xsk for tx")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:46 +0000 (10:10 +0900)]
virtio_ring: add a func argument 'recycle_done' to virtqueue_reset()
When virtqueue_reset() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when it really occurs.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:45 +0000 (10:10 +0900)]
virtio_net: ensure netdev_tx_reset_queue is called on tx ring resize
virtnet_tx_resize() flushes remaining tx skbs, requiring DQL counters to
be reset when flushing has actually occurred. Add
virtnet_sq_free_unused_buf_done() as a callback for virtqueue_reset() to
handle this.
Fixes:
c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:44 +0000 (10:10 +0900)]
virtio_ring: add a func argument 'recycle_done' to virtqueue_resize()
When virtqueue_resize() has actually recycled all unused buffers,
additional work may be required in some cases. Relying solely on its
return status is fragile, so introduce a new function argument
'recycle_done', which is invoked when the recycle really occurs.
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:43 +0000 (10:10 +0900)]
virtio_net: replace vq2rxq with vq2txq where appropriate
While not harmful, using vq2rxq where it's always sq appears odd.
Replace it with the more appropriate vq2txq for clarity and correctness.
Fixes:
89f86675cb03 ("virtio_net: xsk: tx: support xmit xsk buffer")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Koichiro Den [Fri, 6 Dec 2024 01:10:42 +0000 (10:10 +0900)]
virtio_net: correct netdev_tx_reset_queue() invocation point
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit
b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
cases for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
the device is not stopped in any case. For BQL core part, it's just like
traffic nearly ceases to exist for some period. For stall detector added
to BQL, even if virtnet_close could somehow lead to some TX completions
delayed for long, followed by virtnet_open, we can just take it as stall
as mentioned in commit
6025b9135f7a ("net: dqs: add NIC stall detector
based on BQL"). Note also that users can still reset stall_max via sysfs.
So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now
explicitly required in freeze/restore path. This patch adds it to
immediately after free_unused_bufs(), following the rule of thumb:
netdev_tx_reset_queue() should follow any SKB freeing not followed by
netdev_tx_completed_queue(). This seems the most consistent and
streamlined approach, and now netdev_tx_reset_queue() runs whenever
free_unused_bufs() is done.
[1]:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:
ffffc900002b0d08 EFLAGS:
00010297
RAX:
0000000000000000 RBX:
ffff888102398c80 RCX:
0000000080190009
RDX:
0000000000000000 RSI:
000000000000006a RDI:
0000000000000000
RBP:
ffff888102398c00 R08:
0000000000000000 R09:
0000000000000000
R10:
00000000000000ca R11:
0000000000015681 R12:
0000000000000001
R13:
ffffc900002b0d68 R14:
ffff88811115e000 R15:
ffff8881107aca40
FS:
00007f41ded69500(0000) GS:
ffff888667dc0000(0000)
knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000556ccc2dc1a0 CR3:
0000000104fd8003 CR4:
0000000000772ef0
PKRU:
55555554
Call Trace:
<IRQ>
? die+0x32/0x80
? do_trap+0xd9/0x100
? dql_completed+0x26b/0x290
? dql_completed+0x26b/0x290
? do_error_trap+0x6d/0xb0
? dql_completed+0x26b/0x290
? exc_invalid_op+0x4c/0x60
? dql_completed+0x26b/0x290
? asm_exc_invalid_op+0x16/0x20
? dql_completed+0x26b/0x290
__free_old_xmit+0xff/0x170 [virtio_net]
free_old_xmit+0x54/0xc0 [virtio_net]
virtnet_poll+0xf4/0xe30 [virtio_net]
? __update_load_avg_cfs_rq+0x264/0x2d0
? update_curr+0x35/0x260
? reweight_entity+0x1be/0x260
__napi_poll.constprop.0+0x28/0x1c0
net_rx_action+0x329/0x420
? enqueue_hrtimer+0x35/0x90
? trace_hardirqs_on+0x1d/0x80
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? sched_clock_cpu+0xd/0x1a0
handle_softirqs+0x138/0x3e0
do_softirq.part.0+0x89/0xc0
</IRQ>
<TASK>
__local_bh_enable_ip+0xa7/0xb0
virtnet_open+0xc8/0x310 [virtio_net]
__dev_open+0xfa/0x1b0
__dev_change_flags+0x1de/0x250
dev_change_flags+0x22/0x60
do_setlink.isra.0+0x2df/0x10b0
? rtnetlink_rcv_msg+0x34f/0x3f0
? netlink_rcv_skb+0x54/0x100
? netlink_unicast+0x23e/0x390
? netlink_sendmsg+0x21e/0x490
? ____sys_sendmsg+0x31b/0x350
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? __nla_validate_parse+0x5f/0xee0
? __pfx___probestub_irq_enable+0x3/0x10
? __create_object+0x5e/0x90
? security_capable+0x3b/0x70
rtnl_newlink+0x784/0xaf0
? avc_has_perm_noaudit+0x67/0xf0
? cred_has_capability.isra.0+0x75/0x110
? stack_depot_save_flags+0x24/0x6d0
? __pfx_rtnl_newlink+0x10/0x10
rtnetlink_rcv_msg+0x34f/0x3f0
? do_syscall_64+0x6c/0x180
? entry_SYSCALL_64_after_hwframe+0x76/0x7e
? __pfx_rtnetlink_rcv_msg+0x10/0x10
netlink_rcv_skb+0x54/0x100
netlink_unicast+0x23e/0x390
netlink_sendmsg+0x21e/0x490
____sys_sendmsg+0x31b/0x350
? copy_msghdr_from_user+0x6d/0xa0
___sys_sendmsg+0x86/0xd0
? __pte_offset_map+0x17/0x160
? preempt_count_add+0x69/0xa0
? __call_rcu_common.constprop.0+0x147/0x610
? preempt_count_add+0x69/0xa0
? preempt_count_add+0x69/0xa0
? _raw_spin_trylock+0x13/0x60
? trace_hardirqs_on+0x1d/0x80
__sys_sendmsg+0x66/0xc0
do_syscall_64+0x6c/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f41defe5b34
Code: 15 e1 12 0f 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00
f3 0f 1e fa 80 3d 35 95 0f 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55
RSP: 002b:
00007ffe5336ecc8 EFLAGS:
00000202 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
0000000000000003 RCX:
00007f41defe5b34
RDX:
0000000000000000 RSI:
00007ffe5336ed30 RDI:
0000000000000003
RBP:
00007ffe5336eda0 R08:
0000000000000010 R09:
0000000000000001
R10:
00007ffe5336f6f9 R11:
0000000000000202 R12:
0000000000000003
R13:
0000000067452259 R14:
0000556ccc28b040 R15:
0000000000000000
</TASK>
[...]
Fixes:
c8bd1f7f3e61 ("virtio_net: add support for Byte Queue Limits")
Cc: <stable@vger.kernel.org> # v6.11+
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
[ pabeni: trimmed possibly troublesome separator ]
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Geetha sowjanya [Thu, 5 Dec 2024 11:34:35 +0000 (17:04 +0530)]
octeontx2-af: Fix installation of PF multicast rule
Due to target variable is being reassigned in npc_install_flow()
function, PF multicast rules are not getting installed.
This patch addresses the issue by fixing the "IF" condition
checks when rules are installed by AF.
Fixes:
6c40ca957fe5 ("octeontx2-pf: Adds TC offload support").
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241205113435.10601-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 10 Dec 2024 02:26:53 +0000 (18:26 -0800)]
Merge branch 'qca_spi-fix-spi-specific-issues'
Stefan Wahren says:
====================
qca_spi: Fix SPI specific issues
This small series address two annoying SPI specific issues of
the qca_spi driver.
====================
Link: https://patch.msgid.link/20241206184643.123399-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wahren [Fri, 6 Dec 2024 18:46:43 +0000 (19:46 +0100)]
qca_spi: Make driver probing reliable
The module parameter qcaspi_pluggable controls if QCA7000 signature
should be checked at driver probe (current default) or not. Unfortunately
this could fail in case the chip is temporary in reset, which isn't under
total control by the Linux host. So disable this check per default
in order to avoid unexpected probe failures.
Fixes:
291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20241206184643.123399-3-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Wahren [Fri, 6 Dec 2024 18:46:42 +0000 (19:46 +0100)]
qca_spi: Fix clock speed for multiple QCA7000
Storing the maximum clock speed in module parameter qcaspi_clkspeed
has the unintended side effect that the first probed instance
defines the value for all other instances. Fix this issue by storing
it in max_speed_hz of the relevant SPI device.
This fix keeps the priority of the speed parameter (module parameter,
device tree property, driver default). Btw this uses the opportunity
to get the rid of the unused member clkspeed.
Fixes:
291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://patch.msgid.link/20241206184643.123399-2-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Anumula Murali Mohan Reddy [Fri, 6 Dec 2024 06:20:14 +0000 (11:50 +0530)]
cxgb4: use port number to set mac addr
t4_set_vf_mac_acl() uses pf to set mac addr, but t4vf_get_vf_mac_acl()
uses port number to get mac addr, this leads to error when an attempt
to set MAC address on VF's of PF2 and PF3.
This patch fixes the issue by using port number to set mac address.
Fixes:
e0cdac65ba26 ("cxgb4vf: configure ports accessible by the VF")
Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241206062014.49414-1-anumula@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Mon, 9 Dec 2024 13:30:17 +0000 (13:30 +0000)]
Merge branch 'net-sparx5-lan969x-fixes'
Daniel Machon says:
====================
net: sparx5: misc fixes for sparx5 and lan969x
This series fixes various issues in the Sparx5 and lan969x drivers. Most
of the fixes are for new issues introduced by the recent series adding
lan969x switch support in the Sparx5 driver.
Most notable is patch 1/5 that moves the lan969x dir into the sparx5
dir, in order to address a cyclic dependency issue reported by depmod,
when installing modules. Details are in the commit descriptions.
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Lars Povlsen <lars.povlsen@microchip.com>
To: Steen Hegelund <Steen.Hegelund@microchip.com>
To: UNGLinuxDriver@microchip.com
To: Richard Cochran <richardcochran@gmail.com>
To: Bjarni Jonasson <bjarni.jonasson@microchip.com>
To: jensemil.schulzostergaard@microchip.com
To: horatiu.vultur@microchip.com
To: arnd@arndb.de
To: jacob.e.keller@intel.com
To: Parthiban.Veerasooran@microchip.com
Cc: Calvin Owens <calvin@wbinvd.org>
Cc: Muhammad Usama Anjum <Usama.Anjum@collabora.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
====================
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Thu, 5 Dec 2024 13:54:28 +0000 (14:54 +0100)]
net: sparx5: fix the maximum frame length register
On port initialization, we configure the maximum frame length accepted
by the receive module associated with the port. This value is currently
written to the MAX_LEN field of the DEV10G_MAC_ENA_CFG register, when in
fact, it should be written to the DEV10G_MAC_MAXLEN_CFG register. Fix
this.
Fixes:
946e7fd5053a ("net: sparx5: add port module support")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Thu, 5 Dec 2024 13:54:27 +0000 (14:54 +0100)]
net: sparx5: fix default value of monitor ports
When doing port mirroring, the physical port to send the frame to, is
written to the FRMC_PORT_VAL field of the QFWD_FRAME_COPY_CFG register.
This field is 7 bits wide on sparx5 and 6 bits wide on lan969x, and has
a default value of 65 and 30, respectively (the number of front ports).
On mirror deletion, we set the default value of the monitor port to
65 for this field, in case no more ports exists for the mirror. Needless
to say, this will not fit the 6 bits on lan969x.
Fix this by correctly using the n_ports constant instead.
Fixes:
3f9e46347a46 ("net: sparx5: use SPX5_CONST for constants which already have a symbol")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Thu, 5 Dec 2024 13:54:26 +0000 (14:54 +0100)]
net: sparx5: fix FDMA performance issue
The FDMA handler is responsible for scheduling a NAPI poll, which will
eventually fetch RX packets from the FDMA queue. Currently, the FDMA
handler is run in a threaded context. For some reason, this kills
performance. Admittedly, I did not do a thorough investigation to see
exactly what causes the issue, however, I noticed that in the other
driver utilizing the same FDMA engine, we run the FDMA handler in hard
IRQ context.
Fix this performance issue, by running the FDMA handler in hard IRQ
context, not deferring any work to a thread.
Prior to this change, the RX UDP performance was:
Interval Transfer Bitrate Jitter
0.00-10.20 sec 44.6 MBytes 36.7 Mbits/sec 0.027 ms
After this change, the rx UDP performance is:
Interval Transfer Bitrate Jitter
0.00-9.12 sec 1.01 GBytes 953 Mbits/sec 0.020 ms
Fixes:
10615907e9b5 ("net: sparx5: switchdev: adding frame DMA functionality")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Thu, 5 Dec 2024 13:54:25 +0000 (14:54 +0100)]
net: lan969x: fix the use of spin_lock in PTP handler
We are mixing the use of spin_lock() and spin_lock_irqsave() functions
in the PTP handler of lan969x. Fix this by correctly using the _irqsave
variants.
Fixes:
24fe83541755 ("net: lan969x: add PTP handler function")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
[1]: https://lore.kernel.org/netdev/
20241024-sparx5-lan969x-switch-driver-2-v2-10-
a0b5fae88a0f@microchip.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Thu, 5 Dec 2024 13:54:24 +0000 (14:54 +0100)]
net: lan969x: fix cyclic dependency reported by depmod
Depmod reports a cyclic dependency between modules sparx5-switch.ko and
lan969x-switch.ko:
depmod: ERROR: Cycle detected: lan969x_switch -> sparx5_switch -> lan969x_switch
depmod: ERROR: Found 2 modules in dependency cycles!
make[2]: *** [scripts/Makefile.modinst:132: depmod] Error 1
make: *** [Makefile:224: __sub-make] Error 2
This makes sense, as they both require symbols from each other.
Fix this by compiling lan969x support into the sparx5-switch.ko module.
In order to do this, in a sensible way, we move the lan969x/ dir into
the sparx5/ dir and do some code cleanup of code that is no longer
required.
After this patch, depmod will no longer complain, as lan969x support is
compiled into the sparx5-swicth.ko module, and can no longer be compiled
as a standalone module.
Fixes:
98a01119608d ("net: sparx5: add compatible string for lan969x")
Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 6 Dec 2024 12:32:52 +0000 (15:32 +0300)]
rtnetlink: fix error code in rtnl_newlink()
If rtnl_get_peer_net() fails, then propagate the error code. Don't
return success.
Fixes:
48327566769a ("rtnetlink: fix double call of rtnl_link_get_net_ifla()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/a2d20cd4-387a-4475-887c-bb7d0e88e25a@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sun, 8 Dec 2024 01:56:49 +0000 (17:56 -0800)]
Merge branch 'ocelot-ptp-fixes'
Vladimir Oltean says:
====================
Ocelot PTP fixes
This is another attempt at "net: mscc: ocelot: be resilient to loss of
PTP packets during transmission".
https://lore.kernel.org/netdev/
20241203164755.16115-1-vladimir.oltean@nxp.com/
The central change is in patch 4/5. It recovers a port from a very
reproducible condition where previously, it would become unable to take
any hardware TX timestamp.
Then we have patches 1/5 and 5/5 which are extra bug fixes.
Patches 2/5 and 3/5 are logical changes split out of the monolithic
patch previously submitted as v1, so that patch 4/5 is hopefully easier
to follow.
====================
Link: https://patch.msgid.link/20241205145519.1236778-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 5 Dec 2024 14:55:19 +0000 (16:55 +0200)]
net: mscc: ocelot: perform error cleanup in ocelot_hwstamp_set()
An unsupported RX filter will leave the port with TX timestamping still
applied as per the new request, rather than the old setting. When
parsing the tx_type, don't apply it just yet, but delay that until after
we've parsed the rx_filter as well (and potentially returned -ERANGE for
that).
Similarly, copy_to_user() may fail, which is a rare occurrence, but
should still be treated by unwinding what was done.
Fixes:
96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 5 Dec 2024 14:55:18 +0000 (16:55 +0200)]
net: mscc: ocelot: be resilient to loss of PTP packets during transmission
The Felix DSA driver presents unique challenges that make the simplistic
ocelot PTP TX timestamping procedure unreliable: any transmitted packet
may be lost in hardware before it ever leaves our local system.
This may happen because there is congestion on the DSA conduit, the
switch CPU port or even user port (Qdiscs like taprio may delay packets
indefinitely by design).
The technical problem is that the kernel, i.e. ocelot_port_add_txtstamp_skb(),
runs out of timestamp IDs eventually, because it never detects that
packets are lost, and keeps the IDs of the lost packets on hold
indefinitely. The manifestation of the issue once the entire timestamp
ID range becomes busy looks like this in dmesg:
mscc_felix 0000:00:00.5: port 0 delivering skb without TX timestamp
mscc_felix 0000:00:00.5: port 1 delivering skb without TX timestamp
At the surface level, we need a timeout timer so that the kernel knows a
timestamp ID is available again. But there is a deeper problem with the
implementation, which is the monotonically increasing ocelot_port->ts_id.
In the presence of packet loss, it will be impossible to detect that and
reuse one of the holes created in the range of free timestamp IDs.
What we actually need is a bitmap of 63 timestamp IDs tracking which one
is available. That is able to use up holes caused by packet loss, but
also gives us a unique opportunity to not implement an actual timer_list
for the timeout timer (very complicated in terms of locking).
We could only declare a timestamp ID stale on demand (lazily), aka when
there's no other timestamp ID available. There are pros and cons to this
approach: the implementation is much more simple than per-packet timers
would be, but most of the stale packets would be quasi-leaked - not
really leaked, but blocked in driver memory, since this algorithm sees
no reason to free them.
An improved technique would be to check for stale timestamp IDs every
time we allocate a new one. Assuming a constant flux of PTP packets,
this avoids stale packets being blocked in memory, but of course,
packets lost at the end of the flux are still blocked until the flux
resumes (nobody left to kick them out).
Since implementing per-packet timers is way too complicated, this should
be good enough.
Testing procedure:
Persistently block traffic class 5 and try to run PTP on it:
$ tc qdisc replace dev swp3 parent root taprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 sched-entry S 0xdf 100000 flags 0x2
[ 126.948141] mscc_felix 0000:00:00.5: port 3 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
$ ptp4l -i swp3 -2 -P -m --socket_priority 5 --fault_reset_interval ASAP --logSyncInterval -3
ptp4l[70.351]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.354]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[70.358]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 70.394583] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[70.406]: timed out while polling for tx timestamp
ptp4l[70.406]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[70.406]: port 1 (swp3): send peer delay response failed
ptp4l[70.407]: port 1 (swp3): clearing fault immediately
ptp4l[70.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 71.394858] mscc_felix 0000:00:00.5: port 3 timestamp id 1
ptp4l[71.400]: timed out while polling for tx timestamp
ptp4l[71.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[71.401]: port 1 (swp3): send peer delay response failed
ptp4l[71.401]: port 1 (swp3): clearing fault immediately
[ 72.393616] mscc_felix 0000:00:00.5: port 3 timestamp id 2
ptp4l[72.401]: timed out while polling for tx timestamp
ptp4l[72.402]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[72.402]: port 1 (swp3): send peer delay response failed
ptp4l[72.402]: port 1 (swp3): clearing fault immediately
ptp4l[72.952]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 73.395291] mscc_felix 0000:00:00.5: port 3 timestamp id 3
ptp4l[73.400]: timed out while polling for tx timestamp
ptp4l[73.400]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[73.400]: port 1 (swp3): send peer delay response failed
ptp4l[73.400]: port 1 (swp3): clearing fault immediately
[ 74.394282] mscc_felix 0000:00:00.5: port 3 timestamp id 4
ptp4l[74.400]: timed out while polling for tx timestamp
ptp4l[74.401]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[74.401]: port 1 (swp3): send peer delay response failed
ptp4l[74.401]: port 1 (swp3): clearing fault immediately
ptp4l[74.953]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 75.396830] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 75.405760] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[75.410]: timed out while polling for tx timestamp
ptp4l[75.411]: increasing tx_timestamp_timeout or increasing kworker priority may correct this issue, but a driver bug likely causes it
ptp4l[75.411]: port 1 (swp3): send peer delay response failed
ptp4l[75.411]: port 1 (swp3): clearing fault immediately
(...)
Remove the blocking condition and see that the port recovers:
$ same tc command as above, but use "sched-entry S 0xff" instead
$ same ptp4l command as above
ptp4l[99.489]: port 1 (swp3): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.490]: port 0 (/var/run/ptp4l): INITIALIZING to LISTENING on INIT_COMPLETE
ptp4l[99.492]: port 0 (/var/run/ptp4lro): INITIALIZING to LISTENING on INIT_COMPLETE
[ 100.403768] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 0 which seems lost
[ 100.412545] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 1 which seems lost
[ 100.421283] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 2 which seems lost
[ 100.430015] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 3 which seems lost
[ 100.438744] mscc_felix 0000:00:00.5: port 3 invalidating stale timestamp ID 4 which seems lost
[ 100.447470] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 100.505919] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[100.963]: port 1 (swp3): new foreign master d858d7.fffe.00ca6d-1
[ 101.405077] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 101.507953] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.405405] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 102.509391] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.406003] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 103.510011] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.405601] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 104.510624] mscc_felix 0000:00:00.5: port 3 timestamp id 0
ptp4l[104.965]: selected best master clock d858d7.fffe.00ca6d
ptp4l[104.966]: port 1 (swp3): assuming the grand master role
ptp4l[104.967]: port 1 (swp3): LISTENING to GRAND_MASTER on RS_GRAND_MASTER
[ 105.106201] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.232420] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.359001] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.405500] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.485356] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.511220] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.610938] mscc_felix 0000:00:00.5: port 3 timestamp id 0
[ 105.737237] mscc_felix 0000:00:00.5: port 3 timestamp id 0
(...)
Notice that in this new usage pattern, a non-congested port should
basically use timestamp ID 0 all the time, progressing to higher numbers
only if there are unacknowledged timestamps in flight. Compare this to
the old usage, where the timestamp ID used to monotonically increase
modulo OCELOT_MAX_PTP_ID.
In terms of implementation, this simplifies the bookkeeping of the
ocelot_port :: ts_id and ptp_skbs_in_flight. Since we need to traverse
the list of two-step timestampable skbs for each new packet anyway, the
information can already be computed and does not need to be stored.
Also, ocelot_port->tx_skbs is always accessed under the switch-wide
ocelot->ts_id_lock IRQ-unsafe spinlock, so we don't need the skb queue's
lock and can use the unlocked primitives safely.
This problem was actually detected using the tc-taprio offload, and is
causing trouble in TSN scenarios, which Felix (NXP LS1028A / VSC9959)
supports but Ocelot (VSC7514) does not. Thus, I've selected the commit
to blame as the one adding initial timestamping support for the Felix
switch.
Fixes:
c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 5 Dec 2024 14:55:17 +0000 (16:55 +0200)]
net: mscc: ocelot: ocelot->ts_id_lock and ocelot_port->tx_skbs.lock are IRQ-safe
ocelot_get_txtstamp() is a threaded IRQ handler, requested explicitly as
such by both ocelot_ptp_rdy_irq_handler() and vsc9959_irq_handler().
As such, it runs with IRQs enabled, and not in hardirq context. Thus,
ocelot_port_add_txtstamp_skb() has no reason to turn off IRQs, it cannot
be preempted by ocelot_get_txtstamp(). For the same reason,
dev_kfree_skb_any_reason() will always evaluate as kfree_skb_reason() in
this calling context, so just simplify the dev_kfree_skb_any() call to
kfree_skb().
Also, ocelot_port_txtstamp_request() runs from NET_TX softirq context,
not with hardirqs enabled. Thus, ocelot_get_txtstamp() which shares the
ocelot_port->tx_skbs.lock lock with it, has no reason to disable hardirqs.
This is part of a larger rework of the TX timestamping procedure.
A logical subportion of the rework has been split into a separate
change.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 5 Dec 2024 14:55:16 +0000 (16:55 +0200)]
net: mscc: ocelot: improve handling of TX timestamp for unknown skb
This condition, theoretically impossible to trigger, is not really
handled well. By "continuing", we are skipping the write to SYS_PTP_NXT
which advances the timestamp FIFO to the next entry. So we are reading
the same FIFO entry all over again, printing stack traces and eventually
killing the kernel.
No real problem has been observed here. This is part of a larger rework
of the timestamp IRQ procedure, with this logical change split out into
a patch of its own. We will need to "goto next_ts" for other conditions
as well.
Fixes:
9fde506e0c53 ("net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 5 Dec 2024 14:55:15 +0000 (16:55 +0200)]
net: mscc: ocelot: fix memory leak on ocelot_port_add_txtstamp_skb()
If ocelot_port_add_txtstamp_skb() fails, for example due to a full PTP
timestamp FIFO, we must undo the skb_clone_sk() call with kfree_skb().
Otherwise, the reference to the skb clone is lost.
Fixes:
52849bcf0029 ("net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241205145519.1236778-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Fri, 6 Dec 2024 02:07:15 +0000 (11:07 +0900)]
ip: Return drop reason if in_dev is NULL in ip_route_input_rcu().
syzkaller reported a warning in __sk_skb_reason_drop().
Commit
61b95c70f344 ("net: ip: make ip_route_input_rcu() return
drop reasons") missed a path where -EINVAL is returned.
Then, the cited commit started to trigger the warning with the
invalid error.
Let's fix it by returning SKB_DROP_REASON_NOT_SPECIFIED.
[0]:
WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 __sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
WARNING: CPU: 0 PID: 10 at net/core/skbuff.c:1216 sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241
Modules linked in:
CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted
6.12.0-10686-gbb18265c3aba #10
1c308307628619808b5a4a0495c4aab5637b0551
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: wg-crypt-wg2 wg_packet_decrypt_worker
RIP: 0010:__sk_skb_reason_drop net/core/skbuff.c:1216 [inline]
RIP: 0010:sk_skb_reason_drop+0x97/0x1b0 net/core/skbuff.c:1241
Code: 5d 41 5c 41 5d 41 5e e9 e7 9e 95 fd e8 e2 9e 95 fd 31 ff 44 89 e6 e8 58 a1 95 fd 45 85 e4 0f 85 a2 00 00 00 e8 ca 9e 95 fd 90 <0f> 0b 90 e8 c1 9e 95 fd 44 89 e6 bf 01 00 00 00 e8 34 a1 95 fd 41
RSP: 0018:
ffa0000000007650 EFLAGS:
00010246
RAX:
0000000000000000 RBX:
000000000000ffff RCX:
ffffffff83bc3592
RDX:
ff110001002a0000 RSI:
ffffffff83bc34d6 RDI:
0000000000000007
RBP:
ff11000109ee85f0 R08:
0000000000000001 R09:
ffe21c00213dd0da
R10:
000000000000ffff R11:
0000000000000000 R12:
00000000ffffffea
R13:
0000000000000000 R14:
ff11000109ee86d4 R15:
ff11000109ee8648
FS:
0000000000000000(0000) GS:
ff1100011a000000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000000020177000 CR3:
0000000108a3d006 CR4:
0000000000771ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe07f0 DR7:
0000000000000600
PKRU:
55555554
Call Trace:
<IRQ>
kfree_skb_reason include/linux/skbuff.h:1263 [inline]
ip_rcv_finish_core.constprop.0+0x896/0x2320 net/ipv4/ip_input.c:424
ip_list_rcv_finish.constprop.0+0x1b2/0x710 net/ipv4/ip_input.c:610
ip_sublist_rcv net/ipv4/ip_input.c:636 [inline]
ip_list_rcv+0x34a/0x460 net/ipv4/ip_input.c:670
__netif_receive_skb_list_ptype net/core/dev.c:5715 [inline]
__netif_receive_skb_list_core+0x536/0x900 net/core/dev.c:5762
__netif_receive_skb_list net/core/dev.c:5814 [inline]
netif_receive_skb_list_internal+0x77c/0xdc0 net/core/dev.c:5905
gro_normal_list include/net/gro.h:515 [inline]
gro_normal_list include/net/gro.h:511 [inline]
napi_complete_done+0x219/0x8c0 net/core/dev.c:6256
wg_packet_rx_poll+0xbff/0x1e40 drivers/net/wireguard/receive.c:488
__napi_poll.constprop.0+0xb3/0x530 net/core/dev.c:6877
napi_poll net/core/dev.c:6946 [inline]
net_rx_action+0x9eb/0xe30 net/core/dev.c:7068
handle_softirqs+0x1ac/0x740 kernel/softirq.c:554
do_softirq kernel/softirq.c:455 [inline]
do_softirq+0x48/0x80 kernel/softirq.c:442
</IRQ>
<TASK>
__local_bh_enable_ip+0xed/0x110 kernel/softirq.c:382
spin_unlock_bh include/linux/spinlock.h:396 [inline]
ptr_ring_consume_bh include/linux/ptr_ring.h:367 [inline]
wg_packet_decrypt_worker+0x3ba/0x580 drivers/net/wireguard/receive.c:499
process_one_work+0x940/0x1a70 kernel/workqueue.c:3229
process_scheduled_works kernel/workqueue.c:3310 [inline]
worker_thread+0x639/0xe30 kernel/workqueue.c:3391
kthread+0x283/0x350 kernel/kthread.c:389
ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:244
</TASK>
Fixes:
82d9983ebeb8 ("net: ip: make ip_route_input_noref() return drop reasons")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20241206020715.80207-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Russell King (Oracle) [Fri, 6 Dec 2024 12:40:11 +0000 (12:40 +0000)]
net: stmmac: fix TSO DMA API usage causing oops
Commit
66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap
for non-paged SKB data") moved the assignment of tx_skbuff_dma[]'s
members to be later in stmmac_tso_xmit().
The buf (dma cookie) and len stored in this structure are passed to
dma_unmap_single() by stmmac_tx_clean(). The DMA API requires that
the dma cookie passed to dma_unmap_single() is the same as the value
returned from dma_map_single(). However, by moving the assignment
later, this is not the case when priv->dma_cap.addr64 > 32 as "des"
is offset by proto_hdr_len.
This causes problems such as:
dwc-eth-dwmac
2490000.ethernet eth0: Tx DMA map failed
and with DMA_API_DEBUG enabled:
DMA-API: dwc-eth-dwmac
2490000.ethernet: device driver tries to +free DMA memory it has not allocated [device address=0x000000ffffcf65c0] [size=66 bytes]
Fix this by maintaining "des" as the original DMA cookie, and use
tso_des to pass the offset DMA cookie to stmmac_tso_allocator().
Full details of the crashes can be found at:
https://lore.kernel.org/all/
d8112193-0386-4e14-b516-
37c2d838171a@nvidia.com/
https://lore.kernel.org/all/klkzp5yn5kq5efgtrow6wbvnc46bcqfxs65nz3qy77ujr5turc@bwwhelz2l4dw/
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Reported-by: Thierry Reding <thierry.reding@gmail.com>
Fixes:
66600fac7a98 ("net: stmmac: TSO: Fix unbalanced DMA map/unmap for non-paged SKB data")
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/E1tJXcx-006N4Z-PC@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Dec 2024 12:54:55 +0000 (12:54 +0000)]
net: defer final 'struct net' free in netns dismantle
Ilya reported a slab-use-after-free in dst_destroy [1]
Issue is in xfrm6_net_init() and xfrm4_net_init() :
They copy xfrm[46]_dst_ops_template into net->xfrm.xfrm[46]_dst_ops.
But net structure might be freed before all the dst callbacks are
called. So when dst_destroy() calls later :
if (dst->ops->destroy)
dst->ops->destroy(dst);
dst->ops points to the old net->xfrm.xfrm[46]_dst_ops, which has been freed.
See a relevant issue fixed in :
ac888d58869b ("net: do not delay dst_entries_add() in dst_release()")
A fix is to queue the 'struct net' to be freed after one
another cleanup_net() round (and existing rcu_barrier())
[1]
BUG: KASAN: slab-use-after-free in dst_destroy (net/core/dst.c:112)
Read of size 8 at addr
ffff8882137ccab0 by task swapper/37/0
Dec 03 05:46:18 kernel:
CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 6.12.0 #67
Hardware name: Red Hat KVM/RHEL, BIOS 1.16.1-1.el9 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl (lib/dump_stack.c:124)
print_address_description.constprop.0 (mm/kasan/report.c:378)
? dst_destroy (net/core/dst.c:112)
print_report (mm/kasan/report.c:489)
? dst_destroy (net/core/dst.c:112)
? kasan_addr_to_slab (mm/kasan/common.c:37)
kasan_report (mm/kasan/report.c:603)
? dst_destroy (net/core/dst.c:112)
? rcu_do_batch (kernel/rcu/tree.c:2567)
dst_destroy (net/core/dst.c:112)
rcu_do_batch (kernel/rcu/tree.c:2567)
? __pfx_rcu_do_batch (kernel/rcu/tree.c:2491)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4339 kernel/locking/lockdep.c:4406)
rcu_core (kernel/rcu/tree.c:2825)
handle_softirqs (kernel/softirq.c:554)
__irq_exit_rcu (kernel/softirq.c:589 kernel/softirq.c:428 kernel/softirq.c:637)
irq_exit_rcu (kernel/softirq.c:651)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
RIP: 0010:default_idle (./arch/x86/include/asm/irqflags.h:37 ./arch/x86/include/asm/irqflags.h:92 arch/x86/kernel/process.c:743)
Code: 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 0f 00 2d c7 c9 27 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90
RSP: 0018:
ffff888100d2fe00 EFLAGS:
00000246
RAX:
00000000001870ed RBX:
1ffff110201a5fc2 RCX:
ffffffffb61a3e46
RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffffffffb3d4d123
RBP:
0000000000000000 R08:
0000000000000001 R09:
ffffed11c7e1835d
R10:
ffff888e3f0c1aeb R11:
0000000000000000 R12:
0000000000000000
R13:
ffff888100d20000 R14:
dffffc0000000000 R15:
0000000000000000
? ct_kernel_exit.constprop.0 (kernel/context_tracking.c:148)
? cpuidle_idle_call (kernel/sched/idle.c:186)
default_idle_call (./include/linux/cpuidle.h:143 kernel/sched/idle.c:118)
cpuidle_idle_call (kernel/sched/idle.c:186)
? __pfx_cpuidle_idle_call (kernel/sched/idle.c:168)
? lock_release (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5848)
? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4347 kernel/locking/lockdep.c:4406)
? tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:59)
do_idle (kernel/sched/idle.c:326)
cpu_startup_entry (kernel/sched/idle.c:423 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:202 arch/x86/kernel/smpboot.c:282)
? __pfx_start_secondary (arch/x86/kernel/smpboot.c:232)
? soft_restart_cpu (arch/x86/kernel/head_64.S:452)
common_startup_64 (arch/x86/kernel/head_64.S:414)
</TASK>
Dec 03 05:46:18 kernel:
Allocated by task 12184:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
__kasan_slab_alloc (mm/kasan/common.c:319 mm/kasan/common.c:345)
kmem_cache_alloc_noprof (mm/slub.c:4085 mm/slub.c:4134 mm/slub.c:4141)
copy_net_ns (net/core/net_namespace.c:421 net/core/net_namespace.c:480)
create_new_namespaces (kernel/nsproxy.c:110)
unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4))
ksys_unshare (kernel/fork.c:3313)
__x64_sys_unshare (kernel/fork.c:3382)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Freed by task 11:
kasan_save_stack (mm/kasan/common.c:48)
kasan_save_track (./arch/x86/include/asm/current.h:49 mm/kasan/common.c:60 mm/kasan/common.c:69)
kasan_save_free_info (mm/kasan/generic.c:582)
__kasan_slab_free (mm/kasan/common.c:271)
kmem_cache_free (mm/slub.c:4579 mm/slub.c:4681)
cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:647)
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391)
kthread (kernel/kthread.c:389)
ret_from_fork (arch/x86/kernel/process.c:147)
ret_from_fork_asm (arch/x86/entry/entry_64.S:257)
Dec 03 05:46:18 kernel:
Last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
xfrm_policy_insert (net/xfrm/xfrm_policy.c:1610)
xfrm_add_policy (net/xfrm/xfrm_user.c:2116)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Dec 03 05:46:18 kernel:
Second to last potentially related work creation:
kasan_save_stack (mm/kasan/common.c:48)
__kasan_record_aux_stack (mm/kasan/generic.c:541)
insert_work (./include/linux/instrumented.h:68 ./include/asm-generic/bitops/instrumented-non-atomic.h:141 kernel/workqueue.c:788 kernel/workqueue.c:795 kernel/workqueue.c:2186)
__queue_work (kernel/workqueue.c:2340)
queue_work_on (kernel/workqueue.c:2391)
__xfrm_state_insert (./include/linux/workqueue.h:723 net/xfrm/xfrm_state.c:1150 net/xfrm/xfrm_state.c:1145 net/xfrm/xfrm_state.c:1513)
xfrm_state_update (./include/linux/spinlock.h:396 net/xfrm/xfrm_state.c:1940)
xfrm_add_sa (net/xfrm/xfrm_user.c:912)
xfrm_user_rcv_msg (net/xfrm/xfrm_user.c:3321)
netlink_rcv_skb (net/netlink/af_netlink.c:2536)
xfrm_netlink_rcv (net/xfrm/xfrm_user.c:3344)
netlink_unicast (net/netlink/af_netlink.c:1316 net/netlink/af_netlink.c:1342)
netlink_sendmsg (net/netlink/af_netlink.c:1886)
sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
vfs_write (fs/read_write.c:590 fs/read_write.c:683)
ksys_write (fs/read_write.c:736)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Fixes:
a8a572a6b5f2 ("xfrm: dst_entries_init() per-net dst_ops")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Closes: https://lore.kernel.org/netdev/CANn89iKKYDVpB=MtmfH7nyv2p=rJWSLedO5k7wSZgtY_tO8WQg@mail.gmail.com/T/#m02c98c3009fe66382b73cfb4db9cf1df6fab3fbf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241204125455.3871859-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Dec 2024 14:10:31 +0000 (14:10 +0000)]
net: lapb: increase LAPB_HEADER_LEN
It is unclear if net/lapb code is supposed to be ready for 8021q.
We can at least avoid crashes like the following :
skbuff: skb_under_panic: text:
ffffffff8aabe1f6 len:24 put:20 head:
ffff88802824a400 data:
ffff88802824a3fe tail:0x16 end:0x140 dev:nr0.2
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 5508 Comm: dhcpcd Not tainted
6.12.0-rc7-syzkaller-00144-g66418447d27b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0d 8d 48 c7 c6 2e 9e 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 1a 6f 37 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:
ffffc90002ddf638 EFLAGS:
00010282
RAX:
0000000000000086 RBX:
dffffc0000000000 RCX:
7a24750e538ff600
RDX:
0000000000000000 RSI:
0000000000000201 RDI:
0000000000000000
RBP:
ffff888034a86650 R08:
ffffffff8174b13c R09:
1ffff920005bbe60
R10:
dffffc0000000000 R11:
fffff520005bbe61 R12:
0000000000000140
R13:
ffff88802824a400 R14:
ffff88802824a3fe R15:
0000000000000016
FS:
00007f2a5990d740(0000) GS:
ffff8880b8700000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
000000110c2631fd CR3:
0000000029504000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
skb_push+0xe5/0x100 net/core/skbuff.c:2636
nr_header+0x36/0x320 net/netrom/nr_dev.c:69
dev_hard_header include/linux/netdevice.h:3148 [inline]
vlan_dev_hard_header+0x359/0x480 net/8021q/vlan_dev.c:83
dev_hard_header include/linux/netdevice.h:3148 [inline]
lapbeth_data_transmit+0x1f6/0x2a0 drivers/net/wan/lapbether.c:257
lapb_data_transmit+0x91/0xb0 net/lapb/lapb_iface.c:447
lapb_transmit_buffer+0x168/0x1f0 net/lapb/lapb_out.c:149
lapb_establish_data_link+0x84/0xd0
lapb_device_event+0x4e0/0x670
notifier_call_chain+0x19f/0x3e0 kernel/notifier.c:93
__dev_notify_flags+0x207/0x400
dev_change_flags+0xf0/0x1a0 net/core/dev.c:8922
devinet_ioctl+0xa4e/0x1aa0 net/ipv4/devinet.c:1188
inet_ioctl+0x3d7/0x4f0 net/ipv4/af_inet.c:1003
sock_do_ioctl+0x158/0x460 net/socket.c:1227
sock_ioctl+0x626/0x8e0 net/socket.c:1346
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
Fixes:
1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+fb99d1b0c0f81d94a5e2@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
67506220.
050a0220.17bd51.006c.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241204141031.4030267-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 7 Dec 2024 01:39:18 +0000 (17:39 -0800)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
There are 2 bug fixes in this series. This first one fixes the issue
of setting the gso_type incorrectly for HW GRO packets on 5750X (Thor)
chips. This can cause HW GRO packets to be dropped by the stack if
they are re-segmented. The second one fixes a potential division by
zero crash when dumping FW log coredump.
====================
Link: https://patch.msgid.link/20241204215918.1692597-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hongguang Gao [Wed, 4 Dec 2024 21:59:18 +0000 (13:59 -0800)]
bnxt_en: Fix potential crash when dumping FW log coredump
If the FW log context memory is retained after FW reset, the existing
code is not handling the condition correctly and zeroes out the data
structures. This potentially will cause a division by zero crash
when the user runs ethtool -w. The last_type is also not set
correctly when the context memory is retained. This will cause errors
because the last_type signals to the FW that all context memory types
have been configured.
Oops: divide error: 0000 1 PREEMPT SMP NOPTI
CPU: 53 UID: 0 PID: 7019 Comm: ethtool Kdump: loaded Tainted: G OE 6.12.0-rc7+ #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Supermicro SYS-621C-TN12R/X13DDW-A, BIOS 1.4 08/10/2023
RIP: 0010:__bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
Code: 0a 31 d2 4c 89 6c 24 10 45 8b a5 fc df ff ff 4c 8b 74 24 20 31 db 66 89 44 24 06 48 63 c5 c1 e5 09 4c 0f af e0 48 8b 44 24 30 <49> f7 f4 4c 89 64 24 08 48 63 c5 4d 89 ec 31 ed 48 89 44 24 18 49
RSP: 0018:
ff480591603d78b8 EFLAGS:
00010206
RAX:
0000000000100000 RBX:
0000000000000000 RCX:
0000000000000000
RDX:
0000000000000000 RSI:
ff23959e46740000 RDI:
0000000000000000
RBP:
0000000000000000 R08:
0000000000100000 R09:
ff23959e46740000
R10:
ff480591603d7a18 R11:
0000000000000010 R12:
0000000000000000
R13:
ff23959e46742008 R14:
0000000000000000 R15:
0000000000000000
FS:
00007f04227c1740(0000) GS:
ff2395adbf680000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
00007f04225b33a5 CR3:
000000108b9a4001 CR4:
0000000000773ef0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe07f0 DR7:
0000000000000400
PKRU:
55555554
Call Trace:
<TASK>
? die+0x33/0x90
? do_trap+0xd9/0x100
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? do_error_trap+0x65/0x80
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? exc_divide_error+0x36/0x50
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? asm_exc_divide_error+0x16/0x20
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
? __bnxt_copy_ctx_mem.constprop.0.isra.0+0xda/0x160 [bnxt_en]
bnxt_get_ctx_coredump.constprop.0+0x1ed/0x390 [bnxt_en]
? __memcg_slab_post_alloc_hook+0x21c/0x3c0
? __bnxt_get_coredump+0x473/0x4b0 [bnxt_en]
__bnxt_get_coredump+0x473/0x4b0 [bnxt_en]
? security_file_alloc+0x74/0xe0
? cred_has_capability.isra.0+0x78/0x120
bnxt_get_coredump_length+0x4b/0xf0 [bnxt_en]
bnxt_get_dump_flag+0x40/0x60 [bnxt_en]
__dev_ethtool+0x17e4/0x1fc0
? syscall_exit_to_user_mode+0xc/0x1d0
? do_syscall_64+0x85/0x150
? unmap_page_range+0x299/0x4b0
? vma_interval_tree_remove+0x215/0x2c0
? __kmalloc_cache_noprof+0x10a/0x300
dev_ethtool+0xa8/0x170
dev_ioctl+0x1b5/0x580
? sk_ioctl+0x4a/0x110
sock_do_ioctl+0xab/0xf0
sock_ioctl+0x1ca/0x2e0
__x64_sys_ioctl+0x87/0xc0
do_syscall_64+0x79/0x150
Fixes:
24d694aec139 ("bnxt_en: Allocate backing store memory for FW trace logs")
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204215918.1692597-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Michael Chan [Wed, 4 Dec 2024 21:59:17 +0000 (13:59 -0800)]
bnxt_en: Fix GSO type for HW GRO packets on 5750X chips
The existing code is using RSS profile to determine IPV4/IPV6 GSO type
on all chips older than 5760X. This won't work on 5750X chips that may
be using modified RSS profiles. This commit from 2018 has updated the
driver to not use RSS profile for HW GRO packets on newer chips:
50f011b63d8c ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.")
However, a recent commit to add support for the newest 5760X chip broke
the logic. If the GRO packet needs to be re-segmented by the stack, the
wrong GSO type will cause the packet to be dropped.
Fix it to only use RSS profile to determine GSO type on the oldest
5730X/5740X chips which cannot use the new method and is safe to use the
RSS profiles.
Also fix the L3/L4 hash type for RX packets by not using the RSS
profile for the same reason. Use the ITYPE field in the RX completion
to determine L3/L4 hash types correctly.
Fixes:
a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Thomas Weißschuh [Tue, 3 Dec 2024 17:09:55 +0000 (18:09 +0100)]
ptp: kvm: x86: Return EOPNOTSUPP instead of ENODEV from kvm_arch_ptp_init()
The caller, ptp_kvm_init(), emits a warning if kvm_arch_ptp_init() exits
with any error which is not EOPNOTSUPP:
"fail to initialize ptp_kvm"
Replace ENODEV with EOPNOTSUPP to avoid this spurious warning,
aligning with the ARM implementation.
Fixes:
a86ed2cfa13c ("ptp: Don't print an error if ptp_kvm is not supported")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://patch.msgid.link/20241203-kvm_ptp-eopnotsuppp-v2-1-d1d060f27aa6@weissschuh.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 7 Dec 2024 01:37:46 +0000 (17:37 -0800)]
Merge branch 'selftests-mlxsw-add-few-fixes-for-sharedbuffer-test'
Petr Machata says:
====================
selftests: mlxsw: Add few fixes for sharedbuffer test
Danielle Ratson writes:
Currently, the sharedbuffer test fails sometimes because it is reading a
maximum occupancy that is larger than expected on some different cases.
This is happening because the test assumes that the packet it is sending
is the only packet being passed to the device.
In addition, some duplications on one hand, and redundant test cases on
the other hand, were found in the test.
Add egress filters on h1 and h2 that will guarantee that the packets in
the buffer are sent in the test, and remove the redundant test cases.
====================
Link: https://patch.msgid.link/cover.1733414773.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danielle Ratson [Thu, 5 Dec 2024 16:36:01 +0000 (17:36 +0100)]
selftests: mlxsw: sharedbuffer: Ensure no extra packets are counted
The test assumes that the packet it is sending is the only packet being
passed to the device.
However, it is not the case and so other packets are filling the buffers
as well. Therefore, the test sometimes fails because it is reading a
maximum occupancy that is larger than expected.
Add egress filters on $h1 and $h2 that will guarantee the above.
Fixes:
a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/64c28bc9b1cc1d78c4a73feda7cedbe9526ccf8b.1733414773.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danielle Ratson [Thu, 5 Dec 2024 16:36:00 +0000 (17:36 +0100)]
selftests: mlxsw: sharedbuffer: Remove duplicate test cases
On both port_tc_ip_test() and port_tc_arp_test(), the max occupancy is
checked on $h2 twice, when only the error message is different and does not
match the check itself.
Remove the two duplicated test cases from the test.
Fixes:
a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/d9eb26f6fc16a06a30b5c2c16ad80caf502bc561.1733414773.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Danielle Ratson [Thu, 5 Dec 2024 16:35:59 +0000 (17:35 +0100)]
selftests: mlxsw: sharedbuffer: Remove h1 ingress test case
The test is sending only one packet generated with mausezahn from $h1 to
$h2. However, for some reason, it is testing for non-zero maximum occupancy
in both the ingress pool of $h1 and $h2. The former only passes when $h2
happens to send a packet.
Avoid intermittent failures by removing unintentional test case
regarding the ingress pool of $h1.
Fixes:
a865ad999603 ("selftests: mlxsw: Add shared buffer traffic test")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/5b7344608d5e06f38209e48d8af8c92fa11b6742.1733414773.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Haoyu Li [Tue, 3 Dec 2024 15:20:49 +0000 (23:20 +0800)]
wifi: cfg80211: sme: init n_channels before channels[] access
With the __counted_by annocation in cfg80211_scan_request struct,
the "n_channels" struct member must be set before accessing the
"channels" array. Failing to do so will trigger a runtime warning
when enabling CONFIG_UBSAN_BOUNDS and CONFIG_FORTIFY_SOURCE.
Fixes:
e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by")
Signed-off-by: Haoyu Li <lihaoyu499@gmail.com>
Link: https://patch.msgid.link/20241203152049.348806-1-lihaoyu499@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Dan Carpenter [Wed, 4 Dec 2024 12:06:41 +0000 (15:06 +0300)]
net/mlx5: DR, prevent potential error pointer dereference
The dr_domain_add_vport_cap() function generally returns NULL on error
but sometimes we want it to return ERR_PTR(-EBUSY) so the caller can
retry. The problem here is that "ret" can be either -EBUSY or -ENOMEM
and if it's and -ENOMEM then the error pointer is propogated back and
eventually dereferenced in dr_ste_v0_build_src_gvmi_qpn_tag().
Fixes:
11a45def2e19 ("net/mlx5: DR, Add support for SF vports")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/07477254-e179-43e2-b1b3-3b9db4674195@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 4 Dec 2024 17:05:48 +0000 (17:05 +0000)]
tipc: fix NULL deref in cleanup_bearer()
syzbot found [1] that after blamed commit, ub->ubsock->sk
was NULL when attempting the atomic_dec() :
atomic_dec(&tipc_net(sock_net(ub->ubsock->sk))->wq_count);
Fix this by caching the tipc_net pointer.
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 0 UID: 0 PID: 5896 Comm: kworker/0:3 Not tainted 6.13.0-rc1-next-
20241203-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: events cleanup_bearer
RIP: 0010:read_pnet include/net/net_namespace.h:387 [inline]
RIP: 0010:sock_net include/net/sock.h:655 [inline]
RIP: 0010:cleanup_bearer+0x1f7/0x280 net/tipc/udp_media.c:820
Code: 18 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 3c f7 99 f6 48 8b 1b 48 83 c3 30 e8 f0 e4 60 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 74 08 48 89 df e8 1a f7 99 f6 49 83 c7 e8 48 8b 1b
RSP: 0018:
ffffc9000410fb70 EFLAGS:
00010206
RAX:
0000000000000006 RBX:
0000000000000030 RCX:
ffff88802fe45a00
RDX:
0000000000000001 RSI:
0000000000000008 RDI:
ffffc9000410f900
RBP:
ffff88807e1f0908 R08:
ffffc9000410f907 R09:
1ffff92000821f20
R10:
dffffc0000000000 R11:
fffff52000821f21 R12:
ffff888031d19980
R13:
dffffc0000000000 R14:
dffffc0000000000 R15:
ffff88807e1f0918
FS:
0000000000000000(0000) GS:
ffff8880b8600000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000556ca050b000 CR3:
0000000031c0c000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Fixes:
6a2fa13312e5 ("tipc: Fix use-after-free of kernel socket in cleanup_bearer().")
Reported-by: syzbot+46aa5474f179dacd1a3b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
67508b5f.
050a0220.17bd51.0070.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20241204170548.4152658-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Remi Pommarel [Fri, 22 Nov 2024 15:52:50 +0000 (16:52 +0100)]
batman-adv: Do not let TT changes list grows indefinitely
When TT changes list is too big to fit in packet due to MTU size, an
empty OGM is sent expected other node to send TT request to get the
changes. The issue is that tt.last_changeset was not built thus the
originator was responding with previous changes to those TT requests
(see batadv_send_my_tt_response). Also the changes list was never
cleaned up effectively never ending growing from this point onwards,
repeatedly sending the same TT response changes over and over, and
creating a new empty OGM every OGM interval expecting for the local
changes to be purged.
When there is more TT changes that can fit in packet, drop all changes,
send empty OGM and wait for TT request so we can respond with a full
table instead.
Fixes:
e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within OGMs")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Acked-by: Antonio Quartulli <Antonio@mandelbit.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Remi Pommarel [Fri, 22 Nov 2024 15:52:49 +0000 (16:52 +0100)]
batman-adv: Remove uninitialized data in full table TT response
The number of entries filled by batadv_tt_tvlv_generate() can be less
than initially expected in batadv_tt_prepare_tvlv_{global,local}_data()
(changes can be removed by batadv_tt_local_event() in ADD+DEL sequence
in the meantime as the lock held during the whole tvlv global/local data
generation).
Thus tvlv_len could be bigger than the actual TT entry size that need
to be sent so full table TT_RESPONSE could hold invalid TT entries such
as below.
* 00:00:00:00:00:00 -1 [....] ( 0) 88:12:4e:ad:7e:ba (179) (0x45845380)
* 00:00:00:00:78:79 4092 [.W..] ( 0) 88:12:4e:ad:7e:3c (145) (0x8ebadb8b)
Remove the extra allocated space to avoid sending uninitialized entries
for full table TT_RESPONSE in both batadv_send_other_tt_response() and
batadv_send_my_tt_response().
Fixes:
7ea7b4a14275 ("batman-adv: make the TT CRC logic VLAN specific")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Remi Pommarel [Fri, 22 Nov 2024 15:52:48 +0000 (16:52 +0100)]
batman-adv: Do not send uninitialized TT changes
The number of TT changes can be less than initially expected in
batadv_tt_tvlv_container_update() (changes can be removed by
batadv_tt_local_event() in ADD+DEL sequence between reading
tt_diff_entries_num and actually iterating the change list under lock).
Thus tt_diff_len could be bigger than the actual changes size that need
to be sent. Because batadv_send_my_tt_response sends the whole
packet, uninitialized data can be interpreted as TT changes on other
nodes leading to weird TT global entries on those nodes such as:
* 00:00:00:00:00:00 -1 [....] ( 0) 88:12:4e:ad:7e:ba (179) (0x45845380)
* 00:00:00:00:78:79 4092 [.W..] ( 0) 88:12:4e:ad:7e:3c (145) (0x8ebadb8b)
All of the above also applies to OGM tvlv container buffer's tvlv_len.
Remove the extra allocated space to avoid sending uninitialized TT
changes in batadv_send_my_tt_response() and batadv_v_ogm_send_softif().
Fixes:
e1bf0c14096f ("batman-adv: tvlv - convert tt data sent within OGMs")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Linus Torvalds [Thu, 5 Dec 2024 18:25:06 +0000 (10:25 -0800)]
Merge tag 'net-6.13-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can and netfilter.
Current release - regressions:
- rtnetlink: fix double call of rtnl_link_get_net_ifla()
- tcp: populate XPS related fields of timewait sockets
- ethtool: fix access to uninitialized fields in set RXNFC command
- selinux: use sk_to_full_sk() in selinux_ip_output()
Current release - new code bugs:
- net: make napi_hash_lock irq safe
- eth:
- bnxt_en: support header page pool in queue API
- ice: fix NULL pointer dereference in switchdev
Previous releases - regressions:
- core: fix icmp host relookup triggering ip_rt_bug
- ipv6:
- avoid possible NULL deref in modify_prefix_route()
- release expired exception dst cached in socket
- smc: fix LGR and link use-after-free issue
- hsr: avoid potential out-of-bound access in fill_frame_info()
- can: hi311x: fix potential use-after-free
- eth: ice: fix VLAN pruning in switchdev mode
Previous releases - always broken:
- netfilter:
- ipset: hold module reference while requesting a module
- nft_inner: incorrect percpu area handling under softirq
- can: j1939: fix skb reference counting
- eth:
- mlxsw: use correct key block on Spectrum-4
- mlx5: fix memory leak in mlx5hws_definer_calc_layout"
* tag 'net-6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (76 commits)
net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
net: avoid potential UAF in default_operstate()
vsock/test: verify socket options after setting them
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
net/mlx5e: Remove workaround to avoid syndrome for internal port
net/mlx5e: SD, Use correct mdev to build channel param
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
net/mlx5: HWS: Properly set bwc queue locks lock classes
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
bnxt_en: handle tpa_info in queue API implementation
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
bnxt_en: refactor tpa_info alloc/free into helpers
geneve: do not assume mac header is set in geneve_xmit_skb()
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
ipmr: tune the ipmr_can_free_table() checks.
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
...
Linus Torvalds [Thu, 5 Dec 2024 18:17:55 +0000 (10:17 -0800)]
Merge tag 'trace-v6.13-rc1' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix trace histogram sort function cmp_entries_dup()
The sort function cmp_entries_dup() returns either 1 or 0, and not -1
if parameter "a" is less than "b" by memcmp().
- Fix archs that call trace_hardirqs_off() without RCU watching
Both x86 and arm64 no longer call any tracepoints with RCU not
watching. It was assumed that it was safe to get rid of
trace_*_rcuidle() version of the tracepoint calls. This was needed to
get rid of the SRCU protection and be able to implement features like
faultable traceponits and add rust tracepoints.
Unfortunately, there were a few architectures that still relied on
that logic. There's only one file that has tracepoints that are
called without RCU watching. Add macro logic around the tracepoints
for architectures that do not have CONFIG_ARCH_WANTS_NO_INSTR defined
will check if the code is in the idle path (the only place RCU isn't
watching), and enable RCU around calling the tracepoint, but only do
it if the tracepoint is enabled.
* tag 'trace-v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix archs that still call tracepoints without RCU watching
tracing: Fix cmp_entries_dup() to respect sort() comparison rules
Linus Torvalds [Thu, 5 Dec 2024 18:06:47 +0000 (10:06 -0800)]
Merge tag 'hid-for-linus-
2024120501' of git://git./linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- regression fix in suspend/resume for i2c-hid (Kenny Levinsen)
- fix wacom driver assuming a name can not be null (WangYuli)
- a couple of constify changes/fixes (Thomas Weißschuh)
- a couple of selftests/hid fixes (Maximilian Heyne & Benjamin
Tissoires)
* tag 'hid-for-linus-
2024120501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
selftests/hid: fix kfunc inclusions with newer bpftool
HID: bpf: drop unneeded casts discarding const
HID: bpf: constify hid_ops
selftests: hid: fix typo and exit code
HID: wacom: fix when get product name maybe null pointer
HID: i2c-hid: Revert to using power commands to wake on resume
Linus Torvalds [Thu, 5 Dec 2024 18:03:43 +0000 (10:03 -0800)]
Merge tag 'linux-watchdog-6.13-rc1' of git://linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- Add support for exynosautov920 SoC
- Add support for Airoha EN7851 watchdog
- Add support for MT6735 TOPRGU/WDT
- Delete the cpu5wdt driver
- Always print when registering watchdog fails
- Several other small fixes and improvements
* tag 'linux-watchdog-6.13-rc1' of git://www.linux-watchdog.org/linux-watchdog: (36 commits)
watchdog: rti: of: honor timeout-sec property
watchdog: s3c2410_wdt: add support for exynosautov920 SoC
dt-bindings: watchdog: Document ExynosAutoV920 watchdog bindings
watchdog: mediatek: Add support for MT6735 TOPRGU/WDT
watchdog: mediatek: Make sure system reset gets asserted in mtk_wdt_restart()
dt-bindings: watchdog: fsl-imx-wdt: Add missing 'big-endian' property
dt-bindings: watchdog: Document Qualcomm QCS8300
docs: ABI: Fix spelling mistake in pretimeout_avaialable_governors
Revert "watchdog: s3c2410_wdt: use exynos_get_pmu_regmap_by_phandle() for PMU regs"
watchdog: rzg2l_wdt: Power on the watchdog domain in the restart handler
watchdog: Switch back to struct platform_driver::remove()
watchdog: it87_wdt: add PWRGD enable quirk for Qotom QCML04
watchdog: da9063: Remove __maybe_unused notations
watchdog: da9063: Do not use a global variable
watchdog: Delete the cpu5wdt driver
watchdog: Add support for Airoha EN7851 watchdog
dt-bindings: watchdog: airoha: document watchdog for Airoha EN7581
watchdog: sl28cpld_wdt: don't print out if registering watchdog fails
watchdog: rza_wdt: don't print out if registering watchdog fails
watchdog: rti_wdt: don't print out if registering watchdog fails
...
Steven Rostedt [Wed, 4 Dec 2024 15:04:14 +0000 (10:04 -0500)]
tracing: Fix archs that still call tracepoints without RCU watching
Tracepoints require having RCU "watching" as it uses RCU to do updates to
the tracepoints. There are some cases that would call a tracepoint when
RCU was not "watching". This was usually in the idle path where RCU has
"shutdown". For the few locations that had tracepoints without RCU
watching, there was an trace_*_rcuidle() variant that could be used. This
used SRCU for protection.
There are tracepoints that trace when interrupts and preemption are
enabled and disabled. In some architectures, these tracepoints are called
in a path where RCU is not watching. When x86 and arm64 removed these
locations, it was incorrectly assumed that it would be safe to remove the
trace_*_rcuidle() variant and also remove the SRCU logic, as it made the
code more complex and harder to implement new tracepoint features (like
faultable tracepoints and tracepoints in rust).
Instead of bringing back the trace_*_rcuidle(), as it will not be trivial
to do as new code has already been added depending on its removal, add a
workaround to the one file that still requires it (trace_preemptirq.c). If
the architecture does not define CONFIG_ARCH_WANTS_NO_INSTR, then check if
the code is in the idle path, and if so, call ct_irq_enter/exit() which
will enable RCU around the tracepoint.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/20241204100414.4d3e06d0@gandalf.local.home
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes:
48bcda684823 ("tracing: Remove definition of trace_*_rcuidle()")
Closes: https://lore.kernel.org/all/
bddb02de-957a-4df5-8e77-
829f55728ea2@roeck-us.net/
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Shradha Gupta [Wed, 4 Dec 2024 05:48:20 +0000 (21:48 -0800)]
net :mana :Request a V2 response version for MANA_QUERY_GF_STAT
The current requested response version(V1) for MANA_QUERY_GF_STAT query
results in STATISTICS_FLAGS_TX_ERRORS_GDMA_ERROR value being set to
0 always.
In order to get the correct value for this counter we request the response
version to be V2.
Cc: stable@vger.kernel.org
Fixes:
e1df5202e879 ("net :mana :Add remaining GDMA stats for MANA to ethtool")
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1733291300-12593-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Tue, 3 Dec 2024 17:09:33 +0000 (17:09 +0000)]
net: avoid potential UAF in default_operstate()
syzbot reported an UAF in default_operstate() [1]
Issue is a race between device and netns dismantles.
After calling __rtnl_unlock() from netdev_run_todo(),
we can not assume the netns of each device is still alive.
Make sure the device is not in NETREG_UNREGISTERED state,
and add an ASSERT_RTNL() before the call to
__dev_get_by_index().
We might move this ASSERT_RTNL() in __dev_get_by_index()
in the future.
[1]
BUG: KASAN: slab-use-after-free in __dev_get_by_index+0x5d/0x110 net/core/dev.c:852
Read of size 8 at addr
ffff888043eba1b0 by task syz.0.0/5339
CPU: 0 UID: 0 PID: 5339 Comm: syz.0.0 Not tainted
6.12.0-syzkaller-10296-gaaf20f870da0 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0x169/0x550 mm/kasan/report.c:489
kasan_report+0x143/0x180 mm/kasan/report.c:602
__dev_get_by_index+0x5d/0x110 net/core/dev.c:852
default_operstate net/core/link_watch.c:51 [inline]
rfc2863_policy+0x224/0x300 net/core/link_watch.c:67
linkwatch_do_dev+0x3e/0x170 net/core/link_watch.c:170
netdev_run_todo+0x461/0x1000 net/core/dev.c:10894
rtnl_unlock net/core/rtnetlink.c:152 [inline]
rtnl_net_unlock include/linux/rtnetlink.h:133 [inline]
rtnl_dellink+0x760/0x8d0 net/core/rtnetlink.c:3520
rtnetlink_rcv_msg+0x791/0xcf0 net/core/rtnetlink.c:6911
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2541
netlink_unicast_kernel net/netlink/af_netlink.c:1321 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1347
netlink_sendmsg+0x8e4/0xcb0 net/netlink/af_netlink.c:1891
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:726
____sys_sendmsg+0x52a/0x7e0 net/socket.c:2583
___sys_sendmsg net/socket.c:2637 [inline]
__sys_sendmsg+0x269/0x350 net/socket.c:2669
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f2a3cb80809
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:
00007f2a3d9cd058 EFLAGS:
00000246 ORIG_RAX:
000000000000002e
RAX:
ffffffffffffffda RBX:
00007f2a3cd45fa0 RCX:
00007f2a3cb80809
RDX:
0000000000000000 RSI:
0000000020000000 RDI:
0000000000000008
RBP:
00007f2a3cbf393e R08:
0000000000000000 R09:
0000000000000000
R10:
0000000000000000 R11:
0000000000000246 R12:
0000000000000000
R13:
0000000000000000 R14:
00007f2a3cd45fa0 R15:
00007ffd03bc65c8
</TASK>
Allocated by task 5339:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4314
kmalloc_noprof include/linux/slab.h:901 [inline]
kmalloc_array_noprof include/linux/slab.h:945 [inline]
netdev_create_hash net/core/dev.c:11870 [inline]
netdev_init+0x10c/0x250 net/core/dev.c:11890
ops_init+0x31e/0x590 net/core/net_namespace.c:138
setup_net+0x287/0x9e0 net/core/net_namespace.c:362
copy_net_ns+0x33f/0x570 net/core/net_namespace.c:500
create_new_namespaces+0x425/0x7b0 kernel/nsproxy.c:110
unshare_nsproxy_namespaces+0x124/0x180 kernel/nsproxy.c:228
ksys_unshare+0x57d/0xa70 kernel/fork.c:3314
__do_sys_unshare kernel/fork.c:3385 [inline]
__se_sys_unshare kernel/fork.c:3383 [inline]
__x64_sys_unshare+0x38/0x40 kernel/fork.c:3383
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 12:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2338 [inline]
slab_free mm/slub.c:4598 [inline]
kfree+0x196/0x420 mm/slub.c:4746
netdev_exit+0x65/0xd0 net/core/dev.c:11992
ops_exit_list net/core/net_namespace.c:172 [inline]
cleanup_net+0x802/0xcc0 net/core/net_namespace.c:632
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
worker_thread+0x870/0xd30 kernel/workqueue.c:3391
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
The buggy address belongs to the object at
ffff888043eba000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 432 bytes inside of
freed 2048-byte region [
ffff888043eba000,
ffff888043eba800)
The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x43eb8
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
page_type: f5(slab)
raw:
04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000
raw:
0000000000000000 0000000000080008 00000001f5000000 0000000000000000
head:
04fff00000000040 ffff88801ac42000 dead000000000122 0000000000000000
head:
0000000000000000 0000000000080008 00000001f5000000 0000000000000000
head:
04fff00000000003 ffffea00010fae01 ffffffffffffffff 0000000000000000
head:
0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 5339, tgid 5338 (syz.0.0), ts
69674195892, free_ts
69663220888
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1556
prep_new_page mm/page_alloc.c:1564 [inline]
get_page_from_freelist+0x3649/0x3790 mm/page_alloc.c:3474
__alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4751
alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
alloc_slab_page+0x6a/0x140 mm/slub.c:2408
allocate_slab+0x5a/0x2f0 mm/slub.c:2574
new_slab mm/slub.c:2627 [inline]
___slab_alloc+0xcd1/0x14b0 mm/slub.c:3815
__slab_alloc+0x58/0xa0 mm/slub.c:3905
__slab_alloc_node mm/slub.c:3980 [inline]
slab_alloc_node mm/slub.c:4141 [inline]
__do_kmalloc_node mm/slub.c:4282 [inline]
__kmalloc_noprof+0x2e6/0x4c0 mm/slub.c:4295
kmalloc_noprof include/linux/slab.h:905 [inline]
sk_prot_alloc+0xe0/0x210 net/core/sock.c:2165
sk_alloc+0x38/0x370 net/core/sock.c:2218
__netlink_create+0x65/0x260 net/netlink/af_netlink.c:629
__netlink_kernel_create+0x174/0x6f0 net/netlink/af_netlink.c:2015
netlink_kernel_create include/linux/netlink.h:62 [inline]
uevent_net_init+0xed/0x2d0 lib/kobject_uevent.c:783
ops_init+0x31e/0x590 net/core/net_namespace.c:138
setup_net+0x287/0x9e0 net/core/net_namespace.c:362
page last free pid 1032 tgid 1032 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1127 [inline]
free_unref_page+0xdf9/0x1140 mm/page_alloc.c:2657
__slab_free+0x31b/0x3d0 mm/slub.c:4509
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
kasan_slab_alloc include/linux/kasan.h:250 [inline]
slab_post_alloc_hook mm/slub.c:4104 [inline]
slab_alloc_node mm/slub.c:4153 [inline]
kmem_cache_alloc_node_noprof+0x1d9/0x380 mm/slub.c:4205
__alloc_skb+0x1c3/0x440 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1323 [inline]
alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612
sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2881
sock_alloc_send_skb include/net/sock.h:1797 [inline]
mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747
add_grhead net/ipv6/mcast.c:1850 [inline]
add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988
mld_send_initial_cr+0x228/0x4b0 net/ipv6/mcast.c:2234
ipv6_mc_dad_complete+0x88/0x490 net/ipv6/mcast.c:2245
addrconf_dad_completed+0x712/0xcd0 net/ipv6/addrconf.c:4342
addrconf_dad_work+0xdc2/0x16f0
process_one_work kernel/workqueue.c:3229 [inline]
process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
Memory state around the buggy address:
ffff888043eba080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888043eba100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
ffff888043eba180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888043eba200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888043eba280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes:
8c55facecd7a ("net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down")
Reported-by: syzbot+1939f24bdb783e9e43d9@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
674f3a18.
050a0220.48a03.0041.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241203170933.2449307-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Dec 2024 10:49:14 +0000 (11:49 +0100)]
Merge tag 'nf-24-12-05' of git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Fix esoteric undefined behaviour due to uninitialized stack access
in ip_vs_protocol_init(), from Jinghao Jia.
2) Fix iptables xt_LED slab-out-of-bounds due to incorrect sanitization
of the led string identifier, reported by syzbot. Patch from
Dmitry Antipov.
3) Remove WARN_ON_ONCE reachable from userspace to check for the maximum
cgroup level, nft_socket cgroup matching is restricted to 255 levels,
but cgroups allow for INT_MAX levels by default. Reported by syzbot.
4) Fix nft_inner incorrect use of percpu area to store tunnel parser
context with softirqs, resulting in inconsistent inner header
offsets that could lead to bogus rule mismatches, reported by syzbot.
5) Grab module reference on ipset core while requesting set type modules,
otherwise kernel crash is possible by removing ipset core module,
patch from Phil Sutter.
6) Fix possible double-free in nft_hash garbage collector due to unstable
walk interator that can provide twice the same element. Use a sequence
number to skip expired/dead elements that have been already scheduled
for removal. Based on patch from Laurent Fasnach
netfilter pull request 24-12-05
* tag 'nf-24-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_set_hash: skip duplicated elements pending gc run
netfilter: ipset: Hold module reference while requesting a module
netfilter: nft_inner: incorrect percpu area handling under softirq
netfilter: nft_socket: remove WARN_ON_ONCE on maximum cgroup level
netfilter: x_tables: fix LED ID check in led_tg_check()
ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()
====================
Link: https://patch.msgid.link/20241205002854.162490-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Thu, 5 Dec 2024 10:39:36 +0000 (11:39 +0100)]
Merge branch 'vsock-test-fix-wrong-setsockopt-parameters'
Konstantin Shkolnyy says:
====================
vsock/test: fix wrong setsockopt() parameters
Parameters were created using wrong C types, which caused them to be of
wrong size on some architectures, causing problems.
The problem with SO_RCVLOWAT was found on s390 (big endian), while x86-64
didn't show it. After the fix, all tests pass on s390.
Then Stefano Garzarella pointed out that SO_VM_SOCKETS_* calls might have
a similar problem, which turned out to be true, hence, the second patch.
Changes for v8:
- Fix whitespace warnings from "checkpatch.pl --strict"
- Add maintainers to Cc:
Changes for v7:
- Rebase on top of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
- Add the "net" tags to the subjects
Changes for v6:
- rework the patch #3 to avoid creating a new file for new functions,
and exclude vsock_perf from calling the new functions.
- add "Reviewed-by:" to the patch #2.
Changes for v5:
- in the patch #2 replace the introduced uint64_t with unsigned long long
to match documentation
- add a patch #3 that verifies every setsockopt() call.
Changes for v4:
- add "Reviewed-by:" to the first patch, and add a second patch fixing
SO_VM_SOCKETS_* calls, which depends on the first one (hence, it's now
a patch series.)
Changes for v3:
- fix the same problem in vsock_perf and update commit message
Changes for v2:
- add "Fixes:" lines to the commit message
====================
Link: https://patch.msgid.link/20241203150656.287028-1-kshk@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Konstantin Shkolnyy [Tue, 3 Dec 2024 15:06:56 +0000 (09:06 -0600)]
vsock/test: verify socket options after setting them
Replace setsockopt() calls with calls to functions that follow
setsockopt() with getsockopt() and check that the returned value and its
size are the same as have been set. (Except in vsock_perf.)
Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Konstantin Shkolnyy [Tue, 3 Dec 2024 15:06:55 +0000 (09:06 -0600)]
vsock/test: fix parameter types in SO_VM_SOCKETS_* calls
Change parameters of SO_VM_SOCKETS_* to unsigned long long as documented
in the vm_sockets.h, because the corresponding kernel code requires them
to be at least 64-bit, no matter what architecture. Otherwise they are
too small on 32-bit machines.
Fixes:
5c338112e48a ("test/vsock: rework message bounds test")
Fixes:
685a21c314a8 ("test/vsock: add big message test")
Fixes:
542e893fbadc ("vsock/test: two tests to check credit update logic")
Fixes:
8abbffd27ced ("test/vsock: vsock_perf utility")
Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Konstantin Shkolnyy [Tue, 3 Dec 2024 15:06:54 +0000 (09:06 -0600)]
vsock/test: fix failures due to wrong SO_RCVLOWAT parameter
This happens on 64-bit big-endian machines.
SO_RCVLOWAT requires an int parameter. However, instead of int, the test
uses unsigned long in one place and size_t in another. Both are 8 bytes
long on 64-bit machines. The kernel, having received the 8 bytes, doesn't
test for the exact size of the parameter, it only cares that it's >=
sizeof(int), and casts the 4 lower-addressed bytes to an int, which, on
a big-endian machine, contains 0. 0 doesn't trigger an error, SO_RCVLOWAT
returns with success and the socket stays with the default SO_RCVLOWAT = 1,
which results in vsock_test failures, while vsock_perf doesn't even notice
that it's failed to change it.
Fixes:
b1346338fbae ("vsock_test: POLLIN + SO_RCVLOWAT test")
Fixes:
542e893fbadc ("vsock/test: two tests to check credit update logic")
Fixes:
8abbffd27ced ("test/vsock: vsock_perf utility")
Signed-off-by: Konstantin Shkolnyy <kshk@linux.ibm.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Thu, 5 Dec 2024 03:46:49 +0000 (19:46 -0800)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-12-03 (ice, idpf, ixgbe, ixgbevf, igb)
This series contains updates to ice, idpf, ixgbe, ixgbevf, and igb
drivers.
For ice:
Arkadiusz corrects search for determining whether PHY clock recovery is
supported on the device.
Przemyslaw corrects mask used for PHY timestamps on ETH56G devices.
Wojciech adds missing virtchnl ops which caused NULL pointer
dereference.
Marcin fixes VLAN filter settings for uplink VSI in switchdev mode.
For idpf:
Josh restores setting of completion tag for empty buffers.
For ixgbevf:
Jake removes incorrect initialization/support of IPSEC for mailbox
version 1.5.
For ixgbe:
Jake rewords and downgrades misleading message when negotiation
of VF mailbox version is not supported.
Tore Amundsen corrects value for BASE-BX10 capability.
For igb:
Yuan Can adds proper teardown on failed pci_register_driver() call.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
igb: Fix potential invalid memory access in igb_init_module()
ixgbe: Correct BASE-BX10 compliance code
ixgbe: downgrade logging of unsupported VF API version to debug
ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5
idpf: set completion tag for "empty" bufs associated with a packet
ice: Fix VLAN pruning in switchdev mode
ice: Fix NULL pointer dereference in switchdev
ice: fix PHY timestamp extraction for ETH56G
ice: fix PHY Clock Recovery availability check
====================
Link: https://patch.msgid.link/20241203215521.1646668-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Dec 2024 03:43:48 +0000 (19:43 -0800)]
Merge branch 'mlx5-misc-fixes-2024-12-03'
Tariq Toukan says:
====================
mlx5 misc fixes 2024-12-03
This patchset provides misc bug fixes from the team to the mlx5 core and
Eth drivers.
====================
Link: https://patch.msgid.link/20241203204920.232744-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianbo Liu [Tue, 3 Dec 2024 20:49:20 +0000 (22:49 +0200)]
net/mlx5e: Remove workaround to avoid syndrome for internal port
Previously a workaround was added to avoid syndrome 0xcdb051. It is
triggered when offload a rule with tunnel encapsulation, and
forwarding to another table, but not matching on the internal port in
firmware steering mode. The original workaround skips internal tunnel
port logic, which is not correct as not all cases are considered. As
an example, if vlan is configured on the uplink port, traffic can't
pass because vlan header is not added with this workaround. Besides,
there is no such issue for software steering. So, this patch removes
that, and returns error directly if trying to offload such rule for
firmware steering.
Fixes:
06b4eac9c4be ("net/mlx5e: Don't offload internal port if filter device is out device")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Tested-by: Frode Nordahl <frode.nordahl@canonical.com>
Reviewed-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tariq Toukan [Tue, 3 Dec 2024 20:49:19 +0000 (22:49 +0200)]
net/mlx5e: SD, Use correct mdev to build channel param
In a multi-PF netdev, each traffic channel creates its own resources
against a specific PF.
In the cited commit, where this support was added, the channel_param
logic was mistakenly kept unchanged, so it always used the primary PF
which is found at priv->mdev.
In this patch we fix this by moving the logic to be per-channel, and
passing the correct mdev instance.
This bug happened to be usually harmless, as the resulting cparam
structures would be the same for all channels, due to identical FW logic
and decisions.
However, in some use cases, like fwreset, this gets broken.
This could lead to different symptoms. Example:
Error cqe on cqn 0x428, ci 0x0, qn 0x10a9, opcode 0xe, syndrome 0x4,
vendor syndrome 0x32
Fixes:
e4f9686bdee7 ("net/mlx5e: Let channels be SD-aware")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patrisious Haddad [Tue, 3 Dec 2024 20:49:18 +0000 (22:49 +0200)]
net/mlx5: E-Switch, Fix switching to switchdev mode in MPV
Fix the mentioned commit change for MPV mode, since in MPV mode the IB
device is shared between different core devices, so under this change
when moving both devices simultaneously to switchdev mode the IB device
removal and re-addition can race with itself causing unexpected behavior.
In such case do rescan_drivers() only once in order to add the ethernet
representor auxiliary device, and skip adding and removing IB devices.
Fixes:
ab85ebf43723 ("net/mlx5: E-switch, refactor eswitch mode change")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patrisious Haddad [Tue, 3 Dec 2024 20:49:17 +0000 (22:49 +0200)]
net/mlx5: E-Switch, Fix switching to switchdev mode with IB device disabled
In case that IB device is already disabled when moving to switchdev mode,
which can happen when working with LAG, need to do rescan_drivers()
before leaving in order to add ethernet representor auxiliary device.
Fixes:
ab85ebf43723 ("net/mlx5: E-switch, refactor eswitch mode change")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cosmin Ratiu [Tue, 3 Dec 2024 20:49:16 +0000 (22:49 +0200)]
net/mlx5: HWS: Properly set bwc queue locks lock classes
The mentioned "Fixes" patch forgot to do that.
Fixes:
9addffa34359 ("net/mlx5: HWS, use lock classes for bwc locks")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cosmin Ratiu [Tue, 3 Dec 2024 20:49:15 +0000 (22:49 +0200)]
net/mlx5: HWS: Fix memory leak in mlx5hws_definer_calc_layout
It allocates a match template, which creates a compressed definer fc
struct, but that is not deallocated.
This commit fixes that.
Fixes:
74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241203204920.232744-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 5 Dec 2024 03:23:36 +0000 (19:23 -0800)]
Merge branch 'bnxt_en-support-header-page-pool-in-queue-api'
David Wei says:
====================
bnxt_en: support header page pool in queue API
Commit
7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
separate page pool for header frags. Now, frags are allocated from this
header page pool e.g. rxr->tpa_info.data.
The queue API did not properly handle rxr->tpa_info and so using the
queue API to i.e. reset any queues will result in pages being returned
to the incorrect page pool, causing inflight != 0 warnings.
Fix this bug by properly allocating/freeing tpa_info and copying/freeing
head_pool in the queue API implementation.
The 1st patch is a prep patch that refactors helpers out to be used by
the implementation patch later.
The 2nd patch is a drive-by refactor. Happy to take it out and re-send
to net-next if there are any objections.
The 3rd patch is the implementation patch that will properly alloc/free
rxr->tpa_info.
====================
Link: https://patch.msgid.link/20241204041022.56512-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Wed, 4 Dec 2024 04:10:22 +0000 (20:10 -0800)]
bnxt_en: handle tpa_info in queue API implementation
Commit
7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory e.g. napi_alloc_frag()
meaning their lifetimes were not associated with a page pool. They can
be returned at any time and so the queue API did not alloc or free
rx_tpa.
But now frags come from a separate head_pool which may be different to
page_pool. Without allocating and freeing rx_tpa, frags allocated from
the old head_pool may be returned to a different new head_pool which
causes a mismatch between the pp hold/release count.
Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Wed, 4 Dec 2024 04:10:21 +0000 (20:10 -0800)]
bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for
allocating rx_agg_bmap.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David Wei [Wed, 4 Dec 2024 04:10:20 +0000 (20:10 -0800)]
bnxt_en: refactor tpa_info alloc/free into helpers
Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work
on a single tpa_info in prep for queue API using them.
There are 2 pairs of operations:
* bnxt_alloc_one_tpa_info()
* bnxt_free_one_tpa_info()
These alloc/free the tpa_info array itself.
* bnxt_alloc_one_tpa_info_data()
* bnxt_free_one_tpa_info_data()
These alloc/free the frags stored in tpa_info array.
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 3 Dec 2024 18:21:21 +0000 (18:21 +0000)]
geneve: do not assume mac header is set in geneve_xmit_skb()
We should not assume mac header is set in output path.
Use skb_eth_hdr() instead of eth_hdr() to fix the issue.
sysbot reported the following :
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 skb_mac_header include/linux/skbuff.h:3052 [inline]
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 eth_hdr include/linux/if_ether.h:24 [inline]
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 geneve_xmit_skb drivers/net/geneve.c:898 [inline]
WARNING: CPU: 0 PID: 11635 at include/linux/skbuff.h:3052 geneve_xmit+0x4c38/0x5730 drivers/net/geneve.c:1039
Modules linked in:
CPU: 0 UID: 0 PID: 11635 Comm: syz.4.1423 Not tainted
6.12.0-syzkaller-10296-gaaf20f870da0 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
RIP: 0010:skb_mac_header include/linux/skbuff.h:3052 [inline]
RIP: 0010:eth_hdr include/linux/if_ether.h:24 [inline]
RIP: 0010:geneve_xmit_skb drivers/net/geneve.c:898 [inline]
RIP: 0010:geneve_xmit+0x4c38/0x5730 drivers/net/geneve.c:1039
Code: 21 c6 02 e9 35 d4 ff ff e8 a5 48 4c fb 90 0f 0b 90 e9 fd f5 ff ff e8 97 48 4c fb 90 0f 0b 90 e9 d8 f5 ff ff e8 89 48 4c fb 90 <0f> 0b 90 e9 41 e4 ff ff e8 7b 48 4c fb 90 0f 0b 90 e9 cd e7 ff ff
RSP: 0018:
ffffc90003b2f870 EFLAGS:
00010283
RAX:
000000000000037a RBX:
000000000000ffff RCX:
ffffc9000dc3d000
RDX:
0000000000080000 RSI:
ffffffff86428417 RDI:
0000000000000003
RBP:
ffffc90003b2f9f0 R08:
0000000000000003 R09:
000000000000ffff
R10:
000000000000ffff R11:
0000000000000002 R12:
ffff88806603c000
R13:
0000000000000000 R14:
ffff8880685b2780 R15:
0000000000000e23
FS:
00007fdc2deed6c0(0000) GS:
ffff8880b8600000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
0000001b30a1dff8 CR3:
0000000056b8c000 CR4:
00000000003526f0
DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
Call Trace:
<TASK>
__netdev_start_xmit include/linux/netdevice.h:5002 [inline]
netdev_start_xmit include/linux/netdevice.h:5011 [inline]
__dev_direct_xmit+0x58a/0x720 net/core/dev.c:4490
dev_direct_xmit include/linux/netdevice.h:3181 [inline]
packet_xmit+0x1e4/0x360 net/packet/af_packet.c:285
packet_snd net/packet/af_packet.c:3146 [inline]
packet_sendmsg+0x2700/0x5660 net/packet/af_packet.c:3178
sock_sendmsg_nosec net/socket.c:711 [inline]
__sock_sendmsg net/socket.c:726 [inline]
__sys_sendto+0x488/0x4f0 net/socket.c:2197
__do_sys_sendto net/socket.c:2204 [inline]
__se_sys_sendto net/socket.c:2200 [inline]
__x64_sys_sendto+0xe0/0x1c0 net/socket.c:2200
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes:
a025fb5f49ad ("geneve: Allow configuration of DF behaviour")
Reported-by: syzbot+3ec5271486d7cb2d242a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/
674f4b72.
050a0220.17bd51.004a.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20241203182122.2725517-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ido Schimmel [Tue, 3 Dec 2024 15:16:05 +0000 (16:16 +0100)]
mlxsw: spectrum_acl_flex_keys: Use correct key block on Spectrum-4
The driver is currently using an ACL key block that is not supported by
Spectrum-4. This works because the driver is only using a single field
from this key block which is located in the same offset in the
equivalent Spectrum-4 key block.
The issue was discovered when the firmware started rejecting the use of
the unsupported key block. The change has been reverted to avoid
breaking users that only update their firmware.
Nonetheless, fix the issue by using the correct key block.
Fixes:
07ff135958dd ("mlxsw: Introduce flex key elements for Spectrum-4")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/35e72c97bdd3bc414fb8e4d747e5fb5d26c29658.1733237440.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kory Maincent [Mon, 2 Dec 2024 15:33:57 +0000 (16:33 +0100)]
ethtool: Fix wrong mod state in case of verbose and no_mask bitset
A bitset without mask in a _SET request means we want exactly the bits in
the bitset to be set. This works correctly for compact format but when
verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
bits present in the request bitset but does not clear the rest. The commit
6699170376ab ("ethtool: fix application of verbose no_mask bitset") fixes
this issue by clearing the whole target bitmap before we start iterating.
The solution proposed brought an issue with the behavior of the mod
variable. As the bitset is always cleared the old value will always
differ to the new value.
Fix it by adding a new function to compare bitmaps and a temporary variable
which save the state of the old bitmap.
Fixes:
6699170376ab ("ethtool: fix application of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20241202153358.1142095-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>