linux-2.6-block.git
2 months agonet: page_pool: fix warning code
Johannes Berg [Fri, 5 Jul 2024 11:42:06 +0000 (13:42 +0200)]
net: page_pool: fix warning code

WARN_ON_ONCE("string") doesn't really do what appears to
be intended, so fix that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 90de47f020db ("page_pool: fragment API support for 32-bit arch with 64-bit DMA")
Link: https://patch.msgid.link/20240705134221.2f4de205caa1.I28496dc0f2ced580282d1fb892048017c4491e21@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: net: ksft: interrupt cleanly on KeyboardInterrupt
Jakub Kicinski [Fri, 5 Jul 2024 01:52:22 +0000 (18:52 -0700)]
selftests: net: ksft: interrupt cleanly on KeyboardInterrupt

It's very useful to be able to interrupt the tests during development.
Detect KeyboardInterrupt, run the cleanups and exit.

Link: https://patch.msgid.link/20240705015222.675840-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: microchip: lan9371/2: update MAC capabilities for port 4
Oleksij Rempel [Fri, 5 Jul 2024 08:47:15 +0000 (10:47 +0200)]
net: dsa: microchip: lan9371/2: update MAC capabilities for port 4

Set proper MAC capabilities for port 4 on LAN9371 and LAN9372 switches with
integrated 100BaseTX PHY. And introduce the is_lan937x_tx_phy() function to
reuse it where applicable.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agoact_ct: prepare for stolen verdict coming from conntrack and nat engine
Florian Westphal [Thu, 4 Jul 2024 11:29:20 +0000 (13:29 +0200)]
act_ct: prepare for stolen verdict coming from conntrack and nat engine

At this time, conntrack either returns NF_ACCEPT or NF_DROP.
To improve debuging it would be nice to be able to replace NF_DROP verdict
with NF_DROP_REASON() helper,

This helper releases the skb instantly (so drop_monitor can pinpoint
exact location) and returns NF_STOLEN.

Prepare call sites to deal with this before introducing such changes
in conntrack and nat core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agoMerge branch 'net-pse-pd-add-new-pse-c33-features'
Jakub Kicinski [Sat, 6 Jul 2024 01:30:02 +0000 (18:30 -0700)]
Merge branch 'net-pse-pd-add-new-pse-c33-features'

Kory Maincent says:

====================
net: pse-pd: Add new PSE c33 features

This patch series adds new c33 features to the PSE API.
- Expand the PSE PI informations status with power, class and failure
  reason
- Add the possibility to get and set the PSE PIs power limit

v5: https://lore.kernel.org/r/20240628-feature_poe_power_cap-v5-0-5e1375d3817a@bootlin.com
v4: https://lore.kernel.org/r/20240625-feature_poe_power_cap-v4-0-b0813aad57d5@bootlin.com
v3: https://lore.kernel.org/r/20240614-feature_poe_power_cap-v3-0-a26784e78311@bootlin.com
v2: https://lore.kernel.org/r/20240607-feature_poe_power_cap-v2-0-c03c2deb83ab@bootlin.com
v1: https://lore.kernel.org/r/20240529-feature_poe_power_cap-v1-0-0c4b1d5953b8@bootlin.com
====================

Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-0-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:12:02 +0000 (10:12 +0200)]
net: pse-pd: pd692x0: Enhance with new current limit and voltage read callbacks

This patch expands PSE callbacks with newly introduced
pi_get/set_current_limit() and pi_get_voltage() callback.
It also add the power limit ranges description in the status returned.
The only way to set ps692x0 port power limit is by configure the power
class plus a small power supplement which maximum depends on each class.

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-7-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: Expand the PSE netlink command with C33 pw-limit attributes
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:12:01 +0000 (10:12 +0200)]
netlink: specs: Expand the PSE netlink command with C33 pw-limit attributes

Expand the c33 PSE attributes with power limit to be able to set and get
the PSE Power Interface power limit.

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get
             --json '{"header":{"dev-name":"eth1"}}'
{'c33-pse-actual-pw': 1700,
 'c33-pse-admin-state': 3,
 'c33-pse-avail-pw-limit': 97500,
 'c33-pse-pw-class': 4,
 'c33-pse-pw-d-status': 4,
 'c33-pse-pw-limit-ranges': [{'max': 18100, 'min': 15000},
                             {'max': 38000, 'min': 30000},
                             {'max': 65000, 'min': 60000},
                             {'max': 97500, 'min': 90000}],
 'header': {'dev-index': 5, 'dev-name': 'eth1'}}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-set
             --json '{"header":{"dev-name":"eth1"},
                      "c33-pse-avail-pw-limit":19000}'
None

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-6-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethtool: Add new power limit get and set features
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:12:00 +0000 (10:12 +0200)]
net: ethtool: Add new power limit get and set features

This patch expands the status information provided by ethtool for PSE c33
with available power limit and available power limit ranges. It also adds
a call to pse_ethtool_set_pw_limit() to configure the PSE control power
limit.

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-5-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pse-pd: Add new power limit get and set c33 features
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:11:59 +0000 (10:11 +0200)]
net: pse-pd: Add new power limit get and set c33 features

This patch add a way to get and set the power limit of a PSE PI.
For that it uses regulator API callbacks wrapper like get_voltage() and
get/set_current_limit() as power is simply V * I.
We used mW unit as defined by the IEEE 802.3-2022 standards.

set_current_limit() uses the voltage return by get_voltage() and the
desired power limit to calculate the current limit. get_voltage() callback
is then mandatory to set the power limit.

get_current_limit() callback is by default looking at a driver callback
and fallback to extracting the current limit from _pse_ethtool_get_status()
if the driver does not set its callback. We prefer let the user the choice
because ethtool_get_status return much more information than the current
limit.

expand pse status with c33_pw_limit_ranges to return the ranges available
to configure the power limit.

Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-4-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: pse-pd: pd692x0: Expand ethtool status message
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:11:58 +0000 (10:11 +0200)]
net: pse-pd: pd692x0: Expand ethtool status message

This update expands pd692x0_ethtool_get_status() callback with newly
introduced details such as the detected class, current power delivered,
and extended state information.

Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-3-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetlink: specs: Expand the PSE netlink command with C33 new features
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:11:57 +0000 (10:11 +0200)]
netlink: specs: Expand the PSE netlink command with C33 new features

Expand the c33 PSE attributes with PSE class, extended state information
and power consumption.

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get
     --json '{"header":{"dev-name":"eth0"}}'
{'c33-pse-actual-pw': 1700,
 'c33-pse-admin-state': 3,
 'c33-pse-pw-class': 4,
 'c33-pse-pw-d-status': 4,
 'header': {'dev-index': 4, 'dev-name': 'eth0'}}

./ynl/cli.py --spec netlink/specs/ethtool.yaml --no-schema --do pse-get
     --json '{"header":{"dev-name":"eth0"}}'
{'c33-pse-admin-state': 3,
 'c33-pse-ext-state': 'mr-mps-valid',
 'c33-pse-ext-substate': 2,
 'c33-pse-pw-d-status': 2,
 'header': {'dev-index': 4, 'dev-name': 'eth0'}}

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-2-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethtool: pse-pd: Expand C33 PSE status with class, power and extended state
Kory Maincent (Dent Project) [Thu, 4 Jul 2024 08:11:56 +0000 (10:11 +0200)]
net: ethtool: pse-pd: Expand C33 PSE status with class, power and extended state

This update expands the status information provided by ethtool for PSE c33.
It includes details such as the detected class, current power delivered,
and extended state information.

Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20240704-feature_poe_power_cap-v6-1-320003204264@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-openvswitch-add-sample-multicasting'
Jakub Kicinski [Sat, 6 Jul 2024 00:45:49 +0000 (17:45 -0700)]
Merge branch 'net-openvswitch-add-sample-multicasting'

Adrian Moreno says:

====================
net: openvswitch: Add sample multicasting.

** Background **
Currently, OVS supports several packet sampling mechanisms (sFlow,
per-bridge IPFIX, per-flow IPFIX). These end up being translated into a
userspace action that needs to be handled by ovs-vswitchd's handler
threads only to be forwarded to some third party application that
will somehow process the sample and provide observability on the
datapath.

A particularly interesting use-case is controller-driven
per-flow IPFIX sampling where the OpenFlow controller can add metadata
to samples (via two 32bit integers) and this metadata is then available
to the sample-collecting system for correlation.

** Problem **
The fact that sampled traffic share netlink sockets and handler thread
time with upcalls, apart from being a performance bottleneck in the
sample extraction itself, can severely compromise the datapath,
yielding this solution unfit for highly loaded production systems.

Users are left with little options other than guessing what sampling
rate will be OK for their traffic pattern and system load and dealing
with the lost accuracy.

Looking at available infrastructure, an obvious candidated would be
to use psample. However, it's current state does not help with the
use-case at stake because sampled packets do not contain user-defined
metadata.

** Proposal **
This series is an attempt to fix this situation by extending the
existing psample infrastructure to carry a variable length
user-defined cookie.

The main existing user of psample is tc's act_sample. It is also
extended to forward the action's cookie to psample.

Finally, a new OVS action (OVS_SAMPLE_ATTR_PSAMPLE) is created.
It accepts a group and an optional cookie and uses psample to
multicast the packet and the metadata.
====================

Link: https://patch.msgid.link/20240704085710.353845-1-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: add psample test
Adrian Moreno [Thu, 4 Jul 2024 08:57:01 +0000 (10:57 +0200)]
selftests: openvswitch: add psample test

Add a test to verify sampling packets via psample works.

In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-11-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: parse trunc action
Adrian Moreno [Thu, 4 Jul 2024 08:57:00 +0000 (10:57 +0200)]
selftests: openvswitch: parse trunc action

The trunc action was supported decode-able but not parse-able. Add
support for parsing the action string.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-10-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: add userspace parsing
Adrian Moreno [Thu, 4 Jul 2024 08:56:59 +0000 (10:56 +0200)]
selftests: openvswitch: add userspace parsing

The userspace action lacks parsing support plus it contains a bug in the
name of one of its attributes.

This patch makes userspace action work.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-9-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: add psample action
Adrian Moreno [Thu, 4 Jul 2024 08:56:58 +0000 (10:56 +0200)]
selftests: openvswitch: add psample action

Add sample and psample action support to ovs-dpctl.py.

Refactor common attribute parsing logic into an external function.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-8-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: openvswitch: store sampling probability in cb.
Adrian Moreno [Thu, 4 Jul 2024 08:56:57 +0000 (10:56 +0200)]
net: openvswitch: store sampling probability in cb.

When a packet sample is observed, the sampling rate that was used is
important to estimate the real frequency of such event.

Store the probability of the parent sample action in the skb's cb area
and use it in psample action to pass it down to psample module.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-7-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: openvswitch: add psample action
Adrian Moreno [Thu, 4 Jul 2024 08:56:56 +0000 (10:56 +0200)]
net: openvswitch: add psample action

Add support for a new action: psample.

This action accepts a u32 group id and a variable-length cookie and uses
the psample multicast group to make the packet available for
observability.

The maximum length of the user-defined cookie is set to 16, same as
tc_cookie, to discourage using cookies that will not be offloadable.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-6-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: psample: allow using rate as probability
Adrian Moreno [Thu, 4 Jul 2024 08:56:55 +0000 (10:56 +0200)]
net: psample: allow using rate as probability

Although not explicitly documented in the psample module itself, the
definition of PSAMPLE_ATTR_SAMPLE_RATE seems inherited from act_sample.

Quoting tc-sample(8):
"RATE of 100 will lead to an average of one sampled packet out of every
100 observed."

With this semantics, the rates that we can express with an unsigned
32-bits number are very unevenly distributed and concentrated towards
"sampling few packets".
For example, we can express a probability of 2.32E-8% but we
cannot express anything between 100% and 50%.

For sampling applications that are capable of sampling a decent
amount of packets, this sampling rate semantics is not very useful.

Add a new flag to the uAPI that indicates that the sampling rate is
expressed in scaled probability, this is:
- 0 is 0% probability, no packets get sampled.
- U32_MAX is 100% probability, all packets get sampled.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-5-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: psample: skip packet copy if no listeners
Adrian Moreno [Thu, 4 Jul 2024 08:56:54 +0000 (10:56 +0200)]
net: psample: skip packet copy if no listeners

If nobody is listening on the multicast group, generating the sample,
which involves copying packet data, seems completely unnecessary.

Return fast in this case.

Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-4-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: sched: act_sample: add action cookie to sample
Adrian Moreno [Thu, 4 Jul 2024 08:56:53 +0000 (10:56 +0200)]
net: sched: act_sample: add action cookie to sample

If the action has a user_cookie, pass it along to the sample so it can
be easily identified.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-3-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: psample: add user cookie
Adrian Moreno [Thu, 4 Jul 2024 08:56:52 +0000 (10:56 +0200)]
net: psample: add user cookie

Add a user cookie to the sample metadata so that sample emitters can
provide more contextual information to samples.

If present, send the user cookie in a new attribute:
PSAMPLE_ATTR_USER_COOKIE.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-2-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethernet: mtk_eth_soc: implement .{get,set}_pauseparam ethtool ops
Daniel Golle [Thu, 4 Jul 2024 10:14:55 +0000 (11:14 +0100)]
net: ethernet: mtk_eth_soc: implement .{get,set}_pauseparam ethtool ops

Implement operations to get and set flow-control link parameters.
Both is done by simply calling phylink_ethtool_{get,set}_pauseparam().
Fix whitespace in mtk_ethtool_ops while at it.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Link: https://patch.msgid.link/e3ece47323444631d6cb479f32af0dfd6d145be0.1720088047.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethernet: mtk_ppe: Change PPE entries number to 16K
Shengyu Qu [Thu, 4 Jul 2024 17:26:26 +0000 (01:26 +0800)]
net: ethernet: mtk_ppe: Change PPE entries number to 16K

MT7981,7986 and 7988 all supports 32768 PPE entries, and MT7621/MT7620
supports 16384 PPE entries, but only set to 8192 entries in driver. So
incrase max entries to 16384 instead.

Signed-off-by: Elad Yifee <eladwf@gmail.com>
Signed-off-by: Shengyu Qu <wiagn233@outlook.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/TY3P286MB261103F937DE4EEB0F88437D98DE2@TY3P286MB2611.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-constify-struct-regmap_bus-regmap_config'
Jakub Kicinski [Sat, 6 Jul 2024 00:02:23 +0000 (17:02 -0700)]
Merge branch 'net-constify-struct-regmap_bus-regmap_config'

Javier Carrasco says:

====================
net: constify struct regmap_bus/regmap_config

This series adds the const modifier to the remaining regmap_bus and
regmap_config structs within the net subsystem that are effectively
used as const (i.e., only read after their declaration), but kept as
writtable data.
====================

Link: https://patch.msgid.link/20240703-net-const-regmap-v1-0-ff4aeceda02c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: ar9331: constify struct regmap_bus
Javier Carrasco [Wed, 3 Jul 2024 21:46:36 +0000 (23:46 +0200)]
net: dsa: ar9331: constify struct regmap_bus

`ar9331_sw_bus` is not modified and can be declared as const to
move its data to a read-only section.

Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20240703-net-const-regmap-v1-4-ff4aeceda02c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: encx24j600: constify struct regmap_bus/regmap_config
Javier Carrasco [Wed, 3 Jul 2024 21:46:35 +0000 (23:46 +0200)]
net: encx24j600: constify struct regmap_bus/regmap_config

`regmap_encx24j600`, `phycfg` and `phymap_encx24j600` are not modified
and can be declared as const to move their data to a read-only section.

Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Link: https://patch.msgid.link/20240703-net-const-regmap-v1-3-ff4aeceda02c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ti: icss-iep: constify struct regmap_config
Javier Carrasco [Wed, 3 Jul 2024 21:46:34 +0000 (23:46 +0200)]
net: ti: icss-iep: constify struct regmap_config

`am654_icss_iep_regmap_config` is only assigned to a pointer that passes
the data as read-only.

Add the const modifier to the struct and pointer to move the data to a
read-only section.

Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>
Link: https://patch.msgid.link/20240703-net-const-regmap-v1-2-ff4aeceda02c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dsa: qca8k: constify struct regmap_config
Javier Carrasco [Wed, 3 Jul 2024 21:46:33 +0000 (23:46 +0200)]
net: dsa: qca8k: constify struct regmap_config

`qca8k_regmap_config` is not modified and can be declared as const to
move its data to a read-only section.

Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/20240703-net-const-regmap-v1-1-ff4aeceda02c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotun: Assign missing bpf_net_context.
Sebastian Andrzej Siewior [Thu, 4 Jul 2024 14:48:15 +0000 (16:48 +0200)]
tun: Assign missing bpf_net_context.

During the introduction of struct bpf_net_context handling for
XDP-redirect, the tun driver has been missed.
Jakub also pointed out that there is another call chain to
do_xdp_generic() originating from netif_receive_skb() and drivers may
use it outside from the NAPI context.

Set the bpf_net_context before invoking BPF XDP program within the TUN
driver. Set the bpf_net_context also in do_xdp_generic() if a xdp
program is available.

Reported-by: syzbot+0b5c75599f1d872bea6f@syzkaller.appspotmail.com
Reported-by: syzbot+5ae46b237278e2369cac@syzkaller.appspotmail.com
Reported-by: syzbot+c1e04a422bbc0f0f2921@syzkaller.appspotmail.com
Fixes: 401cb7dae8130 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20240704144815.j8xQda5r@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethernet: mediatek: Allow gaps in MAC allocation
Daniel Golle [Mon, 1 Jul 2024 19:28:14 +0000 (20:28 +0100)]
net: ethernet: mediatek: Allow gaps in MAC allocation

Some devices with MediaTek SoCs don't use the first but only the second
MAC in the chip. Especially with MT7981 which got a built-in 1GE PHY
connected to the second MAC this is quite common.
Make sure to reset and enable PSE also in those cases by skipping gaps
using 'continue' instead of aborting the loop using 'break'.

Fixes: dee4dd10c79a ("net: ethernet: mtk_eth_soc: ppe: add support for multiple PPEs")
Suggested-by: Elad Yifee <eladwf@gmail.com>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/379ae584cea112db60f4ada79c7e5ba4f3364a64.1719862038.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoopenvswitch: prepare for stolen verdict coming from conntrack and nat engine
Florian Westphal [Wed, 3 Jul 2024 10:46:34 +0000 (12:46 +0200)]
openvswitch: prepare for stolen verdict coming from conntrack and nat engine

At this time, conntrack either returns NF_ACCEPT or NF_DROP.
To improve debuging it would be nice to be able to replace NF_DROP
verdict with NF_DROP_REASON() helper,

This helper releases the skb instantly (so drop_monitor can pinpoint
precise location) and returns NF_STOLEN.

Prepare call sites to deal with this before introducing such changes
in conntrack and nat core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Aaron Conole <aconole@redhat.om>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agoMerge branch 'pcs-xpcs-mmap' into main
David S. Miller [Fri, 5 Jul 2024 08:35:52 +0000 (09:35 +0100)]
Merge branch 'pcs-xpcs-mmap' into main

Serge Semin <fancer says:

====================
net: pcs: xpcs: Add memory-mapped device support

The main goal of this series is to extend the DW XPCS device support in
the kernel. Particularly the patchset adds a support of the DW XPCS
device with the MCI/APB3 IO interface registered as a platform device. In
order to have them utilized by the DW XPCS core the fwnode-based DW XPCS
descriptor creation procedure has been introduced. Finally the STMMAC
driver has been altered to support the DW XPCS passed via the 'pcs-handle'
property.

Note the series has been significantly re-developed since v1. So I even
had to change the subject. Anyway I've done my best to take all the noted
into account.

The series traditionally starts with a set of the preparation patches.
First one just moves the generic DW XPCS IDs macros from the internal
header file to the external one where some other IDs also reside. Second
patch splits up the xpcs_create() method to a set of the coherent
sub-functions for the sake of the further easier updates and to have it
looking less complicated. The goal of the next three patches is to extend
the DW XPCS ID management code by defining a new dw_xpcs_info structure
with both PCS and PMA IDs.

The next two patches provide the DW XPCS device DT-bindings and the
respective platform-device driver for the memory-mapped DW XPCS devices.
Besides the later patch makes use of the introduced dw_xpcs_info structure
to pre-define the DW XPCS IDs based on the platform-device compatible
string. Thus if there is no way to auto-identify the XPCS device
capabilities it can be done based on the custom device IDs passed via the
MDIO-device platform data.

Final DW XPCS driver change is about adding a new method of the DW XPCS
descriptor creation. The xpcs_create_fwnode() function has been introduced
with the same semantics as a similar method recently added to the Lynx PCS
driver. It's supposed to be called with the fwnode pointing to the DW XPCS
device node, for which the XPCS descriptor will be created.

The series is terminated with two STMMAC driver patches. The former one
simplifies the DW XPCS descriptor registration procedure by dropping the
MDIO-bus scanning and creating the descriptor for the particular device
address. The later patch alters the STMMAC PCS setup method so one would
support the DW XPCS specified via the "pcs-handle" property.

That's it for now. Thanks for review in advance. Any tests are very
welcome. After this series is merged in, I'll submit another one which
adds the generic 10GBase-R and 10GBase-X interfaces support to the STMMAC
and DW XPCS driver with the proper CSRs re-initialization, PMA
initialization and reference clock selection as it's described in the
Synopsys DW XPCS HW manual.

Link: https://lore.kernel.org/netdev/20231205103559.9605-1-fancer.lancer@gmail.com
Changelog v2:
- Drop the patches:
  [PATCH net-next 01/16] net: pcs: xpcs: Drop sentinel entry from 2500basex ifaces list
  [PATCH net-next 02/16] net: pcs: xpcs: Drop redundant workqueue.h include directive
  [PATCH net-next 03/16] net: pcs: xpcs: Return EINVAL in the internal methods
  [PATCH net-next 04/16] net: pcs: xpcs: Explicitly return error on caps validation
  as ones have already been merged into the kernel repo:
Link: https://lore.kernel.org/netdev/20240222175843.26919-1-fancer.lancer@gmail.com/
- Drop the patches:
  [PATCH net-next 14/16] net: stmmac: Pass netdev to XPCS setup function
  [PATCH net-next 15/16] net: stmmac: Add dedicated XPCS cleanup method
  as ones have already been merged into the kernel repo:
Link: https://lore.kernel.org/netdev/20240513-rzn1-gmac1-v7-0-6acf58b5440d@bootlin.com/
- Drop the patch:
  [PATCH net-next 06/16] net: pcs: xpcs: Avoid creating dummy XPCS MDIO device
  [PATCH net-next 09/16] net: mdio: Add Synopsys DW XPCS management interface support
  [PATCH net-next 11/16] net: pcs: xpcs: Change xpcs_create_mdiodev() suffix to "byaddr"
  [PATCH net-next 13/16] net: stmmac: intel: Register generic MDIO device
  as no longer relevant.
- Add new patches:
  [PATCH net-next v2 03/10] net: pcs: xpcs: Convert xpcs_id to dw_xpcs_desc
  [PATCH net-next v2 04/10] net: pcs: xpcs: Convert xpcs_compat to dw_xpcs_compat
  [PATCH net-next v2 05/10] net: pcs: xpcs: Introduce DW XPCS info structure
  [PATCH net-next v2 09/10] net: stmmac: Create DW XPCS device with particular address
- Use the xpcs_create_fwnode() function name and semantics similar to the
  Lynx PCS driver.
- Add kdoc describing the DW XPCS registration functions.
- Convert the memory-mapped DW XPCS device driver to being the
  platform-device driver.
- Convert the DW XPCS DT-bindings to defining both memory-mapped and MDIO
  devices.
- Drop inline'es from the methods statically defined in *.c. (@Maxime)
- Preserve the strict refcount-ing pattern. (@Russell)

Link: https://lore.kernel.org/netdev/20240602143636.5839-1-fancer.lancer@gmail.com/
Changelov v3:
- Implement the ordered clocks constraint. (@Rob)
- Convert xpcs_plat_pm_ops to being defined as static. (@Simon)
- Add the "@interface" argument kdoc to the xpcs_create_mdiodev()
  function. (@Simon)
- Fix the "@fwnode" argument name in the xpcs_create_fwnode() method kdoc.
  (@Simon)
- Move the return value descriptions to the "Return:" section of the
  xpcs_create_mdiodev() and xpcs_create_fwnode() kdoc. (@Simon)
- Drop stmmac_mdio_bus_data::has_xpcs flag and define the PCS-address
  mask with particular XPCS address instead.

Link: https://lore.kernel.org/netdev/20240627004142.8106-1-fancer.lancer@gmail.com/
Changelog v4:
- Make sure the series is applicable to the net-next tree. (@Vladimir)
- Rename entry to desc in the xpcs_init_id() method. (@Andrew)
- Add a comment to the clock-names property constraint about the
  oneOf-subschemas applicability. (@Conor)
- Convert "pclk" clock name to "csr" to match the DW XPCS IP-core
  input signal name. (@Rob)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: stmmac: Add DW XPCS specified via "pcs-handle" support
Serge Semin [Mon, 1 Jul 2024 18:28:41 +0000 (21:28 +0300)]
net: stmmac: Add DW XPCS specified via "pcs-handle" support

Recently the DW XPCS DT-bindings have been introduced and the DW XPCS
driver has been altered to support the DW XPCS registered as a platform
device. In order to have the DW XPCS DT-device accessed from the STMMAC
driver let's alter the STMMAC PCS-setup procedure to support the
"pcs-handle" property containing the phandle reference to the DW XPCS
device DT-node. The respective fwnode will be then passed to the
xpcs_create_fwnode() function which in its turn will create the DW XPCS
descriptor utilized in the main driver for the PCS-related setups.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: stmmac: Create DW XPCS device with particular address
Serge Semin [Mon, 1 Jul 2024 18:28:40 +0000 (21:28 +0300)]
net: stmmac: Create DW XPCS device with particular address

Currently the only STMMAC platform driver using the DW XPCS code is the
Intel mGBE device driver. (It can be determined by finding all the drivers
having the stmmac_mdio_bus_data::has_xpcs flag set.) At the same time the
low-level platform driver masks out the DW XPCS MDIO-address from being
auto-detected as PHY by the MDIO subsystem core. Seeing the PCS MDIO ID is
known the procedure of the DW XPCS device creation can be simplified by
dropping the loop over all the MDIO IDs. From now the DW XPCS device
descriptor will be created for the MDIO-bus address pre-defined by the
platform drivers via the stmmac_mdio_bus_data::pcs_mask field.

Note besides this shall speed up a bit the Intel mGBE probing.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Add fwnode-based descriptor creation method
Serge Semin [Mon, 1 Jul 2024 18:28:39 +0000 (21:28 +0300)]
net: pcs: xpcs: Add fwnode-based descriptor creation method

It's now possible to have the DW XPCS device defined as a standard
platform device for instance in the platform DT-file. Although that
functionality is useless unless there is a way to have the device found by
the client drivers (STMMAC/DW *MAC, NXP SJA1105 Eth Switch, etc). Provide
such ability by means of the xpcs_create_fwnode() method. It needs to be
called with the device DW XPCS fwnode instance passed. That node will be
then used to find the MDIO-device instance in order to create the DW XPCS
descriptor.

Note the method semantics and name is similar to what has been recently
introduced in the Lynx PCS driver.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Add Synopsys DW xPCS platform device driver
Serge Semin [Mon, 1 Jul 2024 18:28:38 +0000 (21:28 +0300)]
net: pcs: xpcs: Add Synopsys DW xPCS platform device driver

Synopsys DesignWare XPCS IP-core can be synthesized with the device CSRs
being accessible over the MCI or APB3 interface instead of the MDIO bus
(see the CSR_INTERFACE HDL parameter). Thus all the PCS registers can be
just memory mapped and be a subject of the standard MMIO operations of
course taking into account the peculiarities of the Clause C45 CSRs
mapping. From that perspective the DW XPCS devices would look as just
normal platform devices for the kernel.

On the other hand in order to have the DW XPCS devices handled by the
pcs-xpcs.c driver they need to be registered in the framework of the
MDIO-subsystem. So the suggested change is about providing a DW XPCS
platform device driver registering a virtual MDIO-bus with a single
MDIO-device representing the DW XPCS device.

DW XPCS platform device is supposed to be described by the respective
compatible string "snps,dw-xpcs" (or with the PMA-specific compatible
string), CSRs memory space and optional peripheral bus and reference clock
sources. Depending on the INDIRECT_ACCESS IP-core synthesize parameter the
memory-mapped reg-space can be represented as either directly or
indirectly mapped Clause 45 space. In the former case the particular
address is determined based on the MMD device and the registers offset (5
+ 16 bits all together) within the device reg-space. In the later case
there is only 8 lower address bits are utilized for the registers mapping
(255 CSRs). The upper bits are supposed to be written into the respective
viewport CSR in order to select the respective MMD sub-page.

Note, only the peripheral bus clock source is requested in the platform
device probe procedure. The core and pad clocks handling has been
implemented in the framework of the xpcs_create() method intentionally
since the clocks-related setups are supposed to be performed later, during
the DW XPCS main configuration procedures. (For instance they will be
required for the DW Gen5 10G PMA configuration.)

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agodt-bindings: net: Add Synopsys DW xPCS bindings
Serge Semin [Mon, 1 Jul 2024 18:28:37 +0000 (21:28 +0300)]
dt-bindings: net: Add Synopsys DW xPCS bindings

Synopsys DesignWare XPCS IP-core is a Physical Coding Sublayer (PCS) layer
providing an interface between the Media Access Control (MAC) and Physical
Medium Attachment Sublayer (PMA) through a Media independent interface.
From software point of view it exposes IEEE std. Clause 45 CSR space and
can be accessible either by MDIO or MCI/APB3 bus interfaces. In the former
case the PCS device is supposed to be defined under the respective MDIO
bus DT-node. In the later case the DW xPCS will be just a normal IO
memory-mapped device.

Besides of that DW XPCS DT-nodes can have an interrupt signal and clock
source properties specified. The former one indicates the Clause 73/37
auto-negotiation events like: negotiation page received, AN is completed
or incompatible link partner. The clock DT-properties can describe up to
three clock sources: peripheral bus clock source, internal reference clock
and the externally connected reference clock.

Finally the DW XPCS IP-core can be optionally synthesized with a
vendor-specific interface connected to the Synopsys PMA (also called
DesignWare Consumer/Enterprise PHY). Alas that isn't auto-detectable in a
portable way. So if the DW XPCS device has the respective PMA attached
then it should be reflected in the DT-node compatible string so the driver
would be aware of the PMA-specific device capabilities (mainly connected
with CSRs available for the fine-tunings).

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Introduce DW XPCS info structure
Serge Semin [Mon, 1 Jul 2024 18:28:36 +0000 (21:28 +0300)]
net: pcs: xpcs: Introduce DW XPCS info structure

The being introduced structure will preserve the PCS and PMA IDs retrieved
from the respective DW XPCS MMDs or potentially pre-defined by the client
drivers. (The later change will be introduced later in the framework of
the commit adding the memory-mapped DW XPCS devices support.)

The structure fields are filled in in the xpcs_get_id() function, which
used to be responsible for the PCS Device ID getting only. Besides of the
PCS ID the method now fetches the PMA/PMD IDs too from MMD 1, which used
to be done in xpcs_dev_flag(). The retrieved PMA ID will be from now
utilized for the PMA-specific tweaks like it was introduced for the
Wangxun TxGBE PCS in the commit f629acc6f210 ("net: pcs: xpcs: support to
switch mode for Wangxun NICs").

Note 1. The xpcs_get_id() error-handling semantics has been changed. From
now the error number will be returned from the function. There is no point
in the next IOs or saving 0xffs and then looping over the actual device
IDs if device couldn't be reached. -ENODEV will be returned if the very
first IO operation failed thus indicating that no device could be found.

Note 2. The PCS and PMA IDs macros have been converted to enum'es. The
enum'es will be populated later in another commit with the virtual IDs
identifying the DW XPCS devices which have some platform-specifics, but
have been synthesized with the default PCS/PMA ID.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Convert xpcs_compat to dw_xpcs_compat
Serge Semin [Mon, 1 Jul 2024 18:28:35 +0000 (21:28 +0300)]
net: pcs: xpcs: Convert xpcs_compat to dw_xpcs_compat

The xpcs_compat structure has been left as the only dw-prefix-less
structure since the previous commit. Let's unify at least the structures
naming in the driver by adding the dw_-prefix to it.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Convert xpcs_id to dw_xpcs_desc
Serge Semin [Mon, 1 Jul 2024 18:28:34 +0000 (21:28 +0300)]
net: pcs: xpcs: Convert xpcs_id to dw_xpcs_desc

A structure with the PCS/PMA MMD IDs data is being introduced in one of
the next commits. In order to prevent the names ambiguity let's convert
the xpcs_id structure name to dw_xpcs_desc. The later version is more
suitable since the structure content is indeed the device descriptor
containing the data and callbacks required for the driver to correctly set
the device up.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Split up xpcs_create() body to sub-functions
Serge Semin [Mon, 1 Jul 2024 18:28:33 +0000 (21:28 +0300)]
net: pcs: xpcs: Split up xpcs_create() body to sub-functions

As an initial preparation before adding the fwnode-based DW XPCS device
support let's split the xpcs_create() function code up to a set of the
small sub-functions. Thus the xpcs_create() implementation will get to
look simpler and turn to be more coherent. Further updates will just touch
the new sub-functions a bit: add platform-specific device info, add the
reference clock getting and enabling.

The xpcs_create() method will now contain the next static methods calls:

xpcs_create_data() - create the DW XPCS device descriptor, pre-initialize
it' fields and increase the mdio device refcount-er;

xpcs_init_id() - find XPCS ID instance and save it in the device
descriptor;

xpcs_init_iface() - find MAC/PCS interface descriptor and perform
basic initialization specific to it: soft-reset, disable polling.

The update doesn't imply any semantic change but merely makes the code
looking simpler and more ready for adding new features support.

Note the xpcs_destroy() has been moved to being defined below the
xpcs_create_mdiodev() function as the driver now implies having the
protagonist-then-antagonist functions definition order.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agonet: pcs: xpcs: Move native device ID macro to linux/pcs/pcs-xpcs.h
Serge Semin [Mon, 1 Jul 2024 18:28:32 +0000 (21:28 +0300)]
net: pcs: xpcs: Move native device ID macro to linux/pcs/pcs-xpcs.h

One of the next commits will alter the DW XPCS driver to support setting a
custom device ID for the particular MDIO-device detected on the platform.
The generic DW XPCS ID can be used as a custom ID as well in case if the
DW XPCS-device was erroneously synthesized with no or some undefined ID.
In addition to that having all supported DW XPCS device IDs defined in a
single place will improve the code maintainability and readability.

Note while at it rename the macros to being shorter and looking alike to
the already defined NXP XPCS ID macro.

Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 months agodt-bindings: net: Define properties at top-level
Rob Herring (Arm) [Wed, 3 Jul 2024 19:58:27 +0000 (13:58 -0600)]
dt-bindings: net: Define properties at top-level

Convention is DT schemas should define all properties at the top-level
and not inside of if/then schemas. That minimizes the if/then schemas
and is more future proof.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Link: https://patch.msgid.link/20240703195827.1670594-2-robh@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoethtool: move firmware flashing flag to struct ethtool_netdev_state
Edward Cree [Wed, 3 Jul 2024 12:18:49 +0000 (13:18 +0100)]
ethtool: move firmware flashing flag to struct ethtool_netdev_state

Commit 31e0aa99dc02 ("ethtool: Veto some operations during firmware flashing process")
 added a flag module_fw_flash_in_progress to struct net_device.  As
 this is ethtool related state, move it to the recently created
 struct ethtool_netdev_state, accessed via the 'ethtool' member of
 struct net_device.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20240703121849.652893-1-edward.cree@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Thu, 4 Jul 2024 21:11:03 +0000 (14:11 -0700)]
Merge git://git./linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/phy/aquantia/aquantia.h
  219343755eae ("net: phy: aquantia: add missing include guards")
  61578f679378 ("net: phy: aquantia: add support for PHY LEDs")

drivers/net/ethernet/wangxun/libwx/wx_hw.c
  bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx")
  b501d261a5b3 ("net: txgbe: add FDIR ATR support")
https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/

include/linux/mlx5/mlx5_ifc.h
  048a403648fc ("net/mlx5: IFC updates for changing max EQs")
  99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO")
https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/

Adjacent changes:

drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
  4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference")
  3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard")

include/net/mac80211.h
  816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP")
  5a009b42e041 ("wifi: mac80211: track changes in AP's TPE")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'crypto-caam-unembed-net_dev'
Jakub Kicinski [Thu, 4 Jul 2024 17:19:25 +0000 (10:19 -0700)]
Merge branch 'crypto-caam-unembed-net_dev'

Breno Leitao says:

====================
crypto: caam: Unembed net_dev

This will un-embed the net_device struct from inside other struct, so we
can add flexible array into net_device.

This also enable COMPILE test for FSL_CAAM, as any config option that
depends on ARCH_LAYERSCAPE.
====================

Link: https://patch.msgid.link/20240702185557.3699991-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agocrypto: caam: Unembed net_dev structure in dpaa2
Breno Leitao [Tue, 2 Jul 2024 18:55:54 +0000 (11:55 -0700)]
crypto: caam: Unembed net_dev structure in dpaa2

Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].

Un-embed the net_devices from struct dpaa2_caam_priv_per_cpu by
converting them into pointers, and allocating them dynamically. Use the
leverage alloc_netdev_dummy() to allocate the net_device object at
dpaa2_dpseci_setup().

The free of the device occurs at dpaa2_dpseci_disable().

Link: https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240702185557.3699991-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agocrypto: caam: Unembed net_dev structure from qi
Breno Leitao [Tue, 2 Jul 2024 18:55:53 +0000 (11:55 -0700)]
crypto: caam: Unembed net_dev structure from qi

Embedding net_device into structures prohibits the usage of flexible
arrays in the net_device structure. For more details, see the discussion
at [1].

Un-embed the net_devices from struct caam_qi_pcpu_priv by converting them
into pointers, and allocating them dynamically. Use the leverage
alloc_netdev_dummy() to allocate the net_device object at
caam_qi_init().

The free of the device occurs at caam_qi_shutdown().

Link: https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240702185557.3699991-4-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agocrypto: caam: Make CRYPTO_DEV_FSL_CAAM dependent of COMPILE_TEST
Breno Leitao [Tue, 2 Jul 2024 18:55:52 +0000 (11:55 -0700)]
crypto: caam: Make CRYPTO_DEV_FSL_CAAM dependent of COMPILE_TEST

As most of the drivers that depend on ARCH_LAYERSCAPE, make
CRYPTO_DEV_FSL_CAAM depend on COMPILE_TEST for compilation and testing.

    # grep -r depends.\*ARCH_LAYERSCAPE.\*COMPILE_TEST | wc -l
    29

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240702185557.3699991-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agocrypto: caam: Avoid unused imx8m_machine_match variable
Breno Leitao [Tue, 2 Jul 2024 18:55:51 +0000 (11:55 -0700)]
crypto: caam: Avoid unused imx8m_machine_match variable

If caam module is built without OF support, the compiler returns the
following warning:

drivers/crypto/caam/ctrl.c:83:34: warning: 'imx8m_machine_match' defined but not used [-Wunused-const-variable=]

imx8m_machine_match is only referenced by of_match_node(), which is set
to NULL if CONFIG_OF is not set, as of commit 5762c20593b6b ("dt: Add
empty of_match_node() macro"):

#define of_match_node(_matches, _node)  NULL

Do not create imx8m_machine_match if CONFIG_OF is not set.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407011309.cpTuOGdg-lkp@intel.com/
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20240702185557.3699991-2-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 4 Jul 2024 17:11:12 +0000 (10:11 -0700)]
Merge tag 'net-6.10-rc7' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, wireless and netfilter.

  There's one fix for power management with Intel's e1000e here,
  Thorsten tells us there's another problem that started in v6.9. We're
  trying to wrap that up but I don't think it's blocking.

  Current release - new code bugs:

   - wifi: mac80211: disable softirqs for queued frame handling

   - af_unix: fix uninit-value in __unix_walk_scc(), with the new
     garbage collection algo

  Previous releases - regressions:

   - Bluetooth:
      - qca: fix BT enable failure for QCA6390 after warm reboot
      - add quirk to ignore reserved PHY bits in LE Extended Adv Report,
        abused by some Broadcom controllers found on Apple machines

   - wifi: wilc1000: fix ies_len type in connect path

  Previous releases - always broken:

   - tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
     avoid premature timeouts

   - net: make sure skb_datagram_iter maps fragments page by page, in
     case we somehow get compound highmem mixed in

   - eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
     queues are used

  Misc:

   - MAINTAINERS: Remembering Larry Finger"

* tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
  bnxt_en: Fix the resource check condition for RSS contexts
  mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
  inet_diag: Initialize pad field in struct inet_diag_req_v2
  tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
  selftests: make order checking verbose in msg_zerocopy selftest
  selftests: fix OOM in msg_zerocopy selftest
  ice: use proper macro for testing bit
  ice: Reject pin requests with unsupported flags
  ice: Don't process extts if PTP is disabled
  ice: Fix improper extts handling
  selftest: af_unix: Add test case for backtrack after finalising SCC.
  af_unix: Fix uninit-value in __unix_walk_scc()
  bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
  net: rswitch: Avoid use-after-free in rswitch_poll()
  netfilter: nf_tables: unconditionally flush pending work before notifier
  wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
  wifi: iwlwifi: mvm: avoid link lookup in statistics
  wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
  wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
  wifi: wilc1000: fix ies_len type in connect path
  ...

2 months agoMerge tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Thu, 4 Jul 2024 16:46:15 +0000 (09:46 -0700)]
Merge tag 's390-6.10-8' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Fix and add physical to virtual address translations in dasd and
   virtio_ccw drivers. For virtio_ccw this is just a minimal fix.
   More code cleanup will follow.

 - Small defconfig updates

* tag 's390-6.10-8' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/dasd: Fix invalid dereferencing of indirect CCW data pointer
  s390/vfio_ccw: Fix target addresses of TIC CCWs
  s390: Update defconfigs

2 months agoMerge tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 4 Jul 2024 16:36:42 +0000 (09:36 -0700)]
Merge tag 'platform-drivers-x86-v6.10-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fix from Hans de Goede:

 - Fix regression in toshiba_acpi introduced in 6.10-rc1

* tag 'platform-drivers-x86-v6.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: toshiba_acpi: Fix quickstart quirk handling

2 months agoMerge tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 4 Jul 2024 16:29:42 +0000 (09:29 -0700)]
Merge tag 'kselftest-fix-2024-07-04' of git://git./linux/kernel/git/mic/linux

Pull Kselftest fix from Mickaël Salaün:
 "Fix Kselftests timeout.

  We can't use CLONE_VFORK, since that blocks the parent - and thus the
  timeout handling - until the child exits or execve's.

  Go back to using plain fork()"

* tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  selftests/harness: Fix tests timeout and race condition

2 months agoMerge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Thu, 4 Jul 2024 16:13:02 +0000 (09:13 -0700)]
Merge tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git./linux/kernel/git/akpm/mm

Pull misc fixes from, Andrew Morton:
 "6 hotfies, all cc:stable. Some fixes for longstanding nilfs2 issues
  and three unrelated MM fixes"

* tag 'mm-hotfixes-stable-2024-07-03-22-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix incorrect inode allocation from reserved inodes
  nilfs2: add missing check for inode numbers on directory entries
  nilfs2: fix inode number range checks
  mm: avoid overflows in dirty throttling logic
  Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
  mm: optimize the redundant loop of mm_update_owner_next()

2 months agobnxt_en: Fix the resource check condition for RSS contexts
Pavan Chebbi [Wed, 3 Jul 2024 18:01:12 +0000 (11:01 -0700)]
bnxt_en: Fix the resource check condition for RSS contexts

While creating a new RSS context, bnxt_rfs_capable() currently
makes a strict check to see if the required VNICs are already
available.  If the current VNICs are not what is required,
either too many or not enough, it will call the firmware to
reserve the exact number required.

There is a bug in the firmware when the driver tries to
relinquish some reserved VNICs and RSS contexts.  It will
cause the default VNIC to lose its RSS configuration and
cause receive packets to be placed incorrectly.

Workaround this problem by skipping the resource reduction.
The driver will not reduce the VNIC and RSS context reservations
when a context is deleted.  The resources will be available for
use when new contexts are created later.

Potentially, this workaround can cause us to run out of VNIC
and RSS contexts if there are a lot of VF functions creating
and deleting RSS contexts.  In the future, we will conditionally
disable this workaround when the firmware fix is available.

Fixes: 438ba39b25fe ("bnxt_en: Improve RSS context reservation infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/netdev/20240625010210.2002310-1-kuba@kernel.org/
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240703180112.78590-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
Aleksandr Mishin [Wed, 3 Jul 2024 20:32:51 +0000 (23:32 +0300)]
mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file

In case of invalid INI file mlxsw_linecard_types_init() deallocates memory
but doesn't reset pointer to NULL and returns 0. In case of any error
occurred after mlxsw_linecard_types_init() call, mlxsw_linecards_init()
calls mlxsw_linecard_types_fini() which performs memory deallocation again.

Add pointer reset to NULL.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: b217127e5e4e ("mlxsw: core_linecards: Add line card objects and implement provisioning")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20240703203251.8871-1-amishin@t-argos.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git...
Jakub Kicinski [Thu, 4 Jul 2024 14:31:54 +0000 (07:31 -0700)]
Merge tag 'wireless-2024-07-04' of git://git./linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.10

Hopefully the last fixes for v6.10. Fix a regression in wilc1000
where bitrate Information Elements longer than 255 bytes were broken.
Few fixes also to mac80211 and iwlwifi.

* tag 'wireless-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
  wifi: iwlwifi: mvm: avoid link lookup in statistics
  wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
  wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
  wifi: wilc1000: fix ies_len type in connect path
  wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP
====================

Link: https://patch.msgid.link/20240704111431.11DEDC3277B@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Paolo Abeni [Thu, 4 Jul 2024 13:31:26 +0000 (15:31 +0200)]
Merge tag 'nf-24-07-04' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following batch contains a oneliner patch to inconditionally flush
workqueue containing stale objects to be released, syzbot managed to
trigger UaF. Patch from Florian Westphal.

netfilter pull request 24-07-04

* tag 'nf-24-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: unconditionally flush pending work before notifier
====================

Link: https://patch.msgid.link/20240703223304.1455-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agoinet_diag: Initialize pad field in struct inet_diag_req_v2
Shigeru Yoshida [Wed, 3 Jul 2024 09:16:49 +0000 (18:16 +0900)]
inet_diag: Initialize pad field in struct inet_diag_req_v2

KMSAN reported uninit-value access in raw_lookup() [1]. Diag for raw
sockets uses the pad field in struct inet_diag_req_v2 for the
underlying protocol. This field corresponds to the sdiag_raw_protocol
field in struct inet_diag_req_raw.

inet_diag_get_exact_compat() converts inet_diag_req to
inet_diag_req_v2, but leaves the pad field uninitialized. So the issue
occurs when raw_lookup() accesses the sdiag_raw_protocol field.

Fix this by initializing the pad field in
inet_diag_get_exact_compat(). Also, do the same fix in
inet_diag_dump_compat() to avoid the similar issue in the future.

[1]
BUG: KMSAN: uninit-value in raw_lookup net/ipv4/raw_diag.c:49 [inline]
BUG: KMSAN: uninit-value in raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
 raw_lookup net/ipv4/raw_diag.c:49 [inline]
 raw_sock_get+0x657/0x800 net/ipv4/raw_diag.c:71
 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
 inet_diag_cmd_exact+0x7d9/0x980
 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
 inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x332/0x3d0 net/socket.c:745
 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
 __sys_sendmsg net/socket.c:2668 [inline]
 __do_sys_sendmsg net/socket.c:2677 [inline]
 __se_sys_sendmsg net/socket.c:2675 [inline]
 __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Uninit was stored to memory at:
 raw_sock_get+0x650/0x800 net/ipv4/raw_diag.c:71
 raw_diag_dump_one+0xa1/0x660 net/ipv4/raw_diag.c:99
 inet_diag_cmd_exact+0x7d9/0x980
 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1404 [inline]
 inet_diag_rcv_msg_compat+0x469/0x530 net/ipv4/inet_diag.c:1426
 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282
 netlink_rcv_skb+0x537/0x670 net/netlink/af_netlink.c:2564
 sock_diag_rcv+0x35/0x40 net/core/sock_diag.c:297
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0xe74/0x1240 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x10c6/0x1260 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x332/0x3d0 net/socket.c:745
 ____sys_sendmsg+0x7f0/0xb70 net/socket.c:2585
 ___sys_sendmsg+0x271/0x3b0 net/socket.c:2639
 __sys_sendmsg net/socket.c:2668 [inline]
 __do_sys_sendmsg net/socket.c:2677 [inline]
 __se_sys_sendmsg net/socket.c:2675 [inline]
 __x64_sys_sendmsg+0x27e/0x4a0 net/socket.c:2675
 x64_sys_call+0x135e/0x3ce0 arch/x86/include/generated/asm/syscalls_64.h:47
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xd9/0x1e0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Local variable req.i created at:
 inet_diag_get_exact_compat net/ipv4/inet_diag.c:1396 [inline]
 inet_diag_rcv_msg_compat+0x2a6/0x530 net/ipv4/inet_diag.c:1426
 sock_diag_rcv_msg+0x23d/0x740 net/core/sock_diag.c:282

CPU: 1 PID: 8888 Comm: syz-executor.6 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014

Fixes: 432490f9d455 ("net: ip, diag -- Add diag interface for raw sockets")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240703091649.111773-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agonet: dsa: microchip: lan937x: Add error handling in lan937x_setup
Oleksij Rempel [Wed, 3 Jul 2024 08:38:20 +0000 (10:38 +0200)]
net: dsa: microchip: lan937x: Add error handling in lan937x_setup

Introduce error handling for lan937x_cfg function calls in lan937x_setup.
This change ensures that if any lan937x_cfg or ksz_rmw32 calls fails, the
function will return the appropriate error code.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://patch.msgid.link/20240703083820.3152100-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agol2tp: Remove duplicate included header file trace.h
Thorsten Blum [Wed, 3 Jul 2024 06:11:48 +0000 (08:11 +0200)]
l2tp: Remove duplicate included header file trace.h

Remove duplicate included header file trace.h and the following warning
reported by make includecheck:

  trace.h is included more than once

Compile-tested only.

Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20240703061147.691973-2-thorsten.blum@toblux.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agotcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
Kuniyuki Iwashima [Wed, 3 Jul 2024 03:35:08 +0000 (20:35 -0700)]
tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.

When we process segments with TCP AO, we don't check it in
tcp_parse_options().  Thus, opt_rx->saw_unknown is set to 1,
which unconditionally triggers the BPF TCP option parser.

Let's avoid the unnecessary BPF invocation.

Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20240703033508.6321-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agoMerge branch 'fix-oom-and-order-check-in-msg_zerocopy-selftest'
Jakub Kicinski [Thu, 4 Jul 2024 02:42:33 +0000 (19:42 -0700)]
Merge branch 'fix-oom-and-order-check-in-msg_zerocopy-selftest'

Zijian Zhang says:

====================
fix OOM and order check in msg_zerocopy selftest

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

And, we find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
====================

Link: https://patch.msgid.link/20240701225349.3395580-1-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: make order checking verbose in msg_zerocopy selftest
Zijian Zhang [Mon, 1 Jul 2024 22:53:49 +0000 (22:53 +0000)]
selftests: make order checking verbose in msg_zerocopy selftest

We find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: fix OOM in msg_zerocopy selftest
Zijian Zhang [Mon, 1 Jul 2024 22:53:48 +0000 (22:53 +0000)]
selftests: fix OOM in msg_zerocopy selftest

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'intel-wired-lan-driver-updates-2024-06-25-ice'
Jakub Kicinski [Thu, 4 Jul 2024 02:36:53 +0000 (19:36 -0700)]
Merge branch 'intel-wired-lan-driver-updates-2024-06-25-ice'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-06-25 (ice)

This series contains updates to ice driver only.

Milena adds disabling of extts events when PTP is disabled.

Jake prevents possible NULL pointer by checking that timestamps are
ready before processing extts events and adds checks for unsupported
PTP pin configuration.

Petr Oros replaces _test_bit() with the correct test_bit() macro.
v1: https://lore.kernel.org/netdev/20240625170248.199162-1-anthony.l.nguyen@intel.com/
====================

Link: https://patch.msgid.link/20240702171459.2606611-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoice: use proper macro for testing bit
Petr Oros [Tue, 2 Jul 2024 17:14:57 +0000 (10:14 -0700)]
ice: use proper macro for testing bit

Do not use _test_bit() macro for testing bit. The proper macro for this
is one without underline.

_test_bit() is what test_bit() was prior to const-optimization. It
directly calls arch_test_bit(), i.e. the arch-specific implementation
(or the generic one). It's strictly _internal_ and shouldn't be used
anywhere outside the actual test_bit() macro.

test_bit() is a wrapper which checks whether the bitmap and the bit
number are compile-time constants and if so, it calls the optimized
function which evaluates this call to a compile-time constant as well.
If either of them is not a compile-time constant, it just calls _test_bit().
test_bit() is the actual function to use anywhere in the kernel.

IOW, calling _test_bit() avoids potential compile-time optimizations.

The sensors is not a compile-time constant, thus most probably there
are no object code changes before and after the patch.
But anyway, we shouldn't call internal wrappers instead of
the actual API.

Fixes: 4da71a77fc3b ("ice: read internal temperature sensor")
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoice: Reject pin requests with unsupported flags
Jacob Keller [Tue, 2 Jul 2024 17:14:56 +0000 (10:14 -0700)]
ice: Reject pin requests with unsupported flags

The driver receives requests for configuring pins via the .enable
callback of the PTP clock object. These requests come into the driver
with flags which modify the requested behavior from userspace. Current
implementation in ice does not reject flags that it doesn't support.
This causes the driver to incorrectly apply requests with such flags as
PTP_PEROUT_DUTY_CYCLE, or any future flags added by the kernel which it
is not yet aware of.

Fix this by properly validating flags in both ice_ptp_cfg_perout and
ice_ptp_cfg_extts. Ensure that we check by bit-wise negating supported
flags rather than just checking and rejecting known un-supported flags.
This is preferable, as it ensures better compatibility with future
kernels.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-4-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoice: Don't process extts if PTP is disabled
Jacob Keller [Tue, 2 Jul 2024 17:14:55 +0000 (10:14 -0700)]
ice: Don't process extts if PTP is disabled

The ice_ptp_extts_event() function can race with ice_ptp_release() and
result in a NULL pointer dereference which leads to a kernel panic.

Panic occurs because the ice_ptp_extts_event() function calls
ptp_clock_event() with a NULL pointer. The ice driver has already
released the PTP clock by the time the interrupt for the next external
timestamp event occurs.

To fix this, modify the ice_ptp_extts_event() function to check the
PTP state and bail early if PTP is not ready.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-3-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoice: Fix improper extts handling
Milena Olech [Tue, 2 Jul 2024 17:14:54 +0000 (10:14 -0700)]
ice: Fix improper extts handling

Extts events are disabled and enabled by the application ts2phc.
However, in case where the driver is removed when the application is
running, a specific extts event remains enabled and can cause a kernel
crash.
As a side effect, when the driver is reloaded and application is started
again, remaining extts event for the channel from a previous run will
keep firing and the message "extts on unexpected channel" might be
printed to the user.

To avoid that, extts events shall be disabled when PTP is released.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20240702171459.2606611-2-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: af_unix: Add test case for backtrack after finalising SCC.
Kuniyuki Iwashima [Tue, 2 Jul 2024 16:04:28 +0000 (01:04 +0900)]
selftest: af_unix: Add test case for backtrack after finalising SCC.

syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.

Let's add a test case exercising the path.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patch.msgid.link/20240702160428.10153-2-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoaf_unix: Fix uninit-value in __unix_walk_scc()
Shigeru Yoshida [Tue, 2 Jul 2024 16:04:27 +0000 (01:04 +0900)]
af_unix: Fix uninit-value in __unix_walk_scc()

KMSAN reported uninit-value access in __unix_walk_scc() [1].

In the list_for_each_entry_reverse() loop, when the vertex's index
equals it's scc_index, the loop uses the variable vertex as a
temporary variable that points to a vertex in scc. And when the loop
is finished, the variable vertex points to the list head, in this case
scc, which is a local variable on the stack (more precisely, it's not
even scc and might underflow the call stack of __unix_walk_scc():
container_of(&scc, struct unix_vertex, scc_entry)).

However, the variable vertex is used under the label prev_vertex. So
if the edge_stack is not empty and the function jumps to the
prev_vertex label, the function will access invalid data on the
stack. This causes the uninit-value access issue.

Fix this by introducing a new temporary variable for the loop.

[1]
BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline]
BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline]
BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 __unix_walk_scc net/unix/garbage.c:478 [inline]
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Uninit was stored to memory at:
 unix_walk_scc net/unix/garbage.c:526 [inline]
 __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584
 process_one_work kernel/workqueue.c:3231 [inline]
 process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312
 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393
 kthread+0x3c4/0x530 kernel/kthread.c:389
 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Local variable entries created at:
 ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222
 netdev_tracker_free include/linux/netdevice.h:4058 [inline]
 netdev_put include/linux/netdevice.h:4075 [inline]
 dev_put include/linux/netdevice.h:4101 [inline]
 update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813

CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
Workqueue: events_unbound __unix_gc

Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agobonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
Sam Sun [Tue, 2 Jul 2024 13:55:55 +0000 (14:55 +0100)]
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()

In function bond_option_arp_ip_targets_set(), if newval->string is an
empty string, newval->string+1 will point to the byte after the
string, causing an out-of-bound read.

BUG: KASAN: slab-out-of-bounds in strlen+0x7d/0xa0 lib/string.c:418
Read of size 1 at addr ffff8881119c4781 by task syz-executor665/8107
CPU: 1 PID: 8107 Comm: syz-executor665 Not tainted 6.7.0-rc7 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:364 [inline]
 print_report+0xc1/0x5e0 mm/kasan/report.c:475
 kasan_report+0xbe/0xf0 mm/kasan/report.c:588
 strlen+0x7d/0xa0 lib/string.c:418
 __fortify_strlen include/linux/fortify-string.h:210 [inline]
 in4_pton+0xa3/0x3f0 net/core/utils.c:130
 bond_option_arp_ip_targets_set+0xc2/0x910
drivers/net/bonding/bond_options.c:1201
 __bond_opt_set+0x2a4/0x1030 drivers/net/bonding/bond_options.c:767
 __bond_opt_set_notify+0x48/0x150 drivers/net/bonding/bond_options.c:792
 bond_opt_tryset_rtnl+0xda/0x160 drivers/net/bonding/bond_options.c:817
 bonding_sysfs_store_option+0xa1/0x120 drivers/net/bonding/bond_sysfs.c:156
 dev_attr_store+0x54/0x80 drivers/base/core.c:2366
 sysfs_kf_write+0x114/0x170 fs/sysfs/file.c:136
 kernfs_fop_write_iter+0x337/0x500 fs/kernfs/file.c:334
 call_write_iter include/linux/fs.h:2020 [inline]
 new_sync_write fs/read_write.c:491 [inline]
 vfs_write+0x96a/0xd80 fs/read_write.c:584
 ksys_write+0x122/0x250 fs/read_write.c:637
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b
---[ end trace ]---

Fix it by adding a check of string length before using it.

Fixes: f9de11a16594 ("bonding: add ip checks when store ip target")
Signed-off-by: Yue Sun <samsun1006219@gmail.com>
Signed-off-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20240702-bond-oob-v6-1-2dfdba195c19@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'selftests-openvswitch-address-some-flakes-in-the-ci-environment'
Jakub Kicinski [Thu, 4 Jul 2024 02:29:17 +0000 (19:29 -0700)]
Merge branch 'selftests-openvswitch-address-some-flakes-in-the-ci-environment'

Aaron Conole says:

====================
selftests: openvswitch: Address some flakes in the CI environment

These patches aim to make using the openvswitch testsuite more reliable.
These should address the major sources of flakiness in the openvswitch
test suite allowing the CI infrastructure to exercise the openvswitch
module for patch series.  There should be no change for users who simply
run the tests (except that patch 3/3 does make some of the debugging a bit
easier by making some output more verbose).
====================

Link: https://patch.msgid.link/20240702132830.213384-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: Be more verbose with selftest debugging.
Aaron Conole [Tue, 2 Jul 2024 13:28:30 +0000 (09:28 -0400)]
selftests: openvswitch: Be more verbose with selftest debugging.

The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases.  Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occured.

Increase the amount of details logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: Attempt to autoload module.
Aaron Conole [Tue, 2 Jul 2024 13:28:29 +0000 (09:28 -0400)]
selftests: openvswitch: Attempt to autoload module.

Previously, the openvswitch.sh test suites would not attempt to autoload
the openvswitch module.  The idea was that a user who is manually running
tests might not even have the OVS module loaded or configured for their
own development.  However, if the kernel module is configured, and the
module can be autoloaded then we should just attempt to load it and run
the tests.  This is especially true in the CI environments, where the CI
tests should be able to rely on auto loading to get the test suite running.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: openvswitch: Bump timeout to 15 minutes.
Aaron Conole [Tue, 2 Jul 2024 13:28:28 +0000 (09:28 -0400)]
selftests: openvswitch: Bump timeout to 15 minutes.

We found that since some tests rely on the TCP SYN timeouts to cause flow
misses, the default test suite timeout of 45 seconds is quick to be
exceeded.  Bump the timeout to 15 minutes.

Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethtool: fix compat with old RSS context API
Jakub Kicinski [Tue, 2 Jul 2024 16:41:57 +0000 (09:41 -0700)]
net: ethtool: fix compat with old RSS context API

Device driver gets access to rxfh_dev, while rxfh is just a local
copy of user space params. We need to check what RSS context ID
driver assigned in rxfh_dev, not rxfh.

Using rxfh leads to trying to store all contexts at index 0xffffffff.
From the user perspective it leads to "driver chose duplicate ID"
warnings when second context is added and inability to access any
contexts even tho they were successfully created - xa_load() for
the actual context ID will return NULL, and syscall will return -ENOENT.

Looks like a rebasing mistake, since rxfh_dev was added relatively
recently by commit fb6e30a72539 ("net: ethtool: pass a pointer to
parameters to get/set_rxfh ethtool ops").

Fixes: eac9122f0c41 ("net: ethtool: record custom RSS contexts in the XArray")
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20240702164157.4018425-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: rswitch: Avoid use-after-free in rswitch_poll()
Radu Rendec [Tue, 2 Jul 2024 21:08:37 +0000 (17:08 -0400)]
net: rswitch: Avoid use-after-free in rswitch_poll()

The use-after-free is actually in rswitch_tx_free(), which is inlined in
rswitch_poll(). Since `skb` and `gq->skbs[gq->dirty]` are in fact the
same pointer, the skb is first freed using dev_kfree_skb_any(), then the
value in skb->len is used to update the interface statistics.

Let's move around the instructions to use skb->len before the skb is
freed.

This bug is trivial to reproduce using KFENCE. It will trigger a splat
every few packets. A simple ARP request or ICMP echo request is enough.

Fixes: 271e015b9153 ("net: rswitch: Add unmap_addrs instead of dma address in each desc")
Signed-off-by: Radu Rendec <rrendec@redhat.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Link: https://patch.msgid.link/20240702210838.2703228-1-rrendec@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: drv-net: rss_ctx: allow more noise on default context
Jakub Kicinski [Tue, 2 Jul 2024 23:37:28 +0000 (16:37 -0700)]
selftests: drv-net: rss_ctx: allow more noise on default context

As predicted by David running the test on a machine with a single
interface is a bit unreliable. We try to send 20k packets with
iperf and expect fewer than 10k packets on the default context.
The test isn't very quick, iperf will usually send 100k packets
by the time we stop it. So we're off by 5x on the number of iperf
packets but still expect default context to only get the hardcoded
10k. The intent is to make sure we get noticeably less traffic
on the default context. Use half of the resulting iperf traffic
instead of the hard coded 10k.

Link: https://patch.msgid.link/20240702233728.4183387-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotools: ynl: use ident name for Family, too.
Paolo Abeni [Tue, 2 Jul 2024 14:29:31 +0000 (16:29 +0200)]
tools: ynl: use ident name for Family, too.

This allow consistent naming convention between Family and others
element's name.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/9bbcab3094970b371bd47aa18481ae6ca5a93687.1719930479.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonetfilter: nf_tables: unconditionally flush pending work before notifier
Florian Westphal [Tue, 2 Jul 2024 14:08:14 +0000 (16:08 +0200)]
netfilter: nf_tables: unconditionally flush pending work before notifier

syzbot reports:

KASAN: slab-uaf in nft_ctx_update include/net/netfilter/nf_tables.h:1831
KASAN: slab-uaf in nft_commit_release net/netfilter/nf_tables_api.c:9530
KASAN: slab-uaf int nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597
Read of size 2 at addr ffff88802b0051c4 by task kworker/1:1/45
[..]
Workqueue: events nf_tables_trans_destroy_work
Call Trace:
 nft_ctx_update include/net/netfilter/nf_tables.h:1831 [inline]
 nft_commit_release net/netfilter/nf_tables_api.c:9530 [inline]
 nf_tables_trans_destroy_work+0x152b/0x1750 net/netfilter/nf_tables_api.c:9597

Problem is that the notifier does a conditional flush, but its possible
that the table-to-be-removed is still referenced by transactions being
processed by the worker, so we need to flush unconditionally.

We could make the flush_work depend on whether we found a table to delete
in nf-next to avoid the flush for most cases.

AFAICS this problem is only exposed in nf-next, with
commit e169285f8c56 ("netfilter: nf_tables: do not store nft_ctx in transaction objects"),
with this commit applied there is an unconditional fetch of
table->family which is whats triggering the above splat.

Fixes: 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier")
Reported-and-tested-by: syzbot+4fd66a69358fc15ae2ad@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4fd66a69358fc15ae2ad
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2 months agoMerge tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Wed, 3 Jul 2024 21:54:35 +0000 (14:54 -0700)]
Merge tag 'trace-v6.10-rc6' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing fix from Steven Rostedt:
 "Fix ioctl conflict with memmapped ring buffer ioctl

  It was reported that the ioctl() number used to update the ring buffer
  memory mapping conflicted with the TCGETS ioctl causing strace to
  report:

    $ strace -e ioctl stty
    ioctl(0, TCGETS or TRACE_MMAP_IOCTL_GET_READER, {c_iflag=ICRNL|IXON, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0

  Since this ioctl hasn't been in a full release yet, change it from
  "T", 0x1 to "R" 0x20, and also reserve 0x20-0x2F for future ioctl
  commands, as some more are being worked on for the future"

* tag 'trace-v6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F

2 months agotracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F
Steven Rostedt (Google) [Tue, 2 Jul 2024 19:33:54 +0000 (15:33 -0400)]
tracing: Have memmapped ring buffer use ioctl of "R" range 0x20-2F

To prevent conflicts with other ioctl numbers to allow strace to have an
idea of what is happening, add the range of ioctls for the trace buffer
mapping from _IO("T", 0x1) to the range of "R" 0x20 - 0x2F.

Link: https://lore.kernel.org/linux-trace-kernel/20240630105322.GA17573@altlinux.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240630213626.GA23566@altlinux.org/
Cc: Jonathan Corbet <corbet@lwn.net>
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Link: https://lore.kernel.org/20240702153354.367861db@rorschach.local.home
Reported-by: "Dmitry V. Levin" <ldv@strace.io>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2 months agonilfs2: fix incorrect inode allocation from reserved inodes
Ryusuke Konishi [Sun, 23 Jun 2024 05:11:35 +0000 (14:11 +0900)]
nilfs2: fix incorrect inode allocation from reserved inodes

If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.

Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.

If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.

Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.

Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agonilfs2: add missing check for inode numbers on directory entries
Ryusuke Konishi [Sun, 23 Jun 2024 05:11:34 +0000 (14:11 +0900)]
nilfs2: add missing check for inode numbers on directory entries

Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().

As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).

The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.

Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.

Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.

Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack@suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agonilfs2: fix inode number range checks
Ryusuke Konishi [Sun, 23 Jun 2024 05:11:33 +0000 (14:11 +0900)]
nilfs2: fix inode number range checks

Patch series "nilfs2: fix potential issues related to reserved inodes".

This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.

This patch (of 3):

In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.

As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.

In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.

Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.

Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.

Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: avoid overflows in dirty throttling logic
Jan Kara [Fri, 21 Jun 2024 14:42:38 +0000 (16:42 +0200)]
mm: avoid overflows in dirty throttling logic

The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits).  If limits end up being larger, we will hit overflows,
possible divisions by 0 etc.  Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway.  For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits.  For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc.  So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.

This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.

Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoRevert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
Jan Kara [Fri, 21 Jun 2024 14:42:37 +0000 (16:42 +0200)]
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"

Patch series "mm: Avoid possible overflows in dirty throttling".

Dirty throttling logic assumes dirty limits in page units fit into
32-bits.  This patch series makes sure this is true (see patch 2/2 for
more details).

This patch (of 2):

This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.

The commit is broken in several ways.  Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this).  Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs.  We have
div64_ul() in case we want to be safe & cheap.  Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.

Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agomm: optimize the redundant loop of mm_update_owner_next()
Jinliang Zheng [Thu, 20 Jun 2024 12:21:24 +0000 (20:21 +0800)]
mm: optimize the redundant loop of mm_update_owner_next()

When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.

If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.

Recognize this situation in advance and exit early.

Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tycho Andersen <tandersen@netflix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2 months agoMerge tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux
Linus Torvalds [Wed, 3 Jul 2024 17:16:54 +0000 (10:16 -0700)]
Merge tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
 "A fix for a feature that went into the 6.10 merge window actually
  ended up causing a regression in building bundles for receives.

  Fix that up by ensuring we don't overwrite msg_inq before we use
  it in the loop"

* tag 'io_uring-6.10-20240703' of git://git.kernel.dk/linux:
  io_uring/net: don't clear msg_inq before io_recv_buf_select() needs it

2 months agoMerge tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Wed, 3 Jul 2024 17:10:45 +0000 (10:10 -0700)]
Merge tag 'media/v6.10-3' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "Some fixes related to the IPU6 driver"

* tag 'media/v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: ivsc: Depend on IPU_BRIDGE or not IPU_BRIDGE
  media: intel/ipu6: Fix a null pointer dereference in ipu6_isys_query_stream_by_source
  media: ipu6: Use the ISYS auxdev device as the V4L2 device's device

2 months agos390/dasd: Fix invalid dereferencing of indirect CCW data pointer
Stefan Haberland [Wed, 3 Jul 2024 09:23:12 +0000 (11:23 +0200)]
s390/dasd: Fix invalid dereferencing of indirect CCW data pointer

Fix invalid dereferencing of indirect CCW data pointer in
dasd_eckd_dump_sense() that leads to a kernel panic in error cases.

When using indirect addressing for DASD CCWs (IDAW) the CCW CDA pointer
does not contain the data address itself but a pointer to the IDAL.
This needs to be translated from physical to virtual as well before
using it.

This dereferencing is also used for dasd_page_cache and also fixed
although it is very unlikely that this code path ever gets used.

Fixes: c0bd39601c13 ("s390/dasd: use new address translation helpers")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan Haberland <sth@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2 months agowifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
Miri Korenblit [Wed, 3 Jul 2024 03:43:15 +0000 (06:43 +0300)]
wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference

iwl_mvm_get_bss_vif might return a NULL or ERR_PTR. Some of the callers
check only the NULL case, and some doesn't check at all.

Some of the callers even have a pointer to the mvmvif of the bss vif,
so we don't even need to call this function, and can simply get the vif
from mvmvif. Do it for those cases, and for the others - properly check
if IS_ERR_OR_NULL

Fixes: ec0d43d26f2c ("wifi: iwlwifi: mvm: Activate EMLSR based on traffic volume")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240703064027.a661f8c65aac.I45cf09b01af8ee3d55828863958ead741ea43b7f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2 months agowifi: iwlwifi: mvm: avoid link lookup in statistics
Johannes Berg [Wed, 3 Jul 2024 03:43:14 +0000 (06:43 +0300)]
wifi: iwlwifi: mvm: avoid link lookup in statistics

We already iterate the link bss_conf/link_info and have the
pointer, or know that deflink/bss_conf is used, so avoid an
extra lookup and just pass the pointer. This may also avoid
a crash when this is processed during restart, where the FW
to link conf array (link_id_to_link_conf) may be NULLed out.

Fixes: c1e458b987f2 ("wifi: iwlwifi: mvm: Move beacon filtering to be per link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240703064026.346a6ef67a86.Iba5d65d728ca9f58518c88d029496c1250670544@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2 months agowifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
Emmanuel Grumbach [Wed, 3 Jul 2024 03:43:16 +0000 (06:43 +0300)]
wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL

Since we now want to sync the queues even when we're in RFKILL, we
shouldn't wake up the wait queue since we still expect to get all the
notifications from the firmware.

Fixes: 4d08c0b3357c ("wifi: iwlwifi: mvm: handle BA session teardown in RF-kill")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240703064027.be7a9dbeacde.I5586cb3ca8d6e44f79d819a48a0c22351ff720c9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2 months agowifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
Daniel Gabay [Wed, 3 Jul 2024 03:43:13 +0000 (06:43 +0300)]
wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK

The WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK should be set based on the
WOWLAN_KEK_KCK_MATERIAL command version. Currently, the command
version in the firmware has advanced to 4, which prevents the
flag from being set correctly, fix that.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240703064026.a0f162108575.If1a9785727d2a1b0197a396680965df1b53d4096@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>