linux-2.6-block.git
4 months agoipv6: sr: fix incorrect unregister order
Hangbin Liu [Thu, 9 May 2024 13:18:11 +0000 (21:18 +0800)]
ipv6: sr: fix incorrect unregister order

Commit 5559cea2d5aa ("ipv6: sr: fix possible use-after-free and
null-ptr-deref") changed the register order in seg6_init(). But the
unregister order in seg6_exit() is not updated.

Fixes: 5559cea2d5aa ("ipv6: sr: fix possible use-after-free and null-ptr-deref")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240509131812.1662197-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoipv6: sr: add missing seg6_local_exit
Hangbin Liu [Thu, 9 May 2024 13:18:10 +0000 (21:18 +0800)]
ipv6: sr: add missing seg6_local_exit

Currently, we only call seg6_local_exit() in seg6_init() if
seg6_local_init() failed. But forgot to call it in seg6_exit().

Fixes: d1df6fd8a1d2 ("ipv6: sr: define core operations for seg6local lightweight tunnel")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240509131812.1662197-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: openvswitch: fix overwriting ct original tuple for ICMPv6
Ilya Maximets [Thu, 9 May 2024 09:38:05 +0000 (11:38 +0200)]
net: openvswitch: fix overwriting ct original tuple for ICMPv6

OVS_PACKET_CMD_EXECUTE has 3 main attributes:
 - OVS_PACKET_ATTR_KEY - Packet metadata in a netlink format.
 - OVS_PACKET_ATTR_PACKET - Binary packet content.
 - OVS_PACKET_ATTR_ACTIONS - Actions to execute on the packet.

OVS_PACKET_ATTR_KEY is parsed first to populate sw_flow_key structure
with the metadata like conntrack state, input port, recirculation id,
etc.  Then the packet itself gets parsed to populate the rest of the
keys from the packet headers.

Whenever the packet parsing code starts parsing the ICMPv6 header, it
first zeroes out fields in the key corresponding to Neighbor Discovery
information even if it is not an ND packet.

It is an 'ipv6.nd' field.  However, the 'ipv6' is a union that shares
the space between 'nd' and 'ct_orig' that holds the original tuple
conntrack metadata parsed from the OVS_PACKET_ATTR_KEY.

ND packets should not normally have conntrack state, so it's fine to
share the space, but normal ICMPv6 Echo packets or maybe other types of
ICMPv6 can have the state attached and it should not be overwritten.

The issue results in all but the last 4 bytes of the destination
address being wiped from the original conntrack tuple leading to
incorrect packet matching and potentially executing wrong actions
in case this packet recirculates within the datapath or goes back
to userspace.

ND fields should not be accessed in non-ND packets, so not clearing
them should be fine.  Executing memset() only for actual ND packets to
avoid the issue.

Initializing the whole thing before parsing is needed because ND packet
may not contain all the options.

The issue only affects the OVS_PACKET_CMD_EXECUTE path and doesn't
affect packets entering OVS datapath from network interfaces, because
in this case CT metadata is populated from skb after the packet is
already parsed.

Fixes: 9dd7f8907c37 ("openvswitch: Add original direction conntrack tuple to sw_flow_key.")
Reported-by: Antonin Bas <antonin.bas@broadcom.com>
Closes: https://github.com/openvswitch/ovs-issues/issues/327
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Link: https://lore.kernel.org/r/20240509094228.1035477-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoaf_unix: Fix data races in unix_release_sock/unix_stream_sendmsg
Breno Leitao [Thu, 9 May 2024 08:14:46 +0000 (01:14 -0700)]
af_unix: Fix data races in unix_release_sock/unix_stream_sendmsg

A data-race condition has been identified in af_unix. In one data path,
the write function unix_release_sock() atomically writes to
sk->sk_shutdown using WRITE_ONCE. However, on the reader side,
unix_stream_sendmsg() does not read it atomically. Consequently, this
issue is causing the following KCSAN splat to occur:

BUG: KCSAN: data-race in unix_release_sock / unix_stream_sendmsg

write (marked) to 0xffff88867256ddbb of 1 bytes by task 7270 on cpu 28:
unix_release_sock (net/unix/af_unix.c:640)
unix_release (net/unix/af_unix.c:1050)
sock_close (net/socket.c:659 net/socket.c:1421)
__fput (fs/file_table.c:422)
__fput_sync (fs/file_table.c:508)
__se_sys_close (fs/open.c:1559 fs/open.c:1541)
__x64_sys_close (fs/open.c:1541)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

read to 0xffff88867256ddbb of 1 bytes by task 989 on cpu 14:
unix_stream_sendmsg (net/unix/af_unix.c:2273)
__sock_sendmsg (net/socket.c:730 net/socket.c:745)
____sys_sendmsg (net/socket.c:2584)
__sys_sendmmsg (net/socket.c:2638 net/socket.c:2724)
__x64_sys_sendmmsg (net/socket.c:2753 net/socket.c:2750 net/socket.c:2750)
x64_sys_call (arch/x86/entry/syscall_64.c:33)
do_syscall_64 (arch/x86/entry/common.c:?)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

value changed: 0x01 -> 0x03

The line numbers are related to commit dd5a440a31fa ("Linux 6.9-rc7").

Commit e1d09c2c2f57 ("af_unix: Fix data races around sk->sk_shutdown.")
addressed a comparable issue in the past regarding sk->sk_shutdown.
However, it overlooked resolving this particular data path.
This patch only offending unix_stream_sendmsg() function, since the
other reads seem to be protected by unix_state_lock() as discussed in
Link: https://lore.kernel.org/all/20240508173324.53565-1-kuniyu@amazon.com/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240509081459.2807828-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ethernet: cortina: Locking fixes
Linus Walleij [Thu, 9 May 2024 07:44:54 +0000 (09:44 +0200)]
net: ethernet: cortina: Locking fixes

This fixes a probably long standing problem in the Cortina
Gemini ethernet driver: there are some paths in the code
where the IRQ registers are written without taking the proper
locks.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240509-gemini-ethernet-locking-v1-1-afd00a528b95@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoice: Fix package download algorithm
Dan Nowlin [Wed, 8 May 2024 17:19:07 +0000 (10:19 -0700)]
ice: Fix package download algorithm

Previously, the driver assumed that all signature segments would contain
one or more buffers to download. In the future, there will be signature
segments that will contain no buffers to download.

Correct download flow to allow for signature segments that have zero
download buffers and skip the download in this case.

Fixes: 3cbdb0343022 ("ice: Add support for E830 DDP package segment")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240508171908.2760776-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ethernet: mediatek: use ADMAv1 instead of ADMAv2.0 on MT7981 and MT7986
Daniel Golle [Wed, 8 May 2024 10:43:56 +0000 (11:43 +0100)]
net: ethernet: mediatek: use ADMAv1 instead of ADMAv2.0 on MT7981 and MT7986

ADMAv2.0 is plagued by RX hangs which can't easily detected and happen upon
receival of a corrupted Ethernet frame.

Use ADMAv1 instead which is also still present and usable, and doesn't
suffer from that problem.

Fixes: 197c9e9b17b1 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://lore.kernel.org/r/57cef74bbd0c243366ad1ff4221e3f72f437ec80.1715164770.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonet: ethernet: mediatek: split tx and rx fields in mtk_soc_data struct
Lorenzo Bianconi [Wed, 8 May 2024 10:43:34 +0000 (11:43 +0100)]
net: ethernet: mediatek: split tx and rx fields in mtk_soc_data struct

Split tx and rx fields in mtk_soc_data struct. This is a preliminary
patch to roll back to ADMAv1 for MT7986 and MT7981 SoC in order to fix a
hw hang if the device receives a corrupted packet when using ADMAv2.0.

Fixes: 197c9e9b17b1 ("net: ethernet: mtk_eth_soc: introduce support for mt7986 chipset")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/70a799b1f060ec2f57883e88ccb420ac0fb0abb5.1715164770.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: move amt to socat for better compatibility
Jakub Kicinski [Thu, 9 May 2024 16:19:52 +0000 (09:19 -0700)]
selftests: net: move amt to socat for better compatibility

The test seems to expect that nc will exit after the first
received message. This is not the case with Ncat 7.94.
There are multiple versions of nc out there, switch
to socat for better compatibility.

Tell socat to exit after 128 bytes and pad the message.

Since the test sets -e make sure we don't set exit code
(|| true) and print the pass / fail rather then silently
moving over the test and just setting non-zero exit code
with no output indicating what failed.

Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Acked-by: Paolo Abeni<pabeni@redhat.com>
Tested-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20240509161952.3940476-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: net: add missing config for amt.sh
Jakub Kicinski [Thu, 9 May 2024 16:19:19 +0000 (09:19 -0700)]
selftests: net: add missing config for amt.sh

Test needs IPv6 multicast. smcroute currently crashes when trying
to install a route in a kernel without IPv6 multicast.

Fixes: c08e8baea78e ("selftests: add amt interface selftest script")
Link: https://lore.kernel.org/r/20240509161919.3939966-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoeth: sungem: remove .ndo_poll_controller to avoid deadlocks
Jakub Kicinski [Wed, 8 May 2024 13:45:04 +0000 (06:45 -0700)]
eth: sungem: remove .ndo_poll_controller to avoid deadlocks

Erhard reports netpoll warnings from sungem:

  netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (gem_start_xmit+0x0/0x398)
  WARNING: CPU: 1 PID: 1 at net/core/netpoll.c:370 netpoll_send_skb+0x1fc/0x20c

gem_poll_controller() disables interrupts, which may sleep.
We can't sleep in netpoll, it has interrupts disabled completely.
Strangely, gem_poll_controller() doesn't even poll the completions,
and instead acts as if an interrupt has fired so it just schedules
NAPI and exits. None of this has been necessary for years, since
netpoll invokes NAPI directly.

Fixes: fe09bb619096 ("sungem: Spring cleaning and GRO support")
Reported-and-tested-by: Erhard Furtner <erhard_f@mailbox.org>
Link: https://lore.kernel.org/all/20240428125306.2c3080ef@legion
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240508134504.3560956-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoptp: ocp: fix DPLL functions
Vadim Fedorenko [Wed, 8 May 2024 13:21:11 +0000 (13:21 +0000)]
ptp: ocp: fix DPLL functions

In ptp_ocp driver pin actions assume sma_nr starts with 1, but for DPLL
subsystem callback 0-based index was used. Fix it providing proper index.

Fixes: 09eeb3aecc6c ("ptp_ocp: implement DPLL ops")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20240508132111.11545-1-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'net-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 9 May 2024 15:48:57 +0000 (08:48 -0700)]
Merge tag 'net-6.9-rc8' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth and IPsec.

  The bridge patch is actually a follow-up to a recent fix in the same
  area. We have a pending v6.8 AF_UNIX regression; it should be solved
  soon, but not in time for this PR.

  Current release - regressions:

   - eth: ks8851: Queue RX packets in IRQ handler instead of disabling
     BHs

   - net: bridge: fix corrupted ethernet header on multicast-to-unicast

  Current release - new code bugs:

   - xfrm: fix possible bad pointer derferencing in error path

  Previous releases - regressionis:

   - core: fix out-of-bounds access in ops_init

   - ipv6:
      - fix potential uninit-value access in __ip6_make_skb()
      - fib6_rules: avoid possible NULL dereference in fib6_rule_action()

   - tcp: use refcount_inc_not_zero() in tcp_twsk_unique().

   - rtnetlink: correct nested IFLA_VF_VLAN_LIST attribute validation

   - rxrpc: fix congestion control algorithm

   - bluetooth:
      - l2cap: fix slab-use-after-free in l2cap_connect()
      - msft: fix slab-use-after-free in msft_do_close()

   - eth: hns3: fix kernel crash when devlink reload during
     initialization

   - eth: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21
     family

  Previous releases - always broken:

   - xfrm: preserve vlan tags for transport mode software GRO

   - tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets

   - eth: hns3: keep using user config after hardware reset"

* tag 'net-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  net: dsa: mv88e6xxx: read cmode on mv88e6320/21 serdes only ports
  net: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21 family
  net: hns3: fix kernel crash when devlink reload during initialization
  net: hns3: fix port vlan filter not disabled issue
  net: hns3: use appropriate barrier function after setting a bit value
  net: hns3: release PTP resources if pf initialization failed
  net: hns3: change type of numa_node_mask as nodemask_t
  net: hns3: direct return when receive a unknown mailbox message
  net: hns3: using user configure after hardware reset
  net/smc: fix neighbour and rtable leak in smc_ib_find_route()
  ipv6: prevent NULL dereference in ip6_output()
  hsr: Simplify code for announcing HSR nodes timer setup
  ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
  dt-bindings: net: mediatek: remove wrongly added clocks and SerDes
  rxrpc: Only transmit one ACK per jumbo packet received
  rxrpc: Fix congestion control algorithm
  selftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MAC
  ipv6: Fix potential uninit-value access in __ip6_make_skb()
  net: phy: marvell-88q2xxx: add support for Rev B1 and B2
  appletalk: Improve handling of broadcast packets
  ...

4 months agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Linus Torvalds [Thu, 9 May 2024 15:44:13 +0000 (08:44 -0700)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rmk/linux

Pull ARM fix from Russell King:

 - clear stale KASan stack poison when a CPU resumes

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9381/1: kasan: clear stale stack poison

4 months agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Thu, 9 May 2024 15:39:10 +0000 (08:39 -0700)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs

Pull dentry leak fix from Al Viro:
 "Dentry leak fix in the qibfs driver that I forgot to send a pull
  request for ;-/

  My apologies - it actually sat in vfs.git#fixes for more than two
  months..."

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  qibfs: fix dentry leak

4 months agonet: dsa: mv88e6xxx: read cmode on mv88e6320/21 serdes only ports
Steffen Bätz [Wed, 8 May 2024 07:29:44 +0000 (09:29 +0200)]
net: dsa: mv88e6xxx: read cmode on mv88e6320/21 serdes only ports

On the mv88e6320 and 6321 switch family, port 0/1 are serdes only ports.
Modified the mv88e6352_get_port4_serdes_cmode function to pass a port
number since the register set of the 6352 is equal on the 6320/21.

Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20240508072944.54880-3-steffen@innosonix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21 family
Steffen Bätz [Wed, 8 May 2024 07:29:43 +0000 (09:29 +0200)]
net: dsa: mv88e6xxx: add phylink_get_caps for the mv88e6320/21 family

As of commit de5c9bf40c45 ("net: phylink: require supported_interfaces to
be filled")
Marvell 88e6320/21 switches fail to be probed:

...
mv88e6085 30be0000.ethernet-1:00: phylink: error: empty supported_interfaces
error creating PHYLINK: -22
...

The problem stems from the use of mv88e6185_phylink_get_caps() to get
the device capabilities.
Since there are serdes only ports 0/1 included, create a new dedicated
phylink_get_caps for the 6320 and 6321 to properly support their
set of capabilities.

Fixes: de5c9bf40c45 ("net: phylink: require supported_interfaces to be filled")
Signed-off-by: Steffen Bätz <steffen@innosonix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20240508072944.54880-2-steffen@innosonix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agoMerge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Paolo Abeni [Thu, 9 May 2024 08:47:34 +0000 (10:47 +0200)]
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'

Jijie Shao says:

====================
There are some bugfix for the HNS3 ethernet driver
====================

Link: https://lore.kernel.org/r/20240507134224.2646246-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: fix kernel crash when devlink reload during initialization
Yonglong Liu [Tue, 7 May 2024 13:42:24 +0000 (21:42 +0800)]
net: hns3: fix kernel crash when devlink reload during initialization

The devlink reload process will access the hardware resources,
but the register operation is done before the hardware is initialized.
So, processing the devlink reload during initialization may lead to kernel
crash.

This patch fixes this by registering the devlink after
hardware initialization.

Fixes: cd6242991d2e ("net: hns3: add support for registering devlink for VF")
Fixes: 93305b77ffcb ("net: hns3: fix kernel crash when devlink reload during pf initialization")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: fix port vlan filter not disabled issue
Yonglong Liu [Tue, 7 May 2024 13:42:23 +0000 (21:42 +0800)]
net: hns3: fix port vlan filter not disabled issue

According to hardware limitation, for device support modify
VLAN filter state but not support bypass port VLAN filter,
it should always disable the port VLAN filter. but the driver
enables port VLAN filter when initializing, if there is no
VLAN(except VLAN 0) id added, the driver will disable it
in service task. In most time, it works fine. But there is
a time window before the service task shceduled and net device
being registered. So if user adds VLAN at this time, the driver
will not update the VLAN filter state,  and the port VLAN filter
remains enabled.

To fix the problem, if support modify VLAN filter state but not
support bypass port VLAN filter, set the port vlan filter to "off".

Fixes: 184cd221a863 ("net: hns3: disable port VLAN filter when support function level VLAN filter control")
Fixes: 2ba306627f59 ("net: hns3: add support for modify VLAN filter state")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: use appropriate barrier function after setting a bit value
Peiyang Wang [Tue, 7 May 2024 13:42:22 +0000 (21:42 +0800)]
net: hns3: use appropriate barrier function after setting a bit value

There is a memory barrier in followed case. When set the port down,
hclgevf_set_timmer will set DOWN in state. Meanwhile, the service task has
different behaviour based on whether the state is DOWN. Thus, to make sure
service task see DOWN, use smp_mb__after_atomic after calling set_bit().

          CPU0                        CPU1
========================== ===================================
hclgevf_set_timer_task()    hclgevf_periodic_service_task()
  set_bit(DOWN,state)         test_bit(DOWN,state)

pf also has this issue.

Fixes: ff200099d271 ("net: hns3: remove unnecessary work in hclgevf_main")
Fixes: 1c6dfe6fc6f7 ("net: hns3: remove mailbox and reset work in hclge_main")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: release PTP resources if pf initialization failed
Peiyang Wang [Tue, 7 May 2024 13:42:21 +0000 (21:42 +0800)]
net: hns3: release PTP resources if pf initialization failed

During the PF initialization process, hclge_update_port_info may return an
error code for some reason. At this point,  the ptp initialization has been
completed. To void memory leaks, the resources that are applied by ptp
should be released. Therefore, when hclge_update_port_info returns an error
code, hclge_ptp_uninit is called to release the corresponding resources.

Fixes: eaf83ae59e18 ("net: hns3: add querying fec ability from firmware")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: change type of numa_node_mask as nodemask_t
Peiyang Wang [Tue, 7 May 2024 13:42:20 +0000 (21:42 +0800)]
net: hns3: change type of numa_node_mask as nodemask_t

It provides nodemask_t to describe the numa node mask in kernel. To
improve transportability, change the type of numa_node_mask as nodemask_t.

Fixes: 38caee9d3ee8 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: direct return when receive a unknown mailbox message
Jian Shen [Tue, 7 May 2024 13:42:19 +0000 (21:42 +0800)]
net: hns3: direct return when receive a unknown mailbox message

Currently, the driver didn't return when receive a unknown
mailbox message, and continue checking whether need to
generate a response. It's unnecessary and may be incorrect.

Fixes: bb5790b71bad ("net: hns3: refactor mailbox response scheme between PF and VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet: hns3: using user configure after hardware reset
Peiyang Wang [Tue, 7 May 2024 13:42:18 +0000 (21:42 +0800)]
net: hns3: using user configure after hardware reset

When a reset occurring, it's supposed to recover user's configuration.
Currently, the port info(speed, duplex and autoneg) is stored in hclge_mac
and will be scheduled updated. Consider the case that reset was happened
consecutively. During the first reset, the port info is configured with
a temporary value cause the PHY is reset and looking for best link config.
Second reset start and use pervious configuration which is not the user's.
The specific process is as follows:

+------+               +----+                +----+
| USER |               | PF |                | HW |
+---+--+               +-+--+                +-+--+
    |  ethtool --reset   |                     |
    +------------------->|    reset command    |
    |  ethtool --reset   +-------------------->|
    +------------------->|                     +---+
    |                    +---+                 |   |
    |                    |   |reset currently  |   | HW RESET
    |                    |   |and wait to do   |   |
    |                    |<--+                 |   |
    |                    | send pervious cfg   |<--+
    |                    | (1000M FULL AN_ON)  |
    |                    +-------------------->|
    |                    | read cfg(time task) |
    |                    | (10M HALF AN_OFF)   +---+
    |                    |<--------------------+   | cfg take effect
    |                    |    reset command    |<--+
    |                    +-------------------->|
    |                    |                     +---+
    |                    | send pervious cfg   |   | HW RESET
    |                    | (10M HALF AN_OFF)   |<--+
    |                    +-------------------->|
    |                    | read cfg(time task) |
    |                    |  (10M HALF AN_OFF)  +---+
    |                    |<--------------------+   | cfg take effect
    |                    |                     |   |
    |                    | read cfg(time task) |<--+
    |                    |  (10M HALF AN_OFF)  |
    |                    |<--------------------+
    |                    |                     |
    v                    v                     v

To avoid aboved situation, this patch introduced req_speed, req_duplex,
req_autoneg to store user's configuration and it only be used after
hardware reset and to recover user's configuration

Fixes: f5f2b3e4dcc0 ("net: hns3: add support for imp-controlled PHYs")
Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agonet/smc: fix neighbour and rtable leak in smc_ib_find_route()
Wen Gu [Tue, 7 May 2024 12:53:31 +0000 (20:53 +0800)]
net/smc: fix neighbour and rtable leak in smc_ib_find_route()

In smc_ib_find_route(), the neighbour found by neigh_lookup() and rtable
resolved by ip_route_output_flow() are not released or put before return.
It may cause the refcount leak, so fix it.

Link: https://lore.kernel.org/r/20240506015439.108739-1-guwen@linux.alibaba.com
Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240507125331.2808-1-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agoipv6: prevent NULL dereference in ip6_output()
Eric Dumazet [Tue, 7 May 2024 16:18:42 +0000 (16:18 +0000)]
ipv6: prevent NULL dereference in ip6_output()

According to syzbot, there is a chance that ip6_dst_idev()
returns NULL in ip6_output(). Most places in IPv6 stack
deal with a NULL idev just fine, but not here.

syzbot reported:

general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7]
CPU: 0 PID: 9775 Comm: syz-executor.4 Not tainted 6.9.0-rc5-syzkaller-00157-g6a30653b604a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:ip6_output+0x231/0x3f0 net/ipv6/ip6_output.c:237
Code: 3c 1e 00 49 89 df 74 08 4c 89 ef e8 19 58 db f7 48 8b 44 24 20 49 89 45 00 49 89 c5 48 8d 9d e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 4c 8b 74 24 28 0f 85 61 01 00 00 8b 1b 31 ff
RSP: 0018:ffffc9000927f0d8 EFLAGS: 00010202
RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000040000
RDX: ffffc900131f9000 RSI: 0000000000004f47 RDI: 0000000000004f48
RBP: 0000000000000000 R08: ffffffff8a1f0b9a R09: 1ffffffff1f51fad
R10: dffffc0000000000 R11: fffffbfff1f51fae R12: ffff8880293ec8c0
R13: ffff88805d7fc000 R14: 1ffff1100527d91a R15: dffffc0000000000
FS:  00007f135c6856c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000080 CR3: 0000000064096000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  NF_HOOK include/linux/netfilter.h:314 [inline]
  ip6_xmit+0xefe/0x17f0 net/ipv6/ip6_output.c:358
  sctp_v6_xmit+0x9f2/0x13f0 net/sctp/ipv6.c:248
  sctp_packet_transmit+0x26ad/0x2ca0 net/sctp/output.c:653
  sctp_packet_singleton+0x22c/0x320 net/sctp/outqueue.c:783
  sctp_outq_flush_ctrl net/sctp/outqueue.c:914 [inline]
  sctp_outq_flush+0x6d5/0x3e20 net/sctp/outqueue.c:1212
  sctp_side_effects net/sctp/sm_sideeffect.c:1198 [inline]
  sctp_do_sm+0x59cc/0x60c0 net/sctp/sm_sideeffect.c:1169
  sctp_primitive_ASSOCIATE+0x95/0xc0 net/sctp/primitive.c:73
  __sctp_connect+0x9cd/0xe30 net/sctp/socket.c:1234
  sctp_connect net/sctp/socket.c:4819 [inline]
  sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834
  __sys_connect_file net/socket.c:2048 [inline]
  __sys_connect+0x2df/0x310 net/socket.c:2065
  __do_sys_connect net/socket.c:2075 [inline]
  __se_sys_connect net/socket.c:2072 [inline]
  __x64_sys_connect+0x7a/0x90 net/socket.c:2072
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 778d80be5269 ("ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20240507161842.773961-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agohsr: Simplify code for announcing HSR nodes timer setup
Lukasz Majewski [Tue, 7 May 2024 11:12:14 +0000 (13:12 +0200)]
hsr: Simplify code for announcing HSR nodes timer setup

Up till now the code to start HSR announce timer, which triggers sending
supervisory frames, was assuming that hsr_netdev_notify() would be called
at least twice for hsrX interface. This was required to have different
values for old and current values of network device's operstate.

This is problematic for a case where hsrX interface is already in the
operational state when hsr_netdev_notify() is called, so timer is not
configured to trigger and as a result the hsrX is not sending supervisory
frames to HSR ring.

This error has been discovered when hsr_ping.sh script was run. To be
more specific - for the hsr1 and hsr2 the hsr_netdev_notify() was
called at least twice with different IF_OPER_{LOWERDOWN|DOWN|UP} states
assigned in hsr_check_carrier_and_operstate(hsr). As a result there was
no issue with sending supervisory frames.
However, with hsr3, the notify function was called only once with
operstate set to IF_OPER_UP and timer responsible for triggering
supervisory frames was not fired.

The solution is to use netif_oper_up() and netif_running() helper
functions to assess if network hsrX device is up.
Only then, when the timer is not already pending, it is started.
Otherwise it is deactivated.

Fixes: f421436a591d ("net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0)")
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240507111214.3519800-1-lukma@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()
Eric Dumazet [Tue, 7 May 2024 16:31:45 +0000 (16:31 +0000)]
ipv6: fib6_rules: avoid possible NULL dereference in fib6_rule_action()

syzbot is able to trigger the following crash [1],
caused by unsafe ip6_dst_idev() use.

Indeed ip6_dst_idev() can return NULL, and must always be checked.

[1]

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 31648 Comm: syz-executor.0 Not tainted 6.9.0-rc4-next-20240417-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:__fib6_rule_action net/ipv6/fib6_rules.c:237 [inline]
 RIP: 0010:fib6_rule_action+0x241/0x7b0 net/ipv6/fib6_rules.c:267
Code: 02 00 00 49 8d 9f d8 00 00 00 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 f9 32 bf f7 48 8b 1b 48 89 d8 48 c1 e8 03 <42> 80 3c 20 00 74 08 48 89 df e8 e0 32 bf f7 4c 8b 03 48 89 ef 4c
RSP: 0018:ffffc9000fc1f2f0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1a772f98c8186700
RDX: 0000000000000003 RSI: ffffffff8bcac4e0 RDI: ffffffff8c1f9760
RBP: ffff8880673fb980 R08: ffffffff8fac15ef R09: 1ffffffff1f582bd
R10: dffffc0000000000 R11: fffffbfff1f582be R12: dffffc0000000000
R13: 0000000000000080 R14: ffff888076509000 R15: ffff88807a029a00
FS:  00007f55e82ca6c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31d23000 CR3: 0000000022b66000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  fib_rules_lookup+0x62c/0xdb0 net/core/fib_rules.c:317
  fib6_rule_lookup+0x1fd/0x790 net/ipv6/fib6_rules.c:108
  ip6_route_output_flags_noref net/ipv6/route.c:2637 [inline]
  ip6_route_output_flags+0x38e/0x610 net/ipv6/route.c:2649
  ip6_route_output include/net/ip6_route.h:93 [inline]
  ip6_dst_lookup_tail+0x189/0x11a0 net/ipv6/ip6_output.c:1120
  ip6_dst_lookup_flow+0xb9/0x180 net/ipv6/ip6_output.c:1250
  sctp_v6_get_dst+0x792/0x1e20 net/sctp/ipv6.c:326
  sctp_transport_route+0x12c/0x2e0 net/sctp/transport.c:455
  sctp_assoc_add_peer+0x614/0x15c0 net/sctp/associola.c:662
  sctp_connect_new_asoc+0x31d/0x6c0 net/sctp/socket.c:1099
  __sctp_connect+0x66d/0xe30 net/sctp/socket.c:1197
  sctp_connect net/sctp/socket.c:4819 [inline]
  sctp_inet_connect+0x149/0x1f0 net/sctp/socket.c:4834
  __sys_connect_file net/socket.c:2048 [inline]
  __sys_connect+0x2df/0x310 net/socket.c:2065
  __do_sys_connect net/socket.c:2075 [inline]
  __se_sys_connect net/socket.c:2072 [inline]
  __x64_sys_connect+0x7a/0x90 net/socket.c:2072
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: 5e5f3f0f8013 ("[IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240507163145.835254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agodt-bindings: net: mediatek: remove wrongly added clocks and SerDes
Daniel Golle [Tue, 7 May 2024 12:20:43 +0000 (13:20 +0100)]
dt-bindings: net: mediatek: remove wrongly added clocks and SerDes

Several clocks as well as both sgmiisys phandles were added by mistake
to the Ethernet bindings for MT7988. Also, the total number of clocks
didn't match with the actual number of items listed.

This happened because the vendor driver which served as a reference uses
a high number of syscon phandles to access various parts of the SoC
which wasn't acceptable upstream. Hence several parts which have never
previously been supported (such SerDes PHY and USXGMII PCS) are going to
be implemented by separate drivers. As a result the device tree will
look much more sane.

Quickly align the bindings with the upcoming reality of the drivers
actually adding support for the remaining Ethernet-related features of
the MT7988 SoC.

Fixes: c94a9aabec36 ("dt-bindings: net: mediatek,net: add mt7988-eth binding")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/1569290b21cc787a424469ed74456a7e976b102d.1715084326.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd
Linus Torvalds [Wed, 8 May 2024 17:39:53 +0000 (10:39 -0700)]
Merge tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Five ksmbd server fixes, all also for stable

   - Three fixes related to SMB3 leases (fixes two xfstests, and a
     locking issue)

   - Unitialized variable fix

   - Socket creation fix when bindv6only is set"

* tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: do not grant v2 lease if parent lease key and epoch are not set
  ksmbd: use rwsem instead of rwlock for lease break
  ksmbd: avoid to send duplicate lease break notifications
  ksmbd: off ipv6only for both ipv4/ipv6 binding
  ksmbd: fix uninitialized symbol 'share' in smb2_tree_connect()

4 months agoMerge tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 8 May 2024 17:33:55 +0000 (10:33 -0700)]
Merge tag 'fuse-fixes-6.9-final' of git://git./linux/kernel/git/mszeredi/fuse

Pull fuse fixes from Miklos Szeredi:
 "Two one-liner fixes for issues introduced in -rc1"

* tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  virtiofs: include a newline in sysfs tag
  fuse: verify zero padding in fuse_backing_map

4 months agoMerge tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkin...
Linus Torvalds [Wed, 8 May 2024 17:30:13 +0000 (10:30 -0700)]
Merge tag 'exfat-for-6.9-rc8' of git://git./linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

 - Fix xfstests generic/013 test failure with dirsync mount option

 - Initialize the reserved fields of deleted file and stream extension
   dentries to zero

* tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: zero the reserved fields of file and stream extension dentries
  exfat: fix timing of synchronizing bitmap and inode

4 months agoMerge tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs
Linus Torvalds [Wed, 8 May 2024 17:23:18 +0000 (10:23 -0700)]
Merge tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:

 - Various syzbot fixes; mainly small gaps in validation

 - Fix an integer overflow in fiemap() which was preventing filefrag
   from returning the full list of extents

 - Fix a refcounting bug on the device refcount, turned up by new
   assertions in the development branch

 - Fix a device removal/readd bug; write_super() was repeatedly dropping
   and retaking bch_dev->io_ref references

* tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async()
  bcachefs: Fix race in bch2_write_super()
  bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX
  bcachefs: Add missing skcipher_request_set_callback() call
  bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode()
  bcachefs: Fix shift-by-64 in bformat_needs_redo()
  bcachefs: Guard against unknown k.k->type in __bkey_invalid()
  bcachefs: Add missing validation for superblock section clean
  bcachefs: Fix assert in bch2_alloc_v4_invalid()
  bcachefs: fix overflow in fiemap
  bcachefs: Add a better limit for maximum number of buckets
  bcachefs: Fix lifetime issue in device iterator helpers
  bcachefs: Fix bch2_dev_lookup() refcounting
  bcachefs: Initialize bch_write_op->failed in inline data path
  bcachefs: Fix refcount put in sb_field_resize error path
  bcachefs: Inodes need extra padding for varint_decode_fast()
  bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()
  bcachefs: bucket_pos_to_bp_noerror()
  bcachefs: don't free error pointers
  bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()

4 months agoMerge tag 'soc-fixes-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Linus Torvalds [Wed, 8 May 2024 17:15:40 +0000 (10:15 -0700)]
Merge tag 'soc-fixes-6.9-3' of git://git./linux/kernel/git/soc/soc

Pull ARM SoC fixes from Arnd Bergmann:
 "These are a couple of last minute fixes that came in over the previous
  week, addressing:

   - A pin configuration bug on a qualcomm board that caused issues with
     ethernet and mmc

   - Two minor code fixes for misleading console output in the microchip
     firmware driver

   - A build warning in the sifive cache driver"

* tag 'soc-fixes-6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  firmware: microchip: clarify that sizes and addresses are in hex
  firmware: microchip: don't unconditionally print validation success
  arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration
  cache: sifive_ccache: Silence unused variable warning

4 months agoMerge tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Wed, 8 May 2024 16:37:58 +0000 (09:37 -0700)]
Merge tag 'pci-v6.9-fixes-2' of git://git./linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Update kernel-parameters doc to describe "pcie_aspm=off" more
   accurately (Bjorn Helgaas)

 - Restore the parent's (not the child's) ASPM state to the parent
   during resume, which fixes a reboot during resume (Kai-Heng Feng)

* tag 'pci-v6.9-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Restore parent state to parent, child state to child
  PCI/ASPM: Clarify that pcie_aspm=off means leave ASPM untouched

4 months agoMerge branch 'rxrpc-miscellaneous-fixes'
Jakub Kicinski [Wed, 8 May 2024 15:05:12 +0000 (08:05 -0700)]
Merge branch 'rxrpc-miscellaneous-fixes'

David Howells says:

====================
rxrpc: Miscellaneous fixes (part)

Here some miscellaneous fixes for AF_RXRPC:

 (1) Fix the congestion control algorithm to start cwnd at 4 and to not cut
     ssthresh when the peer cuts its rwind size.

 (2) Only transmit a single ACK for all the DATA packets glued together
     into a jumbo packet to reduce the number of ACKs being generated.
====================

Link: https://lore.kernel.org/r/20240503150749.1001323-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agorxrpc: Only transmit one ACK per jumbo packet received
David Howells [Fri, 3 May 2024 15:07:40 +0000 (16:07 +0100)]
rxrpc: Only transmit one ACK per jumbo packet received

Only generate one ACK packet for all the subpackets in a jumbo packet.  If
we would like to generate more than one ACK, we prioritise them base on
their reason code, in the order, highest first:

   OutOfSeq > NoSpace > ExceedsWin > Duplicate > Requested > Delay > Idle

For the first four, we reference the lowest offending subpacket; for the
last three, the highest.

This reduces the number of ACKs we end up transmitting to one per UDP
packet transmitted to reduce network loading and packet parsing.

Fixes: 5d7edbc9231e ("rxrpc: Get rid of the Rx ring")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com <mailto:jaltman@auristor.com>>
Link: https://lore.kernel.org/r/20240503150749.1001323-3-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agorxrpc: Fix congestion control algorithm
David Howells [Fri, 3 May 2024 15:07:39 +0000 (16:07 +0100)]
rxrpc: Fix congestion control algorithm

Make the following fixes to the congestion control algorithm:

 (1) Don't vary the cwnd starting value by the size of RXRPC_TX_SMSS since
     that's currently held constant - set to the size of a jumbo subpacket
     payload so that we can create jumbo packets on the fly.  The current
     code invariably picks 3 as the starting value.

     Further, the starting cwnd needs to be an even number because we ack
     every other packet, so set it to 4.

 (2) Don't cut ssthresh when we see an ACK come from the peer with a
     receive window (rwind) less than ssthresh.  ssthresh keeps track of
     characteristics of the connection whereas rwind may be reduced by the
     peer for any reason - and may be reduced to 0.

Fixes: 1fc4fa2ac93d ("rxrpc: Fix congestion management")
Fixes: 0851115090a3 ("rxrpc: Reduce ssthresh to peer's receive window")
Signed-off-by: David Howells <dhowells@redhat.com>
Suggested-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Reviewed-by: Jeffrey Altman <jaltman@auristor.com <mailto:jaltman@auristor.com>>
Link: https://lore.kernel.org/r/20240503150749.1001323-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoselftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MAC
Ido Schimmel [Tue, 7 May 2024 11:30:33 +0000 (14:30 +0300)]
selftests: test_bridge_neigh_suppress.sh: Fix failures due to duplicate MAC

When creating the topology for the test, three veth pairs are created in
the initial network namespace before being moved to one of the network
namespaces created by the test.

On systems where systemd-udev uses MACAddressPolicy=persistent (default
since systemd version 242), this will result in some net devices having
the same MAC address since they were created with the same name in the
initial network namespace. In turn, this leads to arping / ndisc6
failing since packets are dropped by the bridge's loopback filter.

Fix by creating each net device in the correct network namespace instead
of moving it there from the initial network namespace.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20240426074015.251854d4@kernel.org/
Fixes: 7648ac72dcd7 ("selftests: net: Add bridge neighbor suppression test")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20240507113033.1732534-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoipv6: Fix potential uninit-value access in __ip6_make_skb()
Shigeru Yoshida [Mon, 6 May 2024 14:11:29 +0000 (23:11 +0900)]
ipv6: Fix potential uninit-value access in __ip6_make_skb()

As it was done in commit fc1092f51567 ("ipv4: Fix uninit-value access in
__ip_make_skb()") for IPv4, check FLOWI_FLAG_KNOWN_NH on fl6->flowi6_flags
instead of testing HDRINCL on the socket to avoid a race condition which
causes uninit-value access.

Fixes: ea30388baebc ("ipv6: Fix an uninit variable access bug in __ip6_make_skb()")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agonet: phy: marvell-88q2xxx: add support for Rev B1 and B2
Gregor Herburger [Mon, 6 May 2024 06:24:33 +0000 (08:24 +0200)]
net: phy: marvell-88q2xxx: add support for Rev B1 and B2

Different revisions of the Marvell 88q2xxx phy needs different init
sequences.

Add init sequence for Rev B1 and Rev B2. Rev B2 init sequence skips one
register write.

Tested-by: Dimitri Fedrau <dima.fedrau@gmail.com>
Signed-off-by: Gregor Herburger <gregor.herburger@ew.tq-group.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agoappletalk: Improve handling of broadcast packets
Vincent Duvert [Sun, 5 May 2024 18:54:57 +0000 (11:54 -0700)]
appletalk: Improve handling of broadcast packets

When a broadcast AppleTalk packet is received, prefer queuing it on the
socket whose address matches the address of the interface that received
the packet (and is listening on the correct port). Userspace
applications that handle such packets will usually send a response on
the same socket that received the packet; this fix allows the response
to be sent on the correct interface.

If a socket matching the interface's address is not found, an arbitrary
socket listening on the correct port will be used, if any. This matches
the implementation's previous behavior.

Fixes atalkd's responses to network information requests when multiple
network interfaces are configured to use AppleTalk.

Link: https://lore.kernel.org/netdev/20200722113752.1218-2-vincent.ldev@duvert.net/
Link: https://gist.github.com/VinDuv/4db433b6dce39d51a5b7847ee749b2a4
Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net>
Signed-off-by: Doug Brown <doug@schmorgal.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agonet: bridge: fix corrupted ethernet header on multicast-to-unicast
Felix Fietkau [Sun, 5 May 2024 18:42:38 +0000 (20:42 +0200)]
net: bridge: fix corrupted ethernet header on multicast-to-unicast

The change from skb_copy to pskb_copy unfortunately changed the data
copying to omit the ethernet header, since it was pulled before reaching
this point. Fix this by calling __skb_push/pull around pskb_copy.

Fixes: 59c878cbcdd8 ("net: bridge: fix multicast-to-unicast with fraglist GSO")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 months agovirtiofs: include a newline in sysfs tag
Brian Foster [Thu, 25 Apr 2024 10:44:00 +0000 (06:44 -0400)]
virtiofs: include a newline in sysfs tag

The internal tag string doesn't contain a newline. Append one when
emitting the tag via sysfs.

[Stefan] Orthogonal to the newline issue, sysfs_emit(buf, "%s", fs->tag) is
needed to prevent format string injection.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Fixes: a8f62f50b4e4 ("virtiofs: export filesystem tags through sysfs")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
4 months agomptcp: only allow set existing scheduler for net.mptcp.scheduler
Gregory Detal [Mon, 6 May 2024 15:35:28 +0000 (17:35 +0200)]
mptcp: only allow set existing scheduler for net.mptcp.scheduler

The current behavior is to accept any strings as inputs, this results in
an inconsistent result where an unexisting scheduler can be set:

  # sysctl -w net.mptcp.scheduler=notdefault
  net.mptcp.scheduler = notdefault

This patch changes this behavior by checking for existing scheduler
before accepting the input.

Fixes: e3b2870b6d22 ("mptcp: add a new sysctl scheduler")
Cc: stable@vger.kernel.org
Signed-off-by: Gregory Detal <gregory.detal@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240506-upstream-net-20240506-mptcp-sched-exist-v1-1-2ed1529e521e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agonfc: nci: Fix kcov check in nci_rx_work()
Tetsuo Handa [Sun, 5 May 2024 10:36:49 +0000 (19:36 +0900)]
nfc: nci: Fix kcov check in nci_rx_work()

Commit 7e8cdc97148c ("nfc: Add KCOV annotations") added
kcov_remote_start_common()/kcov_remote_stop() pair into nci_rx_work(),
with an assumption that kcov_remote_stop() is called upon continue of
the for loop. But commit d24b03535e5e ("nfc: nci: Fix uninit-value in
nci_dev_up and nci_ntf_packet") forgot to call kcov_remote_stop() before
break of the for loop.

Reported-by: syzbot <syzbot+0438378d6f157baae1a2@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2
Fixes: d24b03535e5e ("nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet")
Suggested-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/6d10f829-5a0c-405a-b39a-d7266f3a1a0b@I-love.SAKURA.ne.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agobcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async()
Kent Overstreet [Tue, 7 May 2024 03:11:43 +0000 (23:11 -0400)]
bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix race in bch2_write_super()
Kent Overstreet [Tue, 7 May 2024 00:49:24 +0000 (20:49 -0400)]
bcachefs: Fix race in bch2_write_super()

bch2_write_super() was looping over online devices multiple times -
dropping and retaking io_ref each time.

This meant it could race with device removal; it could increment the
sequence number on a device but fail to write it - and then if the
device was re-added, it would get confused the next time around thinking
a superblock write was silently dropped.

Fix this by taking io_ref once, and stashing pointers to online devices
in a darray.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agoMerge tag 'qcom-arm64-fixes-for-6.9-2' of https://git.kernel.org/pub/scm/linux/kernel...
Arnd Bergmann [Tue, 7 May 2024 06:07:06 +0000 (08:07 +0200)]
Merge tag 'qcom-arm64-fixes-for-6.9-2' of https://git./linux/kernel/git/qcom/linux into arm/fixes

One more Qualcomm Arm64 DeviceTree fix for v6.9

On ths SA8155P automotive platform, the wrong gpio controller is defined
for the SD-card detect pin, which depending on probe ordering of things
cause ethernet to be broken. The card detect pin reference is corrected
to solve this problem.

* tag 'qcom-arm64-fixes-for-6.9-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
  arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration

Link: https://lore.kernel.org/r/20240427153817.1430382-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
4 months agonetlink: specs: Add missing bridge linkinfo attrs
Donald Hunter [Fri, 3 May 2024 16:43:04 +0000 (17:43 +0100)]
netlink: specs: Add missing bridge linkinfo attrs

Attributes for FDB learned entries were added to the if_link netlink api
for bridge linkinfo but are missing from the rt_link.yaml spec. Add the
missing attributes to the spec.

Fixes: ddd1ad68826d ("net: bridge: Add netlink knobs for number / max learned FDB entries")
Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240503164304.87427-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agophonet: fix rtm_phonet_notify() skb allocation
Eric Dumazet [Thu, 2 May 2024 16:17:00 +0000 (16:17 +0000)]
phonet: fix rtm_phonet_notify() skb allocation

fill_route() stores three components in the skb:

- struct rtmsg
- RTA_DST (u8)
- RTA_OIF (u32)

Therefore, rtm_phonet_notify() should use

NLMSG_ALIGN(sizeof(struct rtmsg)) +
nla_total_size(1) +
nla_total_size(4)

Fixes: f062f41d0657 ("Phonet: routing table Netlink interface")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Rémi Denis-Courmont <courmisch@gmail.com>
Link: https://lore.kernel.org/r/20240502161700.1804476-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 months agoMerge tag 'for-6.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Mon, 6 May 2024 20:43:13 +0000 (13:43 -0700)]
Merge tag 'for-6.9-rc7-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Two more fixes, both have some visible effects on user space:

   - add check if quotas are enabled when passing qgroup inheritance
     info, this affects snapper that could fail to create a snapshot

   - do check for leaf/node flag WRITTEN earlier so that nodes are
     completely validated before access, this used to be done by
     integrity checker but it's been removed and left an unhandled case"

* tag 'for-6.9-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: make sure that WRITTEN is set on all metadata blocks
  btrfs: qgroup: do not check qgroup inherit if qgroup is disabled

4 months agoReapply "drm/qxl: simplify qxl_fence_wait"
Linus Torvalds [Mon, 6 May 2024 20:28:59 +0000 (13:28 -0700)]
Reapply "drm/qxl: simplify qxl_fence_wait"

This reverts commit 07ed11afb68d94eadd4ffc082b97c2331307c5ea.

Stephen Rostedt reports:
 "I went to run my tests on my VMs and the tests hung on boot up.
  Unfortunately, the most I ever got out was:

  [   93.607888] Testing event system initcall: OK
  [   93.667730] Running tests on all trace events:
  [   93.669757] Testing all events: OK
  [   95.631064] ------------[ cut here ]------------
  Timed out after 60 seconds"

and further debugging points to a possible circular locking dependency
between the console_owner locking and the worker pool locking.

Reverting the commit allows Steve's VM to boot to completion again.

[ This may obviously result in the "[TTM] Buffer eviction failed"
  messages again, which was the reason for that original revert. But at
  this point this seems preferable to a non-booting system... ]

Reported-and-bisected-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20240502081641.457aa25f@gandalf.local.home/
Acked-by: Maxime Ripard <mripard@kernel.org>
Cc: Alex Constantino <dreaming.about.electric.sheep@gmail.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Timo Lindfors <timo.lindfors@iki.fi>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoPCI/ASPM: Restore parent state to parent, child state to child
Kai-Heng Feng [Mon, 6 May 2024 05:16:02 +0000 (13:16 +0800)]
PCI/ASPM: Restore parent state to parent, child state to child

There's a typo that makes parent device uses child LNKCTL value and vice
versa. This causes Micron NVMe to trigger a reboot upon system resume.

Correct the typo to fix the issue.

Fixes: 64dbb2d70744 ("PCI/ASPM: Disable L1 before configuring L1 Substates")
Link: https://lore.kernel.org/r/20240506051602.1990743-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
[bhelgaas: update subject]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
4 months agoMerge tag 'slab-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 6 May 2024 17:27:58 +0000 (10:27 -0700)]
Merge tag 'slab-for-6.9-rc7-fixes' of git://git./linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:

 - Fix for cleanup infrastructure (Dan Carpenter)

   This makes the __free(kfree) cleanup hooks not crash on error
   pointers.

 - SLUB fix for freepointer checking (Nicolas Bouchinet)

   This fixes a recently introduced bug that manifests when
   init_on_free, CONFIG_SLAB_FREELIST_HARDENED and consistency checks
   (slub_debug=F) are all enabled, and results in false-positive
   freepointer corrupt reports for caches that store freepointer outside
   of the object area.

* tag 'slab-for-6.9-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/slab: make __free(kfree) accept error pointers
  mm/slub: avoid zeroing outside-object freepointer for single free

4 months agoMerge tag 'auxdisplay-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy...
Linus Torvalds [Mon, 6 May 2024 16:48:46 +0000 (09:48 -0700)]
Merge tag 'auxdisplay-v6.10-1' of git://git./linux/kernel/git/andy/linux-auxdisplay

Pull auxdisplay fixes from Andy Shevchenko:

 - A couple of non-critical build fixes to Character LCD library

 - Miscellaneous fixes here and there

* tag 'auxdisplay-v6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-auxdisplay:
  auxdisplay: charlcd: Don't rebuild when CONFIG_PANEL_BOOT_MESSAGE=y
  auxdisplay: charlcd: Add missing MODULE_DESCRIPTION()
  auxdisplay: seg-led-gpio: Convert to platform remove callback returning void
  auxdisplay: linedisp: Group display drivers together

4 months agobcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX
Kent Overstreet [Mon, 6 May 2024 13:10:29 +0000 (09:10 -0400)]
bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX

Define a constant for the max superblock size, to avoid a too-large
shift.

Reported-by: syzbot+a8b0fb419355c91dda7f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Add missing skcipher_request_set_callback() call
Kent Overstreet [Mon, 6 May 2024 12:40:46 +0000 (08:40 -0400)]
bcachefs: Add missing skcipher_request_set_callback() call

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode()
Kent Overstreet [Mon, 6 May 2024 02:56:54 +0000 (22:56 -0400)]
bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode()

bch2_fs_quota_read_inode() wasn't entirely updated to the
bch2_snapshot_tree() helper, which takes rcu lock.

Reported-by: syzbot+a3a9a61224ed3b7f0010@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix shift-by-64 in bformat_needs_redo()
Kent Overstreet [Mon, 6 May 2024 02:44:27 +0000 (22:44 -0400)]
bcachefs: Fix shift-by-64 in bformat_needs_redo()

Ancient versions of bcachefs produced packed formats that could
represent keys that our in memory format cannot represent;
bformat_needs_redo() has some tricky shifts to check for this sort of
overflow.

Reported-by: syzbot+594427aebfefeebe91c6@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Guard against unknown k.k->type in __bkey_invalid()
Kent Overstreet [Mon, 6 May 2024 02:33:05 +0000 (22:33 -0400)]
bcachefs: Guard against unknown k.k->type in __bkey_invalid()

For forwards compatibility we have to allow unknown key types, and only
run the checks that make sense against them.

Fix a missing guard on k.k->type being known.

Reported-by: syzbot+ae4dc916da3ce51f284f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Add missing validation for superblock section clean
Kent Overstreet [Mon, 6 May 2024 02:28:00 +0000 (22:28 -0400)]
bcachefs: Add missing validation for superblock section clean

We were forgetting to check for jset entries that overrun the end of the
section - both in validate and to_text(); to_text() needs to be safe for
types that fail to validate.

Reported-by: syzbot+c48865e11e7e893ec4ab@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix assert in bch2_alloc_v4_invalid()
Kent Overstreet [Mon, 6 May 2024 02:02:28 +0000 (22:02 -0400)]
bcachefs: Fix assert in bch2_alloc_v4_invalid()

Reported-by: syzbot+10827fa6b176e1acf1d0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: fix overflow in fiemap
Reed Riley [Sat, 4 May 2024 22:12:23 +0000 (22:12 +0000)]
bcachefs: fix overflow in fiemap

filefrag (and potentially other utilities that call fiemap) sometimes
pass ULONG_MAX as the length.  fiemap_prep clamps excessively large
lengths - but the calculation of end can overflow if it occurs before
calling fiemap_prep.  When this happens, filefrag assumes it has read to
the end and exits.

Signed-off-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Add a better limit for maximum number of buckets
Kent Overstreet [Sat, 4 May 2024 17:26:37 +0000 (13:26 -0400)]
bcachefs: Add a better limit for maximum number of buckets

The bucket_gens array is a single array allocation (one byte per
bucket), and kernel allocations are still limited to INT_MAX.

Check this limit to avoid failing the bucket_gens array allocation.

Reported-by: syzbot+b29f436493184ea42e2b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix lifetime issue in device iterator helpers
Kent Overstreet [Sat, 4 May 2024 16:55:44 +0000 (12:55 -0400)]
bcachefs: Fix lifetime issue in device iterator helpers

bch2_get_next_dev() and bch2_get_next_online_dev() iterate over devices,
dropping and taking refs as they go; we can't access the previous device
(for ca->dev_idx) after we've dropped our ref to it, unless we take
rcu_read_lock() first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix bch2_dev_lookup() refcounting
Kent Overstreet [Sat, 4 May 2024 16:51:49 +0000 (12:51 -0400)]
bcachefs: Fix bch2_dev_lookup() refcounting

bch2_dev_lookup() is supposed to take a ref on the device it returns, but
for_each_member_device() takes refs as it iterates,
for_each_member_device_rcu() does not.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Initialize bch_write_op->failed in inline data path
Kent Overstreet [Sat, 4 May 2024 16:29:46 +0000 (12:29 -0400)]
bcachefs: Initialize bch_write_op->failed in inline data path

Normally this is initialized in __bch2_write(), which is executed in a
loop, but the inline data path skips this.

Reported-by: syzbot+fd3ccb331eb21f05d13b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix refcount put in sb_field_resize error path
Kent Overstreet [Fri, 3 May 2024 21:13:21 +0000 (17:13 -0400)]
bcachefs: Fix refcount put in sb_field_resize error path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Inodes need extra padding for varint_decode_fast()
Kent Overstreet [Fri, 3 May 2024 15:31:22 +0000 (11:31 -0400)]
bcachefs: Inodes need extra padding for varint_decode_fast()

Reported-by: syzbot+66b9b74f6520068596a9@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()
Kent Overstreet [Fri, 3 May 2024 15:39:53 +0000 (11:39 -0400)]
bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit()

Reported-by: syzbot+a35cdb62ec34d44fb062@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: bucket_pos_to_bp_noerror()
Kent Overstreet [Fri, 3 May 2024 15:06:54 +0000 (11:06 -0400)]
bcachefs: bucket_pos_to_bp_noerror()

We don't want the assert when we're checking if the backpointer is
valid.

Reported-by: syzbot+bf7215c0525098e7747a@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: don't free error pointers
Kent Overstreet [Fri, 3 May 2024 14:55:17 +0000 (10:55 -0400)]
bcachefs: don't free error pointers

Reported-by: syzbot+3333603f569fc2ef258c@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agobcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()
Kent Overstreet [Mon, 6 May 2024 14:14:13 +0000 (10:14 -0400)]
bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()

We're using mutex_lock() inside a wait_event() conditional -
prepare_to_wait() has already flipped task state, so potentially
blocking ops need annotation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 months agonet: fix out-of-bounds access in ops_init
Thadeu Lima de Souza Cascardo [Thu, 2 May 2024 13:20:06 +0000 (10:20 -0300)]
net: fix out-of-bounds access in ops_init

net_alloc_generic is called by net_alloc, which is called without any
locking. It reads max_gen_ptrs, which is changed under pernet_ops_rwsem. It
is read twice, first to allocate an array, then to set s.len, which is
later used to limit the bounds of the array access.

It is possible that the array is allocated and another thread is
registering a new pernet ops, increments max_gen_ptrs, which is then used
to set s.len with a larger than allocated length for the variable array.

Fix it by reading max_gen_ptrs only once in net_alloc_generic. If
max_gen_ptrs is later incremented, it will be caught in net_assign_generic.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Fixes: 073862ba5d24 ("netns: fix net_alloc_generic()")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240502132006.3430840-1-cascardo@igalia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 months agoLinux 6.9-rc7 v6.9-rc7
Linus Torvalds [Sun, 5 May 2024 21:06:01 +0000 (14:06 -0700)]
Linux 6.9-rc7

4 months agoepoll: be better about file lifetimes
Linus Torvalds [Fri, 3 May 2024 20:36:09 +0000 (13:36 -0700)]
epoll: be better about file lifetimes

epoll can call out to vfs_poll() with a file pointer that may race with
the last 'fput()'. That would make f_count go down to zero, and while
the ep->mtx locking means that the resulting file pointer tear-down will
be blocked until the poll returns, it means that f_count is already
dead, and any use of it won't actually get a reference to the file any
more: it's dead regardless.

Make sure we have a valid ref on the file pointer before we call down to
vfs_poll() from the epoll routines.

Link: https://lore.kernel.org/lkml/0000000000002d631f0615918f1e@google.com/
Reported-by: syzbot+045b454ab35fd82a35fb@syzkaller.appspotmail.com
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMerge tag 'edac_urgent_for_v6.9_rc7' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 5 May 2024 17:51:29 +0000 (10:51 -0700)]
Merge tag 'edac_urgent_for_v6.9_rc7' of git://git./linux/kernel/git/ras/ras

Pull EDAC fixes from Borislav Petkov:

 - Fix error logging and check user-supplied data when injecting an
   error in the versal EDAC driver

* tag 'edac_urgent_for_v6.9_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Do not log total error counts
  EDAC/versal: Check user-supplied data before injecting an error
  EDAC/versal: Do not register for NOC errors

4 months agoMerge tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 5 May 2024 17:44:04 +0000 (10:44 -0700)]
Merge tag 'powerpc-6.9-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix incorrect delay handling in the plpks (keystore) code

 - Fix a panic when an LPAR boots with a frozen PE

Thanks to Andrew Donnellan, Gaurav Batra, Nageswara R Sastry, and Nayna
Jain.

* tag 'powerpc-6.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries/iommu: LPAR panics during boot up with a frozen PE
  powerpc/pseries: make max polling consistent for longer H_CALLs

4 months agoMerge tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 May 2024 17:17:05 +0000 (10:17 -0700)]
Merge tag 'x86-urgent-2024-05-05' of git://git./linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Remove the broken vsyscall emulation code from
   the page fault code

 - Fix kexec crash triggered by certain SEV RMP
   table layouts

 - Fix unchecked MSR access error when disabling
   the x2APIC via iommu=off

* tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove broken vsyscall emulation code from the page fault code
  x86/apic: Don't access the APIC when disabling x2APIC
  x86/sev: Add callback to apply RMP table fixups for kexec
  x86/e820: Add a new e820 table update helper

4 months agoMerge tag 'irq-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 May 2024 17:12:32 +0000 (10:12 -0700)]
Merge tag 'irq-urgent-2024-05-05' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "Fix suspicious RCU usage in __do_softirq()"

* tag 'irq-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  softirq: Fix suspicious RCU usage in __do_softirq()

4 months agoMerge tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 5 May 2024 17:08:52 +0000 (10:08 -0700)]
Merge tag 'char-misc-6.9-rc7' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc/other driver fixes and new device ids
  for 6.9-rc7 that resolve some reported problems.

  Included in here are:

   - iio driver fixes

   - mei driver fix and new device ids

   - dyndbg bugfix

   - pvpanic-pci driver bugfix

   - slimbus driver bugfix

   - fpga new device id

  All have been in linux-next with no reported problems"

* tag 'char-misc-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  slimbus: qcom-ngd-ctrl: Add timeout for wait operation
  dyndbg: fix old BUG_ON in >control parser
  misc/pvpanic-pci: register attributes via pci_driver
  fpga: dfl-pci: add PCI subdevice ID for Intel D5005 card
  mei: me: add lunar lake point M DID
  mei: pxp: match against PCI_CLASS_DISPLAY_OTHER
  iio:imu: adis16475: Fix sync mode setting
  iio: accel: mxc4005: Reset chip on probe() and resume()
  iio: accel: mxc4005: Interrupt handling fixes
  dt-bindings: iio: health: maxim,max30102: fix compatible check
  iio: pressure: Fixes SPI support for BMP3xx devices
  iio: pressure: Fixes BME280 SPI driver data

4 months agoMerge tag 'usb-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 5 May 2024 17:04:44 +0000 (10:04 -0700)]
Merge tag 'usb-6.9-rc7' of git://git./linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
 "Here are some small USB driver fixes for reported problems for
  6.9-rc7. Included in here are:

   - usb core fixes for found issues

   - typec driver fixes for reported problems

   - usb gadget driver fixes for reported problems

   - xhci build fixes

   - dwc3 driver fixes for reported issues

  All of these have been in linux-next this past week with no reported
  problems"

* tag 'usb-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: Check for port partner validity before consuming it
  usb: typec: tcpm: enforce ready state when queueing alt mode vdm
  usb: typec: tcpm: unregister existing source caps before re-registration
  usb: typec: tcpm: clear pd_event queue in PORT_RESET
  usb: typec: tcpm: queue correct sop type in tcpm_queue_vdm_unlocked
  usb: Fix regression caused by invalid ep0 maxpacket in virtual SuperSpeed device
  usb: ohci: Prevent missed ohci interrupts
  usb: typec: qcom-pmic: fix pdphy start() error handling
  usb: typec: qcom-pmic: fix use-after-free on late probe errors
  usb: gadget: f_fs: Fix a race condition when processing setup packets.
  USB: core: Fix access violation during port device removal
  usb: dwc3: core: Prevent phy suspend during init
  usb: xhci-plat: Don't include xhci.h
  usb: gadget: uvc: use correct buffer size when parsing configfs lists
  usb: gadget: composite: fix OS descriptors w_value logic
  usb: gadget: f_fs: Fix race between aio_cancel() and AIO request complete

4 months agoMerge tag 'input-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor...
Linus Torvalds [Sun, 5 May 2024 17:00:47 +0000 (10:00 -0700)]
Merge tag 'input-for-v6.9-rc6' of git://git./linux/kernel/git/dtor/input

Pull input fixes from Dmitry Torokhov:

 - a new ID for ASUS ROG RAIKIRI controllers added to xpad driver

 - amimouse driver structure annotated with __refdata to prevent section
   mismatch warnings.

* tag 'input-for-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: amimouse - mark driver struct with __refdata to prevent section mismatch
  Input: xpad - add support for ASUS ROG RAIKIRI

4 months agoMerge tag 'probes-fixes-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 May 2024 16:56:50 +0000 (09:56 -0700)]
Merge tag 'probes-fixes-v6.9-rc6' of git://git./linux/kernel/git/trace/linux-trace

Pull probes fix from Masami Hiramatsu:

 - probe-events: Fix memory leak in parsing probe argument.

   There is a memory leak (forget to free an allocated buffer) in a
   memory allocation failure path. Fix it to jump to the correct error
   handling code.

* tag 'probes-fixes-v6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Fix memory leak in traceprobe_parse_probe_arg_body()

4 months agoMerge tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace...
Linus Torvalds [Sun, 5 May 2024 16:53:09 +0000 (09:53 -0700)]
Merge tag 'trace-v6.9-rc6-2' of git://git./linux/kernel/git/trace/linux-trace

Pull tracing and tracefs fixes from Steven Rostedt:

 - Fix RCU callback of freeing an eventfs_inode.

   The freeing of the eventfs_inode from the kref going to zero freed
   the contents of the eventfs_inode and then used kfree_rcu() to free
   the inode itself. But the contents should also be protected by RCU.
   Switch to a call_rcu() that calls a function to free all of the
   eventfs_inode after the RCU synchronization.

 - The tracing subsystem maps its own descriptor to a file represented
   by eventfs. The freeing of this descriptor needs to know when the
   last reference of an eventfs_inode is released, but currently there
   is no interface for that.

   Add a "release" callback to the eventfs_inode entry array that allows
   for freeing of data that can be referenced by the eventfs_inode being
   opened. Then increment the ref counter for this descriptor when the
   eventfs_inode file is created, and decrement/free it when the last
   reference to the eventfs_inode is released and the file is removed.
   This prevents races between freeing the descriptor and the opening of
   the eventfs file.

 - Fix the permission processing of eventfs.

   The change to make the permissions of eventfs default to the mount
   point but keep track of when changes were made had a side effect that
   could cause security concerns. When the tracefs is remounted with a
   given gid or uid, all the files within it should inherit that gid or
   uid. But if the admin had changed the permission of some file within
   the tracefs file system, it would not get updated by the remount.

   This caused the kselftest of file permissions to fail the second time
   it is run. The first time, all changes would look fine, but the
   second time, because the changes were "saved", the remount did not
   reset them.

   Create a link list of all existing tracefs inodes, and clear the
   saved flags on them on a remount if the remount changes the
   corresponding gid or uid fields.

   This also simplifies the code by removing the distinction between the
   toplevel eventfs and an instance eventfs. They should both act the
   same. They were different because of a misconception due to the
   remount not resetting the flags. Now that remount resets all the
   files and directories to default to the root node if a uid/gid is
   specified, it makes the logic simpler to implement.

* tag 'trace-v6.9-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Have "events" directory get permissions from its parent
  eventfs: Do not treat events directory different than other directories
  eventfs: Do not differentiate the toplevel events directory
  tracefs: Still use mount point as default permissions for instances
  tracefs: Reset permissions on remount if permissions are options
  eventfs: Free all of the eventfs_inode after RCU
  eventfs/tracing: Add callback for release of an eventfs_inode

4 months agoMerge tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sun, 5 May 2024 16:49:21 +0000 (09:49 -0700)]
Merge tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:

 - fix the combination of restricted pools and dynamic swiotlb
   (Will Deacon)

* tag 'dma-mapping-6.9-2024-05-04' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: initialise restricted pool list_head when SWIOTLB_DYNAMIC=y

4 months agoMerge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 5 May 2024 16:37:10 +0000 (09:37 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "A handful of clk driver fixes:

   - Avoid a deadlock in the Qualcomm clk driver by making the regulator
     which supplies the GDSC optional

   - Restore RPM clks on Qualcomm msm8976 by setting num_clks

   - Fix Allwinner H6 CPU rate changing logic to avoid system crashes by
     temporarily reparenting the CPU clk to something that isn't being
     changed

   - Set a MIPI PLL min/max rate on Allwinner A64 to fix blank screens
     on some devices

   - Revert back to of_match_device() in the Samsung clkout driver to
     get the match data based on the parent device's compatible string"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: samsung: Revert "clk: Use device_get_match_data()"
  clk: sunxi-ng: a64: Set minimum and maximum rate for PLL-MIPI
  clk: sunxi-ng: common: Support minimum and maximum rate
  clk: sunxi-ng: h6: Reparent CPUX during PLL CPUX rate change
  clk: qcom: smd-rpm: Restore msm8976 num_clk
  clk: qcom: gdsc: treat optional supplies as optional

4 months agoksmbd: do not grant v2 lease if parent lease key and epoch are not set
Namjae Jeon [Wed, 1 May 2024 12:58:15 +0000 (21:58 +0900)]
ksmbd: do not grant v2 lease if parent lease key and epoch are not set

This patch fix xfstests generic/070 test with smb2 leases = yes.

cifs.ko doesn't set parent lease key and epoch in create context v2 lease.
ksmbd suppose that parent lease and epoch are vaild if data length is
v2 lease context size and handle directory lease using this values.
ksmbd should hanle it as v1 lease not v2 lease if parent lease key and
epoch are not set in create context v2 lease.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoksmbd: use rwsem instead of rwlock for lease break
Namjae Jeon [Thu, 2 May 2024 01:07:50 +0000 (10:07 +0900)]
ksmbd: use rwsem instead of rwlock for lease break

lease break wait for lease break acknowledgment.
rwsem is more suitable than unlock while traversing the list for parent
lease break in ->m_op_list.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoksmbd: avoid to send duplicate lease break notifications
Namjae Jeon [Wed, 1 May 2024 12:44:02 +0000 (21:44 +0900)]
ksmbd: avoid to send duplicate lease break notifications

This patch fixes generic/011 when enable smb2 leases.

if ksmbd sends multiple notifications for a file, cifs increments
the reference count of the file but it does not decrement the count by
the failure of queue_work.
So even if the file is closed, cifs does not send a SMB2_CLOSE request.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoksmbd: off ipv6only for both ipv4/ipv6 binding
Namjae Jeon [Wed, 1 May 2024 12:41:50 +0000 (21:41 +0900)]
ksmbd: off ipv6only for both ipv4/ipv6 binding

ΕΛΕΝΗ reported that ksmbd binds to the IPV6 wildcard (::) by default for
ipv4 and ipv6 binding. So IPV4 connections are successful only when
the Linux system parameter bindv6only is set to 0 [default value].
If this parameter is set to 1, then the ipv6 wildcard only represents
any IPV6 address. Samba creates different sockets for ipv4 and ipv6
by default. This patch off sk_ipv6only to support IPV4/IPV6 connections
without creating two sockets.

Cc: stable@vger.kernel.org
Reported-by: ΕΛΕΝΗ ΤΖΑΒΕΛΛΑ <helentzavellas@yahoo.gr>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoeventfs: Have "events" directory get permissions from its parent
Steven Rostedt (Google) [Thu, 2 May 2024 20:08:27 +0000 (16:08 -0400)]
eventfs: Have "events" directory get permissions from its parent

The events directory gets its permissions from the root inode. But this
can cause an inconsistency if the instances directory changes its
permissions, as the permissions of the created directories under it should
inherit the permissions of the instances directory when directories under
it are created.

Currently the behavior is:

 # cd /sys/kernel/tracing
 # chgrp 1002 instances
 # mkdir instances/foo
 # ls -l instances/foo
[..]
 -r--r-----  1 root lkp  0 May  1 18:55 buffer_total_size_kb
 -rw-r-----  1 root lkp  0 May  1 18:55 current_tracer
 -rw-r-----  1 root lkp  0 May  1 18:55 error_log
 drwxr-xr-x  1 root root 0 May  1 18:55 events
 --w-------  1 root lkp  0 May  1 18:55 free_buffer
 drwxr-x---  2 root lkp  0 May  1 18:55 options
 drwxr-x--- 10 root lkp  0 May  1 18:55 per_cpu
 -rw-r-----  1 root lkp  0 May  1 18:55 set_event

All the files and directories under "foo" has the "lkp" group except the
"events" directory. That's because its getting its default value from the
mount point instead of its parent.

Have the "events" directory make its default value based on its parent's
permissions. That now gives:

 # ls -l instances/foo
[..]
 -rw-r-----  1 root lkp 0 May  1 21:16 buffer_subbuf_size_kb
 -r--r-----  1 root lkp 0 May  1 21:16 buffer_total_size_kb
 -rw-r-----  1 root lkp 0 May  1 21:16 current_tracer
 -rw-r-----  1 root lkp 0 May  1 21:16 error_log
 drwxr-xr-x  1 root lkp 0 May  1 21:16 events
 --w-------  1 root lkp 0 May  1 21:16 free_buffer
 drwxr-x---  2 root lkp 0 May  1 21:16 options
 drwxr-x--- 10 root lkp 0 May  1 21:16 per_cpu
 -rw-r-----  1 root lkp 0 May  1 21:16 set_event

Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.161887248@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agoeventfs: Do not treat events directory different than other directories
Steven Rostedt (Google) [Thu, 2 May 2024 20:08:26 +0000 (16:08 -0400)]
eventfs: Do not treat events directory different than other directories

Treat the events directory the same as other directories when it comes to
permissions. The events directory was considered different because it's
dentry is persistent, whereas the other directory dentries are created
when accessed. But the way tracefs now does its ownership by using the
root dentry's permissions as the default permissions, the events directory
can get out of sync when a remount is performed setting the group and user
permissions.

Remove the special case for the events directory on setting the
attributes. This allows the updates caused by remount to work properly as
well as simplifies the code.

Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.002923579@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agoeventfs: Do not differentiate the toplevel events directory
Steven Rostedt (Google) [Thu, 2 May 2024 20:08:25 +0000 (16:08 -0400)]
eventfs: Do not differentiate the toplevel events directory

The toplevel events directory is really no different than the events
directory of instances. Having the two be different caused
inconsistencies and made it harder to fix the permissions bugs.

Make all events directories act the same.

Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.846448710@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agotracefs: Still use mount point as default permissions for instances
Steven Rostedt (Google) [Thu, 2 May 2024 20:08:24 +0000 (16:08 -0400)]
tracefs: Still use mount point as default permissions for instances

If the instances directory's permissions were never change, then have it
and its children use the mount point permissions as the default.

Currently, the permissions of instance directories are determined by the
instance directory's permissions itself. But if the tracefs file system is
remounted and changes the permissions, the instance directory and its
children should use the new permission.

But because both the instance directory and its children use the instance
directory's inode for permissions, it misses the update.

To demonstrate this:

  # cd /sys/kernel/tracing/
  # mkdir instances/foo
  # ls -ld instances/foo
 drwxr-x--- 5 root root 0 May  1 19:07 instances/foo
  # ls -ld instances
 drwxr-x--- 3 root root 0 May  1 18:57 instances
  # ls -ld current_tracer
 -rw-r----- 1 root root 0 May  1 18:57 current_tracer

  # mount -o remount,gid=1002 .
  # ls -ld instances
 drwxr-x--- 3 root root 0 May  1 18:57 instances
  # ls -ld instances/foo/
 drwxr-x--- 5 root root 0 May  1 19:07 instances/foo/
  # ls -ld current_tracer
 -rw-r----- 1 root lkp 0 May  1 18:57 current_tracer

Notice that changing the group id to that of "lkp" did not affect the
instances directory nor its children. It should have been:

  # ls -ld current_tracer
 -rw-r----- 1 root root 0 May  1 19:19 current_tracer
  # ls -ld instances/foo/
 drwxr-x--- 5 root root 0 May  1 19:25 instances/foo/
  # ls -ld instances
 drwxr-x--- 3 root root 0 May  1 19:19 instances

  # mount -o remount,gid=1002 .
  # ls -ld current_tracer
 -rw-r----- 1 root lkp 0 May  1 19:19 current_tracer
  # ls -ld instances
 drwxr-x--- 3 root lkp 0 May  1 19:19 instances
  # ls -ld instances/foo/
 drwxr-x--- 5 root lkp 0 May  1 19:25 instances/foo/

Where all files were updated by the remount gid update.

Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.686838327@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agotracefs: Reset permissions on remount if permissions are options
Steven Rostedt (Google) [Thu, 2 May 2024 20:08:23 +0000 (16:08 -0400)]
tracefs: Reset permissions on remount if permissions are options

There's an inconsistency with the way permissions are handled in tracefs.
Because the permissions are generated when accessed, they default to the
root inode's permission if they were never set by the user. If the user
sets the permissions, then a flag is set and the permissions are saved via
the inode (for tracefs files) or an internal attribute field (for
eventfs).

But if a remount happens that specify the permissions, all the files that
were not changed by the user gets updated, but the ones that were are not.
If the user were to remount the file system with a given permission, then
all files and directories within that file system should be updated.

This can cause security issues if a file's permission was updated but the
admin forgot about it. They could incorrectly think that remounting with
permissions set would update all files, but miss some.

For example:

 # cd /sys/kernel/tracing
 # chgrp 1002 current_tracer
 # ls -l
[..]
 -rw-r-----  1 root root 0 May  1 21:25 buffer_size_kb
 -rw-r-----  1 root root 0 May  1 21:25 buffer_subbuf_size_kb
 -r--r-----  1 root root 0 May  1 21:25 buffer_total_size_kb
 -rw-r-----  1 root lkp  0 May  1 21:25 current_tracer
 -rw-r-----  1 root root 0 May  1 21:25 dynamic_events
 -r--r-----  1 root root 0 May  1 21:25 dyn_ftrace_total_info
 -r--r-----  1 root root 0 May  1 21:25 enabled_functions

Where current_tracer now has group "lkp".

 # mount -o remount,gid=1001 .
 # ls -l
 -rw-r-----  1 root tracing 0 May  1 21:25 buffer_size_kb
 -rw-r-----  1 root tracing 0 May  1 21:25 buffer_subbuf_size_kb
 -r--r-----  1 root tracing 0 May  1 21:25 buffer_total_size_kb
 -rw-r-----  1 root lkp     0 May  1 21:25 current_tracer
 -rw-r-----  1 root tracing 0 May  1 21:25 dynamic_events
 -r--r-----  1 root tracing 0 May  1 21:25 dyn_ftrace_total_info
 -r--r-----  1 root tracing 0 May  1 21:25 enabled_functions

Everything changed but the "current_tracer".

Add a new link list that keeps track of all the tracefs_inodes which has
the permission flags that tell if the file/dir should use the root inode's
permission or not. Then on remount, clear all the flags so that the
default behavior of using the root inode's permission is done for all
files and directories.

Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.529542160@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agoeventfs: Free all of the eventfs_inode after RCU
Steven Rostedt (Google) [Thu, 2 May 2024 20:08:22 +0000 (16:08 -0400)]
eventfs: Free all of the eventfs_inode after RCU

The freeing of eventfs_inode via a kfree_rcu() callback. But the content
of the eventfs_inode was being freed after the last kref. This is
dangerous, as changes are being made that can access the content of an
eventfs_inode from an RCU loop.

Instead of using kfree_rcu() use call_rcu() that calls a function to do
all the freeing of the eventfs_inode after a RCU grace period has expired.

Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.370261163@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
4 months agoeventfs/tracing: Add callback for release of an eventfs_inode
Steven Rostedt (Google) [Thu, 2 May 2024 13:03:15 +0000 (09:03 -0400)]
eventfs/tracing: Add callback for release of an eventfs_inode

Synthetic events create and destroy tracefs files when they are created
and removed. The tracing subsystem has its own file descriptor
representing the state of the events attached to the tracefs files.
There's a race between the eventfs files and this file descriptor of the
tracing system where the following can cause an issue:

With two scripts 'A' and 'B' doing:

  Script 'A':
    echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
    while :
    do
      echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
    done

  Script 'B':
    echo > /sys/kernel/tracing/synthetic_events

Script 'A' creates a synthetic event "hello" and then just writes zero
into its enable file.

Script 'B' removes all synthetic events (including the newly created
"hello" event).

What happens is that the opening of the "enable" file has:

 {
struct trace_event_file *file = inode->i_private;
int ret;

ret = tracing_check_open_get_tr(file->tr);
 [..]

But deleting the events frees the "file" descriptor, and a "use after
free" happens with the dereference at "file->tr".

The file descriptor does have a reference counter, but there needs to be a
way to decrement it from the eventfs when the eventfs_inode is removed
that represents this file descriptor.

Add an optional "release" callback to the eventfs_entry array structure,
that gets called when the eventfs file is about to be removed. This allows
for the creating on the eventfs file to increment the tracing file
descriptor ref counter. When the eventfs file is deleted, it can call the
release function that will call the put function for the tracing file
descriptor.

This will protect the tracing file from being freed while a eventfs file
that references it is being opened.

Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.Wu@mediatek.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Tze-nan wu <Tze-nan.Wu@mediatek.com>
Tested-by: Tze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>