git.kernel.dk Git - linux-2.6-block.git/log

Merge branch 'mctp-serial-tx-escapes'

Matt Johnston says:

====================
net: mctp-serial: Fix for missing tx escapes

The mctp-serial code to add escape characters was incorrect due to an
off-by-one error. This series adds a test for the chunking which splits
by escape characters, and fixes the bug.

v2: Fix kunit param const pointer
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mctp-serial: Fix missing escapes on transmit

0x7d and 0x7e bytes are meant to be escaped in the data portion of
frames, but this didn't occur since next_chunk_len() had an off-by-one
error. That also resulted in the final byte of a payload being written
as a separate tty write op.

The chunk prior to an escaped byte would be one byte short, and the
next call would never test the txpos+1 case, which is where the escaped
byte was located. That meant it never hit the escaping case in
mctp_serial_tx_work().

Example Input: 01 00 08 c8 7e 80 02

Previous incorrect chunks from next_chunk_len():

01 00 08
c8 7e 80
02

With this fix:

01 00 08 c8
7e
80 02

Cc: stable@vger.kernel.org
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mctp-serial: Add kunit test for next_chunk_len()

Test various edge cases of inputs that contain characters
that need escaping.

This adds a new kunit suite for mctp-serial.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bpf: fix return value of tcp_bpf_sendmsg()

When we cork messages in psock->cork, the last message triggers the
flushing will result in sending a sk_msg larger than the current
message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
negative at least in the following case:

468         case __SK_DROP:
469         default:
470                 sk_msg_free_partial(sk, msg, tosend);
471                 sk_msg_apply_bytes(psock, tosend);
472                 *copied -= (tosend + delta); // <==== HERE
473                 return -EACCES;

Therefore, it could lead to the following BUG with a proper value of
'copied' (thanks to syzbot). We should not use negative 'copied' as a
return value here.

  ------------[ cut here ]------------
  kernel BUG at net/socket.c:733!
  Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
  Hardware name: linux,dummy-virt (DT)
  pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
  pc : sock_sendmsg_nosec net/socket.c:733 [inline]
  pc : sock_sendmsg_nosec net/socket.c:728 [inline]
  pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
  lr : sock_sendmsg_nosec net/socket.c:730 [inline]
  lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
  sp : ffff800088ea3b30
  x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
  x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
  x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
  x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
  x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
  x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
  x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
  x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
  x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
  Call trace:
   sock_sendmsg_nosec net/socket.c:733 [inline]
   __sock_sendmsg+0x5c/0x60 net/socket.c:745
   ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
   ___sys_sendmsg+0xac/0x100 net/socket.c:2651
   __sys_sendmsg+0x84/0xe0 net/socket.c:2680
   __do_sys_sendmsg net/socket.c:2689 [inline]
   __se_sys_sendmsg net/socket.c:2687 [inline]
   __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
   invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
   el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
   do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
   el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
   el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
   el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
  Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
  ---[ end trace 0000000000000000 ]---

Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20240821030744.320934-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/smc: prevent NULL pointer dereference in txopt_get

Since smc_inet6_prot does not initialize ipv6_pinfo_offset, inet6_create()
copies an incorrect address value, sk + 0 (offset), to inet_sk(sk)->pinet6.

In addition, since inet_sk(sk)->pinet6 and smc_sk(sk)->clcsock practically
point to the same address, when smc_create_clcsk() stores the newly
created clcsock in smc_sk(sk)->clcsock, inet_sk(sk)->pinet6 is corrupted
into clcsock. This causes NULL pointer dereference and various other
memory corruptions.

To solve this problem, you need to initialize ipv6_pinfo_offset, add a
smc6_sock structure, and then add ipv6_pinfo as the second member of
the smc_sock structure.

Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: d25a92ccae6b ("net/smc: Introduce IPPROTO_SMC")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-08-28 (igb, ice)

This series contains updates to igb and ice drivers.

Daiwei Li restores writing the TSICR (TimeSync Interrupt Cause)
register on 82850 devices to workaround a hardware issue for igb.

Dawid detaches netdev device for reset to avoid ethtool accesses during
reset causing NULL pointer dereferences on ice.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Add netif_device_attach/detach into PF reset flow
igb: Fix not clearing TimeSync interrupts for 82580
====================

Link: https://patch.msgid.link/20240828225444.645154-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: exclude bluetooth and wireless DT bindings from netdev ML

We exclude wireless drivers from the netdev@ traffic, to delegate
it to linux-wireless@, and avoid overwhelming netdev@.
Bluetooth drivers are implicitly excluded because they live under
drivers/bluetooth, not drivers/net.

In both cases DT bindings sit under Documentation/devicetree/bindings/net/
and aren't excluded. So if a patch series touches DT bindings
netdev@ ends up getting CCed, and these are usually fairly boring
series.

Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240828175821.2960423-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.11-rc6' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth, wireless and netfilter.

  No known outstanding regressions.

  Current release - regressions:

   - wifi: iwlwifi: fix hibernation

   - eth: ionic: prevent tx_timeout due to frequent doorbell ringing

  Previous releases - regressions:

   - sched: fix sch_fq incorrect behavior for small weights

   - wifi:
      - iwlwifi: take the mutex before running link selection
      - wfx: repair open network AP mode

   - netfilter: restore IP sanity checks for netdev/egress

   - tcp: fix forever orphan socket caused by tcp_abort

   - mptcp: close subflow when receiving TCP+FIN

   - bluetooth: fix random crash seen while removing btnxpuart driver

  Previous releases - always broken:

   - mptcp: more fixes for the in-kernel PM

   - eth: bonding: change ipsec_lock from spin lock to mutex

   - eth: mana: fix race of mana_hwc_post_rx_wqe and new hwc response

  Misc:

   - documentation: drop special comment style for net code"

* tag 'net-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
  nfc: pn533: Add poll mod list filling check
  mailmap: update entry for Sriram Yagnaraman
  selftests: mptcp: join: check re-re-adding ID 0 signal
  mptcp: pm: ADD_ADDR 0 is not a new address
  selftests: mptcp: join: validate event numbers
  mptcp: avoid duplicated SUB_CLOSED events
  selftests: mptcp: join: check re-re-adding ID 0 endp
  mptcp: pm: fix ID 0 endp usage after multiple re-creations
  mptcp: pm: do not remove already closed subflows
  selftests: mptcp: join: no extra msg if no counter
  selftests: mptcp: join: check re-adding init endp with != id
  mptcp: pm: reset MPC endp ID when re-added
  mptcp: pm: skip connecting to already established sf
  mptcp: pm: send ACK on an active subflow
  selftests: mptcp: join: check removing ID 0 endpoint
  mptcp: pm: fix RM_ADDR ID for the initial subflow
  mptcp: pm: reuse ID 0 after delete and re-add
  net: busy-poll: use ktime_get_ns() instead of local_clock()
  sctp: fix association labeling in the duplicate COOKIE-ECHO case
  mptcp: pr_debug: add missing \n at the end
  ...

nfc: pn533: Add poll mod list filling check

In case of im_protocols value is 1 and tm_protocols value is 0 this
combination successfully passes the check
'if (!im_protocols && !tm_protocols)' in the nfc_start_poll().
But then after pn533_poll_create_mod_list() call in pn533_start_poll()
poll mod list will remain empty and dev->poll_mod_count will remain 0
which lead to division by zero.

Normally no im protocol has value 1 in the mask, so this combination is
not expected by driver. But these protocol values actually come from
userspace via Netlink interface (NFC_CMD_START_POLL operation). So a
broken or malicious program may pass a message containing a "bad"
combination of protocol parameter values so that dev->poll_mod_count
is not incremented inside pn533_poll_create_mod_list(), thus leading
to division by zero.
Call trace looks like:
nfc_genl_start_poll()
  nfc_start_poll()
    ->start_poll()
    pn533_start_poll()

Add poll mod list filling check.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: dfccd0f58044 ("NFC: pn533: Add some polling entropy")
Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://patch.msgid.link/20240827084822.18785-1-amishin@t-argos.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'nf-24-08-28' of git://git./linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 sets on NFT_PKTINFO_L4PROTO for UDP packets less than 4 bytes
payload from netdev/egress by subtracting skb_network_offset() when
validating IPv4 packet length, otherwise 'meta l4proto udp' never
matches.

Patch #2 subtracts skb_network_offset() when validating IPv6 packet
length for netdev/egress.

netfilter pull request 24-08-28

* tag 'nf-24-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation
netfilter: nf_tables: restore IP sanity checks for netdev/egress
====================

Link: https://patch.msgid.link/20240828214708.619261-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mailmap: update entry for Sriram Yagnaraman

Link my old est.tech address to my active mail address

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20240828072417.4111996-1-sriram.yagnaraman@ericsson.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'mptcp-more-fixes-for-the-in-kernel-pm'

Matthieu Baerts says:

====================
mptcp: more fixes for the in-kernel PM

Here is a new batch of fixes for the MPTCP in-kernel path-manager:

Patch 1 ensures the address ID is set to 0 when the path-manager sends
an ADD_ADDR for the address of the initial subflow. The same fix is
applied when a new subflow is created re-using this special address. A
fix for v6.0.

Patch 2 is similar, but for the case where an endpoint is removed: if
this endpoint was used for the initial address, it is important to send
a RM_ADDR with this ID set to 0, and look for existing subflows with the
ID set to 0. A fix for v6.0 as well.

Patch 3 validates the two previous patches.

Patch 4 makes the PM selecting an "active" path to send an address
notification in an ACK, instead of taking the first path in the list. A
fix for v5.11.

Patch 5 fixes skipping the establishment of a new subflow if a previous
subflow using the same pair of addresses is being closed. A fix for
v5.13.

Patch 6 resets the ID linked to the initial subflow when the linked
endpoint is re-added, possibly with a different ID. A fix for v6.0.

Patch 7 validates the three previous patches.

Patch 8 is a small fix for the MPTCP Join selftest, when being used with
older subflows not supporting all MIB counters. A fix for a commit
introduced in v6.4, but backported up to v5.10.

Patch 9 avoids the PM to try to close the initial subflow multiple
times, and increment counters while nothing happened. A fix for v5.10.

Patch 10 stops incrementing local_addr_used and add_addr_accepted
counters when dealing with the address ID 0, because these counters are
not taking into account the initial subflow, and are then not
decremented when the linked addresses are removed. A fix for v6.0.

Patch 11 validates the previous patch.

Patch 12 avoids the PM to send multiple SUB_CLOSED events for the
initial subflow. A fix for v5.12.

Patch 13 validates the previous patch.

Patch 14 stops treating the ADD_ADDR 0 as a new address, and accepts it
in order to re-create the initial subflow if it has been closed, even if
the limit for *new* addresses -- not taking into account the address of
the initial subflow -- has been reached. A fix for v5.10.

Patch 15 validates the previous patch.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (15):
      mptcp: pm: reuse ID 0 after delete and re-add
      mptcp: pm: fix RM_ADDR ID for the initial subflow
      selftests: mptcp: join: check removing ID 0 endpoint
      mptcp: pm: send ACK on an active subflow
      mptcp: pm: skip connecting to already established sf
      mptcp: pm: reset MPC endp ID when re-added
      selftests: mptcp: join: check re-adding init endp with != id
      selftests: mptcp: join: no extra msg if no counter
      mptcp: pm: do not remove already closed subflows
      mptcp: pm: fix ID 0 endp usage after multiple re-creations
      selftests: mptcp: join: check re-re-adding ID 0 endp
      mptcp: avoid duplicated SUB_CLOSED events
      selftests: mptcp: join: validate event numbers
      mptcp: pm: ADD_ADDR 0 is not a new address
      selftests: mptcp: join: check re-re-adding ID 0 signal

net/mptcp/pm.c                                  |   4 +-
net/mptcp/pm_netlink.c                          |  87 ++++++++++----
net/mptcp/protocol.c                            |   6 +
net/mptcp/protocol.h                            |   5 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 153 ++++++++++++++++++++----
tools/testing/selftests/net/mptcp/mptcp_lib.sh  |   4 +
6 files changed, 209 insertions(+), 50 deletions(-)
---
base-commit: 3a0504d54b3b57f0d7bf3d9184a00c9f8887f6d7
change-id: 20240826-net-mptcp-more-pm-fix-ffa61a36f817

Best regards,
====================

Link: https://patch.msgid.link/20240828-net-mptcp-more-pm-fix-v2-0-7f11b283fff7@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check re-re-adding ID 0 signal

This test extends "delete re-add signal" to validate the previous
commit: when the 'signal' endpoint linked to the initial subflow (ID 0)
is re-added multiple times, it will re-send the ADD_ADDR with id 0. The
client should still be able to re-create this subflow, even if the
add_addr_accepted limit has been reached as this special address is not
considered as a new address.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: ADD_ADDR 0 is not a new address

The ADD_ADDR 0 with the address from the initial subflow should not be
considered as a new address: this is not something new. If the host
receives it, it simply means that the address is available again.

When receiving an ADD_ADDR for the ID 0, the PM already doesn't consider
it as new by not incrementing the 'add_addr_accepted' counter. But the
'accept_addr' might not be set if the limit has already been reached:
this can be bypassed in this case. But before, it is important to check
that this ADD_ADDR for the ID 0 is for the same address as the initial
subflow. If not, it is not something that should happen, and the
ADD_ADDR can be ignored.

Note that if an ADD_ADDR is received while there is already a subflow
opened using the same address, this ADD_ADDR is ignored as well. It
means that if multiple ADD_ADDR for ID 0 are received, there will not be
any duplicated subflows created by the client.

Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: validate event numbers

This test extends "delete and re-add" and "delete re-add signal" to
validate the previous commit: the number of MPTCP events are checked to
make sure there are no duplicated or unexpected ones.

A new helper has been introduced to easily check these events. The
missing events have been added to the lib.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: b911c97c7dc7 ("mptcp: add netlink event support")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: avoid duplicated SUB_CLOSED events

The initial subflow might have already been closed, but still in the
connection list. When the worker is instructed to close the subflows
that have been marked as closed, it might then try to close the initial
subflow again.

A consequence of that is that the SUB_CLOSED event can be seen twice:

  # ip mptcp endpoint
  1.1.1.1 id 1 subflow dev eth0
  2.2.2.2 id 2 subflow dev eth1

  # ip mptcp monitor &
  [         CREATED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
  [     ESTABLISHED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
  [  SF_ESTABLISHED] remid=0 locid=2 saddr4=2.2.2.2 daddr4=9.9.9.9

  # ip mptcp endpoint delete id 1
  [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9
  [       SF_CLOSED] remid=0 locid=0 saddr4=1.1.1.1 daddr4=9.9.9.9

The first one is coming from mptcp_pm_nl_rm_subflow_received(), and the
second one from __mptcp_close_subflow().

To avoid doing the post-closed processing twice, the subflow is now
marked as closed the first time.

Note that it is not enough to check if we are dealing with the first
subflow and check its sk_state: the subflow might have been reset or
closed before calling mptcp_close_ssk().

Fixes: b911c97c7dc7 ("mptcp: add netlink event support")
Cc: stable@vger.kernel.org
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check re-re-adding ID 0 endp

This test extends "delete and re-add" to validate the previous commit:
when the endpoint linked to the initial subflow (ID 0) is re-added
multiple times, it was no longer being used, because the internal linked
counters are not decremented for this special endpoint: it is not an
additional endpoint.

Here, the "del/add id 0" steps are done 3 times to unsure this case is
validated.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: fix ID 0 endp usage after multiple re-creations

'local_addr_used' and 'add_addr_accepted' are decremented for addresses
not related to the initial subflow (ID0), because the source and
destination addresses of the initial subflows are known from the
beginning: they don't count as "additional local address being used" or
"ADD_ADDR being accepted".

It is then required not to increment them when the entrypoint used by
the initial subflow is removed and re-added during a connection. Without
this modification, this entrypoint cannot be removed and re-added more
than once.

Reported-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/512
Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Reported-by: syzbot+455d38ecd5f655fc45cf@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/00000000000049861306209237f4@google.com
Cc: stable@vger.kernel.org
Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: do not remove already closed subflows

It is possible to have in the list already closed subflows, e.g. the
initial subflow has been already closed, but still in the list. No need
to try to close it again, and increments the related counters again.

Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: no extra msg if no counter

The checksum and fail counters might not be available. Then no need to
display an extra message with missing info.

While at it, fix the indentation around, which is wrong since the same
commit.

Fixes: 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check re-adding init endp with != id

The initial subflow has a special local ID: 0. It is specific per
connection.

When a global endpoint is deleted and re-added later, it can have a
different ID, but the kernel should still use the ID 0 if it corresponds
to the initial address.

This test validates this behaviour: the endpoint linked to the initial
subflow is removed, and re-added with a different ID.

Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: reset MPC endp ID when re-added

The initial subflow has a special local ID: 0. It is specific per
connection.

When a global endpoint is deleted and re-added later, it can have a
different ID -- most services managing the endpoints automatically don't
force the ID to be the same as before. It is then important to track
these modifications to be consistent with the ID being used for the
address used by the initial subflow, not to confuse the other peer or to
send the ID 0 for the wrong address.

Now when removing an endpoint, msk->mpc_endpoint_id is reset if it
corresponds to this endpoint. When adding a new endpoint, the same
variable is updated if the address match the one of the initial subflow.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: skip connecting to already established sf

The lookup_subflow_by_daddr() helper checks if there is already a
subflow connected to this address. But there could be a subflow that is
closing, but taking time due to some reasons: latency, losses, data to
process, etc.

If an ADD_ADDR is received while the endpoint is being closed, it is
better to try connecting to it, instead of rejecting it: the peer which
has sent the ADD_ADDR will not be notified that the ADD_ADDR has been
rejected for this reason, and the expected subflow will not be created
at the end.

This helper should then only look for subflows that are established, or
going to be, but not the ones being closed.

Fixes: d84ad04941c3 ("mptcp: skip connecting the connected address")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: send ACK on an active subflow

Taking the first one on the list doesn't work in some cases, e.g. if the
initial subflow is being removed. Pick another one instead of not
sending anything.

Fixes: 84dfe3677a6f ("mptcp: send out dedicated ADD_ADDR packet")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: mptcp: join: check removing ID 0 endpoint

Removing the endpoint linked to the initial subflow should trigger a
RM_ADDR for the right ID, and the removal of the subflow. That's what is
now being verified in the "delete and re-add" test.

Note that removing the initial subflow will not decrement the 'subflows'
counters, which corresponds to the *additional* subflows. On the other
hand, when the same endpoint is re-added, it will increment this
counter, as it will be seen as an additional subflow this time.

The 'Fixes' tag here below is the same as the one from the previous
commit: this patch here is not fixing anything wrong in the selftests,
but it validates the previous fix for an issue introduced by this commit
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: fix RM_ADDR ID for the initial subflow

The initial subflow has a special local ID: 0. When an endpoint is being
deleted, it is then important to check if its address is not linked to
the initial subflow to send the right ID.

If there was an endpoint linked to the initial subflow, msk's
mpc_endpoint_id field will be set. We can then use this info when an
endpoint is being removed to see if it is linked to the initial subflow.

So now, the correct IDs are passed to mptcp_pm_nl_rm_addr_or_subflow(),
it is no longer needed to use mptcp_local_id_match().

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: pm: reuse ID 0 after delete and re-add

When the endpoint used by the initial subflow is removed and re-added
later, the PM has to force the ID 0, it is a special case imposed by the
MPTCP specs.

Note that the endpoint should then need to be re-added reusing the same
ID.

Fixes: 3ad14f54bd74 ("mptcp: more accurate MPC endpoint tracking")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'random-6.11-rc6-for-linus' of git://git./linux/kernel/git/crng/random

Pull random number generator fix from Jason Donenfeld:
"Reject invalid flags passed to vgetrandom() in the same way that
  getrandom() does, so that the behavior is the same, from Yann.

  The flags argument to getrandom() only has a behavioral effect on the
  function if the RNG isn't initialized yet, so vgetrandom() falls back
  to the syscall in that case. But if the RNG is initialized, all of the
  flags behave the same way, so vgetrandom() didn't bother checking
  them, and just ignored them entirely.

  But that doesn't account for invalid flags passed in, which need to be
  rejected so we can use them later"

* tag 'random-6.11-rc6-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  random: vDSO: reject unknown getrandom() flags

net: busy-poll: use ktime_get_ns() instead of local_clock()

Typically, busy-polling durations are below 100 usec.

When/if the busy-poller thread migrates to another cpu,
local_clock() can be off by +/-2msec or more for small
values of HZ, depending on the platform.

Use ktimer_get_ns() to ensure deterministic behavior,
which is the whole point of busy-polling.

Fixes: 060212928670 ("net: add low latency socket poll")
Fixes: 9a3c71aa8024 ("net: convert low latency sockets to sched_clock()")
Fixes: 37089834528b ("sched, net: Fixup busy_loop_us_clock()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20240827114916.223377-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-2024-08-28' of git://git./linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
Regressions:
* wfx: fix for open network connection
* iwlwifi: fix for hibernate (due to fast resume feature)
* iwlwifi: fix for a few warnings that were recently added
   (had previously been messages not warnings)

Previously broken:
* mwifiex: fix static structures used for per-device data
* iwlwifi: some harmless FW related messages were tagged
   too high priority
* iwlwifi: scan buffers weren't checked correctly
* mac80211: SKB leak on beacon error path
* iwlwifi: fix ACPI table interop with certain BIOSes
* iwlwifi: fix locking for link selection
* mac80211: fix SSID comparison in beacon validation

* tag 'wireless-2024-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: iwlwifi: clear trans->state earlier upon error
  wifi: wfx: repair open network AP mode
  wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap()
  wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead
  wifi: iwlwifi: mvm: allow 6 GHz channels in MLO scan
  wifi: iwlwifi: mvm: pause TCM when the firmware is stopped
  wifi: iwlwifi: fw: fix wgds rev 3 exact size
  wifi: iwlwifi: mvm: take the mutex before running link selection
  wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room()
  wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation
  wifi: iwlwifi: lower message level for FW buffer destination
  wifi: iwlwifi: mvm: fix hibernation
  wifi: mac80211: fix beacon SSID mismatch handling
  wifi: mwifiex: duplicate static structs used in driver instances
====================

Link: https://patch.msgid.link/20240828100151.23662-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: Add netif_device_attach/detach into PF reset flow

Ethtool callbacks can be executed while reset is in progress and try to
access deleted resources, e.g. getting coalesce settings can result in a
NULL pointer dereference seen below.

Reproduction steps:
Once the driver is fully initialized, trigger reset:
# echo 1 > /sys/class/net/<interface>/device/reset
when reset is in progress try to get coalesce settings using ethtool:
# ethtool -c <interface>

BUG: kernel NULL pointer dereference, address: 0000000000000020
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 11 PID: 19713 Comm: ethtool Tainted: G S                 6.10.0-rc7+ #7
RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice]
RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206
RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000
R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40
FS:  00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0
Call Trace:
<TASK>
ice_get_coalesce+0x17/0x30 [ice]
coalesce_prepare_data+0x61/0x80
ethnl_default_doit+0xde/0x340
genl_family_rcv_msg_doit+0xf2/0x150
genl_rcv_msg+0x1b3/0x2c0
netlink_rcv_skb+0x5b/0x110
genl_rcv+0x28/0x40
netlink_unicast+0x19c/0x290
netlink_sendmsg+0x222/0x490
__sys_sendto+0x1df/0x1f0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x82/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7faee60d8e27

Calling netif_device_detach() before reset makes the net core not call
the driver when ethtool command is issued, the attempt to execute an
ethtool command during reset will result in the following message:

    netlink error: No such device

instead of NULL pointer dereference. Once reset is done and
ice_rebuild() is executing, the netif_device_attach() is called to allow
for ethtool operations to occur again in a safe manner.

Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igb: Fix not clearing TimeSync interrupts for 82580

82580 NICs have a hardware bug that makes it
necessary to write into the TSICR (TimeSync Interrupt Cause) register
to clear it:
https://lore.kernel.org/all/CDCB8BE0.1EC2C%25matthew.vick@intel.com/

Add a conditional so only for 82580 we write into the TSICR register,
so we don't risk losing events for other models.

Without this change, when running ptp4l with an Intel 82580 card,
I get the following output:

> timed out while polling for tx timestamp increasing tx_timestamp_timeout or
> increasing kworker priority may correct this issue, but a driver bug likely
> causes it

This goes away with this change.

This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync events").

Fixes: ee14cc9ea19b ("igb: Fix missing time sync events")
Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w@mail.gmail.com/
Tested-by: Daiwei Li <daiweili@google.com>
Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Daiwei Li <daiweili@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge tag 'loongarch-fixes-6.11-2' of git://git./linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Remove the unused dma-direct.h, and some bug & warning fixes"

* tag 'loongarch-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Invalidate guest steal time address on vCPU reset
  LoongArch: Add ifdefs to fix LSX and LASX related warnings
  LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE
  LoongArch: Remove the unused dma-direct.h

Merge tag 'platform-drivers-x86-v6.11-5' of git://git./linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform drivers fixes from Ilpo Järvinen:

- platform/x86/amd/pmc: AMD 1Ah model 60h series support (2nd attempt)

- asus-wmi: Prevent spurious rfkill on Asus Zenbook Duo

- x86-android-tablets: Relax DMI match to cover another model

* tag 'platform-drivers-x86-v6.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: x86-android-tablets: Make Lenovo Yoga Tab 3 X90F DMI match less strict
  platform/x86: asus-wmi: Fix spurious rfkill on UX8406MA
  platform/x86/amd/pmc: Extend support for PMC features on new AMD platform
  platform/x86/amd/pmc: Fix SMU command submission path on new AMD platform

Merge tag 'nfsd-6.11-2' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

- Fix a number of crashers

- Update email address for an NFSD reviewer

* tag 'nfsd-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  fs/nfsd: fix update of inode attrs in CB_GETATTR
  nfsd: fix potential UAF in nfsd4_cb_getattr_release
  nfsd: hold reference to delegation when updating it for cb_getattr
  MAINTAINERS: Update Olga Kornievskaia's email address
  nfsd: prevent panic for nfsv4.0 closed files in nfs4_show_open
  nfsd: ensure that nfsd4_fattr_args.context is zeroed out

Merge tag 'for-6.11-rc5-tag' of git://git./linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- fix use-after-free when submitting bios for read, after an error and
   partially submitted bio the original one is freed while it can be
   still be accessed again

- fix fstests case btrfs/301, with enabled quotas wait for delayed
   iputs when flushing delalloc

- fix periodic block group reclaim, an unitialized value can be
   returned if there are no block groups to reclaim

- fix build warning (-Wmaybe-uninitialized)

* tag 'for-6.11-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix uninitialized return value from btrfs_reclaim_sweep()
  btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk()
  btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap()
  btrfs: run delayed iputs when flushing delalloc

Merge tag 'v6.11-rc5-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- two RDMA/smbdirect fixes and a minor cleanup

- punch hole fix

* tag 'v6.11-rc5-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix FALLOC_FL_PUNCH_HOLE support
  smb/client: fix rdma usage in smb2_async_writev()
  smb/client: remove unused rq_iter_size from struct smb_rqst
  smb/client: avoid dereferencing rdata=NULL in smb2_new_read_req()

Merge tag 'tpmdd-next-6.11-rc6' of git://git./linux/kernel/git/jarkko/linux-tpmdd

Pull TPM fix from Jarkko Sakkinen:
"A bug fix for tpm_ibmvtpm driver so that it will take the bus
encryption into use"

* tag 'tpmdd-next-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: ibmvtpm: Call tpm2_sessions_init() to initialize session support

sctp: fix association labeling in the duplicate COOKIE-ECHO case

sctp_sf_do_5_2_4_dupcook() currently calls security_sctp_assoc_request()
on new_asoc, but as it turns out, this association is always discarded
and the LSM labels never get into the final association (asoc).

This can be reproduced by having two SCTP endpoints try to initiate an
association with each other at approximately the same time and then peel
off the association into a new socket, which exposes the unitialized
labels and triggers SELinux denials.

Fix it by calling security_sctp_assoc_request() on asoc instead of
new_asoc. Xin Long also suggested limit calling the hook only to cases
A, B, and D, since in cases C and E the COOKIE ECHO chunk is discarded
and the association doesn't enter the ESTABLISHED state, so rectify that
as well.

One related caveat with SELinux and peer labeling: When an SCTP
connection is set up simultaneously in this way, we will end up with an
association that is initialized with security_sctp_assoc_request() on
both sides, so the MLS component of the security context of the
association will get swapped between the peers, instead of just one side
setting it to the other's MLS component. However, at that point
security_sctp_assoc_request() had already been called on both sides in
sctp_sf_do_unexpected_init() (on a temporary association) and thus if
the exchange didn't fail before due to MLS, it won't fail now either
(most likely both endpoints have the same MLS range).

Tested by:
- reproducer from https://src.fedoraproject.org/tests/selinux/pull-request/530
- selinux-testsuite (https://github.com/SELinuxProject/selinux-testsuite/)
- sctp-tests (https://github.com/sctp/sctp-tests) - no tests failed
that wouldn't fail also without the patch applied

Fixes: c081d53f97a1 ("security: pass asoc to sctp_assoc_request and sctp_sk_clone")
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM/SELinux)
Link: https://patch.msgid.link/20240826130711.141271-1-omosnace@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'mptcp-close-subflow-when-receiving-tcp-fin-and-misc'

Matthieu Baerts says:

====================
mptcp: close subflow when receiving TCP+FIN and misc.

Here are different fixes:

Patch 1 closes the subflow after having received a FIN, instead
of leaving it half-closed until the end of the MPTCP connection.
A fix for v5.12.

Patch 2 validates the previous patch.

Patch 3 is a fix for a recent fix to check both directions for the
backup flag. It can follow the 'Fixes' commit and be backported up
to v5.7.

Patch 4 adds a missing \n at the end of pr_debug(), causing debug
messages to be displayed with a delay, which confuses the debugger.
A fix for v5.6.
====================

Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-0-905199fe1172@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: pr_debug: add missing \n at the end

pr_debug() have been added in various places in MPTCP code to help
developers to debug some situations. With the dynamic debug feature, it
is easy to enable all or some of them, and asks users to reproduce
issues with extra debug.

Many of these pr_debug() don't end with a new line, while no 'pr_cont()'
are used in MPTCP code. So the goal was not to display multiple debug
messages on one line: they were then not missing the '\n' on purpose.
Not having the new line at the end causes these messages to be printed
with a delay, when something else needs to be printed. This issue is not
visible when many messages need to be printed, but it is annoying and
confusing when only specific messages are expected, e.g.

  # echo "func mptcp_pm_add_addr_echoed +fmp" \
        > /sys/kernel/debug/dynamic_debug/control
  # ./mptcp_join.sh "signal address"; \
        echo "$(awk '{print $1}' /proc/uptime) - end"; \
        sleep 5s; \
        echo "$(awk '{print $1}' /proc/uptime) - restart"; \
        ./mptcp_join.sh "signal address"
  013 signal address
      (...)
  10.75 - end
  15.76 - restart
  013 signal address
  [  10.367935] mptcp:mptcp_pm_add_addr_echoed: MPTCP: msk=(...)
      (...)

  => a delay of 5 seconds: printed with a 10.36 ts, but after 'restart'
     which was printed at the 15.76 ts.

The 'Fixes' tag here below points to the first pr_debug() used without
'\n' in net/mptcp. This patch could be split in many small ones, with
different Fixes tag, but it doesn't seem worth it, because it is easy to
re-generate this patch with this simple 'sed' command:

  git grep -l pr_debug -- net/mptcp |
    xargs sed -i "s/$pr_debug(\".*[^n]$$\"[,)]$/\1\\\n\2/g"

So in case of conflicts, simply drop the modifications, and launch this
command.

Fixes: f870fa0b5768 ("mptcp: Add MPTCP socket stubs")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-4-905199fe1172@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: sched: check both backup in retrans

The 'mptcp_subflow_context' structure has two items related to the
backup flags:

- 'backup': the subflow has been marked as backup by the other peer

- 'request_bkup': the backup flag has been set by the host

Looking only at the 'backup' flag can make sense in some cases, but it
is not the behaviour of the default packet scheduler when selecting
paths.

As explained in the commit b6a66e521a20 ("mptcp: sched: check both
directions for backup"), the packet scheduler should look at both flags,
because that was the behaviour from the beginning: the 'backup' flag was
set by accident instead of the 'request_bkup' one. Now that the latter
has been fixed, get_retrans() needs to be adapted as well.

Fixes: b6a66e521a20 ("mptcp: sched: check both directions for backup")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-3-905199fe1172@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: mptcp: join: cannot rm sf if closed

Thanks to the previous commit, the MPTCP subflows are now closed on both
directions even when only the MPTCP path-manager of one peer asks for
their closure.

In the two tests modified here -- "userspace pm add & remove address"
and "userspace pm create destroy subflow" -- one peer is controlled by
the userspace PM, and the other one by the in-kernel PM. When the
userspace PM sends a RM_ADDR notification, the in-kernel PM will
automatically react by closing all subflows using this address. Now,
thanks to the previous commit, the subflows are properly closed on both
directions, the userspace PM can then no longer closes the same
subflows if they are already closed. Before, it was OK to do that,
because the subflows were still half-opened, still OK to send a RM_ADDR.

In other words, thanks to the previous commit closing the subflows, an
error will be returned to the userspace if it tries to close a subflow
that has already been closed. So no need to run this command, which mean
that the linked counters will then not be incremented.

These tests are then no longer sending both a RM_ADDR, then closing the
linked subflow just after. The test with the userspace PM on the server
side is now removing one subflow linked to one address, then sending
a RM_ADDR for another address. The test with the userspace PM on the
client side is now only removing the subflow that was previously
created.

Fixes: 4369c198e599 ("selftests: mptcp: test userspace pm out of transfer")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-2-905199fe1172@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mptcp: close subflow when receiving TCP+FIN

When a peer decides to close one subflow in the middle of a connection
having multiple subflows, the receiver of the first FIN should accept
that, and close the subflow on its side as well. If not, the subflow
will stay half closed, and would even continue to be used until the end
of the MPTCP connection or a reset from the network.

The issue has not been seen before, probably because the in-kernel
path-manager always sends a RM_ADDR before closing the subflow. Upon the
reception of this RM_ADDR, the other peer will initiate the closure on
its side as well. On the other hand, if the RM_ADDR is lost, or if the
path-manager of the other peer only closes the subflow without sending a
RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it,
leaving the subflow half-closed.

So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if
the MPTCP connection has not been closed before with a DATA_FIN, the
kernel owning the subflow schedules its worker to initiate the closure
on its side as well.

This issue can be easily reproduced with packetdrill, as visible in [1],
by creating an additional subflow, injecting a FIN+ACK before sending
the DATA_FIN, and expecting a FIN+ACK in return.

Fixes: 40947e13997a ("mptcp: schedule worker when subflow is closed")
Cc: stable@vger.kernel.org
Link: https://github.com/multipath-tcp/packetdrill/pull/154
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240826-net-mptcp-close-extra-sf-fin-v1-1-905199fe1172@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tcp: fix forever orphan socket caused by tcp_abort

We have some problem closing zero-window fin-wait-1 tcp sockets in our
environment. This patch come from the investigation.

Previously tcp_abort only sends out reset and calls tcp_done when the
socket is not SOCK_DEAD, aka orphan. For orphan socket, it will only
purging the write queue, but not close the socket and left it to the
timer.

While purging the write queue, tp->packets_out and sk->sk_write_queue
is cleared along the way. However tcp_retransmit_timer have early
return based on !tp->packets_out and tcp_probe_timer have early
return based on !sk->sk_write_queue.

This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
and socket not being killed by the timers, converting a zero-windowed
orphan into a forever orphan.

This patch removes the SOCK_DEAD check in tcp_abort, making it send
reset to peer and close the socket accordingly. Preventing the
timer-less orphan from happening.

According to Lorenzo's email in the v1 thread, the check was there to
prevent force-closing the same socket twice. That situation is handled
by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is
already closed.

The -ENOENT code comes from the associate patch Lorenzo made for
iproute2-ss; link attached below, which also conform to RFC 9293.

At the end of the patch, tcp_write_queue_purge(sk) is removed because it
was already called in tcp_done_with_error().

p.s. This is the same patch with v2. Resent due to mis-labeled "changes
requested" on patchwork.kernel.org.

Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/
Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: Xueming Feng <kuro@kuroa.me>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gtp: fix a potential NULL pointer dereference

When sockfd_lookup() fails, gtp_encap_enable_socket() returns a
NULL pointer, but its callers only check for error pointers thus miss
the NULL pointer case.

Fix it by returning an error pointer with the error code carried from
sockfd_lookup().

(I found this bug during code inspection.)

Fixes: 1e3a3abd8b28 ("gtp: make GTP sockets in gtp_newlink optional")
Cc: Andreas Schultz <aschultz@tpip.net>
Cc: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Link: https://patch.msgid.link/20240825191638.146748-1-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'fixes-for-ipsec-over-bonding'

Jianbo Liu says:

====================
Fixes for IPsec over bonding

This patchset provides bug fixes for IPsec over bonding driver.

It adds the missing xdo_dev_state_free API, and fixes "scheduling while
atomic" by using mutex lock instead.

Series generated against:
commit c07ff8592d57 ("netem: fix return value if duplicate enqueue fails")
====================

Link: https://patch.msgid.link/20240823031056.110999-1-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: change ipsec_lock from spin lock to mutex

In the cited commit, bond->ipsec_lock is added to protect ipsec_list,
hence xdo_dev_state_add and xdo_dev_state_delete are called inside
this lock. As ipsec_lock is a spin lock and such xfrmdev ops may sleep,
"scheduling while atomic" will be triggered when changing bond's
active slave.

[  101.055189] BUG: scheduling while atomic: bash/902/0x00000200
[  101.055726] Modules linked in:
[  101.058211] CPU: 3 PID: 902 Comm: bash Not tainted 6.9.0-rc4+ #1
[  101.058760] Hardware name:
[  101.059434] Call Trace:
[  101.059436]  <TASK>
[  101.060873]  dump_stack_lvl+0x51/0x60
[  101.061275]  __schedule_bug+0x4e/0x60
[  101.061682]  __schedule+0x612/0x7c0
[  101.062078]  ? __mod_timer+0x25c/0x370
[  101.062486]  schedule+0x25/0xd0
[  101.062845]  schedule_timeout+0x77/0xf0
[  101.063265]  ? asm_common_interrupt+0x22/0x40
[  101.063724]  ? __bpf_trace_itimer_state+0x10/0x10
[  101.064215]  __wait_for_common+0x87/0x190
[  101.064648]  ? usleep_range_state+0x90/0x90
[  101.065091]  cmd_exec+0x437/0xb20 [mlx5_core]
[  101.065569]  mlx5_cmd_do+0x1e/0x40 [mlx5_core]
[  101.066051]  mlx5_cmd_exec+0x18/0x30 [mlx5_core]
[  101.066552]  mlx5_crypto_create_dek_key+0xea/0x120 [mlx5_core]
[  101.067163]  ? bonding_sysfs_store_option+0x4d/0x80 [bonding]
[  101.067738]  ? kmalloc_trace+0x4d/0x350
[  101.068156]  mlx5_ipsec_create_sa_ctx+0x33/0x100 [mlx5_core]
[  101.068747]  mlx5e_xfrm_add_state+0x47b/0xaa0 [mlx5_core]
[  101.069312]  bond_change_active_slave+0x392/0x900 [bonding]
[  101.069868]  bond_option_active_slave_set+0x1c2/0x240 [bonding]
[  101.070454]  __bond_opt_set+0xa6/0x430 [bonding]
[  101.070935]  __bond_opt_set_notify+0x2f/0x90 [bonding]
[  101.071453]  bond_opt_tryset_rtnl+0x72/0xb0 [bonding]
[  101.071965]  bonding_sysfs_store_option+0x4d/0x80 [bonding]
[  101.072567]  kernfs_fop_write_iter+0x10c/0x1a0
[  101.073033]  vfs_write+0x2d8/0x400
[  101.073416]  ? alloc_fd+0x48/0x180
[  101.073798]  ksys_write+0x5f/0xe0
[  101.074175]  do_syscall_64+0x52/0x110
[  101.074576]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

As bond_ipsec_add_sa_all and bond_ipsec_del_sa_all are only called
from bond_change_active_slave, which requires holding the RTNL lock.
And bond_ipsec_add_sa and bond_ipsec_del_sa are xfrm state
xdo_dev_state_add and xdo_dev_state_delete APIs, which are in user
context. So ipsec_lock doesn't have to be spin lock, change it to
mutex, and thus the above issue can be resolved.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-4-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: extract the use of real_device into local variable

Add a local variable for slave->dev, to prepare for the lock change in
the next patch. There is no functionality change.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-3-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bonding: implement xdo_dev_state_free and call it after deletion

Add this implementation for bonding, so hardware resources can be
freed from the active slave after xfrm state is deleted. The netdev
used to invoke xdo_dev_state_free callback, is saved in the xfrm state
(xs->xso.real_dev), which is also the bond's active slave. To prevent
it from being freed, acquire netdev reference before leaving RCU
read-side critical section, and release it after callback is done.

And call it when deleting all SAs from old active real interface while
switching current active slave.

Fixes: 9a5605505d9c ("bonding: Add struct bond_ipesc to manage SA")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
Link: https://patch.msgid.link/20240823031056.110999-2-jianbol@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: local_termination: Down ports on cleanup

This test neglects to put ports down on cleanup. Fix it.

Fixes: 90b9566aa5cd ("selftests: forwarding: add a test for local_termination.sh")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/bf9b79f45de378f88344d44550f0a5052b386199.1724692132.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: forwarding: no_forwarding: Down ports on cleanup

This test neglects to put ports down on cleanup. Fix it.

Fixes: 476a4f05d9b8 ("selftests: forwarding: add a no_forwarding.sh test")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0baf91dc24b95ae0cadfdf5db05b74888e6a228a.1724430120.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'livepatching-for-6.11-rc6' of git://git./linux/kernel/git/livepatching/livepatching

Pull livepatching fix from Petr Mladek:
"Selftest regression fix"

* tag 'livepatching-for-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests/livepatch: wait for atomic replace to occur

Merge tag 'pinctrl-v6.11-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- Fix the hwirq map and pin offsets in the Qualcomm X1E80100 driver

- Fix the pin range handling in the AT91 driver so it works again

- Fix a NULL-dereference risk in pinctrl single

- Fix a serious biasing bug in the Mediatek driver

- Fix the level trigged IRQ in the StarFive JH7110

- Fix the iomux width in the Rockchip GPIO2-B pin handling

* tag 'pinctrl-v6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: rockchip: correct RK3328 iomux width flag for GPIO2-B pins
  pinctrl: starfive: jh7110: Correct the level trigger configuration of iev register
  pinctrl: qcom: x1e80100: Fix special pin offsets
  pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE
  pinctrl: single: fix potential NULL dereference in pcs_get_function()
  pinctrl: at91: make it work with current gpiolib
  pinctrl: qcom: x1e80100: Update PDC hwirq map

Merge tag 'sound-6.11-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"It became a bit larger collection of fixes than wished at this time,
  but all changes are small and mostly device-specific fixes that should
  be fairly safe to apply.

  Majority of fixes are about ASoC for AMD SOF, Cirrus codecs, lpass,
  etc, in addition to the usual HD-audio quirks / fixes"

* tag 'sound-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (22 commits)
  ALSA: hda: hda_component: Fix mutex crash if nothing ever binds
  ALSA: hda/realtek: support HP Pavilion Aero 13-bg0xxx Mute LED
  ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book3 Ultra
  ASoC: cs-amp-lib: Ignore empty UEFI calibration entries
  ASoC: cs-amp-lib-test: Force test calibration blob entries to be valid
  ALSA: hda/realtek - FIxed ALC285 headphone no sound
  ALSA: hda/realtek - Fixed ALC256 headphone no sound
  ASoC: allow module autoloading for table board_ids
  ASoC: allow module autoloading for table db1200_pids
  ALSA: hda: cs35l56: Don't use the device index as a calibration index
  ALSA: seq: Skip event type filtering for UMP events
  ALSA: hda/realtek: Enable mute/micmute LEDs on HP Laptop 14-ey0xxx
  ASoC: SOF: amd: Fix for acp init sequence
  ASoC: amd: acp: fix module autoloading
  ASoC: mediatek: mt8188: Mark AFE_DAC_CON0 register as volatile
  ASoC: codecs: wcd937x: Fix missing de-assert of reset GPIO
  ASoC: SOF: mediatek: Add missing board compatible
  ASoC: MAINTAINERS: Drop Banajit Goswami from Qualcomm sound drivers
  ASoC: SOF: amd: Fix for incorrect acp error register offsets
  ASoC: SOF: amd: move iram-dram fence register programming sequence
  ...

tpm: ibmvtpm: Call tpm2_sessions_init() to initialize session support

Commit d2add27cf2b8 ("tpm: Add NULL primary creation") introduced
CONFIG_TCG_TPM2_HMAC. When this option is enabled on ppc64 then the
following message appears in the kernel log due to a missing call to
tpm2_sessions_init().

[ 2.654549] tpm tpm0: auth session is not active

Add the missing call to tpm2_session_init() to the ibmvtpm driver to
resolve this issue.

Cc: stable@vger.kernel.org # v6.10+
Fixes: d2add27cf2b8 ("tpm: Add NULL primary creation")
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

netfilter: nf_tables_ipv6: consider network offset in netdev/egress validation

From netdev/egress, skb->len can include the ethernet header, therefore,
subtract network offset from skb->len when validating IPv6 packet length.

Fixes: 42df6e1d221d ("netfilter: Introduce egress hook")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net_sched: sch_fq: fix incorrect behavior for small weights

fq_dequeue() has a complex logic to find packets in one of the 3 bands.

As Neal found out, it is possible that one band has a deficit smaller
than its weight. fq_dequeue() can return NULL while some packets are
elligible for immediate transmit.

In this case, more than one iteration is needed to refill pband->credit.

With default parameters (weights 589824 196608 65536) bug can trigger
if large BIG TCP packets are sent to the lowest priority band.

Bisected-by: John Sperbeck <jsperbeck@google.com>
Diagnosed-by: Neal Cardwell <ncardwell@google.com>
Fixes: 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR scheduling")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20240824181901.953776-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

btrfs: fix uninitialized return value from btrfs_reclaim_sweep()

The return variable 'ret' at btrfs_reclaim_sweep() is never assigned if
none of the space infos is reclaimable (for example if periodic reclaim
is disabled, which is the default), so we return an undefined value.

This can be fixed my making btrfs_reclaim_sweep() not return any value
as well as do_reclaim_sweep() because:

1) do_reclaim_sweep() always returns 0, so we can make it return void;

2) The only caller of btrfs_reclaim_sweep() (btrfs_reclaim_bgs()) doesn't
care about its return value, and in its context there's nothing to do
about any errors anyway.

Therefore remove the return value from btrfs_reclaim_sweep() and
do_reclaim_sweep().

Fixes: e4ca3932ae90 ("btrfs: periodic block_group reclaim")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ionic: Prevent tx_timeout due to frequent doorbell ringing

With recent work to the doorbell workaround code a small hole was
introduced that could cause a tx_timeout. This happens if the rx
dbell_deadline goes beyond the netdev watchdog timeout set by the driver
(i.e. 2 seconds). Fix this by changing the netdev watchdog timeout to 5
seconds and reduce the max rx dbell_deadline to 4 seconds.

The test that can reproduce the issue being fixed is a multi-queue send
test via pktgen with the "burst" setting to 1. This causes the queue's
doorbell to be rung on every packet sent to the driver, which may result
in the device missing doorbells due to the high doorbell rate.

Cc: stable@vger.kernel.org
Fixes: 4ded136c78f8 ("ionic: add work item for missed-doorbell check")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20240822192557.9089-1-brett.creeley@amd.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wifi: iwlwifi: clear trans->state earlier upon error

When the firmware crashes, we first told the op_mode and only then,
changed the transport's state. This is a problem if the op_mode's
nic_error() handler needs to send a host command: it'll see that the
transport's state still reflects that the firmware is alive.

Today, this has no consequences since we set the STATUS_FW_ERROR bit and
that will prevent sending host commands. iwl_fw_dbg_stop_restart_recording
looks at this bit to know not to send a host command for example.

To fix the hibernation, we needed to reset the firmware without having
an error and checking STATUS_FW_ERROR to see whether the firmware is
alive will no longer hold, so this change is necessary as well.

Change the flow a bit.
Change trans->state before calling the op_mode's nic_error() method and
check trans->state instead of STATUS_FW_ERROR. This will keep the
current behavior of iwl_fw_dbg_stop_restart_recording upon firmware
error, and it'll allow us to call iwl_fw_dbg_stop_restart_recording
safely even if STATUS_FW_ERROR is clear, but yet, the firmware is not
alive.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.9d7427fbdfd7.Ia056ca57029a382c921d6f7b6a6b28fc480f2f22@changeid
[I missed this was a dependency for the hibernation fix, changed
the commit message a bit accordingly]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: wfx: repair open network AP mode

RSN IE missing in beacon is normal in open networks.
Avoid returning -EINVAL in this case.

Steps to reproduce:

$ cat /etc/wpa_supplicant.conf
network={
ssid="testNet"
mode=2
key_mgmt=NONE
}

$ wpa_supplicant -iwlan0 -c /etc/wpa_supplicant.conf
nl80211: Beacon set failed: -22 (Invalid argument)
Failed to set beacon parameters
Interface initialization failed
wlan0: interface state UNINITIALIZED->DISABLED
wlan0: AP-DISABLED
wlan0: Unable to setup interface.
Failed to initialize AP interface

After the change:

$ wpa_supplicant -iwlan0 -c /etc/wpa_supplicant.conf
Successfully initialized wpa_supplicant
wlan0: interface state UNINITIALIZED->ENABLED
wlan0: AP-ENABLED

Cc: stable@vger.kernel.org
Fixes: fe0a7776d4d1 ("wifi: wfx: fix possible NULL pointer dereference in wfx_set_mfp_ap()")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://patch.msgid.link/20240823131521.3309073-1-alexander.sverdlin@siemens.com

Merge tag 'vfs-6.11-rc6.fixes' of gitolite.pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
"VFS:

   - Ensure that backing files uses file->f_ops->splice_write() for
     splice

  netfs:

   - Revert the removal of PG_private_2 from netfs_release_folio() as
     cephfs still relies on this

   - When AS_RELEASE_ALWAYS is set on a mapping the folio needs to
     always be invalidated during truncation

   - Fix losing untruncated data in a folio by making letting
     netfs_release_folio() return false if the folio is dirty

   - Fix trimming of streaming-write folios in netfs_inval_folio()

   - Reset iterator before retrying a short read

   - Fix interaction of streaming writes with zero-point tracker

  afs:

   - During truncation afs currently calls truncate_setsize() which sets
     i_size, expands the pagecache and truncates it. The first two
     operations aren't needed because they will have already been done.
     So call truncate_pagecache() instead and skip the redundant parts

  overlayfs:

   - Fix checking of the number of allowed lower layers so 500 layers
     can actually be used instead of just 499

   - Add missing '\n' to pr_err() output

   - Pass string to ovl_parse_layer() and thus allow it to be used for
     Opt_lowerdir as well

  pidfd:

   - Revert blocking the creation of pidfds for kthread as apparently
     userspace relies on this. Specifically, it breaks systemd during
     shutdown

  romfs:

   - Fix romfs_read_folio() to use the correct offset with
     folio_zero_tail()"

* tag 'vfs-6.11-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  netfs: Fix interaction of streaming writes with zero-point tracker
  netfs: Fix missing iterator reset on retry of short read
  netfs: Fix trimming of streaming-write folios in netfs_inval_folio()
  netfs: Fix netfs_release_folio() to say no if folio dirty
  afs: Fix post-setattr file edit to do truncation correctly
  mm: Fix missing folio invalidation calls during truncation
  ovl: ovl_parse_param_lowerdir: Add missed '\n' for pr_err
  ovl: fix wrong lowerdir number check for parameter Opt_lowerdir
  ovl: pass string to ovl_parse_layer()
  backing-file: convert to using fops->splice_write
  Revert "pidfd: prevent creation of pidfds for kthreads"
  romfs: fix romfs_read_folio()
  netfs, ceph: Partially revert "netfs: Replace PG_fscache by setting folio->private and marking dirty"

btrfs: fix a use-after-free when hitting errors inside btrfs_submit_chunk()

[BUG]
There is an internal report that KASAN is reporting use-after-free, with
the following backtrace:

  BUG: KASAN: slab-use-after-free in btrfs_check_read_bio+0xa68/0xb70 [btrfs]
  Read of size 4 at addr ffff8881117cec28 by task kworker/u16:2/45
  CPU: 1 UID: 0 PID: 45 Comm: kworker/u16:2 Not tainted 6.11.0-rc2-next-20240805-default+ #76
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  Workqueue: btrfs-endio btrfs_end_bio_work [btrfs]
  Call Trace:
   dump_stack_lvl+0x61/0x80
   print_address_description.constprop.0+0x5e/0x2f0
   print_report+0x118/0x216
   kasan_report+0x11d/0x1f0
   btrfs_check_read_bio+0xa68/0xb70 [btrfs]
   process_one_work+0xce0/0x12a0
   worker_thread+0x717/0x1250
   kthread+0x2e3/0x3c0
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x11/0x20

  Allocated by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   __kasan_slab_alloc+0x7d/0x80
   kmem_cache_alloc_noprof+0x16e/0x3e0
   mempool_alloc_noprof+0x12e/0x310
   bio_alloc_bioset+0x3f0/0x7a0
   btrfs_bio_alloc+0x2e/0x50 [btrfs]
   submit_extent_page+0x4d1/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

  Freed by task 20917:
   kasan_save_stack+0x37/0x60
   kasan_save_track+0x10/0x30
   kasan_save_free_info+0x37/0x50
   __kasan_slab_free+0x4b/0x60
   kmem_cache_free+0x214/0x5d0
   bio_free+0xed/0x180
   end_bbio_data_read+0x1cc/0x580 [btrfs]
   btrfs_submit_chunk+0x98d/0x1880 [btrfs]
   btrfs_submit_bio+0x33/0x70 [btrfs]
   submit_one_bio+0xd4/0x130 [btrfs]
   submit_extent_page+0x3ea/0xdb0 [btrfs]
   btrfs_do_readpage+0x8b4/0x12a0 [btrfs]
   btrfs_readahead+0x29a/0x430 [btrfs]
   read_pages+0x1a7/0xc60
   page_cache_ra_unbounded+0x2ad/0x560
   filemap_get_pages+0x629/0xa20
   filemap_read+0x335/0xbf0
   vfs_read+0x790/0xcb0
   ksys_read+0xfd/0x1d0
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x4b/0x53

[CAUSE]
Although I cannot reproduce the error, the report itself is good enough
to pin down the cause.

The call trace is the regular endio workqueue context, but the
free-by-task trace is showing that during btrfs_submit_chunk() we
already hit a critical error, and is calling btrfs_bio_end_io() to error
out.  And the original endio function called bio_put() to free the whole
bio.

This means a double freeing thus causing use-after-free, e.g.:

1. Enter btrfs_submit_bio() with a read bio
   The read bio length is 128K, crossing two 64K stripes.

2. The first run of btrfs_submit_chunk()

2.1 Call btrfs_map_block(), which returns 64K
2.2 Call btrfs_split_bio()
    Now there are two bios, one referring to the first 64K, the other
    referring to the second 64K.
2.3 The first half is submitted.

3. The second run of btrfs_submit_chunk()

3.1 Call btrfs_map_block(), which by somehow failed
    Now we call btrfs_bio_end_io() to handle the error

3.2 btrfs_bio_end_io() calls the original endio function
    Which is end_bbio_data_read(), and it calls bio_put() for the
    original bio.

    Now the original bio is freed.

4. The submitted first 64K bio finished
   Now we call into btrfs_check_read_bio() and tries to advance the bio
   iter.
   But since the original bio (thus its iter) is already freed, we
   trigger the above use-after free.

   And even if the memory is not poisoned/corrupted, we will later call
   the original endio function, causing a double freeing.

[FIX]
Instead of calling btrfs_bio_end_io(), call btrfs_orig_bbio_end_io(),
which has the extra check on split bios and do the proper refcounting
for cloned bios.

Furthermore there is already one extra btrfs_cleanup_bio() call, but
that is duplicated to btrfs_orig_bbio_end_io() call, so remove that
label completely.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios")
CC: stable@vger.kernel.org # 6.6+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

fs/nfsd: fix update of inode attrs in CB_GETATTR

Currently, we copy the mtime and ctime to the in-core inode and then
mark the inode dirty. This is fine for certain types of filesystems, but
not all. Some require a real setattr to properly change these values
(e.g. ceph or reexported NFS).

Fix this code to call notify_change() instead, which is the proper way
to effect a setattr. There is one problem though:

In this case, the client is holding a write delegation and has sent us
attributes to update our cache. We don't want to break the delegation
for this since that would defeat the purpose. Add a new ATTR_DELEG flag
that makes notify_change bypass the try_break_deleg call.

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

net: ti: icssg-prueth: Fix 10M Link issue on AM64x

Crash is seen on AM64x 10M link when connecting / disconnecting multiple
times.

The fix for this is to enable quirk_10m_link_issue for AM64x.

Fixes: b256e13378a9 ("net: ti: icssg-prueth: Add AM64x icssg support")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Link: https://patch.msgid.link/20240823120412.1262536-1-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: check device is present when getting link settings

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
#10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
#11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
#12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
#13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
#14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
#15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
#16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
#17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd7fb65
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17e2d01 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd7fb65 ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-net-2024-08-23' of git://git./linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- btintel: Allow configuring drive strength of BRI
- hci_core: Fix not handling hibernation actions
- btnxpuart: Fix random crash seen while removing driver

* tag 'for-net-2024-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: hci_core: Fix not handling hibernation actions
  Bluetooth: btnxpuart: Fix random crash seen while removing driver
  Bluetooth: btintel: Allow configuring drive strength of BRI
====================

Link: https://patch.msgid.link/20240823200008.65241-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nfsd: fix potential UAF in nfsd4_cb_getattr_release

Once we drop the delegation reference, the fields embedded in it are no
longer safe to access. Do that last.

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

nfsd: hold reference to delegation when updating it for cb_getattr

Once we've dropped the flc_lock, there is nothing that ensures that the
delegation that was found will still be around later. Take a reference
to it while holding the lock and then drop it when we've finished with
the delegation.

Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

MAINTAINERS: Update Olga Kornievskaia's email address

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap()

In 'ieee80211_beacon_get_ap()', free allocated skb in case of error
returned by 'ieee80211_beacon_protect()'. Compile tested only.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Link: https://patch.msgid.link/20240805142035.227847-1-dmantipov@yandex.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead

There is a WARNING in iwl_trans_wait_tx_queues_empty() (that was
recently converted from just a message), that can be hit if we
wait for TX queues to become empty after firmware died. Clearly,
we can't expect anything from the firmware after it's declared dead.

Don't call iwl_trans_wait_tx_queues_empty() in this case. While it could
be a good idea to stop the flow earlier, the flush functions do some
maintenance work that is not related to the firmware, so keep that part
of the code running even when the firmware is not running.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.a7cbd794cee9.I44a739fbd4ffcc46b83844dd1c7b2eb0c7b270f6@changeid
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: allow 6 GHz channels in MLO scan

MLO internal scan may include 6 GHz channels. Since the 6 GHz scan
indication is not set, the channel flags are set incorrectly, which
leads to a firmware assert.
Since the MLO scan may include 6 GHz and non 6 GHz channels in one
request, add support for non-PSC 6 GHz channels (PSC channels are
already supported) when the 6 GHz indication is not set.

Fixes: 38b3998dfba3 ("wifi: iwlwifi: mvm: Introduce internal MLO passive scan")
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.04807f8213b2.Idd09d4366df92a74853649c1a520b7f0f752d1ac@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: pause TCM when the firmware is stopped

Not doing so will make us send a host command to the transport while the
firmware is not alive, which will trigger a WARNING.

bad state = 0
WARNING: CPU: 2 PID: 17434 at drivers/net/wireless/intel/iwlwifi/iwl-trans.c:115 iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi]
RIP: 0010:iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi]
Call Trace:
<TASK>
iwl_mvm_send_cmd+0x40/0xc0 [iwlmvm]
iwl_mvm_config_scan+0x198/0x260 [iwlmvm]
iwl_mvm_recalc_tcm+0x730/0x11d0 [iwlmvm]
iwl_mvm_tcm_work+0x1d/0x30 [iwlmvm]
process_one_work+0x29e/0x640
worker_thread+0x2df/0x690
? rescuer_thread+0x540/0x540
kthread+0x192/0x1e0
? set_kthread_struct+0x90/0x90
ret_from_fork+0x22/0x30

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.5abe71ca1b6b.I97a968cb8be1f24f94652d9b110ecbf6af73f89e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: fw: fix wgds rev 3 exact size

Check size of WGDS revision 3 is equal to 8 entries size with some header,
but doesn't depend on the number of used entries. Check that used entries
are between min and max but allow more to be present than are used to fix
operation with some BIOSes that have such data.

Fixes: 97f8a3d1610b ("iwlwifi: ACPI: support revision 3 WGDS tables")
Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.cc71dfc67ec3.Ic27ee15ac6128b275c210b6de88f2145bd83ca7b@changeid
[edit commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: take the mutex before running link selection

iwl_mvm_select_links is called by the link selection worker and it
requires the mutex.
Take it in the link selection worker.
This logic used to run from iwl_mvm_rx_umac_scan_complete_notif which
had the mvm->mutex held. This was changed to run in a worker holding the
wiphy mutex, but we also need the mvm->mutex.

Fixes: 2e194efa3809 ("wifi: iwlwifi: mvm: Fix race in scan completion")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.0cacecd5db1e.Iaca38a078592b69bdd06549daf63408ccf1810e4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room()

Driver creates also the WFA TPC element, consider that in the
calculation.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.e710ce446b7f.I2715c6742e9c3d160e2ba41bc4b35de370d2ce34@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation

The calculation should consider also the 6GHz IE's len, fix that.
In addition, in iwl_mvm_sched_scan_start() the scan_fits helper is
called only in case non_psc_incldued is true, but it should be called
regardless, fix that as well.

Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.7db825442fd2.I99f4d6587709de02072fd57957ec7472331c6b1d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: lower message level for FW buffer destination

An invalid buffer destination is not a problem for the driver and it
does not make sense to report it with the KERN_ERR message level. As
such, change the message to use IWL_DEBUG_FW.

Reported-by: Len Brown <lenb@kernel.org>
Closes: https://lore.kernel.org/r/CAJvTdKkcxJss=DM2sxgv_MR5BeZ4_OC-3ad6tA40TYH2yqHCWw@mail.gmail.com
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.20abf78f05bc.Ifbcecc2ae9fb40b9698302507dcba8b922c8d856@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: fix hibernation

Fast resume is a feature that was recently introduced to speed up the
resume time. It basically keeps the firmware alive while the system
is suspended and that avoids starting again the whole device.

This flow can't work for hibernation, since when the system boots,
before the frozen image is loaded, the kernel may touch the device. As a
result, we can't assume the device is in the exact same state as before
the hibernation.

Detect that we are resuming from hibernation through the PCI device and
forbid the fast resume flow. We also need to shut down the device
cleanly when that happens.

In addition, in case the device is power gated during S3, we won't be
able to keep the device alive. Detect this situation with BE200 at least
with the help of the CSR_FUNC_SCRATCH register and reset the device upon
resume if it was power gated during S3.

Fixes: e8bb19c1d590 ("wifi: iwlwifi: support fast resume")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20240825191257.24eb3b19e74f.I3837810318dbef0a0a773cf4c4fcf89cdc6fdbd3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: fix beacon SSID mismatch handling

Return false when memcmp with zero_ssid returns 0 to correctly
handle hidden SSIDs case.

Fixes: 9cc88678db5b ("wifi: mac80211: check SSID in beacon")
Reviewed-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Link: https://patch.msgid.link/20240823105546.7ab29ae287a6.I7f98e57e1ab6597614703fdd138cc88ad253d986@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

LoongArch: KVM: Invalidate guest steal time address on vCPU reset

If ParaVirt steal time feature is enabled, there is a percpu gpa address
passed from guest vCPU and host modifies guest memory space with this gpa
address. When vCPU is reset normally, it will notify host and invalidate
gpa address.

However if VM is crashed and VMM reboots VM forcely, the vCPU reboot
notification callback will not be called in VM. Host needs invalidate
the gpa address, else host will modify guest memory during VM reboots.
Here it is invalidated from the vCPU KVM_REG_LOONGARCH_VCPU_RESET ioctl
interface.

Also funciton kvm_reset_timer() is removed at vCPU reset stage, since SW
emulated timer is only used in vCPU block state. When a vCPU is removed
from the block waiting queue, kvm_restore_timer() is called and SW timer
is cancelled. And the timer register is also cleared at VMM when a vCPU
is reset.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Add ifdefs to fix LSX and LASX related warnings

There exist some warnings when building kernel if CONFIG_CPU_HAS_LBT is
set but CONFIG_CPU_HAS_LSX and CONFIG_CPU_HAS_LASX are not set. In this
case, there are no definitions of _restore_lsx & _restore_lasx and there
are also no definitions of kvm_restore_lsx & kvm_restore_lasx in fpu.S
and switch.S respectively, just add some ifdefs to fix these warnings.

  AS      arch/loongarch/kernel/fpu.o
arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0
arch/loongarch/kernel/fpu.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0

  AS [M]  arch/loongarch/kvm/switch.o
arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0
arch/loongarch/kvm/switch.o: warning: objtool: unexpected relocation symbol type in .rela.discard.func_stack_frame_non_standard: 0

  MODPOST Module.symvers
ERROR: modpost: "kvm_restore_lsx" [arch/loongarch/kvm/kvm.ko] undefined!
ERROR: modpost: "kvm_restore_lasx" [arch/loongarch/kvm/kvm.ko] undefined!

Cc: stable@vger.kernel.org # 6.9+
Fixes: cb8a2ef0848c ("LoongArch: Add ORC stack unwinder support")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408120955.qls5oNQY-lkp@intel.com/
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE

Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed
it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only
legacy interrupts have been allocated.

Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h
and the core will automatically set the flag for all interrupts.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>

btrfs: initialize last_extent_end to fix -Wmaybe-uninitialized warning in extent_fiemap()

There's a warning (probably on some older compiler version):

fs/btrfs/fiemap.c: warning: 'last_extent_end' may be used uninitialized in this function [-Wmaybe-uninitialized]: => 822:19

Initialize the variable to 0 although it's not necessary as it's either
properly set or not used after an error. The called function is in the
same file so this is a false alert but we want to fix all
-Wmaybe-uninitialized reports.

Link: https://lore.kernel.org/all/20240819070639.2558629-1-geert@linux-m68k.org/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David Sterba <dsterba@suse.com>

net: ftgmac100: Ensure tx descriptor updates are visible

The driver must ensure TX descriptor updates are visible
before updating TX pointer and TX clear pointer.

This resolves TX hangs observed on AST2600 when running
iperf3.

Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/livepatch: wait for atomic replace to occur

On some machines with a large number of CPUs there is a sizable delay
between an atomic replace occurring and when sysfs updates accordingly.
This fix uses 'loop_until' to wait for the atomic replace to unload all
previous livepatches.

Reported-by: CKI Project <cki-project@redhat.com>
Closes: https://datawarehouse.cki-project.org/kcidb/tests/redhat:1413102084-x86_64-kernel_upt_28
Signed-off-by: Ryan Sullivan <rysulliv@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Link: https://lore.kernel.org/r/20240822173122.14760-1-rysulliv@redhat.com
Signed-off-by: Petr Mladek <pmladek@suse.com>

platform/x86: x86-android-tablets: Make Lenovo Yoga Tab 3 X90F DMI match less strict

There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it
turns out that the 2G version has a DMI product name of
"CHERRYVIEW D1 PLATFORM" where as the 4G version has
"CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version check are
unique enough that the product-name check is not necessary.

Drop the product-name check so that the existing DMI match for the 4G
RAM version also matches the 2G RAM version.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20240825132415.8307-1-hdegoede@redhat.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

platform/x86: asus-wmi: Fix spurious rfkill on UX8406MA

The Asus Zenbook Duo (UX8406MA) has a keyboard which can be
placed on the laptop to connect it via USB, or can be removed from the
laptop to reveal a hidden secondary display in which case the keyboard
operates via Bluetooth.

When it is placed on the secondary display to connect via USB, it emits
a keypress for a wireless disable. This causes the rfkill system to be
activated disconnecting the current wifi connection, which doesn't
reflect the user's true intention.

Detect this hardware and suppress any wireless switches from the
keyboard; this keyboard does not have a wireless toggle capability so
these presses are always spurious.

Signed-off-by: Mathieu Fenniak <mathieu@fenniak.net>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20240823135630.128447-1-mathieu@fenniak.net
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

netfilter: nf_tables: restore IP sanity checks for netdev/egress

Subtract network offset to skb->len before performing IPv4 header sanity
checks, then adjust transport offset from offset from mac header.

Jorge Ortiz says:

When small UDP packets (< 4 bytes payload) are sent from eth0,
`meta l4proto udp` condition is not met because `NFT_PKTINFO_L4PROTO` is
not set. This happens because there is a comparison that checks if the
transport header offset exceeds the total length. This comparison does
not take into account the fact that the skb network offset might be
non-zero in egress mode (e.g., 14 bytes for Ethernet header).

Fixes: 0ae8e4cca787 ("netfilter: nf_tables: set transport offset from mac header for netdev/egress")
Reported-by: Jorge Ortiz <jorge.ortiz.escribano@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ALSA: hda: hda_component: Fix mutex crash if nothing ever binds

Move the initialization of parent->mutex into
hda_component_manager_init() so that it is always valid.

In hda_component_manager_bind() do not clear the parent information.
Only zero-fill the per-component data ready for it to be filled in
by the components as they bind.

Previously parent->mutex was being initialized only in
hda_component_manager_bind(). This meant that it was only
initialized if all components appeared and there was a bind callback.
If there wasn't a bind the mutex object was not valid when the
Realtek driver called any of the other functions.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 047b9cbbaa8e ("ALSA: hda: hda_component: Protect shared data with a mutex")
Link: https://patch.msgid.link/20240826094940.45563-1-rf@opensource.cirrus.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda/realtek: support HP Pavilion Aero 13-bg0xxx Mute LED

This patch adds the HP Pavilion Aero 13 (13-bg0xxx) (year 2024) to list of
quirks for keyboard LED mute indication.

The laptop has two LEDs (one for speaker and one for mic mute). The
pre-existing quirk ALC245_FIXUP_HP_X360_MUTE_LEDS chains both the quirk for
mic and speaker mute.

Tested on 6.11.0-rc4 with the aforementioned laptop.

Signed-off-by: Hendrik Borghorst <hendrikborghorst@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20240825174351.5687-1-hendrikborghorst@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

random: vDSO: reject unknown getrandom() flags

Like the getrandom() syscall, vDSO getrandom() must also reject unknown
flags. [1]

It would be possible to return -EINVAL from vDSO itself, but in the
possible case that a new flag is added to getrandom() syscall in the
future, it would be easier to get the behavior from the syscall, instead
of erroring until the vDSO is extended to support the new flag or
explicitly falling back.

[1] Designing the API: Planning for Extension
https://docs.kernel.org/process/adding-syscalls.html#designing-the-api-planning-for-extension

Signed-off-by: Yann Droneaud <yann@droneaud.fr>
[Jason: reworded commit message]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

btrfs: run delayed iputs when flushing delalloc

We have transient failures with btrfs/301, specifically in the part
where we do

  for i in $(seq 0 10); do
  write 50m to file
  rm -f file
  done

Sometimes this will result in a transient quota error, and it's because
sometimes we start writeback on the file which results in a delayed
iput, and thus the rm doesn't actually clean the file up.  When we're
flushing the quota space we need to run the delayed iputs to make sure
all the unlinks that we think have completed have actually completed.
This removes the small window where we could fail to find enough space
in our quota.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>

ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book3 Ultra

144d:c1cc requires the same workaround to enable the speaker amp
as other Samsung models with the ALC298 codec.

Signed-off-by: YOUNGJIN JOO <neoelec@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20240825092515.28728-1-neoelec@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

LoongArch: Remove the unused dma-direct.h

dma-direct.h is introduced in commit d4b6f1562a3c3284 ("LoongArch: Add
Non-Uniform Memory Access (NUMA) support"). In commit c78c43fe7d42524c
("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA"),
ARCH_HAS_PHYS_TO_DMA was deselected and the coresponding phys_to_dma()/
dma_to_phys() functions were removed. However, the unused dma-direct.h
was left behind, which is removed by this patch.

Cc: <stable@vger.kernel.org>
Fixes: c78c43fe7d42 ("LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA")
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>

cifs: Fix FALLOC_FL_PUNCH_HOLE support

The cifs filesystem doesn't quite emulate FALLOC_FL_PUNCH_HOLE correctly
(note that due to lack of protocol support, it can't actually implement it
directly).  Whilst it will (partially) invalidate dirty folios in the
pagecache, it doesn't write them back first, and so the EOF marker on the
server may be lower than inode->i_size.

This presents a problem, however, as if the punched hole invalidates the
tail of the locally cached dirty data, writeback won't know it needs to
move the EOF over to account for the hole punch (which isn't supposed to
move the EOF).  We could just write zeroes over the punched out region of
the pagecache and write that back - but this is supposed to be a
deallocatory operation.

Fix this by manually moving the EOF over on the server after the operation
if the hole punched would corrupt it.

Note that the FSCTL_SET_ZERO_DATA RPC and the setting of the EOF should
probably be compounded to stop a third party interfering (or, at least,
massively reduce the chance).

This was reproducible occasionally by using fsx with the following script:

truncate 0x0 0x375e2 0x0
punch_hole 0x2f6d3 0x6ab5 0x375e2
truncate 0x0 0x3a71f 0x375e2
mapread 0xee05 0xcf12 0x3a71f
write 0x2078e 0x5604 0x3a71f
write 0x3ebdf 0x1421 0x3a71f *
punch_hole 0x379d0 0x8630 0x40000 *
mapread 0x2aaa2 0x85b 0x40000
fallocate 0x1b401 0x9ada 0x40000
read 0x15f2 0x7d32 0x40000
read 0x32f37 0x7a3b 0x40000 *

The second "write" should extend the EOF to 0x40000, and the "punch_hole"
should operate inside of that - but that depends on whether the VM gets in
and writes back the data first.  If it doesn't, the file ends up 0x3a71f in
size, not 0x40000.

Fixes: 31742c5a3317 ("enable fallocate punch hole ("fallocate -p") for SMB3")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: fix rdma usage in smb2_async_writev()

rqst.rq_iter needs to be truncated otherwise we'll
also send the bytes into the stream socket...

This is the logic behind rqst.rq_npages = 0, which was removed in
"cifs: Change the I/O paths to use an iterator rather than a page list"
(d08089f649a0cfb2099c8551ac47eef0cc23fdf2).

Cc: stable@vger.kernel.org
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: remove unused rq_iter_size from struct smb_rqst

Reviewed-by: David Howells <dhowells@redhat.com>
Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>