summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
4 daysMerge tag 'net-6.1-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from rxrpc, netfilter and xfrm. Current release - regressions: - dccp/tcp: fix bhash2 issues related to WARN_ON() in inet_csk_get_port() - l2tp: don't sleep and disable BH under writer-side sk_callback_lock - eth: ice: fix handling of burst tx timestamps Current release - new code bugs: - xfrm: squelch kernel warning in case XFRM encap type is not available - eth: mlx5e: fix possible race condition in macsec extended packet number update routine Previous releases - regressions: - neigh: decrement the family specific qlen - netfilter: fix ipset regression - rxrpc: fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975] - eth: iavf: do not restart tx queues after reset task failure - eth: nfp: add port from netdev validation for EEPROM access - eth: mtk_eth_soc: fix potential memory leak in mtk_rx_alloc() Previous releases - always broken: - tipc: set con sock in tipc_conn_alloc - nfc: - fix potential memory leaks - fix incorrect sizing calculations in EVT_TRANSACTION - eth: octeontx2-af: fix pci device refcount leak - eth: bonding: fix ICMPv6 header handling when receiving IPv6 messages - eth: prestera: add missing unregister_netdev() in prestera_port_create() - eth: tsnep: fix rotten packets Misc: - usb: qmi_wwan: add support for LARA-L6" * tag 'net-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: thunderx: Fix the ACPI memory leak octeontx2-af: Fix reference count issue in rvu_sdp_init() net: altera_tse: release phylink resources in tse_shutdown() virtio_net: Fix probe failed when modprobe virtio_net net: wwan: t7xx: Fix the ACPI memory leak octeontx2-pf: Add check for devm_kcalloc net: enetc: preserve TX ring priority across reconfiguration net: marvell: prestera: add missing unregister_netdev() in prestera_port_create() nfc: st-nci: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st-nci: fix memory leaks in EVT_TRANSACTION nfc: st-nci: fix incorrect validating logic in EVT_TRANSACTION Documentation: networking: Update generic_netlink_howto URL net/cdc_ncm: Fix multicast RX support for CDC NCM devices with ZLP net: usb: qmi_wwan: add u-blox 0x1342 composition l2tp: Don't sleep and disable BH under writer-side sk_callback_lock net: dm9051: Fix missing dev_kfree_skb() in dm9051_loop_rx() arcnet: fix potential memory leak in com20020_probe() ipv4: Fix error return code in fib_table_insert() net: ethernet: mtk_eth_soc: fix memory leak in error path net: ethernet: mtk_eth_soc: fix resource leak in error path ...
4 daysMerge branch 'master' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== ipsec 2022-11-23 1) Fix "disable_policy" on ipv4 early demuxP Packets after the initial packet in a flow might be incorectly dropped on early demux if there are no matching policies. From Eyal Birger. 2) Fix a kernel warning in case XFRM encap type is not available. From Eyal Birger. 3) Fix ESN wrap around for GSO to avoid a double usage of a sequence number. From Christian Langrock. 4) Fix a send_acquire race with pfkey_register. From Herbert Xu. 5) Fix a list corruption panic in __xfrm_state_delete(). Thomas Jarosch. 6) Fix an unchecked return value in xfrm6_init(). Chen Zhongjin. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Fix ignored return value in xfrm6_init() xfrm: Fix oops in __xfrm_state_delete() af_key: Fix send_acquire race with pfkey_register xfrm: replay: Fix ESN wrap around for GSO xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available xfrm: fix "disable_policy" on ipv4 early demux ==================== Link: https://lore.kernel.org/r/20221123093117.434274-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix regression in ipset hash:ip with IPv4 range, from Vishwanath Pai. This is fixing up a bug introduced in the 6.0 release. 2) The "netfilter: ipset: enforce documented limit to prevent allocating huge memory" patch contained a wrong condition which makes impossible to add up to 64 clashing elements to a hash:net,iface type of set while it is the documented feature of the set type. The patch fixes the condition and thus makes possible to add the elements while keeps preventing allocating huge memory, from Jozsef Kadlecsik. This has been broken for several releases. 3) Missing locking when updating the flow block list which might lead a reader to crash. This has been broken since the introduction of the flowtable hardware offload support. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: flowtable_offload: add missing locking netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface netfilter: ipset: regression in ip_set_hash_ip.c ==================== Link: https://lore.kernel.org/r/20221122212814.63177-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysMerge tag '9p-for-6.1-rc7' of https://github.com/martinetd/linuxLinus Torvalds
Pull 9p fixes from Dominique Martinet: - 9p now uses a variable size for its recv buffer, but every place hadn't been updated properly to use it and some buffer overflows have been found and needed fixing. There's still one place where msize is incorrectly used in a safety check (p9_check_errors), but all paths leading to it should already be avoiding overflows and that patch took a bit more time to get right for zero-copy requests so I'll send it for 6.2 - yet another race condition in p9_conn_cancel introduced by a fix for a syzbot report in the same place. Maybe at some point we'll get it right without burning it all down... * tag '9p-for-6.1-rc7' of https://github.com/martinetd/linux: 9p/xen: check logical size for buffer size 9p/fd: Use P9_HDRSZ for header size 9p/fd: Fix write overflow in p9_read_work 9p/fd: fix issue of list_del corruption in p9_fd_cancel()
5 daysl2tp: Don't sleep and disable BH under writer-side sk_callback_lockJakub Sitnicki
When holding a reader-writer spin lock we cannot sleep. Calling setup_udp_tunnel_sock() with write lock held violates this rule, because we end up calling percpu_down_read(), which might sleep, as syzbot reports [1]: __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890 percpu_down_read include/linux/percpu-rwsem.h:49 [inline] cpus_read_lock+0x1b/0x140 kernel/cpu.c:310 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline] setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723 Trim the writer-side critical section for sk_callback_lock down to the minimum, so that it covers only operations on sk_user_data. Also, when grabbing the sk_callback_lock, we always need to disable BH, as Eric points out. Failing to do so leads to deadlocks because we acquire sk_callback_lock in softirq context, which can get stuck waiting on us if: 1) it runs on the same CPU, or CPU0 ---- lock(clock-AF_INET6); <Interrupt> lock(clock-AF_INET6); 2) lock ordering leads to priority inversion CPU0 CPU1 ---- ---- lock(clock-AF_INET6); local_irq_disable(); lock(&tcp_hashinfo.bhash[i].lock); lock(clock-AF_INET6); <Interrupt> lock(&tcp_hashinfo.bhash[i].lock); ... as syzbot reports [2,3]. Use the _bh variants for write_(un)lock. [1] https://lore.kernel.org/netdev/0000000000004e78ec05eda79749@google.com/ [2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com/ [3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/ v2: - Check and set sk_user_data while holding sk_callback_lock for both L2TP encapsulation types (IP and UDP) (Tetsuo) Cc: Tom Parkin <tparkin@katalix.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
5 days9p/xen: check logical size for buffer sizeDominique Martinet
trans_xen did not check the data fits into the buffer before copying from the xen ring, but we probably should. Add a check that just skips the request and return an error to userspace if it did not fit Tested-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Link: https://lkml.kernel.org/r/20221118135542.63400-1-asmadeus@codewreck.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
5 daysipv4: Fix error return code in fib_table_insert()Ziyang Xuan
In fib_table_insert(), if the alias was already inserted, but node not exist, the error code should be set before return from error handling path. Fixes: a6c76c17df02 ("ipv4: Notify route after insertion to the routing table") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/r/20221120072838.2167047-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysdccp/tcp: Fixup bhash2 bucket when connect() fails.Kuniyuki Iwashima
If a socket bound to a wildcard address fails to connect(), we only reset saddr and keep the port. Then, we have to fix up the bhash2 bucket; otherwise, the bucket has an inconsistent address in the list. Also, listen() for such a socket will fire the WARN_ON() in inet_csk_get_port(). [0] Note that when a system runs out of memory, we give up fixing the bucket and unlink sk from bhash and bhash2 by inet_put_port(). [0]: WARNING: CPU: 0 PID: 207 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1)) Modules linked in: CPU: 0 PID: 207 Comm: bhash2_prev_rep Not tainted 6.1.0-rc3-00799-gc8421681c845 #63 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014 RIP: 0010:inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1)) Code: 74 a7 eb 93 48 8b 54 24 18 0f b7 cb 4c 89 e6 4c 89 ff e8 48 b2 ff ff 49 8b 87 18 04 00 00 e9 32 ff ff ff 0f 0b e9 34 ff ff ff <0f> 0b e9 42 ff ff ff 41 8b 7f 50 41 8b 4f 54 89 fe 81 f6 00 00 ff RSP: 0018:ffffc900003d7e50 EFLAGS: 00010202 RAX: ffff8881047fb500 RBX: 0000000000004e20 RCX: 0000000000000000 RDX: 000000000000000a RSI: 00000000fffffe00 RDI: 00000000ffffffff RBP: ffffffff8324dc00 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000004e20 R15: ffff8881054e1280 FS: 00007f8ac04dc740(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020001540 CR3: 00000001055fa003 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205) inet_listen (net/ipv4/af_inet.c:228) __sys_listen (net/socket.c:1810) __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f8ac051de5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc1c177248 EFLAGS: 00000206 ORIG_RAX: 0000000000000032 RAX: ffffffffffffffda RBX: 0000000020001550 RCX: 00007f8ac051de5d RDX: ffffffffffffff80 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007ffc1c177270 R08: 0000000000000018 R09: 0000000000000007 R10: 0000000020001540 R11: 0000000000000206 R12: 00007ffc1c177388 R13: 0000000000401169 R14: 0000000000403e18 R15: 00007f8ac0723000 </TASK> Fixes: 28044fc1d495 ("net: Add a bhash2 table hashed by port and address") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysdccp/tcp: Update saddr under bhash's lock.Kuniyuki Iwashima
When we call connect() for a socket bound to a wildcard address, we update saddr locklessly. However, it could result in a data race; another thread iterating over bhash might see a corrupted address. Let's update saddr under the bhash bucket's lock. Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysdccp/tcp: Remove NULL check for prev_saddr in inet_bhash2_update_saddr().Kuniyuki Iwashima
When we call inet_bhash2_update_saddr(), prev_saddr is always non-NULL. Let's remove the unnecessary test. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysdccp/tcp: Reset saddr on failure after inet6?_hash_connect().Kuniyuki Iwashima
When connect() is called on a socket bound to the wildcard address, we change the socket's saddr to a local address. If the socket fails to connect() to the destination, we have to reset the saddr. However, when an error occurs after inet_hash6?_connect() in (dccp|tcp)_v[46]_conect(), we forget to reset saddr and leave the socket bound to the address. From the user's point of view, whether saddr is reset or not varies with errno. Let's fix this inconsistent behaviour. Note that after this patch, the repro [0] will trigger the WARN_ON() in inet_csk_get_port() again, but this patch is not buggy and rather fixes a bug papering over the bhash2's bug for which we need another fix. For the record, the repro causes -EADDRNOTAVAIL in inet_hash6_connect() by this sequence: s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s1.bind(('127.0.0.1', 10000)) s1.sendto(b'hello', MSG_FASTOPEN, (('127.0.0.1', 10000))) # or s1.connect(('127.0.0.1', 10000)) s2 = socket() s2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) s2.bind(('0.0.0.0', 10000)) s2.connect(('127.0.0.1', 10000)) # -EADDRNOTAVAIL s2.listen(32) # WARN_ON(inet_csk(sk)->icsk_bind2_hash != tb2); [0]: https://syzkaller.appspot.com/bug?extid=015d756bbd1f8b5c8f09 Fixes: 3df80d9320bc ("[DCCP]: Introduce DCCPv6") Fixes: 7c657876b63c ("[DCCP]: Initial implementation") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 daysnetfilter: flowtable_offload: add missing lockingFelix Fietkau
nf_flow_table_block_setup and the driver TC_SETUP_FT call can modify the flow block cb list while they are being traversed elsewhere, causing a crash. Add a write lock around the calls to protect readers Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysnetfilter: ipset: restore allowing 64 clashing elements in hash:net,ifaceJozsef Kadlecsik
The commit 510841da1fcc ("netfilter: ipset: enforce documented limit to prevent allocating huge memory") was too strict and prevented to add up to 64 clashing elements to a hash:net,iface type of set. This patch fixes the issue and now the type behaves as documented. Fixes: 510841da1fcc ("netfilter: ipset: enforce documented limit to prevent allocating huge memory") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 daysNFC: nci: fix memory leak in nci_rx_data_packet()Liu Shixin
Syzbot reported a memory leak about skb: unreferenced object 0xffff88810e144e00 (size 240): comm "syz-executor284", pid 3701, jiffies 4294952403 (age 12.620s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83ab79a9>] __alloc_skb+0x1f9/0x270 net/core/skbuff.c:497 [<ffffffff82a5cf64>] alloc_skb include/linux/skbuff.h:1267 [inline] [<ffffffff82a5cf64>] virtual_ncidev_write+0x24/0xe0 drivers/nfc/virtual_ncidev.c:116 [<ffffffff815f6503>] do_loop_readv_writev fs/read_write.c:759 [inline] [<ffffffff815f6503>] do_loop_readv_writev fs/read_write.c:743 [inline] [<ffffffff815f6503>] do_iter_write+0x253/0x300 fs/read_write.c:863 [<ffffffff815f66ed>] vfs_writev+0xdd/0x240 fs/read_write.c:934 [<ffffffff815f68f6>] do_writev+0xa6/0x1c0 fs/read_write.c:977 [<ffffffff848802d5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline] [<ffffffff848802d5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 [<ffffffff84a00087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd In nci_rx_data_packet(), if we don't get a valid conn_info, we will return directly but forget to release the skb. Reported-by: syzbot+cdb9a427d1bc08815104@syzkaller.appspotmail.com Fixes: 4aeee6871e8c ("NFC: nci: Add dynamic logical connections support") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Link: https://lore.kernel.org/r/20221118082419.239475-1-liushixin2@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysnet: sched: allow act_ct to be built without NF_NATXin Long
In commit f11fe1dae1c4 ("net/sched: Make NET_ACT_CT depends on NF_NAT"), it fixed the build failure when NF_NAT is m and NET_ACT_CT is y by adding depends on NF_NAT for NET_ACT_CT. However, it would also cause NET_ACT_CT cannot be built without NF_NAT, which is not expected. This patch fixes it by changing to use "(!NF_NAT || NF_NAT)" as the depend. Fixes: f11fe1dae1c4 ("net/sched: Make NET_ACT_CT depends on NF_NAT") Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/b6386f28d1ba34721795fb776a91cbdabb203447.1668807183.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 daysxfrm: Fix ignored return value in xfrm6_init()Chen Zhongjin
When IPv6 module initializing in xfrm6_init(), register_pernet_subsys() is possible to fail but its return value is ignored. If IPv6 initialization fails later and xfrm6_fini() is called, removing uninitialized list in xfrm6_net_ops will cause null-ptr-deref: KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 1 PID: 330 Comm: insmod RIP: 0010:unregister_pernet_operations+0xc9/0x450 Call Trace: <TASK> unregister_pernet_subsys+0x31/0x3e xfrm6_fini+0x16/0x30 [ipv6] ip6_route_init+0xcd/0x128 [ipv6] inet6_init+0x29c/0x602 [ipv6] ... Fix it by catching the error return value of register_pernet_subsys(). Fixes: 8d068875caca ("xfrm: make gc_thresh configurable in all namespaces") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6 daysxfrm: Fix oops in __xfrm_state_delete()Thomas Jarosch
Kernel 5.14 added a new "byseq" index to speed up xfrm_state lookups by sequence number in commit fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") While the patch was thorough, the function pfkey_send_new_mapping() in net/af_key.c also modifies x->km.seq and never added the current xfrm_state to the "byseq" index. This leads to the following kernel Ooops: BUG: kernel NULL pointer dereference, address: 0000000000000000 .. RIP: 0010:__xfrm_state_delete+0xc9/0x1c0 .. Call Trace: <TASK> xfrm_state_delete+0x1e/0x40 xfrm_del_sa+0xb0/0x110 [xfrm_user] xfrm_user_rcv_msg+0x12d/0x270 [xfrm_user] ? remove_entity_load_avg+0x8a/0xa0 ? copy_to_user_state_extra+0x580/0x580 [xfrm_user] netlink_rcv_skb+0x51/0x100 xfrm_netlink_rcv+0x30/0x50 [xfrm_user] netlink_unicast+0x1a6/0x270 netlink_sendmsg+0x22a/0x480 __sys_sendto+0x1a6/0x1c0 ? __audit_syscall_entry+0xd8/0x130 ? __audit_syscall_exit+0x249/0x2b0 __x64_sys_sendto+0x23/0x30 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb Exact location of the crash in __xfrm_state_delete(): if (x->km.seq) hlist_del_rcu(&x->byseq); The hlist_node "byseq" was never populated. The bug only triggers if a new NAT traversal mapping (changed IP or port) is detected in esp_input_done2() / esp6_input_done2(), which in turn indirectly calls pfkey_send_new_mapping() *if* the kernel is compiled with CONFIG_NET_KEY and "af_key" is active. The PF_KEYv2 message SADB_X_NAT_T_NEW_MAPPING is not part of RFC 2367. Various implementations have been examined how they handle the "sadb_msg_seq" header field: - racoon (Android): does not process SADB_X_NAT_T_NEW_MAPPING - strongswan: does not care about sadb_msg_seq - openswan: does not care about sadb_msg_seq There is no standard how PF_KEYv2 sadb_msg_seq should be populated for SADB_X_NAT_T_NEW_MAPPING and it's not used in popular implementations either. Herbert Xu suggested we should just use the current km.seq value as is. This fixes the root cause of the oops since we no longer modify km.seq itself. The update of "km.seq" looks like a copy'n'paste error from pfkey_send_acquire(). SADB_ACQUIRE must indeed assign a unique km.seq number according to RFC 2367. It has been verified that code paths involving pfkey_send_acquire() don't cause the same Oops. PF_KEYv2 SADB_X_NAT_T_NEW_MAPPING support was originally added here: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git commit cbc3488685b20e7b2a98ad387a1a816aada569d8 Author: Derek Atkins <derek@ihtfp.com> AuthorDate: Wed Apr 2 13:21:02 2003 -0800 [IPSEC]: Implement UDP Encapsulation framework. In particular, implement ESPinUDP encapsulation for IPsec Nat Traversal. A note on triggering the bug: I was not able to trigger it using VMs. There is one VPN using a high latency link on our production VPN server that triggered it like once a day though. Link: https://github.com/strongswan/strongswan/issues/992 Link: https://lore.kernel.org/netdev/00959f33ee52c4b3b0084d42c430418e502db554.1652340703.git.antony.antony@secunet.com/T/ Link: https://lore.kernel.org/netdev/20221027142455.3975224-1-chenzhihao@meizu.com/T/ Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq") Reported-by: Roth Mark <rothm@mail.com> Reported-by: Zhihao Chen <chenzhihao@meizu.com> Tested-by: Roth Mark <rothm@mail.com> Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Acked-by: Antony Antony <antony.antony@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6 daystipc: check skb_linearize() return value in tipc_disc_rcv()YueHaibing
If skb_linearize() fails in tipc_disc_rcv(), we need to free the skb instead of handle it. Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address hash values") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20221119072832.7896-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daystipc: add an extra conn_get in tipc_conn_allocXin Long
One extra conn_get() is needed in tipc_conn_alloc(), as after tipc_conn_alloc() is called, tipc_conn_close() may free this con before deferencing it in tipc_topsrv_accept(): tipc_conn_alloc(); newsk = newsock->sk; <---- tipc_conn_close(); write_lock_bh(&sk->sk_callback_lock); newsk->sk_data_ready = tipc_conn_data_ready; Then an uaf issue can be triggered: BUG: KASAN: use-after-free in tipc_topsrv_accept+0x1e7/0x370 [tipc] Call Trace: <TASK> dump_stack_lvl+0x33/0x46 print_report+0x178/0x4b0 kasan_report+0x8c/0x100 kasan_check_range+0x179/0x1e0 tipc_topsrv_accept+0x1e7/0x370 [tipc] process_one_work+0x6a3/0x1030 worker_thread+0x8a/0xdf0 This patch fixes it by holding it in tipc_conn_alloc(), then after all accessing in tipc_topsrv_accept() releasing it. Note when does this in tipc_topsrv_kern_subscr(), as tipc_conn_rcv_sub() returns 0 or -1 only, we don't need to check for "> 0". Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daystipc: set con sock in tipc_conn_allocXin Long
A crash was reported by Wei Chen: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:tipc_conn_close+0x12/0x100 Call Trace: tipc_topsrv_exit_net+0x139/0x320 ops_exit_list.isra.9+0x49/0x80 cleanup_net+0x31a/0x540 process_one_work+0x3fa/0x9f0 worker_thread+0x42/0x5c0 It was caused by !con->sock in tipc_conn_close(). In tipc_topsrv_accept(), con is allocated in conn_idr then its sock is set: con = tipc_conn_alloc(); ... <----[1] con->sock = newsock; If tipc_conn_close() is called in anytime of [1], the null-pointer-def is triggered by con->sock->sk due to con->sock is not yet set. This patch fixes it by moving the con->sock setting to tipc_conn_alloc() under s->idr_lock. So that con->sock can never be NULL when getting the con from s->conn_idr. It will be also safer to move con->server and flag CF_CONNECTED setting under s->idr_lock, as they should all be set before tipc_conn_alloc() is called. Fixes: c5fa7b3cf3cb ("tipc: introduce new TIPC server infrastructure") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysnetfilter: ipset: regression in ip_set_hash_ip.cVishwanath Pai
This patch introduced a regression: commit 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") The variable e.ip is passed to adtfn() function which finally adds the ip address to the set. The patch above refactored the for loop and moved e.ip = htonl(ip) to the end of the for loop. What this means is that if the value of "ip" changes between the first assignement of e.ip and the forloop, then e.ip is pointing to a different ip address than "ip". Test case: $ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000 $ ipset add jdtest_tmp 10.0.1.1/31 ipset v6.21.1: Element cannot be added to the set: it's already added The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])" block but e.ip is still pointing to the old value. Fixes: 48596a8ddc46 ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses") Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
7 daysipv4/fib: Replace zero-length array with DECLARE_FLEX_ARRAY() helperKees Cook
Zero-length arrays are deprecated[1] and are being replaced with flexible array members in support of the ongoing efforts to tighten the FORTIFY_SOURCE routines on memcpy(), correctly instrument array indexing with UBSAN_BOUNDS, and to globally enable -fstrict-flex-arrays=3. Replace zero-length array with flexible-array member in struct key_vector. This results in no differences in binary output. [1] https://github.com/KSPP/linux/issues/78 Cc: Jakub Kicinski <kuba@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: David Ahern <dsahern@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
7 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains late Netfilter fixes for net: 1) Use READ_ONCE()/WRITE_ONCE() to update ct->mark, from Daniel Xu. Not reported by syzbot, but I presume KASAN would trigger post a splat on this. This is a rather old issue, predating git history. 2) Do not set up extensions for set element with end interval flag set on. This leads to bogusly skipping this elements as expired when listing the set/map to userspace as well as increasing memory consumpton when stateful expressions are used. This issue has been present since 4.18, when timeout support for rbtree set was added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
10 daysnetfilter: nf_tables: do not set up extensions for end intervalPablo Neira Ayuso
Elements with an end interval flag set on do not store extensions. The global set definition is currently setting on the timeout and stateful expression for end interval elements. This leads to skipping end interval elements from the set->ops->walk() path as the expired check bogusly reports true. Moreover, do not set up stateful expressions for elements with end interval flag set on since this is never used. Fixes: 65038428b2c6 ("netfilter: nf_tables: allow to specify stateful expression in set definition") Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 daysnetfilter: conntrack: Fix data-races around ct markDaniel Xu
nf_conn:mark can be read from and written to in parallel. Use READ_ONCE()/WRITE_ONCE() for reads and writes to prevent unwanted compiler optimizations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
10 days9p/fd: Use P9_HDRSZ for header sizeGUO Zihua
Cleanup hardcoded header sizes to use P9_HDRSZ instead of '7' Link: https://lkml.kernel.org/r/20221117091159.31533-4-guozihua@huawei.com Signed-off-by: GUO Zihua <guozihua@huawei.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: commit message adjusted to make sense after offset size adjustment got removed] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
10 days9p/fd: Fix write overflow in p9_read_workGUO Zihua
This error was reported while fuzzing: BUG: KASAN: slab-out-of-bounds in _copy_to_iter+0xd35/0x1190 Write of size 4043 at addr ffff888008724eb1 by task kworker/1:1/24 CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 6.1.0-rc5-00002-g1adf73218daa-dirty #223 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Workqueue: events p9_read_work Call Trace: <TASK> dump_stack_lvl+0x4c/0x64 print_report+0x178/0x4b0 kasan_report+0xae/0x130 kasan_check_range+0x179/0x1e0 memcpy+0x38/0x60 _copy_to_iter+0xd35/0x1190 copy_page_to_iter+0x1d5/0xb00 pipe_read+0x3a1/0xd90 __kernel_read+0x2a5/0x760 kernel_read+0x47/0x60 p9_read_work+0x463/0x780 process_one_work+0x91d/0x1300 worker_thread+0x8c/0x1210 kthread+0x280/0x330 ret_from_fork+0x22/0x30 </TASK> Allocated by task 457: kasan_save_stack+0x1c/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7e/0x90 __kmalloc+0x59/0x140 p9_fcall_init.isra.11+0x5d/0x1c0 p9_tag_alloc+0x251/0x550 p9_client_prepare_req+0x162/0x350 p9_client_rpc+0x18d/0xa90 p9_client_create+0x670/0x14e0 v9fs_session_init+0x1fd/0x14f0 v9fs_mount+0xd7/0xaf0 legacy_get_tree+0xf3/0x1f0 vfs_get_tree+0x86/0x2c0 path_mount+0x885/0x1940 do_mount+0xec/0x100 __x64_sys_mount+0x1a0/0x1e0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd This BUG pops up when trying to reproduce https://syzkaller.appspot.com/bug?id=6c7cd46c7bdd0e86f95d26ec3153208ad186f9fa The callstack is different but the issue is valid and re-producable with the same re-producer in the link. The root cause of this issue is that we check the size of the message received against the msize of the client in p9_read_work. However, it turns out that capacity is no longer consistent with msize. Thus, the message size should be checked against sdata capacity. As the msize is non-consistant with the capacity of the tag and as we are now checking message size against capacity directly, there is no point checking message size against msize. So remove it. Link: https://lkml.kernel.org/r/20221117091159.31533-2-guozihua@huawei.com Link: https://lkml.kernel.org/r/20221117091159.31533-3-guozihua@huawei.com Reported-by: syzbot+0f89bd13eaceccc0e126@syzkaller.appspotmail.com Fixes: 60ece0833b6c ("net/9p: allocate appropriate reduced message buffers") Signed-off-by: GUO Zihua <guozihua@huawei.com> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> [Dominique: squash patches 1 & 2 and fix size including header part] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
10 days9p/fd: fix issue of list_del corruption in p9_fd_cancel()Zhengchao Shao
Syz reported the following issue: kernel BUG at lib/list_debug.c:53! invalid opcode: 0000 [#1] PREEMPT SMP KASAN RIP: 0010:__list_del_entry_valid.cold+0x5c/0x72 Call Trace: <TASK> p9_fd_cancel+0xb1/0x270 p9_client_rpc+0x8ea/0xba0 p9_client_create+0x9c0/0xed0 v9fs_session_init+0x1e0/0x1620 v9fs_mount+0xba/0xb80 legacy_get_tree+0x103/0x200 vfs_get_tree+0x89/0x2d0 path_mount+0x4c0/0x1ac0 __x64_sys_mount+0x33b/0x430 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> The process is as follows: Thread A: Thread B: p9_poll_workfn() p9_client_create() ... ... p9_conn_cancel() p9_fd_cancel() list_del() ... ... list_del() //list_del corruption There is no lock protection when deleting list in p9_conn_cancel(). After deleting list in Thread A, thread B will delete the same list again. It will cause issue of list_del corruption. Setting req->status to REQ_STATUS_ERROR under lock prevents other cleanup paths from trying to manipulate req_list. The other thread can safely check req->status because it still holds a reference to req at this point. Link: https://lkml.kernel.org/r/20221110122606.383352-1-shaozhengchao@huawei.com Fixes: 52f1c45dde91 ("9p: trans_fd/p9_conn_cancel: drop client lock earlier") Reported-by: syzbot+9b69b8d10ab4a7d88056@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> [Dominique: add description of the fix in commit message] Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
10 daysnfc/nci: fix race with opening and closingLin Ma
Previously we leverage NCI_UNREG and the lock inside nci_close_device to prevent the race condition between opening a device and closing a device. However, it still has problem because a failed opening command will erase the NCI_UNREG flag and allow another opening command to bypass the status checking. This fix corrects that by making sure the NCI_UNREG is held. Reported-by: syzbot+43475bf3cfbd6e41f5b7@syzkaller.appspotmail.com Fixes: 48b71a9e66c2 ("NFC: add NCI_UNREG flag to eliminate the race") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
10 daysrxrpc: Fix race between conn bundle lookup and bundle removal [ZDI-CAN-15975]David Howells
After rxrpc_unbundle_conn() has removed a connection from a bundle, it checks to see if there are any conns with available channels and, if not, removes and attempts to destroy the bundle. Whilst it does check after grabbing client_bundles_lock that there are no connections attached, this races with rxrpc_look_up_bundle() retrieving the bundle, but not attaching a connection for the connection to be attached later. There is therefore a window in which the bundle can get destroyed before we manage to attach a new connection to it. Fix this by adding an "active" counter to struct rxrpc_bundle: (1) rxrpc_connect_call() obtains an active count by prepping/looking up a bundle and ditches it before returning. (2) If, during rxrpc_connect_call(), a connection is added to the bundle, this obtains an active count, which is held until the connection is discarded. (3) rxrpc_deactivate_bundle() is created to drop an active count on a bundle and destroy it when the active count reaches 0. The active count is checked inside client_bundles_lock() to prevent a race with rxrpc_look_up_bundle(). (4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle(). Fixes: 245500d853e9 ("rxrpc: Rewrite the client connection manager") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975 Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: zdi-disclosures@trendmicro.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>
10 daysnet: neigh: decrement the family specific qlenThomas Zeitlhofer
Commit 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") introduced the length counter qlen in struct neigh_parms. There are separate neigh_parms instances for IPv4/ARP and IPv6/ND, and while the family specific qlen is incremented in pneigh_enqueue(), the mentioned commit decrements always the IPv4/ARP specific qlen, regardless of the currently processed family, in pneigh_queue_purge() and neigh_proxy_process(). As a result, with IPv6/ND, the family specific qlen is only incremented (and never decremented) until it exceeds PROXY_QLEN, and then, according to the check in pneigh_enqueue(), neighbor solicitations are not answered anymore. As an example, this is noted when using the subnet-router anycast address to access a Linux router. After a certain amount of time (in the observed case, qlen exceeded PROXY_QLEN after two days), the Linux router stops answering neighbor solicitations for its subnet-router anycast address and effectively becomes unreachable. Another result with IPv6/ND is that the IPv4/ARP specific qlen is decremented more often than incremented. This leads to negative qlen values, as a signed integer has been used for the length counter qlen, and potentially to an integer overflow. Fix this by introducing the helper function neigh_parms_qlen_dec(), which decrements the family specific qlen. Thereby, make use of the existing helper function neigh_get_dev_parms_rcu(), whose definition therefore needs to be placed earlier in neighbour.c. Take the family member from struct neigh_table to determine the currently processed family and appropriately call neigh_parms_qlen_dec() from pneigh_queue_purge() and neigh_proxy_process(). Additionally, use an unsigned integer for the length counter qlen. Fixes: 0ff4eb3d5ebb ("neighbour: make proxy_queue.qlen limit per-device") Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at> Signed-off-by: David S. Miller <davem@davemloft.net>
12 daystcp: configurable source port perturb table sizeGleb Mazovetskiy
On embedded systems with little memory and no relevant security concerns, it is beneficial to reduce the size of the table. Reducing the size from 2^16 to 2^8 saves 255 KiB of kernel RAM. Makes the table size configurable as an expert option. The size was previously increased from 2^8 to 2^16 in commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16"). Signed-off-by: Gleb Mazovetskiy <glex.spb@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
12 daysl2tp: Serialize access to sk_user_data with sk_callback_lockJakub Sitnicki
sk->sk_user_data has multiple users, which are not compatible with each other. Writers must synchronize by grabbing the sk->sk_callback_lock. l2tp currently fails to grab the lock when modifying the underlying tunnel socket fields. Fix it by adding appropriate locking. We err on the side of safety and grab the sk_callback_lock also inside the sk_destruct callback overridden by l2tp, even though there should be no refs allowing access to the sock at the time when sk_destruct gets called. v4: - serialize write to sk_user_data in l2tp sk_destruct v3: - switch from sock lock to sk_callback_lock - document write-protection for sk_user_data v2: - update Fixes to point to origin of the bug - use real names in Reported/Tested-by tags Cc: Tom Parkin <tparkin@katalix.com> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
12 daysnet: dsa: don't leak tagger-owned storage on switch driver unbindVladimir Oltean
In the initial commit dc452a471dba ("net: dsa: introduce tagger-owned storage for private and shared data"), we had a call to tag_ops->disconnect(dst) issued from dsa_tree_free(), which is called at tree teardown time. There were problems with connecting to a switch tree as a whole, so this got reworked to connecting to individual switches within the tree. In this process, tag_ops->disconnect(ds) was made to be called only from switch.c (cross-chip notifiers emitted as a result of dynamic tag proto changes), but the normal driver teardown code path wasn't replaced with anything. Solve this problem by adding a function that does the opposite of dsa_switch_setup_tag_protocol(), which is called from the equivalent spot in dsa_switch_teardown(). The positioning here also ensures that we won't have any use-after-free in tagging protocol (*rcv) ops, since the teardown sequence is as follows: dsa_tree_teardown -> dsa_tree_teardown_master -> dsa_master_teardown -> unsets master->dsa_ptr, making no further packets match the ETH_P_XDSA packet type handler -> dsa_tree_teardown_ports -> dsa_port_teardown -> dsa_slave_destroy -> unregisters DSA net devices, there is even a synchronize_net() in unregister_netdevice_many() -> dsa_tree_teardown_switches -> dsa_switch_teardown -> dsa_switch_teardown_tag_protocol -> finally frees the tagger-owned storage Fixes: 7f2973149c22 ("net: dsa: make tagging protocols connect to individual switches from a tree") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20221114143551.1906361-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 daysnet/x25: Fix skb leak in x25_lapb_receive_frame()Wei Yongjun
x25_lapb_receive_frame() using skb_copy() to get a private copy of skb, the new skb should be freed in the undersized/fragmented skb error handling path. Otherwise there is a memory leak. Fixes: cb101ed2c3c7 ("x25: Handle undersized/fragmented skbs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Martin Schiller <ms@dev.tdt.de> Link: https://lore.kernel.org/r/20221114110519.514538-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysbridge: switchdev: Fix memory leaks when changing VLAN protocolIdo Schimmel
The bridge driver can offload VLANs to the underlying hardware either via switchdev or the 8021q driver. When the former is used, the VLAN is marked in the bridge driver with the 'BR_VLFLAG_ADDED_BY_SWITCHDEV' private flag. To avoid the memory leaks mentioned in the cited commit, the bridge driver will try to delete a VLAN via the 8021q driver if the VLAN is not marked with the previously mentioned flag. When the VLAN protocol of the bridge changes, switchdev drivers are notified via the 'SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL' attribute, but the 8021q driver is also called to add the existing VLANs with the new protocol and delete them with the old protocol. In case the VLANs were offloaded via switchdev, the above behavior is both redundant and buggy. Redundant because the VLANs are already programmed in hardware and drivers that support VLAN protocol change (currently only mlx5) change the protocol upon the switchdev attribute notification. Buggy because the 8021q driver is called despite these VLANs being marked with 'BR_VLFLAG_ADDED_BY_SWITCHDEV'. This leads to memory leaks [1] when the VLANs are deleted. Fix by not calling the 8021q driver for VLANs that were already programmed via switchdev. [1] unreferenced object 0xffff8881f6771200 (size 256): comm "ip", pid 446855, jiffies 4298238841 (age 55.240s) hex dump (first 32 bytes): 00 00 7f 0e 83 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000012819ac>] vlan_vid_add+0x437/0x750 [<00000000f2281fad>] __br_vlan_set_proto+0x289/0x920 [<000000000632b56f>] br_changelink+0x3d6/0x13f0 [<0000000089d25f04>] __rtnl_newlink+0x8ae/0x14c0 [<00000000f6276baf>] rtnl_newlink+0x5f/0x90 [<00000000746dc902>] rtnetlink_rcv_msg+0x336/0xa00 [<000000001c2241c0>] netlink_rcv_skb+0x11d/0x340 [<0000000010588814>] netlink_unicast+0x438/0x710 [<00000000e1a4cd5c>] netlink_sendmsg+0x788/0xc40 [<00000000e8992d4e>] sock_sendmsg+0xb0/0xe0 [<00000000621b8f91>] ____sys_sendmsg+0x4ff/0x6d0 [<000000000ea26996>] ___sys_sendmsg+0x12e/0x1b0 [<00000000684f7e25>] __sys_sendmsg+0xab/0x130 [<000000004538b104>] do_syscall_64+0x3d/0x90 [<0000000091ed9678>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 279737939a81 ("net: bridge: Fix VLANs memory leak") Reported-by: Vlad Buslov <vladbu@nvidia.com> Tested-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20221114084509.860831-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 dayskcm: close race conditions on sk_receive_queueCong Wang
sk->sk_receive_queue is protected by skb queue lock, but for KCM sockets its RX path takes mux->rx_lock to protect more than just skb queue. However, kcm_recvmsg() still only grabs the skb queue lock, so race conditions still exist. We can teach kcm_recvmsg() to grab mux->rx_lock too but this would introduce a potential performance regression as struct kcm_mux can be shared by multiple KCM sockets. So we have to enforce skb queue lock in requeue_rx_msgs() and handle skb peek case carefully in kcm_wait_data(). Fortunately, skb_recv_datagram() already handles it nicely and is widely used by other sockets, we can just switch to skb_recv_datagram() after getting rid of the unnecessary sock lock in kcm_recvmsg() and kcm_splice_read(). Side note: SOCK_DONE is not used by KCM sockets, so it is safe to get rid of this check too. I ran the original syzbot reproducer for 30 min without seeing any issue. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+278279efdd2730dd14bf@syzkaller.appspotmail.com Reported-by: shaozhengchao <shaozhengchao@huawei.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20221114005119.597905-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
14 daysnet: dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shimsVladimir Oltean
There are multi-generational drivers like mv88e6xxx which have code like this: int mv88e6xxx_port_hwtstamp_get(struct dsa_switch *ds, int port, struct ifreq *ifr) { if (!chip->info->ptp_support) return -EOPNOTSUPP; ... } DSA wants to deny PTP timestamping on the master if the switch supports timestamping too. However it currently relies on the presence of the port_hwtstamp_get() callback to determine PTP capability, and this clearly does not work in that case (method is present but returns -EOPNOTSUPP). We should not deny PTP on the DSA master for those switches which truly do not support hardware timestamping. Create a dsa_port_supports_hwtstamp() method which actually probes for support by calling port_hwtstamp_get() and seeing whether that returned -EOPNOTSUPP or not. Fixes: f685e609a301 ("net: dsa: Deny PTP on master if switch supports it") Link: https://patchwork.kernel.org/project/netdevbpf/patch/20221110124345.3901389-1-festevam@gmail.com/ Reported-by: Fabio Estevam <festevam@gmail.com> Reported-by: Steffen Bätz <steffen@innosonix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Fabio Estevam <festevam@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
14 daysnet: caif: fix double disconnect client in chnl_net_open()Zhengchao Shao
When connecting to client timeout, disconnect client for twice in chnl_net_open(). Remove one. Compile tested only. Fixes: 2aa40aef9deb ("caif: Use link layer MTU instead of fixed MTU") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-11net/tls: Fix memory leak in tls_enc_skb() and tls_sw_fallback_init()Yu Liao
'aead_req' and 'aead_send' is allocated but not freed in default switch case. This commit fixes the potential memory leak by freeing them under the situation. Note that the default cases here should never be reached as they'd mean we allowed offloading an unsupported algorithm. Fixes: ea7a9d88ba21 ("net/tls: Use cipher sizes structs") Signed-off-by: Yu Liao <liaoyu15@huawei.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://lore.kernel.org/r/20221110090329.2036382-1-liaoyu15@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-11Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Andrii Nakryiko says: ==================== bpf 2022-11-11 We've added 11 non-merge commits during the last 8 day(s) which contain a total of 11 files changed, 83 insertions(+), 74 deletions(-). The main changes are: 1) Fix strncpy_from_kernel_nofault() to prevent out-of-bounds writes, from Alban Crequy. 2) Fix for bpf_prog_test_run_skb() to prevent wrong alignment, from Baisong Zhong. 3) Switch BPF_DISPATCHER to static_call() instead of ftrace infra, with a small build fix on top, from Peter Zijlstra and Nathan Chancellor. 4) Fix memory leak in BPF verifier in some error cases, from Wang Yufen. 5) 32-bit compilation error fixes for BPF selftests, from Pu Lehui and Yang Jihong. 6) Ensure even distribution of per-CPU free list elements, from Xu Kuohai. 7) Fix copy_map_value() to track special zeroed out areas properly, from Xu Kuohai. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix offset calculation error in __copy_map_value and zero_map_value bpf: Initialize same number of free nodes for each pcpu_freelist selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() selftests/bpf: Fix test_progs compilation failure in 32-bit arch selftests/bpf: Fix casting error when cross-compiling test_verifier for 32-bit platforms bpf: Fix memory leaks in __check_func_call bpf: Add explicit cast to 'void *' for __BPF_DISPATCHER_UPDATE() bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop") bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() ==================== Link: https://lore.kernel.org/r/20221111231624.938829-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-09mctp: Fix an error handling path in mctp_init()Wei Yongjun
If mctp_neigh_init() return error, the routes resources should be released in the error handling path. Otherwise some resources leak. Fixes: 4d8b9319282a ("mctp: Add neighbour implementation") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://lore.kernel.org/r/20221108095517.620115-1-weiyongjun@huaweicloud.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for net: 1) Fix deadlock in nfnetlink due to missing mutex release in error path, from Ziyang Xuan. 2) Clean up pending autoload module list from nf_tables_exit_net() path, from Shigeru Yoshida. 3) Fixes for the netfilter's reverse path selftest, from Phil Sutter. All of these bugs have been around for several releases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-08Merge tag 'linux-can-fixes-for-6.1-20221107' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2022-11-07 The first patch is by Chen Zhongjin and adds a missing dev_remove_pack() to the AF_CAN protocol. Zhengchao Shao's patch fixes a potential NULL pointer deref in AF_CAN's can_rx_register(). The next patch is by Oliver Hartkopp and targets the CAN ISO-TP protocol, and fixes the state handling for echo TX processing. Oliver Hartkopp's patch for the j1939 protocol adds a missing initialization of the CAN headers inside outgoing skbs. Another patch by Oliver Hartkopp fixes an out of bounds read in the check for invalid CAN frames in the xmit callback of virtual CAN devices. This touches all non virtual device drivers as we decided to rename the function requiring that netdev_priv points to a struct can_priv. (Note: This patch will create a merge conflict with net-next where the pch_can driver has removed.) The last patch is by Geert Uytterhoeven and adds the missing ECC error checks for the channels 2-7 in the rcar_canfd driver. * tag 'linux-can-fixes-for-6.1-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: rcar_canfd: Add missing ECC error checks for channels 2-7 can: dev: fix skb drop check can: j1939: j1939_send_one(): fix missing CAN header initialization can: isotp: fix tx state handling for echo tx processing can: af_can: fix NULL pointer dereference in can_rx_register() can: af_can: can_exit(): add missing dev_remove_pack() of canxl_packet ==================== Link: https://lore.kernel.org/r/20221107133217.59861-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-08netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()Shigeru Yoshida
syzbot reported a warning like below [1]: WARNING: CPU: 3 PID: 9 at net/netfilter/nf_tables_api.c:10096 nf_tables_exit_net+0x71c/0x840 Modules linked in: CPU: 2 PID: 9 Comm: kworker/u8:0 Tainted: G W 6.1.0-rc3-00072-g8e5423e991e8 #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 Workqueue: netns cleanup_net RIP: 0010:nf_tables_exit_net+0x71c/0x840 ... Call Trace: <TASK> ? __nft_release_table+0xfc0/0xfc0 ops_exit_list+0xb5/0x180 cleanup_net+0x506/0xb10 ? unregister_pernet_device+0x80/0x80 process_one_work+0xa38/0x1730 ? pwq_dec_nr_in_flight+0x2b0/0x2b0 ? rwlock_bug.part.0+0x90/0x90 ? _raw_spin_lock_irq+0x46/0x50 worker_thread+0x67e/0x10e0 ? process_one_work+0x1730/0x1730 kthread+0x2e5/0x3a0 ? kthread_complete_and_exit+0x40/0x40 ret_from_fork+0x1f/0x30 </TASK> In nf_tables_exit_net(), there is a case where nft_net->commit_list is empty but nft_net->module_list is not empty. Such a case occurs with the following scenario: 1. nfnetlink_rcv_batch() is called 2. nf_tables_newset() returns -EAGAIN and NFNL_BATCH_FAILURE bit is set to status 3. nf_tables_abort() is called with NFNL_ABORT_AUTOLOAD (nft_net->commit_list is released, but nft_net->module_list is not because of NFNL_ABORT_AUTOLOAD flag) 4. Jump to replay label 5. netlink_skb_clone() fails and returns from the function (this is caused by fault injection in the reproducer of syzbot) This patch fixes this issue by calling __nf_tables_abort() when nft_net->module_list is not empty in nf_tables_exit_net(). Fixes: eb014de4fd41 ("netfilter: nf_tables: autoload modules from the abort path") Link: https://syzkaller.appspot.com/bug?id=802aba2422de4218ad0c01b46c9525cc9d4e4aa3 [1] Reported-by: syzbot+178efee9e2d7f87f5103@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-11-08netfilter: nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()Ziyang Xuan
When type is NFNL_CB_MUTEX and -EAGAIN error occur in nfnetlink_rcv_msg(), it does not execute nfnl_unlock(). That would trigger potential dead lock. Fixes: 50f2db9e368f ("netfilter: nfnetlink: consolidate callback types") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-11-07sctp: clear out_curr if all frag chunks of current msg are prunedXin Long
A crash was reported by Zhen Chen: list_del corruption, ffffa035ddf01c18->next is NULL WARNING: CPU: 1 PID: 250682 at lib/list_debug.c:49 __list_del_entry_valid+0x59/0xe0 RIP: 0010:__list_del_entry_valid+0x59/0xe0 Call Trace: sctp_sched_dequeue_common+0x17/0x70 [sctp] sctp_sched_fcfs_dequeue+0x37/0x50 [sctp] sctp_outq_flush_data+0x85/0x360 [sctp] sctp_outq_uncork+0x77/0xa0 [sctp] sctp_cmd_interpreter.constprop.0+0x164/0x1450 [sctp] sctp_side_effects+0x37/0xe0 [sctp] sctp_do_sm+0xd0/0x230 [sctp] sctp_primitive_SEND+0x2f/0x40 [sctp] sctp_sendmsg_to_asoc+0x3fa/0x5c0 [sctp] sctp_sendmsg+0x3d5/0x440 [sctp] sock_sendmsg+0x5b/0x70 and in sctp_sched_fcfs_dequeue() it dequeued a chunk from stream out_curr outq while this outq was empty. Normally stream->out_curr must be set to NULL once all frag chunks of current msg are dequeued, as we can see in sctp_sched_dequeue_done(). However, in sctp_prsctp_prune_unsent() as it is not a proper dequeue, sctp_sched_dequeue_done() is not called to do this. This patch is to fix it by simply setting out_curr to NULL when the last frag chunk of current msg is dequeued from out_curr stream in sctp_prsctp_prune_unsent(). Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Zhen Chen <chenzhen126@huawei.com> Tested-by: Caowangbao <caowangbao@huawei.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-07sctp: remove the unnecessary sinfo_stream check in sctp_prsctp_prune_unsentXin Long
Since commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"), sctp_stream_outq_migrate() has been called in sctp_stream_init/update to removes those chunks to streams higher than the new max. There is no longer need to do such check in sctp_prsctp_prune_unsent(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-07tipc: fix the msg->req tlv len check in tipc_nl_compat_name_table_dump_headerXin Long
This is a follow-up for commit 974cb0e3e7c9 ("tipc: fix uninit-value in tipc_nl_compat_name_table_dump") where it should have type casted sizeof(..) to int to work when TLV_GET_DATA_LEN() returns a negative value. syzbot reported a call trace because of it: BUG: KMSAN: uninit-value in ... tipc_nl_compat_name_table_dump+0x841/0xea0 net/tipc/netlink_compat.c:934 __tipc_nl_compat_dumpit+0xab2/0x1320 net/tipc/netlink_compat.c:238 tipc_nl_compat_dumpit+0x991/0xb50 net/tipc/netlink_compat.c:321 tipc_nl_compat_recv+0xb6e/0x1640 net/tipc/netlink_compat.c:1324 genl_family_rcv_msg_doit net/netlink/genetlink.c:731 [inline] genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x103f/0x1260 net/netlink/genetlink.c:792 netlink_rcv_skb+0x3a5/0x6c0 net/netlink/af_netlink.c:2501 genl_rcv+0x3c/0x50 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] Reported-by: syzbot+e5dbaaa238680ce206ea@syzkaller.appspotmail.com Fixes: 974cb0e3e7c9 ("tipc: fix uninit-value in tipc_nl_compat_name_table_dump") Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/ccd6a7ea801b15aec092c3b532a883b4c5708695.1667594933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-07can: j1939: j1939_send_one(): fix missing CAN header initializationOliver Hartkopp
The read access to struct canxl_frame::len inside of a j1939 created skbuff revealed a missing initialization of reserved and later filled elements in struct can_frame. This patch initializes the 8 byte CAN header with zero. Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Cc: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/linux-can/20221104052235.GA6474@pengutronix.de Reported-by: syzbot+d168ec0caca4697e03b1@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20221104075000.105414-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>