linux-2.6-block.git
2 years agoMerge branch 'dt-bindings-dp83867-add-binding-for-io_impedance_ctrl-nvmem-cell'
Jakub Kicinski [Fri, 17 Jun 2022 03:29:07 +0000 (20:29 -0700)]
Merge branch 'dt-bindings-dp83867-add-binding-for-io_impedance_ctrl-nvmem-cell'

Rasmus Villemoes says:

====================
dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell

We have a board where measurements indicate that the current three
options - leaving IO_IMPEDANCE_CTRL at the reset value (which is
factory calibrated to a value corresponding to approximately 50 ohms)
or using one of the two boolean properties to set it to the min/max
value - are too coarse.

This series adds a device tree binding for an nvmem cell which can be
populated during production with a suitable value calibrated for each
board, and corresponding support in the driver. The second patch adds
a trivial phy wrapper for dev_err_probe(), used in the third.
====================

Link: https://lore.kernel.org/r/20220614084612.325229-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: phy: dp83867: implement support for io_impedance_ctrl nvmem cell
Rasmus Villemoes [Tue, 14 Jun 2022 08:46:12 +0000 (10:46 +0200)]
net: phy: dp83867: implement support for io_impedance_ctrl nvmem cell

We have a board where measurements indicate that the current three
options - leaving IO_IMPEDANCE_CTRL at the (factory calibrated) reset
value or using one of the two boolean properties to set it to the
min/max value - are too coarse.

Implement support for the newly added binding allowing device tree to
specify an nvmem cell containing an appropriate value for this
specific board.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agolinux/phy.h: add phydev_err_probe() wrapper for dev_err_probe()
Rasmus Villemoes [Tue, 14 Jun 2022 08:46:11 +0000 (10:46 +0200)]
linux/phy.h: add phydev_err_probe() wrapper for dev_err_probe()

The dev_err_probe() function is quite useful to avoid boilerplate
related to -EPROBE_DEFER handling. Add a phydev_err_probe() helper to
simplify making use of that from phy drivers which otherwise use the
phydev_* helpers.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell
Rasmus Villemoes [Tue, 14 Jun 2022 08:46:10 +0000 (10:46 +0200)]
dt-bindings: dp83867: add binding for io_impedance_ctrl nvmem cell

We have a board where measurements indicate that the current three
options - leaving IO_IMPEDANCE_CTRL at the reset value (which is
factory calibrated to a value corresponding to approximately 50 ohms)
or using one of the two boolean properties to set it to the min/max
value - are too coarse.

There is no fixed mapping from register values to values in the range
35-70 ohms; it varies from chip to chip, and even that target range is
approximate. So add a DT binding for an nvmem cell which can be
populated during production with a value suitable for each specific
board.

Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Jakub Kicinski [Fri, 17 Jun 2022 03:13:52 +0000 (20:13 -0700)]
Merge git://git./linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Thu, 16 Jun 2022 18:51:32 +0000 (11:51 -0700)]
Merge tag 'net-5.19-rc3' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Mostly driver fixes.

  Current release - regressions:

   - Revert "net: Add a second bind table hashed by port and address",
     needs more work

   - amd-xgbe: use platform_irq_count(), static setup of IRQ resources
     had been removed from DT core

   - dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation

  Current release - new code bugs:

   - hns3: modify the ring param print info

  Previous releases - always broken:

   - axienet: make the 64b addressable DMA depends on 64b architectures

   - iavf: fix issue with MAC address of VF shown as zero

   - ice: fix PTP TX timestamp offset calculation

   - usb: ax88179_178a needs FLAG_SEND_ZLP

  Misc:

   - document some net.sctp.* sysctls"

* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
  net: axienet: add missing error return code in axienet_probe()
  Revert "net: Add a second bind table hashed by port and address"
  net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
  net: usb: ax88179_178a needs FLAG_SEND_ZLP
  MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
  ARM: dts: at91: ksz9477_evb: fix port/phy validation
  net: bgmac: Fix an erroneous kfree() in bgmac_remove()
  ice: Fix memory corruption in VF driver
  ice: Fix queue config fail handling
  ice: Sync VLAN filtering features for DVM
  ice: Fix PTP TX timestamp offset calculation
  mlxsw: spectrum_cnt: Reorder counter pools
  docs: networking: phy: Fix a typo
  amd-xgbe: Use platform_irq_count()
  octeontx2-vf: Add support for adaptive interrupt coalescing
  xilinx:  Fix build on x86.
  net: axienet: Use iowrite64 to write all 64b descriptor pointers
  net: axienet: make the 64b addresable DMA depends on 64b archectures
  net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
  net: hns3: fix PF rss size initialization bug
  ...

2 years agonet: axienet: add missing error return code in axienet_probe()
Yang Yingliang [Thu, 16 Jun 2022 06:29:17 +0000 (14:29 +0800)]
net: axienet: add missing error return code in axienet_probe()

It should return error code in error path in axienet_probe().

Fixes: 00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoRevert "net: Add a second bind table hashed by port and address"
Joanne Koong [Wed, 15 Jun 2022 19:32:13 +0000 (12:32 -0700)]
Revert "net: Add a second bind table hashed by port and address"

This reverts:

commit d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
commit 538aaf9b2383 ("selftests: Add test for timing a bind request to a port with a populated bhash entry")
Link: https://lore.kernel.org/netdev/20220520001834.2247810-1-kuba@kernel.org/
There are a few things that need to be fixed here:
* Updating bhash2 in cases where the socket's rcv saddr changes
* Adding bhash2 hashbucket locks

Links to syzbot reports:
https://lore.kernel.org/netdev/00000000000022208805e0df247a@google.com/
https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/

Fixes: d5a42de8bdbe ("net: Add a second bind table hashed by port and address")
Reported-by: syzbot+015d756bbd1f8b5c8f09@syzkaller.appspotmail.com
Reported-by: syzbot+98fd2d1422063b0f8c44@syzkaller.appspotmail.com
Reported-by: syzbot+0a847a982613c6438fba@syzkaller.appspotmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20220615193213.2419568-1-joannelkoong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'net-mana-add-pf-and-xdp_redirect-support'
Paolo Abeni [Thu, 16 Jun 2022 08:40:29 +0000 (10:40 +0200)]
Merge branch 'net-mana-add-pf-and-xdp_redirect-support'

Haiyang Zhang says:

====================
net: mana: Add PF and XDP_REDIRECT support

The patch set adds PF and XDP_REDIRECT support.
====================

Link: https://lore.kernel.org/r/1655238535-19257-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: mana: Add support of XDP_REDIRECT action
Haiyang Zhang [Tue, 14 Jun 2022 20:28:55 +0000 (13:28 -0700)]
net: mana: Add support of XDP_REDIRECT action

Add a handler of the XDP_REDIRECT return code from a XDP program. The
packets will be flushed at the end of each RX/CQ NAPI poll cycle.
ndo_xdp_xmit() is implemented by sharing the code in mana_xdp_tx().
Ethtool per queue counters are added for XDP redirect and xmit operations.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: mana: Add the Linux MANA PF driver
Dexuan Cui [Tue, 14 Jun 2022 20:28:54 +0000 (13:28 -0700)]
net: mana: Add the Linux MANA PF driver

This minimal PF driver runs on bare metal.
Currently Ethernet TX/RX works. SR-IOV management is not supported yet.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Co-developed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ethernet: stmmac: reset force speed bit for ipq806x
Christian 'Ansuel' Marangi [Tue, 14 Jun 2022 11:22:28 +0000 (13:22 +0200)]
net: ethernet: stmmac: reset force speed bit for ipq806x

Some bootloader may set the force speed regs even if the actual
interface should use autonegotiation between PCS and PHY.
This cause the complete malfuction of the interface.

To fix this correctly reset the force speed regs if a fixed-link is not
defined in the DTS. With a fixed-link node correctly configure the
forced speed regs to handle any misconfiguration by the bootloader.

Reported-by: Mark Mentovai <mark@moxienet.com>
Co-developed-by: Mark Mentovai <mark@moxienet.com>
Signed-off-by: Mark Mentovai <mark@moxienet.com>
Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20220614112228.1998-2-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agonet: ethernet: stmmac: add missing sgmii configure for ipq806x
Christian 'Ansuel' Marangi [Tue, 14 Jun 2022 11:22:27 +0000 (13:22 +0200)]
net: ethernet: stmmac: add missing sgmii configure for ipq806x

The different gmacid require different configuration based on the soc
and on the gmac id. Add these missing configuration taken from the
original driver.

Signed-off-by: Christian 'Ansuel' Marangi <ansuelsmth@gmail.com>
Link: https://lore.kernel.org/r/20220614112228.1998-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agomlxbf_gige: remove own module name define and use KBUILD_MODNAME instead
David Thompson [Tue, 14 Jun 2022 21:26:02 +0000 (17:26 -0400)]
mlxbf_gige: remove own module name define and use KBUILD_MODNAME instead

This patch adds use of KBUILD_MODNAME as defined by the build system,
replacing the definition and use of a custom-defined name.

Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com>
Link: https://lore.kernel.org/r/20220614212602.28061-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 15 Jun 2022 21:20:26 +0000 (14:20 -0700)]
Merge tag 'hardening-v5.19-rc3' of git://git./linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - Correctly handle vm_map areas in hardened usercopy (Matthew Wilcox)

 - Adjust CFI RCU usage to avoid boot splats with cpuidle (Sami Tolvanen)

* tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  usercopy: Make usercopy resilient against ridiculously large copies
  usercopy: Cast pointer to an integer once
  usercopy: Handle vm_map_ram() areas
  cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle

2 years agoMerge tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 15 Jun 2022 19:34:19 +0000 (12:34 -0700)]
Merge tag 'tpmdd-next-v5.19-rc3' of git://git./linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fixes from Jarkko Sakkinen:
 "Two fixes for this merge window"

* tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
  certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
  certs/blacklist_hashes.c: fix const confusion in certs blacklist

2 years agocerts: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
Masahiro Yamada [Sat, 11 Jun 2022 17:22:31 +0000 (02:22 +0900)]
certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build

Commit addf466389d9 ("certs: Check that builtin blacklist hashes are
valid") was applied 8 months after the submission.

In the meantime, the base code had been removed by commit b8c96a6b466c
("certs: simplify $(srctree)/ handling and remove config_filename
macro").

Fix the Makefile.

Create a local copy of $(CONFIG_SYSTEM_BLACKLIST_HASH_LIST). It is
included from certs/blacklist_hashes.c and also works as a timestamp.

Send error messages from check-blacklist-hashes.awk to stderr instead
of stdout.

Fixes: addf466389d9 ("certs: Check that builtin blacklist hashes are valid")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2 years agocerts/blacklist_hashes.c: fix const confusion in certs blacklist
Masahiro Yamada [Sat, 11 Jun 2022 17:22:30 +0000 (02:22 +0900)]
certs/blacklist_hashes.c: fix const confusion in certs blacklist

This file fails to compile as follows:

  CC      certs/blacklist_hashes.o
certs/blacklist_hashes.c:4:1: error: ignoring attribute ‘section (".init.data")’ because it conflicts with previous ‘section (".init.rodata")’ [-Werror=attributes]
    4 | const char __initdata *const blacklist_hashes[] = {
      | ^~~~~
In file included from certs/blacklist_hashes.c:2:
certs/blacklist.h:5:38: note: previous declaration here
    5 | extern const char __initconst *const blacklist_hashes[];
      |                                      ^~~~~~~~~~~~~~~~

Apply the same fix as commit 2be04df5668d ("certs/blacklist_nohashes.c:
fix const confusion in certs blacklist").

Fixes: 734114f8782f ("KEYS: Add a system blacklist keyring")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2 years agoMerge tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/braun...
Linus Torvalds [Wed, 15 Jun 2022 16:04:55 +0000 (09:04 -0700)]
Merge tag 'fs.fixes.v5.19-rc3' of git://git./linux/kernel/git/brauner/linux

Pull vfs idmapping fix from Christian Brauner:
 "This fixes an issue where we fail to change the group of a file when
  the caller owns the file and is a member of the group to change to.

  This is only relevant on idmapped mounts.

  There's a detailed description in the commit message and regression
  tests have been added to xfstests"

* tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: account for group membership

2 years agonet: sparx5: Allow mdb entries to both CPU and ports
Casper Andersson [Tue, 14 Jun 2022 09:25:32 +0000 (11:25 +0200)]
net: sparx5: Allow mdb entries to both CPU and ports

Allow mdb entries to be forwarded to CPU and be switched at the same
time. Only remove entry when no port and the CPU isn't part of the group
anymore.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Acked-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
Duoming Zhou [Tue, 14 Jun 2022 09:25:57 +0000 (17:25 +0800)]
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg

The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
and block until it receives a packet from the remote. If the client
doesn`t connect to server and calls read() directly, it will not
receive any packets forever. As a result, the deadlock will happen.

The fail log caused by deadlock is shown below:

[  369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
[  369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  369.613058] Call Trace:
[  369.613315]  <TASK>
[  369.614072]  __schedule+0x2f9/0xb20
[  369.615029]  schedule+0x49/0xb0
[  369.615734]  __lock_sock+0x92/0x100
[  369.616763]  ? destroy_sched_domains_rcu+0x20/0x20
[  369.617941]  lock_sock_nested+0x6e/0x70
[  369.618809]  ax25_bind+0xaa/0x210
[  369.619736]  __sys_bind+0xca/0xf0
[  369.620039]  ? do_futex+0xae/0x1b0
[  369.620387]  ? __x64_sys_futex+0x7c/0x1c0
[  369.620601]  ? fpregs_assert_state_consistent+0x19/0x40
[  369.620613]  __x64_sys_bind+0x11/0x20
[  369.621791]  do_syscall_64+0x3b/0x90
[  369.622423]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  369.623319] RIP: 0033:0x7f43c8aa8af7
[  369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
[  369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
[  369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
[  369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
[  369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
[  369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700

This patch replaces skb_recv_datagram() with an open-coded variant of it
releasing the socket lock before the __skb_wait_for_more_packets() call
and re-acquiring it after such call in order that other functions that
need socket lock could be executed.

what's more, the socket lock will be released only when recvmsg() will
block and that should produce nicer overall behavior.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Thomas Osterried <thomas@osterried.de>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reported-by: Thomas Habets <thomas@@habets.se>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agobcm63xx_enet: switch to napi_build_skb() to reuse skbuff_heads
Sieng Piaw Liew [Wed, 15 Jun 2022 06:09:22 +0000 (14:09 +0800)]
bcm63xx_enet: switch to napi_build_skb() to reuse skbuff_heads

napi_build_skb() reuses NAPI skbuff_head cache in order to save some
cycles on freeing/allocating skbuff_heads on every new Rx or completed
Tx.
Use napi_consume_skb() to feed the cache with skbuff_heads of completed
Tx so it's never empty.

Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: don't check skb_count twice
Sieng Piaw Liew [Wed, 15 Jun 2022 03:24:26 +0000 (11:24 +0800)]
net: don't check skb_count twice

NAPI cache skb_count is being checked twice without condition. Change to
checking the second time only if the first check is run.

Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: bridge: allow add/remove permanent mdb entries on disabled ports
Casper Andersson [Tue, 14 Jun 2022 06:32:23 +0000 (08:32 +0200)]
net: bridge: allow add/remove permanent mdb entries on disabled ports

Adding mdb entries on disabled ports allows you to do setup before
accepting any traffic, avoiding any time where the port is not in the
multicast group.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: usb: ax88179_178a needs FLAG_SEND_ZLP
Jose Alonso [Mon, 13 Jun 2022 18:32:44 +0000 (15:32 -0300)]
net: usb: ax88179_178a needs FLAG_SEND_ZLP

The extra byte inserted by usbnet.c when
 (length % dev->maxpacket == 0) is causing problems to device.

This patch sets FLAG_SEND_ZLP to avoid this.

Tested with: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet

Problems observed:
======================================================================
1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
   "message authentication code incorrect"
   This happens because the tcp message sent is corrupted during the
   USB "Bulk out". The device calculate the tcp checksum and send a
   valid tcp message to the remote sshd. Then the encryption detects
   the error and aborts.
2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
3) Stop normal work without any log message.
   The "Bulk in" continue receiving packets normally.
   The host sends "Bulk out" and the device responds with -ECONNRESET.
   (The netusb.c code tx_complete ignore -ECONNRESET)
Under normal conditions these errors take days to happen and in
intense usage take hours.

A test with ping gives packet loss, showing that something is wrong:
ping -4 -s 462 {destination} # 462 = 512 - 42 - 8
Not all packets fail.
My guess is that the device tries to find another packet starting
at the extra byte and will fail or not depending on the next
bytes (old buffer content).
======================================================================

Signed-off-by: Jose Alonso <joalonsof@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoi40e: add xdp frags support to ndo_xdp_xmit
Lorenzo Bianconi [Mon, 13 Jun 2022 16:51:50 +0000 (09:51 -0700)]
i40e: add xdp frags support to ndo_xdp_xmit

Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.

Tested-by: Sarkar Tirthendu <tirthendu.sarkar@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: phy: marvell-88x2222: set proper phydev->port
Ivan Bornyakov [Sun, 12 Jun 2022 18:19:34 +0000 (21:19 +0300)]
net: phy: marvell-88x2222: set proper phydev->port

phydev->port was not set and always reported as PORT_TP.
Set phydev->port according to inserted SFP module.

Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agodt-bindings: net: xilinx: document xilinx emaclite driver binding
Radhey Shyam Pandey [Thu, 9 Jun 2022 16:53:35 +0000 (22:23 +0530)]
dt-bindings: net: xilinx: document xilinx emaclite driver binding

Add basic description for the xilinx emaclite driver DT bindings.

Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net...
David S. Miller [Wed, 15 Jun 2022 08:15:33 +0000 (09:15 +0100)]
Merge branch '100GbE' of git://git./linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-06-14

This series contains updates to ice driver only.

Michal fixes incorrect Tx timestamp offset calculation for E822 devices.

Roman enforces required VLAN filtering settings for double VLAN mode.

Przemyslaw fixes memory corruption issues with VFs by ensuring
queues are disabled in the error path of VF queue configuration and to
disabled VFs during reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ipa-simplify-completion-stats'
David S. Miller [Wed, 15 Jun 2022 08:07:58 +0000 (09:07 +0100)]
Merge branch 'ipa-simplify-completion-stats'

Alex Elder says:

====================
net: ipa: simplify completion statistics

The first patch in this series makes the name used for variables
representing a TRE ring be consistent everywhere.  The second
renames two structure fields to better represent their purpose.

The last four rework a little code that manages some tranaction and
byte transfer statistics maintained mainly for TX endpoints.  For
the most part this series is refactoring.  The last one also
includes the first step toward no longer assuming an event ring is
dedicated to a single channel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: rework gsi_channel_tx_update()
Alex Elder [Mon, 13 Jun 2022 17:17:59 +0000 (12:17 -0500)]
net: ipa: rework gsi_channel_tx_update()

Rename gsi_channel_tx_update() to be gsi_trans_tx_completed(), and
pass it just the transaction pointer, deriving the channel from the
transaction.  Update the comments above the function to provide a
more concise description of how statistics for TX endpoints are
maintained and used.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: stop counting total RX bytes and transactions
Alex Elder [Mon, 13 Jun 2022 17:17:58 +0000 (12:17 -0500)]
net: ipa: stop counting total RX bytes and transactions

In gsi_evt_ring_rx_update(), we update each transaction so its len
field reflects the actual number of bytes received.  In the process,
the total number of transactions and bytes processed on the channel
are summed, and added to a running total for the channel.

But we don't actually use those running totals for RX endpoints.
They're maintained for TX channels to support CoDel when they are
associated with a "real" network device.

So stop maintaining these totals for RX endpoints, and update the
comment where the fields are defined to make it clear they're only
valid for TX channels.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: simplify TX completion statistics
Alex Elder [Mon, 13 Jun 2022 17:17:57 +0000 (12:17 -0500)]
net: ipa: simplify TX completion statistics

When a TX request is issued, its channel's accumulated byte and
transaction counts are recorded.  This currently does *not* take
into account the transaction being committed.

Later, when the transaction completes, the number of bytes and
transactions that have completed since the transaction was committed
are reported to the network stack.  The transaction and its byte
count are accounted for at that time.

Instead, record the transaction and its bytes in the counts recorded
at commit time.  This avoids the need to do so when the transaction
completes, and provides a (small) simplification of that code.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: introduce gsi_trans_tx_committed()
Alex Elder [Mon, 13 Jun 2022 17:17:56 +0000 (12:17 -0500)]
net: ipa: introduce gsi_trans_tx_committed()

Create a new function that encapsulates recording information needed
for TX channel statistics when a transaction is committed.

Record the accumulated length in the transaction before the call
(for both RX and TX), so it can be used when updating TX statistics.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: rename two transaction fields
Alex Elder [Mon, 13 Jun 2022 17:17:55 +0000 (12:17 -0500)]
net: ipa: rename two transaction fields

There are two fields in a GSI transaction that keep track of TRE
counts.  The first represents the number of TREs reserved for the
transaction in the TRE ring; that's currently named "tre_count".
The second is the number of TREs that are actually *used* by the
transaction at the time it is committed.

Rename the "tre_count" field to be "rsvd_count", to make its meaning
a little more specific.  The "_count" is present in the name mainly
to avoid interpreting it as a reserved (not-to-be-used) field.  This
name also distinguishes it from the "tre_count" field associated
with a channel.

Rename the "used" field to be "used_count", to match the convention
used for reserved TREs.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: use "tre_ring" for all TRE ring local variables
Alex Elder [Mon, 13 Jun 2022 17:17:54 +0000 (12:17 -0500)]
net: ipa: use "tre_ring" for all TRE ring local variables

All local variables that represent event rings are named "ring".

All but two functions that represent a channel's TRE ring with a
local variable use the name "tre_ring".  For consistency, use that
name in the two functions that don't fit the pattern.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'support-mt7531-on-bpi-r2-pro'
Jakub Kicinski [Wed, 15 Jun 2022 05:35:18 +0000 (22:35 -0700)]
Merge branch 'support-mt7531-on-bpi-r2-pro'

Frank Wunderlich says:

====================
Support mt7531 on BPI-R2 Pro

This Series add Support for the mt7531 switch on Bananapi R2 Pro board.

This board uses port5 of the switch to conect to the gmac0 of the
rk3568 SoC.

Currently CPU-Port is hardcoded in the mt7530 driver to port 6.

Compared to v1 the reset-Patch was dropped as it was not needed and
CPU-Port-changes are completely rewriten based on suggestions/code from
Vladimir Oltean (many thanks to this).
In DTS Patch i only dropped the status-property that was not
needed/ignored by driver.

Due to the Changes i also made a regression test on mt7623 bpi-r2
(mt7623 soc + mt7530) and bpi-r64 (mt7622 soc + mt7531) with cpu-
port 6. Tests were done directly (ipv4 config on dsa user port)
and with vlan-aware bridge including vlan that was tagged outgoing
on dsa user port.
====================

Link: https://lore.kernel.org/r/20220610170541.8643-1-linux@fw-web.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoarm64: dts: rockchip: Add mt7531 dsa node to BPI-R2-Pro board
Frank Wunderlich [Fri, 10 Jun 2022 17:05:41 +0000 (19:05 +0200)]
arm64: dts: rockchip: Add mt7531 dsa node to BPI-R2-Pro board

Add Device Tree node for mt7531 switch connected to gmac0.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: dsa: make reset optional and add rgmii-mode to mt7531
Frank Wunderlich [Fri, 10 Jun 2022 17:05:40 +0000 (19:05 +0200)]
dt-bindings: net: dsa: make reset optional and add rgmii-mode to mt7531

A board may have no independent reset-line, so reset cannot be used
inside switch driver.

E.g. on Bananapi-R2 Pro switch and gmac are connected to same reset-line.

Resets should be acquired only to 1 device/driver. This prevents reset to
be bound to switch-driver if reset is already used for gmac. If reset is
only used by switch driver it resets the switch *and* the gmac after the
mdio bus comes up resulting in mdio bus goes down. It takes some time
until all is up again, switch driver tries to read from mdio, will fail
and defer the probe. On next try the reset does the same again.

Make reset optional for such boards.

Allow port 5 as cpu-port and phy-mode rgmii for mt7531.

- MT7530 supports RGMII on port 5 and RGMII/TRGMII on port 6.
- MT7531 supports on port 5 RGMII and SGMII (dual-sgmii) and
  SGMII on port 6.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: mt7530: get cpu-port via dp->cpu_dp instead of constant
Frank Wunderlich [Fri, 10 Jun 2022 17:05:39 +0000 (19:05 +0200)]
net: dsa: mt7530: get cpu-port via dp->cpu_dp instead of constant

Replace last occurences of hardcoded cpu-port by cpu_dp member of
dsa_port struct.

Now the constant can be dropped.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: mt7530: rework mt753[01]_setup
Frank Wunderlich [Fri, 10 Jun 2022 17:05:38 +0000 (19:05 +0200)]
net: dsa: mt7530: rework mt753[01]_setup

Enumerate available cpu-ports instead of using hardcoded constant.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: dsa: mt7530: rework mt7530_hw_vlan_{add,del}
Frank Wunderlich [Fri, 10 Jun 2022 17:05:37 +0000 (19:05 +0200)]
net: dsa: mt7530: rework mt7530_hw_vlan_{add,del}

Rework vlan_add/vlan_del functions in preparation for dynamic cpu port.

Currently BIT(MT7530_CPU_PORT) is added to new_members, even though
mt7530_port_vlan_add() will be called on the CPU port too.

Let DSA core decide when to call port_vlan_add for the CPU port, rather
than doing it implicitly.

We can do autonomous forwarding in a certain VLAN, but not add br0 to that
VLAN and avoid flooding the CPU with those packets, if software knows it
doesn't need to process them.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agodt-bindings: net: dsa: convert binding for mediatek switches
Frank Wunderlich [Fri, 10 Jun 2022 17:05:36 +0000 (19:05 +0200)]
dt-bindings: net: dsa: convert binding for mediatek switches

Convert txt binding to yaml binding for Mediatek switches.

Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
Lukas Bulwahn [Mon, 13 Jun 2022 12:18:26 +0000 (14:18 +0200)]
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS

Maintainers of the directory Documentation/devicetree/bindings/net
are also the maintainers of the corresponding directory
include/dt-bindings/net.

Add the file entry for include/dt-bindings/net to the appropriate
section in MAINTAINERS.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220613121826.11484-1-lukas.bulwahn@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoARM: dts: at91: ksz9477_evb: fix port/phy validation
Oleksij Rempel [Fri, 10 Jun 2022 08:16:21 +0000 (10:16 +0200)]
ARM: dts: at91: ksz9477_evb: fix port/phy validation

Latest drivers version requires phy-mode to be set. Otherwise we will
use "NA" mode and the switch driver will invalidate this port mode.

Fixes: 65ac79e18120 ("net: dsa: microchip: add the phylink get_caps")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20220610081621.584393-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mlxsw-remove-xm-support'
Jakub Kicinski [Wed, 15 Jun 2022 04:51:07 +0000 (21:51 -0700)]
Merge branch 'mlxsw-remove-xm-support'

Ido Schimmel says:

====================
mlxsw: Remove XM support

The XM was supposed to be an external device connected to the
Spectrum-{2,3} ASICs using dedicated Ethernet ports. Its purpose was to
increase the number of routes that can be offloaded to hardware. This was
achieved by having the ASIC act as a cache that refers cache misses to the
XM where the FIB is stored and LPM lookup is performed.

Testing was done over an emulator and dedicated setups in the lab, but
the product was discontinued before shipping to customers.

Therefore, in order to remove dead code and reduce complexity of the
code base, revert the three patchsets that added XM support.
====================

Link: https://lore.kernel.org/r/20220613132116.2021055-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomlxsw: Revert "Prepare for XM implementation - LPM trees"
Petr Machata [Mon, 13 Jun 2022 13:21:16 +0000 (16:21 +0300)]
mlxsw: Revert "Prepare for XM implementation - LPM trees"

This reverts commit 923ba95ea22d ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-lpm-trees'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomlxsw: Revert "Prepare for XM implementation - prefix insertion and removal"
Petr Machata [Mon, 13 Jun 2022 13:21:15 +0000 (16:21 +0300)]
mlxsw: Revert "Prepare for XM implementation - prefix insertion and removal"

This reverts commit e7086213f7b4 ("Merge branch
'mlxsw-spectrum-prepare-for-xm-implementation-prefix-insertion-and-removal'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agomlxsw: Revert "Introduce initial XM router support"
Petr Machata [Mon, 13 Jun 2022 13:21:14 +0000 (16:21 +0300)]
mlxsw: Revert "Introduce initial XM router support"

This reverts commit 75c2a8fe8e39 ("Merge branch
'mlxsw-introduce-initial-xm-router-support'").

Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: bgmac: Fix an erroneous kfree() in bgmac_remove()
Christophe JAILLET [Mon, 13 Jun 2022 20:53:50 +0000 (22:53 +0200)]
net: bgmac: Fix an erroneous kfree() in bgmac_remove()

'bgmac' is part of a managed resource allocated with bgmac_alloc(). It
should not be freed explicitly.

Remove the erroneous kfree() from the .remove() function.

Fixes: 34a5102c3235 ("net: bgmac: allocate struct bgmac just once & don't copy it")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/a026153108dd21239036a032b95c25b5cece253b.1655153616.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Wed, 15 Jun 2022 02:09:39 +0000 (19:09 -0700)]
Merge branch 'mlx5-next' of git://git./linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-next: updates 2022-06-14

1) Updated HW bits and definitions for upcoming features
 1.1) vport debug counters
 1.2) flow meter
 1.3) Execute ASO action for flow entry
 1.4) enhanced CQE compression

2) Add ICM header-modify-pattern RDMA API

Leon Says
=========

SW steering manipulates packet's header using "modifying header" actions.
Many of these actions do the same operation, but use different data each time.
Currently we create and keep every one of these actions, which use expensive
and limited resources.

Now we introduce a new mechanism - pattern and argument, which splits
a modifying action into two parts:
1. action pattern: contains the operations to be applied on packet's header,
mainly set/add/copy of fields in the packet
2. action data/argument: contains the data to be used by each operation
in the pattern.

This way we reuse same patterns with different arguments to create new
modifying actions, and since many actions share the same operations, we end
up creating a small number of patterns that we keep in a dedicated cache.

These modify header patterns are implemented as new type of ICM memory,
so the following kernel patch series add the support for this new ICM type.
==========

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add bits and fields to support enhanced CQE compression
  net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK
  net/mlx5: group fdb cleanup to single function
  net/mlx5: Add support EXECUTE_ASO action for flow entry
  net/mlx5: Add HW definitions of vport debug counters
  net/mlx5: Add IFC bits and enums for flow meter
  RDMA/mlx5: Support handling of modify-header pattern ICM area
  net/mlx5: Manage ICM of type modify-header pattern
  net/mlx5: Introduce header-modify-pattern ICM properties
====================

Link: https://lore.kernel.org/r/20220614184028.51548-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonetfs: fix up netfs_inode_init() docbook comment
Linus Torvalds [Tue, 14 Jun 2022 17:36:11 +0000 (10:36 -0700)]
netfs: fix up netfs_inode_init() docbook comment

Commit e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode
wrapper introduced") changed the argument types and names, and actually
updated the comment too (although that was thanks to David Howells, not
me: my original patch only changed the code).

But the comment fixup didn't go quite far enough, and didn't change the
argument name in the comment, resulting in

  include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init'
  include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init'

during htmldoc generation.

Fixes: e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode wrapper introduced")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoice: Fix memory corruption in VF driver
Przemyslaw Patynowski [Thu, 2 Jun 2022 10:09:17 +0000 (12:09 +0200)]
ice: Fix memory corruption in VF driver

Disable VF's RX/TX queues, when it's disabled. VF can have queues enabled,
when it requests a reset. If PF driver assumes that VF is disabled,
while VF still has queues configured, VF may unmap DMA resources.
In such scenario device still can map packets to memory, which ends up
silently corrupting it.
Previously, VF driver could experience memory corruption, which lead to
crash:
[ 5119.170157] BUG: unable to handle kernel paging request at 00001b9780003237
[ 5119.170166] PGD 0 P4D 0
[ 5119.170173] Oops: 0002 [#1] PREEMPT_RT SMP PTI
[ 5119.170181] CPU: 30 PID: 427592 Comm: kworker/u96:2 Kdump: loaded Tainted: G        W I      --------- -  - 4.18.0-372.9.1.rt7.166.el8.x86_64 #1
[ 5119.170189] Hardware name: Dell Inc. PowerEdge R740/014X06, BIOS 2.3.10 08/15/2019
[ 5119.170193] Workqueue: iavf iavf_adminq_task [iavf]
[ 5119.170219] RIP: 0010:__page_frag_cache_drain+0x5/0x30
[ 5119.170238] Code: 0f 0f b6 77 51 85 f6 74 07 31 d2 e9 05 df ff ff e9 90 fe ff ff 48 8b 05 49 db 33 01 eb b4 0f 1f 80 00 00 00 00 0f 1f 44 00 00 <f0> 29 77 34 74 01 c3 48 8b 07 f6 c4 80 74 0f 0f b6 77 51 85 f6 74
[ 5119.170244] RSP: 0018:ffffa43b0bdcfd78 EFLAGS: 00010282
[ 5119.170250] RAX: ffffffff896b3e40 RBX: ffff8fb282524000 RCX: 0000000000000002
[ 5119.170254] RDX: 0000000049000000 RSI: 0000000000000000 RDI: 00001b9780003203
[ 5119.170259] RBP: ffff8fb248217b00 R08: 0000000000000022 R09: 0000000000000009
[ 5119.170262] R10: 2b849d6300000000 R11: 0000000000000020 R12: 0000000000000000
[ 5119.170265] R13: 0000000000001000 R14: 0000000000000009 R15: 0000000000000000
[ 5119.170269] FS:  0000000000000000(0000) GS:ffff8fb1201c0000(0000) knlGS:0000000000000000
[ 5119.170274] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5119.170279] CR2: 00001b9780003237 CR3: 00000008f3e1a003 CR4: 00000000007726e0
[ 5119.170283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5119.170286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5119.170290] PKRU: 55555554
[ 5119.170292] Call Trace:
[ 5119.170298]  iavf_clean_rx_ring+0xad/0x110 [iavf]
[ 5119.170324]  iavf_free_rx_resources+0xe/0x50 [iavf]
[ 5119.170342]  iavf_free_all_rx_resources.part.51+0x30/0x40 [iavf]
[ 5119.170358]  iavf_virtchnl_completion+0xd8a/0x15b0 [iavf]
[ 5119.170377]  ? iavf_clean_arq_element+0x210/0x280 [iavf]
[ 5119.170397]  iavf_adminq_task+0x126/0x2e0 [iavf]
[ 5119.170416]  process_one_work+0x18f/0x420
[ 5119.170429]  worker_thread+0x30/0x370
[ 5119.170437]  ? process_one_work+0x420/0x420
[ 5119.170445]  kthread+0x151/0x170
[ 5119.170452]  ? set_kthread_struct+0x40/0x40
[ 5119.170460]  ret_from_fork+0x35/0x40
[ 5119.170477] Modules linked in: iavf sctp ip6_udp_tunnel udp_tunnel mlx4_en mlx4_core nfp tls vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc intel_rapl_msr iTCO_wdt iTCO_vendor_support dell_smbios wmi_bmof dell_wmi_descriptor dcdbas kvm_intel kvm irqbypass intel_rapl_common isst_if_common skx_edac irdma nfit libnvdimm x86_pkg_temp_thermal i40e intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ib_uverbs rapl ipmi_ssif intel_cstate intel_uncore mei_me pcspkr acpi_ipmi ib_core mei lpc_ich i2c_i801 ipmi_si ipmi_devintf wmi ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ice ahci drm libahci crc32c_intel libata tg3 megaraid_sas
[ 5119.170613]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: iavf]
[ 5119.170627] CR2: 00001b9780003237

Fixes: ec4f5a436bdf ("ice: Check if VF is disabled for Opcode and other operations")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Co-developed-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Fix queue config fail handling
Przemyslaw Patynowski [Thu, 2 Jun 2022 10:09:04 +0000 (12:09 +0200)]
ice: Fix queue config fail handling

Disable VF's RX/TX queues, when VIRTCHNL_OP_CONFIG_VSI_QUEUES fail.
Not disabling them might lead to scenario, where PF driver leaves VF
queues enabled, when VF's VSI failed queue config.
In this scenario VF should not have RX/TX queues enabled. If PF failed
to set up VF's queues, VF will reset due to TX timeouts in VF driver.
Initialize iterator 'i' to -1, so if error happens prior to configuring
queues then error path code will not disable queue 0. Loop that
configures queues will is using same iterator, so error path code will
only disable queues that were configured.

Fixes: 77ca27c41705 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Suggested-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Sync VLAN filtering features for DVM
Roman Storozhenko [Tue, 7 Jun 2022 06:54:57 +0000 (08:54 +0200)]
ice: Sync VLAN filtering features for DVM

VLAN filtering features, that is C-Tag and S-Tag, in DVM mode must be
both enabled or disabled.
In case of turning off/on only one of the features, another feature must
be turned off/on automatically with issuing an appropriate message to
the kernel log.

Fixes: 1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
Signed-off-by: Roman Storozhenko <roman.storozhenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoice: Fix PTP TX timestamp offset calculation
Michal Michalik [Tue, 10 May 2022 11:03:43 +0000 (13:03 +0200)]
ice: Fix PTP TX timestamp offset calculation

The offset was being incorrectly calculated for E822 - that led to
collisions in choosing TX timestamp register location when more than
one port was trying to use timestamping mechanism.

In E822 one quad is being logically split between ports, so quad 0 is
having trackers for ports 0-3, quad 1 ports 4-7 etc. Each port should
have separate memory location for tracking timestamps. Due to error for
example ports 1 and 2 had been assigned to quad 0 with same offset (0),
while port 1 should have offset 0 and 1 offset 16.

Fix it by correctly calculating quad offset.

Fixes: 3a7496234d17 ("ice: implement basic E822 PTP support")
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Tue, 14 Jun 2022 14:57:18 +0000 (07:57 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "While last week's pull request contained miscellaneous fixes for x86,
  this one covers other architectures, selftests changes, and a bigger
  series for APIC virtualization bugs that were discovered during 5.20
  development. The idea is to base 5.20 development for KVM on top of
  this tag.

  ARM64:

   - Properly reset the SVE/SME flags on vcpu load

   - Fix a vgic-v2 regression regarding accessing the pending state of a
     HW interrupt from userspace (and make the code common with vgic-v3)

   - Fix access to the idreg range for protected guests

   - Ignore 'kvm-arm.mode=protected' when using VHE

   - Return an error from kvm_arch_init_vm() on allocation failure

   - A bunch of small cleanups (comments, annotations, indentation)

  RISC-V:

   - Typo fix in arch/riscv/kvm/vmid.c

   - Remove broken reference pattern from MAINTAINERS entry

  x86-64:

   - Fix error in page tables with MKTME enabled

   - Dirty page tracking performance test extended to running a nested
     guest

   - Disable APICv/AVIC in cases that it cannot implement correctly"

[ This merge also fixes a misplaced end parenthesis bug introduced in
  commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC
  ID or APIC base") pointed out by Sean Christopherson ]

Link: https://lore.kernel.org/all/20220610191813.371682-1-seanjc@google.com/
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits)
  KVM: selftests: Restrict test region to 48-bit physical addresses when using nested
  KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2
  KVM: selftests: Clean up LIBKVM files in Makefile
  KVM: selftests: Link selftests directly with lib object files
  KVM: selftests: Drop unnecessary rule for STATIC_LIBS
  KVM: selftests: Add a helper to check EPT/VPID capabilities
  KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h
  KVM: selftests: Refactor nested_map() to specify target level
  KVM: selftests: Drop stale function parameter comment for nested_map()
  KVM: selftests: Add option to create 2M and 1G EPT mappings
  KVM: selftests: Replace x86_page_size with PG_LEVEL_XX
  KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE
  KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put
  KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking
  KVM: x86: disable preemption while updating apicv inhibition
  KVM: x86: SVM: fix avic_kick_target_vcpus_fast
  KVM: x86: SVM: remove avic's broken code that updated APIC ID
  KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base
  KVM: x86: document AVIC/APICv inhibit reasons
  KVM: x86/mmu: Set memory encryption "value", not "mask", in shadow PDPTRs
  ...

2 years agoMerge tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Tue, 14 Jun 2022 14:43:15 +0000 (07:43 -0700)]
Merge tag 'x86-bugs-2022-06-01' of git://git./linux/kernel/git/tip/tip

Pull x86 MMIO stale data fixes from Thomas Gleixner:
 "Yet another hw vulnerability with a software mitigation: Processor
  MMIO Stale Data.

  They are a class of MMIO-related weaknesses which can expose stale
  data by propagating it into core fill buffers. Data which can then be
  leaked using the usual speculative execution methods.

  Mitigations include this set along with microcode updates and are
  similar to MDS and TAA vulnerabilities: VERW now clears those buffers
  too"

* tag 'x86-bugs-2022-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/mmio: Print SMT warning
  KVM: x86/speculation: Disable Fill buffer clear within guests
  x86/speculation/mmio: Reuse SRBDS mitigation for SBDS
  x86/speculation/srbds: Update SRBDS mitigation selection
  x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data
  x86/speculation/mmio: Enable CPU Fill buffer clearing on idle
  x86/bugs: Group MDS, TAA & Processor MMIO Stale Data mitigations
  x86/speculation/mmio: Add mitigation for Processor MMIO Stale Data
  x86/speculation: Add a common function for MD_CLEAR mitigation update
  x86/speculation/mmio: Enumerate Processor MMIO Stale Data bug
  Documentation: Add documentation for Processor MMIO Stale Data

2 years agomlxsw: spectrum_cnt: Reorder counter pools
Petr Machata [Mon, 13 Jun 2022 12:50:17 +0000 (15:50 +0300)]
mlxsw: spectrum_cnt: Reorder counter pools

Both RIF and ACL flow counters use a 24-bit SW-managed counter address to
communicate which counter they want to bind.

In a number of Spectrum FW releases, binding a RIF counter is broken and
slices the counter index to 16 bits. As a result, on Spectrum-2 and above,
no more than about 410 RIF counters can be effectively used. This
translates to 205 netdevices for which L3 HW stats can be enabled. (This
does not happen on Spectrum-1, because there are fewer counters available
overall and the counter index never exceeds 16 bits.)

Binding counters to ACLs does not have this issue. Therefore reorder the
counter allocation scheme so that RIF counters come first and therefore get
lower indices that are below the 16-bit barrier.

Fixes: 98e60dce4da1 ("Merge branch 'mlxsw-Introduce-initial-Spectrum-2-support'")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220613125017.2018162-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agofs: account for group membership
Christian Brauner [Mon, 13 Jun 2022 11:15:17 +0000 (13:15 +0200)]
fs: account for group membership

When calling setattr_prepare() to determine the validity of the
attributes the ia_{g,u}id fields contain the value that will be written
to inode->i_{g,u}id. This is exactly the same for idmapped and
non-idmapped mounts and allows callers to pass in the values they want
to see written to inode->i_{g,u}id.

When group ownership is changed a caller whose fsuid owns the inode can
change the group of the inode to any group they are a member of. When
searching through the caller's groups we need to use the gid mapped
according to the idmapped mount otherwise we will fail to change
ownership for unprivileged users.

Consider a caller running with fsuid and fsgid 1000 using an idmapped
mount that maps id 65534 to 1000 and 65535 to 1001. Consequently, a file
owned by 65534:65535 in the filesystem will be owned by 1000:1001 in the
idmapped mount.

The caller now requests the gid of the file to be changed to 1000 going
through the idmapped mount. In the vfs we will immediately map the
requested gid to the value that will need to be written to inode->i_gid
and place it in attr->ia_gid. Since this idmapped mount maps 65534 to
1000 we place 65534 in attr->ia_gid.

When we check whether the caller is allowed to change group ownership we
first validate that their fsuid matches the inode's uid. The
inode->i_uid is 65534 which is mapped to uid 1000 in the idmapped mount.
Since the caller's fsuid is 1000 we pass the check.

We now check whether the caller is allowed to change inode->i_gid to the
requested gid by calling in_group_p(). This will compare the passed in
gid to the caller's fsgid and search the caller's additional groups.

Since we're dealing with an idmapped mount we need to pass in the gid
mapped according to the idmapped mount. This is akin to checking whether
a caller is privileged over the future group the inode is owned by. And
that needs to take the idmapped mount into account. Note, all helpers
are nops without idmapped mounts.

New regression test sent to xfstests.

Link: https://github.com/lxc/lxd/issues/10537
Link: https://lore.kernel.org/r/20220613111517.2186646-1-brauner@kernel.org
Fixes: 2f221d6f7b88 ("attr: handle idmapped mounts")
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org # 5.15+
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2 years agodocs: tls: document the TLS_TX_ZEROCOPY_RO
Jakub Kicinski [Fri, 10 Jun 2022 18:02:12 +0000 (11:02 -0700)]
docs: tls: document the TLS_TX_ZEROCOPY_RO

Add missing documentation for the TLS_TX_ZEROCOPY_RO opt-in.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220610180212.110590-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 years agodocs: networking: phy: Fix a typo
Jonathan Neuschäfer [Fri, 10 Jun 2022 07:28:08 +0000 (09:28 +0200)]
docs: networking: phy: Fix a typo

Write "to be operated" instead of "to be operate".

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220610072809.352962-1-j.neuschaefer@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoamd-xgbe: Use platform_irq_count()
Jean-Philippe Brucker [Thu, 9 Jun 2022 16:14:59 +0000 (17:14 +0100)]
amd-xgbe: Use platform_irq_count()

The AMD XGbE driver currently counts the number of interrupts assigned
to the device by inspecting the pdev->resource array. Since commit
a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT
core") removed IRQs from this array, the driver now attempts to get all
interrupts from 1 to -1U and gives up probing once it reaches an invalid
interrupt index.

Obtain the number of IRQs with platform_irq_count() instead.

Fixes: a1a2b7125e10 ("of/platform: Drop static setup of IRQ resource from DT core")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20220609161457.69614-1-jean-philippe@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agoethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()
Marco Bonelli [Thu, 9 Jun 2022 13:49:01 +0000 (15:49 +0200)]
ethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()

Fix the implementation of ethtool_convert_link_mode_to_legacy_u32(), which
is supposed to return false if src has bits higher than 31 set. The current
implementation uses the complement of bitmap_fill(ext, 32) to test high
bits of src, which is wrong as bitmap_fill() fills _with long granularity_,
and sizeof(long) can be > 4. No users of this function currently check the
return value, so the bug was dormant.

Also remove the check for __ETHTOOL_LINK_MODE_MASK_NBITS > 32, as the enum
ethtool_link_mode_bit_indices contains far beyond 32 values. Using
find_next_bit() to test the src bitmask works regardless of this anyway.

Signed-off-by: Marco Bonelli <marco@mebeim.net>
Link: https://lore.kernel.org/r/20220609134900.11201-1-marco@mebeim.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet: phy: fixed_phy: set phy_mask before calling mdiobus_register()
Rasmus Villemoes [Mon, 6 Jun 2022 20:02:08 +0000 (22:02 +0200)]
net: phy: fixed_phy: set phy_mask before calling mdiobus_register()

There's no point probing for phys on this artificial bus, so we can
save a little bit of boot time by telling mdiobus_register() not to do
that.

This doesn't have any functional change, since, at this point,
fixed_mdio_read() returns 0xffff for all addresses/registers, so

  mdiobus_scan() -> get_phy_device() -> get_phy_c22_id()

will return -ENODEV, which is just ignored.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://lore.kernel.org/r/20220606200208.1665417-1-linux@rasmusvillemoes.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 years agonet/mlx5: Add bits and fields to support enhanced CQE compression
Ofer Levi [Wed, 8 Jun 2022 20:04:52 +0000 (13:04 -0700)]
net/mlx5: Add bits and fields to support enhanced CQE compression

Expose ifc bits and add needed structure fields and methods to
support enhanced CQE compression feature.
The enhanced CQE compression feature improves cpu utiliziation with
better packet latency from nic to host.

Signed-off-by: Ofer Levi <oferle@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK
Shay Drory [Wed, 8 Jun 2022 20:04:51 +0000 (13:04 -0700)]
net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK

Remove not used MLX5_CAP_BITS_RW_MASK.
While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM
and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK
was their only user.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: group fdb cleanup to single function
Shay Drory [Wed, 8 Jun 2022 20:04:50 +0000 (13:04 -0700)]
net/mlx5: group fdb cleanup to single function

Currently, the allocation of fdb software objects are done is single
function, oppose to the cleanup of them.
Group the cleanup of fdb software objects to single function.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Add support EXECUTE_ASO action for flow entry
Jianbo Liu [Wed, 8 Jun 2022 20:04:49 +0000 (13:04 -0700)]
net/mlx5: Add support EXECUTE_ASO action for flow entry

Attach flow meter to FTE with object id and index.
Use metadata register C5 to store the packet color meter result.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Add HW definitions of vport debug counters
Saeed Mahameed [Wed, 8 Jun 2022 20:04:48 +0000 (13:04 -0700)]
net/mlx5: Add HW definitions of vport debug counters

total_q_under_processor_handle - number of queues in error state due to an
async error or errored command.

send_queue_priority_update_flow - number of QP/SQ priority/SL update
events.

cq_overrun - number of times CQ entered an error state due to an
overflow.

async_eq_overrun -number of time an EQ mapped to async events was
overrun.

comp_eq_overrun - number of time an EQ mapped to completion events was
overrun.

quota_exceeded_command - number of commands issued and failed due to quota
exceeded.

invalid_command - number of commands issued and failed dues to any reason
other than quota exceeded.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Add IFC bits and enums for flow meter
Jianbo Liu [Wed, 8 Jun 2022 20:04:47 +0000 (13:04 -0700)]
net/mlx5: Add IFC bits and enums for flow meter

Add/extend structure layouts and defines for flow meter.

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agoRDMA/mlx5: Support handling of modify-header pattern ICM area
Yevgeny Kliteynik [Tue, 7 Jun 2022 12:47:45 +0000 (15:47 +0300)]
RDMA/mlx5: Support handling of modify-header pattern ICM area

Add support for allocate/deallocate and registering MR of the new type
of ICM area. Support exists only for devices that support sw_owner_v2.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Manage ICM of type modify-header pattern
Yevgeny Kliteynik [Tue, 7 Jun 2022 12:47:44 +0000 (15:47 +0300)]
net/mlx5: Manage ICM of type modify-header pattern

Added support for managing new type of ICM for devices that
support sw_owner_v2.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agonet/mlx5: Introduce header-modify-pattern ICM properties
Yevgeny Kliteynik [Tue, 7 Jun 2022 12:47:43 +0000 (15:47 +0300)]
net/mlx5: Introduce header-modify-pattern ICM properties

Added new fields for device memory capabilities, in order to
support creation of ICM memory for modify header patterns.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2 years agousercopy: Make usercopy resilient against ridiculously large copies
Matthew Wilcox (Oracle) [Sun, 12 Jun 2022 21:32:27 +0000 (22:32 +0100)]
usercopy: Make usercopy resilient against ridiculously large copies

If 'n' is so large that it's negative, we might wrap around and mistakenly
think that the copy is OK when it's not.  Such a copy would probably
crash, but just doing the arithmetic in a more simple way lets us detect
and refuse this case.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220612213227.3881769-4-willy@infradead.org
2 years agousercopy: Cast pointer to an integer once
Matthew Wilcox (Oracle) [Sun, 12 Jun 2022 21:32:26 +0000 (22:32 +0100)]
usercopy: Cast pointer to an integer once

Get rid of a lot of annoying casts by setting 'addr' once at the top
of the function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220612213227.3881769-3-willy@infradead.org
2 years agousercopy: Handle vm_map_ram() areas
Matthew Wilcox (Oracle) [Sun, 12 Jun 2022 21:32:25 +0000 (22:32 +0100)]
usercopy: Handle vm_map_ram() areas

vmalloc does not allocate a vm_struct for vm_map_ram() areas.  That causes
us to deny usercopies from those areas.  This affects XFS which uses
vm_map_ram() for its directories.

Fix this by calling find_vmap_area() instead of find_vm_area().

Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220612213227.3881769-2-willy@infradead.org
2 years agocfi: Fix __cfi_slowpath_diag RCU usage with cpuidle
Sami Tolvanen [Tue, 31 May 2022 17:59:10 +0000 (10:59 -0700)]
cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle

RCU_NONIDLE usage during __cfi_slowpath_diag can result in an invalid
RCU state in the cpuidle code path:

  WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter+0xe4/0x138
  ...
  Call trace:
    rcu_eqs_enter+0xe4/0x138
    rcu_idle_enter+0xa8/0x100
    cpuidle_enter_state+0x154/0x3a8
    cpuidle_enter+0x3c/0x58
    do_idle.llvm.6590768638138871020+0x1f4/0x2ec
    cpu_startup_entry+0x28/0x2c
    secondary_start_kernel+0x1b8/0x220
    __secondary_switched+0x94/0x98

Instead, call rcu_irq_enter/exit to wake up RCU only when needed and
disable interrupts for the entire CFI shadow/module check when we do.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20220531175910.890307-1-samitolvanen@google.com
Fixes: cf68fffb66d6 ("add support for Clang CFI")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2 years agonet: make __sys_accept4_file() static
Yajun Deng [Fri, 10 Jun 2022 09:10:17 +0000 (17:10 +0800)]
net: make __sys_accept4_file() static

__sys_accept4_file() isn't used outside of the file, make it static.

As the same time, move file_flags and nofile parameters into
__sys_accept4_file().

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoocteontx2-vf: Add support for adaptive interrupt coalescing
Suman Ghosh [Sun, 12 Jun 2022 17:45:36 +0000 (23:15 +0530)]
octeontx2-vf: Add support for adaptive interrupt coalescing

Fixes: 6e144b47f560 (octeontx2-pf: Add support for adaptive interrupt coalescing)
Added support for VF interfaces as well.

Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agotcp: sk_forced_mem_schedule() optimization
Eric Dumazet [Sat, 11 Jun 2022 03:30:16 +0000 (20:30 -0700)]
tcp: sk_forced_mem_schedule() optimization

sk_memory_allocated_add() has three callers, and returns
to them @memory_allocated.

sk_forced_mem_schedule() is one of them, and ignores
the returned value.

Change sk_memory_allocated_add() to return void.

Change sock_reserve_memory() and __sk_mem_raise_allocated()
to call sk_memory_allocated().

This removes one cache line miss [1] for RPC workloads,
as first skbs in TCP write queue and receive queue go through
sk_forced_mem_schedule().

[1] Cache line holding tcp_memory_allocated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: smsc95xx: add support for Microchip EVB-LAN8670-USB
Parthiban Veerasooran [Mon, 13 Jun 2022 09:12:07 +0000 (14:42 +0530)]
net: smsc95xx: add support for Microchip EVB-LAN8670-USB

This patch adds support for Microchip's EVB-LAN8670-USB 10BASE-T1S
ethernet device to the existing smsc95xx driver by adding the new
USB VID/PID pairs.

Signed-off-by: Parthiban Veerasooran <Parthiban.Veerasooran@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonfp: support 48-bit DMA addressing for NFP3800
Yinjun Zhang [Mon, 13 Jun 2022 09:58:31 +0000 (11:58 +0200)]
nfp: support 48-bit DMA addressing for NFP3800

48-bit DMA addressing is supported in NFP3800 HW and implemented
in NFDK firmware, so enable this feature in driver now. Note that
with this change, NFD3 firmware, which doesn't implement 48-bit
DMA, cannot be used for NFP3800 any more.

RX free list descriptor, used by both NFD3 and NFDK, is also modified
to support 48-bit DMA. That's OK because the top bits is always get
set to 0 when assigned with 40-bit address.

Based on initial work of Jakub Kicinski <jakub.kicinski@netronome.com>.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoxilinx: Fix build on x86.
David S. Miller [Mon, 13 Jun 2022 11:49:21 +0000 (12:49 +0100)]
xilinx:  Fix build on x86.

CONFIG_64BIT is not sufficient for checking for availability of
iowrite64() and friends.

Also, the out_addr helpers need to be inline.

Fixes: b690f8df6497 ("net: axienet: Use iowrite64 to write all 64b descriptor pointers")
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'axienet-fixes'
David S. Miller [Mon, 13 Jun 2022 11:36:56 +0000 (12:36 +0100)]
Merge branch 'axienet-fixes'

Andy Chiu says:

====================
net: axienet: fix DMA Tx error

We ran into multiple DMA TX errors while writing files over a network
block device running on top of a DMA-connected AXI Ethernet device on
64-bit RISC-V machines. The errors indicated that the DMA had fetched a
null descriptor and we found that the reason for this is that AXI DMA had
unexpectedly processed a partially updated tail descriptor pointer. To
fix it, we suggest that the driver should use one 64-bit write instead
of two 32-bit writes to perform such update if possible. For those
archectures where double-word load/stores are unavailable, e.g. 32-bit
archectures, force a driver probe failure if the driver finds 64-bit
capability on DMA.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: axienet: Use iowrite64 to write all 64b descriptor pointers
Andy Chiu [Mon, 13 Jun 2022 03:42:02 +0000 (11:42 +0800)]
net: axienet: Use iowrite64 to write all 64b descriptor pointers

According to commit f735c40ed93c ("net: axienet: Autodetect 64-bit DMA
capability") and AXI-DMA spec (pg021), on 64-bit capable dma, only
writing MSB part of tail descriptor pointer causes DMA engine to start
fetching descriptors. However, we found that it is true only if dma is in
idle state. In other words, dma would use a tailp even if it only has LSB
updated, when the dma is running.

The non-atomicity of this behavior could be problematic if enough
delay were introduced in between the 2 writes. For example, if an
interrupt comes right after the LSB write and the cpu spends long
enough time in the handler for the dma to get back into idle state by
completing descriptors, then the seconcd write to MSB would treat dma
to start fetching descriptors again. Since the descriptor next to the
one pointed by current tail pointer is not filled by the kernel yet,
fetching a null descriptor here causes a dma internal error and halt
the dma engine down.

We suggest that the dma engine should start process a 64-bit MMIO write
to the descriptor pointer only if ONE 32-bit part of it is written on all
states. Or we should restrict the use of 64-bit addressable dma on 32-bit
platforms, since those devices have no instruction to guarantee the write
to LSB and MSB part of tail pointer occurs atomically to the dma.

initial condition:
curp =  x-3;
tailp = x-2;
LSB = x;
MSB = 0;

cpu:                       |dma:
 iowrite32(LSB, tailp)     |  completes #(x-3) desc, curp = x-3
 ...                       |  tailp updated
 => irq                    |  completes #(x-2) desc, curp = x-2
    ...                    |  completes #(x-1) desc, curp = x-1
    ...                    |  ...
    ...                    |  completes #x desc, curp = tailp = x
 <= irqreturn              |  reaches tailp == curp = x, idle
 iowrite32(MSB, tailp + 4) |  ...
                           |  tailp updated, starts fetching...
                           |  fetches #(x + 1) desc, sees cntrl = 0
                           |  post Tx error, halt

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reported-by: Max Hsu <max.hsu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: axienet: make the 64b addresable DMA depends on 64b archectures
Andy Chiu [Mon, 13 Jun 2022 03:42:01 +0000 (11:42 +0800)]
net: axienet: make the 64b addresable DMA depends on 64b archectures

Currently it is not safe to config the IP as 64-bit addressable on 32-bit
archectures, which cannot perform a double-word store on its descriptor
pointers. The pointer is 64-bit wide if the IP is configured as 64-bit,
and the device would process the partially updated pointer on some
states if the pointer was updated via two store-words. To prevent such
condition, we force a probe fail if we discover that the IP has 64-bit
capability but it is not running on a 64-Bit kernel.

This is a series of patch (1/2). The next patch must be applied in order
to make 64b DMA safe on 64b archectures.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reported-by: Max Hsu <max.hsu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'ipa-refactoring'
David S. Miller [Mon, 13 Jun 2022 11:01:58 +0000 (12:01 +0100)]
Merge branch 'ipa-refactoring'

Alex Elder says:

====================
net: ipa: simple refactoring

This series contains some minor code improvements.

The first patch verifies that the configuration is compatible with a
recently-defined limit.  The second and third rename two fields so
they better reflect their use in the code.  The next gets rid of an
empty function by reworking its only caller.

The last two begin to remove the assumption that an event ring is
associated with a single channel.  Eventually we'll support having
multiple channels share an event ring but some more needs to be done
before that can happen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: derive channel from transaction
Alex Elder [Fri, 10 Jun 2022 15:46:15 +0000 (10:46 -0500)]
net: ipa: derive channel from transaction

In gsi_channel_tx_queued(), we report when a transaction gets passed
to hardware.  Change that function so it takes transaction rather
than a channel as its argument, and derive the channel from the
transaction.  Rename the function accordingly.

Delete the header comments above the function definition; the ones
above the declaration in "gsi_private.h" should suffice.  In
addition, the comments above gsi_channel_tx_update() do a fine job
of explaining what's going on.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: determine channel from event
Alex Elder [Fri, 10 Jun 2022 15:46:14 +0000 (10:46 -0500)]
net: ipa: determine channel from event

Each event in an event ring describes the TRE whose completion
caused the event.  Currently, every event ring is dedicated to a
single channel, so the channel is easily derived from the event
ring.

An event ring can actually be shared by more than one channel
though, and to distinguish events for one channel from another, the
event structure contains a field indicating which channel the event
is associated with.

In gsi_event_trans(), use the channel ID in an event to determine
which channel the event is for.  This makes the channel pointer now
passed to that function irrelevant; pass the GSI pointer to that
function instead.

And although it shouldn't happen, warn if an event arrives that
records a channel ID that's not in use, or if the event does not
have a transaction associated with it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: simplify endpoint transaction completion
Alex Elder [Fri, 10 Jun 2022 15:46:13 +0000 (10:46 -0500)]
net: ipa: simplify endpoint transaction completion

When a GSI transaction completes, ipa_endpoint_trans_complete() is
eventually called.  That handles TX and RX completions separately,
but ipa_endpoint_tx_complete() is a no-op.

Instead, have ipa_endpoint_trans_complete() return immediately for a
TX transaction, and incorporate code from ipa_endpoint_rx_complete()
to handle RX transactions.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: rename endpoint->trans_tre_max
Alex Elder [Fri, 10 Jun 2022 15:46:12 +0000 (10:46 -0500)]
net: ipa: rename endpoint->trans_tre_max

The trans_tre_max field of the IPA endpoint structure is only used
to limit the number of fragments allowed for an SKB being prepared
for transmission.  Recognizing that, rename the field skb_frag_max,
and reduce its value by 1.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: rename channel->tlv_count
Alex Elder [Fri, 10 Jun 2022 15:46:11 +0000 (10:46 -0500)]
net: ipa: rename channel->tlv_count

Each GSI channel has a TLV FIFO of a certain size, specified in the
configuration data for an AP channel.  That size dictates the
maximum number of TREs that are allowed in a single transaction.

The only way that value is used after initialization is as a limit
on the number of TREs in a transaction; calling it "tlv_count"
isn't helpful, and in fact gsi_channel_trans_tre_max() exists to
sort of abstract it.

Instead, rename the channel->tlv_count field trans_tre_max, and get
rid of the helper function.  Update a couple of comments as well.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: ipa: verify command channel TLV count
Alex Elder [Fri, 10 Jun 2022 15:46:10 +0000 (10:46 -0500)]
net: ipa: verify command channel TLV count

In commit 8797972afff3d ("net: ipa: remove command info pool"), the
maximum number of IPA commands that would be sent in a single
transaction was defined.  That number can't exceed the size of the
TLV FIFO on the command channel, and we can check that at runtime.

To add this check, pass a new flag to gsi_channel_data_valid() to
indicate the channel being checked is being used for IPA commands.
Knowing that we can also verify the channel direction is correct.

Use a new local variable that refers to the command-specific portion
of the data being checked.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agoMerge branch 'hns3-fixres'
David S. Miller [Mon, 13 Jun 2022 10:56:01 +0000 (11:56 +0100)]
Merge branch 'hns3-fixres'

Guangbin Huang says:

====================
net: hns3: add some fixes for -net

This series adds some fixes for the HNS3 ethernet driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
Guangbin Huang [Sat, 11 Jun 2022 12:25:29 +0000 (20:25 +0800)]
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization

Currently in driver initialization process, driver will set shapping
parameters of tm port to default speed read from firmware. However, the
speed of SFP module may not be default speed, so shapping parameters of
tm port may be incorrect.

To fix this problem, driver sets new shapping parameters for tm port
after getting exact speed of SFP module in this case.

Fixes: 88d10bd6f730 ("net: hns3: add support for multiple media type")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: fix PF rss size initialization bug
Jie Wang [Sat, 11 Jun 2022 12:25:28 +0000 (20:25 +0800)]
net: hns3: fix PF rss size initialization bug

Currently hns3 driver misuses the VF rss size to initialize the PF rss size
in hclge_tm_vport_tc_info_update. So this patch fix it by checking the
vport id before initialization.

Fixes: 7347255ea389 ("net: hns3: refactor PF rss get APIs with new common rss get APIs")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: restore tm priority/qset to default settings when tc disabled
Guangbin Huang [Sat, 11 Jun 2022 12:25:27 +0000 (20:25 +0800)]
net: hns3: restore tm priority/qset to default settings when tc disabled

Currently, settings parameters of schedule mode, dwrr, shaper of tm
priority or qset of one tc are only be set when tc is enabled, they are
not restored to the default settings when tc is disabled. It confuses
users when they cat tm_priority or tm_qset files of debugfs. So this
patch fixes it.

Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: modify the ring param print info
Jie Wang [Sat, 11 Jun 2022 12:25:26 +0000 (20:25 +0800)]
net: hns3: modify the ring param print info

Currently tx push is also a ring param. So the original ring param print
info in hns3_is_ringparam_changed should be adjusted.

Fixes: 07fdc163ac88 ("net: hns3: refactor hns3_set_ringparam()")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2 years agonet: hns3: don't push link state to VF if unalive
Jian Shen [Sat, 11 Jun 2022 12:25:25 +0000 (20:25 +0800)]
net: hns3: don't push link state to VF if unalive

It's unnecessary to push link state to unalive VF, and the VF will
query link state from PF when it being start works.

Fixes: 18b6e31f8bf4 ("net: hns3: PF add support for pushing link status to VFs")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>