linux-2.6-block.git
Ido Schimmel [Thu, 20 Dec 2018 17:03:27 +0000 (17:03 +0000)]
net: ipv4: Set skb->dev for output route resolution

When a user asks to resolve an output route, the kernel synthesizes
an skb where the relevant parameters (e.g., source address) are set. The
skb is then passed to ip_route_output_key_hash_rcu() which might call
into the flow dissector in case a multipath route was hit and a nexthop
needs to be selected based on the multipath hash.

Since neither 'skb->dev' nor 'skb->sk' is set, a warning is triggered
in the flow dissector [1]. The warning is there to prevent codepaths
from silently falling back to the standard flow dissector instead of the
BPF one.

Therefore, instead of removing the warning, set 'skb->dev' to the
loopback device, as it's not used for anything but resolving the correct
namespace.
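A loose userspace model of the failure and the fix, for illustration only. The struct layouts, skb_net_model() and set_loopback_dev() are hypothetical stand-ins, not kernel code; the model assumes, as described above, that the dissector derives the namespace from skb->dev first and skb->sk second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sock { struct net *sk_net; };
struct sk_buff { struct net_device *dev; struct sock *sk; };

/*
 * Model of the namespace lookup the flow dissector needs: use skb->dev
 * if set, otherwise skb->sk. With neither set there is no namespace to
 * consult, so warn and return NULL.
 */
struct net *skb_net_model(const struct sk_buff *skb)
{
	if (skb->dev)
		return skb->dev->nd_net;
	if (skb->sk)
		return skb->sk->sk_net;
	fprintf(stderr, "WARN: skb has neither dev nor sk\n");
	return NULL;
}

/* The fix in model form: point the synthesized skb at the loopback device. */
void set_loopback_dev(struct sk_buff *skb, struct net_device *loopback)
{
	skb->dev = loopback;
}
```

The loopback device is a convenient choice here because, as the text notes, the device is only used to find the right namespace.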

[1]
WARNING: CPU: 1 PID: 24819 at net/core/flow_dissector.c:764 __skb_flow_dissect+0x314/0x16b0
...
RSP: 0018:ffffa0df41fdf650 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8bcded232000 RCX: 0000000000000000
RDX: ffffa0df41fdf7e0 RSI: ffffffff98e415a0 RDI: ffff8bcded232000
RBP: ffffa0df41fdf760 R08: 0000000000000000 R09: 0000000000000000
R10: ffffa0df41fdf7e8 R11: ffff8bcdf27a3000 R12: ffffffff98e415a0
R13: ffffa0df41fdf7e0 R14: ffffffff98dd2980 R15: ffffa0df41fdf7e0
FS:  00007f46f6897680(0000) GS:ffff8bcdf7a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055933e95f9a0 CR3: 000000021e636000 CR4: 00000000001006e0
Call Trace:
 fib_multipath_hash+0x28c/0x2d0
 ? fib_multipath_hash+0x28c/0x2d0
 fib_select_path+0x241/0x32f
 ? __fib_lookup+0x6a/0xb0
 ip_route_output_key_hash_rcu+0x650/0xa30
 ? __alloc_skb+0x9b/0x1d0
 inet_rtm_getroute+0x3f7/0xb80
 ? __alloc_pages_nodemask+0x11c/0x2c0
 rtnetlink_rcv_msg+0x1d9/0x2f0
 ? rtnl_calcit.isra.24+0x120/0x120
 netlink_rcv_skb+0x54/0x130
 rtnetlink_rcv+0x15/0x20
 netlink_unicast+0x20a/0x2c0
 netlink_sendmsg+0x2d1/0x3d0
 sock_sendmsg+0x39/0x50
 ___sys_sendmsg+0x2a0/0x2f0
 ? filemap_map_pages+0x16b/0x360
 ? __handle_mm_fault+0x108e/0x13d0
 __sys_sendmsg+0x63/0xa0
 ? __sys_sendmsg+0x63/0xa0
 __x64_sys_sendmsg+0x1f/0x30
 do_syscall_64+0x5a/0x120
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d0e13a1488ad ("flow_dissector: lookup netns by skb->sk if skb->dev is NULL")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 20 Dec 2018 13:16:31 +0000 (14:16 +0100)]
net: mscc: ocelot: Register poll timeout should be wall time not attempts

When doing indirect access in the Ocelot chip, a command is set up and
issued, and then we need to poll until the result is ready. The polling
timeout is specified in milliseconds in the datasheet, not in
register access attempts.
This is not a bug on the currently supported platform, but we observed
that the code does not work properly on other platforms that we want to
support, as the timing requirements there are different.
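A minimal sketch of polling against a wall-clock deadline rather than an attempt count, written as userspace C with CLOCK_MONOTONIC. poll_until_ready() and fake_ready_after_n() are hypothetical names; in-kernel code would more likely use a helper such as readx_poll_timeout() from <linux/iopoll.h>, and this is not the ocelot patch itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples. */
long elapsed_ms(const struct timespec *start, const struct timespec *now)
{
	return (now->tv_sec - start->tv_sec) * 1000L +
	       (now->tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Poll is_ready(ctx) until it returns true or timeout_ms of wall time
 * has passed. Returns 0 on success, -1 on timeout. Because the bound is
 * time rather than a number of reads, a faster or slower register bus
 * does not change how long we are willing to wait.
 */
int poll_until_ready(bool (*is_ready)(void *), void *ctx, long timeout_ms)
{
	struct timespec start, now;
	const struct timespec pause = { 0, 100000 }; /* 100 us between polls */

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (;;) {
		if (is_ready(ctx))
			return 0;
		clock_gettime(CLOCK_MONOTONIC, &now);
		if (elapsed_ms(&start, &now) >= timeout_ms)
			return is_ready(ctx) ? 0 : -1; /* one last check at the deadline */
		nanosleep(&pause, NULL);
	}
}

/* Hypothetical fake status register: ready once *ctx reads have gone by. */
bool fake_ready_after_n(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- <= 0;
}
```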

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 20 Dec 2018 16:50:50 +0000 (16:50 +0000)]
neighbour: remove stray semicolon

Currently the stray semicolon means that the final term in the addition
is being missed.  Fix this by removing it. Cleans up clang warning:

net/core/neighbour.c:2821:9: warning: expression result unused [-Wunused-value]
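To illustrate how a stray semicolon silently drops a term, here is a hypothetical size helper (msg_size() and its parameters are illustrative; this is not the actual neighbour.c code). The buggy shape is kept in a comment.

```c
#include <assert.h>

/*
 * The buggy shape looked like:
 *
 *	return hdr_len
 *	       + key_len;	<- stray ';' ends the return statement here
 *	       + extra_len;	<- separate statement: unary '+', result unused
 *
 * The last line still compiles, which is what clang's -Wunused-value
 * flags. Without the stray ';' all three terms are summed:
 */
int msg_size(int hdr_len, int key_len, int extra_len)
{
	return hdr_len
	       + key_len
	       + extra_len;
}
```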

Fixes: 82cbb5c631a0 ("neighbour: register rtnl doit handler")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-By: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Thu, 20 Dec 2018 02:59:31 +0000 (18:59 -0800)]
net: dsa: microchip: fix unicast frame leak

Port partitioning is done by enabling UNICAST_VLAN_BOUNDARY and changing
the default port membership of 0x7f to other values such that there is
no communication between ports.  In KSZ9477 the member for port 1 is
0x41; port 2, 0x42; port 3, 0x44; port 4, 0x48; port 5, 0x50; and port 7,
0x60.  Port 6 is the host port.

Setting a zero value can be used to stop a port from receiving.

However, when UNICAST_VLAN_BOUNDARY is disabled and the unicast addresses
are already learned in the dynamic MAC table, setting zero still allows
devices connected to those ports to communicate.  This does not apply to
multicast and broadcast addresses though.  To prevent these leaks and
make the function of port membership consistent, UNICAST_VLAN_BOUNDARY
should never be disabled.

Note that UNICAST_VLAN_BOUNDARY is enabled by default in KSZ9477.
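A simplified model of the membership values and the leak described above. It assumes each port maps to a bit index 0-6 with the host at bit 6 (0x40), which reproduces the six values listed; the helper names and the forwarding rule are illustrative, not the KSZ9477 register interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HOST_BIT 6 /* assumed: bit 6 (0x40) is common to every listed value */

/* Partitioned membership for the port at bit index idx: itself plus the host. */
uint8_t partitioned_membership(unsigned int idx)
{
	return (uint8_t)((1u << idx) | (1u << HOST_BIT));
}

/*
 * Simplified unicast forwarding decision from a port with membership
 * src_membership to the port at bit index dst. With the VLAN boundary
 * enabled, membership is always honoured. With it disabled, a destination
 * already learned in the dynamic MAC table bypasses membership (the leak).
 */
bool unicast_allowed(uint8_t src_membership, unsigned int dst,
		     bool vlan_boundary, bool dst_learned)
{
	if (!vlan_boundary && dst_learned)
		return true;
	return (src_membership >> dst) & 1u;
}
```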

Fixes: b987e98e50ab90e5 ("dsa: add DSA switch driver for Microchip KSZ9477")
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Dec 2018 00:14:22 +0000 (16:14 -0800)]
vxlan: Correct merge error.

When resolving the conflict wrt. the vxlan_fdb_update call
in vxlan_changelink() I made the last argument false instead
of true.

Fix this.

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Dec 2018 23:51:55 +0000 (15:51 -0800)]
Merge tag 'mlx5-updates-2018-12-19' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-12-19

This series adds some misc updates and the support for tunnels over VLAN
tc offloads.

From Miroslav Lichvar, patches #1,2
1) Update timecounter at least twice per counter overflow
2) Extend PTP gettime function to read system clock

From Gavi Teitz, patch #3
3) Increase VF representors' SQ size to 128

From Eli Britstein and Or Gerlitz, patches #4-10
4) Add the capability to support tunnels over VLAN devices.

Patch 4 avoids a crash for TC flows with egress upper devices

Patch 5 refactors tunnel routing devs into a helper function

Patch 6 avoids a crash for TC encap flows with VLAN on underlay

Patches 7-8 refactor encap tunnel header preparing code.

Patch 9 adds support for building VLAN tagged ETH header.

Patch 10 adds support for tunnel routing to VLAN device.

From Aviv, patches 11,12 to fix earlier VF lag series
5) Fix query_nic_sys_image_guid() error during init
6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Dec 2018 23:48:55 +0000 (15:48 -0800)]
Merge branch 'mlxsw-Two-usability-improvements'

Ido Schimmel says:

====================
mlxsw: Two usability improvements

This patchset contains two small improvements in the mlxsw driver. The
first one, in patches #1-#2, relieves the user from the need to
configure a VLAN interface and only later the corresponding VXLAN
tunnel. The issue is explained in detail in the first patch.

The second improvement is described below and allows the user to make
use of VID 1 by having the driver use the reserved 4095 VID for untagged
traffic.

VLAN entries on a given port can be associated with either a bridge or a
router. For example, if swp1.10 is assigned an IP address and swp1.20 is
enslaved to a VLAN-unaware bridge, then both {Port 1, VID 10} and {Port
1, VID 20} would be associated with a filtering identifier (FID) of the
correct type.

In case swp1 itself is assigned an IP address or enslaved to a
VLAN-unaware bridge, then a FID would be associated with {Port 1, VID
1}. Using VID 1 for this purpose means that VLAN devices with VID 1
cannot be created over mlxsw ports, as this VID is (ab)used as the
default VLAN.

Instead of using VID 1 for this purpose, we can use VID 4095 which is
reserved for internal use and cannot be configured by either the 8021q
or the bridge driver.
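A small sketch of why moving the internal default VID from 1 to 4095 frees up VID 1. VLAN_N_VID (4096) matches <linux/if_vlan.h>; OLD_DEFAULT_VID, NEW_DEFAULT_VID and the helpers are hypothetical names, and the model only counts VIDs 1-4094.

```c
#include <assert.h>
#include <stdbool.h>

#define VLAN_N_VID      4096             /* number of 12-bit VLAN IDs */
#define OLD_DEFAULT_VID 1                /* previously (ab)used internally */
#define NEW_DEFAULT_VID (VLAN_N_VID - 1) /* 4095: reserved, not user-configurable */

/* A user VLAN device clashes only if it reuses the internal default VID. */
bool vid_clashes_with_default(unsigned int vid, unsigned int default_vid)
{
	return vid == default_vid;
}

/* Count VIDs in 1-4094 that remain usable for VLAN devices. */
unsigned int usable_user_vids(unsigned int default_vid)
{
	unsigned int vid, n = 0;

	for (vid = 1; vid < VLAN_N_VID - 1; vid++)
		if (!vid_clashes_with_default(vid, default_vid))
			n++;
	return n;
}
```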

Patches #3-#7 perform small and non-functional changes that finally
allow us to switch to VID 4095 as the default VID in patch #8.

Patch #9 removes the limitation about creation of VLAN devices with VID
1 over mlxsw ports.

Patches #10-#11 add test cases.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:37 +0000 (19:42 +0000)]
selftests: forwarding: Add router test with VID 1

Previous patches made it possible to set up VLAN devices with VID 1 over
mlxsw ports. Verify this functionality actually works by conducting a
simple router test over VID 1.

Adding this test as a generic test since it can be run using veth pairs
and it can also be useful for other physical devices where VID 1 was
considered reserved (knowingly or not).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:35 +0000 (19:42 +0000)]
selftests: mlxsw: Adjust test regarding VID 1

Previous patches made it possible to create VLAN devices with VID 1 over
mlxsw ports. Adjust the test to verify such an operation succeeds.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:34 +0000 (19:42 +0000)]
mlxsw: spectrum: Remove limitation regarding VID 1

VID 1 is not reserved anymore, so remove the check that prevented the
creation of VLAN devices with this VID over mlxsw ports.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:33 +0000 (19:42 +0000)]
mlxsw: spectrum: Switch to VID 4095 as default VID

There is no need to abuse VID 1 anymore and we can instead use VID 4095
as the default VLAN, which will be configured on the port throughout its
lifetime.

The OVS join / leave functions are changed to enable VIDs 1-4094
(inclusive) instead of 2-4095. This is because VID 4095 is now the default
VLAN instead of 1.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:32 +0000 (19:42 +0000)]
mlxsw: spectrum: Add an helper function to cleanup VLAN entries

VLAN entries on a port can be associated with either a bridge VLAN or a
router port. Before the VLAN entry is destroyed these associations need
to be cleaned up.

Currently, this is always invoked from the function which destroys the
VLAN entry, but next patch is going to skip the destruction of the
default entry when a port is unlinked from a LAG.

The above does not mean that the associations should not be cleaned up,
so add a helper that will be invoked from both call sites.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:30 +0000 (19:42 +0000)]
mlxsw: spectrum: Store pointer to default port VLAN in port struct

Subsequent patches will need to access the default port VLAN. Since this
VLAN will exist throughout the lifetime of the port, simply store it in
the port's struct.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:29 +0000 (19:42 +0000)]
mlxsw: spectrum: Allow controlling destruction of default port VLAN

The function allows flushing all the existing VLAN entries on a port. It
is invoked when a port is destroyed and when it is unlinked from a LAG.
In the latter case, when moving to the new default VLAN, there will not
be a need to destroy the default VLAN entry.

Therefore, add an argument that controls whether the default
port VLAN should be destroyed or not. Currently it is always set to
'true'.
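A minimal sketch of a flush routine with the new argument, assuming a hypothetical flat VLAN table per port (struct port_model and port_vlan_flush() are illustrative, not the mlxsw data structures).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VLANS 8

/* Hypothetical per-port VLAN table. */
struct port_model {
	unsigned int vids[MAX_VLANS];
	size_t count;
	unsigned int default_vid;
};

/*
 * Flush the VLAN entries on a port. With flush_default == false (e.g. on
 * LAG unlink) the default port VLAN entry survives; with true (port
 * removal) every entry goes.
 */
void port_vlan_flush(struct port_model *port, bool flush_default)
{
	size_t i, kept = 0;

	for (i = 0; i < port->count; i++) {
		/* a real driver would undo bridge / router associations here */
		if (!flush_default && port->vids[i] == port->default_vid)
			port->vids[kept++] = port->vids[i];
	}
	port->count = kept;
}
```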

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:27 +0000 (19:42 +0000)]
mlxsw: spectrum: Set PVID during port initialization

Currently, the driver does not set the port's PVID when initializing a
new port. This is because the driver is using VID 1 as PVID which is the
firmware default.

Subsequent patches are going to change the PVID the driver is setting
when initializing a new port.

Prepare for that by explicitly setting the port's PVID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:26 +0000 (19:42 +0000)]
mlxsw: spectrum: Replace hard-coded default VID with a define

Subsequent patches are going to replace the current default VID (1) with
VLAN_N_VID - 1 (4095).

Prepare for this conversion by replacing the hard-coded '1' with a
define.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:25 +0000 (19:42 +0000)]
selftests: mlxsw: Add a test case for L3 VNI

Previous patch added the ability to offload a VXLAN tunnel used for L3
VNI when it is present in the VLAN-aware bridge before the corresponding
VLAN interface is configured. This patch adds a test case to verify
that.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 20 Dec 2018 19:42:23 +0000 (19:42 +0000)]
mlxsw: spectrum_router: Do not force specific configuration order

In symmetric routing, the only two members in the VLAN corresponding to
the L3 VNI are the router port and the VXLAN tunnel.

In case the VXLAN device is already enslaved to the bridge and only
later the VLAN interface is configured, the tunnel will not be
offloaded.

The reason for this is that when the router interface (RIF)
corresponding to the VLAN interface is configured, it calls the core
fid_get() API which does not check if NVE should be enabled on the FID.

Instead, call into the bridge code which will check if NVE should be
enabled on the FID.

This effectively means that the same code path is used to retrieve a FID
when either a local port or a router port joins the FID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Dec 2018 23:34:30 +0000 (15:34 -0800)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-12-20

This series contains updates to e100, igb, ixgbe, i40e and ice drivers.

I replaced spinlocks with mutex locks to reduce the latency on CPU0 for
igb when updating the statistics.  This work was based off a patch
provided by Jan Jablonsky, which was against an older version of the igb
driver.

Jesus adjusts the receive packet buffer size from 32K to 30K when
running in QAV mode, to stay within 60K for total packet buffer size for
igb.

Vinicius adds igb kernel documentation regarding the CBS algorithm and
its implementation in the i210 family of NICs.

YueHaibing from Huawei fixed the e100 driver that was potentially
passing a NULL pointer, so use the kernel macro IS_ERR_OR_NULL()
instead.

Konstantin Khorenko fixes i40e where we were not setting up the
neigh_priv_len in our net_device, which caused the driver to read beyond
the neighbor entry allocated memory.

Miroslav Lichvar extends the PTP gettime() to read the system clock by
adding support for PTP_SYS_OFFSET_EXTENDED ioctl in i40e.

Young Xiao fixed the ice driver to only enable NAPI on q_vectors that
actually have transmit and receive rings.

Kai-Heng Feng fixes an igb issue where, when placed in suspend mode, the
NIC does not wake up when a cable is plugged in.  This was due to the
driver not setting PME during runtime suspend.

Stephen Douthit enables the ixgbe driver to allow DSA devices to use the
MII interface to talk to switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Douthit [Thu, 6 Dec 2018 15:50:43 +0000 (15:50 +0000)]
ixgbe: use mii_bus to handle MII related ioctls

Use the mii_bus callbacks to address the entire clause 22/45 address
space.  This enables userspace to poke switch registers instead of only a
single PHY address.

The ixgbe firmware may be polling PHYs in a way that is not protected by
the mii_bus lock.  This isn't new behavior, but as Andrew Lunn pointed
out there are more addresses available for conflicts.
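For reference, a sketch of how the phy_id passed through the MII ioctls distinguishes clause 22 from clause 45 addresses. The three masks mirror the uapi <linux/mdio.h> definitions as we understand them; the helper functions are re-created here for a self-contained example and are not the ixgbe code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Masks as in the uapi <linux/mdio.h> header. */
#define MDIO_PHY_ID_C45   0x8000 /* set: clause 45 address */
#define MDIO_PHY_ID_PRTAD 0x03e0 /* port address, bits 9:5 */
#define MDIO_PHY_ID_DEVAD 0x001f /* MMD device address, bits 4:0 */

/* Encode a clause 45 (port, device) pair into an ioctl phy_id. */
uint16_t c45_phy_id(unsigned int prtad, unsigned int devad)
{
	return (uint16_t)(MDIO_PHY_ID_C45 |
			  ((prtad << 5) & MDIO_PHY_ID_PRTAD) |
			  (devad & MDIO_PHY_ID_DEVAD));
}

bool phy_id_is_c45(uint16_t id) { return id & MDIO_PHY_ID_C45; }
unsigned int phy_id_prtad(uint16_t id) { return (id & MDIO_PHY_ID_PRTAD) >> 5; }
unsigned int phy_id_devad(uint16_t id) { return id & MDIO_PHY_ID_DEVAD; }
```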

Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Steve Douthit [Thu, 6 Dec 2018 15:50:39 +0000 (15:50 +0000)]
ixgbe: register a mdiobus

Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
via the MII interface.

While this works for dsa devices, it will not work safely with Linux
PHYs in all configurations since the firmware of the ixgbe device may
be polling some PHY addresses in the background.

Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kai-Heng Feng [Mon, 3 Dec 2018 05:54:38 +0000 (13:54 +0800)]
igb: Fix an issue that PME is not enabled during runtime suspend

The I210 Ethernet card doesn't wake up when a cable is plugged in. This is
because its PME is not set.

Since commit 42eca2302146 ("PCI: Don't touch card regs after runtime
suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops
calling pci_finish_runtime_suspend(), which enables the PCI PME.

To fix the issue, don't save the PCI state during runtime suspend, so
that the PCI subsystem enables PME.

Fixes: 42eca2302146 ("PCI: Don't touch card regs after runtime suspend D3")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Young Xiao [Thu, 29 Nov 2018 01:54:10 +0000 (01:54 +0000)]
ice: Do not enable NAPI on q_vectors that have no rings

If the ice driver has q_vectors with active NAPI that have no rings,
this results in a divide-by-zero error. To correct it, I am updating
the driver code so that NAPI is only supported on q_vectors that have
1 or more rings allocated to them.

See commit 13a8cd191a2b ("i40e: Do not enable NAPI on q_vectors
that have no rings") for detail.
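A minimal model of the guard, assuming a poll routine that divides its budget by the number of rings on the vector (struct q_vector_model and both helpers are hypothetical; the real ice budget accounting may differ).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down q_vector: ring counts and a NAPI flag. */
struct q_vector_model {
	unsigned int num_ring_tx;
	unsigned int num_ring_rx;
	bool napi_enabled;
};

/* Enable NAPI only on vectors that have at least one ring to service. */
void enable_napi(struct q_vector_model *qv, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		qv[i].napi_enabled = qv[i].num_ring_tx || qv[i].num_ring_rx;
}

/* Assumed poll step: spread the budget over all rings on the vector. */
int per_ring_budget(const struct q_vector_model *qv, int budget)
{
	unsigned int rings = qv->num_ring_tx + qv->num_ring_rx;

	assert(qv->napi_enabled && rings > 0); /* guaranteed by enable_napi() */
	return budget / (int)rings;
}
```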

Signed-off-by: Young Xiao <YangX92@hotmail.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Miroslav Lichvar [Wed, 28 Nov 2018 16:07:49 +0000 (17:07 +0100)]
i40e: extend PTP gettime function to read system clock

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Konstantin Khorenko [Fri, 23 Nov 2018 16:10:28 +0000 (19:10 +0300)]
i40e: define proper net_device::neigh_priv_len

Out of bound read reported by KASan.

i40iw_net_event() unconditionally reads 16 bytes from
neigh->primary_key while the memory allocated for
"neighbour" struct is evaluated in neigh_alloc() as

  tbl->entry_size + dev->neigh_priv_len

where "dev" is a net_device.

But the driver does not set up dev->neigh_priv_len, so
we read beyond the neigh entry's allocated memory.
This patch fixes that.
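A simplified bounds check of the read described above. It ignores alignment padding, assumes a 4-byte IPv4 key, and uses a made-up header length; neigh_alloc_size() and key_read_in_bounds() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the size computed in neigh_alloc():
 * tbl->entry_size (header + key_len here) + dev->neigh_priv_len.
 */
size_t neigh_alloc_size(size_t header_len, size_t key_len, size_t neigh_priv_len)
{
	size_t entry_size = header_len + key_len; /* stands in for tbl->entry_size */

	return entry_size + neigh_priv_len;
}

/* Does reading read_len bytes starting at primary_key stay in the allocation? */
bool key_read_in_bounds(size_t header_len, size_t key_len,
			size_t neigh_priv_len, size_t read_len)
{
	return header_len + read_len <=
	       neigh_alloc_size(header_len, key_len, neigh_priv_len);
}
```

With a 4-byte key, a 16-byte read overruns by 12 bytes unless neigh_priv_len provides at least that much room.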

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
YueHaibing [Mon, 19 Nov 2018 12:48:19 +0000 (20:48 +0800)]
e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait

Fix a static code checker warning:
drivers/net/ethernet/intel/e100.c:1349
 e100_load_ucode_wait() warn: passing zero to 'PTR_ERR'
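A userspace re-creation of the kernel's error-pointer helpers from <linux/err.h>, to show what the checker is flagging: PTR_ERR(NULL) evaluates to 0, which a caller reads as success. PTR_ERR_OR_ZERO() makes that intent explicit. This sketch does not claim to reproduce the e100 change itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace versions of the <linux/err.h> helpers. */
#define MAX_ERRNO 4095

void *ERR_PTR(long error) { return (void *)error; }
long PTR_ERR(const void *ptr) { return (long)ptr; }

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

bool IS_ERR_OR_NULL(const void *ptr) { return !ptr || IS_ERR(ptr); }

/* Explicitly map NULL (and any valid pointer) to 0, errors to -errno. */
int PTR_ERR_OR_ZERO(const void *ptr)
{
	return IS_ERR(ptr) ? (int)PTR_ERR(ptr) : 0;
}
```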

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 20 Dec 2018 18:53:28 +0000 (10:53 -0800)]
Merge git://git./linux/kernel/git/davem/net

Lots of conflicts, but happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
Vinicius Costa Gomes [Sat, 17 Nov 2018 00:19:24 +0000 (16:19 -0800)]
Documentation: igb: Add a section about CBS

Add some pointers to the definition of the CBS algorithm, and some
notes about the limits of its implementation in the i210 family of
controllers.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesus Sanchez-Palencia [Sat, 17 Nov 2018 00:19:23 +0000 (16:19 -0800)]
igb: Change RXPBSIZE size when setting Qav mode

Section 4.5.9 of the datasheet says that the total size of all packet
buffers combined (TxPB 0 + 1 + 2 + 3 + RxPB + BMC2OS + OS2BMC) must not
exceed 60KB. Today we are configuring a total of 62KB, so reduce the
RxPB from 32KB to 30KB in order to respect that.
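The arithmetic as a tiny check: if the current total is 62KB with a 32KB RxPB, the remaining buffers sum to 30KB, so a 30KB RxPB lands exactly on the 60KB limit. pb_total_kb() and pb_config_valid() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define PB_TOTAL_LIMIT_KB 60 /* datasheet limit quoted above */

/* Total of all packet buffers, given RxPB and the sum of all the others. */
unsigned int pb_total_kb(unsigned int rxpb_kb, unsigned int others_kb)
{
	return rxpb_kb + others_kb;
}

bool pb_config_valid(unsigned int rxpb_kb, unsigned int others_kb)
{
	return pb_total_kb(rxpb_kb, others_kb) <= PB_TOTAL_LIMIT_KB;
}
```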

The choice of changing RxPBSIZE here is mainly because it seems more
correct to give more priority to the transmit packet buffers over the
receiver ones when running in Qav mode. Also, the BMC2OS and OS2BMC
sizes are already too short.

Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Thu, 2 Aug 2018 17:13:10 +0000 (10:13 -0700)]
igb: reduce CPU0 latency when updating statistics

This change is based off of the work and suggestion of Jan Jablonsky
<jan.jablonsky@thalesgroup.com>.

The watchdog workqueue in the igb driver is scheduled every 2s for each
network interface. That includes updating statistics protected by a
spinlock, so igb_update_stats() runs with preemption disabled. Given
the number of statistics registers (about 60), processing this function
might cause additional CPU load on CPU0.

For the statistics, the spinlock may be replaced with a mutex, which
reduces latency on CPU0.

CC: Bernhard Kaindl <bernhard.kaindl@thalesgroup.com>
CC: Jan Jablonsky <jan.jablonsky@thalesgroup.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 20 Dec 2018 16:26:16 +0000 (08:26 -0800)]
Merge branch 'bnxt_en-next'

Michael Chan says:

====================
bnxt_en: Update for net-next.

Three main changes in this series, besides the usual firmware spec
update:

1. Add support for a new firmware communication channel direct to the
firmware processor that handles flow offloads.  This speeds up
flow offload operations.

2. Use 64-bit internal flow handles to increase the number of flows
that can be offloaded.

3. Add level-2 context memory paging so that we can configure more
context memory for RDMA on the 57500 chips.  Allocate more context
memory if RDMA is enabled on the 57500 chips.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 20 Dec 2018 08:38:53 +0000 (03:38 -0500)]
bnxt_en: Adjust default RX coalescing ticks to 10 us.

For slightly better performance on faster machines and at faster link
speeds.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 20 Dec 2018 08:38:52 +0000 (03:38 -0500)]
bnxt_en: Support for 64-bit flow handle.

Older firmware only supports a 16-bit flow handle, which limits how many
flows can be offloaded.
Newer firmware supports a 64-bit flow handle, enabling the host to scale
up to millions of flows. With the new 64-bit flow handle support, the driver
has to query flow stats differently than with the older approach.

This patch adds support for 64-bit flow handle and new way to query
flow stats.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 20 Dec 2018 08:38:51 +0000 (03:38 -0500)]
bnxt_en: Increase context memory allocations on 57500 chips for RDMA.

If RDMA is supported on the 57500 chip, increase context memory
allocations for the resources used by RDMA.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 20 Dec 2018 08:38:50 +0000 (03:38 -0500)]
bnxt_en: Add Level 2 context memory paging support.

Add the new functions bnxt_alloc_ctx_pg_tbls()/bnxt_free_ctx_pg_tbls()
to allocate and free pages for context memory.  The new functions
will handle the different levels of paging support and allocate/free
the pages accordingly using the existing functions.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 20 Dec 2018 08:38:49 +0000 (03:38 -0500)]
bnxt_en: Enhance bnxt_alloc_ring()/bnxt_free_ring().

To support level 2 context page memory structures, enhance the
bnxt_ring_mem_info structure with a "depth" field to specify the page
level and add a flag to specify using full pages for L1 and L2 page
tables.  This is needed to support RDMA functionality on 57500 chips
since RDMA requires more context memory.
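A sketch of how a page-table "depth" might be chosen, assuming 4KB pages and 8-byte page-table entries (512 entries per table page). ctx_paging_depth() is a hypothetical helper; the driver's real sizing rules may differ.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ       4096u
#define PTE_SZ        8u
#define PTES_PER_PAGE (PAGE_SZ / PTE_SZ) /* 512 entries per table page */

/*
 * Smallest depth able to map data_pages pages:
 * 0 = single data page addressed directly,
 * 1 = one level of page table, 2 = two levels.
 * Returns -1 if two levels are not enough.
 */
int ctx_paging_depth(size_t data_pages)
{
	if (data_pages <= 1)
		return 0;
	if (data_pages <= PTES_PER_PAGE)
		return 1;
	if (data_pages <= (size_t)PTES_PER_PAGE * PTES_PER_PAGE)
		return 2;
	return -1;
}
```

Under these assumptions, a second level raises the reach from 512 pages (2MB) to 262144 pages (1GB).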

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 20 Dec 2018 08:38:48 +0000 (03:38 -0500)]
bnxt_en: Add support for 2nd firmware message channel.

Earlier, some of the firmware commands (e.g. CFA_FLOW_*) which are processed
by the KONG processor were sent to the CHIMP processor from the host. This
approach was taken as there was no direct message channel to KONG;
CHIMP in turn forwarded them to KONG. Newer firmware supports a new
message channel through which the host can send messages directly to the
KONG processor.

This patch adds the driver changes needed to support the direct KONG
message channel.  This speeds up flow-related messages sent to the
firmware for CLS_FLOWER offload.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 20 Dec 2018 08:38:47 +0000 (03:38 -0500)]
bnxt_en: Introduce bnxt_get_hwrm_resp_addr & bnxt_get_hwrm_seq_id routines.

These routines will be enhanced in the subsequent patch to
return the 2nd firmware comm. channel's hwrm response address &
sequence id respectively.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 20 Dec 2018 08:38:46 +0000 (03:38 -0500)]
bnxt_en: Avoid arithmetic on void * pointer.

Typecast hwrm_cmd_resp_addr to (u8 *) from (void *) before doing
arithmetic.
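Pointer arithmetic on void * is a GNU C extension that ISO C does not permit; casting to a byte pointer first gives the same address portably. resp_field_addr() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * GNU C treats void * arithmetic as if sizeof(void) == 1; ISO C rejects
 * it. Casting to uint8_t * makes the byte offset explicit and portable.
 */
void *resp_field_addr(void *resp_base, size_t offset)
{
	return (uint8_t *)resp_base + offset;
}
```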

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 20 Dec 2018 08:38:45 +0000 (03:38 -0500)]
bnxt_en: Use macros for firmware message doorbell offsets.

In preparation for adding a 2nd communication channel to firmware.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 20 Dec 2018 08:38:44 +0000 (03:38 -0500)]
bnxt_en: Set hwrm_intr_seq_id value to its inverted value.

Set hwrm_intr_seq_id value to its inverted value instead of
HWRM_SEQ_INVALID, when an hwrm completion of type
CMPL_BASE_TYPE_HWRM_DONE is received. This will enable us to use
the complete 16-bit sequence ID space.
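A model of the inverted-ID handshake under assumed names (struct hwrm_model and the three helpers are hypothetical, and the exact completion test in the driver may differ). Because ~x never equals x for a 16-bit x, no ID value has to be reserved as a sentinel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state shared between the sender and the completion handler. */
struct hwrm_model { uint16_t intr_seq_id; };

/* Sender: record the sequence ID it will wait for. */
void hwrm_send(struct hwrm_model *h, uint16_t seq_id)
{
	h->intr_seq_id = seq_id;
}

/* Completion handler: on a matching HWRM_DONE, store the inverted ID. */
void hwrm_done_irq(struct hwrm_model *h, uint16_t seq_id)
{
	if (h->intr_seq_id == seq_id)
		h->intr_seq_id = (uint16_t)~seq_id;
}

/* Waiter: done once the stored value is the inverse of its ID. */
bool hwrm_completed(const struct hwrm_model *h, uint16_t seq_id)
{
	return h->intr_seq_id == (uint16_t)~seq_id;
}
```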

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 20 Dec 2018 08:38:43 +0000 (03:38 -0500)]
bnxt_en: Update firmware interface spec. to 1.10.0.33.

The major changes are in the flow offload firmware APIs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Dec 2018 16:21:47 +0000 (08:21 -0800)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-12-20

Two last patches for this release cycle:

1) Remove an unused variable in xfrm_policy_lookup_bytype().
   From YueHaibing.

2) Fix possible infinite loop in __xfrm6_tunnel_alloc_spi().
   Also from YueHaibing.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'm68k-for-v4.20-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 20 Dec 2018 15:35:16 +0000 (07:35 -0800)]
Merge tag 'm68k-for-v4.20-tag2' of git://git./linux/kernel/git/geert/linux-m68k

Pull m68k fix from Geert Uytterhoeven:
 "Fix memblock-related crashes"

* tag 'm68k-for-v4.20-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Fix memblock-related crashes

Merge tag 'kbuild-fixes-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 20 Dec 2018 15:33:09 +0000 (07:33 -0800)]
Merge tag 'kbuild-fixes-v4.20-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fix from Masahiro Yamada:
 "Fix false positive warning/error about missing library for objtool"

* tag 'kbuild-fixes-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: fix false positive warning/error about missing libelf

Merge tag 'char-misc-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Thu, 20 Dec 2018 15:30:37 +0000 (07:30 -0800)]
Merge tag 'char-misc-4.20-rc8' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are three tiny last-minute driver fixes for 4.20-rc8 that resolve
  some reported issues, and one MAINTAINERS file update.

  All of them are related to the Hyper-V subsystem; it seems people are
  actually testing and using it now, which is nice to see :)

  The fixes are:
   - uio_hv_generic: fix for opening multiple times
   - Remove PCI dependency on hyperv drivers
   - return proper error code for an unopened channel.

  And Sasha has signed up to help out with the hyperv maintainership.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
  x86, hyperv: remove PCI dependency
  MAINTAINERS: Patch monkey for the Hyper-V code
  uio_hv_generic: set callbacks on open

Merge tag 'tty-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Thu, 20 Dec 2018 15:29:11 +0000 (07:29 -0800)]
Merge tag 'tty-4.20-rc8' of git://git./linux/kernel/git/gregkh/tty

Pull tty/serial fix from Greg KH:
 "Here is a single fix, a revert, for the 8250 serial driver to resolve
  a reported problem.

  There were some attempted patches to fix the issue, but people are
  arguing about them, so reverting the patch to go back to the 4.19
  and older behavior is the best thing to do this late in the release
  cycle.

  The revert has been in linux-next with no reported issues"

* tag 'tty-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"

Merge tag 'usb-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Thu, 20 Dec 2018 15:27:39 +0000 (07:27 -0800)]
Merge tag 'usb-4.20-rc8' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes and ids from Greg KH:
 "Here are some late xhci fixes for 4.20-rc8 as well as a few new device
  ids for the option usb-serial driver.

  The xhci fixes resolve several widely reported issues, and all of these
  have been in linux-next for a while with no reported problems"

* tag 'usb-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: xhci: fix 'broken_suspend' placement in struct xchi_hcd
  xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only
  USB: serial: option: add Telit LN940 series
  USB: serial: option: add Fibocom NL668 series
  USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)
  USB: serial: option: add GosunCn ZTE WeLink ME3630
  USB: serial: option: add HP lt4132

Merge tag 'mmc-v4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Thu, 20 Dec 2018 15:25:31 +0000 (07:25 -0800)]
Merge tag 'mmc-v4.20-rc7' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Restore code to allow BKOPS and CACHE ctrl even if no HPI support
   - Reset HPI enabled state during re-init
   - Use a default minimum timeout when enabling CACHE ctrl

  MMC host:
   - omap_hsmmc: Fix DMA API warning
   - sdhci-tegra: Fix dt parsing of SDMMC pads autocal values
   - Correct register accesses when enabling v4 mode"

* tag 'mmc-v4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl
  mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support
  mmc: core: Reset HPI enabled state during re-init and in case of errors
  mmc: omap_hsmmc: fix DMA API warning
  mmc: tegra: Fix for SDMMC pads autocal parsing from dt
  mmc: sdhci: Fix sdhci_do_enable_v4_mode

iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"
Dave Chinner [Thu, 20 Dec 2018 12:23:24 +0000 (23:23 +1100)]
iomap: Revert "fs/iomap.c: get/put the page in iomap_page_create/release()"

This reverts commit 61c6de667263184125d5ca75e894fcad632b0dd3.

The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
since picking up this commit 2 days ago has failed with bad page
state errors such as:

# ./run_check.sh "-m rmapbt=1,reflink=1 -i sparse=1 -b size=1k" "generic/038"
....
SECTION       -- xfs
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 test1 4.20.0-rc6-dgc+
MKFS_OPTIONS  -- -f -m rmapbt=1,reflink=1 -i sparse=1 -b size=1k /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /mnt/scratch

generic/038 454s ...
 run fstests generic/038 at 2018-12-20 18:43:05
 XFS (sdc): Unmounting Filesystem
 XFS (sdc): Mounting V5 Filesystem
 XFS (sdc): Ending clean mount
 BUG: Bad page state in process kswapd0  pfn:3a7fa
 page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
 flags: 0xfffffc0000000()
 raw: 000fffffc0000000 dead000000000100 dead000000000200 ffff88800d9b6360
 raw: 0000000000000001 0000000000000000 00000000ffffffff
 page dumped because: non-NULL mapping
 CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
 Call Trace:
  dump_stack+0x67/0x90
  bad_page.cold.116+0x8a/0xbd
  free_pcppages_bulk+0x4bf/0x6a0
  free_unref_page_list+0x10f/0x1f0
  shrink_page_list+0x49d/0xf50
  shrink_inactive_list+0x19d/0x3b0
  shrink_node_memcg.constprop.77+0x398/0x690
  ? shrink_slab.constprop.81+0x278/0x3f0
  shrink_node+0x7a/0x2f0
  kswapd+0x34b/0x6d0
  ? node_reclaim+0x240/0x240
  kthread+0x11f/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x24/0x30
 Disabling lock debugging due to kernel taint
....

The failures occur on any path that frees pages and empties the
per-cpu page magazines, so it's neither a predictable failure nor an
easy one to debug.

generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failures on other
machines have occurred at random points in fstests runs, but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.

It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFS filesystems and
none of the bad page state failures have been seen.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net/mlx5: Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
Aviv Heller [Tue, 18 Dec 2018 17:03:27 +0000 (19:03 +0200)]
net/mlx5: Fix LAG requirement when CONFIG_MLX5_ESWITCH is off

If CONFIG_MLX5_ESWITCH is not defined, test for SR-IOV being disabled
instead of calling the e-switch LAG prereq routine, since LAG with
SR-IOV is allowed only when switchdev mode is on.

Fixes: eff849b2c669 ("net/mlx5: Allow/disallow LAG according to pre-req only")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5: Fix query_nic_sys_image_guid() error during init
Aviv Heller [Wed, 19 Dec 2018 23:37:17 +0000 (01:37 +0200)]
net/mlx5: Fix query_nic_sys_image_guid() error during init

The vport system image GUID should be queried using the vport NIC API
for Ethernet ports, and the vport HCA API for InfiniBand ports.

Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Support tunnel encap over tagged Ethernet
Eli Britstein [Mon, 3 Dec 2018 15:09:54 +0000 (17:09 +0200)]
net/mlx5e: Support tunnel encap over tagged Ethernet

Generate encap header depending on the routed device to support
native/tagged Ethernet header.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Support VLAN encap ETH header generation
Eli Britstein [Sun, 9 Dec 2018 07:17:18 +0000 (09:17 +0200)]
net/mlx5e: Support VLAN encap ETH header generation

Support generation of native or tagged Ethernet header for encap
header, depending on provided net device.
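A rough userspace sketch of generating a native or 802.1Q-tagged Ethernet header, as an encap-header helper might do. gen_eth_header and its parameters are hypothetical, not the mlx5e functions. A tag inserts the TPID 0x8100 and a 16-bit TCI (3-bit priority, DEI left at 0, 12-bit VLAN ID) between the source MAC and the inner EtherType, which grows the header from 14 to 18 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN	6
#define ETH_P_8021Q	0x8100	/* 802.1Q tag protocol identifier */

/* Hypothetical helper: write the header into buf (>= 18 bytes) and
 * return its length: 14 bytes untagged, 18 bytes with a VLAN tag. */
static size_t gen_eth_header(uint8_t *buf, const uint8_t *dmac,
			     const uint8_t *smac, bool tagged, uint16_t vid,
			     uint8_t prio, uint16_t ethertype)
{
	size_t off = 0;

	memcpy(buf + off, dmac, ETH_ALEN);	/* destination MAC */
	off += ETH_ALEN;
	memcpy(buf + off, smac, ETH_ALEN);	/* source MAC */
	off += ETH_ALEN;

	if (tagged) {
		/* TCI: 3-bit priority, DEI = 0, 12-bit VLAN ID. */
		uint16_t tci = (uint16_t)(((prio & 0x7) << 13) | (vid & 0x0fff));

		buf[off++] = ETH_P_8021Q >> 8;
		buf[off++] = ETH_P_8021Q & 0xff;
		buf[off++] = tci >> 8;
		buf[off++] = tci & 0xff;
	}

	/* Inner EtherType in network byte order (e.g. 0x0800 for IPv4). */
	buf[off++] = ethertype >> 8;
	buf[off++] = ethertype & 0xff;
	return off;
}
```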

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Re-order route and encap header memory allocation
Eli Britstein [Mon, 3 Dec 2018 15:09:54 +0000 (17:09 +0200)]
net/mlx5e: Re-order route and encap header memory allocation

Change the order so that the IPv4/6 route is resolved first, returning
on error. Only after a successful route lookup is the encap header
allocated. No functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Tunnel encap ETH header helper function
Eli Britstein [Sun, 9 Dec 2018 07:17:18 +0000 (09:17 +0200)]
net/mlx5e: Tunnel encap ETH header helper function

In tunnel encap, we prepare the encap header for the IPv4 and IPv6
cases in two separate functions, and their ETH header generation code
is almost duplicated.

Move the ETH header generation code from IPv4/6 functions to a helper
function, with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Fail attempt to offload e-switch TC encap flows with vlan on underlay
Eli Britstein [Wed, 19 Dec 2018 07:29:10 +0000 (09:29 +0200)]
net/mlx5e: Fail attempt to offload e-switch TC encap flows with vlan on underlay

Currently we neither support nor fail attempts to offload encap flows
routed to a VLAN device on the underlay network. We wrongly consider a
VLAN underlay device to be on the same e-switch because the switchdev ID
is retrieved recursively.

Add an explicit check for that and fail such attempts.

Also apply a stricter check that the ingress and underlay devices are
actually on the same e-switch.

Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Fixes: 3e621b19b0bb ('net/mlx5e: Support TC encapsulation offloads with upper devices')
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Tunnel routing output devs helper function
Eli Britstein [Tue, 18 Dec 2018 07:46:00 +0000 (09:46 +0200)]
net/mlx5e: Tunnel routing output devs helper function

For tunnels, we determine the output devices for the IPv4 and IPv6
cases in two separate functions with duplicated code.

Move that code from IPv4/6 functions to a helper function, with no
functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Fail attempt to offload e-switch TC flows with egress upper devices
Eli Britstein [Wed, 19 Dec 2018 07:24:58 +0000 (09:24 +0200)]
net/mlx5e: Fail attempt to offload e-switch TC flows with egress upper devices

We use the switchdev parent HW ID helper to identify whether the mirred
device shares the same ASIC/port with the ingress device. This can give
the wrong answer in the presence of upper devices, such as a VLAN or
bridge set over the HW devices (VF or uplink representors), because the
switchdev ID is retrieved recursively.

To fail offload attempts in such cases, we require the egress device to
have not only the same switchdev ID but also the relevant mlx5 netdev
ops.
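To illustrate why a recursive switchdev ID lookup is not enough on its own, here is a simplified model. The structs, parent_id() and same_hw_devs() are hypothetical stand-ins, not kernel or mlx5 APIs. A VLAN upper device reports its lower device's ID, so only the additional ops comparison rejects it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PARENT_ID_LEN 8	/* hypothetical ID length for this model */

/* Hypothetical stand-ins for struct net_device_ops / struct net_device. */
struct netdev_ops_model { int unused; };

struct netdev_model {
	const struct netdev_ops_model *ops;
	unsigned char parent_id[PARENT_ID_LEN];	/* switchdev parent HW ID */
	struct netdev_model *lower;	/* lower device of an upper (VLAN, bridge) */
};

/* Plays the role of the driver's own representor netdev ops. */
static const struct netdev_ops_model mlx5_rep_ops_model = { 0 };

/* Recursive lookup: an upper device reports its lowest device's ID. */
static const unsigned char *parent_id(const struct netdev_model *dev)
{
	while (dev->lower)
		dev = dev->lower;
	return dev->parent_id;
}

/* Stricter check: same parent ID *and* the egress device is one of ours. */
static bool same_hw_devs(const struct netdev_model *in,
			 const struct netdev_model *out)
{
	if (out->ops != &mlx5_rep_ops_model)
		return false;
	return memcmp(parent_id(in), parent_id(out), PARENT_ID_LEN) == 0;
}
```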

Fixes: 03a9d11e6eeb ('net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads')
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Allow vlans on e-switch uplink reps
Or Gerlitz [Tue, 18 Dec 2018 11:32:46 +0000 (13:32 +0200)]
net/mlx5e: Allow vlans on e-switch uplink reps

There are cases (e.g. tunneling with VLAN on the underlay, and
potentially more) where this makes sense, so allow that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
net/mlx5e: Increase VF representors' SQ size to 128
Gavi Teitz [Wed, 12 Dec 2018 19:23:18 +0000 (21:23 +0200)]
net/mlx5e: Increase VF representors' SQ size to 128

The default size of the VF representors' SQ was too small to handle high
packet rates. Doubling the size from 64 to 128 drastically improves the
packet rate under stress (by about 50%), whereas increasing the size
beyond 128 has not been shown to make any further difference.

The impact of the SQ size was measured with UDP traffic, in the following
topology: TG <-> PF <-> TC forwarding <-> VF representor <-> VF in VM
over a single core processing bi-directional traffic, with the following
results:

                                  SQ size of 64:     SQ size of 128:
Packet rate for 64B UDP packets:    860 [Kpps]         1280 [Kpps]
Packet rate for 114B VxLan
encapsulated UDP packets:           320 [Kpps]          500 [Kpps]

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5: extend PTP gettime function to read system clock
Miroslav Lichvar [Mon, 3 Dec 2018 12:59:42 +0000 (13:59 +0100)]
mlx5: extend PTP gettime function to read system clock

Read the system time right before and immediately after reading the low
register of the internal timer. This adds support for the
PTP_SYS_OFFSET_EXTENDED ioctl.
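A hedged sketch of how a consumer might use the samples. As we understand the UAPI, each PTP_SYS_OFFSET_EXTENDED sample is a triplet: system time before, PHC time, system time after. A common heuristic picks the sample with the shortest system-time window and compares the PHC reading with that window's midpoint. struct ext_sample flattens each timestamp to nanoseconds and, like phc_offset_ns(), is hypothetical. The real ioctl returns struct ptp_clock_time values.

```c
#include <stdint.h>

/* Hypothetical flattened sample: [system, PHC, system] in nanoseconds. */
struct ext_sample {
	int64_t sys_before_ns;
	int64_t phc_ns;
	int64_t sys_after_ns;
};

/* Pick the sample with the narrowest system-time window and return
 * PHC - midpoint(sys_before, sys_after). The window width is returned
 * through *delay_ns as a bound on read uncertainty. Requires n >= 1. */
static int64_t phc_offset_ns(const struct ext_sample *s, unsigned int n,
			     int64_t *delay_ns)
{
	unsigned int best = 0;

	for (unsigned int i = 1; i < n; i++)
		if (s[i].sys_after_ns - s[i].sys_before_ns <
		    s[best].sys_after_ns - s[best].sys_before_ns)
			best = i;

	*delay_ns = s[best].sys_after_ns - s[best].sys_before_ns;
	return s[best].phc_ns - (s[best].sys_before_ns + *delay_ns / 2);
}
```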

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5: update timecounter at least twice per counter overflow
Miroslav Lichvar [Mon, 3 Dec 2018 12:59:41 +0000 (13:59 +0100)]
mlx5: update timecounter at least twice per counter overflow

The timecounter needs to be updated at least once in half of the
cyclecounter interval to prevent timecounter_cyc2time() from interpreting
a new timestamp as an old value and causing a backward jump.

This would be an issue if the timecounter multiplier was so small that
the update interval would not be limited by the 64-bit overflow in
multiplication.

Shorten the calculated interval to make sure the timecounter is updated
in time even when the system clock is slowed down by up to 10%, the
multiplier is increased by up to 10%, and the scheduled overflow check
is late by 15%.
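A sketch of the interval arithmetic under stated assumptions. timecounter_cyc2time() treats a cycle delta larger than mask / 2 as lying in the past, so updates must arrive sooner than that. If the commit's margins are taken as multiplicative (1.1 * 1.1 * 1.15 ~= 1.39), the interval must stay below about (mask / 2) / 1.39 ~= 0.359 * mask, and dividing the mask by 3 is one choice that fits. The helper names, the 2^63 product limit and the divisor of 3 are illustrative assumptions, not necessarily the driver's exact values.

```c
#include <stdint.h>

/* Hypothetical: cycles allowed between timecounter updates. Two limits:
 *  - cycles * mult must stay below 2^63 (no 64-bit overflow);
 *  - stay well inside half the counter range, here mask / 3, leaving
 *    room for the 10% / 10% / 15% margins described above.
 * mult must be nonzero. */
static uint64_t update_interval_cycles(uint64_t mask, uint32_t mult)
{
	uint64_t by_mult = (UINT64_MAX >> 1) / mult;
	uint64_t by_mask = mask / 3;

	return by_mult < by_mask ? by_mult : by_mask;
}

/* Cycles to nanoseconds in the cyclecounter style: (c * mult) >> shift. */
static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```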

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ariel Levkovich <lariel@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
neighbor: Use nda_policy for validating attributes in adds and dump requests
David Ahern [Thu, 20 Dec 2018 04:02:36 +0000 (20:02 -0800)]
neighbor: Use nda_policy for validating attributes in adds and dump requests

Add NDA_PROTOCOL to nda_policy and use the policy for attribute parsing and
validation for adding neighbors and in dump requests. Remove the now duplicate
checks on nla_len.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'hns3-next'
David S. Miller [Thu, 20 Dec 2018 07:47:59 +0000 (23:47 -0800)]
Merge branch 'hns3-next'

Peng Li says:

====================
net: hns3: code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: remove redundant variable initialization
Peng Li [Thu, 20 Dec 2018 03:52:06 +0000 (11:52 +0800)]
net: hns3: remove redundant variable initialization

This patch removes the redundant variable initialization, as the
driver sets hdev via devm_kzalloc() shortly afterwards.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: fix the descriptor index when get rss type
Peng Li [Thu, 20 Dec 2018 03:52:05 +0000 (11:52 +0800)]
net: hns3: fix the descriptor index when get rss type

The driver gets the RSS information from the last descriptor of the
packet. When the driver handles the RSS type, ring->next_to_clean
already points to the first descriptor of the next packet.

This patch fixes the descriptor index by using "ring->next_to_clean - 1".
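A small sketch of the index computation, assuming a ring of desc_num descriptors. last_desc_index() is hypothetical. The commit itself only states "ring->next_to_clean - 1"; the wrap-around branch for next_to_clean == 0 reflects our assumption about how any ring index must be handled, not a quote of the fix.

```c
#include <stdint.h>

/* Hypothetical: index of the last descriptor of the packet just cleaned.
 * next_to_clean already points at the next packet's first descriptor,
 * so step back one slot and wrap to the ring's end at index 0.
 * desc_num must be nonzero. */
static uint32_t last_desc_index(uint32_t next_to_clean, uint32_t desc_num)
{
	return next_to_clean ? next_to_clean - 1 : desc_num - 1;
}
```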

Fixes: 232fc64b6e62 ("net: hns3: Add HW RSS hash information to RX skb")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: don't restore rules when flow director is disabled
Jian Shen [Thu, 20 Dec 2018 03:52:04 +0000 (11:52 +0800)]
net: hns3: don't restore rules when flow director is disabled

When the user disables the flow director, all the rules are disabled.
But when a reset happens, the driver restores all the rules again, which
is not reasonable. This patch fixes it by adding a flow director status
check before restoring the rules.

Fixes: 6871af29b3ab ("net: hns3: Add reset handle for flow director")
Fixes: c17852a8932f ("net: hns3: Add support for enable/disable flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: fix vf id check issue when add flow director rule
Jian Shen [Thu, 20 Dec 2018 03:52:03 +0000 (11:52 +0800)]
net: hns3: fix vf id check issue when add flow director rule

When adding a flow director rule for a VF, the VF ID is used as an
array subscript before it is validated, which may cause an out-of-bounds
access.
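The fix amounts to validating before indexing. A minimal userspace sketch of that ordering, where NUM_VFS_MODEL, vf_info and fd_rule_vport() are hypothetical and not hns3 names:

```c
#include <errno.h>
#include <stdint.h>

#define NUM_VFS_MODEL 8	/* hypothetical size of the per-VF array */

struct vf_info_model {
	uint16_t vport;
};

static struct vf_info_model vf_info[NUM_VFS_MODEL];

/* Validate the user-supplied VF ID before using it as an array index. */
static int fd_rule_vport(unsigned int vf, uint16_t *vport)
{
	if (vf >= NUM_VFS_MODEL)
		return -EINVAL;		/* reject out-of-range IDs first */
	*vport = vf_info[vf].vport;	/* indexing is safe only after the check */
	return 0;
}
```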

Fixes: dd74f815dd41 ("net: hns3: Add support for rule add/delete for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: reset tqp while doing DOWN operation
Huazhong Tan [Thu, 20 Dec 2018 03:52:02 +0000 (11:52 +0800)]
net: hns3: reset tqp while doing DOWN operation

While doing the DOWN operation, the driver reclaims the memory that has
already been used for TX. If the hardware is still processing this
memory, this causes an RCB error in the hardware. According to the
hardware's description, the driver should reset the TQP before
reclaiming the memory during DOWN.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: add max vector number check for pf
Jian Shen [Thu, 20 Dec 2018 03:52:01 +0000 (11:52 +0800)]
net: hns3: add max vector number check for pf

Each PF supports at most 64 vectors and 128 TQPs. In 2P/4P core
scenarios, there may be more than 64 CPUs online, so the result of
min_t(u16, num_online_cpus(), tqp_num) may exceed 64. This patch adds a
check on the vector number.
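Sketched in userspace with the limit from the text (64 vectors per PF). pf_vector_num() and min_u() are hypothetical helpers standing in for the driver's min_t() usage:

```c
#define PF_MAX_VECTORS 64	/* per-PF vector limit stated in the commit */

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* One vector per online CPU, no more than the TQP count, and never more
 * than the PF's vector limit. */
static unsigned int pf_vector_num(unsigned int online_cpus, unsigned int tqp_num)
{
	return min_u(min_u(online_cpus, tqp_num), PF_MAX_VECTORS);
}
```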

Fixes: dd38c72604dc ("net: hns3: fix for coalesce configuration lost during reset")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: fix a bug caused by udelay
Peng Li [Thu, 20 Dec 2018 03:52:00 +0000 (11:52 +0800)]
net: hns3: fix a bug caused by udelay

udelay() busy-waits and keeps the processor occupied. If there is only
one CPU in the system, the VF driver may fail to initialize when the PF
and VF drivers are loaded with insmod on the same system. This patch uses
msleep() to release the CPU while the VF waits for a message from the PF.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: change default tc state to close
Jian Shen [Thu, 20 Dec 2018 03:51:59 +0000 (11:51 +0800)]
net: hns3: change default tc state to close

In the original code, the default TC value is set to the max TC. It's
more reasonable to disable TC by default by changing the default TC value
to 1. Users can enable it with the lldp tool when they want to use TC.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: refine the handle for hns3_nic_net_open/stop()
Jian Shen [Thu, 20 Dec 2018 03:51:58 +0000 (11:51 +0800)]
net: hns3: refine the handle for hns3_nic_net_open/stop()

When triggering NIC down, there is a time window between bringing down
the protocol stack and stopping the work task. If the network comes up
during this window, the protocol stack may be brought up again.

This patch fixes it by stopping the work task at the beginning of
hns3_nic_net_stop(). For symmetry, the work task is started at the
end of hns3_nic_net_open().

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Thu, 20 Dec 2018 07:34:33 +0000 (23:34 -0800)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Off by one in netlink parsing of mac802154_hwsim, from Alexander
    Aring.

 2) nf_tables RCU usage fix from Taehee Yoo.

 3) Flow dissector needs nhoff and thoff clamping, from Stanislav
    Fomichev.

 4) Missing sin6_flowinfo initialization in SCTP, from Xin Long.

 5) Spectrev1 in ipmr and ip6mr, from Gustavo A. R. Silva.

 6) Fix r8169 crash when DEBUG_SHIRQ is enabled, from Heiner Kallweit.

 7) Fix SKB leak in rtlwifi, from Larry Finger.

 8) Fix state pruning in bpf verifier, from Jakub Kicinski.

 9) Don't handle completely duplicate fragments as overlapping, from
    Michal Kubecek.

10) Fix memory corruption with macb and 64-bit DMA, from Anssi Hannula.

11) Fix TCP fallback socket release in smc, from Myungho Jung.

12) gro_cells_destroy needs to napi_disable, from Lorenzo Bianconi.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (130 commits)
  rds: Fix warning.
  neighbor: NTF_PROXY is a valid ndm_flag for a dump request
  net: mvpp2: fix the phylink mode validation
  net/sched: cls_flower: Remove old entries from rhashtable
  net/tls: allocate tls context using GFP_ATOMIC
  iptunnel: make TUNNEL_FLAGS available in uapi
  gro_cell: add napi_disable in gro_cells_destroy
  lan743x: Remove MAC Reset from initialization
  net/mlx5e: Remove the false indication of software timestamping support
  net/mlx5: Typo fix in del_sw_hw_rule
  net/mlx5e: RX, Fix wrong early return in receive queue poll
  ipv6: explicitly initialize udp6_addr in udp_sock_create6()
  bnxt_en: Fix ethtool self-test loopback.
  net/rds: remove user triggered WARN_ON in rds_sendmsg
  net/rds: fix warn in rds_message_alloc_sgs
  ath10k: skip sending quiet mode cmd for WCN3990
  mac80211: free skb fraglist before freeing the skb
  nl80211: fix memory leak if validate_pae_over_nl80211() fails
  net/smc: fix TCP fallback socket release
  vxge: ensure data0 is initialized in when fetching firmware version information
  ...

rds: Fix warning.
David S. Miller [Thu, 20 Dec 2018 04:53:18 +0000 (20:53 -0800)]
rds: Fix warning.

>> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer

Fixes: ea010070d0a7 ("net/rds: fix warn in rds_message_alloc_sgs")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Thu, 20 Dec 2018 02:40:48 +0000 (18:40 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
 "A last-minute fix for a test build"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio: fix test build after uio.h change

Merge tag 'nfs-for-4.20-6' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Thu, 20 Dec 2018 02:38:54 +0000 (18:38 -0800)]
Merge tag 'nfs-for-4.20-6' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Fix TCP socket disconnection races by ensuring we always call
   xprt_disconnect_done() after releasing the socket.

 - Fix a race when clearing both XPRT_CONNECTING and XPRT_LOCKED

 - Remove xprt_connect_status() so it does not mask errors that should
   be handled by call_connect_status()

* tag 'nfs-for-4.20-6' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Remove xprt_connect_status()
  SUNRPC: Fix a race with XPRT_CONNECTING
  SUNRPC: Fix disconnection races

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Thu, 20 Dec 2018 02:27:58 +0000 (18:27 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 -  One nasty use-after-free bugfix, from this merge window however

 -  A less nasty use-after-free that can only zero some words at the
    beginning of the page, and hence is not really exploitable

 -  A NULL pointer dereference

 -  A dummy implementation of an AMD chicken bit MSR that Windows uses
    for some unknown reason

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs
  KVM: X86: Fix NULL deref in vcpu_scan_ioapic
  KVM: Fix UAF in nested posted interrupt processing
  KVM: fix unregistering coalesced mmio zone from wrong bus

Merge tag 'dma-mapping-4.20-4' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Thu, 20 Dec 2018 02:16:17 +0000 (18:16 -0800)]
Merge tag 'dma-mapping-4.20-4' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Fix a regression in dma-direct that didn't take account the magic AMD
  memory encryption mask in the DMA address"

* tag 'dma-mapping-4.20-4' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: do not include SME mask in the DMA supported check

neighbor: NTF_PROXY is a valid ndm_flag for a dump request
David Ahern [Thu, 20 Dec 2018 00:54:38 +0000 (16:54 -0800)]
neighbor: NTF_PROXY is a valid ndm_flag for a dump request

When dumping proxy entries, the dump request has NTF_PROXY set in
ndm_flags. Strict mode checking needs to be updated to allow this
flag.
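A sketch of what such a strict check can look like. check_dump_ndm_flags() is hypothetical, and NTF_PROXY is redefined with its <linux/neighbour.h> value (0x08) so the example builds without kernel headers:

```c
#include <errno.h>
#include <stdint.h>

#define NTF_PROXY 0x08	/* same value as in <linux/neighbour.h> */

/* Strict dump-request check: reject any ndm_flags bit other than
 * NTF_PROXY, so a request to dump proxy entries is accepted. */
static int check_dump_ndm_flags(uint8_t ndm_flags)
{
	if (ndm_flags & ~NTF_PROXY)
		return -EINVAL;
	return 0;
}
```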

Fixes: 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict data checking")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
neighbor: Initialize protocol when new pneigh_entry are created
David Ahern [Wed, 19 Dec 2018 23:53:22 +0000 (15:53 -0800)]
neighbor: Initialize protocol when new pneigh_entry are created

pneigh_lookup uses kmalloc rather than kzalloc when allocating new
entries, so the newly added protocol field needs to be initialized.
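The same pitfall in userspace terms: malloc(), like kmalloc(), returns uninitialized memory, whereas calloc(), like kzalloc(), zeroes it. struct pneigh_model and pneigh_new() are hypothetical, cut-down stand-ins for struct pneigh_entry and pneigh_lookup():

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for struct pneigh_entry. */
struct pneigh_model {
	uint32_t key;
	uint8_t protocol;	/* the newly added field */
};

/* malloc() leaves the memory uninitialized, so every field, including
 * the new 'protocol', must be set explicitly. */
static struct pneigh_model *pneigh_new(uint32_t key)
{
	struct pneigh_model *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->key = key;
	n->protocol = 0;	/* without this, protocol holds garbage */
	return n;
}
```

Allocating with calloc() would be the userspace equivalent of switching to kzalloc(); the commit title suggests the field is initialized explicitly instead.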

Fixes: df9b0e30d44c ("neighbor: Add protocol attribute")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: net: refactor reuseport_addr_any test
Peter Oskolkov [Wed, 19 Dec 2018 18:20:09 +0000 (10:20 -0800)]
selftests: net: refactor reuseport_addr_any test

This patch refactors the reuseport_addr_any selftest a bit:
- makes it more modular (eliminates several copy/pasted blocks);
- skips DCCP tests if DCCP is not supported

V2: added "Signed-off-by" tag.

Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: mv88e6xxx: Add missing watchdog ops for 6320 family
Andrew Lunn [Wed, 19 Dec 2018 17:28:54 +0000 (18:28 +0100)]
net: dsa: mv88e6xxx: Add missing watchdog ops for 6320 family

The 6320 family of switches uses the same watchdog registers as the
6390.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvpp2: fix the phylink mode validation
Antoine Tenart [Wed, 19 Dec 2018 17:00:12 +0000 (18:00 +0100)]
net: mvpp2: fix the phylink mode validation

mvpp2_phylink_validate() sets all modes that are supported by a
given PPv2 port. A mistake caused the 10000baseT_Full mode to be
advertised in some cases when a port wasn't configured to run at
10G. This patch fixes this.

Fixes: d97c9f4ab000 ("net: mvpp2: 1000baseX support")
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched: cls_flower: Remove old entries from rhashtable
Roi Dayan [Wed, 19 Dec 2018 16:07:56 +0000 (18:07 +0200)]
net/sched: cls_flower: Remove old entries from rhashtable

When replacing a rule, we add the new rule to the rhashtable
but only remove the old one if it is not in skip_sw.
This commit fixes this and removes the old rule regardless.

Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/tls: allocate tls context using GFP_ATOMIC
Ganesh Goudar [Wed, 19 Dec 2018 11:48:22 +0000 (17:18 +0530)]
net/tls: allocate tls context using GFP_ATOMIC

create_ctx can be called from atomic context, hence use
GFP_ATOMIC instead of GFP_KERNEL.

[  395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421
[  395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl
[  395.996564] 2 locks held by openssl/16254:
[  396.010492]  #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0
[  396.029838]  #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280
[  396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G           O      4.20.0-rc6+ #25
[  396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017
[  396.083537] Call Trace:
[  396.096265]  dump_stack+0x5e/0x8b
[  396.109876]  ___might_sleep+0x216/0x250
[  396.123940]  kmem_cache_alloc_trace+0x1b0/0x240
[  396.138800]  create_ctx+0x1f/0x60
[  396.152504]  tls_init+0xbd/0x280
[  396.166135]  tcp_set_ulp+0x191/0x2d0
[  396.180035]  ? tcp_set_ulp+0x2c/0x2d0
[  396.193960]  do_tcp_setsockopt.isra.44+0x148/0x9a0
[  396.209013]  __sys_setsockopt+0x7c/0xe0
[  396.223054]  __x64_sys_setsockopt+0x20/0x30
[  396.237378]  do_syscall_64+0x4a/0x180
[  396.251200]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context")
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Dec 2018 00:24:59 +0000 (16:24 -0800)]
Merge branch 'mt2712'

Biao Huang says:

====================
add ethernet binding and modify ethernet driver for mt2712

changes in v3:
resend this series, rebased on the latest net-next tree.

changes in v2, addressing comments from Sean:
1. fix typo.
2. use capital letters for RMII/MII/RGMII in driver and bindings.

v1:
This new series is the result of the discussion in:
http://lkml.org/lkml/2018/12/13/1007
http://lkml.org/lkml/2018/12/14/53

1. move the ethernet binding file into this series.
2. remove the fine tune property from the device tree.
3. remove the fine tune flow from the ethernet driver.
4. set the RGMII timing according to the value in the device tree,
regardless of whether the PHY inserts an internal delay.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Biao Huang [Wed, 19 Dec 2018 07:22:41 +0000 (15:22 +0800)]
net-next: stmmac: dwmac-mediatek: remove fine-tune property

1. Remove the fine-tune property and related settings to simplify
the timing adjustment flow.
2. Set the timing value according to the value from the device tree,
regardless of whether the PHY inserts an internal delay.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Biao Huang [Wed, 19 Dec 2018 07:22:40 +0000 (15:22 +0800)]
net-next: dt-binding: dwmac-mediatek: remove fine-tune property

Remove the fine-tune property from the device tree and update
the corresponding description in the dt-binding.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
wenxu [Wed, 19 Dec 2018 06:11:15 +0000 (14:11 +0800)]
iptunnel: make TUNNEL_FLAGS available in uapi

ip l add dev tun type gretap external
ip r a 10.0.0.1 encap ip dst 192.168.152.171 id 1000 dev gretap

Take the gretap key as an example: when the command sets the id but
does not set the TUNNEL_KEY flag, there is no key field in the sent
packet.

In the lwtunnel case, some TUNNEL_FLAGS should be settable from
userspace.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Wed, 19 Dec 2018 22:23:00 +0000 (23:23 +0100)]
gro_cell: add napi_disable in gro_cells_destroy

Add a napi_disable call to gro_cells_destroy: since commit
c42858eaf492 ("gro_cells: remove spinlock protecting receive
queues"), gro_cell_poll and gro_cells_destroy can run concurrently on
the napi_skbs list, producing a kernel Oops if the tunnel interface is
removed while gro_cell_poll is running. The following Oops was
triggered by removing a vxlan device while the interface was receiving
traffic:

[ 5628.948853] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 5628.949981] PGD 0 P4D 0
[ 5628.950308] Oops: 0002 [#1] SMP PTI
[ 5628.950748] CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.20.0-rc6+ #41
[ 5628.952940] RIP: 0010:gro_cell_poll+0x49/0x80
[ 5628.955615] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202
[ 5628.956250] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000
[ 5628.957102] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150
[ 5628.957940] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
[ 5628.958803] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040
[ 5628.959661] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040
[ 5628.960682] FS:  0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
[ 5628.961616] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5628.962359] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0
[ 5628.963188] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5628.964034] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5628.964871] Call Trace:
[ 5628.965179]  net_rx_action+0xf0/0x380
[ 5628.965637]  __do_softirq+0xc7/0x431
[ 5628.966510]  run_ksoftirqd+0x24/0x30
[ 5628.966957]  smpboot_thread_fn+0xc5/0x160
[ 5628.967436]  kthread+0x113/0x130
[ 5628.968283]  ret_from_fork+0x3a/0x50
[ 5628.968721] Modules linked in:
[ 5628.969099] CR2: 0000000000000008
[ 5628.969510] ---[ end trace 9d9dedc7181661fe ]---
[ 5628.970073] RIP: 0010:gro_cell_poll+0x49/0x80
[ 5628.972965] RSP: 0018:ffffc9000004fdd8 EFLAGS: 00010202
[ 5628.973611] RAX: 0000000000000000 RBX: ffffe8ffffc08150 RCX: 0000000000000000
[ 5628.974504] RDX: 0000000000000000 RSI: ffff88802356bf00 RDI: ffffe8ffffc08150
[ 5628.975462] RBP: 0000000000000026 R08: 0000000000000000 R09: 0000000000000000
[ 5628.976413] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000040
[ 5628.977375] R13: ffffe8ffffc08100 R14: 0000000000000000 R15: 0000000000000040
[ 5628.978296] FS:  0000000000000000(0000) GS:ffff88803ea00000(0000) knlGS:0000000000000000
[ 5628.979327] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5628.980044] CR2: 0000000000000008 CR3: 000000000221c000 CR4: 00000000000006b0
[ 5628.980929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5628.981736] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5628.982409] Kernel panic - not syncing: Fatal exception in interrupt
[ 5628.983307] Kernel Offset: disabled

Fixes: c42858eaf492 ("gro_cells: remove spinlock protecting receive queues")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Whitehead [Wed, 19 Dec 2018 21:55:15 +0000 (16:55 -0500)]
lan743x: Remove MAC Reset from initialization

The MAC reset was found to erase important EEPROM settings.
It is also unnecessary, since a chip-wide reset is done earlier
during initialization, and that reset preserves the EEPROM settings.

Therefore, this patch removes the unnecessary MAC-specific reset.

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin [Wed, 19 Dec 2018 23:21:51 +0000 (18:21 -0500)]
virtio: fix test build after uio.h change

Fixes: d38499530e5 ("fs: decouple READ and WRITE from the block layer ops")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
5 years agoMerge tag 'mlx5-fixes-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git...
David S. Miller [Wed, 19 Dec 2018 21:44:12 +0000 (13:44 -0800)]
Merge tag 'mlx5-fixes-2018-12-19' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2018-12-19

Some fixes for the mlx5 driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2018 21:37:34 +0000 (13:37 -0800)]
Merge branch 'neigh-get-support'

Roopa Prabhu says:

====================
neigh get support

This series adds support for neigh get, similar
to route get and the recently added fdb get.

v2: fix key len check and some other fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 19 Dec 2018 20:51:39 +0000 (12:51 -0800)]
selftests: rtnetlink.sh: add testcase for neigh get

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 19 Dec 2018 20:51:38 +0000 (12:51 -0800)]
neighbour: register rtnl doit handler

This patch registers a neigh doit handler. The doit handler
returns a neigh entry given a dst and a dev. This is similar
to the route and fdb doit (get) handlers. It also moves the nda_policy
declaration from rtnetlink.c to neighbour.c.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alaa Hleihel [Sun, 25 Nov 2018 09:46:09 +0000 (11:46 +0200)]
net/mlx5e: Remove the false indication of software timestamping support

The mlx5 driver falsely advertises support for software timestamping.
Fix it by removing the false indication.

Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Yuval Avnery [Thu, 13 Dec 2018 00:26:46 +0000 (02:26 +0200)]
net/mlx5: Typo fix in del_sw_hw_rule

An expression terminated with "," instead of ";" resulted in
set_fte getting a bad value for the modify_enable_mask field.
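As a generic illustration of this class of bug (not the mlx5 code itself, which is not reproduced here), the sketch below shows how a stray "," where ";" was intended fuses the next statement into one comma expression. The function names are hypothetical. Under an unbraced if, the fused statement silently becomes conditional; and because assignment binds tighter than the comma operator, "x = 1, 2" stores 1.

```c
#include <stdbool.h>

/* Hypothetical example: two separate statements, only the first guarded. */
int intended(bool cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;			/* always executed */
	return a * 10 + b;
}

/* Same code with ',' typed instead of ';' after "a = 1": both
 * assignments form a single comma expression that is the body of the
 * if, so "b = 2" now runs only when cond is true. */
int with_comma_typo(bool cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1,
	b = 2;			/* misleadingly indented: part of the if body */
	return a * 10 + b;
}

/* Assignment binds tighter than the comma operator: this parses as
 * (x = 1), 2; so x becomes 1 and the 2 is evaluated and discarded. */
int comma_precedence(void)
{
	int x;

	x = 1, 2;
	return x;
}
```

With cond false, intended() returns 2 while with_comma_typo() returns 0; both return 12 when cond is true. Compilers typically flag the discarded operand in comma_precedence() (for example GCC's -Wunused-value), which is one way such typos get caught.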

Fixes: bd5251dbf156 ("net/mlx5_core: Introduce flow steering destination of type counter")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>