linux-2.6-block.git
5 years agoqed: fix spelling mistake "inculde" -> "include"
Colin Ian King [Tue, 28 May 2019 06:52:17 +0000 (07:52 +0100)]
qed: fix spelling mistake "inculde" -> "include"

There is a spelling mistake in a DP_INFO message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michal KalderonĀ <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'
David S. Miller [Wed, 29 May 2019 07:01:30 +0000 (00:01 -0700)]
Merge branch 'qed-Fix-inifinite-spinning-of-PTP-poll-thread'

Sudarsana Reddy Kalluru says:

====================
qed*: Fix inifinite spinning of PTP poll thread.

The patch series addresses an error scenario in the PTP Tx implementation.

Please consider applying it to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqede: Handle infinite driver spinning for Tx timestamp.
Sudarsana Reddy Kalluru [Tue, 28 May 2019 03:21:33 +0000 (20:21 -0700)]
qede: Handle infinite driver spinning for Tx timestamp.

In PTP Tx implementation, driver kept scheduling a poll thread until the
timestamp is available. In the error scenarios (e.g. app requesting the
timestamp for non-ptp packet), this thread kept waiting for the timestamp
forever.  This patch add changes to report such scenario as an error and
terminate the thread. Added a timeout of 2 seconds i.e., max time to wait
for Tx timestamp. Added a stat value ptp_skip_txts for reporting the number
of packets for which Tx timestamping is skipped.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Reduce the severity of ptp debug message.
Sudarsana Reddy Kalluru [Tue, 28 May 2019 03:21:32 +0000 (20:21 -0700)]
qed: Reduce the severity of ptp debug message.

PTP Tx implementation continuously polls for the availability of timestamp.
Reducing the severity of a debug message in this path to avoid filling up
the syslog buffer with this message, especially in the error scenarios.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomacvlan: Replace strncpy() by strscpy()
Gustavo A. R. Silva [Mon, 27 May 2019 18:38:55 +0000 (13:38 -0500)]
macvlan: Replace strncpy() by strscpy()

The strncpy() function is being deprecated. Replace it by the safer
strscpy() and fix the following Coverity warning:

"Calling strncpy with a maximum size argument of 16 bytes on destination
array ifrr.ifr_ifrn.ifrn_name of size 16 bytes might leave the destination
string unterminated."

Notice that, unlike strncpy(), strscpy() always null-terminates the
destination string.

Addresses-Coverity-ID: 1445537 ("Buffer not null terminated")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Wed, 29 May 2019 06:24:44 +0000 (23:24 -0700)]
Merge branch '1GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2019-05-28

This series contains updates to e1000e, igb and igc.

Feng adds additional information on a warning message when a read of a
hardware register fails.

Gustavo A. R. Silva fixes up two "fall through" code comments so that
the checkers can actually determine that we did comment that the case
statement is falling through to the next case.

Sasha does some cleanup on the igc driver by removing duplicate
white space and removed a unneeded workaround for igc.  Adds support for
flow control to the igc driver.

Konstantin Khlebnikov reverts a previous fix which was causing a false
positive for a hardware hang.  Provides a fix so that when link is lost
the packets in the transmit queue are flushed and wakes the transmit
queue when the NIC is ready to send packets.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-API-and-initial-implementation-for-nexthop-objects'
David S. Miller [Wed, 29 May 2019 04:37:30 +0000 (21:37 -0700)]
Merge branch 'net-API-and-initial-implementation-for-nexthop-objects'

David Ahern says:

====================
net: API and initial implementation for nexthop objects

This set contains the API and initial implementation for nexthops as
standalone objects.

Patch 1 contains the UAPI and updates to selinux struct.

Patch 2 contains the barebones code for nexthop commands, rbtree
maintenance and notifications.

Patch 3 then adds support for IPv4 gateways along with handling of
netdev events.

Patch 4 adds support for IPv6 gateways.

Patch 5 has the implementation of the encap attributes.

Patch 6 adds support for nexthop groups.

At the end of this set, nexthop objects can be created and deleted and
userspace can monitor nexthop events, but ipv4 and ipv6 routes can not
use them yet. Once the nexthop struct is defined, follow on sets add it
to fib{6}_info and handle it within the respective code before routes
can be inserted using them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for nexthop groups
David Ahern [Fri, 24 May 2019 21:43:08 +0000 (14:43 -0700)]
nexthop: Add support for nexthop groups

Allow the creation of nexthop groups which reference other nexthop
objects to create multipath routes:

                      +--------------+
   +------------+   +--------------+ |
   | nh  nh_grp --->| nh_grp_entry |-+
   +------------+   +---------|----+
     ^                |       |    +------------+
     +----------------+       +--->| nh, weight |
        nh_parent                  +------------+

A group entry points to a nexthop with a weight for that hop within the
group. The nexthop has a list_head, grp_list, for tracking which groups
it is a member of and the group entry has a reference back to the parent.
The grp_list is used when a nexthop is deleted - to efficiently remove
it from groups using it.

If a nexthop group spec is given, no other attributes can be set. Each
nexthop id in a group spec must already exist.

Similar to single nexthops, the specification of a nexthop group can be
updated so that data is managed with rcu locking.

Add path selection function to account for multiple paths and add
ipv{4,6}_good_nh helpers to know that if a neighbor entry exists it is
in a good state.

Update NETDEV event handling to rebalance multipath nexthop groups if
a nexthop is deleted due to a link event (down or unregister).

When a nexthop is removed any groups using it are updated. Groups using a
nexthop a tracked via a grp_list.

Nexthop dumps can be limited to groups only by adding NHA_GROUPS to the
request.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for lwt encaps
David Ahern [Fri, 24 May 2019 21:43:07 +0000 (14:43 -0700)]
nexthop: Add support for lwt encaps

Add support for NHA_ENCAP and NHA_ENCAP_TYPE. Leverages the existing code
for lwtunnel within fib_nh_common, so the only change needed is handling
the attributes in the nexthop code.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for IPv6 gateways
David Ahern [Fri, 24 May 2019 21:43:06 +0000 (14:43 -0700)]
nexthop: Add support for IPv6 gateways

Handle IPv6 gateway in a nexthop spec. If nh_family is set to AF_INET6,
NHA_GATEWAY is expected to be an IPv6 address. Add ipv6 option to gw in
nh_config to hold the address, add fib6_nh to nh_info to leverage the
ipv6 initialization and cleanup code. Update nh_fill_node to dump the v6
address.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonexthop: Add support for IPv4 nexthops
David Ahern [Fri, 24 May 2019 21:43:05 +0000 (14:43 -0700)]
nexthop: Add support for IPv4 nexthops

Add support for IPv4 nexthops. If nh_family is set to AF_INET, then
NHA_GATEWAY is expected to be an IPv4 address.

Register for netdev events to be notified of admin up/down changes as
well as deletes. A hash table is used to track nexthop per devices to
quickly convert device events to the affected nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: Initial nexthop code
David Ahern [Fri, 24 May 2019 21:43:04 +0000 (14:43 -0700)]
net: Initial nexthop code

Barebones start point for nexthops. Implementation for RTM commands,
notifications, management of rbtree for holding nexthops by id, and
kernel side data structures for nexthops and nexthop config.

Nexthops are maintained in an rbtree sorted by id. Similar to routes,
nexthops are configured per namespace using netns_nexthop struct added
to struct net.

Nexthop notifications are sent when a nexthop is added or deleted,
but NOT if the delete is due to a device event or network namespace
teardown (which also involves device events). Applications are
expected to use the device down event to flush nexthops and any
routes used by the nexthops.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: nexthop uapi
David Ahern [Fri, 24 May 2019 21:43:03 +0000 (14:43 -0700)]
net: nexthop uapi

New UAPI for nexthops as standalone objects:
- defines netlink ancillary header, struct nhmsg
- RTM commands for nexthop objects, RTM_*NEXTHOP,
- RTNLGRP for nexthop notifications, RTNLGRP_NEXTHOP,
- Attributes for creating nexthops, NHA_*
- Attribute for route specs to specify a nexthop by id, RTA_NH_ID.

The nexthop attributes and semantics follow the route and RTA ones for
device, gateway and lwt encap. Unique to nexthop objects are a blackhole
and a group which contains references to other nexthop objects. With the
exception of blackhole and group, nexthop objects MUST contain a device.
Gateway and encap are optional. Nexthop groups can only reference other
pre-existing nexthops by id. If the NHA_ID attribute is present that id
is used for the nexthop. If not specified, one is auto assigned.

Dump requests can include attributes:
- NHA_GROUPS to return only nexthop groups,
- NHA_MASTER to limit dumps to nexthops with devices enslaved to the
  given master (e.g., VRF)
- NHA_OIF to limit dumps to nexthops using given device

nlmsg_route_perms in selinux code is updated for the new RTM comands.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'hns3-next'
David S. Miller [Wed, 29 May 2019 00:39:01 +0000 (17:39 -0700)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patch-set includes code optimizations and bugfixes for the HNS3
ethernet controller driver.

[patch 1/12] fixes a compile warning reported by kbuild test robot.

[patch 2/12] fixes HNS3_RXD_GRO_SIZE_M macro definition error.

[patch 3/12] adds a debugfs command to dump firmware information.

[patch 4/12 - 10/12] adds some code optimizaions and cleanups for
reset and driver unloading.

[patch 11/12 - 12/12] adds two bugfixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector
Huazhong Tan [Tue, 28 May 2019 09:03:02 +0000 (17:03 +0800)]
net: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector

When hclge_bind_ring_with_vector() fails,
hclge_map_unmap_ring_to_vf_vector() returns the error
directly, so nobody will free the memory allocated by
hclge_get_ring_chain_from_mbx().

So hclge_free_vector_ring_chain() should be called no matter
hclge_bind_ring_with_vector() fails or not.

Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: adjust hns3_uninit_phy()'s location in the hns3_client_uninit()
Huazhong Tan [Tue, 28 May 2019 09:03:01 +0000 (17:03 +0800)]
net: hns3: adjust hns3_uninit_phy()'s location in the hns3_client_uninit()

hns3_uninit_phy() should be called before checking
HNS3_NIC_STATE_INITED flags, otherwise when this checking fails,
there is nobody to call hns3_uninit_phy().

Fixes: c8a8045b2d0a ("net: hns3: Fix NULL deref when unloading driver")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: stop schedule reset service while unloading driver
Huazhong Tan [Tue, 28 May 2019 09:03:00 +0000 (17:03 +0800)]
net: hns3: stop schedule reset service while unloading driver

When unloading driver, the reset task should not be scheduled
anymore. If disable IRQ before cancel ongoing reset task,
the IRQ may be re-enabled by the reset task.

This patch uses HCLGE_STATE_REMOVING/HCLGEVF_STATE_REMOVING
flag to indicate that the driver is unloading, and we should
stop new coming reset service to be scheduled, otherwise,
reset service will access some resource which has been freed
by unloading.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add handshake with hardware while doing reset
Huazhong Tan [Tue, 28 May 2019 09:02:59 +0000 (17:02 +0800)]
net: hns3: add handshake with hardware while doing reset

When reset happens, the hardware reset should begin after the
driver has finished its preparatory work, otherwise it may cause
some hardware error.

Before Hardware's reset, it will wait for the driver to write
bit HCLGE_NIC_CMQ_ENABLE of register HCLGE_NIC_CSQ_DEPTH_REG
to 1, while the driver finishes its preparatory work will do that.
BTW, since some cases this register will be cleared, so it needs
some sync time before driver's writing.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify hclgevf_init_client_instance()
Huazhong Tan [Tue, 28 May 2019 09:02:58 +0000 (17:02 +0800)]
net: hns3: modify hclgevf_init_client_instance()

hclgevf_init_client_instance() is a little bloated and there is
some duplicated code. This patch adds some cleanup for it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: modify hclge_init_client_instance()
Huazhong Tan [Tue, 28 May 2019 09:02:57 +0000 (17:02 +0800)]
net: hns3: modify hclge_init_client_instance()

hclge_init_client_instance() is a little bloated and there is
some duplicated code. This patch adds some cleanup for it.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use HCLGEVF_STATE_NIC_REGISTERED to indicate VF NIC client has registered
Huazhong Tan [Tue, 28 May 2019 09:02:56 +0000 (17:02 +0800)]
net: hns3: use HCLGEVF_STATE_NIC_REGISTERED to indicate VF NIC client has registered

When VF NIC client's init_instance() succeeds, it means this client
has been registered successfully, so we use HCLGEVF_STATE_NIC_REGISTERED
to indicate that. And before calling VF NIC client's uninit_instance(),
we clear this state.

So any operation of VF NIC client from HCLGEVF is not allowed if this
state is not set.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use HCLGE_STATE_ROCE_REGISTERED to indicate PF ROCE client has registered
Huazhong Tan [Tue, 28 May 2019 09:02:55 +0000 (17:02 +0800)]
net: hns3: use HCLGE_STATE_ROCE_REGISTERED to indicate PF ROCE client has registered

When PF ROCE client's init_instance() succeeds, it means this client
has been registered successfully, so we use HCLGE_STATE_ROCE_REGISTERED
to indicate that. And before calling PF ROCE client's uninit_instance(),
we clear this state.

So any operation of the ROCE client from HCLGE is not allowed if this
state is not set.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use HCLGE_STATE_NIC_REGISTERED to indicate PF NIC client has registered
Huazhong Tan [Tue, 28 May 2019 09:02:54 +0000 (17:02 +0800)]
net: hns3: use HCLGE_STATE_NIC_REGISTERED to indicate PF NIC client has registered

When PF NIC client's init_instance() succeeds, it means this client
has been registered successfully, so we use HCLGE_STATE_NIC_REGISTERED
to indicate that. And before calling PF NIC client's uninit_instance(),
we clear this state.

So any operation of PF NIC client from HCLGE is not allowed if this
state is not set.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add support for dump firmware statistics by debugfs
Zhongzhu Liu [Tue, 28 May 2019 09:02:53 +0000 (17:02 +0800)]
net: hns3: add support for dump firmware statistics by debugfs

This patch prints firmware statistics information.

debugfs command:
echo dump m7 info > cmd

estuary:/dbg/hns3/0000:7d:00.0$ echo dump m7 info > cmd
[  172.577240] hns3 0000:7d:00.0: 0x00000000  0x00000000  0x00000000
[  172.583471] hns3 0000:7d:00.0: 0x00000000  0x00000000  0x00000000
[  172.589552] hns3 0000:7d:00.0: 0x00000030  0x00000000  0x00000000
[  172.595632] hns3 0000:7d:00.0: 0x00000000  0x00000000  0x00000000
estuary:/dbg/hns3/0000:7d:00.0$

Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix for HNS3_RXD_GRO_SIZE_M macro
Yunsheng Lin [Tue, 28 May 2019 09:02:52 +0000 (17:02 +0800)]
net: hns3: fix for HNS3_RXD_GRO_SIZE_M macro

According to hardware user menual, the GRO_SIZE is 14 bits width,
the HNS3_RXD_GRO_SIZE_M is 10 bits width now, which may cause
hardware GRO received packet error problem.

Fixes: a6d53b97a2e7 ("net: hns3: Adds GRO params to SKB for the stack")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix compile warning without CONFIG_RFS_ACCEL
Jian Shen [Tue, 28 May 2019 09:02:51 +0000 (17:02 +0800)]
net: hns3: fix compile warning without CONFIG_RFS_ACCEL

The ifdef condition of function hclge_add_fd_entry_by_arfs() is
unnecessary. It may cause compile warning when CONFIG_RFS_ACCEL
is not chosen. This patch fixes it by removing the ifdef condition.

Fixes: d93ed94fbeaf ("net: hns3: add aRFS support for PF")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohinic: fix a bug in set rx mode
Xue Chaojing [Mon, 27 May 2019 22:10:05 +0000 (22:10 +0000)]
hinic: fix a bug in set rx mode

in set_rx_mode, __dev_mc_sync and netdev_for_each_mc_addr will
repeatedly set the multicast mac address. so we delete this loop.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'inet-frags-followup'
David S. Miller [Wed, 29 May 2019 00:22:15 +0000 (17:22 -0700)]
Merge branch 'inet-frags-followup'

Eric Dumazet says:

====================
inet: frags: followup to 'inet-frags-avoid-possible-races-at-netns-dismantle'

Latest patch series ('inet-frags-avoid-possible-races-at-netns-dismantle')
brought another syzbot report shown in the third patch changelog.

While fixing the issue, I had to call inet_frags_fini() later
in IPv6 and ilowpan.

Also I believe a completion is needed to ensure proper dismantle
at module removal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: fix use-after-free read in inet_frag_destroy_rcu
Eric Dumazet [Mon, 27 May 2019 23:56:49 +0000 (16:56 -0700)]
inet: frags: fix use-after-free read in inet_frag_destroy_rcu

As caught by syzbot [1], the rcu grace period that is respected
before fqdir_rwork_fn() proceeds and frees fqdir is not enough
to prevent inet_frag_destroy_rcu() being run after the freeing.

We need a proper rcu_barrier() synchronization to replace
the one we had in inet_frags_fini()

We also have to fix a potential problem at module removal :
inet_frags_fini() needs to make sure that all queued work queues
(fqdir_rwork_fn) have completed, otherwise we might
call kmem_cache_destroy() too soon and get another use-after-free.

[1]
BUG: KASAN: use-after-free in inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201
Read of size 8 at addr ffff88806ed47a18 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0-rc1+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch kernel/rcu/tree.c:2092 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline]
 rcu_core+0xba5/0x1500 kernel/rcu/tree.c:2291
 __do_softirq+0x25c/0x94c kernel/softirq.c:293
 invoke_softirq kernel/softirq.c:374 [inline]
 irq_exit+0x180/0x1d0 kernel/softirq.c:414
 exiting_irq arch/x86/include/asm/apic.h:536 [inline]
 smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
 </IRQ>
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: ff ff 48 89 df e8 f2 95 8c fa eb 82 e9 07 00 00 00 0f 00 2d e4 45 4b 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d4 45 4b 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 8e 18 42 fa e8 99
RSP: 0018:ffff8880a98e7d78 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1164e11 RBX: ffff8880a98d4340 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98d4bbc
RBP: ffff8880a98e7da8 R08: ffff8880a98d4340 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff88b27078 R14: 0000000000000001 R15: 0000000000000000
 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
 default_idle_call+0x36/0x90 kernel/sched/idle.c:94
 cpuidle_idle_call kernel/sched/idle.c:154 [inline]
 do_idle+0x377/0x560 kernel/sched/idle.c:263
 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
 start_secondary+0x34e/0x4c0 arch/x86/kernel/smpboot.c:267
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

Allocated by task 8877:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 kmem_cache_alloc_trace+0x151/0x750 mm/slab.c:3555
 kmalloc include/linux/slab.h:547 [inline]
 kzalloc include/linux/slab.h:742 [inline]
 fqdir_init include/net/inet_frag.h:115 [inline]
 ipv6_frags_init_net+0x48/0x460 net/ipv6/reassembly.c:513
 ops_init+0xb3/0x410 net/core/net_namespace.c:130
 setup_net+0x2d3/0x740 net/core/net_namespace.c:316
 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
 ksys_unshare+0x440/0x980 kernel/fork.c:2692
 __do_sys_unshare kernel/fork.c:2760 [inline]
 __se_sys_unshare kernel/fork.c:2758 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 17:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 fqdir_rwork_fn+0x33/0x40 net/ipv4/inet_fragment.c:154
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88806ed47a00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 24 bytes inside of
 512-byte region [ffff88806ed47a00ffff88806ed47c00)
The buggy address belongs to the page:
page:ffffea0001bb51c0 refcount:1 mapcount:0 mapping:ffff8880aa400940 index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000282a788 ffffea0001bb53c8 ffff8880aa400940
raw: 0000000000000000 ffff88806ed47000 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88806ed47900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88806ed47980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88806ed47a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff88806ed47a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88806ed47b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: call inet_frags_fini() after unregister_pernet_subsys()
Eric Dumazet [Mon, 27 May 2019 23:56:48 +0000 (16:56 -0700)]
inet: frags: call inet_frags_fini() after unregister_pernet_subsys()

Both IPv6 and 6lowpan are calling inet_frags_fini() too soon.

inet_frags_fini() is dismantling a kmem_cache, that might be needed
later when unregister_pernet_subsys() eventually has to remove
frags queues from hash tables and free them.

This fixes potential use-after-free, and is a prereq for the following patch.

Fixes: d4ad4d22e7ac ("inet: frags: use kmem_cache for inet_frag_queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: uninline fqdir_init()
Eric Dumazet [Mon, 27 May 2019 23:56:47 +0000 (16:56 -0700)]
inet: frags: uninline fqdir_init()

fqdir_init() is not fast path and is getting bigger.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests/net: ipv6 flowlabel
Willem de Bruijn [Mon, 27 May 2019 20:47:51 +0000 (16:47 -0400)]
selftests/net: ipv6 flowlabel

Test the IPv6 flowlabel control and datapath interfaces:

Acquire and release the right to use flowlabels with socket option
IPV6_FLOWLABEL_MGR.

Then configure flowlabels on send and read them on recv with cmsg
IPV6_FLOWINFO. Also verify auto-flowlabel if not explicitly set.

This helped identify the issue fixed in commit 95c169251bf73 ("ipv6:
invert flowlabel sharing check in process and user mode")

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: Enable TC offloading with mqprio
Camelia Groza [Mon, 27 May 2019 15:21:31 +0000 (18:21 +0300)]
enetc: Enable TC offloading with mqprio

Add support to configure multiple prioritized TX traffic
classes with mqprio.

Configure one BD ring per TC for the moment, one netdev
queue per TC.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'stmmac-SPDX'
David S. Miller [Wed, 29 May 2019 00:09:15 +0000 (17:09 -0700)]
Merge branch 'stmmac-SPDX'

Neil Armstrong says:

====================
net: stmmac: dwmac-meson: update with SPDX Licence identifier

Update the SPDX Licence identifier for the Amlogic Meson6 and Meson8 dwmac
glue drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: dwmac-meson8b: update with SPDX Licence identifier
Neil Armstrong [Mon, 27 May 2019 13:46:23 +0000 (15:46 +0200)]
net: stmmac: dwmac-meson8b: update with SPDX Licence identifier

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: stmmac: dwmac-meson: update with SPDX Licence identifier
Neil Armstrong [Mon, 27 May 2019 13:46:22 +0000 (15:46 +0200)]
net: stmmac: dwmac-meson: update with SPDX Licence identifier

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoigc: Cleanup the redundant code
Sasha Neftin [Sun, 21 Apr 2019 08:17:23 +0000 (11:17 +0300)]
igc: Cleanup the redundant code

The default flow control settings for the i225 device is both
'rx' and 'tx' pause frames. There is no depend on the NVM value.
This patch comes to fix this and clean up the driver code.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Add flow control support
Sasha Neftin [Thu, 18 Apr 2019 07:11:08 +0000 (10:11 +0300)]
igc: Add flow control support

This change adds flow control settings. This is required to
enable the legacy flow control support.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoe1000e: start network tx queue only when link is up
Konstantin Khlebnikov [Wed, 17 Apr 2019 08:13:20 +0000 (11:13 +0300)]
e1000e: start network tx queue only when link is up

Driver does not want to keep packets in Tx queue when link is lost.
But present code only reset NIC to flush them, but does not prevent
queuing new packets. Moreover reset sequence itself could generate
new packets via netconsole and NIC falls into endless reset loop.

This patch wakes Tx queue only when NIC is ready to send packets.

This is proper fix for problem addressed by commit 0f9e980bf5ee
("e1000e: fix cyclic resets at link up with active tx").

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoRevert "e1000e: fix cyclic resets at link up with active tx"
Konstantin Khlebnikov [Wed, 17 Apr 2019 08:13:16 +0000 (11:13 +0300)]
Revert "e1000e: fix cyclic resets at link up with active tx"

This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61.

That change cased false-positive warning about hardware hang:

e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
   TDH                  <0>
   TDT                  <1>
   next_to_use          <1>
   next_to_clean        <0>
buffer_info[next_to_clean]:
   time_stamp           <fffba7a7>
   next_to_watch        <0>
   jiffies              <fffbb140>
   next_to_watch.status <0>
MAC Status             <40080080>
PHY Status             <7949>
PHY 1000BASE-T Status  <0>
PHY Extended Status    <3000>
PCI Status             <10>
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Besides warning everything works fine.
Original issue will be fixed property in following patch.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reported-by: Joseph Yasi <joe.yasi@gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175
Tested-by: Joseph Yasi <joe.yasi@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Oleksandr Natalenko <oleksandr@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Remove the obsolete workaround
Sasha Neftin [Mon, 15 Apr 2019 11:10:35 +0000 (14:10 +0300)]
igc: Remove the obsolete workaround

Enables a resend request after the completion timeout workaround is not
relevant for i225 device. This patch is clean code relevant this
workaround.
Minor cosmetic fixes, replace the 'spaces' with 'tabs'

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Clean up unused pointers
Sasha Neftin [Thu, 4 Apr 2019 10:26:53 +0000 (13:26 +0300)]
igc: Clean up unused pointers

Few function pointers from phy_operations structure were unused.
This patch cleans those.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigc: Fix double definitions
Sasha Neftin [Wed, 3 Apr 2019 13:58:04 +0000 (16:58 +0300)]
igc: Fix double definitions

Collision threshold and threshold's shift has been defined twice.
This patch comes to fix that.

Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb: mark expected switch fall-through
Gustavo A. R. Silva [Fri, 29 Mar 2019 23:38:46 +0000 (16:38 -0700)]
igb: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/intel/igb/e1000_82575.c: In function ā€˜igb_get_invariants_82575ā€™:
drivers/net/ethernet/intel/igb/e1000_82575.c:636:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (igb_sgmii_uses_mdio_82575(hw)) {
      ^
drivers/net/ethernet/intel/igb/e1000_82575.c:642:2: note: here
  case E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb: mark expected switch fall-through
Gustavo A. R. Silva [Fri, 29 Mar 2019 23:38:45 +0000 (16:38 -0700)]
igb: mark expected switch fall-through

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warning:

drivers/net/ethernet/intel/igb/igb_main.c: In function ā€˜__igb_notify_dcaā€™:
drivers/net/ethernet/intel/igb/igb_main.c:6694:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
   if (dca_add_requester(dev) == 0) {
      ^
drivers/net/ethernet/intel/igb/igb_main.c:6701:2: note: here
  case DCA_PROVIDER_REMOVE:
  ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoigb/igc: warn when fatal read failure happens
Feng Tang [Wed, 13 Feb 2019 02:41:54 +0000 (10:41 +0800)]
igb/igc: warn when fatal read failure happens

Failed in read the HW register is very serious for igb/igc driver,
as its hw_addr will be set to NULL and cause the adapter be seen as
"REMOVED".

We saw the error only a few times in the MTBF test for suspend/resume,
but can hardly get any useful info to debug.

Adding WARN() so that we can get the necessary information about
where and how it happens, and use it for root causing and fixing
this "PCIe link lost issue"

This affects igb, igc.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agofsl/fman: include IPSEC SPI in the Keygen extraction
Madalin Bucur [Mon, 27 May 2019 12:32:12 +0000 (15:32 +0300)]
fsl/fman: include IPSEC SPI in the Keygen extraction

The keygen extracted fields are used as input for the hash that
determines the incoming frames distribution. Adding IPSEC SPI so
different IPSEC flows can be distributed to different CPUs.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Check RSS table index validity when creating a context
Maxime Chevallier [Mon, 27 May 2019 11:52:01 +0000 (13:52 +0200)]
net: mvpp2: cls: Check RSS table index validity when creating a context

Make sure we don't use an out-of-bound index for the per-port RSS
context array.

As of today, the global context creation in mvpp22_rss_context_create
will prevent us from reaching this case, but we should still make sure
we are using a sane value anyway.

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoenetc: fix le32/le16 degrading to integer warnings
Y.b. Lu [Mon, 27 May 2019 03:55:20 +0000 (03:55 +0000)]
enetc: fix le32/le16 degrading to integer warnings

Fix blow sparse warning introduced by a previous patch.
- restricted __le32 degrades to integer
- restricted __le16 degrades to integer

Fixes: d39823121911 ("enetc: add hardware timestamping support")
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove support for RTL_GIGA_MAC_VER_01
Heiner Kallweit [Sat, 25 May 2019 19:14:39 +0000 (21:14 +0200)]
r8169: remove support for RTL_GIGA_MAC_VER_01

RTL_GIGA_MAC_VER_01 is RTL8169, the ancestor of the chip family.
It didn't have an internal PHY and I've never seen it in the wild.
What isn't there doesn't need to be maintained, so let's remove
support for it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: improve RTL8168d PHY initialization
Heiner Kallweit [Sat, 25 May 2019 18:57:42 +0000 (20:57 +0200)]
r8169: improve RTL8168d PHY initialization

Certain parts of the PHY initialization are the same for sub versions
1 and 2 of RTL8168d. So let's factor this out to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'r8169-small-improvements'
David S. Miller [Mon, 27 May 2019 05:19:39 +0000 (22:19 -0700)]
Merge branch 'r8169-small-improvements'

Heiner Kallweit says:

====================
r8169: small improvements

Series with small improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: change type of member mac_version in rtl8169_private
Heiner Kallweit [Sat, 25 May 2019 18:45:04 +0000 (20:45 +0200)]
r8169: change type of member mac_version in rtl8169_private

Use the appropriate enum type for member mac_version. And don't assign
a fixed value to RTL_GIGA_MAC_NONE, there's no benefit in it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove unneeded return statement in rtl_hw_init_8168g
Heiner Kallweit [Sat, 25 May 2019 18:44:01 +0000 (20:44 +0200)]
r8169: remove unneeded return statement in rtl_hw_init_8168g

Remove not needed return statement.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agor8169: remove rtl_hw_init_8168ep
Heiner Kallweit [Sat, 25 May 2019 18:43:25 +0000 (20:43 +0200)]
r8169: remove rtl_hw_init_8168ep

rtl_hw_init_8168ep() can be removed, this simplifies the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: Make t4_get_tp_e2c_map static
YueHaibing [Sat, 25 May 2019 12:45:10 +0000 (20:45 +0800)]
cxgb4: Make t4_get_tp_e2c_map static

Fix sparse warning:

drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:6216:14:
 warning: symbol 't4_get_tp_e2c_map' was not declared. Should it be static?

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftest: Fixes for icmp_redirect test
David Ahern [Fri, 24 May 2019 23:37:07 +0000 (16:37 -0700)]
selftest: Fixes for icmp_redirect test

I was really surprised that the IPv6 mtu exception followed by redirect
test was passing as nothing about the code suggests it should. The problem
is actually with the logic in the test script.

Fix the test cases as follows:
1. add debug function to dump the initial and redirect gateway addresses
   for ipv6. This is shown only in verbose mode. It helps verify the
   output of 'route get'.

2. fix the check_exception logic for the reset case to make sure that
   for IPv4 neither mtu nor redirect appears in the 'route get' output.
   For IPv6, make sure mtu is not present and the gateway is the initial
   R1 lladdr.

3. fix the reset logic by using a function to delete the routes added by
   initial_route_*. This format works better for the nexthop version of
   the tests.

While improving the test cases, go ahead and ensure that forwarding is
disabled since IPv6 redirect requires it.

Also, runs with kernel debugging enabled sometimes show a failure with
one of the ipv4 tests, so spread the pings over longer time interval.

The end result is that 2 tests now show failures:

TEST: IPv6: mtu exception plus redirect                    [FAIL]

and the VRF version.

This is a bug in the IPv6 logic that will need to be fixed
separately. Redirect followed by MTU works because __ip6_rt_update_pmtu
hits the 'if (!rt6_cache_allowed_for_pmtu(rt6))' path and updates the
mtu on the exception rt6_info.

MTU followed by redirect does not have this logic. rt6_do_redirect
creates a new exception and then rt6_insert_exception removes the old
one which has the MTU exception.

Fixes: ec8105352869 ("selftests: Add redirect tests")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: remove redundant assignment to n
Colin Ian King [Fri, 24 May 2019 21:56:58 +0000 (22:56 +0100)]
ipv4: remove redundant assignment to n

The pointer n is being assigned a value however this value is
never read in the code block and the end of the code block
continues to the next loop iteration. Clean up the code by
removing the redundant assignment.

Fixes: 1bff1a0c9bbda ("ipv4: Add function to send route updates")
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: bcm87xx: improve bcm87xx_config_init and feature detection
Heiner Kallweit [Fri, 24 May 2019 20:24:19 +0000 (22:24 +0200)]
net: phy: bcm87xx: improve bcm87xx_config_init and feature detection

PHY drivers don't have to and shouldn't fiddle with phylib internals.
Most of the code in bcm87xx_config_init() can be removed because
phylib takes care.

In addition I replaced usage of PHY_10GBIT_FEC_FEATURES with an
implementation of the get_features callback. PHY_10GBIT_FEC_FEATURES
is used by this driver only and it's questionable whether there
will be any other PHY supporting this mode only. Having said that
in one of the next kernel versions we may decide to remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'inet-frags-avoid-possible-races-at-netns-dismantle'
David S. Miller [Sun, 26 May 2019 21:08:05 +0000 (14:08 -0700)]
Merge branch 'inet-frags-avoid-possible-races-at-netns-dismantle'

Eric Dumazet says:

====================
inet: frags: avoid possible races at netns dismantle

This patch series fixes a race happening on netns dismantle with
frag queues. While rhashtable_free_and_destroy() is running,
concurrent timers might run inet_frag_kill() and attempt
rhashtable_remove_fast() calls. This is not allowed by
rhashtable logic.

Since I do not want to add expensive synchronize_rcu() calls
in the netns dismantle path, I had to no longer inline
netns_frags structures, but dynamically allocate them.

The ten first patches make this preparation, so that
the last patch clearly shows the fix.

As this patch series is not exactly trivial, I chose to
target 5.3. We will backport it once soaked a bit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: frags: rework rhashtable dismantle
Eric Dumazet [Fri, 24 May 2019 16:03:40 +0000 (09:03 -0700)]
inet: frags: rework rhashtable dismantle

syszbot found an interesting use-after-free [1] happening
while IPv4 fragment rhashtable was destroyed at netns dismantle.

While no insertions can possibly happen at the time a dismantling
netns is destroying this rhashtable, timers can still fire and
attempt to remove elements from this rhashtable.

This is forbidden, since rhashtable_free_and_destroy() has
no synchronization against concurrent inserts and deletes.

Add a new fqdir->dead flag so that timers do not attempt
a rhashtable_remove_fast() operation.

We also have to respect an RCU grace period before starting
the rhashtable_free_and_destroy() from process context,
thus we use rcu_work infrastructure.

This is a refinement of a prior rough attempt to fix this bug :
https://marc.info/?l=linux-netdev&m=153845936820900&w=2

Since the rhashtable cleanup is now deferred to a work queue,
netns dismantles should be slightly faster.

[1]
BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:194 [inline]
BUG: KASAN: use-after-free in rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212
Read of size 8 at addr ffff8880a6497b70 by task kworker/0:0/5

CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.2.0-rc1+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events rht_deferred_worker
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
 kasan_report+0x12/0x20 mm/kasan/common.c:614
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 __read_once_size include/linux/compiler.h:194 [inline]
 rhashtable_last_table+0x162/0x180 lib/rhashtable.c:212
 rht_deferred_worker+0x111/0x2030 lib/rhashtable.c:411
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 32687:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_kmalloc mm/kasan/common.c:489 [inline]
 __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
 __do_kmalloc_node mm/slab.c:3620 [inline]
 __kmalloc_node+0x4e/0x70 mm/slab.c:3627
 kmalloc_node include/linux/slab.h:590 [inline]
 kvmalloc_node+0x68/0x100 mm/util.c:431
 kvmalloc include/linux/mm.h:637 [inline]
 kvzalloc include/linux/mm.h:645 [inline]
 bucket_table_alloc+0x90/0x480 lib/rhashtable.c:178
 rhashtable_init+0x3f4/0x7b0 lib/rhashtable.c:1057
 inet_frags_init_net include/net/inet_frag.h:109 [inline]
 ipv4_frags_init_net+0x182/0x410 net/ipv4/ip_fragment.c:683
 ops_init+0xb3/0x410 net/core/net_namespace.c:130
 setup_net+0x2d3/0x740 net/core/net_namespace.c:316
 copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
 create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
 unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
 ksys_unshare+0x440/0x980 kernel/fork.c:2692
 __do_sys_unshare kernel/fork.c:2760 [inline]
 __se_sys_unshare kernel/fork.c:2758 [inline]
 __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 7:
 save_stack+0x23/0x90 mm/kasan/common.c:71
 set_track mm/kasan/common.c:79 [inline]
 __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
 __cache_free mm/slab.c:3432 [inline]
 kfree+0xcf/0x220 mm/slab.c:3755
 kvfree+0x61/0x70 mm/util.c:460
 bucket_table_free+0x69/0x150 lib/rhashtable.c:108
 rhashtable_free_and_destroy+0x165/0x8b0 lib/rhashtable.c:1155
 inet_frags_exit_net+0x3d/0x50 net/ipv4/inet_fragment.c:152
 ipv4_frags_exit_net+0x73/0x90 net/ipv4/ip_fragment.c:695
 ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154
 cleanup_net+0x3fb/0x960 net/core/net_namespace.c:553
 process_one_work+0x989/0x1790 kernel/workqueue.c:2269
 worker_thread+0x98/0xe40 kernel/workqueue.c:2415
 kthread+0x354/0x420 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff8880a6497b40
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 48 bytes inside of
 1024-byte region [ffff8880a6497b40ffff8880a6497f40)
The buggy address belongs to the page:
page:ffffea0002992580 refcount:1 mapcount:0 mapping:ffff8880aa400ac0 index:0xffff8880a64964c0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0002916e88 ffffea000218fe08 ffff8880aa400ac0
raw: ffff8880a64964c0 ffff8880a6496040 0000000100000005 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8880a6497a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a6497a80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff8880a6497b00: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                             ^
 ffff8880a6497b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8880a6497c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dynamically allocate fqdir structures
Eric Dumazet [Fri, 24 May 2019 16:03:39 +0000 (09:03 -0700)]
net: dynamically allocate fqdir structures

Following patch will add rcu grace period before fqdir
rhashtable destruction, so we need to dynamically allocate
fqdir structures to not force expensive synchronize_rcu() calls
in netns dismantle path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: add a net pointer to struct fqdir
Eric Dumazet [Fri, 24 May 2019 16:03:38 +0000 (09:03 -0700)]
net: add a net pointer to struct fqdir

fqdir will soon be dynamically allocated.

We need to reach the struct net pointer from fqdir,
so add it, and replace the various container_of() constructs
by direct access to the new field.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: rename inet_frags_init_net() to fdir_init()
Eric Dumazet [Fri, 24 May 2019 16:03:37 +0000 (09:03 -0700)]
net: rename inet_frags_init_net() to fdir_init()

And pass an extra parameter, since we will soon
dynamically allocate fqdir structures.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoieee820154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table
Eric Dumazet [Fri, 24 May 2019 16:03:36 +0000 (09:03 -0700)]
ieee820154: 6lowpan: no longer reference init_net in lowpan_frags_ns_ctl_table

(struct net *)->ieee802154_lowpan.fqdir will soon be a pointer, so make
sure lowpan_frags_ns_ctl_table[] does not reference init_net.

lowpan_frags_ns_sysctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonetfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table
Eric Dumazet [Fri, 24 May 2019 16:03:35 +0000 (09:03 -0700)]
netfilter: ipv6: nf_defrag: no longer reference init_net in nf_ct_frag6_sysctl_table

(struct net *)->nf_frag.fqdir will soon be a pointer, so make
sure nf_ct_frag6_sysctl_table[] does not reference init_net.

nf_ct_frag6_sysctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv6: no longer reference init_net in ip6_frags_ns_ctl_table[]
Eric Dumazet [Fri, 24 May 2019 16:03:34 +0000 (09:03 -0700)]
ipv6: no longer reference init_net in ip6_frags_ns_ctl_table[]

(struct net *)->ipv6.fqdir will soon be a pointer, so make
sure ip6_frags_ns_ctl_table[] does not reference init_net.

ip6_frags_ns_ctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]
Eric Dumazet [Fri, 24 May 2019 16:03:33 +0000 (09:03 -0700)]
ipv4: no longer reference init_net in ip4_frags_ns_ctl_table[]

(struct net *)->ipv4.fqdir will soon be a pointer, so make
sure ip4_frags_ns_ctl_table[] does not reference init_net.

ip4_frags_ns_ctl_register() can perform the needed initialization
for all netns.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: rename struct fqdir fields
Eric Dumazet [Fri, 24 May 2019 16:03:32 +0000 (09:03 -0700)]
net: rename struct fqdir fields

Rename the @frags fields from structs netns_ipv4, netns_ipv6,
netns_nf_frag and netns_ieee802154_lowpan to @fqdir

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: rename inet_frags_exit_net() to fqdir_exit()
Eric Dumazet [Fri, 24 May 2019 16:03:31 +0000 (09:03 -0700)]
net: rename inet_frags_exit_net() to fqdir_exit()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoinet: rename netns_frags to fqdir
Eric Dumazet [Fri, 24 May 2019 16:03:30 +0000 (09:03 -0700)]
inet: rename netns_frags to fqdir

1) struct netns_frags is renamed to struct fqdir
  This structure is really holding many frag queues in a hash table.

2) (struct inet_frag_queue)->net field is renamed to fqdir
  since net is generally associated to a 'struct net' pointer
  in networking stack.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: phy: tja11xx: Add TJA11xx PHY driver
Marek Vasut [Fri, 24 May 2019 14:22:28 +0000 (16:22 +0200)]
net: phy: tja11xx: Add TJA11xx PHY driver

Add driver for the NXP TJA1100 and TJA1101 PHYs. These PHYs are special
BroadRReach 100BaseT1 PHYs used in automotive.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: linux-hwmon@vger.kernel.org
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-hns3-add-aRFS-feature-and-fix-FEC-bugs-for-HNS3-driver'
David S. Miller [Sun, 26 May 2019 20:24:56 +0000 (13:24 -0700)]
Merge branch 'net-hns3-add-aRFS-feature-and-fix-FEC-bugs-for-HNS3-driver'

Huazhong Tan says:

====================
net: hns3: add aRFS feature and fix FEC bugs for HNS3 driver

This patchset adds some new features support and fixes some bugs:
[Patch 1/4 - 3/4] adds support for aRFS
[Patch 4/4] fix FEC configuration issue
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix for FEC configuration
Jian Shen [Fri, 24 May 2019 11:19:48 +0000 (19:19 +0800)]
net: hns3: fix for FEC configuration

The FEC capbility may be changed with port speed changes. Driver
needs to read the active FEC mode, and update FEC capability
when port speed changes.

Fixes: 7e6ec9148a1d ("net: hns3: add support for FEC encoding control")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: add aRFS support for PF
Jian Shen [Fri, 24 May 2019 11:19:47 +0000 (19:19 +0800)]
net: hns3: add aRFS support for PF

This patch adds aRFS support for PF. The aRFS rules are also
stored in the hardware flow director table, Use the existing
filter management functions to insert TCPv4/UDPv4/TCPv6/UDPv6
flow director filters. To avoid rule conflict, once user adds
flow director rules with ethtool, the aRFS will be disabled,
and clear exist aRFS rules. Once all user configure rules were
removed, aRFS can work again.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: refine the flow director handle
Jian Shen [Fri, 24 May 2019 11:19:46 +0000 (19:19 +0800)]
net: hns3: refine the flow director handle

In order to be compatible with aRFS rules, this patch adds
spin_lock for flow director rule adding, deleting, querying,
and packages the rule configuration.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: initialize CPU reverse mapping
Jian Shen [Fri, 24 May 2019 11:19:45 +0000 (19:19 +0800)]
net: hns3: initialize CPU reverse mapping

Allocate CPU rmap and add entry for each irq. CPU rmap is
used in aRFS to get the queue number of the rx completion
interrupts.

In additional, remove the calling of
irq_set_affinity_notifier() in hns3_nic_init_irq(), because
we have registered notifier in irq_cpu_rmap_add() for each
vector, otherwise it may cause use-after-free issue.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'ath79-add-ag71xx-support'
David S. Miller [Sun, 26 May 2019 20:22:50 +0000 (13:22 -0700)]
Merge branch 'ath79-add-ag71xx-support'

Oleksij Rempel says:

====================
MIPS: ath79: add ag71xx support

2019.05.24 v6:
- ag71xx: remove double union
- ag71xx: reverse Christmas tree for all functions
- ag71xx: add Reviewed-by: Andrew Lunn <andrew@lunn.ch>

2019.05.20 v5:
- ag71xx: remove MII_CMD_WRITE, the name is confusing. It is
  actually disables MII_CMD_READ.
- ag71xx: rework ag71xx_mdio_mii_read/write
- ag71xx: set proper mask for the addr in ag71xx_mdio_mii_read/write
- Kconfig: remove MDIO_BITBANG
- ag71xx: ./scripts/checkpatch.pl it.

2019.05.19 v4:
- DT: define eth and mdio clocks
- ag71xx: remove module parameters
- ag71xx: return proper error value on mdio_read/write
- ag71xx: use proper mdio clock registration
- ag71xx: add ag71xx_dma_wait_stop() for ag71xx_dma_reset()
- ag71xx: remove ag71xx_speed_str()
- ag71xx: use phydev->link/sped/duplex instead of ag-> variants
- ag71xx: use WARN() instead of BUG()
- ag71xx: drop big part of ag71xx_phy_link_adjust()
- ag71xx: drop most of ag71xx_do_ioctl()
- ag71xx: register eth clock
- ag71xx: remove AG71XX_ETH0_NO_MDIO quirk.

2019.04.22 v3:
- ag71xx: use phy_modes() instead of ag71xx_get_phy_if_mode_name()
- ag71xx: remove .ndo_poll_controller support
- ag71xx: unregister_netdev before disconnecting phy.

2019.04.18 v2:
- ag71xx: add list of openwrt authors
- ag71xx: remove redundant PHY_POLL assignment
- ag71xx: use phy_attached_info instead of netif_info
- ag71xx: remove redundant netif_carrier_off() on .stop.
- DT: use "ethernet" instead of "eth"

This patch series provide ethernet support for many Atheros/QCA
MIPS based SoCs.

I reworked ag71xx driver which was previously maintained within OpenWRT
repository. So far, following changes was made to make upstreaming
easier:
- everything what can be some how used in user space was removed. Most
  of it was debug functionality.
- most of deficetree bindings was removed. Not every thing made sense
  and most of it is SoC specific, so it is possible to detect it by
  compatible.
- mac and mdio parts are merged in to one driver. It makes easier to
  maintaine SoC specific quirks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: add ag71xx driver
Oleksij Rempel [Fri, 24 May 2019 11:12:24 +0000 (13:12 +0200)]
net: ethernet: add ag71xx driver

Add support for Atheros/QCA AR7XXX/AR9XXX/QCA95XX built-in ethernet mac support

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMIPS: ath79: ar9331: add Ethernet nodes
Oleksij Rempel [Fri, 24 May 2019 11:12:23 +0000 (13:12 +0200)]
MIPS: ath79: ar9331: add Ethernet nodes

Add ethernet nodes supported by ag71xx driver.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: add qca,ar71xx.txt documentation
Oleksij Rempel [Fri, 24 May 2019 11:12:22 +0000 (13:12 +0200)]
dt-bindings: net: add qca,ar71xx.txt documentation

Add binding documentation for Atheros/QCA networking IP core used
in many routers.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'qed-Improve-performance-on-100G-link-for-offload-protocols'
David S. Miller [Sun, 26 May 2019 20:04:12 +0000 (13:04 -0700)]
Merge branch 'qed-Improve-performance-on-100G-link-for-offload-protocols'

Michal Kalderon says:

====================
qed*: Improve performance on 100G link for offload protocols

This patch series modifies the current implementation of PF selection.
The refactoring of the llh code enables setting additional filters
(mac / protocol) per PF, and improves performance for offload protocols
(RoCE, iWARP, iSCSI, fcoe) on 100G link (was capped at 90G per single
PF).

Improved performance on 100G link is achieved by configuring engine
affinty to each PF.
The engine affinity is read from the Management FW and hw is configured accordingly.
A new hw resource called PPFID is exposed and an API is introduced to utilize
it. This additional resource enables setting the affinity of a PF and providing
more classification rules per PF.
qedr,qedi,qedf are also modified as part of the series. Without the
changes functionality is broken.

v1 --> v2
---------
- Remove iWARP module parameter. Instead use devlink param infrastructure
  for setting the iwarp_cmt mode. Additional patch added to the series for
  adding the devlink support.

- Fix kbuild test robot warning on qed_llh_filter initialization.

- Remove comments inside function calls
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqedf: Use hwfns and affin_hwfn_idx to get MSI-X vector index to use
Chad Dupuis [Sun, 26 May 2019 12:22:30 +0000 (15:22 +0300)]
qedf: Use hwfns and affin_hwfn_idx to get MSI-X vector index to use

MSI-X vector index is determined using qed device information and
affinity to use.

Signed-off-by: Chad Dupuis <cdupuis@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqedi: Use hwfns and affin_hwfn_idx to get MSI-X vector index
Manish Rangankar [Sun, 26 May 2019 12:22:29 +0000 (15:22 +0300)]
qedi: Use hwfns and affin_hwfn_idx to get MSI-X vector index

MSI-X vector index is determined using qed device information and
affinity to use.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoRevert "scsi: qedi: Allocate IRQs based on msix_cnt"
Manish Rangankar [Sun, 26 May 2019 12:22:28 +0000 (15:22 +0300)]
Revert "scsi: qedi: Allocate IRQs based on msix_cnt"

 Always request for number of irqs equals to number of queues.

This reverts commit 1a291bce5eaf5374627d337157544aa6499ce34a.

Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed*: Add iWARP 100g support
Michal Kalderon [Sun, 26 May 2019 12:22:27 +0000 (15:22 +0300)]
qed*: Add iWARP 100g support

Add iWARP engine affinity setting for supporting iWARP over 100g.
iWARP cannot be distinguished by the LLH from L2, hence the
engine division will affect L2 as well. For this reason we add
a parameter to devlink to determine the engine division.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Add qed devlink parameters table
Michal Kalderon [Sun, 26 May 2019 12:22:26 +0000 (15:22 +0300)]
qed: Add qed devlink parameters table

The table currently contains a single parameter for
configuring whether iWARP should be enabled on a 100g
device. Enabling iWARP on a 100g device impacts L2
performance and is therefore not enabled by default.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Set the doorbell address correctly
Michal Kalderon [Sun, 26 May 2019 12:22:25 +0000 (15:22 +0300)]
qed: Set the doorbell address correctly

In 100g mode the doorbell bar is united for both engines. Set
the correct offset in the hwfn so that the doorbell returned
for RoCE is in the affined hwfn.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Denis Bolotin <denis.bolotin@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqedr: Change the MSI-X vectors selection to be based on affined engine
Michal Kalderon [Sun, 26 May 2019 12:22:24 +0000 (15:22 +0300)]
qedr: Change the MSI-X vectors selection to be based on affined engine

Use the msix vectors of the affined hwfn and not the
leading one.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Modify offload protocols to use the affined engine
Michal Kalderon [Sun, 26 May 2019 12:22:23 +0000 (15:22 +0300)]
qed: Modify offload protocols to use the affined engine

To enable 100g support for offload protocols each PF gets
a dedicated engine to work on from the MFW.
This patch modifies the code to use the affined hwfn instead
of the leading one.
The offload protocols require the ll2 to be opened on both
engines, and not just the affined hwfn.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed*: Change hwfn used for sb initialization
Michal Kalderon [Sun, 26 May 2019 12:22:22 +0000 (15:22 +0300)]
qed*: Change hwfn used for sb initialization

When initializing status blocks use the affined hwfn
instead of the leading one for RDMA / Storage

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Add llh ppfid interface and 100g support for offload protocols
Michal Kalderon [Sun, 26 May 2019 12:22:21 +0000 (15:22 +0300)]
qed: Add llh ppfid interface and 100g support for offload protocols

This patch refactors the current llh implementation. It exposes a hw
resource called ppfid (port-pfid) and implements an API for configuring
the resource. Default configuration which was used until now limited
the number of filters per PF and did not support engine affinity per
protocol. The new API enables allocating more filter rules per PF and
enables affinitizing protocol packets to a certain engine which
enables full 100g protocol offload support.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoqed: Modify api for performing a dmae to another PF
Michal Kalderon [Sun, 26 May 2019 12:22:20 +0000 (15:22 +0300)]
qed: Modify api for performing a dmae to another PF

This patch modifies the dmae API to enable performing a dmae operation
to another PF. This enables sharing between the llh entries between PFs
and thus increasing the amount of filters per PF under certain
configurations.
The llh entries require using the dmae since the memory is widebus,
which requires atomicity in access.

Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-mvpp2-Classifier-updates-RSS'
David S. Miller [Sat, 25 May 2019 23:38:16 +0000 (16:38 -0700)]
Merge branch 'net-mvpp2-Classifier-updates-RSS'

Maxime Chevallier says:

====================
net: mvpp2: Classifier updates, RSS

Here is a set of updates for the PPv2 classifier, the main feature being
the support for steering to RSS contexts, to leverage all the available
RSS tables in the controller.

The first two patches are non-critical fixes for the classifier, the
first one prevents us from allocating too much room to store the
classification rules, the second one configuring the C2 engine as
suggested by the PPv2 functionnal specs.

Patches 3 to 5 introduce support for RSS contexts in mvpp2, allowing us
to steer traffic to dedicated RSS tables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Support steering to RSS contexts
Maxime Chevallier [Fri, 24 May 2019 10:05:54 +0000 (12:05 +0200)]
net: mvpp2: cls: Support steering to RSS contexts

When steering to an RXQ, we can perform an extra RSS step to assign a
queue from an RSS table.

This is done by setting the RSS_EN attribute in the C2 engine. In that
case, the RXQ that is assigned is the global RSS context id, that is
then translated to an RSS table using the RXQ2RSS table.

An example using ethtool to steer to RXQ 2 and 3 would be :

ethtool -X eth0 weight 0 0 1 1 context new

(This would print the allocated context id, let's say it's 1)

ethtool -N eth0 flow-type udp4 dst-port 1234 context 1 loc 0

The hash parameters are the ones that are globally configured for RSS :

ethtool -N eth0 rx-flow-hash udp4 sdfn

When an RSS context is removed while there are active classification
rules using this context, these rules are removed.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Extract the RSS context when parsing the ethtool rule
Maxime Chevallier [Fri, 24 May 2019 10:05:53 +0000 (12:05 +0200)]
net: mvpp2: cls: Extract the RSS context when parsing the ethtool rule

ethtool_rx_flow_rule_create takes into parameter the ethtool flow spec,
which doesn't contain the rss context id. We therefore need to extract
it ourself before parsing the ethtool rule.

The FLOW_RSS flag is only set in info->fs.flow_type, and not
info->flow_type.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Use RSS contexts to handle RSS tables
Maxime Chevallier [Fri, 24 May 2019 10:05:52 +0000 (12:05 +0200)]
net: mvpp2: cls: Use RSS contexts to handle RSS tables

The PPv2 controller has 8 RSS tables that are shared across all ports on
a given PPv2 instance. The previous implementation allocated one table
per port, leaving others unused.

By using RSS contexts, we can make use of multiple RSS tables per
port, one being the default table (always id 0), the other ones being
used as destinations for flow steering, in the same way as rx rings.

This commit introduces RSS contexts management in the PPv2 driver. We
always reserve one table per port, allocated when the port is probed.

The global table list is stored in the struct mvpp2, as it's a global
resource. Each port then maintains a list of indices in that global
table, that way each port can have it's own numbering scheme starting
from 0.

One limitation that seems unavoidable is that the hashing parameters are
shared across all RSS contexts for a given port. Hashing parameters for
ctx 0 will be applied to all contexts.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Bypass C2 internals FIFOs at init
Maxime Chevallier [Fri, 24 May 2019 10:05:51 +0000 (12:05 +0200)]
net: mvpp2: cls: Bypass C2 internals FIFOs at init

The C2 TCAM has internal FIFOs that are only useful for the built-in
self-tests. Disable these FIFOS at init, as recommended in the
functionnal specs.

Suggested-by: Alan Winkowski <walan@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: mvpp2: cls: Use the correct number of rules in various places
Maxime Chevallier [Fri, 24 May 2019 10:05:50 +0000 (12:05 +0200)]
net: mvpp2: cls: Use the correct number of rules in various places

As of today, the classification offload implementation only supports 4
different rules to be offloaded. This number has been hardcoded in the
rule insertion function, and the wrong define is being used elsewhere.

Use the correct #define everywhere to make sure we always check for the
correct number of rules.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoflow_offload: use struct_size() in kzalloc()
Gustavo A. R. Silva [Thu, 23 May 2019 22:56:53 +0000 (17:56 -0500)]
flow_offload: use struct_size() in kzalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
   int stuff;
   struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>