linux-2.6-block.git
5 years agoMerge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec...
David S. Miller [Tue, 2 Oct 2018 05:31:17 +0000 (22:31 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-01

1) Make xfrmi_get_link_net() static to silence a sparse warning.
   From Wei Yongjun.

2) Remove an unused esph pointer definition in esp_input().
   From Haishuang Yan.

3) Allow the NIC driver to quietly refuse xfrm offload
   if it does not support it; the SA is then created
   without offload.
   From Shannon Nelson.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge tag 'mlx5e-updates-2018-10-01' of git://git.kernel.org/pub/scm/linux/kernel...
David S. Miller [Mon, 1 Oct 2018 22:49:17 +0000 (15:49 -0700)]
Merge tag 'mlx5e-updates-2018-10-01' of git://git./linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-10-01

This series includes updates to mlx5e ethernet netdevice driver:

From Or Gerlitz:
1) Support masks for l3/l4 filters in ethtool flow steering
2) Report checksum unnecessary also when the L3 checksum flag on the
   cqe is set and there's no L4 header
3) Allow reporting of checksum unnecessary, using an ethtool private flag.

From Gavi Teitz and Or, VF representor netdev performance improvements:
4) Allow striding RQ in VF representor and bigger RQ size, ~3X performance improvement
5) Enable stateless offloads for VF representor, csum and TSO, 1.5X performance improvement
6) RSS Support for VF representors
   6.1) Allow flow table destination for VF representor steering rule.
   6.2) Create RSS flow table per representor netdev
   6.3) Expose mlx5e RSS ethtool to be used by representor netdevs
   6.4) Enable multi-queue and RSS for VF representors, using the existing mlx5e infrastructure
            for managing multi-queue RX RSS tables.

From Alaa Hleihel:
7) Cache the system image guid. The system image guid is a read-only field;
   read it once and save it on the core device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: start receiver buffer autotuning sooner
Yuchung Cheng [Mon, 1 Oct 2018 22:42:32 +0000 (15:42 -0700)]
tcp: start receiver buffer autotuning sooner

Previously receiver buffer auto-tuning starts after receiving
one advertised window amount of data. After the initial receiver
buffer was raised by patch a337531b942b ("tcp: up initial rmem to
128KB and SYN rwin to around 64KB"), the receiver buffer may take
too long to start raising. To address this issue, this patch lowers
the initial number of bytes expected to be received to roughly the
sender's expected initial window.
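
As a rough illustration of the new starting threshold, the sketch below
takes the smaller of the advertised receive window and an assumed sender
initial window of 10 segments. The function name, the constant, and the
min() formulation are assumptions made for illustration; they are not
taken from the patch itself.

```c
#include <stdint.h>

/* Assumed sender initial congestion window, in segments (hypothetical). */
#define SKETCH_INIT_CWND 10u

/*
 * Bytes the receiver waits for before autotuning starts: the smaller of
 * the advertised window and the assumed sender initial window
 * (SKETCH_INIT_CWND segments of advmss bytes each).
 */
uint32_t sketch_autotune_start_bytes(uint32_t rcv_wnd, uint32_t advmss)
{
	uint32_t sender_iw = SKETCH_INIT_CWND * advmss;

	return rcv_wnd < sender_iw ? rcv_wnd : sender_iw;
}
```

With a 64240-byte advertised window and an MSS of 1460, this sketch would
start autotuning after 14600 bytes rather than after a full window.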

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Mon, 1 Oct 2018 22:43:58 +0000 (15:43 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-10-01

This series contains updates to ice driver only.

Anirudh provides several changes to "prep" the driver for upcoming
features.  Specifically, the functions that are used for PF VSI/netdev
setup will also be used in SR-IOV support and to allow the reuse of
these functions, code needs to move.

Dave provides the only other change in the series, which updates the driver
to protect the reset path in its entirety.  This is done by adding the
various bit checks to determine if a reset is scheduled/initiated and
whether it came from the software or firmware.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: Change pf state behavior to protect reset path
Dave Ertman [Thu, 20 Sep 2018 00:23:11 +0000 (17:23 -0700)]
ice: Change pf state behavior to protect reset path

Currently, there is no bit, or set of bits, that protects the entirety
of the reset path.

If the reset is originated by the driver, then the relevant
one of the following bits will be set when the reset is scheduled:
__ICE_PFR_REQ
__ICE_CORER_REQ
__ICE_GLOBR_REQ
This bit will not be cleared until after the rebuild has completed.

If the reset is originated by the FW, then the first the driver knows of
it will be the reception of the OICR interrupt.  The __ICE_RESET_OICR_RECV
bit will be set in the interrupt handler.  This will also be the indicator
in a SW originated reset that we have completed the pre-OICR tasks and
have informed the FW that a reset was requested.

To utilize these bits, change the function:
ice_is_reset_recovery_pending()
to be:
ice_is_reset_in_progress()

The new function will check all of the above bits in the pf->state and
will return a true if one or more of these bits are set.
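
A minimal user-space sketch of the check described above, assuming the
state is a single bitmap word. The SKETCH_* names and bit positions are
hypothetical stand-ins for the driver's __ICE_* bits, and the helper is
not the driver's actual implementation.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the real values live in the ice driver. */
enum sketch_pf_state_bit {
	SKETCH_RESET_OICR_RECV = 0,	/* stands in for __ICE_RESET_OICR_RECV */
	SKETCH_PFR_REQ,			/* stands in for __ICE_PFR_REQ */
	SKETCH_CORER_REQ,		/* stands in for __ICE_CORER_REQ */
	SKETCH_GLOBR_REQ,		/* stands in for __ICE_GLOBR_REQ */
	SKETCH_UNRELATED,		/* any other state bit */
};

/* Return true if the given bit is set in the state word. */
static bool sketch_test_bit(unsigned int bit, unsigned long state)
{
	return (state >> bit) & 1UL;
}

/* True if any bit that marks a scheduled or started reset is set. */
bool sketch_reset_in_progress(unsigned long state)
{
	return sketch_test_bit(SKETCH_RESET_OICR_RECV, state) ||
	       sketch_test_bit(SKETCH_PFR_REQ, state) ||
	       sketch_test_bit(SKETCH_CORER_REQ, state) ||
	       sketch_test_bit(SKETCH_GLOBR_REQ, state);
}
```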

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 7/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:10 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 7/7

This patch completes the code move out of ice_main.c

The following top level functions (and related dependency functions) were
moved to ice_lib.c:
ice_vsi_setup
ice_vsi_cfg_tc

The following functions were made static again:
ice_vsi_setup_vector_base
ice_vsi_alloc_q_vectors
ice_vsi_get_qs
ice_vsi_map_rings_to_vectors
ice_vsi_alloc_rings
ice_vsi_set_rss_params
ice_vsi_set_num_qs
ice_get_free_slot
ice_vsi_init
ice_vsi_alloc_arrays

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 6/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:09 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 6/7

This patch continues the code move out of ice_main.c

The following top level functions (and related dependency functions) were
moved to ice_lib.c:
ice_vsi_setup_vector_base
ice_vsi_alloc_q_vectors
ice_vsi_get_qs

The following functions were made static again:
ice_vsi_free_arrays
ice_vsi_clear_rings

Also, in this patch, the netdev and NAPI registration logic was de-coupled
from the VSI creation logic (ice_vsi_setup) as for SR-IOV, while we want to
create VF VSIs using ice_vsi_setup, we don't want to create netdevs.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 5/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:08 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 5/7

This patch continues the code move out of ice_main.c

The following top level functions (and related dependency functions) were
moved to ice_lib.c:
ice_vsi_clear
ice_vsi_close
ice_vsi_free_arrays
ice_vsi_map_rings_to_vectors

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 4/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:07 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 4/7

This patch continues the code move out of ice_main.c

The following top level functions (and related dependency functions) were
moved to ice_lib.c:
ice_vsi_alloc_rings
ice_vsi_set_rss_params
ice_vsi_set_num_qs
ice_get_free_slot
ice_vsi_init
ice_vsi_clear_rings
ice_vsi_alloc_arrays

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 3/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:06 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 3/7

This patch continues the code move out of ice_main.c

The following top level functions (and related dependency functions) were
moved to ice_lib.c:
ice_vsi_delete
ice_free_res
ice_get_res
ice_is_reset_recovery_pending
ice_vsi_put_qs
ice_vsi_dis_irq
ice_vsi_free_irq
ice_vsi_free_rx_rings
ice_vsi_free_tx_rings
ice_msix_clean_rings

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 2/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:05 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 2/7

This patch continues the code move out of ice_main.c

The following top level functions (and related dependency functions) were
moved to ice_lib.c:
ice_vsi_start_rx_rings
ice_vsi_stop_rx_rings
ice_vsi_stop_tx_rings
ice_vsi_cfg_rxqs
ice_vsi_cfg_txqs
ice_vsi_cfg_msix

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Move common functions out of ice_main.c part 1/7
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:04 +0000 (17:23 -0700)]
ice: Move common functions out of ice_main.c part 1/7

The functions that are used for PF VSI/netdev setup will also be used
for SR-IOV support. To allow reuse of these functions, move these
functions out of ice_main.c to ice_common.c/ice_lib.c

This move is done across multiple patches. Each patch moves a few
functions and may have minor adjustments. For example, a function that was
previously static in ice_main.c will be made non-static temporarily in
its new location to allow the driver to build cleanly. These adjustments
will be removed in subsequent patches where more code is moved out of
ice_main.c

In this particular patch, the following functions were moved out of
ice_main.c:
ice_add_mac_to_list
ice_free_fltr_list
ice_stat_update40
ice_stat_update32
ice_update_eth_stats
ice_vsi_add_vlan
ice_vsi_kill_vlan
ice_vsi_manage_vlan_insertion
ice_vsi_manage_vlan_stripping

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agonet/mlx5: Cache the system image guid
Alaa Hleihel [Wed, 5 Sep 2018 14:06:37 +0000 (17:06 +0300)]
net/mlx5: Cache the system image guid

The system image guid is a read-only field which is used by the TC
offloads code to determine if two mlx5 devices belong to the same
ASIC while adding flows.

Read this once and save it on the core device rather than querying each
time an offloaded flow is added.
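
The caching idea can be sketched as follows. Every name here is
hypothetical (the query function merely simulates a device read and
returns a made-up value); the sketch only illustrates reading the value
once at init time and comparing cached copies afterwards.

```c
#include <stdint.h>

/* Counts how often the simulated device query runs. */
static unsigned int sketch_query_count;

/* Hypothetical stand-in for a device query; returns a made-up guid. */
static uint64_t sketch_query_sys_image_guid(void)
{
	sketch_query_count++;
	return 0x0002c90300abcdefULL;
}

struct sketch_core_dev {
	uint64_t sys_image_guid;	/* cached once, read-only afterwards */
};

/* Read the guid once and keep it on the device structure. */
void sketch_dev_init(struct sketch_core_dev *dev)
{
	dev->sys_image_guid = sketch_query_sys_image_guid();
}

/* Compare the cached guids; no device query is issued here. */
int sketch_same_hw(const struct sketch_core_dev *a,
		   const struct sketch_core_dev *b)
{
	return a->sys_image_guid == b->sys_image_guid;
}

/* Number of simulated device queries issued so far. */
unsigned int sketch_queries_issued(void)
{
	return sketch_query_count;
}
```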

Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Allow reporting of checksum unnecessary
Or Gerlitz [Sun, 1 Jul 2018 08:58:38 +0000 (08:58 +0000)]
net/mlx5e: Allow reporting of checksum unnecessary

Currently we practically never report checksum unnecessary, because
for all IP packets we take the checksum complete path.

Enable non-default runs with reporting checksum unnecessary, using
an ethtool private flag. This can be useful for performance evals
and other explorations.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Enable reporting checksum unnecessary also for L3 packets
Or Gerlitz [Sun, 1 Jul 2018 08:49:38 +0000 (08:49 +0000)]
net/mlx5e: Enable reporting checksum unnecessary also for L3 packets

We can report checksum unnecessary also when the L3 checksum
flag on the cqe is set and there's no L4 header.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Add ethtool control of ring params to VF representors
Gavi Teitz [Thu, 13 Sep 2018 11:40:25 +0000 (14:40 +0300)]
net/mlx5e: Add ethtool control of ring params to VF representors

Added ethtool control to the representors for setting and querying
the ring params.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
5 years agonet/mlx5e: Enable multi-queue and RSS for VF representors
Gavi Teitz [Wed, 12 Sep 2018 12:18:40 +0000 (15:18 +0300)]
net/mlx5e: Enable multi-queue and RSS for VF representors

Increased the number of channels the representors can open to be the
number of CPUs. The default number opened remains one.

Used the standard NIC netdev functions to:
* Set RSS params when building the representors' params.
* Setup an indirect TIR and RQT for the representors upon
  initialization.
* Create a TTC flow table for the representors' indirect TIR (when
  creating the TTC table, mlx5e_set_ttc_basic_params() is not called,
  in order to avoid setting the inner_ttc param, which is not needed).

Added ethtool control to the representors for setting and querying
the amount of open channels. Additionally, included logic in the
representors' ethtool set channels handler which controls a
representor's vport rx rule, so that if there is one open channel
the rx rule steers traffic to the representor's direct TIR, whereas
if there is more than one channel, the rx rule steers traffic to the
new TTC flow table.
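
The rule selection described in the last paragraph can be sketched as a
small decision function. The enum and function names are hypothetical
and only mirror the wording above, not the driver code.

```c
/* Where the representor's vport rx rule steers traffic (hypothetical names). */
enum sketch_rx_dest {
	SKETCH_DEST_DIRECT_TIR,	/* one open channel */
	SKETCH_DEST_TTC_TABLE,	/* more than one channel: RSS via the TTC table */
};

/* Pick the rx rule destination from the number of open channels. */
enum sketch_rx_dest sketch_rep_rx_dest(unsigned int num_channels)
{
	return num_channels > 1 ? SKETCH_DEST_TTC_TABLE
				: SKETCH_DEST_DIRECT_TIR;
}
```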

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Expose ethtool rss key size / indirection table functions
Or Gerlitz [Sun, 26 Aug 2018 09:53:51 +0000 (12:53 +0300)]
net/mlx5e: Expose ethtool rss key size / indirection table functions

Towards enabling RSS for the vport representors, expose the functions for
querying the rss hash key size and indirection table size via ethtool.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Expose function for building RSS params
Gavi Teitz [Sun, 19 Aug 2018 12:01:13 +0000 (15:01 +0300)]
net/mlx5e: Expose function for building RSS params

Towards enabling RSS for the vport representors, extract the
procedure for building a device's RSS params, and expose the
function.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Provide explicit directive if to create inner indirect tirs
Or Gerlitz [Tue, 28 Aug 2018 17:53:55 +0000 (20:53 +0300)]
net/mlx5e: Provide explicit directive if to create inner indirect tirs

Change the driver functions that deal with creating indirect tirs
to get a flag telling if inner ttc is desired.

A pre-step for enabling rss on the vport representors, where
inner ttc is not needed.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5: E-Switch, Provide flow dest when creating vport rx rule
Gavi Teitz [Thu, 16 Aug 2018 21:28:53 +0000 (00:28 +0300)]
net/mlx5: E-Switch, Provide flow dest when creating vport rx rule

Currently the destination for the representor e-switch rx rule is
a TIR number. Towards changing that to potentially be a flow table,
as part of enabling RSS for representors, modify the signature of
the related e-switch API to get a flow destination.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Extract creation of rep's default flow rule
Gavi Teitz [Sun, 19 Aug 2018 11:08:27 +0000 (14:08 +0300)]
net/mlx5e: Extract creation of rep's default flow rule

Cleaning up the flow of the representors' rx initialization, towards
enabling RSS for the representors.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Enable stateless offloads for VF representor netdevs
Gavi Teitz [Thu, 16 Aug 2018 12:12:27 +0000 (15:12 +0300)]
net/mlx5e: Enable stateless offloads for VF representor netdevs

Enabled checksum and TSO offloads for the representors, in
order to increase their performance, which is required to
increase the performance of flows that cannot be offloaded.

Checksum offloads contribute to a general acceleration of all
traffic (to around 150%), whereas the TSO offload contributes
to a prominent acceleration of the representor's TX for traffic
flows with larger than MTU sized packets (to around 200%). This
is the usual case for TCP streams, as the PF, which serves as
the uplink representor, and the VF representors employ GRO before
forwarding the packets to the representor.

GRO was enabled implicitly for the representors beforehand, and
is explicitly enabled here to ensure that the representors preserve
the performance boost it provides (of around 200%) when working in
tandem with the TSO offload by the forwardee, which is the standard
case as both the PF and the VF representors employ HW TSO.

The impact of these changes can be seen in the following
measurements taken on a setup of a VM over a VF, connected
to OVS via the VF representor, to an external host:

Before current changes:
                     TCP Throughput [Gb/s]
External host to VM         ~ 10.5
VM to external host         ~ 23.5

With just checksum offloads enabled:
                     TCP Throughput [Gb/s]
External host to VM         ~ 14.9
VM to external host         ~ 28.5

With the TSO offload also enabled:
                     TCP Throughput [Gb/s]
External host to VM         ~ 30.5

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Change VF representors' RQ type
Gavi Teitz [Thu, 16 Aug 2018 11:25:24 +0000 (14:25 +0300)]
net/mlx5e: Change VF representors' RQ type

The representors' RQ size was not large enough for them to achieve
high enough performance, and therefore needed to be enlarged, while
suffering a minimal hit to memory usage. To achieve this the
representors' RQ size was increased, and its type was changed to be a
striding RQ if it is supported.

Towards that goal the following changes were made:

* Extracted the sequence for setting the standard netdev's RQ params
  into a function

* Replaced the sequence for setting the representor's RQ params with
  the standard sequence

The impact of this change can be seen in the following measurements
taken on a setup of a VM over a VF, connected to OVS via the VF
representor, to an external host:

Before current change:
                     TCP Throughput [Gb/s]
VM to external host         ~  7.2

With the current change (measured with a striding RQ):
                     TCP Throughput [Gb/s]
VM to external host         ~ 23.5

Each representor now consumes 2 [MB] of memory for its packet
buffers.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agonet/mlx5e: Ethtool steering, Support masks for l3/l4 filters
Or Gerlitz [Thu, 16 Aug 2018 18:38:22 +0000 (21:38 +0300)]
net/mlx5e: Ethtool steering, Support masks for l3/l4 filters

Allow using partial masks for L3 addresses and L4 ports
throughout.
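
For readers unfamiliar with partial masks, the generic matching rule can
be sketched as below: only the bits selected by the mask take part in
the comparison. This is an illustration of masked matching in general
with a hypothetical function name, not code from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A packet field matches a filter when the field and the filter value
 * agree on every bit selected by the mask. A full mask (all ones) gives
 * an exact match; a zero mask matches anything.
 */
bool sketch_masked_match(uint32_t field, uint32_t value, uint32_t mask)
{
	return (field & mask) == (value & mask);
}
```

For example, value 0x0a010000 with mask 0xffff0000 matches any IPv4
address in 10.1.0.0/16, and a port value of 8080 with mask 0xfff0
matches ports 8080 through 8095.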

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
5 years agoopenvswitch: Use correct reply values in datapath and vport ops
Yifeng Sun [Wed, 26 Sep 2018 18:40:14 +0000 (11:40 -0700)]
openvswitch: Use correct reply values in datapath and vport ops

This patch fixes the bug that all datapath and vport ops are returning
wrong values (OVS_FLOW_CMD_NEW or OVS_DP_CMD_NEW) in their replies.

Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotls: Remove redundant vars from tls record structure
Vakul Garg [Wed, 26 Sep 2018 10:52:08 +0000 (16:22 +0530)]
tls: Remove redundant vars from tls record structure

Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point
to aad_space and then chain to the scatterlists sg_plaintext_data and
sg_encrypted_data respectively. Rather than using chained scatterlists
for plaintext and encrypted data in aead_req, it is more efficient to store
aad_space in sg_encrypted_data and sg_plaintext_data themselves, at the
first index, and get rid of sg_aead_in, sg_aead_out and further chaining.

This requires increasing the size of the sg_encrypted_data & sg_plaintext_data
arrays by 1 to accommodate an entry for aad_space. The code which uses
sg_encrypted_data and sg_plaintext_data has been modified to skip first
index as it points to aad_space.
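
The resulting layout can be sketched in user space as follows. The
struct and function names, the simplified scatter entry, the fragment
limit and the aad_space size are all assumptions for illustration; only
the idea (index 0 reserved for aad_space, data entries starting at
index 1) comes from the description above.

```c
#include <stddef.h>

#define SKETCH_MAX_FRAGS 4	/* hypothetical limit on data entries */
#define SKETCH_AAD_SIZE 13	/* hypothetical size of aad_space */

/* Simplified stand-in for a scatterlist entry. */
struct sketch_sg {
	const void *buf;
	size_t len;
};

struct sketch_rec {
	unsigned char aad_space[SKETCH_AAD_SIZE];
	/* One extra slot: index 0 is reserved for aad_space. */
	struct sketch_sg sg_plaintext_data[SKETCH_MAX_FRAGS + 1];
	unsigned int num_plaintext;	/* data entries, not counting index 0 */
};

/* Point the first entry at aad_space; no data entries yet. */
void sketch_rec_init(struct sketch_rec *rec)
{
	rec->sg_plaintext_data[0].buf = rec->aad_space;
	rec->sg_plaintext_data[0].len = sizeof(rec->aad_space);
	rec->num_plaintext = 0;
}

/* Append a data entry after the aad slot; returns -1 when full. */
int sketch_rec_add(struct sketch_rec *rec, const void *buf, size_t len)
{
	if (rec->num_plaintext >= SKETCH_MAX_FRAGS)
		return -1;
	rec->num_plaintext++;
	rec->sg_plaintext_data[rec->num_plaintext].buf = buf;
	rec->sg_plaintext_data[rec->num_plaintext].len = len;
	return 0;
}

/* Sum the data lengths, skipping index 0 because it holds aad_space. */
size_t sketch_rec_data_len(const struct sketch_rec *rec)
{
	size_t total = 0;
	unsigned int i;

	for (i = 1; i <= rec->num_plaintext; i++)
		total += rec->sg_plaintext_data[i].len;
	return total;
}
```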

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'tipc-next'
David S. Miller [Sat, 29 Sep 2018 18:24:22 +0000 (11:24 -0700)]
Merge branch 'tipc-next'

Jon Maloy says:

====================
tipc: make connection setup more robust

In this series we make a few improvements to the connection setup and
probing mechanism, culminating in the last commit where we make it
possible for a client socket to make multiple setup attempts in case
it encounters receive buffer overflow at the listener socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: buffer overflow handling in listener socket
Tung Nguyen [Fri, 28 Sep 2018 18:23:22 +0000 (20:23 +0200)]
tipc: buffer overflow handling in listener socket

Default socket receive buffer size for a listener socket is 2 MB. For
each arriving empty SYN, the Linux kernel allocates a 768-byte buffer.
This means that a listener socket can serve at most about 2700 simultaneous
empty connection setup requests before it hits a receive buffer
overflow, and far fewer if the SYN is carrying any significant
amount of data.

When this happens the setup request is rejected, and the client
receives an ECONNREFUSED error.

This commit mitigates this problem by letting the client socket try to
retransmit the SYN message multiple times when it sees it rejected with
the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals
in the range of [100 ms, setup_timeout / 4], as many times as there is
room for within the setup timeout limit.
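
The arithmetic and the retry interval can be sketched as below. The
function names are hypothetical, the 2 MB figure is assumed to mean
2 * 1024 * 1024 bytes, and the random number is passed in by the caller
so the sketch stays deterministic; none of this is the TIPC code itself.

```c
#include <stdint.h>

/* Lower bound of the retransmission interval, in milliseconds. */
#define SKETCH_MIN_RETRY_MS 100u

/* How many empty SYNs fit into the receive buffer. */
uint32_t sketch_max_empty_syns(uint32_t rcvbuf_bytes, uint32_t per_syn_bytes)
{
	return per_syn_bytes ? rcvbuf_bytes / per_syn_bytes : 0;
}

/*
 * Map a caller-supplied random number onto the closed range
 * [100 ms, setup_timeout / 4]. If the upper bound does not exceed
 * 100 ms, fall back to 100 ms.
 */
uint32_t sketch_retry_delay_ms(uint32_t setup_timeout_ms, uint32_t rnd)
{
	uint32_t max_ms = setup_timeout_ms / 4;

	if (max_ms <= SKETCH_MIN_RETRY_MS)
		return SKETCH_MIN_RETRY_MS;
	return SKETCH_MIN_RETRY_MS + rnd % (max_ms - SKETCH_MIN_RETRY_MS + 1);
}
```

Under these assumptions a 2 MB buffer holds 2730 buffers of 768 bytes,
which is consistent with the figure of roughly 2700 quoted above.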

Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: add SYN bit to connection setup messages
Jon Maloy [Fri, 28 Sep 2018 18:23:21 +0000 (20:23 +0200)]
tipc: add SYN bit to connection setup messages

Messages intended for initiating a connection are currently
indistinguishable from regular datagram messages. The TIPC
protocol specification defines bit 17 in word 0 as a SYN bit
to allow sanity check of such messages in the listening socket,
but this has so far never been implemented.

We do that in this commit.
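
A minimal sketch of setting and testing such a bit follows. It assumes
word 0 has already been converted to host byte order and that bit 17 is
counted from the least significant bit; both are assumptions, and the
function names are hypothetical rather than the TIPC helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* SYN bit position in header word 0, per the description above. */
#define SKETCH_SYN_BIT 17u

/* Return word 0 with the SYN bit set or cleared. */
uint32_t sketch_set_syn(uint32_t word0, bool syn)
{
	uint32_t mask = UINT32_C(1) << SKETCH_SYN_BIT;

	return syn ? (word0 | mask) : (word0 & ~mask);
}

/* Test the SYN bit, e.g. as a sanity check in a listening socket. */
bool sketch_is_syn(uint32_t word0)
{
	return (word0 >> SKETCH_SYN_BIT) & 1u;
}
```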

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: refactor function tipc_sk_filter_connect()
Jon Maloy [Fri, 28 Sep 2018 18:23:20 +0000 (20:23 +0200)]
tipc: refactor function tipc_sk_filter_connect()

We refactor the function tipc_sk_filter_connect(), both to make it
more readable and as a preparation for the next commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: refactor function tipc_sk_timeout()
Jon Maloy [Fri, 28 Sep 2018 18:23:19 +0000 (20:23 +0200)]
tipc: refactor function tipc_sk_timeout()

We refactor this function as a preparation for the coming commits in
the same series.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotipc: refactor function tipc_msg_reverse()
Jon Maloy [Fri, 28 Sep 2018 18:23:18 +0000 (20:23 +0200)]
tipc: refactor function tipc_msg_reverse()

The function tipc_msg_reverse() is reversing the header of a message
while reusing the original buffer. We have seen on several occasions
that this may have unfortunate side effects when the buffer to be
reversed is a clone.

In one of the following commits we will again need to reverse cloned
buffers, so this is the right time to permanently eliminate this
problem. In this commit we let the said function always consume the
original buffer and replace it with a new one when applicable.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agotcp: up initial rmem to 128KB and SYN rwin to around 64KB
Yuchung Cheng [Thu, 27 Sep 2018 18:21:19 +0000 (11:21 -0700)]
tcp: up initial rmem to 128KB and SYN rwin to around 64KB

Previously the TCP initial receive buffer was ~87KB by default and
the initial receive window was ~29KB (20 MSS). This patch changes
the two numbers to 128KB and ~64KB (rounding down to a multiple
of MSS) respectively. The patch also simplifies the calculations so that
the two numbers are directly controlled by sysctl tcp_rmem[1]:

  1) Initial receiver buffer budget (sk_rcvbuf): while this should
     be configured via sysctl tcp_rmem[1], previously tcp_fixup_rcvbuf()
     always overrode it and set a larger size when a new connection
     was established.

  2) Initial receive window in SYN: previously it was set to 20
     packets if MSS <= 1460. The number 20 was based on the initial
     congestion window of 10: the receiver needs twice that amount to
     avoid being limited by the receive window upon out-of-order
     delivery in the first window burst. But since this only
     applies if the receiving MSS <= 1460, connections using a large MTU
     (e.g. to utilize receiver zero-copy) may be limited by the
     receive window.
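
The numbers above can be reproduced with a short sketch. It assumes that
half of tcp_rmem[1] is advertised as the window (inferred from the
128KB and ~64KB figures, not stated in the message) and that the result
is rounded down to a multiple of MSS. The function names are
hypothetical.

```c
#include <stdint.h>

/* Old behaviour as described above: 20 packets when MSS <= 1460. */
uint32_t sketch_old_syn_rwin(uint32_t mss)
{
	return 20u * mss;
}

/*
 * New behaviour, sketched: advertise about half of the default receive
 * buffer (an assumption), rounded down to a multiple of MSS.
 */
uint32_t sketch_new_syn_rwin(uint32_t rmem_default, uint32_t mss)
{
	uint32_t space = rmem_default / 2;

	if (mss == 0)
		return space;
	return space - space % mss;
}
```

With an MSS of 1460 the old rule gives 29200 bytes (~29KB), and the
sketch of the new rule gives 64240 bytes for a 131072-byte tcp_rmem[1].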

With this patch TCP memory configuration is more straightforward and
more properly sized to modern high-speed networks by default. Several
popular stacks have been announcing 64KB rwin in SYNs as well.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohns3: Another build fix.
David S. Miller [Sat, 29 Sep 2018 18:21:06 +0000 (11:21 -0700)]
hns3: Another build fix.

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c: In function ‘hclge_get_sset_count’:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:496:31: error: ‘HNAE3_REVISION_ID_21’ undeclared (first use in this function); did you mean ‘FADT2_REVISION_ID’?
   if (hdev->pdev->revision >= HNAE3_REVISION_ID_21 ||
                               ^~~~~~~~~~~~~~~~~~~~
                               FADT2_REVISION_ID

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agohns3: Fix the build.
David S. Miller [Sat, 29 Sep 2018 17:49:58 +0000 (10:49 -0700)]
hns3: Fix the build.

drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c: In function ‘hns3_self_test’:
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c:278:15: error: ‘HNS3_SELF_TEST_TYPE_NUM’ undeclared (first use in this function); did you mean ‘HNS3_SELF_TEST_TPYE_NUM’?
  int st_param[HNS3_SELF_TEST_TYPE_NUM][2];
               ^~~~~~~~~~~~~~~~~~~~~~~
               HNS3_SELF_TEST_TPYE_NUM

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoice: fix changing of ring descriptor size (ethtool -G)
Bruce Allan [Thu, 20 Sep 2018 00:23:11 +0000 (17:23 -0700)]
ice: fix changing of ring descriptor size (ethtool -G)

rx_mini_pending was set to an incorrect value. This was causing EINVAL to
always be returned to 'ethtool -G'. The driver does not support mini or
jumbo rings so the respective settings should be zero.

Also, change the valid range of the number of descriptors in the rings to
make the code simpler and easier for users to understand (this removes the
valid settings of 8 and 16). Add a system log message indicating when the
number is rounded-up from what the user specifies with the 'ethtool -G'
command (i.e. when it is not a multiple of 32), and update the log message
when a user-provided value is out of range to also indicate the stride.
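
The rounding described above can be sketched as a round-up to the next
multiple of 32. The function and macro names are hypothetical, and the
valid range limits are omitted because the message does not state them.

```c
#include <stdint.h>

/* Descriptor count stride, per the description above. */
#define SKETCH_DESC_STRIDE 32u

/*
 * Round the requested descriptor count up to a multiple of the stride.
 * Range checking against the valid minimum and maximum is left out;
 * values close to UINT32_MAX would overflow in this sketch.
 */
uint32_t sketch_round_up_desc(uint32_t requested)
{
	return (requested + SKETCH_DESC_STRIDE - 1) / SKETCH_DESC_STRIDE *
	       SKETCH_DESC_STRIDE;
}
```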

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Update to capabilities admin queue command
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:10 +0000 (17:23 -0700)]
ice: Update to capabilities admin queue command

This patch makes a couple of changes in the way the driver uses the
"get capabilities" command.

1. Get device capabilities in addition to function capabilities

2. Align to latest spec by using cap_count to determine size of the
   buffer in case of length error.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Query the Tx scheduler node before adding it
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:09 +0000 (17:23 -0700)]
ice: Query the Tx scheduler node before adding it

Query the Tx scheduler tree node information from FW before adding it to
the driver's software database. This will keep the node information current
in driver.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Update comment for ice_fltr_mgmt_list_entry
Brett Creeley [Thu, 20 Sep 2018 00:23:08 +0000 (17:23 -0700)]
ice: Update comment for ice_fltr_mgmt_list_entry

Previously the comment stated that VSI lists should be used when a
second VSI becomes a subscriber to the "VLAN address". VSI lists
are always used for VLAN membership, so replace "VLAN address" with
"MAC address". Also note that VLAN(s) always use VSI list rules.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: update fw version check logic
Jacob Keller [Thu, 20 Sep 2018 00:23:07 +0000 (17:23 -0700)]
ice: update fw version check logic

We have MAX_FW_API_VER_BRANCH, MAX_FW_API_VER_MAJOR, and
MAX_FW_API_VER_MINOR that we use in ice_controlq.h to test when a
firmware version is newer than expected. This is currently tested by
comparing each field separately. Thus, we compare the branch field
against the MAX_FW_API_VER_BRANCH, and so forth.

This means that currently, if we suppose that the max firmware version
is defined as 0.2.1 (i.e. branch 0, major 2, minor 1), then firmware
0.1.3 will fail to load. This is because the minor version 3 is greater
than the max minor version 1.

This is not intuitive, because of the notion that increasing the major
firmware version to 2 should mean any firmware version with a major
version is less than 2 should be considered older than 2...

In order to allow both 0.2.1 and 0.1.3 to load, you would have to define
the "max" firmware version as 0.2.3.. It is possible that such
a firmware version doesn't even exist yet!

Fix this by replacing the current logic with an updated check that
behaves as follows:

First, we check the major version. If it is greater than the expected
version, then we prevent driver load. Additionally, a warning message is
logged to indicate to the system administrator that they need to update
their driver. This is now the only case where the driver will refuse to
load.

Second, if the major version is less than the expected version, we log
an information message indicating the NVM should be updated.

Third, if the major version matches exactly, we then check the minor
version. If the minor version is more than two versions less than
expected, we log an information message indicating the NVM should be
updated. If it is more than two versions greater than the expected
version, we log an information message that the driver should be
updated.

To support this, the ice_aq_ver_check function needs its signature
updated to pass the HW structure. Since we now pass this structure,
there is no need to pass the firmware API versions separately.
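
The three steps above can be sketched roughly as follows. This is a
userspace illustration only, not the driver's code: the struct, the
EXPECTED_* values and the function name are hypothetical stand-ins, and
the messages are placeholders.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical expected API version; the real values live in the driver. */
#define EXPECTED_API_MAJOR 1
#define EXPECTED_API_MINOR 3

/* Simplified stand-in for the HW structure passed to the check. */
struct hw_sketch {
	unsigned char api_maj_ver;
	unsigned char api_min_ver;
};

/* Returns false only when the firmware major version is newer than expected. */
static bool aq_ver_check_sketch(const struct hw_sketch *hw)
{
	if (hw->api_maj_ver > EXPECTED_API_MAJOR) {
		/* The only case that blocks driver load. */
		fprintf(stderr, "warning: firmware is newer, update the driver\n");
		return false;
	}
	if (hw->api_maj_ver < EXPECTED_API_MAJOR) {
		printf("info: the NVM should be updated\n");
		return true;
	}
	/* Major matches: only inform when the minor differs by more than two. */
	if (hw->api_min_ver + 2 < EXPECTED_API_MINOR)
		printf("info: the NVM should be updated\n");
	else if (hw->api_min_ver > EXPECTED_API_MINOR + 2)
		printf("info: the driver should be updated\n");
	return true;
}
```

With these assumed values, firmware 2.0 is rejected, while 0.9 and 1.9
both load and only produce an informational message.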

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years ago ice: update branding strings and supported device ids
Bruce Allan [Thu, 20 Sep 2018 00:23:06 +0000 (17:23 -0700)]
ice: update branding strings and supported device ids

Update branding strings and remove device ids 0x1594 and 0x1595.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years ago ice: replace unnecessary memcpy with direct assignment
Bruce Allan [Thu, 20 Sep 2018 00:23:05 +0000 (17:23 -0700)]
ice: replace unnecessary memcpy with direct assignment

Direct assignment is preferred over a memcpy().

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years ago ice: use [sr]q.count when checking if queue is initialized
Jacob Keller [Thu, 20 Sep 2018 00:23:04 +0000 (17:23 -0700)]
ice: use [sr]q.count when checking if queue is initialized

When shutting down the controlqs, we check if they are initialized
before we shut them down and destroy the lock. This is important, as it
prevents attempts to access the lock of an already shutdown queue.

Unfortunately, we checked rq.head and sq.head as the value to determine
if the queue was initialized. This doesn't work, because head is not
reset when the queue is shut down. In some flows, the adminq will have
already been shut down prior to calling ice_shutdown_all_ctrlqs. This
can result in a crash due to attempting to access the already destroyed
mutex.

Fix this by using rq.count and sq.count instead. Indeed, ice_shutdown_sq
and ice_shutdown_rq already indicate that this is the value we should be
using to determine if the queue was initialized.
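
A minimal userspace sketch of why count is the safer sentinel, assuming
a simplified ring structure. The struct, its fields and both functions
are hypothetical and only model the behaviour described above.

```c
#include <stdbool.h>

/* Simplified model of one control queue ring. */
struct ring_sketch {
	unsigned int head;   /* register offset; NOT cleared on shutdown */
	unsigned int count;  /* number of descriptors; zero when not initialized */
	bool lock_valid;     /* models whether the mutex still exists */
};

static void ring_init_sketch(struct ring_sketch *r, unsigned int head,
			     unsigned int count)
{
	r->head = head;
	r->count = count;
	r->lock_valid = true;
}

/* Returns true if the ring was torn down, false if it was already down. */
static bool ring_shutdown_sketch(struct ring_sketch *r)
{
	if (!r->count)
		return false;        /* already shut down: do not touch the lock */

	r->count = 0;                /* marks the ring as uninitialized */
	r->lock_valid = false;       /* the mutex is destroyed here */
	return true;
}
```

Calling ring_shutdown_sketch() twice is harmless in this model: head is
still non-zero after the first call, but count is zero, so the second
call returns early.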

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years ago qed: fix spelling mistake "b_cb_registred" -> "b_cb_registered"
Colin Ian King [Thu, 27 Sep 2018 17:04:54 +0000 (18:04 +0100)]
qed: fix spelling mistake "b_cb_registred" -> "b_cb_registered"

Trivial fix to a spelling mistake in a struct field name; rename it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next...
David S. Miller [Fri, 28 Sep 2018 18:09:02 +0000 (11:09 -0700)]
Merge branch '100GbE' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-09-27

This series contains fixes to the ice driver only.

Jake fixes a potential crash due to attempting to access a mutex which
is already destroyed, by using rq.count and sq.count to determine if the
queue was initialized.  He also fixes the current logic for checking the
firmware version to properly handle situations when firmware major/minor
versions differ and when the branch version differs.

Bruce replaces a memcpy() with a direct assignment, which is preferred.
He also updates the branding strings and device ids supported by the
driver, and fixes the "ethtool -G" command in the driver, which was
always returning EINVAL when changing the descriptor ring size.

Brett updates and clarifies code comments.

Anirudh updates the driver to ensure we query the firmware for the
transmit scheduler node information before adding it to the driver
database, to ensure we have the current information.  He also updates the
"get capabilities" command to get device and function capabilities.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: sched: make function qdisc_free_cb() static
Wei Yongjun [Thu, 27 Sep 2018 14:47:56 +0000 (14:47 +0000)]
net: sched: make function qdisc_free_cb() static

Fixes the following sparse warning:

net/sched/sch_generic.c:944:6: warning:
 symbol 'qdisc_free_cb' was not declared. Should it be static?

Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago selftests: forwarding: test for bridge sticky flag
Nikolay Aleksandrov [Thu, 27 Sep 2018 13:35:13 +0000 (16:35 +0300)]
selftests: forwarding: test for bridge sticky flag

This test adds an fdb entry with the sticky flag and sends traffic from
a different port with the same mac as the source address, expecting the
entry not to change ports if the flag is operating correctly.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: bridge: explicitly zero is_sticky in fdb_create
Nikolay Aleksandrov [Thu, 27 Sep 2018 12:05:10 +0000 (15:05 +0300)]
net: bridge: explicitly zero is_sticky in fdb_create

We need to explicitly zero is_sticky when creating a new fdb, otherwise
we might get a stale value for a new entry.
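
As a userspace illustration of the problem, assume an allocator that
does not zero memory (malloc() here, standing in for a non-zeroing
kernel allocation). The struct and function names are hypothetical;
only the is_sticky flag comes from the commit.

```c
#include <stdlib.h>

/* Trimmed-down stand-in for an fdb entry with a few flag bits. */
struct fdb_sketch {
	unsigned char is_local:1,
		      is_static:1,
		      is_sticky:1;
};

/* Every flag is set explicitly, since the allocation is not zeroed. */
static struct fdb_sketch *fdb_create_sketch(void)
{
	struct fdb_sketch *fdb = malloc(sizeof(*fdb));

	if (fdb) {
		fdb->is_local = 0;
		fdb->is_static = 0;
		fdb->is_sticky = 0;  /* without this, the bit may hold stale data */
	}
	return fdb;
}
```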

Fixes: 435f2e7cc0b7 ("net: bridge: add support for sticky fdb entries")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'hns3-next'
David S. Miller [Fri, 28 Sep 2018 17:37:42 +0000 (10:37 -0700)]
Merge branch 'hns3-next'

Salil Mehta says:

====================
Cleanups, minor additions & fixes for HNS3 driver

This patch-set contains clean-ups, minor changes and fixes to the HNS3 driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Fix loss of coal configuration while doing reset
Huazhong Tan [Wed, 26 Sep 2018 18:28:40 +0000 (19:28 +0100)]
net: hns3: Fix loss of coal configuration while doing reset

The user's coal configuration will be lost after reset, so the tx_coal
and rx_coal fields are added to the struct hns_nic_priv to save the coal
configuration and used to restore the user's configuration after the reset
is complete.

Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Modify hns3_get_max_available_channels
Huazhong Tan [Wed, 26 Sep 2018 18:28:39 +0000 (19:28 +0100)]
net: hns3: Modify hns3_get_max_available_channels

The current hns3_get_max_available_channels returns the total number
of queues for the device, which makes ethtool -L set the number of queues
per channel incorrectly. hns3_get_max_available_channels should instead
return the maximum available number of queues per channel, depending on
the total number of queues allocated and the hardware configuration.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Change return type of hclge_tm_schd_info_update()
Huazhong Tan [Wed, 26 Sep 2018 18:28:38 +0000 (19:28 +0100)]
net: hns3: Change return type of hclge_tm_schd_info_update()

hclge_tm_schd_info_update should return an error when num_tc is greater
than alloc_tqps.

This patch changes the return type of hclge_tm_schd_info_update from void
to int.
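
The check described above could look roughly like this. It is a sketch
with a hypothetical struct and function name; only num_tc and
alloc_tqps are taken from the commit message.

```c
#include <errno.h>

/* Hypothetical, trimmed-down view of the state involved. */
struct tm_sketch {
	unsigned int num_tc;      /* traffic classes currently configured */
	unsigned int alloc_tqps;  /* queue pairs allocated to the device */
};

/* Returns 0 on success, -EINVAL when there are more TCs than queue pairs. */
static int tm_schd_info_update_sketch(struct tm_sketch *tm, unsigned int num_tc)
{
	if (num_tc > tm->alloc_tqps)
		return -EINVAL;   /* a void function could not report this */

	tm->num_tc = num_tc;
	return 0;
}
```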

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Fix for netdev not up problem when setting mtu
Yunsheng Lin [Wed, 26 Sep 2018 18:28:37 +0000 (19:28 +0100)]
net: hns3: Fix for netdev not up problem when setting mtu

Currently hns3_nic_change_mtu brings the netdev down before setting
the mtu, but does not bring it back up when the setting fails, which
leaves the netdev down.

This patch fixes it by not returning early when the setting fails.
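
The control flow of the fix can be sketched in userspace as follows.
All names and the MTU limit are hypothetical; the point is only that
the device is brought back up regardless of the result.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_MTU_SKETCH 9000  /* hypothetical hardware limit */

struct netdev_sketch {
	bool running;
	int mtu;
};

/* Stand-in for the hardware call; fails for an unsupported MTU. */
static int hw_set_mtu_sketch(struct netdev_sketch *nd, int new_mtu)
{
	if (new_mtu <= 0 || new_mtu > MAX_MTU_SKETCH)
		return -EINVAL;
	nd->mtu = new_mtu;
	return 0;
}

static int change_mtu_sketch(struct netdev_sketch *nd, int new_mtu)
{
	bool was_running = nd->running;
	int ret;

	if (was_running)
		nd->running = false;  /* take the device down first */

	ret = hw_set_mtu_sketch(nd, new_mtu);

	/* No early return on failure: restore the running state either way. */
	if (was_running)
		nd->running = true;

	return ret;
}
```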

Fixes: a8e8b7ff3517 ("net: hns3: Add support to change MTU in HNS3 hardware")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Fix for packet buffer setting bug
Yunsheng Lin [Wed, 26 Sep 2018 18:28:36 +0000 (19:28 +0100)]
net: hns3: Fix for packet buffer setting bug

The hardware expects a unit of 128 bytes when setting the
packet buffer. When calculating the packet buffer size,
hclge_rx_buffer_calc does not round the size up to a multiple
of 128 bytes, which may cause packet loss during stress
testing.

This patch fixes it by rounding up the packet buffer size when
calculating it.
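
The rounding itself is plain integer arithmetic. A small sketch, with a
hypothetical macro and helper name:

```c
#include <stdint.h>

#define BUF_SIZE_UNIT_SKETCH 128u  /* hardware granularity in bytes */

/* Round size up to the next multiple of 128 bytes. */
static uint32_t buf_round_up_sketch(uint32_t size)
{
	return (size + BUF_SIZE_UNIT_SKETCH - 1) / BUF_SIZE_UNIT_SKETCH *
	       BUF_SIZE_UNIT_SKETCH;
}
```

For example, a computed size of 129 bytes becomes 256, while a size
that is already a multiple of 128 is left unchanged.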

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Add serdes parallel inner loopback support
Fuyun Liang [Wed, 26 Sep 2018 18:28:35 +0000 (19:28 +0100)]
net: hns3: Add serdes parallel inner loopback support

This patch adds serdes parallel inner loopback support for self test.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Rename mac loopback to app loopback
Fuyun Liang [Wed, 26 Sep 2018 18:28:34 +0000 (19:28 +0100)]
net: hns3: Rename mac loopback to app loopback

Our current implementation of mac loopback is in fact an implementation
of app loopback, so the current name is wrong. This patch renames mac
loopback to app loopback.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Rename loop mode
Fuyun Liang [Wed, 26 Sep 2018 18:28:33 +0000 (19:28 +0100)]
net: hns3: Rename loop mode

Our loop mode includes mac loop, serdes loop and phy loop. Not all of them
are related to mac. This patch corrects their names.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Set extra mac address of pause param for HW
Fuyun Liang [Wed, 26 Sep 2018 18:28:32 +0000 (19:28 +0100)]
net: hns3: Set extra mac address of pause param for HW

The extra mac address of the pause param is used to double-check
pause frames. This patch sets it in HW. If we do not do that,
pfc pause frames will be passed up to the protocol stack when normal
flow control mode is enabled.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: hns3: Add support for sctp checksum offload
Peng Li [Wed, 26 Sep 2018 18:28:31 +0000 (19:28 +0100)]
net: hns3: Add support for sctp checksum offload

This patch adds support for sctp checksum offload.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: dsa: b53: Fix build with B53_SRAB enabled and B53_SERDES=m
Arnd Bergmann [Thu, 27 Sep 2018 10:02:38 +0000 (12:02 +0200)]
net: dsa: b53: Fix build with B53_SRAB enabled and B53_SERDES=m

When B53_SERDES is a loadable module, a built-in srab driver still
cannot reach it, so the previous fix is incomplete:

b53_srab.c:(.text+0x3f4): undefined reference to `b53_serdes_init'
drivers/net/dsa/b53/b53_srab.o:(.rodata+0xe64): undefined reference to `b53_serdes_link_state'
drivers/net/dsa/b53/b53_srab.o:(.rodata+0xe74): undefined reference to `b53_serdes_link_set'
drivers/net/dsa/b53/b53_srab.o:(.rodata+0xe88): undefined reference to `b53_serdes_an_restart'
drivers/net/dsa/b53/b53_srab.o:(.rodata+0xea0): undefined reference to `b53_serdes_phylink_validate'
drivers/net/dsa/b53/b53_srab.o:(.rodata+0xea4): undefined reference to `b53_serdes_config'

Add a Kconfig dependency that forces srab to also be a module
in this case, but allow it to be built-in when serdes is
disabled or built-in.

Fixes: 7a8c7f5c30f9 ("net: dsa: b53: Fix build with B53_SRAB enabled and not B53_SERDES")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: phy: mdio-bcm-unimac: mark PM functions as __maybe_unused
Arnd Bergmann [Wed, 26 Sep 2018 13:14:10 +0000 (15:14 +0200)]
net: phy: mdio-bcm-unimac: mark PM functions as __maybe_unused

The newly added runtime-pm support causes a harmless warning
when CONFIG_PM is disabled:

drivers/net/phy/mdio-bcm-unimac.c:330:12: error: 'unimac_mdio_resume' defined but not used [-Werror=unused-function]
 static int unimac_mdio_resume(struct device *d)
drivers/net/phy/mdio-bcm-unimac.c:321:12: error: 'unimac_mdio_suspend' defined but not used [-Werror=unused-function]
 static int unimac_mdio_suspend(struct device *d)

Marking the functions as __maybe_unused is the easiest workaround
and avoids adding #ifdef checks.
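
A userspace sketch of the pattern, assuming a GCC-compatible compiler.
The macro definition mirrors what the kernel attribute expands to, and
the struct, function names and CONFIG_PM_SKETCH switch are hypothetical.

```c
/* Userspace stand-in for the kernel's __maybe_unused annotation. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

struct device_sketch {
	int suspended;
};

/* Annotated so the compiler stays quiet when nothing references them. */
static int __maybe_unused sketch_suspend(struct device_sketch *d)
{
	d->suspended = 1;
	return 0;
}

static int __maybe_unused sketch_resume(struct device_sketch *d)
{
	d->suspended = 0;
	return 0;
}

#ifdef CONFIG_PM_SKETCH
/* Only this table references the callbacks; it vanishes without PM. */
static int (*const pm_ops_sketch[])(struct device_sketch *) = {
	sketch_suspend,
	sketch_resume,
};
#endif
```

The function bodies stay outside any #ifdef, so they are always
compiled and type-checked even when the table is configured out.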

Fixes: b78ac6ecd1b6 ("net: phy: mdio-bcm-unimac: Allow configuring MDIO clock divider")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago phy: mscc: fix printf format
Arnd Bergmann [Wed, 26 Sep 2018 13:20:11 +0000 (15:20 +0200)]
phy: mscc: fix printf format

gcc points out that the length of the temporary buffer may not be sufficient for
large numbers of leds:

drivers/net/phy/mscc.c: In function 'vsc85xx_probe':
drivers/net/phy/mscc.c:460:45: error: '-mode' directive writing 5 bytes into a region of size between 0 and 9 [-Werror=format-overflow=]
   ret = sprintf(led_dt_prop, "vsc8531,led-%d-mode", i);
                                             ^~~~~
drivers/net/phy/mscc.c:460:9: note: 'sprintf' output between 19 and 28 bytes into a destination of size 22
   ret = sprintf(led_dt_prop, "vsc8531,led-%d-mode", i);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While we can make a reasonable assumption that the number of LEDs is small,
the cost of making the buffer a little bigger is insignificant as well.
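
For illustration, the same formatting can be done with a bounded call
that reports truncation. This is a sketch of a defensive variant, not
the change made by the patch; the helper name is hypothetical and only
the format string is taken from the warning above.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the LED property name into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int led_prop_name_sketch(char *buf, size_t len, int led)
{
	int n = snprintf(buf, len, "vsc8531,led-%d-mode", led);

	if (n < 0 || (size_t)n >= len)
		return -1;  /* output would not fit, including the NUL */
	return n;
}
```

The fixed part of the string is 17 characters, so a 22-byte buffer
leaves room for at most four digits plus the terminating NUL.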

Fixes: 11bfdabb7ff5 ("net: phy: mscc: factorize code for LEDs mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ethernet: dpaa: remove unused variables
Arnd Bergmann [Wed, 26 Sep 2018 13:12:13 +0000 (15:12 +0200)]
net: ethernet: dpaa: remove unused variables

The patch that removed the only users of the oldadv/newadv variables
accidentally left the now-unused declarations behind:

drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c: In function 'dpaa_set_pauseparam':
drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c:185:14: error: unused variable 'oldadv' [-Werror=unused-variable]
drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c:185:6: error: unused variable 'newadv' [-Werror=unused-variable]

Fixes: 70814e819c11 ("net: ethernet: Add helper for set_pauseparam for Asym Pause")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: aquantia: Make function aq_fw1x_set_power() static
Wei Yongjun [Wed, 26 Sep 2018 12:20:00 +0000 (12:20 +0000)]
net: aquantia: Make function aq_fw1x_set_power() static

Fixes the following sparse warning:

drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:873:5: warning:
 symbol 'aq_fw1x_set_power' was not declared. Should it be static?

Fixes: a0da96c08cfa ("net: aquantia: implement WOL support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net/tls: Make function get_rec() static
Wei Yongjun [Wed, 26 Sep 2018 12:10:48 +0000 (12:10 +0000)]
net/tls: Make function get_rec() static

Fixes the following sparse warning:

net/tls/tls_sw.c:655:16: warning:
 symbol 'get_rec' was not declared. Should it be static?

Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net/core: make function ___gnet_stats_copy_basic() static
Wei Yongjun [Wed, 26 Sep 2018 12:09:45 +0000 (12:09 +0000)]
net/core: make function ___gnet_stats_copy_basic() static

Fixes the following sparse warning:

net/core/gen_stats.c:166:1: warning:
 symbol '___gnet_stats_copy_basic' was not declared. Should it be static?

Fixes: 5e111210a443 ("net/core: Add new basic hardware counter")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: xen-netback: fix return type of ndo_start_xmit function
YueHaibing [Wed, 26 Sep 2018 09:18:14 +0000 (17:18 +0800)]
net: xen-netback: fix return type of ndo_start_xmit function

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
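
A userspace sketch of the pattern, using stand-in types that mirror the
shape of the kernel's enum typedef. Every name ending in _sketch is
hypothetical.

```c
/* Stand-ins mirroring the shape of the kernel's transmit status type. */
enum netdev_tx_sketch {
	TX_OK_SKETCH = 0x00,    /* packet accepted */
	TX_BUSY_SKETCH = 0x10,  /* queue full, try again later */
};
typedef enum netdev_tx_sketch netdev_tx_sketch_t;

struct pkt_sketch {
	int len;
};

/* The callback slot is typed with the enum typedef, not int. */
struct ops_sketch {
	netdev_tx_sketch_t (*start_xmit)(struct pkt_sketch *pkt, int queue_free);
};

/* Implementation whose return type matches the slot exactly. */
static netdev_tx_sketch_t xmit_sketch(struct pkt_sketch *pkt, int queue_free)
{
	if (!queue_free)
		return TX_BUSY_SKETCH;

	pkt->len = 0;  /* pretend the packet was consumed */
	return TX_OK_SKETCH;
}

static const struct ops_sketch ops_sketch_table = {
	.start_xmit = xmit_sketch,
};
```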

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago qed: Remove set but not used variable 'p_archipelago'
YueHaibing [Thu, 27 Sep 2018 06:45:06 +0000 (06:45 +0000)]
qed: Remove set but not used variable 'p_archipelago'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/qlogic/qed/qed_ooo.c: In function 'qed_ooo_delete_isles':
drivers/net/ethernet/qlogic/qed/qed_ooo.c:354:30: warning:
 variable 'p_archipelago' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/qlogic/qed/qed_ooo.c: In function 'qed_ooo_join_isles':
drivers/net/ethernet/qlogic/qed/qed_ooo.c:463:30: warning:
 variable 'p_archipelago' set but not used [-Wunused-but-set-variable]

Since commit 1eec2437d14c ("qed: Make OOO archipelagos into an array"),
'p_archipelago' is no longer in use.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: ovs: fix return type of ndo_start_xmit function
YueHaibing [Wed, 26 Sep 2018 09:15:38 +0000 (17:15 +0800)]
net: ovs: fix return type of ndo_start_xmit function

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'netlink-nested-policy-validation'
David S. Miller [Fri, 28 Sep 2018 17:24:48 +0000 (10:24 -0700)]
Merge branch 'netlink-nested-policy-validation'

Johannes Berg says:

====================
netlink: nested policy validation

This adds nested policy validation, which lets you specify the
nested attribute type, e.g. NLA_NESTED with sub-policy, or the
new NLA_NESTED_ARRAY with sub-sub-policy.

Changes in v2:
 * move setting the bad attr pointer/message into validate_nla()
 * remove the recursion patch since that's no longer needed
 * simply skip the generic bad attr pointer/message setting in
   case of nested nla_validate() failing since that could fail
   only due to validate_nla() failing inside, which already sets
   the extack information

Changes in v3:
 * fix NLA_REJECT to have an error message if none is in policy
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago netlink: add nested array policy validation
Johannes Berg [Wed, 26 Sep 2018 09:15:34 +0000 (11:15 +0200)]
netlink: add nested array policy validation

Sometimes nested netlink attributes are just used as arrays, with
the nla_type() of each not being used; we have this in nl80211 and
e.g. NFTA_SET_ELEM_LIST_ELEMENTS.

Add the ability to validate this type of message directly in the
policy, by adding the type NLA_NESTED_ARRAY which does exactly
this: require a first level of nesting but ignore the attribute
type, and then inside each require a second level of nesting and
validate those attributes against a given policy (if present).

Note that some nested array types actually require that all of
the entries have the same index. This can already be expressed in
a nested policy, apart from the validation that only the one
allowed type is used.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago netlink: allow NLA_NESTED to specify nested policy to validate
Johannes Berg [Wed, 26 Sep 2018 09:15:33 +0000 (11:15 +0200)]
netlink: allow NLA_NESTED to specify nested policy to validate

Now that we have a validation_data pointer, and the len field in
the policy is unused for NLA_NESTED, we can allow using them both
to have nested validation. This can be nice in code, although we
still have to use nla_parse_nested() or similar which would also
take a policy; however, it also serves as documentation in the
policy without requiring a look at the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago netlink: move extack setting into validate_nla()
Johannes Berg [Wed, 26 Sep 2018 09:15:32 +0000 (11:15 +0200)]
netlink: move extack setting into validate_nla()

This unifies the code between nla_parse() which sets the bad
attribute pointer and an error message, and nla_validate()
which only sets the bad attribute pointer.

It also cleans up the code for NLA_REJECT and paves the way
for nested policy validation, as it will allow us to easily
skip setting the "generic" message without any extra args
like the current **error_msg; just passing the extack through
is now enough.

While at it, remove the unnecessary label in nla_parse().

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago netlink: make validation_data const
Johannes Berg [Wed, 26 Sep 2018 09:15:31 +0000 (11:15 +0200)]
netlink: make validation_data const

The validation data is only used within the policy that
should usually already be const, and isn't changed in any
code that uses it. Therefore, make the validation_data
pointer const.

While at it, remove the duplicate variable in the bitfield
validation that I'd otherwise have to change to const.

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago netlink: remove NLA_NESTED_COMPAT
Johannes Berg [Wed, 26 Sep 2018 09:15:30 +0000 (11:15 +0200)]
netlink: remove NLA_NESTED_COMPAT

This isn't used anywhere, so we might as well get rid of it.

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago ice: fix changing of ring descriptor size (ethtool -G)
Bruce Allan [Thu, 20 Sep 2018 00:23:11 +0000 (17:23 -0700)]
ice: fix changing of ring descriptor size (ethtool -G)

rx_mini_pending was set to an incorrect value. This was causing EINVAL to
always be returned to 'ethtool -G'. The driver does not support mini or
jumbo rings so the respective settings should be zero.

Also, change the valid range of the number of descriptors in the rings to
make the code simpler and easier for users to understand (this removes the
valid settings of 8 and 16). Add a system log message indicating when the
number is rounded-up from what the user specifies with the 'ethtool -G'
command (i.e. when it is not a multiple of 32), and update the log message
when a user-provided value is out of range to also indicate the stride.
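
The range check and rounding described above can be sketched as
follows. The minimum and maximum values and all names are assumptions
made for illustration; only the stride of 32 comes from the commit
message.

```c
#include <errno.h>
#include <stdio.h>

#define DESC_STRIDE_SKETCH 32u    /* ring sizes are multiples of 32 */
#define DESC_MIN_SKETCH    32u    /* hypothetical lower bound */
#define DESC_MAX_SKETCH    4096u  /* hypothetical upper bound */

/* Validate a requested ring size and round it up to the stride. */
static int ring_size_sketch(unsigned int requested, unsigned int *result)
{
	unsigned int rounded;

	if (requested < DESC_MIN_SKETCH || requested > DESC_MAX_SKETCH) {
		fprintf(stderr, "descriptors requested (%u) out of range [%u-%u] (increment %u)\n",
			requested, DESC_MIN_SKETCH, DESC_MAX_SKETCH,
			DESC_STRIDE_SKETCH);
		return -EINVAL;
	}

	rounded = (requested + DESC_STRIDE_SKETCH - 1) / DESC_STRIDE_SKETCH *
		  DESC_STRIDE_SKETCH;
	if (rounded != requested)
		printf("requested %u descriptors rounded up to %u\n",
		       requested, rounded);

	*result = rounded;
	return 0;
}
```

Under these assumed limits, a request for 100 descriptors is rounded up
to 128 with a log message, and a request for 16 is rejected.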

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years ago ice: Update to capabilities admin queue command
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:10 +0000 (17:23 -0700)]
ice: Update to capabilities admin queue command

This patch makes a couple of changes in the way the driver uses the
"get capabilities" command.

1. Get device capabilities in addition to function capabilities

2. Align to latest spec by using cap_count to determine size of the
   buffer in case of length error.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Query the Tx scheduler node before adding it
Anirudh Venkataramanan [Thu, 20 Sep 2018 00:23:09 +0000 (17:23 -0700)]
ice: Query the Tx scheduler node before adding it

Query the Tx scheduler tree node information from FW before adding it to
the driver's software database. This will keep the node information current
in driver.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: Update comment for ice_fltr_mgmt_list_entry
Brett Creeley [Thu, 20 Sep 2018 00:23:08 +0000 (17:23 -0700)]
ice: Update comment for ice_fltr_mgmt_list_entry

Previously the comment stated that VSI lists should be used when a
second VSI becomes a subscriber to the "VLAN address". VSI lists
are always used for VLAN membership, so replace "VLAN address" with
"MAC address". Also note that VLAN(s) always use VSI list rules.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: update fw version check logic
Jacob Keller [Thu, 20 Sep 2018 00:23:07 +0000 (17:23 -0700)]
ice: update fw version check logic

We have MAX_FW_API_VER_BRANCH, MAX_FW_API_VER_MAJOR, and
MAX_FW_API_VER_MINOR that we use in ice_controlq.h to test when a
firmware version is newer than expected. This is currently tested by
comparing each field separately. Thus, we compare the branch field
against the MAX_FW_API_VER_BRANCH, and so forth.

This means that currently, if we suppose that the max firmware version
is defined as 0.2.1, i.e.

Then firmware 0.1.3 will fail to load. This is because the minor version
3 is greater than the max minor version 1.

This is not intuitive, because of the notion that increasing the major
firmware version to 2 should mean any firmware version with a major
version is less than 2 should be considered older than 2...

In order to allow both 0.2.1 and 0.1.3 to load, you would have to define
the "max" firmware version as 0.2.3.. It is possible that such
a firmware version doesn't even exist yet!

Fix this by replacing the current logic with an updated check that
behaves as follows:

First, we check the major version. If it is greater than the expected
version, then we prevent driver load. Additionally, a warning message is
logged to indicate to the system administrator that they need to update
their driver. This is now the only case where the driver will refuse to
load.

Second, if the major version is less than the expected version, we log
an information message indicating the NVM should be updated.

Third, if the major version is exact, we'll then check the minor
version. If the minor version is more than two versions less than
expected, we log an information message indicating the NVM should be
updated. If it is more than two versions greater than the expected
version, we log an information message that the driver should be
updated.

To support this, the ice_aq_ver_check function needs its signature
updated to pass the HW structure. Since we now pass this structure,
there is no need to pass the firmware API versions separately.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: update branding strings and supported device ids
Bruce Allan [Thu, 20 Sep 2018 00:23:06 +0000 (17:23 -0700)]
ice: update branding strings and supported device ids

Update branding strings and remove device ids 0x1594 and 0x1595.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: replace unnecessary memcpy with direct assignment
Bruce Allan [Thu, 20 Sep 2018 00:23:05 +0000 (17:23 -0700)]
ice: replace unnecessary memcpy with direct assignment

Direct assignment is preferred over a memcpy()

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years agoice: use [sr]q.count when checking if queue is initialized
Jacob Keller [Thu, 20 Sep 2018 00:23:04 +0000 (17:23 -0700)]
ice: use [sr]q.count when checking if queue is initialized

When shutting down the controlqs, we check if they are initialized
before we shut them down and destroy the lock. This is important, as it
prevents attempts to access the lock of an already shutdown queue.

Unfortunately, we checked rq.head and sq.head as the value to determine
if the queue was initialized. This doesn't work, because head is not
reset when the queue is shut down. In some flows, the adminq will have
already been shut down prior to calling ice_shutdown_all_ctrlqs. This
can result in a crash due to attempting to access the already destroyed
mutex.

Fix this by using rq.count and sq.count instead. Indeed, ice_shutdown_sq
and ice_shutdown_rq already indicate that this is the value we should be
using to determine if the queue was initialized.
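
A minimal standalone sketch of the idea, assuming a stripped-down
queue: struct ring, its fields, and the helper names are hypothetical
stand-ins, and the lock_valid flag merely models the mutex being
alive. It is not the driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down control queue ring. */
struct ring {
	unsigned int head;	/* not reset on shutdown */
	unsigned int count;	/* zero means "not initialized" */
	bool lock_valid;	/* stands in for the queue mutex */
};

static void ring_init(struct ring *r, unsigned int head,
		      unsigned int count)
{
	r->head = head;
	r->count = count;
	r->lock_valid = true;
}

static void ring_shutdown(struct ring *r)
{
	/* Touching a destroyed lock is the crash described above. */
	assert(r->lock_valid);
	r->count = 0;		/* marks the ring as uninitialized */
	r->lock_valid = false;
	/* head is deliberately left alone, as in the buggy flow */
}

/* Test count, not head: head survives a shutdown, count does not. */
static bool ring_initialized(const struct ring *r)
{
	return r->count != 0;
}

static void ring_shutdown_if_initialized(struct ring *r)
{
	if (ring_initialized(r))
		ring_shutdown(r);
}
```

With this check a second shutdown request is a no-op. A test on head
would still see a nonzero value and reach the destroyed lock.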

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
5 years ago net-ipv4: remove 2 always zero parameters from ipv4_redirect()
Maciej Żenczykowski [Wed, 26 Sep 2018 03:56:27 +0000 (20:56 -0700)]
net-ipv4: remove 2 always zero parameters from ipv4_redirect()

(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()
Maciej Żenczykowski [Wed, 26 Sep 2018 03:56:26 +0000 (20:56 -0700)]
net-ipv4: remove 2 always zero parameters from ipv4_update_pmtu()

(the parameters in question are mark and flow_flags)

Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: mvneta: Add support for 2500Mbps SGMII
Maxime Chevallier [Tue, 25 Sep 2018 13:59:39 +0000 (15:59 +0200)]
net: mvneta: Add support for 2500Mbps SGMII

The mvneta controller can handle speeds up to 2500Mbps on the SGMII
interface. This relies on serdes configuration: the lane must be
configured at 3.125Gbps and we can't use in-band autoneg at that speed.

The main issue when supporting that speed on this particular controller
is that the link partner can send ethernet frames with a shortened
preamble, which if not explicitly enabled in the controller will cause
unexpected behaviours.

This was tested on Armada 385, with the comphy configuration done in
bootloader.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'net-vhost-improve-performance-when-enable-busyloop'
David S. Miller [Thu, 27 Sep 2018 03:25:55 +0000 (20:25 -0700)]
Merge branch 'net-vhost-improve-performance-when-enable-busyloop'

Tonghao Zhang says:

====================
net: vhost: improve performance when enable busyloop

These patches improve the guest receive performance.
On the handle_tx side, we poll the sock receive queue
at the same time. handle_rx does the same.

For a more detailed performance report, see patch 4.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: vhost: add rx busy polling in tx path
Tonghao Zhang [Tue, 25 Sep 2018 12:36:52 +0000 (05:36 -0700)]
net: vhost: add rx busy polling in tx path

This patch improves the guest receive performance.
On the handle_tx side, we poll the sock receive queue at the
same time. handle_rx does the same.

We set poll-us=100us and use netperf to test throughput
and mean latency. When running the tests, the vhost-net kthread
of that VM is always at 100% CPU. The commands are shown below.

Rx performance is greatly improved by this patch. There is no
notable performance change on tx with this series, though. This
patch is useful for bi-directional traffic.

netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY"

Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]

TCP_STREAM:
* Without the patch:  19842.95 Mbps, 6.50 us mean latency
* With the patch:     37598.20 Mbps, 3.43 us mean latency

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: vhost: factor out busy polling logic to vhost_net_busy_poll()
Tonghao Zhang [Tue, 25 Sep 2018 12:36:51 +0000 (05:36 -0700)]
net: vhost: factor out busy polling logic to vhost_net_busy_poll()

Factor out the generic busy polling logic so that it can be
used in the tx path in the next patch. With this patch, qemu
can also set a different busyloop_timeout for the rx queue.

To avoid duplicated code, introduce the helper functions:
* sock_has_rx_data (changed from sk_has_rx_data)
* vhost_net_busy_poll_try_queue

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: vhost: replace magic number of lock annotation
Tonghao Zhang [Tue, 25 Sep 2018 12:36:50 +0000 (05:36 -0700)]
net: vhost: replace magic number of lock annotation

Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: vhost: lock the vqs one by one
Tonghao Zhang [Tue, 25 Sep 2018 12:36:49 +0000 (05:36 -0700)]
net: vhost: lock the vqs one by one

This patch changes the locking of the vqs from taking all
locks at the same time to locking them one by one. It will
be used by the next patch to avoid a deadlock.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago tcp: expose sk_state in tcp_retransmit_skb tracepoint
Yafang Shao [Mon, 24 Sep 2018 12:57:29 +0000 (20:57 +0800)]
tcp: expose sk_state in tcp_retransmit_skb tracepoint

With sk_state exposed, we can tell in which state a retransmission
occurs. That gives us more detail for diagnostics.
For example, if the retransmission occurs in the SYN_SENT state, it may
indicate that the SYN packet was dropped on the remote peer because
its SYN backlog queue was full, and we can then check the remote peer.

BTW, SYNACK retransmission is traced in tcp_retransmit_synack tracepoint.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: faraday: fix return type of ndo_start_xmit function
YueHaibing [Wed, 26 Sep 2018 09:13:05 +0000 (17:13 +0800)]
net: faraday: fix return type of ndo_start_xmit function

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
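
As a rough standalone illustration, assuming stand-in types: every
demo_* name here is hypothetical and only mimics the shape of the
kernel's netdev_tx_t and ops table. The handler's return type matches
the function pointer it is assigned to, instead of a plain int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's enum and typedef. */
enum demo_tx { DEMO_TX_OK = 0x00, DEMO_TX_BUSY = 0x10 };
typedef enum demo_tx demo_tx_t;

struct demo_skb { size_t len; };
struct demo_dev { size_t tx_free; unsigned long tx_packets; };

/* The ops table fixes the prototype each driver must match. */
struct demo_dev_ops {
	demo_tx_t (*start_xmit)(struct demo_skb *skb,
				struct demo_dev *dev);
};

/* Declared with demo_tx_t, not int, so it matches the ops table. */
static demo_tx_t demo_start_xmit(struct demo_skb *skb,
				 struct demo_dev *dev)
{
	assert(skb != NULL && dev != NULL);
	if (dev->tx_free == 0)
		return DEMO_TX_BUSY;	/* no descriptor available */
	dev->tx_free--;
	dev->tx_packets++;
	return DEMO_TX_OK;
}

static const struct demo_dev_ops demo_ops = {
	.start_xmit = demo_start_xmit,
};
```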

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: smsc: fix return type of ndo_start_xmit function
YueHaibing [Wed, 26 Sep 2018 09:06:29 +0000 (17:06 +0800)]
net: smsc: fix return type of ndo_start_xmit function

The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver returns a 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.

Found by coccinelle.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: liquidio: list usage cleanup
zhong jiang [Wed, 26 Sep 2018 08:56:50 +0000 (16:56 +0800)]
net: liquidio: list usage cleanup

Trivial cleanup: list_move_tail() does the same thing as
list_del() + list_add_tail(), so just replace them.
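
The equivalence can be sketched with a small userspace list. This is
an assumption-laden imitation of the kernel's circular doubly-linked
list, reusing its function names for readability; struct list_node
and list_init are hypothetical, and this is not the code from
include/linux/list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list node. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Unlink n from whatever list it is on. */
static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Link n in just before head, i.e. at the tail. */
static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* One call in place of the del + add_tail pair. */
static void list_move_tail(struct list_node *n, struct list_node *head)
{
	assert(n != head);	/* moving a list head makes no sense */
	list_del(n);
	list_add_tail(n, head);
}
```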

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: qed: list usage cleanup
zhong jiang [Wed, 26 Sep 2018 08:53:00 +0000 (16:53 +0800)]
net: qed: list usage cleanup

Trivial cleanup: list_move_tail() does the same thing as
list_del() + list_add_tail(), so just replace them.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago Merge branch 'net-bridge-convert-bool-options-to-bits'
David S. Miller [Wed, 26 Sep 2018 17:04:23 +0000 (10:04 -0700)]
Merge branch 'net-bridge-convert-bool-options-to-bits'

Nikolay Aleksandrov says:

====================
net: bridge: convert bool options to bits

A lot of boolean bridge options have been added around the net_bridge
structure resulting in holes and more importantly different cache lines
that need to be fetched in the fast path. This set moves all of those
to bits in a bitfield which resides in a hot cache line thus reducing
the size of net_bridge, the number of holes and the number of cache
lines needed for the fast path.
The set is also sent in preparation for new boolean options to avoid
spreading them in the structure and making new holes.
One nice side-effect is that we avoid potential race conditions by using
the bitops, since some of the options were bits being set directly in
parallel, risking hard-to-debug issues (has_ipv6_addr).

Before:
 size: 1184, holes: 8, sum holes: 30
After:
 size: 1160, holes: 3, sum holes: 7
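
A rough sketch of the bool-to-bit conversion, under loud assumptions:
the demo_* names are hypothetical, and the plain read-modify-write
below is NOT atomic, unlike the kernel bitops the series relies on.
It only shows how several flags share one word.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option indices, one bit each. */
enum demo_opt {
	DEMO_OPT_VLAN_ENABLED,
	DEMO_OPT_VLAN_STATS,
	DEMO_OPT_MTU_SET_BY_USER,
	DEMO_OPT_COUNT
};

/* All boolean options live in a single word. */
struct demo_bridge {
	unsigned long options;
};

static bool demo_opt_get(const struct demo_bridge *br, enum demo_opt opt)
{
	assert(opt < DEMO_OPT_COUNT);
	return (br->options >> opt) & 1UL;
}

/* Non-atomic set/clear; real concurrent code needs atomic bitops. */
static void demo_opt_toggle(struct demo_bridge *br, enum demo_opt opt,
			    bool on)
{
	assert(opt < DEMO_OPT_COUNT);
	if (on)
		br->options |= 1UL << opt;
	else
		br->options &= ~(1UL << opt);
}
```

Adding an option in this layout means adding an enum entry, so the
structure does not grow and no new hole appears.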

Patch 01 is a trivial style fix
Patch 02 adds the new options bitfield and converts the vlan boolean
         options to bits
Patches 03-08 convert the rest of the boolean options to bits
Patch 09 re-arranges a few fields in net_bridge to further reduce size

v2: patch 09: remove the comment about offload_fwd_mark in net_bridge and
    leave it where it is now, thanks to Ido for spotting it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: bridge: pack net_bridge better
Nikolay Aleksandrov [Wed, 26 Sep 2018 14:01:07 +0000 (17:01 +0300)]
net: bridge: pack net_bridge better

Further reduce the size of net_bridge by 8 bytes and reduce the number of
holes in it:
 Before: holes: 5, sum holes: 15
 After: holes: 3, sum holes: 7

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years ago net: bridge: convert mtu_set_by_user to a bit
Nikolay Aleksandrov [Wed, 26 Sep 2018 14:01:06 +0000 (17:01 +0300)]
net: bridge: convert mtu_set_by_user to a bit

Convert the last remaining bool option to a bit thus reducing the overall
net_bridge size further by 8 bytes.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>