linux-2.6-block.git
David S. Miller [Mon, 13 Apr 2015 22:18:05 +0000 (18:18 -0400)]
Merge branch 'for-davem' of git://git./linux/kernel/git/viro/vfs

Al Viro says:

====================
netdev-related stuff in vfs.git

There are several commits sitting in vfs.git that probably ought to go in
via net-next.git.  First of all, there's the merge with vfs.git#iocb - that's
Christoph's aio rework, which triggered conflicts with the ->sendmsg()
and ->recvmsg() patches a while ago.  It's not so much Christoph's stuff
that ought to be in net-next, as the (pretty simple) conflict resolution on merge.
The next chunk is the switch to {compat_,}import_iovec/import_single_range - new
safer primitives for initializing iov_iter.  The primitives themselves come
from vfs.git#iov_iter (and they are used quite a lot in the vfs part of the queue),
but the conversion of net/socket.c syscalls belongs in net-next, IMO.  Next there's
afs and rxrpc stuff from dhowells.  And then there's sanitizing kernel_sendmsg
et al., plus a missing inlined helper for "how much data is left in msg->msg_iter";
this stuff is used in e.g. cifs, but it belongs in net-next.

That pile is pullable from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-davem

I'll post the individual patches in there in followups; could you take a look
and tell me if everything in there is OK with you?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 13 Apr 2015 01:51:09 +0000 (18:51 -0700)]
tcp/dccp: get rid of central timewait timer

Using a timer wheel for timewait sockets was nice ~15 years ago when
memory was expensive and machines had a single processor.

This does not scale; the code is ugly and a source of huge latencies
(typically 30 ms have been seen, with CPUs spinning on the death_lock spinlock).

We can afford to use an extra 64 bytes per timewait sock and spread
timewait load to all cpus to have better behavior.

Tested:

In the following test, /proc/sys/net/ipv4/tcp_tw_recycle is set to 1
on the target (lpaa24).

Before the patch:

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
419594

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
437171

While the test is running, we can observe 25 or even 33 ms latencies.

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20601ms
rtt min/avg/max/mdev = 0.020/0.217/25.771/1.535 ms, pipe 2

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 20702ms
rtt min/avg/max/mdev = 0.019/0.183/33.761/1.441 ms, pipe 2

After the patch:

About a 90% increase in throughput:

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
810442

lpaa23:~# ./super_netperf 200 -H lpaa24 -t TCP_CC -l 60 -- -p0,0
800992
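As a quick check of the "about 90%" figure, the sketch below averages the two super_netperf runs before and after the patch (numbers copied from the output above) and computes the relative increase, which comes out at roughly 88%. It is plain arithmetic, not part of the patch.

```c
#include <stddef.h>

/* Transaction rates reported by super_netperf above. */
static const double before_runs[] = { 419594.0, 437171.0 };
static const double after_runs[]  = { 810442.0, 800992.0 };

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative throughput increase as a fraction (0.88 means 88%). */
static double throughput_gain(void)
{
	double before = mean(before_runs, sizeof(before_runs) / sizeof(before_runs[0]));
	double after = mean(after_runs, sizeof(after_runs) / sizeof(after_runs[0]));

	return (after - before) / before;
}
```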

And latencies are kept to minimal values during this load, even
though network utilization is 90% higher:

lpaa24:~# ping -c 1000 -i 0.02 -qn lpaa23
...
1000 packets transmitted, 1000 received, 0% packet loss, time 19991ms
rtt min/avg/max/mdev = 0.023/0.064/0.360/0.042 ms

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:39 +0000 (00:52 +0200)]
netfilter: Fix format string of nfnetlink_log proc file

The printed values are all of type unsigned integer, therefore use
%u instead of %d. Otherwise a user can see negative values.

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:38 +0000 (00:52 +0200)]
netfilter: Fix format string of nfnetlink_queue proc file

The printed values are all of type unsigned integer, therefore use
%u instead of %d. Otherwise a user can see negative values.

Fixes:
$ cat /proc/net/netfilter/nfnetlink_queue
    0  29508   278 2 65531     0 2004213241 -2129885586  1
    1 -27747     0 2 65531     0     0        0  1
    2 -27748     0 2 65531     0     0        0  1
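The negative numbers above are what an unsigned 32-bit value above INT_MAX looks like when printed with %d: the value that shows up as -2129885586 in the first row is 2165081710 as an unsigned int. The standalone sketch below (not the nfnetlink_queue code) formats that value both ways; the conversion to int is written out explicitly and relies on the two's-complement wraparound that GCC and Clang implement.

```c
#include <stdio.h>

/* Unsigned value that appears as -2129885586 in the first row above. */
static const unsigned int id_sequence = 2165081710u;

/* Wrong: treating the value as int and printing it with %d.
 * The conversion is implementation-defined; GCC and Clang wrap it
 * to a negative number. */
static void format_signed(char *buf, size_t len)
{
	snprintf(buf, len, "%d", (int)id_sequence);
}

/* Right: %u matches the unsigned type and prints the real value. */
static void format_unsigned(char *buf, size_t len)
{
	snprintf(buf, len, "%u", id_sequence);
}
```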

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:37 +0000 (00:52 +0200)]
netfilter: Fix portid types

The netlink portid is an unsigned integer, use this type
also in netfilter.

Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:36 +0000 (00:52 +0200)]
nfc: Fix portid type in urelease_work

portid is an unsigned integer. Fix urelease_work to
match all other portid users in the kernel.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Weinberger [Sun, 12 Apr 2015 22:52:35 +0000 (00:52 +0200)]
netlink: Fix portid type in netlink_notify

portid is an unsigned integer. Fix netlink_notify to
match all other portid users in the kernel.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kenneth Klette Jonassen [Sat, 11 Apr 2015 00:17:49 +0000 (02:17 +0200)]
tcp: fix bogus RTT for CC when retransmissions are acked

Since retransmitted segments are not used for RTT estimation, previously
SACKed segments present in the rtx queue are used. This estimation can be
several times larger than the actual RTT. When a cumulative ack covers both
previously SACKed and retransmitted segments, CC may thus get a bogus RTT.

Such segments previously had an RTT estimation in tcp_sacktag_one(), so it
seems reasonable to not reuse them in tcp_clean_rtx_queue() at all.

AFAIK, this has had no effect on SRTT/RTO because of Karn's check.
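A toy model of the rule described above, purely illustrative and not the tcp_clean_rtx_queue() code: when a cumulative ACK arrives, only segments that were never retransmitted and were not already SACKed (and therefore already sampled in tcp_sacktag_one()) contribute a timestamp. The struct, the function name and the choice of the most recently sent eligible segment are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a segment in the retransmit queue. */
struct seg {
	long sent_us;		/* transmit timestamp in microseconds */
	bool retransmitted;	/* ambiguous timestamp (Karn) */
	bool sacked;		/* already SACKed, so already sampled */
};

/* RTT sample for segments covered by a cumulative ACK: now_us minus the
 * send time of the most recently sent eligible segment, or -1 when no
 * segment is eligible. */
static long ca_rtt_sample(const struct seg *acked, size_t n, long now_us)
{
	long last_sent = -1;

	for (size_t i = 0; i < n; i++) {
		if (acked[i].retransmitted || acked[i].sacked)
			continue;	/* skip ambiguous or stale timestamps */
		if (acked[i].sent_us > last_sent)
			last_sent = acked[i].sent_us;
	}
	return last_sent < 0 ? -1 : now_us - last_sent;
}
```

With this rule, a cumulative ACK that covers only previously SACKed and retransmitted segments yields no sample at all instead of an inflated one.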

Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 10 Apr 2015 21:07:54 +0000 (23:07 +0200)]
net: use jump label patching for ingress qdisc in __netif_receive_skb_core

Even if we only make use of classifiers and actions on the egress
path, we still go into handle_ing() and execute additional per-packet
code for the ingress qdisc, just to realize that nothing is attached
on ingress.

Instead, this can be patched out entirely as a no-op with the use of
a static key. On the input fast path, we already make use of static
keys in various places, e.g. skb time stamping and RPS. It makes sense
not to waste time when we're assured that no ingress qdisc is attached
anywhere.

Enabling/disabling of that code path is done via two helpers,
net_{inc,dec}_ingress_queue(), which are invoked under the RTNL
mutex when an ingress qdisc is either initialized or destroyed.
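Real static keys rewrite the branch instruction at runtime, which cannot be shown portably in userspace. The sketch below only models the reference-counting side: each ingress qdisc init bumps a counter, each destroy drops it, and the receive path runs the hook only while the count is non-zero. The names echo net_{inc,dec}_ingress_queue() but the bodies are hypothetical and, like the originals, assume callers are serialized (standing in for the RTNL mutex).

```c
#include <stdbool.h>

/* Number of attached ingress qdiscs; in the kernel this role is played
 * by the static key's reference count. Callers are assumed serialized. */
static unsigned int ingress_users;

/* Called when an ingress qdisc is initialized. */
static void sketch_inc_ingress_queue(void)
{
	ingress_users++;
}

/* Called when an ingress qdisc is destroyed. */
static void sketch_dec_ingress_queue(void)
{
	if (ingress_users > 0)
		ingress_users--;
}

/* Stand-in for the patched branch; with a real static key this check is
 * patched to a no-op until the key is first incremented. */
static bool ingress_needed(void)
{
	return ingress_users != 0;
}

/* Per-packet receive loop; returns how often the ingress hook ran. */
static int receive_packets(int npackets)
{
	int hook_calls = 0;

	for (int i = 0; i < npackets; i++)
		if (ingress_needed())
			hook_calls++;	/* handle_ing() would run here */
	return hook_calls;
}
```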

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 17:15:14 +0000 (13:15 -0400)]
Merge branch 'netdev_diet'

Thomas Graf says:

====================
Bring sizeof(net_device) down to < 2K bytes

The size of struct net_device crossed the 2K boundary a while ago, which
is a waste in combination with many net namespaces. This series brings
the size of struct net_device down to well below 2K with a typical
configuration. Some reserves and several holes leave room for
further expansion.

Before:
/* size: 2176, cachelines: 34, members: 121 */

After:
/* size: 1984, cachelines: 31, members: 120 */
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 10 Apr 2015 13:52:38 +0000 (15:52 +0200)]
net_device: Reorder members to fill holes

Some trivial reorders while preserving the RX/TX cache lines
split to fill a couple of holes.
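Filling holes works because the compiler pads each member up to its alignment. The toy structs below (not net_device members) show the effect on a typical LP64 target: placing the two chars next to each other lets them share one padded slot. The helper also reproduces the cacheline counts that pahole reports in the cover letter (2176 and 1984 bytes with 64-byte lines).

```c
#include <stddef.h>

/* Each char is followed by padding so the long stays aligned:
 * 1 + 7 (hole) + 8 + 1 + 7 (tail padding) = 24 bytes on LP64. */
struct padded {
	char a;
	long b;
	char c;
};

/* Same members, reordered: 8 + 1 + 1 + 6 (tail padding) = 16 bytes. */
struct reordered {
	long b;
	char a;
	char c;
};

/* Cachelines spanned by an object of the given size (rounded up). */
static size_t cachelines(size_t size, size_t line)
{
	return (size + line - 1) / line;
}
```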

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 10 Apr 2015 13:52:37 +0000 (15:52 +0200)]
e1000e: Move pm_qos_req to e1000e adapter

e1000e is the only driver requiring pm_qos_req. Instead of causing
every device to waste up to 240 bytes, allocate it in the e1000e
adapter structure.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Mon, 13 Apr 2015 13:20:37 +0000 (15:20 +0200)]
selinux/nlmsg: add a build time check for rtnl/xfrm cmds

When a new rtnl or xfrm command is added, this part of the code is frequently
missing. Let's help the developer with a build time test.
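The idea can be sketched with C11 _Static_assert (the kernel's BUILD_BUG_ON serves the same purpose). The enum and permission table below are hypothetical stand-ins for the rtnetlink/xfrm message types and SELinux's mapping table, and the exact form of the kernel's check may differ: if someone appends a message type and bumps MSG_MAX without extending the table, the build fails instead of the new command silently lacking a permission entry.

```c
/* Hypothetical message types; MSG_MAX tracks the last valid value. */
enum msg_type {
	MSG_NEWFOO,
	MSG_DELFOO,
	MSG_GETFOO,
	MSG_MAPPING,
	MSG_MAX = MSG_MAPPING,
};

/* Hypothetical permission classes. */
enum perm { PERM_READ, PERM_WRITE };

/* One permission per message type, indexed by type. */
static const enum perm msg_perms[] = {
	[MSG_NEWFOO]  = PERM_WRITE,
	[MSG_DELFOO]  = PERM_WRITE,
	[MSG_GETFOO]  = PERM_READ,
	[MSG_MAPPING] = PERM_READ,
};

/* Breaks the build if the table and the enum drift apart. */
_Static_assert(sizeof(msg_perms) / sizeof(msg_perms[0]) == MSG_MAX + 1,
	       "msg_perms[] needs an entry for every message type");

static enum perm perm_for(enum msg_type t)
{
	return msg_perms[t];
}
```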

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 01:36:57 +0000 (21:36 -0400)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-04-11

This series contains updates to if_link, ixgbe and ixgbevf.

The entire set of changes comes from Vlad Zolotarov and ultimately adds
the ethtool ops to the VF driver to allow querying the RSS indirection table
and RSS random key.

Currently we support only 82599 and x540 devices.  On those devices, VFs
share the RSS redirection table and hash key with the PF.  Letting the VF
query this information may introduce some security risks; therefore this
feature will be disabled by default.

The new netdev op allows a system administrator to change the default
behaviour with the "ip link set" command.  The relevant iproute2 patch has
already been sent and is waiting for this series to go upstream.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 01:25:14 +0000 (21:25 -0400)]
Merge branch 'fou-next'

Cong Wang says:

====================
fou: some fixes and updates

Patches 1-3 fix some minor bugs in net/ipv4/fou.c. The only
thing I am not sure about is whether it's too late to change the
byte order of FOU_ATTR_PORT; if so, we have to fix iproute2
instead of the kernel.

Patches 4-5 add some new features to make it complete.

v2: make fou->port be16 too
====================

Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Apr 2015 19:00:30 +0000 (12:00 -0700)]
fou: implement FOU_CMD_GET

Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Apr 2015 19:00:29 +0000 (12:00 -0700)]
fou: add network namespace support

Also convert the spinlock to a mutex.

Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Apr 2015 19:00:28 +0000 (12:00 -0700)]
fou: always use be16 for port

udp_config.local_udp_port is be16, and iproute2 passes
network order for FOU_ATTR_PORT.

This doesn't fix any bug; it is just for consistency.
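Keeping the port as be16 end to end means the value from FOU_ATTR_PORT can be stored as-is, matching udp_config.local_udp_port, with conversion to host order only where a human reads it. A userspace sketch using htons()/ntohs(); the struct and function names are hypothetical stand-ins for the kernel's fou state.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-in for the fou state; the port is kept in network
 * byte order (be16), like udp_config.local_udp_port. */
struct fou_sketch {
	uint16_t port_be;
};

/* The attribute already carries network order, so store it unchanged. */
static void fou_set_port(struct fou_sketch *fou, uint16_t attr_port_be)
{
	fou->port_be = attr_port_be;
}

/* Convert to host order only when presenting the value. */
static uint16_t fou_port_host(const struct fou_sketch *fou)
{
	return ntohs(fou->port_be);
}
```

For port 7777 (0x1E61), the stored bytes are 0x1E followed by 0x61 regardless of host endianness.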

Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Apr 2015 19:00:27 +0000 (12:00 -0700)]
fou: exit early when parsing config fails

Not a big deal, just for correctness.

Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Apr 2015 19:00:26 +0000 (12:00 -0700)]
fou: avoid calling udp_del_offload() twice

This fixes the following harmless warning:

./ip/ip fou del port 7777
[  122.907516] udp_del_offload: didn't find offload for port 7777

Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 01:19:40 +0000 (21:19 -0400)]
Merge branch 'selinux_xfrm_nl_cmd'

Nicolas Dichtel says:

====================
selinux: add missing xfrm nl cmd

With this series, xfrm commands are fully synchronized.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 10 Apr 2015 14:24:28 +0000 (16:24 +0200)]
selinux/nlmsg: add XFRM_MSG_MAPPING

This command is missing.

Fixes: 3a2dfbe8acb1 ("xfrm: Notify changes in UDP encapsulation via netlink")
CC: Martin Willi <martin@strongswan.org>
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 10 Apr 2015 14:24:27 +0000 (16:24 +0200)]
selinux/nlmsg: add XFRM_MSG_MIGRATE

This command is missing.

Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 10 Apr 2015 14:24:26 +0000 (16:24 +0200)]
selinux/nlmsg: add XFRM_MSG_REPORT

This command is missing.

Fixes: 97a64b4577ae ("[XFRM]: Introduce XFRM_MSG_REPORT.")
Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 10 Apr 2015 13:07:18 +0000 (06:07 -0700)]
tcp: do not cache align timewait sockets

With the recent adoption of skc_cookie in struct sock_common,
struct tcp_timewait_sock size increased from 192 to 200 bytes
on 64-bit arches. SLAB then rounds it up to 256 bytes.

It is time to drop SLAB_HWCACHE_ALIGN constraint for twsk_slab.

This saves about 12 MB of memory on a typical configuration reaching
262144 timewait sockets, and has no noticeable impact on performance.
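A rough model of where the saving comes from, assuming (for illustration only) 4 KiB slab pages with no per-slab metadata: with SLAB_HWCACHE_ALIGN each 200-byte object is padded to a 64-byte cache line boundary (256 bytes, 16 per page); without it, 20 objects fit per page. For 262144 sockets this crude model gives about 12.8 MiB, in the same range as the figure quoted above; real allocators add overhead the model ignores.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u	/* assumed slab page size */

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Pages needed for n objects of obj_size bytes, ignoring metadata. */
static size_t slab_pages(size_t n, size_t obj_size)
{
	size_t per_page = PAGE_BYTES / obj_size;

	return (n + per_page - 1) / per_page;
}

/* Bytes saved by dropping cache-line alignment for n objects. */
static size_t bytes_saved(size_t n, size_t obj_size, size_t cacheline)
{
	size_t aligned = slab_pages(n, align_up(obj_size, cacheline));
	size_t packed = slab_pages(n, align_up(obj_size, sizeof(void *)));

	return (aligned - packed) * PAGE_BYTES;
}
```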

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Apr 2015 00:43:46 +0000 (20:43 -0400)]
Merge tag 'mac80211-next-for-davem-2015-04-10' of git://git./linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
There isn't much left, but we have
 * new mac80211 internal software queue to allow drivers to have
   shorter hardware queues and pull on-demand
 * use rhashtable for mac80211 station table
 * minstrel rate control debug improvements and some refactoring
 * fix noisy message about TX power reduction
 * fix continuous message printing and activity if CRDA doesn't respond
 * fix VHT-related capabilities with "iw connect" or "iwconfig ..."
 * fix Kconfig for cfg80211 wireless extensions compatibility
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Wolfgang Steinwender [Fri, 10 Apr 2015 09:42:56 +0000 (11:42 +0200)]
net/macb: sqe_test_errors are TX errors, not RX errors

The statistics are grouped by TX and RX errors.
The SQE Test Errors Register indicates problems with TX.

Signed-off-by: Wolfgang Steinwender <wsteinwender@pcs.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Tue, 16 Dec 2014 02:39:31 +0000 (21:39 -0500)]
new helper: msg_data_left()

convert open-coded instances
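The helper amounts to a thin wrapper around iov_iter_count(&msg->msg_iter), which is also why the size argument of sock_sendmsg() became redundant. The structs below are drastically simplified userspace stand-ins (the real struct iov_iter and struct msghdr carry much more state), and the _sketch names are hypothetical.

```c
#include <stddef.h>

/* Simplified stand-ins; only the remaining byte count matters here. */
struct iov_iter_sketch {
	size_t count;		/* bytes remaining in the iterator */
};

struct msghdr_sketch {
	struct iov_iter_sketch msg_iter;
};

static inline size_t iov_iter_count_sketch(const struct iov_iter_sketch *i)
{
	return i->count;
}

/* How much data is left in msg->msg_iter. */
static inline size_t msg_data_left_sketch(const struct msghdr_sketch *msg)
{
	return iov_iter_count_sketch(&msg->msg_iter);
}

/* Pretend to send up to chunk bytes, advancing the iterator. */
static size_t send_some(struct msghdr_sketch *msg, size_t chunk)
{
	size_t n = chunk < msg->msg_iter.count ? chunk : msg->msg_iter.count;

	msg->msg_iter.count -= n;
	return n;
}
```

A typical open-coded loop then reads `while (msg_data_left_sketch(&m)) send_some(&m, 1000);`.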

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 11 Apr 2015 19:51:09 +0000 (15:51 -0400)]
Merge remote-tracking branch 'dh/afs' into for-davem

Al Viro [Thu, 11 Dec 2014 05:02:50 +0000 (00:02 -0500)]
get rid of the size argument of sock_sendmsg()

it's equal to iov_iter_count(&msg->msg_iter) in all cases

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Vlad Zolotarov [Mon, 30 Mar 2015 18:35:29 +0000 (21:35 +0300)]
ixgbevf: Add the appropriate ethtool ops to query RSS indirection table and key

Added get_rxfh_indir_size, get_rxfh_key_size and get_rxfh ethtool_ops
callback implementations.

This enables ethtool's "-x" and "--show-rxfh[-indir]" options for VF
devices.

This patch adds support for 82599 and x540 devices only. Support for
other devices will be added later.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:35:28 +0000 (21:35 +0300)]
ixgbevf: Add RSS Key query code

Add the ixgbevf_get_rss_key() function that queries the PF for an RSS
Random Key using a new VF-PF channel IXGBE_VF_GET_RSS_KEY command.

This patch adds support for 82599 and x540 devices only. Support for
other devices will be added later.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:35:27 +0000 (21:35 +0300)]
ixgbe: Add GET_RSS_KEY command to VF-PF channel commands set

On 82599 and x540, the VFs and the PF share the same RSS key. Therefore
we return the same RSS key for all VFs.

Support for other devices will be added later.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:35:26 +0000 (21:35 +0300)]
ixgbevf: Add a RETA query code

Currently only 82599 and x540 devices are supported. Support for other
devices will be added later.

   - Added a new API version support.
   - Added the query implementation in the ixgbevf.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Wed, 1 Apr 2015 08:24:54 +0000 (11:24 +0300)]
ixgbe: Add a RETA query command to VF-PF channel API

Add this new command for 82599 and x540 devices only. Support for other
devices will be added later.

82599 and x540 VFs and the PF share the same RSS redirection table (RETA).
Therefore we just return it for all VFs.

For 82599 and x540 the RETA is an array of 32 registers (128 bytes), and
the maximum number of registers that may be delivered in a single VF-PF
channel command is 15. On the other hand, VFs of these devices can be
configured to have up to 4 RSS queues. Therefore we "compress" the
RETA by transferring only 2 bits per entry, so it takes only 8
registers (DWORDS) to transfer the whole VF RETA.

Thus this patch does the following:

  - Adds a new API version (to specify a new commands set).
  - Adds the IXGBE_VF_GET_RETA command to the VF-PF commands set.
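The arithmetic works because 128 entries at 2 bits each is 256 bits, or 8 DWORDs, well under the 15-register mailbox limit. The sketch below packs one-byte RETA entries (values 0-3, i.e. at most 4 RSS queues) 16 per DWORD and unpacks them again; the bit layout is an assumption for illustration and is not necessarily the one the ixgbe VF-PF protocol uses.

```c
#include <stdint.h>

#define RETA_ENTRIES    128	/* 32 registers x 4 one-byte entries */
#define BITS_PER_ENTRY  2	/* enough for queue indices 0..3 */
#define ENTRIES_PER_DW  (32 / BITS_PER_ENTRY)		/* 16 */
#define RETA_DWORDS     (RETA_ENTRIES / ENTRIES_PER_DW)	/* 8 */

/* Pack entries 2 bits each, lowest bits first within each DWORD. */
static void reta_pack(const uint8_t reta[RETA_ENTRIES],
		      uint32_t out[RETA_DWORDS])
{
	for (int i = 0; i < RETA_DWORDS; i++)
		out[i] = 0;
	for (int i = 0; i < RETA_ENTRIES; i++)
		out[i / ENTRIES_PER_DW] |= (uint32_t)(reta[i] & 0x3)
			<< (BITS_PER_ENTRY * (i % ENTRIES_PER_DW));
}

/* Inverse of reta_pack(). */
static void reta_unpack(const uint32_t in[RETA_DWORDS],
			uint8_t reta[RETA_ENTRIES])
{
	for (int i = 0; i < RETA_ENTRIES; i++)
		reta[i] = (in[i / ENTRIES_PER_DW] >>
			   (BITS_PER_ENTRY * (i % ENTRIES_PER_DW))) & 0x3;
}
```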

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:35:24 +0000 (21:35 +0300)]
ixgbe: Add a new netdev op to allow/prevent a VF from querying an RSS info

Implement the new netdev op to allow the user to enable/disable the ability
of a specific VF to query its RSS indirection table and RSS hash key.

This patch limits the new feature support to 82599 and x540 devices only.
Support for other devices will be added later.
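Conceptually, the PF only has to consult a per-VF flag before answering a RETA or key query over the mailbox. A hypothetical sketch (struct, field and function names are illustrative, not ixgbe's): the flag starts out disabled, matching the cover letter's statement that the feature is off by default, and the administrator flips it through the new netdev op.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_VFS 64	/* hypothetical limit for this sketch */

/* Hypothetical per-VF state kept by the PF driver. */
struct vf_info_sketch {
	bool rss_query_enabled;	/* disabled by default */
};

static struct vf_info_sketch vfs[MAX_VFS];	/* zero-initialized */

/* Administrator request, as delivered through the new netdev op. */
static int set_vf_rss_query_en(int vf, bool enable)
{
	if (vf < 0 || vf >= MAX_VFS)
		return -EINVAL;
	vfs[vf].rss_query_enabled = enable;
	return 0;
}

/* PF mailbox handler for a VF's RETA/key query. */
static int handle_vf_rss_query(int vf)
{
	if (vf < 0 || vf >= MAX_VFS)
		return -EINVAL;
	if (!vfs[vf].rss_query_enabled)
		return -EPERM;	/* the request would be NACKed */
	return 0;		/* RETA / key would be copied into the reply */
}
```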

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:35:23 +0000 (21:35 +0300)]
if_link: Add an additional parameter to ifla_vf_info for RSS querying

Add a configuration setting that lets drivers allow/block RSS redirection
table and hash key queries from individual VFs.

On some devices the VFs share this information with the PF, and
querying it may pose a theoretical security risk. We want to let the
system administrator decide whether to take this risk or not.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:18:58 +0000 (21:18 +0300)]
ixgbe: Add the appropriate ethtool ops to query RSS indirection table and key

Added get_rxfh_indir_size, get_rxfh_key_size and get_rxfh ethtool_ops
callback implementations.

This enables ethtool's "-x" and "--show-rxfh[-indir]" options.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vlad Zolotarov [Mon, 30 Mar 2015 18:18:57 +0000 (21:18 +0300)]
ixgbe: Refactor the RSS configuration code

This patch is a preparation for enabling ethtool RSS indirection
table and hash key querying. We don't want to read registers every time
the RSS info is queried. Therefore we store its current content in
arrays in the adapter struct and read it from there (instead of from
registers) when requested.

Change the code that writes the indirection table and hash key into
the HW registers to take their content from these arrays. This will also
simplify the implementation of the ethtool callback that updates the
indirection table in the future.
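The pattern is a software shadow of the hardware tables: updates go to arrays in the adapter struct and are then written out to the registers, while queries are served from the arrays without touching hardware. A hypothetical sketch; the 128-entry table matches this series, but the 10-word key size and all names are assumptions, and register writes are only counted.

```c
#include <stdint.h>
#include <string.h>

#define RETA_SIZE  128	/* indirection table entries */
#define RSS_KEY_DW 10	/* assumed key length in 32-bit words */

/* Hypothetical adapter holding shadow copies of the RSS tables. */
struct adapter_sketch {
	uint8_t rss_indir_tbl[RETA_SIZE];
	uint32_t rss_key[RSS_KEY_DW];
	unsigned int reg_writes;	/* stands in for real MMIO writes */
};

/* Program "hardware" from the shadow arrays, the single source of truth. */
static void write_rss_to_hw(struct adapter_sketch *a)
{
	/* 4 one-byte RETA entries per register, plus the key registers. */
	a->reg_writes += RETA_SIZE / 4 + RSS_KEY_DW;
}

/* Update the shadow copy, then push it to hardware. */
static void set_rss(struct adapter_sketch *a, const uint8_t *indir,
		    const uint32_t *key)
{
	memcpy(a->rss_indir_tbl, indir, sizeof(a->rss_indir_tbl));
	memcpy(a->rss_key, key, sizeof(a->rss_key));
	write_rss_to_hw(a);
}

/* Query path: served from memory, no register access. */
static void get_rss(const struct adapter_sketch *a, uint8_t *indir,
		    uint32_t *key)
{
	memcpy(indir, a->rss_indir_tbl, sizeof(a->rss_indir_tbl));
	memcpy(key, a->rss_key, sizeof(a->rss_key));
}
```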

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Fri, 10 Apr 2015 19:49:34 +0000 (12:49 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-04-10

This series contains updates to ixgbe and documentation for igb,
ixgbe and ixgb.

Stephen cleans up documentation to igb, ixgbe and ixgb.

Don updates how bridge mode is stored to minimize obfuscation and
make updates for future silicon easier.  Adds a new bridge mode
support function which gathers all the logic needed to configure
bridge modes.  Adds Source Address Pruning for VEPA bridge mode
for x550 devices.

Vasu adds specific FCoE offloads for x550 for DDP context programming
and increased DDP exchanges.

Alex Duyck cleans up the use of HW_VLAN_CTAG_FILTER in hw_features,
where the driver was actually ignoring the value of the bit and was
just assuming it was always set.  Also cleans up the use of rcu_barrier()
since the driver has not used call_rcu() to free the rings for some
time now.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 9 Apr 2015 23:45:53 +0000 (01:45 +0200)]
rtnetlink: Mark name argument of rtnl_create_link() const

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 10 Apr 2015 05:03:24 +0000 (22:03 -0700)]
ixgbe: Drop unnecessary call to rcu_barrier

The ixgbe driver hasn't used call_rcu to free the rings for some time now.
Since that is the case, the call to rcu_barrier can be dropped, as calls
to kfree_rcu don't require it.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 10 Apr 2015 05:03:24 +0000 (22:03 -0700)]
ixgbe: Remove NETIF_F_HW_VLAN_CTAG_FILTER from hw_features

This change makes it so that the HW_VLAN_CTAG_FILTER bit is not falsely
advertised as being a feature that can be toggled on ixgbe parts.  The
driver was setting the bit in features and letting it be inherited by
hw_features, however the driver was actually ignoring the value of the bit
and just assuming it was always set.  As a result VLAN filtering was always
enabled, which is a requirement for SR-IOV, VMDq, DCB, FCoE, and possibly
other features within the adapters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vasu Dev [Fri, 10 Apr 2015 05:03:23 +0000 (22:03 -0700)]
ixgbe: adds x550 specific FCoE offloads

Adds x550 specific FCoE offloads for DDP context programming and
increased DDP exchanges.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 10 Apr 2015 05:03:23 +0000 (22:03 -0700)]
ixgbe: add support for X550 source_address_prunning

This patch enables X550 Source Address Pruning for VEPA
bridge mode.  While in this mode, replication must also be
enabled.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 10 Apr 2015 05:03:22 +0000 (22:03 -0700)]
ixgbe: add new bridge mode support function.

This patch gathers together all the logic needed to configure bridge
modes.  Currently it is rather simple, but this really lays the
groundwork for future X550 feature enhancements.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 10 Apr 2015 05:03:22 +0000 (22:03 -0700)]
ixgbe: Move bridge mode from flag to variable

We are currently storing our BRIDGE_MODE as a bit in our adapter flags.
This patch will store the actual mode instead, which minimizes obfuscation
and makes following patches for X550 simpler.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stephen Hemminger [Fri, 10 Apr 2015 05:03:21 +0000 (22:03 -0700)]
ixgb: remove references to ifconfig

Move documentation into this century, even if this device hasn't
been available for some time.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stephen Hemminger [Fri, 10 Apr 2015 05:03:21 +0000 (22:03 -0700)]
ixgbe: fix documentation

The MTU values in the documentation do not match the source.
The source has a frame limit of IXGBE_MAX_JUMBO_FRAME_SIZE (9728),
which is an MTU of 9710 because of the accounting for the Ethernet header
and CRC.
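The arithmetic, using the frame limit quoted above and the standard 14-byte Ethernet header and 4-byte FCS (ETH_HLEN and ETH_FCS_LEN in the kernel headers, defined locally here so the snippet stands alone):

```c
/* Frame limit from the text; header and FCS sizes are standard Ethernet. */
#define IXGBE_MAX_JUMBO_FRAME_SIZE 9728
#define ETH_HLEN                   14	/* dst MAC + src MAC + EtherType */
#define ETH_FCS_LEN                4	/* trailing CRC */

/* Largest MTU whose frame still fits within max_frame bytes. */
static int max_mtu(int max_frame)
{
	return max_frame - ETH_HLEN - ETH_FCS_LEN;
}
```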

Also, don't refer to the obsolete ifconfig command.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stephen Hemminger [Fri, 10 Apr 2015 04:02:02 +0000 (21:02 -0700)]
igb: doc don't refer to ifconfig

The ifconfig command is obsolete; it is best to remove all references so that
new users learn ip.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 9 Apr 2015 22:31:50 +0000 (18:31 -0400)]
Merge branch 'for-upstream' of git://git./linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-04-09

We've had enough new patches during the past week (especially from
Marcel) that it'd be good to still get these queued for 4.1.

The majority of the changes are from Marcel with lots of cleanup &
refactoring patches for the HCI UART driver. Marcel also split out some
Broadcom & Intel vendor specific functionality into two new btintel &
btbcm modules.

In addition to the HCI driver changes there's the completion of our
local OOB data interface for pairing, added support for requesting
remote LE features when connecting, as well as a couple of minor fixes
for mac802154.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 9 Apr 2015 21:36:42 +0000 (14:36 -0700)]
tcp: md5: fix a typo in tcp_v4_md5_lookup()

The lookup key for tcp_md5_do_lookup() has to be taken
from addr_sk, not sk (which can be the listener).

Fixes: fd3a154a00fb ("tcp: md5: get rid of tcp_v[46]_reqsk_md5_lookup()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Apr 2015 21:35:37 +0000 (17:35 -0400)]
Merge branch 'xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2015-04-09

The following series of patches includes functional updates and changes
to the driver.

- Allow ethtool rx-frames coalescing to be changed while the device is up
- Consolidate initialization routine into the init function
- Add support for the TX watchdog timeout

This patch series is based on net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 9 Apr 2015 17:12:03 +0000 (12:12 -0500)]
amd-xgbe: Add support for the netdev Tx watchdog

Add support for detecting a hung Tx task by adding the netdev
ndo_tx_timeout function callback. Do not set the watchdog_timeo value
so as to use the system default time (currently 5 seconds).

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 9 Apr 2015 17:11:57 +0000 (12:11 -0500)]
amd-xgbe: Move Rx mode configuration into init

Currently a call to configure the Rx mode (promiscuous mode, all
multicast mode, etc.) is made in xgbe_start separate from the xgbe_init
function. This call to set the Rx mode should be part of the xgbe_init
function so that calls to the init function don't have to be preceded
by calls to configure the Rx mode.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Thu, 9 Apr 2015 17:11:51 +0000 (12:11 -0500)]
amd-xgbe: Allow rx-frames coalescing to be changed anytime

Currently the device must be down in order to update the rx-frames
coalescing setting because the interrupt indicator is set in the
descriptor data during initialization. Allow this setting to be changed
while the device is up by moving the interrupt decision into the
descriptor reset function and basing the decision on the supplied
descriptor index value.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
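A minimal userspace sketch of basing the Rx interrupt decision on the descriptor index, assuming an interrupt is requested on every rx_frames-th descriptor. The helper name rx_desc_wants_irq() and the handling of rx_frames == 0 are hypothetical and not taken from the amd-xgbe driver:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide, when the descriptor at 'index' is reset,
 * whether it should request an Rx interrupt.  rx_frames is read on every
 * reset, so a new value takes effect as descriptors are recycled.
 */
static bool rx_desc_wants_irq(unsigned int index, unsigned int rx_frames)
{
	/* Assumption: rx_frames == 0 means "no frame-count coalescing",
	 * modelled here as interrupting on every descriptor. */
	if (rx_frames == 0)
		return true;

	/* Interrupt on every rx_frames-th descriptor (index is 0-based). */
	return ((index + 1) % rx_frames) == 0;
}
```

Because nothing is baked into the descriptor data at ring initialization, changing rx-frames needs no ring rebuild, which is the point of the patch.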
Hubert Sokolowski [Thu, 9 Apr 2015 12:16:17 +0000 (12:16 +0000)]
net: Pass VLAN ID to rtnl_fdb_notify.

When an FDB entry is added or deleted the information about VLAN
is not passed to listening applications like 'bridge monitor fdb'.
With this patch VLAN ID is passed if it was set in the original
netlink message.

Also remove an unused bdev variable.

Signed-off-by: Hubert Sokolowski <hubert.sokolowski@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Apr 2015 18:46:04 +0000 (14:46 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree.
They are:

* nf_tables set timeout infrastructure from Patrick McHardy.

1) Add set timeout support.

2) Add support for set element timeouts using the new set extension
   infrastructure.

3) Add garbage collection helper functions to get rid of stale elements.
   Elements are accumulated in a batch that is asynchronously released
   via RCU when the batch is full.

4) Add garbage collection synchronization helpers. This introduces a new
   element busy bit to address concurrent access from the netlink API and the
   garbage collector.

5) Add timeout support for the nft_hash set implementation. The garbage
   collector periodically checks for stale elements from the workqueue.

* iptables/nftables cgroup fixes:

6) Ignore non-full-socket objects from the input path; otherwise the cgroup
   match may crash. From Daniel Borkmann.

7) Fix cgroup in nf_tables.

8) Save some cycles in xt_socket by skipping packet header parsing when
   skb->sk is already set because of early demux. Also from Daniel.

* br_netfilter updates from Florian Westphal.

9) Save frag_max_size and restore it from the forward path too.

10) Use a per-cpu area to restore the original source MAC address when traffic
    is DNAT'ed.

11) Add helper functions to access physical devices.

12) Use these new physdev helper functions from xt_physdev.

13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter
    state information.

14) Annotate original layer 2 protocol number in nf_bridge info, instead of
    using kludgy flags.

15) Also annotate the pkttype mangling when the packet travels back and forth
    from the IP to the bridge layer, instead of using a flag.

* More nf_tables set enhancements from Patrick:

16) Fix possible usage of set variant that doesn't support timeouts.

17) Avoid spurious "set is full" errors from Netlink API when there are pending
    stale elements scheduled to be released.

18) Restrict loop checks to set maps.

19) Add support for dynamic set updates from the packet path.

20) Add support to store optional user data (e.g. comments) per set element.

BTW, I have also pulled net-next into nf-next to anticipate the conflict
resolution between your okfn() signature changes and Florian's br_netfilter
updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
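The timeout and garbage-collection items (1 to 5) can be pictured with a small userspace sketch. It is not the nf_tables code: the struct layout, the millisecond clock, the batch size of 4 and the flush of a partial batch at the end of a pass are assumptions, and the RCU-deferred release is replaced by an immediate one. The busy flag plays the role of the element busy bit from item 4:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical set element; expiration is an absolute time in ms,
 * with 0 meaning "no timeout". */
struct elem {
	uint64_t expiration;
	bool busy;   /* in use by the netlink side: GC must not touch it */
	bool freed;
};

#define GC_BATCH_SIZE 4  /* hypothetical batch capacity */

struct gc_batch {
	struct elem *elems[GC_BATCH_SIZE];
	size_t n;
	unsigned int releases;  /* number of batches released so far */
};

static bool elem_expired(const struct elem *e, uint64_t now)
{
	return e->expiration != 0 && now >= e->expiration;
}

/* Stand-in for the RCU-deferred release: frees everything immediately. */
static void gc_batch_release(struct gc_batch *b)
{
	for (size_t i = 0; i < b->n; i++)
		b->elems[i]->freed = true;
	b->n = 0;
	b->releases++;
}

/* One GC pass: queue expired, non-busy elements and release each batch
 * as soon as it is full; a partial batch is flushed at the end. */
static void gc_run(struct elem *set, size_t len, uint64_t now,
		   struct gc_batch *b)
{
	for (size_t i = 0; i < len; i++) {
		struct elem *e = &set[i];

		if (e->freed || e->busy || !elem_expired(e, now))
			continue;
		b->elems[b->n++] = e;
		if (b->n == GC_BATCH_SIZE)
			gc_batch_release(b);
	}
	if (b->n > 0)
		gc_batch_release(b);
}
```

Batching amortizes the cost of the deferred release: one grace period covers several stale elements instead of one per element.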
David S. Miller [Thu, 9 Apr 2015 18:43:13 +0000 (14:43 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2015-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
Major changes:

iwlwifi:

* some more work on LAR
* fixes for UMAC scan
* more work on debugging framework
* more work for 8000 devices
* cleanups and small bugfixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Apr 2015 18:41:47 +0000 (14:41 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-04-09

1) Prohibit the use/abuse of the xfrm netlink interface on
   32/64 bit compatibility tasks. We need a full compat
   layer before we can allow this. From Fan Du.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Apr 2015 18:25:26 +0000 (14:25 -0400)]
Merge branch 'dma_rmb_wmb'

Alexander Duyck says:

====================
Replace wmb()/rmb() with dma_wmb()/dma_rmb() where appropriate, round 2

More cleanup of drivers in order to start making use of dma_rmb and dma_wmb
calls.  This is another pass of what I would consider to be low-hanging
fruit.  There may be other opportunities to make use of the barriers in the
Mellanox and Chelsio drivers but I didn't want to risk meddling with code I
was not completely familiar with so I am leaving that for future work.

I have revisited the Mellanox driver changes.  This time around I went only
for the sections with a clearly defined pattern.  For dma_wmb I used it
between accesses of the descriptor bits followed by owner or size.  For
dma_rmb I used it to replace rmb following a read of the ownership bit in
the descriptor.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
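As a userspace analogy only (C11 fences between CPU threads, not the kernel's dma_wmb()/dma_rmb() on coherent DMA memory), the two patterns described in the merge message look like this. The descriptor layout and the desc_post()/desc_poll() names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor: body fields plus an ownership flag. */
struct desc {
	uint64_t addr;
	uint32_t len;
	_Atomic uint32_t owner;  /* 1 = handed to the consumer, 0 = free */
};

/* Producer: write the descriptor body, then the ownership bit.  The
 * release fence stands in for dma_wmb(): body writes become visible
 * before the ownership write. */
static void desc_post(struct desc *d, uint64_t addr, uint32_t len)
{
	d->addr = addr;
	d->len = len;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->owner, 1, memory_order_relaxed);
}

/* Consumer: read the ownership bit first.  The acquire fence stands in
 * for dma_rmb(): the body reads cannot be satisfied before the check. */
static int desc_poll(struct desc *d, uint64_t *addr, uint32_t *len)
{
	if (atomic_load_explicit(&d->owner, memory_order_relaxed) != 1)
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*addr = d->addr;
	*len = d->len;
	return 1;
}
```

On x86 both fences compile to a compiler barrier only, which mirrors the point made in the mlx4/mlx5 patch about dma_wmb()/dma_rmb() on strongly ordered architectures.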
Alexander Duyck [Thu, 9 Apr 2015 01:49:49 +0000 (18:49 -0700)]
e100: Use dma_rmb/wmb where appropriate

Reduce the CPU overhead for transmit and receive by using lightweight dma_
barriers instead of full barriers where they are applicable.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:43 +0000 (18:49 -0700)]
i40e/i40evf: Use dma_rmb where appropriate

Update i40e and i40evf to use dma_rmb.  This should improve performance by
decreasing the barrier overhead on strongly ordered architectures.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:36 +0000 (18:49 -0700)]
mlx4/mlx5: Use dma_wmb/rmb where appropriate

This patch should help to improve the performance of the mlx4 and mlx5 drivers
on a number of architectures.  For example, on x86 the dma_wmb/rmb reduces
to a barrier() call as the architecture is already strongly ordered, and on
PowerPC the call works out to a lwsync which is significantly less expensive
than the sync call that was being used for wmb.

I placed the new barriers at the spots that seemed to be trying to
order memory/memory reads or writes.  Where MMIO was involved I left the
existing wmb in place, as the new barriers cannot order transactions
between coherent and non-coherent memories.

v2: Reduced the replacements to just the spots where I could clearly
    identify the usage pattern.

Cc: Amir Vadai <amirv@mellanox.com>
Cc: Ido Shamay <idos@mellanox.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:29 +0000 (18:49 -0700)]
cxgb3/4/4vf: Update drivers to use dma_rmb/wmb where appropriate

Update the Chelsio Ethernet drivers to use the dma_rmb/wmb calls instead of
the full barriers in order to improve performance.

Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Varka Bhadram [Thu, 9 Apr 2015 08:25:11 +0000 (13:55 +0530)]
mac802154: fix transmission power datatype

The Netlink attribute for the power is s8, but for the driver level
operations we are collecting the power level value into an int.
It has to be changed from int to s8.

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Marcel Holtmann [Thu, 9 Apr 2015 07:35:19 +0000 (00:35 -0700)]
Bluetooth: btusb: Use proper data structures for Intel vendor events

The Intel vendor events indicating the firmware loading result and the
bootup of the operational firmware are currently handled with hardcoded
byte comparisons. So instead of doing that, provide proper data structures
and actually use them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Varka Bhadram [Thu, 9 Apr 2015 06:35:43 +0000 (12:05 +0530)]
mac802154: fix typo for device

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Marcel Holtmann [Wed, 8 Apr 2015 16:05:27 +0000 (09:05 -0700)]
Bluetooth: Read LE remote features during connection establishment

When establishing a Bluetooth LE connection, read the remote used
features mask to determine which features are supported. This was
not really needed with Bluetooth 4.0, but since Bluetooth 4.1 and
also 4.2 have introduced new optional features, this becomes more
important.

This works the same as with BR/EDR where the connection enters the
BT_CONFIG stage and hci_connect_cfm call is delayed until the remote
features have been retrieved. Only after successfully receiving the
remote features, the connection enters the BT_CONNECTED state.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
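A simplified userspace sketch of the ordering described in this commit. The state names mirror BT_CONFIG and BT_CONNECTED, but the structs and handlers are hypothetical, error handling is omitted, and the connect_cfm_sent flag stands in for the hci_connect_cfm() call:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection states. */
enum conn_state { ST_CONNECT, ST_CONFIG, ST_CONNECTED };

struct le_conn {
	enum conn_state state;
	bool connect_cfm_sent;      /* stands in for hci_connect_cfm() */
	unsigned char features[8];  /* the LE features mask is 8 octets */
};

/* Link is up: enter the config stage and (conceptually) issue the read
 * of the remote features instead of confirming the connection now. */
static void on_le_conn_complete(struct le_conn *c)
{
	c->state = ST_CONFIG;
}

/* Remote features received successfully: store them, and only now
 * move to the connected state and confirm the connection. */
static void on_remote_features(struct le_conn *c,
			       const unsigned char features[8])
{
	if (c->state != ST_CONFIG)
		return;
	for (int i = 0; i < 8; i++)
		c->features[i] = features[i];
	c->state = ST_CONNECTED;
	c->connect_cfm_sent = true;
}
```

Delaying the confirmation means upper layers never see a connected LE link whose feature mask is still unknown.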
Al Viro [Sat, 21 Mar 2015 23:56:16 +0000 (19:56 -0400)]
switch kernel_sendmsg() and kernel_recvmsg() to iov_iter_kvec()

For kernel_sendmsg() that eliminates the need to play with set_fs();
for kernel_recvmsg() it does *not*: a couple of callers are using
it with non-NULL ->msg_control, which would be treated as a userland
address on the recvmsg side of things.

In all cases we are really setting a kvec-backed iov_iter, though.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:29:06 +0000 (19:29 -0400)]
net: switch importing msghdr from userland to {compat_,}import_iovec()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 21 Mar 2015 23:12:32 +0000 (19:12 -0400)]
net: switch sendto() and recvfrom() to import_single_range()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 9 Apr 2015 04:02:06 +0000 (00:02 -0400)]
Merge branch 'iov_iter' into for-davem

Al Viro [Thu, 9 Apr 2015 04:00:30 +0000 (00:00 -0400)]
Merge branch 'iocb' into for-davem

Trivial conflict in net/socket.c and a non-trivial one in crypto;
that one had evaded the aio_complete() removal.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
WANG Cong [Wed, 8 Apr 2015 21:48:30 +0000 (14:48 -0700)]
vxlan: do not exit on error in vxlan_stop()

We need to clean up vxlan even if vxlan_igmp_leave() fails.

This fixes the following kernel warning:

 WARNING: CPU: 0 PID: 6 at lib/debugobjects.c:263 debug_print_object+0x7c/0x8d()
 ODEBUG: free active (active state 0) object type: timer_list hint: vxlan_cleanup+0x0/0xd0
 CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.0.0-rc7+ #953
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: netns cleanup_net
  0000000000000009 ffff88011955f948 ffffffff81a25f5a 00000000253f253e
  ffff88011955f998 ffff88011955f988 ffffffff8107608e 0000000000000000
  ffffffff814deba2 ffff8800d4e94000 ffffffff82254c30 ffffffff81fbe455
 Call Trace:
  [<ffffffff81a25f5a>] dump_stack+0x4c/0x65
  [<ffffffff8107608e>] warn_slowpath_common+0x9c/0xb6
  [<ffffffff814deba2>] ? debug_print_object+0x7c/0x8d
  [<ffffffff81076116>] warn_slowpath_fmt+0x46/0x48
  [<ffffffff814deba2>] debug_print_object+0x7c/0x8d
  [<ffffffff81666bf1>] ? vxlan_fdb_destroy+0x5b/0x5b
  [<ffffffff814dee02>] __debug_check_no_obj_freed+0xc3/0x15f
  [<ffffffff814df728>] debug_check_no_obj_freed+0x12/0x16
  [<ffffffff8117ae4e>] slab_free_hook+0x64/0x6c
  [<ffffffff8114deaa>] ? kvfree+0x31/0x33
  [<ffffffff8117dc66>] kfree+0x101/0x1ac
  [<ffffffff8114deaa>] kvfree+0x31/0x33
  [<ffffffff817d4137>] netdev_freemem+0x18/0x1a
  [<ffffffff817e8b52>] netdev_release+0x2e/0x32
  [<ffffffff815b4163>] device_release+0x5a/0x92
  [<ffffffff814bd4dd>] kobject_cleanup+0x49/0x5e
  [<ffffffff814bd3ff>] kobject_put+0x45/0x49
  [<ffffffff817d3fc1>] netdev_run_todo+0x26f/0x283
  [<ffffffff817d4873>] ? rollback_registered_many+0x20f/0x23b
  [<ffffffff817e0c80>] rtnl_unlock+0xe/0x10
  [<ffffffff817d4af0>] default_device_exit_batch+0x12a/0x139
  [<ffffffff810aadfa>] ? wait_woken+0x8f/0x8f
  [<ffffffff817c8e14>] ops_exit_list+0x2b/0x57
  [<ffffffff817c9b21>] cleanup_net+0x154/0x1e7
  [<ffffffff8108b05d>] process_one_work+0x255/0x4ad
  [<ffffffff8108af69>] ? process_one_work+0x161/0x4ad
  [<ffffffff8108b4b1>] worker_thread+0x1cd/0x2ab
  [<ffffffff8108b2e4>] ? process_scheduled_works+0x2f/0x2f
  [<ffffffff81090686>] kthread+0xd4/0xdc
  [<ffffffff8109eca3>] ? local_clock+0x19/0x22
  [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
  [<ffffffff81a31c48>] ret_from_fork+0x58/0x90
  [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83

In the long term, we should handle NETDEV_{UP,DOWN} events
from the lower device of a tunnel device.

Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 8 Apr 2015 22:34:04 +0000 (15:34 -0700)]
tcp: do not rearm rsk_timer on FastOpen requests

FastOpen requests are not like other regular request sockets.

They do not yet use rsk_timer: tcp_fastopen_queue_check()
simply removes one expired request from the fastopenq->rskq_rst
list manually.

Therefore, tcp_check_req() must not call mod_timer_pending(),
otherwise we crash because rsk_timer was not initialized.

Fixes: fa76ce7328b ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 21:12:50 +0000 (17:12 -0400)]
Merge tag 'nfc-next-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC: 4.1 pull request

This is the NFC pull request for 4.1.

This is a shorter one than usual, as the Intel Field Peak NFC
driver could not make it in time.

We have:

- A new driver for NXP NCI based chipsets, such as the NPC100 or
  the PN7150. It currently only supports an i2c physical layer, but
  could easily be extended to work on top of e.g. SPI.
  This driver also includes support for user space triggered firmware
  updates.

- A few minor st21nfc[ab] fixes, cleanups, and comment improvements.

- A pn533 error return fix.

- A few NFC-related log formatting cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 20:30:01 +0000 (16:30 -0400)]
sfc: Revert SRIOV changes.

This reverts commits:

d92916f71a57582ce7276547510cedb2c10b6bd6 ("sfc: Own header for nic-specific sriov functions,")
25672dba9535b804331145379c79f835ba2205c5 ("sfc: Enable VF's via a write to the sysfs file sriov_numvfs")

They are reverted because they break the build with SRIOV disabled and
there is no easy way to fix it the way things are arranged.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Wed, 8 Apr 2015 03:05:42 +0000 (23:05 -0400)]
netfilter: Fix switch statement warnings with recent gcc.

More recent GCC warns about two kinds of switch statement uses:

1) Switching on an enumeration, but not having an explicit case
   statement for all members of the enumeration.  To show the
   compiler this is intentional, we simply add a default case
   with nothing more than a break statement.

2) Switching on a boolean value.  I think this warning is dumb
   but nevertheless you get it wholesale with -Wswitch.

This patch cures all such warnings in netfilter.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
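Case 1) in a minimal form, using a hypothetical enum rather than a real netfilter one: without the empty default, -Wswitch would warn that V_QUEUE and V_REPEAT have no case label.

```c
#include <assert.h>

/* Hypothetical enumeration standing in for a netfilter one. */
enum verdict { V_ACCEPT, V_DROP, V_QUEUE, V_REPEAT };

static int verdict_to_code(enum verdict v)
{
	int code = 0;

	switch (v) {
	case V_ACCEPT:
		code = 1;
		break;
	case V_DROP:
		code = -1;
		break;
	default:
		/* V_QUEUE and V_REPEAT are deliberately ignored; the empty
		 * default tells -Wswitch that this is intentional. */
		break;
	}
	return code;
}
```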
David S. Miller [Wed, 8 Apr 2015 19:19:17 +0000 (15:19 -0400)]
Merge branch 'selinux-nlmsg'

Nicolas Dichtel says:

====================
selinux: add some missing nlmsg commands

It's not a critical issue, thus the patches are based on net-next.

The patches are split because the 'Fixes' tag is not the same for all
commands.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:42 +0000 (18:36 +0200)]
selinux/nlmsg: add XFRM_MSG_[NEW|GET]SADINFO

These commands are missing.

Fixes: 28d8909bc790 ("[XFRM]: Export SAD info.")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:41 +0000 (18:36 +0200)]
selinux/nlmsg: add XFRM_MSG_GETSPDINFO

This command is missing.

Fixes: ecfd6b183780 ("[XFRM]: Export SPD info")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:40 +0000 (18:36 +0200)]
selinux/nlmsg: add XFRM_MSG_NEWSPDINFO

This new command is missing.

Fixes: 880a6fab8f6b ("xfrm: configure policy hash table thresholds by netlink")
Reported-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:39 +0000 (18:36 +0200)]
selinux/nlmsg: add RTM_GETNSID

This new command is missing.

Fixes: 9a9634545c70 ("netns: notify netns id events")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:38 +0000 (18:36 +0200)]
selinux/nlmsg: add RTM_NEWNSID and RTM_GETNSID

These new commands are missing.

Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Bresticker [Tue, 7 Apr 2015 20:38:45 +0000 (13:38 -0700)]
stmmac: Add an optional register interface clock

The DWMAC block on certain SoCs (such as IMG Pistachio) has a second
clock which must be enabled in order to access the peripheral's
register interface, so add support for requesting and enabling an
optional "pclk".

Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Cc: James Hartley <james.hartley@imgtec.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 8 Apr 2015 17:17:58 +0000 (10:17 -0700)]
vxlan: fix a shadow local variable

Commit 79b16aadea32cce077
("udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb()")
introduced 'sk', but we already have an inner 'sk'.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso [Wed, 8 Apr 2015 15:40:17 +0000 (17:40 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

Resolve conflicts between 5888b93 ("Merge branch 'nf-hook-compress'") and
Florian Westphal's br_netfilter work.

Conflicts:
        net/bridge/br_netfilter.c

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Wed, 8 Apr 2015 16:27:26 +0000 (12:27 -0400)]
Merge branch 'hv_netvsc_linearize'

Vitaly Kuznetsov says:

====================
hv_netvsc: linearize SKBs bigger than MAX_PAGE_BUFFER_COUNT-2 pages

This patch series fixes the same issue which was fixed in Xen with commit
97a6d1bb2b658ac85ed88205ccd1ab809899884d ("xen-netfront: Fix handling packets on
compound pages with skb_linearize").

It is relatively easy to create a packet which is small in size but occupies
more than 30 (MAX_PAGE_BUFFER_COUNT-2) pages. Here is a kernel-mode reproducer
which tries sending a packet with only 34 bytes of payload (but on 34 pages)
and fails:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/net.h>
#include <linux/in.h>
#include <linux/gfp.h>
#include <linux/socket.h>

static int __init sendfb_init(void)
{
	struct socket *sock;
	int i, ret;
	struct sockaddr_in in4_addr = { 0 };
	struct page *pages[17];
	unsigned long flags;

	ret = sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
	if (ret) {
		pr_err("failed to create socket: %d!\n", ret);
		return ret;
	}

	in4_addr.sin_family = AF_INET;
	/* www.google.com, 74.125.133.99 */
	in4_addr.sin_addr.s_addr = cpu_to_be32(0x4a7d8563);
	in4_addr.sin_port = cpu_to_be16(80);

	ret = sock->ops->connect(sock, (struct sockaddr *)&in4_addr,
				 sizeof(in4_addr), 0);
	if (ret) {
		pr_err("failed to connect: %d!\n", ret);
		sock_release(sock);
		return ret;
	}

	/* We can send up to 17 frags */
	flags = MSG_MORE;
	for (i = 0; i < 17; i++) {
		if (i == 16)
			flags = MSG_EOR;
		/* Order-1 compound page; 2 bytes at offset PAGE_SIZE - 1
		 * straddle a page boundary, so each frag spans 2 pages. */
		pages[i] = alloc_pages(GFP_KERNEL | __GFP_COMP, 1);
		if (!pages[i]) {
			pr_err("out of memory!\n");
			goto free_pages;
		}
		sock->ops->sendpage(sock, pages[i], PAGE_SIZE - 1, 2, flags);
	}

free_pages:
	/* TCP takes its own page references; drop ours. */
	for (; i > 0; i--)
		__free_pages(pages[i - 1], 1);

	sock_release(sock);
	pr_info("sendfb_init: test done\n");
	/* Fail on purpose so the test module is not left loaded. */
	return -1;
}

module_init(sendfb_init);

MODULE_LICENSE("GPL");

Trying to load such a module results in multiple
'kernel: hv_netvsc vmbus_15 eth0: Packet too big: 100' messages, as all retries
fail as well. It should also be possible to trigger the issue from userspace; I
expect, e.g., NFS under heavy load to get stuck sometimes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Wed, 8 Apr 2015 15:54:06 +0000 (17:54 +0200)]
hv_netvsc: try linearizing big SKBs before dropping them

In netvsc_start_xmit() we can handle packets which are scattered across no
more than MAX_PAGE_BUFFER_COUNT-2 pages. It is, however, easy to create a
packet which is not big in size but occupies more pages (e.g. if it uses frags
on compound page boundaries). When we drop such a packet, the sender tries to
resend it, but in most cases it resends the same packet, which also gets
dropped, so the particular connection gets stuck. To solve the issue we can
try linearizing the skb.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
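The page arithmetic behind the reproducer in the hv_netvsc_linearize merge message can be checked with a small userspace sketch. Per that message the limit is MAX_PAGE_BUFFER_COUNT-2 = 30 pages; the 4 KiB page size, struct frag, and needs_linearize() are assumptions, and the real driver also has to account for the linear part of the skb:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL          /* assumption: 4 KiB pages */
#define SKETCH_MAX_PAGE_BUFFER_COUNT 32  /* 32 - 2 = 30, per the text */

/* Hypothetical frag: offset is relative to a page-aligned base. */
struct frag {
	size_t offset;
	size_t len;
};

/* Number of pages touched by 'len' bytes starting at 'offset'. */
static size_t pages_spanned(size_t offset, size_t len)
{
	if (len == 0)
		return 0;
	size_t first = offset / SKETCH_PAGE_SIZE;
	size_t last = (offset + len - 1) / SKETCH_PAGE_SIZE;
	return last - first + 1;
}

/* Hypothetical xmit-path check: if the frags need more page buffers than
 * allowed, linearize (copy into one contiguous buffer) instead of dropping. */
static bool needs_linearize(const struct frag *frags, size_t nfrags)
{
	size_t pages = 0;

	for (size_t i = 0; i < nfrags; i++)
		pages += pages_spanned(frags[i].offset, frags[i].len);
	return pages > SKETCH_MAX_PAGE_BUFFER_COUNT - 2;
}
```

Each of the 17 two-byte frags written at offset PAGE_SIZE - 1 touches two pages, giving 34 pages for 34 bytes of payload, which exceeds the limit of 30; linearizing trades one copy for a packet the host will accept.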
Vitaly Kuznetsov [Wed, 8 Apr 2015 15:54:05 +0000 (17:54 +0200)]
hv_netvsc: use single existing drop path in netvsc_start_xmit

... which validly uses dev_kfree_skb_any() instead of dev_kfree_skb().

Setting ret to -EFAULT or -ENOMEM has no real meaning here (we need to set
it to anything but -EAGAIN), as we drop the packet and return NETDEV_TX_OK
anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 16:21:36 +0000 (12:21 -0400)]
Merge branch 'sfc-next'

Shradha Shah says:

====================
sfc: Nic specific sriov functions, netdev_ops and sriov_configure

These are the first two patches of a series to support SRIOV on EF10.

The first patch declares nic-specific sriov functions in nic-specific headers,
creates only one instance of the netdev_ops, and removes sriov functionality
from the Falcon code.

The second patch adds support for sriov_configure.

The Virtual Functions can be enabled but they do not bind to the SFC
driver just yet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Shah [Wed, 8 Apr 2015 14:25:04 +0000 (15:25 +0100)]
sfc: Enable VF's via a write to the sysfs file sriov_numvfs

This patch adds support for the use of sriov_configure on EF10
to enable Virtual Functions while the driver is loaded.

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Shah [Wed, 8 Apr 2015 14:24:45 +0000 (15:24 +0100)]
sfc: Own header for nic-specific sriov functions, single instance of netdev_ops and sriov removed from Falcon code

By putting all the efx_{siena,ef10}_sriov_* declarations in
{siena,ef10}_sriov.h, ensure they cannot be called from nic-generic code.
Also fixes up an instance of this, where mcdi.c was calling
efx_siena_sriov_flr.

The single instance of netdev_ops should call general high level
functions that can then call something adapter specific in efx_nic_type.
We should only do adapter specialisation via efx_nic_type.

Removal of sriov functionality from the Falcon code means that tests
are needed for the presence of some callbacks.

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
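A sketch of the dispatch rule stated in this commit (generic code calls through the per-adapter type, and optional hooks are tested for presence), using hypothetical names rather than the sfc driver's:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct nic;

/* Hypothetical per-adapter operations table in the spirit of efx_nic_type;
 * an adapter without SR-IOV (Falcon-like) leaves the hook NULL. */
struct nic_type {
	const char *name;
	int (*sriov_configure)(struct nic *nic, int num_vfs);  /* optional */
};

struct nic {
	const struct nic_type *type;
	int num_vfs;
};

/* Generic entry point, as the single shared netdev_ops would use: it never
 * names an adapter-specific function and checks optional hooks first. */
static int nic_sriov_configure(struct nic *nic, int num_vfs)
{
	if (!nic->type->sriov_configure)
		return -EOPNOTSUPP;
	return nic->type->sriov_configure(nic, num_vfs);
}

/* Adapter-specific implementation, reachable only through its type. */
static int ef10_like_sriov_configure(struct nic *nic, int num_vfs)
{
	nic->num_vfs = num_vfs;
	return num_vfs;
}

static const struct nic_type ef10_like_type = {
	"ef10-like", ef10_like_sriov_configure
};
static const struct nic_type falcon_like_type = { "falcon-like", NULL };
```

Keeping adapter-specific declarations out of shared headers makes a direct call from generic code a compile error rather than a latent bug.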
David S. Miller [Wed, 8 Apr 2015 16:15:15 +0000 (12:15 -0400)]
Merge branch 'dma_rmb_wmb'

Alexander Duyck says:

====================
Replace wmb()/rmb() with dma_wmb()/dma_rmb() where appropriate

This is the start of a side project cleaning up the drivers that can make use
of the dma_wmb and dma_rmb calls.  The general idea is to start removing
the unnecessary wmb/rmb calls from a number of drivers and to make use of
the lighter weight dma_wmb/dma_rmb calls, as this should allow for an
overall improvement in performance: each barrier can cost a significant
number of cycles, and on architectures such as x86 this is unnecessary.

These changes are what I would consider low-hanging fruit.  The likelihood
of the changes introducing an error should be low since the use of the
barriers in these cases is fairly obvious.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 7 Apr 2015 23:55:27 +0000 (16:55 -0700)]
e1000, e1000e: Use dma_rmb instead of rmb for descriptor read ordering

This change replaces calls to rmb with dma_rmb in the case where we want to
order all follow-on descriptor reads after the check for the descriptor
status bit.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 7 Apr 2015 23:55:21 +0000 (16:55 -0700)]
s2io: Update driver to use dma_wmb

This change updates several spots where a wmb was being used to instead use
a dma_wmb to flush out writes before updating the control portion of the
descriptor.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 7 Apr 2015 23:55:14 +0000 (16:55 -0700)]
sungem, sunhme, sunvnet: Update drivers to use dma_wmb/rmb

This patch goes through and replaces wmb/rmb with dma_wmb/dma_rmb in cases
where the barrier is being used to order writes or reads to just memory and
doesn't involve any programmed I/O.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Tue, 7 Apr 2015 23:16:29 +0000 (16:16 -0700)]
bonding: Remove unnecessary initialization

bond_3ad_bind_slave() calls ad_initialize_port() and then immediately
assigns correct values making some of that initialization unnecessary.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Tue, 7 Apr 2015 23:16:11 +0000 (16:16 -0700)]
bonding: Code re-factoring for admin, oper-key operations

This patch breaks the rich assignments into their own statements
and removes some duplicate code where admin-key and oper-key are
updated.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>