David S. Miller [Sun, 17 Dec 2017 03:11:55 +0000 (22:11 -0500)]
Merge git://git./linux/kernel/git/davem/net
Three sets of overlapping changes, two in the packet scheduler
and one in the meson-gxl PHY driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 16 Dec 2017 21:43:08 +0000 (13:43 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"More fixes from testing done on the rc kernel, including more SELinux
testing. Looking forward, lockdep found regression today in ipoib
which is still being fixed.
Summary:
- Fix for SELinux on the umad SMI path. Some old hardware does not
fill the PKey properly exposing another bug in the newer SELinux
code.
- Check the input port as we can exceed array bounds from this user
supplied value
- Users are unable to use the hash field support as they want due to
incorrect checks on the field restrictions, correct that so the
feature works as intended
- User triggerable oops in the NETLINK_RDMA handler
- cxgb4 driver fix for a bad interaction with CQ flushing in iser
caused by patches in this merge window, and bad CQ flushing during
normal close.
- Unbalanced memalloc_noio in ipoib in an error path"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
iw_cxgb4: only insert drain cqes if wq is flushed
iw_cxgb4: only clear the ARMED bit if a notification is needed
RDMA/netlink: Fix general protection fault
IB/mlx4: Fix RSS hash fields restrictions
IB/core: Don't enforce PKey security on SMI MADs
IB/core: Bound check alternate path port number
Linus Torvalds [Sat, 16 Dec 2017 21:34:38 +0000 (13:34 -0800)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two bugfixes for the AT24 I2C eeprom driver and some minor corrections
for I2C bus drivers"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: Fix port number check on release
i2c: stm32: Fix copyrights
i2c-cht-wc: constify platform_device_id
eeprom: at24: change nvmem stride to 1
eeprom: at24: fix I2C device selection for runtime PM
Linus Torvalds [Sat, 16 Dec 2017 21:12:53 +0000 (13:12 -0800)]
Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"This has two stable bugfixes, one to fix a BUG_ON() when
nfs_commit_inode() is called with no outstanding commit requests and
another to fix a race in the SUNRPC receive codepath.
Additionally, there are also fixes for an NFS client deadlock and an
xprtrdma performance regression.
Summary:
Stable bugfixes:
- NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a
commit in the case that there were no commit requests.
- SUNRPC: Fix a race in the receive code path
Other fixes:
- NFS: Fix a deadlock in nfs client initialization
- xprtrdma: Fix a performance regression for small IOs"
* tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Fix a race in the receive code path
nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
xprtrdma: Spread reply processing over more CPUs
nfs: fix a deadlock in nfs client initialization
Linus Torvalds [Sat, 16 Dec 2017 02:53:22 +0000 (18:53 -0800)]
Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths"
This reverts commits
5c9d2d5c269c,
c7da82b894e9, and
e7fe7b5cae90.
We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.
Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was. Dave
Hansen says:
"So, the underlying bug here is that we now a get_user_pages_remote()
and then go ahead and do the p*_access_permitted() checks against the
current PKRU. This was introduced recently with the addition of the
new p??_access_permitted() calls.
We have checks in the VMA path for the "remote" gups and we avoid
consulting PKRU for them. This got missed in the pkeys selftests
because I did a ptrace read, but not a *write*. I also didn't
explicitly test it against something where a COW needed to be done"
It's also not entirely clear that it makes sense to check the protection
key bits at this level at all. But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.
We'll see.
Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back. Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit
e585513b76f7 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 15 Dec 2017 21:08:37 +0000 (13:08 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot.
2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik
Brueckner.
3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes
Berg.
4) Add missing barriers to ptr_ring, from Michael S. Tsirkin.
5) Don't advertise gigabit in sh_eth when not available, from Thomas
Petazzoni.
6) Check network namespace when delivering to netlink taps, from Kevin
Cernekee.
7) Kill a race in raw_sendmsg(), from Mohamed Ghannam.
8) Use correct address in TCP md5 lookups when replying to an incoming
segment, from Christoph Paasch.
9) Add schedule points to BPF map alloc/free, from Eric Dumazet.
10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also
from Eric Dumazet.
11) Fix SKB leak in tipc, from Jon Maloy.
12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz.
13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn.
14) Add some new qmi_wwan device IDs, from Daniele Palmas.
15) Fix static key imbalance in ingress qdisc, from Jiri Pirko.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: qcom/emac: Reduce timeout for mdio read/write
net: sched: fix static key imbalance in case of ingress/clsact_init error
net: sched: fix clsact init error path
ip_gre: fix wrong return value of erspan_rcv
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
pkt_sched: Remove TC_RED_OFFLOADED from uapi
net: sched: Move to new offload indication in RED
net: sched: Add TCA_HW_OFFLOAD
net: aquantia: Increment driver version
net: aquantia: Fix typo in ethtool statistics names
net: aquantia: Update hw counters on hw init
net: aquantia: Improve link state and statistics check interval callback
net: aquantia: Fill in multicast counter in ndev stats from hardware
net: aquantia: Fill ndev stat couters from hardware
net: aquantia: Extend stat counters to 64bit values
net: aquantia: Fix hardware DMA stream overload on large MRRS
net: aquantia: Fix actual speed capabilities reporting
sock: free skb in skb_complete_tx_timestamp on error
s390/qeth: update takeover IPs after configuration change
s390/qeth: lock IP table while applying takeover changes
...
Linus Torvalds [Fri, 15 Dec 2017 21:03:25 +0000 (13:03 -0800)]
Merge tag 'usb-4.15-rc4' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes for 4.15-rc4.
There is the usual handful gadget/dwc2/dwc3 fixes as always, for
reported issues. But the most important things in here is the core fix
from Alan Stern to resolve a nasty security bug (my first attempt is
reverted, Alan's was much cleaner), as well as a number of usbip fixes
from Shuah Khan to resolve those reported security issues.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: core: prevent malicious bNumInterfaces overflow
Revert "USB: core: only clean up what we allocated"
USB: core: only clean up what we allocated
Revert "usb: gadget: allow to enable legacy drivers without USB_ETH"
usb: gadget: webcam: fix V4L2 Kconfig dependency
usb: dwc2: Fix TxFIFOn sizes and total TxFIFO size issues
usb: dwc3: gadget: Fix PCM1 for ISOC EP with ep->mult less than 3
usb: dwc3: of-simple: set dev_pm_ops
usb: dwc3: of-simple: fix missing clk_disable_unprepare
usb: dwc3: gadget: Wait longer for controller to end command processing
usb: xhci: fix TDS for MTK xHCI1.1
xhci: Don't add a virt_dev to the devs array before it's fully allocated
usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
usbip: prevent vhci_hcd driver from leaking a socket pointer address
usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
usbip: fix stub_rx: get_pipe() to validate endpoint number
tools/usbip: fixes potential (minor) "buffer overflow" (detected on recent gcc with -Werror)
USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
usb: musb: da8xx: fix babble condition handling
Linus Torvalds [Fri, 15 Dec 2017 20:59:48 +0000 (12:59 -0800)]
Merge tag 'staging-4.15-rc4' of git://git./linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are some small staging driver fixes for 4.15-rc4.
One patch for the ccree driver to prevent an unitialized value from
being returned to a caller, and the other fixes a logic error in the
pi433 driver"
* tag 'staging-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: pi433: Fixes issue with bit shift in rf69_get_modulation
staging: ccree: Uninitialized return in ssi_ahash_import()
Linus Torvalds [Fri, 15 Dec 2017 20:56:23 +0000 (12:56 -0800)]
Merge tag 'for_linus' of git://git./linux/kernel/git/mst/vhost
Pull virtio regression fixes from Michael Tsirkin:
"Fixes two issues in the latest kernel"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_mmio: fix devm cleanup
ptr_ring: fix up after recent ptr_ring changes
Linus Torvalds [Fri, 15 Dec 2017 20:53:37 +0000 (12:53 -0800)]
Merge tag 'for-4.15/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- fix a particularly nasty DM core bug in a 4.15 refcount_t conversion.
- fix various targets to dm_register_target after module __init
resources created; otherwise racing lvm2 commands could result in a
NULL pointer during initialization of associated DM kernel module.
- fix regression in bio-based DM multipath queue_if_no_path handling.
- fix DM bufio's shrinker to reclaim more than one buffer per scan.
* tag 'for-4.15/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
dm mpath: fix bio-based multipath queue_if_no_path handling
dm: fix various targets to dm_register_target after module __init resources created
dm table: fix regression from improper dm_dev_internal.count refcount_t conversion
Linus Torvalds [Fri, 15 Dec 2017 20:51:42 +0000 (12:51 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The most important one is the bfa fix because it's easy to oops the
kernel with this driver (this includes the commit that corrects the
compiler warning in the original), a regression in the new timespec
conversion in aacraid and a regression in the Fibre Channel ELS
handling patch.
The other three are a theoretical problem with termination in the
vendor/host matching code and a use after free in lpfc.
The additional patches are a fix for an I/O hang in the mq code under
certain circumstances and a rare oops in some debugging code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Fix a scsi_show_rq() NULL pointer dereference
scsi: MAINTAINERS: change FCoE list to linux-scsi
scsi: libsas: fix length error in sas_smp_handler()
scsi: bfa: fix type conversion warning
scsi: core: run queue if SCSI device queue isn't ready and queue is idle
scsi: scsi_devinfo: cleanly zero-pad devinfo strings
scsi: scsi_devinfo: handle non-terminated strings
scsi: bfa: fix access to bfad_im_port_s
scsi: aacraid: address UBSAN warning regression
scsi: libfc: fix ELS request handling
scsi: lpfc: Use after free in lpfc_rq_buf_free()
Linus Torvalds [Fri, 15 Dec 2017 20:49:54 +0000 (12:49 -0800)]
Merge tag 'mmc-v4.15-rc2' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"A couple of MMC fixes:
- fix use of uninitialized drv_typ variable
- apply NO_CMD23 quirk to some specific SD cards to make them work"
* tag 'mmc-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: apply NO_CMD23 quirk to some specific cards
mmc: core: properly init drv_type
Linus Torvalds [Fri, 15 Dec 2017 20:48:27 +0000 (12:48 -0800)]
Merge tag 'ceph-for-4.15-rc4' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"CephFS inode trimming fix from Zheng, marked for stable"
* tag 'ceph-for-4.15-rc4' of git://github.com/ceph/ceph-client:
ceph: drop negative child dentries before try pruning inode's alias
Linus Torvalds [Fri, 15 Dec 2017 20:46:48 +0000 (12:46 -0800)]
Merge branch 'overlayfs-linus' of git://git./linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
- fix incomplete syncing of filesystem
- fix regression in readdir on ovl over 9p
- only follow redirects when needed
- misc fixes and cleanups
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix overlay: warning prefix
ovl: Use PTR_ERR_OR_ZERO()
ovl: Sync upper dirty data when syncing overlayfs
ovl: update ctx->pos on impure dir iteration
ovl: Pass ovl_get_nlink() parameters in right order
ovl: don't follow redirects if redirect_dir=off
Hemanth Puranik [Fri, 15 Dec 2017 14:35:58 +0000 (20:05 +0530)]
net: qcom/emac: Reduce timeout for mdio read/write
Currently mdio read/write takes around ~115us as the timeout
between status check is set to 100us.
By reducing the timeout to 1us mdio read/write takes ~15us to
complete. This improves the link up event response.
Signed-off-by: Hemanth Puranik <hpuranik@codeaurora.org>
Acked-by: Timur Tabi <timur@codeaurora.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 15 Dec 2017 20:44:49 +0000 (12:44 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"There are some significant fixes in here for FP state corruption,
hardware access/dirty PTE corruption and an erratum workaround for the
Falkor CPU.
I'm hoping that things finally settle down now, but never say never...
Summary:
- Fix FPSIMD context switch regression introduced in -rc2
- Fix ABI break with SVE CPUID register reporting
- Fix use of uninitialised variable
- Fixes to hardware access/dirty management and sanity checking
- CPU erratum workaround for Falkor CPUs
- Fix reporting of writeable+executable mappings
- Fix signal reporting for RAS errors"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: fpsimd: Fix copying of FP state from signal frame into task struct
arm64/sve: Report SVE to userspace via CPUID only if supported
arm64: fix CONFIG_DEBUG_WX address reporting
arm64: fault: avoid send SIGBUS two times
arm64: hw_breakpoint: Use linux/uaccess.h instead of asm/uaccess.h
arm64: Add software workaround for Falkor erratum 1041
arm64: Define cputype macros for Falkor CPU
arm64: mm: Fix false positives in set_pte_at access/dirty race detection
arm64: mm: Fix pte_mkclean, pte_mkdirty semantics
arm64: Initialise high_memory global variable earlier
Jiri Pirko [Fri, 15 Dec 2017 11:40:13 +0000 (12:40 +0100)]
net: sched: fix static key imbalance in case of ingress/clsact_init error
Move static key increments to the beginning of the init function
so they pair 1:1 with decrements in ingress/clsact_destroy,
which is called in case ingress/clsact_init fails.
Fixes:
6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 15 Dec 2017 11:40:12 +0000 (12:40 +0100)]
net: sched: fix clsact init error path
Since in qdisc_create, the destroy op is called when init fails, we
don't do cleanup in init and leave it up to destroy.
This fixes use-after-free when trying to put already freed block.
Fixes:
6e40cf2d4dee ("net: sched: use extended variants of block_get/put in ingress and clsact qdiscs")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 15 Dec 2017 01:48:16 +0000 (17:48 -0800)]
net: phy: broadcom: Add entry for 5395 switch PHYs
Add an entry for the builtin PHYs present in the Broadcom BCM5395 switch. This
allows us to retrieve the PHY statistics among other things.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 15 Dec 2017 20:14:33 +0000 (12:14 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- fix the s2ram regression related to confusion around segment
register restoration, plus related cleanups that make the code more
robust
- a guess-unwinder Kconfig dependency fix
- an isoimage build target fix for certain tool chain combinations
- instruction decoder opcode map fixes+updates, and the syncing of
the kernel decoder headers to the objtool headers
- a kmmio tracing fix
- two 5-level paging related fixes
- a topology enumeration fix on certain SMP systems"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
x86/decoder: Fix and update the opcodes map
x86/power: Make restore_processor_context() sane
x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
x86/build: Don't verify mtools configuration file for isoimage
x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
x86/boot/compressed/64: Print error if 5-level paging is not supported
x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
Linus Torvalds [Fri, 15 Dec 2017 19:44:59 +0000 (11:44 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Misc fixes:
- Fix a S390 boot hang that was caused by the lock-break logic.
Remove lock-break to begin with, as review suggested it was
unreasonably fragile and our confidence in its continued good
health is lower than our confidence in its removal.
- Remove the lockdep cross-release checking code for now, because of
unresolved false positive warnings. This should make lockdep work
well everywhere again.
- Get rid of the final (and single) ACCESS_ONCE() straggler and
remove the API from v4.15.
- Fix a liblockdep build warning"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/lib/lockdep: Add missing declaration of 'pr_cont()'
checkpatch: Remove ACCESS_ONCE() warning
compiler.h: Remove ACCESS_ONCE()
tools/include: Remove ACCESS_ONCE()
tools/perf: Convert ACCESS_ONCE() to READ_ONCE()
locking/lockdep: Remove the cross-release locking checks
locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
Linus Torvalds [Fri, 15 Dec 2017 19:40:24 +0000 (11:40 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a crash fix for an ARM SoC platform, and kernel-doc
warnings fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Do not pull from current CPU if only one CPU to pull
sched/core: Fix kernel-doc warnings after code movement
Linus Torvalds [Fri, 15 Dec 2017 19:36:20 +0000 (11:36 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling fix from Ingo Molnar:
"Synchronize kernel <-> tooling headers to resolve two build warnings
in the perf build"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/headers: Synchronize kernel <-> tooling headers
Linus Torvalds [Fri, 15 Dec 2017 19:34:29 +0000 (11:34 -0800)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull early_ioremap fix from Ingo Molnar:
"A boot hang fix when the EFI earlyprintk driver is enabled"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep
Linus Torvalds [Fri, 15 Dec 2017 19:32:09 +0000 (11:32 -0800)]
Merge tag 'for-linus-4.15-rc4-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two minor fixes for running as Xen dom0:
- when built as 32 bit kernel on large machines the Xen LAPIC
emulation should report a rather modern LAPIC in order to support
enough APIC-Ids
- The Xen LAPIC emulation is needed for dom0 only, so build it only
for kernels supporting to run as Xen dom0"
* tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: XEN_ACPI_PROCESSOR is Dom0-only
x86/Xen: don't report ancient LAPIC version
Trond Myklebust [Fri, 15 Dec 2017 02:24:08 +0000 (21:24 -0500)]
SUNRPC: Fix a race in the receive code path
We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
race with the call to xprt_complete_rqst().
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
Fixes:
ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..")
Cc: stable@vger.kernel.org # 4.14+
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Scott Mayhew [Fri, 8 Dec 2017 21:00:12 +0000 (16:00 -0500)]
nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests
If there were no commit requests, then nfs_commit_inode() should not
wait on the commit or mark the inode dirty, otherwise the following
BUG_ON can be triggered:
[ 1917.130762] kernel BUG at fs/inode.c:578!
[ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
[ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
[ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1
[ 1917.130810] task:
c0000005ecd88040 ti:
c00000004cea0000 task.ti:
c00000004cea0000
[ 1917.130813] NIP:
c000000000354178 LR:
c000000000354160 CTR:
c00000000012db80
[ 1917.130816] REGS:
c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64)
[ 1917.130820] MSR:
8000000100029032 <SF,EE,ME,IR,DR,RI> CR:
22002822 XER:
20000000
[ 1917.130828] CFAR:
c00000000011f594 SOFTE: 1
GPR00:
c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
GPR04:
000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
GPR08:
0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
GPR12:
c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
GPR16:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24:
0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
GPR28:
d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
[ 1917.130867] NIP [
c000000000354178] .evict+0x1c8/0x350
[ 1917.130871] LR [
c000000000354160] .evict+0x1b0/0x350
[ 1917.130873] Call Trace:
[ 1917.130876] [
c00000004cea39a0] [
c000000000354160] .evict+0x1b0/0x350 (unreliable)
[ 1917.130880] [
c00000004cea3a30] [
c0000000003558cc] .evict_inodes+0x13c/0x270
[ 1917.130884] [
c00000004cea3af0] [
c000000000327d20] .kill_anon_super+0x70/0x1e0
[ 1917.130896] [
c00000004cea3b80] [
d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
[ 1917.130900] [
c00000004cea3c00] [
c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
[ 1917.130903] [
c00000004cea3c80] [
c00000000035ba54] .cleanup_mnt+0xd4/0x180
[ 1917.130907] [
c00000004cea3d10] [
c000000000119034] .task_work_run+0x114/0x150
[ 1917.130912] [
c00000004cea3db0] [
c00000000001ba6c] .do_notify_resume+0xcc/0x100
[ 1917.130916] [
c00000004cea3e30] [
c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
[ 1917.130919] Instruction dump:
[ 1917.130921]
7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
[ 1917.130927]
694a0060 7d4a0074 794ad182 694a0001 <
0b0a0000>
892d02a4 2f890000 40de0134
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Mon, 4 Dec 2017 19:04:04 +0000 (14:04 -0500)]
xprtrdma: Spread reply processing over more CPUs
Commit
d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler
directly from RECV completion") introduced a performance regression
for NFS I/O small enough to not need memory registration. In multi-
threaded benchmarks that generate primarily small I/O requests,
IOPS throughput is reduced by nearly a third. This patch restores
the previous level of throughput.
Because workqueues are typically BOUND (in particular ib_comp_wq,
nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend
to aggregate on the CPU that is handling Receive completions.
The usual approach to addressing this problem is to create a QP
and CQ for each CPU, and then schedule transactions on the QP
for the CPU where you want the transaction to complete. The
transaction then does not require an extra context switch during
completion to end up on the same CPU where the transaction was
started.
This approach doesn't work for the Linux NFS/RDMA client because
currently the Linux NFS client does not support multiple connections
per client-server pair, and the RDMA core API does not make it
straightforward for ULPs to determine which CPU is responsible for
handling Receive completions for a CQ.
So for the moment, record the CPU number in the rpcrdma_req before
the transport sends each RPC Call. Then during Receive completion,
queue the RPC completion on that same CPU.
Additionally, move all RPC completion processing to the deferred
handler so that even RPCs with simple small replies complete on
the CPU that sent the corresponding RPC Call.
Fixes:
d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Scott Mayhew [Tue, 5 Dec 2017 18:55:44 +0000 (13:55 -0500)]
nfs: fix a deadlock in nfs client initialization
The following deadlock can occur between a process waiting for a client
to initialize in while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:
Process 1 Process 2
--------- ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTB->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
mutex_lock(&nfs_clid_init_mutex);
nfs41_walk_client_list(clp, result, cred);
nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)
Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.
This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Haishuang Yan [Fri, 15 Dec 2017 02:46:16 +0000 (10:46 +0800)]
ip_gre: fix wrong return value of erspan_rcv
If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM.
Fixes:
84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 18:52:23 +0000 (13:52 -0500)]
Merge branch 'sctp-stream-interleave'
Xin Long says:
====================
sctp: Implement Stream Interleave: Interaction with Other SCTP Extensions
Stream Interleave would be implemented in two Parts:
1. The I-DATA Chunk Supporting User Message Interleaving
2. Interaction with Other SCTP Extensions
Overview in section 2.3 of RFC8260 for Part 2:
The usage of the I-DATA chunk might interfere with other SCTP
extensions. Future SCTP extensions MUST describe if and how they
interfere with the usage of I-DATA chunks. For the SCTP extensions
already defined when this document was published, the details are
given in the following subsections.
As the 2nd part of Stream Interleave Implementation, this patchset mostly
adds the support for SCTP Partial Reliability Extension with I-FORWARD-TSN
chunk. Then adjusts stream scheduler and stream reconfig to make them work
properly with I-DATA chunks.
In the last patch, all stream interleave codes will be enabled by adding
sysctl to allow users to use this feature.
v1 -> v2:
- removed the intl_enable check from sctp_chunk_event_lookup, as Marcelo's
suggestion.
- fixed a typo in changelog.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:32 +0000 (00:41 +0800)]
sctp: support sysctl to allow users to use stream interleave
This is the last patch for support of stream interleave, after this patch,
users could enable stream interleave by systcl -w net.sctp.intl_enable=1.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:31 +0000 (00:41 +0800)]
sctp: update mid instead of ssn when doing stream and asoc reset
When using idata and doing stream and asoc reset, setting ssn with
0 could only clear the 1st 16 bits of mid.
So to make this work for both data and idata, it sets mid with 0
instead of ssn, and also mid_uo for unordered idata also need to
be cleared, as said in section 2.3.2 of RFC8260.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:30 +0000 (00:41 +0800)]
sctp: add stream interleave support in stream scheduler
As Marcelo said in the stream scheduler patch:
Support for I-DATA chunks, also described in RFC8260, with user message
interleaving is straightforward as it just requires the schedulers to
probe for the feature and ignore datamsg boundaries when dequeueing.
All needs to do is just to ignore datamsg boundaries when dequeueing.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:29 +0000 (00:41 +0800)]
sctp: implement handle_ftsn for sctp_stream_interleave
handle_ftsn is added as a member of sctp_stream_interleave, used to skip
ssn for data or mid for idata, called for SCTP_CMD_PROCESS_FWDTSN cmd.
sctp_handle_iftsn works for ifwdtsn, and sctp_handle_fwdtsn works for
fwdtsn. Note that different from sctp_handle_fwdtsn, sctp_handle_iftsn
could do stream abort pd.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:28 +0000 (00:41 +0800)]
sctp: implement report_ftsn for sctp_stream_interleave
report_ftsn is added as a member of sctp_stream_interleave, used to
skip tsn from tsnmap, remove old events from reasm or lobby queue,
and abort pd for data or idata, called for SCTP_CMD_REPORT_FWDTSN
cmd and asoc reset.
sctp_report_iftsn works for ifwdtsn, and sctp_report_fwdtsn works
for fwdtsn. Note that sctp_report_iftsn doesn't do asoc abort_pd,
as stream abort_pd will be done when handling ifwdtsn. But when
ftsn is equal with ftsn, which means asoc reset, asoc abort_pd has
to be done.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:27 +0000 (00:41 +0800)]
sctp: implement validate_ftsn for sctp_stream_interleave
validate_ftsn is added as a member of sctp_stream_interleave, used to
validate ssn/chunk type for fwdtsn or mid (message id)/chunk type for
ifwdtsn, called in sctp_sf_eat_fwd_tsn, just as validate_data.
If this check fails, an abort packet will be sent, as said in section
2.3.1 of RFC8260.
As ifwdtsn and fwdtsn chunks have different length, it also defines
ftsn_chunk_len for sctp_stream_interleave to describe the chunk size.
Then it replaces all sizeof(struct sctp_fwdtsn_chunk) with
sctp_ftsnchk_len.
It also adds the process for ifwdtsn in rx path. As Marcelo pointed
out, there's no need to add event table for ifwdtsn, but just share
prsctp_chunk_event_table with fwdtsn's. It would drop fwdtsn chunk
for ifwdtsn and drop ifwdtsn chunk for fwdtsn by calling validate_ftsn
in sctp_sf_eat_fwd_tsn.
After this patch, the ifwdtsn can be accepted.
Note that this patch also removes the sctp.intl_enable check for
idata chunks in sctp_chunk_event_lookup, as it will do this check
in validate_data later.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:26 +0000 (00:41 +0800)]
sctp: implement generate_ftsn for sctp_stream_interleave
generate_ftsn is added as a member of sctp_stream_interleave, used to
create fwdtsn or ifwdtsn chunk according to abandoned chunks, called
in sctp_retransmit and sctp_outq_sack.
sctp_generate_iftsn works for ifwdtsn, and sctp_generate_fwdtsn is
still used for making fwdtsn.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Dec 2017 16:41:25 +0000 (00:41 +0800)]
sctp: add basic structures and make chunk function for ifwdtsn
sctp_ifwdtsn_skip, sctp_ifwdtsn_hdr and sctp_ifwdtsn_chunk are used to
define and parse I-FWD TSN chunk format, and sctp_make_ifwdtsn is a
function to build the chunk.
The I-FORWARD-TSN Chunk Format is defined in section 2.3.1 of RFC8260.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 14 Dec 2017 23:57:58 +0000 (15:57 -0800)]
net: phy: phylink: Handle NULL fwnode_handle
Unlike the various of_* routines to fetch properties, fwnode_* routines can
have an early check against a NULL fwnode_handle reference which makes them
return -EINVAL (see fwnode_call_int_op), thus making it virtually impossible to
differentiate what type of error is going on.
Have an early check in phylink_register_sfp() so we can keep proceeding with
the initialization, there is not much we can do without a valid fwnode_handle
except return early and treat this similarly to -ENOENT.
Fixes:
8fa7b9b6af25 ("phylink: convert to fwnode")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 14 Dec 2017 18:55:50 +0000 (19:55 +0100)]
qmi_wwan: set FLAG_SEND_ZLP to avoid network initiated disconnect
It has been reported that the dummy byte we add to avoid
ZLPs can be forwarded by the modem to the PGW/GGSN, and that
some operators will drop the connection if this happens.
In theory, QMI devices are based on CDC ECM and should as such
both support ZLPs and silently ignore the dummy byte. The latter
assumption failed. Let's test out the first.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniele Palmas [Thu, 14 Dec 2017 15:56:14 +0000 (16:56 +0100)]
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
This patch adds support for Telit ME910 PID 0x1101.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 18:35:37 +0000 (13:35 -0500)]
Merge branch 'net-sched-Make-qdisc-offload-uapi-uniform'
Yuval Mintz says:
====================
net: sched: Make qdisc offload uapi uniform
Several qdiscs can already be offloaded to hardware, but there's an
inconsistecy in regard to the uapi through which they indicate such
an offload is taking place - indication is passed to the user via
TCA_OPTIONS where each qdisc retains private logic for setting it.
The recent addition of offloading to RED in
602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") caused
the addition of yet another uapi field for this purpose -
TC_RED_OFFLOADED.
For clarity and prevention of bloat in the uapi we want to eliminate
said added uapi, replacing it with a common mechanism that can be used
to reflect offload status of the various qdiscs.
The first patch introduces TCA_HW_OFFLOAD as the generic message meant
for this purpose. The second changes the current RED implementation into
setting the internal bits necessary for passing it, and the third removes
TC_RED_OFFLOADED as its no longer needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 14 Dec 2017 13:54:31 +0000 (15:54 +0200)]
pkt_sched: Remove TC_RED_OFFLOADED from uapi
Following the previous patch, RED is now using the new uniform uapi
for indicating it's offloaded. As a result, TC_RED_OFFLOADED is no
longer utilized by kernel and can be removed [as it's still not
part of any stable release].
Fixes:
602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 14 Dec 2017 13:54:30 +0000 (15:54 +0200)]
net: sched: Move to new offload indication in RED
Let RED utilize the new internal flag, TCQ_F_OFFLOADED,
to mark a given qdisc as offloaded instead of using a dedicated
indication.
Also, change internal logic into looking at said flag when possible.
Fixes:
602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 14 Dec 2017 13:54:29 +0000 (15:54 +0200)]
net: sched: Add TCA_HW_OFFLOAD
Qdiscs can be offloaded to HW, but current implementation isn't uniform.
Instead, qdiscs either pass information about offload status via their
TCA_OPTIONS or omit it altogether.
Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform
uAPI for the offloading status of qdiscs.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 14 Dec 2017 11:40:21 +0000 (11:40 +0000)]
net: alteon: acenic: clean up indentation issue
There is a hunk of code that is incorrectly indented with spaces
and rather than a tab. Clean this up.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 18:23:22 +0000 (13:23 -0500)]
Merge branch 'sfp-SFF-module-support'
Russell King says:
====================
Add SFF module support
Add support for SFF modules. SFF modules are similar to SFP modules,
but they have fewer control signals, and are soldered down rather than
pluggable.
They also have different IDs in the EEPROM to identify as soldered down
SFF modules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 14 Dec 2017 10:27:47 +0000 (10:27 +0000)]
sfp: add sff module support
Add support for SFF modules, which are soldered down SFP modules.
These have a different phys_id value, and also have the present and
rate select signals omitted compared with their socketed counter-parts.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Thu, 14 Dec 2017 10:27:42 +0000 (10:27 +0000)]
dt-bindings: add sff,sff binding for SFP support
Add "sff,sff" for SFF module support with SFP. These have a different
phys_id value, and also have the present and rate select signals omitted
compared with their socketed counter-parts.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 17:48:46 +0000 (12:48 -0500)]
Merge branch 'nfp-fix-rtsym-and-XPB-register-handling-in-debug-dump'
Simon Horman says:
====================
nfp: fix rtsym and XPB register handling in debug dump
this series resolves two problems in the recently added debug dump facility.
* Correctly handle reading absolute rtysms
* Correctly handle special-case PB register reads
These fixes are for code only present in net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Carl Heymann [Thu, 14 Dec 2017 09:50:26 +0000 (10:50 +0100)]
nfp: fix XPB register reads in debug dump
For XPB registers reads, some island IDs require special handling (e.g.
ARM island), which is already taken care of in nfp_xpb_readl(), so use
that instead of a straight CPP read.
Without this fix all "xpbm:ArmIsldXpbmMap.*" registers are reported as
0xffffffff. It has also been observed to cause a system reboot.
With this fix correct values are reported, none of which are 0xffffffff.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes:
0e6c4955e149 ("nfp: dump CPP, XPB and direct ME CSRs")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Carl Heymann [Thu, 14 Dec 2017 09:50:25 +0000 (10:50 +0100)]
nfp: fix absolute rtsym handling in debug dump
In TLV-based ethtool debug dumps, don't do a CPP read for absolute
rtsyms, use the addr field in the symbol table directly as the value.
Without this fix rtsym gro_release_ring_0 is 4 bytes of zeros.
With this fix the correct value, 0x0000004a 0x00000000 is reported.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes:
e1e798e3fd93 ("nfp: dump rtsyms")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 17:46:43 +0000 (12:46 -0500)]
Merge branch 'aquantia-fixes'
Igor Russkikh says:
====================
net: aquantia: Atlantic driver 12/2017 updates
The patchset contains important hardware fix for machines with large MRRS
and couple of improvement in stats and capabilities reporting
patch v3:
- Fixed patch #7 after Andrew's finding. NIC level stats actually
have to be cleaned only on hw struct creation (and this is done
in kzalloc). On each hwinit we only have to reset link state
to make sure hw stats update will not increment nic stats during init.
patch v2:
- split into more detailed commits
Comment from David on wrong defines case will be submitted separately later
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:48 +0000 (12:34 +0300)]
net: aquantia: Increment driver version
Add a suffix to distinguish kernel mainline version and aquantia releases
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:47 +0000 (12:34 +0300)]
net: aquantia: Fix typo in ethtool statistics names
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:46 +0000 (12:34 +0300)]
net: aquantia: Update hw counters on hw init
On very first start we should read out current HW counter values
to make diff based calculations later.
This also should be done each time NIC gets down/up or wakes up
after sleep state. We reset link state explicitly to prevent diffs
from being summed this first time.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:45 +0000 (12:34 +0300)]
net: aquantia: Improve link state and statistics check interval callback
Reduce timeout from 2 secs to 1 sec. If link is down,
reduce it to 500msec. This speeds up link detection.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:44 +0000 (12:34 +0300)]
net: aquantia: Fill in multicast counter in ndev stats from hardware
This metric comes from HW and is also diff-calculated, like other counters
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:43 +0000 (12:34 +0300)]
net: aquantia: Fill ndev stat couters from hardware
Originally they were filled from ring sw counters.
These sometimes incorrectly calculate byte and packet amounts
when using LRO/LSO and jumboframes. Filling ndev counters from
hardware makes them precise.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:42 +0000 (12:34 +0300)]
net: aquantia: Extend stat counters to 64bit values
Device hardware provides only 32bit counters. Using these directly
causes byte counters to overflow soon. A separate nic level structure
with 64 bit counters is now used to collect incrementally all the stats
and report these counters to ethtool stats and ndev stats.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:41 +0000 (12:34 +0300)]
net: aquantia: Fix hardware DMA stream overload on large MRRS
Systems with large MRRS on device (2K, 4K) with high data rates and/or
large MTU, atlantic observes DMA packet buffer overflow. On some systems
that causes PCIe transaction errors, hardware NMIs or datapath freeze.
This patch
1) Limits MRRS from device side to 2K (thats maximum our hardware supports)
2) Limit maximum size of outstanding TX DMA data read requests. This makes
hardware buffers running fine.
Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 Dec 2017 09:34:40 +0000 (12:34 +0300)]
net: aquantia: Fix actual speed capabilities reporting
Different hardware device Ids correspond to different maximum speed
available. Extra checks were added for devices D108 and D109 to
remove unsupported speeds from these device capabilities list.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 17:34:01 +0000 (12:34 -0500)]
Merge branch 'erspan-version-2'
William Tu says:
====================
ERSPAN version 2 (type III) support
ERSPAN has two versions, v1 (type II) and v2 (type III). This patch
series add support for erspan v2 based on existing erspan v1
implementation. The first patch refactors the existing erspan v1's
header structure, making it extensible to put additional v2's header.
The second and third patch introduces erspan v2's implementation to
ipv4 and ipv6 erspan, for both native mode and collect metadata mode.
Finally, test cases are added under the samples/bpf.
Note:
ERSPAN version 2 has many features and this patch does not implement
all. One major use case of version 2 over version 1 is its timestamp
and direction. So the traffic collector is able to distinguish the
mirrorred traffic better. Other features such as SGT (security group
tag), FT (frame type) for carrying non-ethernet packet, and optional
subheader are not implemented yet.
Example commandline for ERSPAN version 2:
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
local fc00:100::2 remote fc00:100::1 \
erspan_ver 2 erspan_dir 1 erspan_hwid 17
The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=
151321141525106&w=2
William Tu (4):
net: erspan: refactor existing erspan code
net: erspan: introduce erspan v2 for ip_gre
ip6_gre: add erspan v2 support
samples/bpf: add erspan v2 sample code
include/net/erspan.h | 152 ++++++++++++++++++++++++++++++++++++++---
include/net/ip6_tunnel.h | 3 +
include/net/ip_tunnels.h | 5 +-
include/uapi/linux/if_ether.h | 1 +
include/uapi/linux/if_tunnel.h | 3 +
net/ipv4/ip_gre.c | 124 +++++++++++++++++++++++++++------
net/ipv6/ip6_gre.c | 139 +++++++++++++++++++++++++++++++------
net/openvswitch/flow_netlink.c | 8 +--
samples/bpf/tcbpf2_kern.c | 77 ++++++++++++++++++---
samples/bpf/test_tunnel_bpf.sh | 38 ++++++++---
10 files changed, 472 insertions(+), 78 deletions(-)
--
A simple script to test it:
set -ex
function cleanup() {
set +ex
ip netns del ns0
ip link del ip6erspan11
ip link del veth1
}
function main() {
trap cleanup 0 2 3 9
ip netns add ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns0
# non-namespace
ip addr add dev veth1 fc00:100::2/96
if [ "$1" == "v1" ]; then
echo "create IP6 ERSPAN v1 tunnel"
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
local fc00:100::2 remote fc00:100::1 \
erspan 123 erspan_ver 1
else
echo "create IP6 ERSPAN v2 tunnel"
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
local fc00:100::2 remote fc00:100::1 \
erspan_ver 2 erspan_dir 1 erspan_hwid 17
fi
ip addr add dev ip6erspan11 fc00:200::2/96
ip addr add dev ip6erspan11 10.10.200.2/24
# namespace: ns0
ip netns exec ns0 ip addr add fc00:100::1/96 dev veth0
if [ "$1" == "v1" ]; then
ip netns exec ns0 \
ip link add dev ip6erspan00 type ip6erspan seq key 102 \
local fc00:100::1 remote fc00:100::2 \
erspan 123 erspan_ver 1
else
ip netns exec ns0 \
ip link add dev ip6erspan00 type ip6erspan seq key 102 \
local fc00:100::1 remote fc00:100::2 \
erspan_ver 2 erspan_dir 1 erspan_hwid 7
fi
ip netns exec ns0 ip addr add dev ip6erspan00 fc00:200::1/96
ip netns exec ns0 ip addr add dev ip6erspan00 10.10.200.1/24
ip link set dev veth1 up
ip link set dev ip6erspan11 up
ip netns exec ns0 ip link set dev ip6erspan00 up
ip netns exec ns0 ip link set dev veth0 up
}
main $1
ping6 -c 1 fc00:100::1 || true
ping -c 3 10.10.200.1
exit 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 14 Dec 2017 00:38:58 +0000 (16:38 -0800)]
samples/bpf: add erspan v2 sample code
Extend the existing tests for ipv4 ipv6 erspan version 2.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 14 Dec 2017 00:38:57 +0000 (16:38 -0800)]
ip6_gre: add erspan v2 support
Similar to support for ipv4 erspan, this patch adds
erspan v2 to ip6erspan tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 14 Dec 2017 00:38:56 +0000 (16:38 -0800)]
net: erspan: introduce erspan v2 for ip_gre
The patch adds support for erspan version 2. Not all features are
supported in this patch. The SGT (security group tag), GRA (timestamp
granularity), FT (frame type) are set to fixed value. Only hardware
ID and direction are configurable. Optional subheader is also not
supported.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 14 Dec 2017 00:38:55 +0000 (16:38 -0800)]
net: erspan: refactor existing erspan code
The patch refactors the existing erspan implementation in order
to support erspan version 2, which has additional metadata. So, in
stead of having one 'struct erspanhdr' holding erspan version 1,
breaks it into 'struct erspan_base_hdr' and 'struct erspan_metadata'.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 17:26:13 +0000 (12:26 -0500)]
Merge branch 'nfp-ethtool-flash-updates'
Jakub Kicinski says:
====================
nfp: ethtool flash updates
Dirk says:
This series adds the ability to update the control FW with ethtool.
It should be noted that the locking scheme here is to release the RTNL
lock before the flashing operation and to take it again afterwards to
ensure consistent state from the core code point of view. In this time,
we take a reference to the device to prevent the device being freed
while its being flashed.
This provides protection for the device being flashed while at the same
time not holding up any networking related functions which would
otherwise be locked out due to RTNL being held.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Wed, 13 Dec 2017 22:45:02 +0000 (14:45 -0800)]
nfp: implement firmware flashing
Firmware flashing takes around 60s (specified to not take more than
70s). Prevent hogging the RTNL lock in this time and make use of the
longer timeout for the NSP command. The timeout is set to 2.5 * 70
seconds.
We only allow flashing the firmware from reprs or PF netdevs. VFs do not
have an app reference.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk van der Merwe [Wed, 13 Dec 2017 22:45:01 +0000 (14:45 -0800)]
nfp: extend NSP infrastructure for configurable timeouts
The firmware flashing NSP operation takes longer to execute than the
current default timeout. We need a mechanism to set a longer timeout for
some commands. This patch adds the infrastructure to this.
The default timeout is still 30 seconds.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 16:36:54 +0000 (11:36 -0500)]
Merge branch 'ipvlan-packet-scrub'
Mahesh Bandewar says:
====================
ipvlan: packet scrub
While crossing namespace boundary IPvlan aggressively scrubs packets.
This is creating problems. First thing is that scrubbing changes the
packet type in skb meta-data to PACKET_HOST. This causes erroneous
packet delivery when dev_forward_skb() has already marked the packet
type as OTHER_HOST.
On the egress side scrubbing just before calling dev_queue_xmit()
creates another set of problems. Scrubbing remove skb->sk so the
prio update gets missed and more seriously, socket back-pressure
fails making TSQ not function correctly.
The first patch in the series just reverts the earlier change which
was adding a mac-check, but that is unnecessary if packet_type that
dev_forward_skb() has set is honored. The second path removes two of
the scrubs which are causing problems described above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Wed, 13 Dec 2017 22:40:26 +0000 (14:40 -0800)]
ipvlan: remove excessive packet scrubbing
IPvlan currently scrubs packets at every location where packets may be
crossing namespace boundary. Though this is desirable, currently IPvlan
does it more than necessary. e.g. packets that are going to take
dev_forward_skb() path will get scrubbed so no point in scrubbing them
before forwarding. Another side-effect of scrubbing is that pkt-type gets
set to PACKET_HOST which overrides what was already been set by the
earlier path making erroneous delivery of the packets.
Also scrubbing packets just before calling dev_queue_xmit() has detrimental
effects since packets lose skb->sk and because of that miss prio updates,
incorrect socket back-pressure and would even break TSQ.
Fixes:
b93dd49c1a35 ('ipvlan: Scrub skb before crossing the namespace boundary')
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Wed, 13 Dec 2017 22:40:23 +0000 (14:40 -0800)]
Revert "ipvlan: add L2 check for packets arriving via virtual devices"
This reverts commit
92ff42645028fa6f9b8aa767718457b9264316b4.
Even though the check added is not that taxing, it's not really needed.
First of all this will be per packet cost and second thing is that the
eth_type_trans() already does this correctly. The excessive scrubbing
in IPvlan was changing the pkt-type skb metadata of the packet which
made it necessary to re-check the mac. The subsequent patch in this
series removes the faulty packet-scrub.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 13 Dec 2017 19:41:06 +0000 (14:41 -0500)]
sock: free skb in skb_complete_tx_timestamp on error
skb_complete_tx_timestamp must ingest the skb it is passed. Call
kfree_skb if the skb cannot be enqueued.
Fixes:
b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Fixes:
9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 16:29:44 +0000 (11:29 -0500)]
Merge branch 's390-fixes'
Julian Wiedmann says:
====================
s390/qeth: fixes 2017-12-13
some more patches for 4.15, that fix multiple issues with IP Takeover
configuration in qeth.
Please queue them up for stable kernels as well (4.9 and newer).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 13 Dec 2017 17:56:32 +0000 (18:56 +0100)]
s390/qeth: update takeover IPs after configuration change
Any modification to the takeover IP-ranges requires that we re-evaluate
which IP addresses are takeover-eligible. Otherwise we might do takeover
for some addresses when we no longer should, or vice-versa.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 13 Dec 2017 17:56:31 +0000 (18:56 +0100)]
s390/qeth: lock IP table while applying takeover changes
Modifying the flags of an IP addr object needs to be protected against
eg. concurrent removal of the same object from the IP table.
Fixes:
5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 13 Dec 2017 17:56:30 +0000 (18:56 +0100)]
s390/qeth: don't apply takeover changes to RXIP
When takeover is switched off, current code clears the 'TAKEOVER' flag on
all IPs. But the flag is also used for RXIP addresses, and those should
not be affected by the takeover mode.
Fix the behaviour by consistenly applying takover logic to NORMAL
addresses only.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 13 Dec 2017 17:56:29 +0000 (18:56 +0100)]
s390/qeth: apply takeover changes when mode is toggled
Just as for an explicit enable/disable, toggling the takeover mode also
requires that the IP addresses get updated. Otherwise all IPs that were
added to the table before the mode-toggle, get registered with the old
settings.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Will Deacon [Fri, 15 Dec 2017 16:07:22 +0000 (16:07 +0000)]
arm64: fpsimd: Fix copying of FP state from signal frame into task struct
Commit
9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD
state after signals") fixed an issue reported in our FPSIMD signal
restore code but inadvertently introduced another issue which tends to
manifest as random SEGVs in userspace.
The problem is that when we copy the struct fpsimd_state from the kernel
stack (populated from the signal frame) into the struct held in the
current thread_struct, we blindly copy uninitialised stack into the
"cpu" field, which means that context-switching of the FP registers is
no longer reliable.
This patch fixes the problem by copying only the user_fpsimd member of
struct fpsimd_state. We should really rework the function prototypes
to take struct user_fpsimd_state * instead, but let's just get this
fixed for now.
Cc: Dave Martin <Dave.Martin@arm.com>
Fixes:
9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD state after signals")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
David S. Miller [Fri, 15 Dec 2017 16:10:27 +0000 (11:10 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-12-15
1) Currently we can add or update socket policies, but
not clear them. Support clearing of socket policies
too. From Lorenzo Colitti.
2) Add documentation for the xfrm device offload api.
From Shannon Nelson.
3) Fix IPsec extended sequence numbers (ESN) for
IPsec offloading. From Yossef Efraim.
4) xfrm_dev_state_add function returns success even for
unsupported options, fix this to fail in such cases.
From Yossef Efraim.
5) Remove a redundant xfrm_state assignment.
From Aviv Heller.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 16:02:11 +0000 (11:02 -0500)]
Merge tag 'batadv-net-for-davem-
20171215' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Initialize the fragment headers, by Sven Eckelmann
- Fix a NULL check in BATMAN V, by Sven Eckelmann
- Fix kernel doc for the time_setup() change, by Sven Eckelmann
- Use the right lock in BATMAN IV OGM Update, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 15 Dec 2017 01:59:40 +0000 (17:59 -0800)]
net: dsa: bcm_sf2: Update compatible string for 7278B0
Update the compatible string and Device Tree binding document for
7278B0.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 15:55:35 +0000 (10:55 -0500)]
Merge branch 'hnx3-vf'
Salil Mehta says:
====================
Hisilicon Network Subsystem 3 VF Ethernet Driver
This patch-set contains the support of the HNS3 (Hisilicon Network Subsystem 3)
Virtual Function Ethernet driver for hip08 family of SoCs. The Physical Function
driver is already part of the Linux mainline.
This VF driver has its Hardware Compatibility Layer and has commom/unified ENET
layer/client/ethtool code with the PF driver. It also has support of mailbox to
communicate with the HNS3 PF driver. The basic architecture of VF driver is
derivative of the PF driver. Just like PF driver, this driver is also PCI
Express based.
This driver is the ongoing development work and HNS3 VF Ethernet driver would be
incrementally enhanced with more new features.
High Level Architecture:
[ Ethtool ]
|
[ Ethernet Client ] ... [ RoCE Client ]
| |
[ HNAE Device ] |________
| | |
--------------------------------------------- |
|
[ HNAE3 Framework (Register/unregister) ] |
|
--------------------------------------------- |
| |
[ VF HCLGE Layer ] |
| | |
| | |
| | |
| [ VF Mailbox (To PF via IMP) ] |
| | |
[ IMP command Interface ] [ IMP command Interface ]
| |
| |
(A B O V E R U N S O N G U E S T S Y S T E M)
-------------------------------------------------------------
Q E M U / V F I O / K V M (on Host System)
-------------------------------------------------------------
HIP08 H A R D W A R E (limited to VF by SMMU)
[ IMP/Mgmt Processor (hardware common to system/cmd based) ]
Fig 1. HNS3 Virtual Function Driver
[ dcbnl ] [ Ethtool ]
| |
[ Ethernet Client ] [ ODP/UIO Client ] . . .[ RoCE Client ]
|_____________________| |
| _________|
[ HNAE Device ] | |
| | |
--------------------------------------------- |
|
[ HNAE3 Framework (Register/unregister) ] |
|
--------------------------------------------- |
| |
[ HCLGE Layer ] |
________________|_________________ |
| | | |
[ DCB ] | | |
| | | |
[ Scheduler/Shaper ] [ MDIO ] [ PF Mailbox ] |
| | | |
|________________|_________________| |
| |
[ IMP command Interface ] [ IMP command Interface ]
----------------------------------------------------------------
HIP08 H A R D W A R E
[ IMP/Mgmt Processor (hardware common to system/cmd based) ]
Fig 2. Existing HNS3 PF Driver (added with mailbox)
Change Log Summary:
Patch V4: Addressed SPDX related comment by Philippe Ombredanne
Patch V3: Addressed SPDX change requested by Philippe Ombredanne
Patch V2: 1. Addressed some comments by David Miller.
2. Addressed some internal comments on various patches
Patch V1: Initial Submit
====================
Acked-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:09 +0000 (18:03 +0000)]
net: hns3: Add mailbox interrupt handling to PF driver
All PF mailbox events are conveyed through a common interrupt
(vector 0). This interrupt vector is shared by reset and mailbox.
This patch adds the handling of mailbox interrupt event and its
deferred processing in context to a separate mailbox task.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:08 +0000 (18:03 +0000)]
net: hns3: Change PF to add ring-vect binding & resetQ to mailbox
This patch is required to support ring-vector binding and reset
of TQPs requested by the VF driver to the PF driver. Mailbox
handler is added with corresponding VF commands/messages to
handle the request.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:07 +0000 (18:03 +0000)]
net: hns3: Add mailbox support to PF driver
Command queue provides the provision of Mailbox command which
can be used for communication between PF and VF. PF handles
messages from various VFs for fetching various information like,
queue, vlan, link status related etc. It also handles the request
from various VFs to perform certain privileged operations.
This patch adds the support of a message handler for handling
such various command requests from VF.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:06 +0000 (18:03 +0000)]
net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC
Most of the NAPI handling interface, skb buffer management,
management of the RX/TX descriptors, ethool interface etc.
has quite a bit of code which is common to VF and PF driver.
This patch makes the exisitng PF's HNS3 ENET driver as the
common ENET driver for both Virtual & Physical Function. This
will help in reduction of redundancy and better management of
code.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:05 +0000 (18:03 +0000)]
net: hns3: Add HNS3 VF driver to kernel build framework
This patch introduces the new Makefiles and updates existing
Makefiles required to build the HNS3 Virtual Function driver.
This also updates the Kconfig for introduction of new menuconfig
entries related to VF driver.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:04 +0000 (18:03 +0000)]
net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support
This patch adds the support of hardware compatibiltiy layer to the
HNS3 VF Driver. This layer implements various {set|get} operations
over MAC address for a virtual port, RSS related configuration,
fetches the link status info from PF, does various VLAN related
configuration over the virtual port, queries the statistics from
the hardware etc.
This layer can directly interact with hardware through the
IMP(Integrated Mangement Processor) interface or can use mailbox
to interact with the PF driver.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:03 +0000 (18:03 +0000)]
net: hns3: Add mailbox support to VF driver
This patch adds the support of the mailbox to the VF driver. The
mailbox shall be used as an interface to communicate with the
PF driver for various purposes like {set|get} MAC related
operations, reset, link status etc. The mailbox supports both
synchronous and asynchronous command send to PF driver.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Thu, 14 Dec 2017 18:03:02 +0000 (18:03 +0000)]
net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface
This patch adds support of command interface for communication with
the IMP(Integrated Management Processor) for HNS3 Virtual Function
Driver.
Each VF has support of CQP(Command Queue Pair) ring interface.
Each CQP consis of send queue CSQ and receive queue CRQ.
There are various commands a VF may support, like to query frimware
version, TQP management, statistics, interrupt related, mailbox etc.
This also contains code to initialize the command queue, manage the
command queue descriptors and Rx/Tx protocol with the command processor
in the form of various commands/results and acknowledgements.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Fri, 15 Dec 2017 07:44:21 +0000 (08:44 +0100)]
mlxsw: spectrum: Disable MAC learning for ovs port
Learning is currently enabled for ports which are OVS slaves -
even though OVS doesn't need this indication.
Since we're not associating a fid with the port, HW would continuously
notify driver of learned [& aged] MACs which would be logged as errors.
Fixes:
2b94e58df58c ("mlxsw: spectrum: Allow ports to work under OVS master")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Dec 2017 15:31:55 +0000 (10:31 -0500)]
Merge branch 'dsa-MT7530-vlan'
Sean Wang says:
====================
add VLAN support to DSA MT7530
Changes sicne v2:
update to the latest code base from net-next and fix up all building
errors with -Werror.
Changes since v1:
- fix up the typo
- prefer ordering declarations longest to shortest
- update that vlan_prepare callback should not change any state
- use lower case letter for function naming
The patchset extends DSA MT7530 to VLAN support through filling required
callbacks in patch 1 and merging the special tag with VLAN tag in patch 2
for allowing that the hardware can handle these packets with VID from the
CPU port.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Fri, 15 Dec 2017 04:47:02 +0000 (12:47 +0800)]
net: dsa: mediatek: update MAINTAINERS entry with MediaTek switch driver
I work for MediaTek and maintain SoC targeting to home gateway and
also will keep extending and testing the function from MediaTek
switch.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Fri, 15 Dec 2017 04:47:01 +0000 (12:47 +0800)]
net: dsa: mediatek: combine MediaTek tag with VLAN tag
In order to let MT7530 switch can recognize well those egress packets
having both special tag and VLAN tag, the information about the special
tag should be carried on the existing VLAN tag. On the other hand, it's
unnecessary for extra handling for ingress packets when VLAN tag is
present since it is able to put the VLAN tag after the special tag and
then follow the existing way to parse.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Fri, 15 Dec 2017 04:47:00 +0000 (12:47 +0800)]
net: dsa: mediatek: add VLAN support for MT7530
MT7530 can treat each port as either VLAN-unaware port or VLAN-aware port
through the implementation of port matrix mode or port security mode on
the ingress port, respectively. On one hand, Each port has been acting as
the VLAN-unaware one whenever the device is created in the initial or
certain port joins or leaves into/from the bridge at the runtime. On the
other hand, the patch just filling the required callbacks for VLAN
operations is achieved via extending the port to be into port security
mode when the port is configured as VLAN-aware port. Which mode can make
the port be able to recognize VID from incoming packets and look up VLAN
table to validate and judge which port it should be going to. And the
range for VID from 1 to 4094 is valid for the hardware.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steven Rostedt [Sat, 2 Dec 2017 18:04:54 +0000 (13:04 -0500)]
sched/rt: Do not pull from current CPU if only one CPU to pull
Daniel Wagner reported a crash on the BeagleBone Black SoC.
This is a single CPU architecture, and does not have a functional
arch_send_call_function_single_ipi() implementation which can crash
the kernel if that is called.
As it only has one CPU, it shouldn't be called, but if the kernel is
compiled for SMP, the push/pull RT scheduling logic now calls it for
irq_work if the one CPU is overloaded, it can use that function to call
itself and crash the kernel.
Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system
only has a single CPU. But SCHED_FEAT is a constant if sched debugging
is turned off. Another fix can also be used, and this should also help
with normal SMP machines. That is, do not initiate the pull code if
there's only one RT overloaded CPU, and that CPU happens to be the
current CPU that is scheduling in a lower priority task.
Even on a system with many CPUs, if there's many RT tasks waiting to
run on a single CPU, and that CPU schedules in another RT task of lower
priority, it will initiate the PULL logic in case there's a higher
priority RT task on another CPU that is waiting to run. But if there is
no other CPU with waiting RT tasks, it will initiate the RT pull logic
on itself (as it still has RT tasks waiting to run). This is a wasted
effort.
Not only does this help with SMP code where the current CPU is the only
one with RT overloaded tasks, it should also solve the issue that
Daniel encountered, because it will prevent the PULL logic from
executing, as there's only one CPU on the system, and the check added
here will cause it to exit the RT pull code.
Reported-by: Daniel Wagner <wagi@monom.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Cc: stable@vger.kernel.org
Fixes:
4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic")
Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Fri, 15 Dec 2017 12:47:51 +0000 (13:47 +0100)]
tools/headers: Synchronize kernel <-> tooling headers
Two kernel headers got modified recently, which are used by tooling as well:
tools/include/uapi/linux/kvm.h
arch/x86/include/asm/cpufeatures.h
None of those changes have an effect on tooling, so do a plain copy.
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>