linux-2.6-block.git
9 years agotools/lguest: more documentation and checking of virtio 1.0 compliance.
Rusty Russell [Fri, 13 Feb 2015 06:43:43 +0000 (17:13 +1030)]
tools/lguest: more documentation and checking of virtio 1.0 compliance.

This is from all the non-PCI parts of the spec.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: don't look in console features to find emerg_wr.
Rusty Russell [Fri, 13 Feb 2015 06:43:43 +0000 (17:13 +1030)]
lguest: don't look in console features to find emerg_wr.

The 1.0 spec clearly states that you must set the ACKNOWLEDGE and
DRIVER status bits before accessing the feature bits.  This is a
problem for the early console code, which doesn't really want to
acknowledge the device (the spec specifically excepts writing to the
console's emerg_wr from the usual ordering constrains).

Instead, we check that the *size* of the device configuration is
sufficient to hold emerg_wr: at worst (if the device doesn't support
the VIRTIO_CONSOLE_F_EMERG_WRITE feature), it will ignore the
writes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: don't start devices until DRIVER_OK status set.
Rusty Russell [Fri, 13 Feb 2015 06:43:42 +0000 (17:13 +1030)]
tools/lguest: don't start devices until DRIVER_OK status set.

We were activating them with the virtqueues, and that's not allowed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: handle indirect partway through chain.
Rusty Russell [Fri, 13 Feb 2015 06:43:42 +0000 (17:13 +1030)]
tools/lguest: handle indirect partway through chain.

Linux doesn't generate these, but it's perfectly valid according to
a close reading of the spec.  I opened virtio spec bug VIRTIO-134 to
make this clearer there, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
Rusty Russell [Fri, 13 Feb 2015 06:43:42 +0000 (17:13 +1030)]
tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)

As a demonstration, the lguest launcher is pretty strict, trying to
catch badly behaved drivers.  Document this precisely.

A good implementation would *NOT* crash the guest when these happened!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
Rusty Russell [Fri, 13 Feb 2015 06:43:41 +0000 (17:13 +1030)]
tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)

There are some (optional) parts we don't implement, but this quotes all
the device requirements from the spec (csd 03, but it should be the same
across all released versions).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: rename virtio_pci_cfg_cap field to match spec.
Rusty Russell [Fri, 13 Feb 2015 06:43:41 +0000 (17:13 +1030)]
tools/lguest: rename virtio_pci_cfg_cap field to match spec.

The next patch will insert many quotes from the virtio 1.0 spec; they
make most sense if we copy the spec.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: fix features_accepted logic in example launcher.
Rusty Russell [Fri, 13 Feb 2015 06:43:41 +0000 (17:13 +1030)]
tools/lguest: fix features_accepted logic in example launcher.

We were clearing the lower bits when setting the upper bits.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agotools/lguest: handle device reset correctly in example launcher.
Rusty Russell [Fri, 13 Feb 2015 06:43:40 +0000 (17:13 +1030)]
tools/lguest: handle device reset correctly in example launcher.

The example launcher doesn't reset the queue_enable like the spec says
we have to.  Plus, we should reset the size in case they negotiated
a different (smaller) one.

This is easy to test by unloading and reloading a virtio module.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtual: Documentation: simplify and generalize paravirt_ops.txt
Luis R. Rodriguez [Fri, 13 Feb 2015 06:43:40 +0000 (17:13 +1030)]
virtual: Documentation: simplify and generalize paravirt_ops.txt

The general documentation we have for pv_ops is currenty present
on the IA64 docs, but since this documentation covers IA64 xen
enablement and IA64 Xen support got ripped out a while ago
through commit d52eefb47 present since v3.14-rc1 lets just
simplify, generalize and move the pv_ops documentation to a
shared place.

Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: xen-devel@lists.xenproject.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: remove NOTIFY call and eventfd facility.
Rusty Russell [Wed, 11 Feb 2015 04:58:01 +0000 (15:28 +1030)]
lguest: remove NOTIFY call and eventfd facility.

Disappointing, as this was kind of neat (especially getting to use RCU
to manage the address -> eventfd mapping).  But now the devices are PCI
handled in userspace, we get rid of both the NOTIFY hypercall and
the interface to connect an eventfd.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: remove NOTIFY facility from demonstration launcher.
Rusty Russell [Wed, 11 Feb 2015 04:57:01 +0000 (15:27 +1030)]
lguest: remove NOTIFY facility from demonstration launcher.

This was only used for early console, now we can get rid of it altogether.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: use the PCI console device's emerg_wr for early boot messages.
Rusty Russell [Wed, 11 Feb 2015 04:56:01 +0000 (15:26 +1030)]
lguest: use the PCI console device's emerg_wr for early boot messages.

This involves manually checking the console device (which is always in
slot 1 of bus 0) and using the window in VIRTIO_PCI_CAP_PCI_CFG to
program it (as we can't map the BAR yet).

We could in fact do this much earlier, but we wait for the first
write from the virtio_cons_early_init() facility.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: always put console in PCI slot #1.
Rusty Russell [Wed, 11 Feb 2015 04:55:01 +0000 (15:25 +1030)]
lguest: always put console in PCI slot #1.

This simplifies the early probe.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: support backdoor window.
Rusty Russell [Wed, 11 Feb 2015 04:54:01 +0000 (15:24 +1030)]
lguest: support backdoor window.

The VIRTIO_PCI_CAP_PCI_CFG in the PCI virtio 1.0 spec allows access to
the BAR registers without mapping them.  This is a compulsory feature,
and we implement it here.

There are some subtleties involving access widths which we should
note:

4.1.4.7.1 Device Requirements: PCI configuration access capability

...
   Upon detecting driver write access to pci_cfg_data, the device MUST
   execute a write access at offset cap.offset at BAR selected by
   cap.bar using the first cap.length bytes from pci_cfg_data.

   Upon detecting driver read access to pci_cfg_data, the device MUST
   execute a read access of length cap.length at offset cap.offset at
   BAR selected by cap.bar and store the first cap.length bytes in
   pci_cfg_data.

So, for a write, we copy into the pci_cfg_data window, then write from
there out to the BAR.  This works correctly if cap.length != width of
write.  Similarly, for a read, we read into window from the BAR then
read the value from there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: support emerg_wr in console device in example launcher.
Rusty Russell [Wed, 11 Feb 2015 04:53:01 +0000 (15:23 +1030)]
lguest: support emerg_wr in console device in example launcher.

This is a magic register which causes a character to be outputted: it can
be used even before the device is configured.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: remove lguest bus definitions from header.
Rusty Russell [Wed, 11 Feb 2015 04:52:01 +0000 (15:22 +1030)]
lguest: remove lguest bus definitions from header.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: remove support for lguest bus in demonstration launcher.
Rusty Russell [Wed, 11 Feb 2015 04:51:01 +0000 (15:21 +1030)]
lguest: remove support for lguest bus in demonstration launcher.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: remove support for lguest bus.
Rusty Russell [Wed, 11 Feb 2015 04:50:01 +0000 (15:20 +1030)]
lguest: remove support for lguest bus.

The demonstration launcher now uses PCI entirely.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: define VIRTIO_CONFIG_NO_LEGACY in example launcher.
Rusty Russell [Wed, 11 Feb 2015 04:49:01 +0000 (15:19 +1030)]
lguest: define VIRTIO_CONFIG_NO_LEGACY in example launcher.

We only support virtio 1.0 now

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: Convert console device to virtio 1.0 PCI.
Rusty Russell [Wed, 11 Feb 2015 04:48:01 +0000 (15:18 +1030)]
lguest: Convert console device to virtio 1.0 PCI.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: Convert entropy device to virtio 1.0 PCI.
Rusty Russell [Wed, 11 Feb 2015 04:47:01 +0000 (15:17 +1030)]
lguest: Convert entropy device to virtio 1.0 PCI.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: Convert net device to virtio 1.0 PCI.
Rusty Russell [Wed, 11 Feb 2015 04:46:01 +0000 (15:16 +1030)]
lguest: Convert net device to virtio 1.0 PCI.

The only real change here (other than using the PCI bus) is that we
didn't negotiate VIRTIO_NET_F_MRG_RXBUF before, so the format of the
packet header changed with virtio 1.0; we need TUNSETVNETHDRSZ on the
tun fd to tell it about the extra two bytes.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: Convert block device to virtio 1.0 PCI.
Rusty Russell [Wed, 11 Feb 2015 04:45:12 +0000 (15:15 +1030)]
lguest: Convert block device to virtio 1.0 PCI.

We remove SCSI support (which was removed for 1.0) and VIRTIO_BLK_F_FLUSH
feature flag (removed too, since it's compulsory for 1.0).

The rest is mainly mechanical.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add a dummy PCI host bridge.
Rusty Russell [Wed, 11 Feb 2015 04:45:12 +0000 (15:15 +1030)]
lguest: add a dummy PCI host bridge.

Otherwise Linux fails to find the bus.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: fix failure to find linux/virtio_types.h
Rusty Russell [Wed, 11 Feb 2015 04:45:11 +0000 (15:15 +1030)]
lguest: fix failure to find linux/virtio_types.h

We want to use the local kernel headers, but -I../../include/uapi leads us into
a world of hurt.  Instead we create a dummy include/ dir with symlinks.

If we just use #include "../../include/uapi/linux/virtio_blk.h" we get:

../../include/uapi/linux/virtio_blk.h:31:32: fatal error: linux/virtio_types.h: No such file or directory
 #include <linux/virtio_types.h>

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: implement virtio-PCI MMIO accesses.
Rusty Russell [Wed, 11 Feb 2015 04:45:11 +0000 (15:15 +1030)]
lguest: implement virtio-PCI MMIO accesses.

For each device, We need to include the vendor capabilities to demark
where virtio common, notification and ISR regions are (we put them
all in BAR0).

We need to handle the switching of the virtqueues using the accessors.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add PCI config space emulation to example launcher.
Rusty Russell [Wed, 11 Feb 2015 04:45:11 +0000 (15:15 +1030)]
lguest: add PCI config space emulation to example launcher.

This handles ioport 0xCF8 and 0xCFC accesses, which are used to
read/write PCI device config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: decode mmio accesses for PCI in example launcher.
Rusty Russell [Wed, 11 Feb 2015 04:45:11 +0000 (15:15 +1030)]
lguest: decode mmio accesses for PCI in example launcher.

We don't do anything with them yet (emulate_mmio_write and
emulate_mmio_read are stubs), but we decode the instructions and
search for the device they're hitting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add MMIO region allocator in example launcher.
Rusty Russell [Wed, 11 Feb 2015 04:45:11 +0000 (15:15 +1030)]
lguest: add MMIO region allocator in example launcher.

This is where we point our PCI BARs, so that we can intercept MMIO
accesses.  We tell the kernel about it so any faults in this area are
directed to us.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: Override pcibios_enable_irq/pcibios_disable_irq to our stupid PIC
Rusty Russell [Wed, 11 Feb 2015 04:45:10 +0000 (15:15 +1030)]
lguest: Override pcibios_enable_irq/pcibios_disable_irq to our stupid PIC

This lets us deliver interrupts for our emulated PCI devices using our
dumb PIC, and not emulate an 8259 and PCI irq mapping tables or whatever.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: disable ACPI explicitly.
Rusty Russell [Wed, 11 Feb 2015 04:45:10 +0000 (15:15 +1030)]
lguest: disable ACPI explicitly.

Once we add PCI, it starts trying to manage our interrupts.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add iomem region, where guest page faults get sent to userspace.
Rusty Russell [Wed, 11 Feb 2015 04:45:10 +0000 (15:15 +1030)]
lguest: add iomem region, where guest page faults get sent to userspace.

This lets us implement PCI.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: don't disable iospace.
Rusty Russell [Wed, 11 Feb 2015 04:45:10 +0000 (15:15 +1030)]
lguest: don't disable iospace.

This no longer speeds up boot (IDE got better, I guess), but it does stop
us probing for a PCI bus.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: suppress PS/2 keyboard polling.
Rusty Russell [Wed, 11 Feb 2015 04:45:10 +0000 (15:15 +1030)]
lguest: suppress PS/2 keyboard polling.

While hacking on getting I/O out to the lguest launcher, I noticed
that returning 0xFF for the PS/2 keyboard status made it spin for a
while thinking there was a key pending.  Fix this by returning 1
instead of 0xFF.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: send trap 13 through to userspace.
Rusty Russell [Wed, 11 Feb 2015 04:45:10 +0000 (15:15 +1030)]
lguest: send trap 13 through to userspace.

We copy 7 bytes at eip for userspace's instruction decode; we have to
carefully handle the case where eip is at the end of a page.  We can't
leave this to userspace since kernel has all the page table decode
logic.

The decode logic moves to userspace, basically unchanged.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add infrastructure to check mappings.
Rusty Russell [Wed, 11 Feb 2015 04:45:09 +0000 (15:15 +1030)]
lguest: add infrastructure to check mappings.

We normally abort the guest unconditionally when it gives us a bad address,
but in the next patch we want to copy some bytes which may not be mapped.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add infrastructure for userspace to deliver a trap to the guest.
Rusty Russell [Wed, 11 Feb 2015 04:45:09 +0000 (15:15 +1030)]
lguest: add infrastructure for userspace to deliver a trap to the guest.

This is required for instruction emulation to move to userspace.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: write more information to userspace about pending traps.
Rusty Russell [Wed, 11 Feb 2015 04:45:09 +0000 (15:15 +1030)]
lguest: write more information to userspace about pending traps.

This is preparation for userspace handling MMIO and ioport accesses.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: add operations to get/set a register from the Launcher.
Rusty Russell [Wed, 11 Feb 2015 04:45:09 +0000 (15:15 +1030)]
lguest: add operations to get/set a register from the Launcher.

We use the ptrace API struct, and we currently don't let them set
anything but the normal registers (we'd have to filter the others).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agolguest: have --rng read from /dev/urandom not /dev/random.
Rusty Russell [Wed, 11 Feb 2015 04:45:09 +0000 (15:15 +1030)]
lguest: have --rng read from /dev/urandom not /dev/random.

Theoretical debates aside, now it boots.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio: don't require a config space on the console device.
Rusty Russell [Wed, 11 Feb 2015 04:31:14 +0000 (15:01 +1030)]
virtio: don't require a config space on the console device.

Strictly, it's only needed when we have features (size or multiport).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: use 16-bit accessor for queue_enable.
Rusty Russell [Wed, 11 Feb 2015 04:31:14 +0000 (15:01 +1030)]
virtio_pci: use 16-bit accessor for queue_enable.

Since PCI is little endian, 8-bit access might work, but the spec section
is very clear on this:

  4.1.3.1 Driver Requirements: PCI Device Layout

  The driver MUST access each field using the “natural” access method,
  i.e. 32-bit accesses for 32-bit fields, 16-bit accesses for 16-bit
  fields and 8-bit accesses for 8-bit fields.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio: Don't expose legacy config features when VIRTIO_CONFIG_NO_LEGACY defined.
Rusty Russell [Wed, 11 Feb 2015 04:31:14 +0000 (15:01 +1030)]
virtio: Don't expose legacy config features when VIRTIO_CONFIG_NO_LEGACY defined.

The VIRTIO_F_ANY_LAYOUT and VIRTIO_F_NOTIFY_ON_EMPTY features are pre-1.0
only.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio: Don't expose legacy block features when VIRTIO_BLK_NO_LEGACY defined.
Rusty Russell [Wed, 11 Feb 2015 04:31:14 +0000 (15:01 +1030)]
virtio: Don't expose legacy block features when VIRTIO_BLK_NO_LEGACY defined.

This allows modern implementations to ensure they don't use legacy
feature bits or SCSI commands (which are not used in v1.0 non-legacy).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio: define VIRTIO_PCI_CAP_PCI_CFG in header.
Rusty Russell [Wed, 11 Feb 2015 04:31:14 +0000 (15:01 +1030)]
virtio: define VIRTIO_PCI_CAP_PCI_CFG in header.

This provides backdoor access to the device MMIOs, and every device should
have one.  From the virtio 1.0 spec (CS03):

  4.1.4.7.1 Device Requirements: PCI configuration access capability

  The device MUST present at least one VIRTIO_PCI_CAP_PCI_CFG capability.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio: Avoid possible kernel panic if DEBUG is enabled.
Tetsuo Handa [Wed, 11 Feb 2015 04:31:13 +0000 (15:01 +1030)]
virtio: Avoid possible kernel panic if DEBUG is enabled.

The virtqueue_add() calls START_USE() upon entry. The virtqueue_kick() is
called if vq->num_added == (1 << 16) - 1 before calling END_USE().
The virtqueue_kick_prepare() called via virtqueue_kick() calls START_USE()
upon entry, and will call panic() if DEBUG is enabled.
Move this virtqueue_kick() call to after END_USE() call.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio-mmio: Update the device to OASIS spec version
Pawel Moll [Fri, 23 Jan 2015 04:15:55 +0000 (14:45 +1030)]
virtio-mmio: Update the device to OASIS spec version

This patch add a support for second version of the virtio-mmio device,
which follows OASIS "Virtual I/O Device (VIRTIO) Version 1.0"
specification.

Main changes:

1. The control register symbolic names use the new device/driver
   nomenclature rather than the old guest/host one.

2. The driver detect the device version (version 1 is the pre-OASIS
   spec, version 2 is compatible with fist revision of the OASIS spec)
   and drives the device accordingly.

3. New version uses direct addressing (64 bit address split into two
   low/high register) instead of the guest page size based one,
   and addresses each part of the queue (descriptors, available, used)
   separately.

4. The device activity is now explicitly triggered by writing to the
   "queue ready" register.

5. Whole 64 bit features are properly handled now (both ways).

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci_modern: drop an unused function
Michael S. Tsirkin [Tue, 20 Jan 2015 12:30:52 +0000 (14:30 +0200)]
virtio_pci_modern: drop an unused function

release function in modern driver is unused:
it's a left-over from when each driver had
to have its own release.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio_pci: add module param to force legacy mode
Michael S. Tsirkin [Thu, 15 Jan 2015 15:54:13 +0000 (17:54 +0200)]
virtio_pci: add module param to force legacy mode

If set, try legacy interface first, modern one if that fails.  Useful to
work around device/driver bugs, and for compatibility testing.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: add an option to disable legacy driver
Michael S. Tsirkin [Thu, 15 Jan 2015 14:06:26 +0000 (16:06 +0200)]
virtio_pci: add an option to disable legacy driver

Useful for testing device virtio 1 compatibility.
Based on patch by Rusty - couldn't resist putting
that flying car joke in there!

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: drop Kconfig warnings
Michael S. Tsirkin [Thu, 15 Jan 2015 12:16:24 +0000 (14:16 +0200)]
virtio_pci: drop Kconfig warnings

The ABI *is* stable, and has been for a while now.
Drop Kconfig warning saying that it's not guaranteed
to work.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: Kconfig grammar fix
Michael S. Tsirkin [Thu, 15 Jan 2015 12:15:51 +0000 (14:15 +0200)]
virtio_pci: Kconfig grammar fix

This drivers -> this driver.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_rng: drop extra empty line
Michael S. Tsirkin [Thu, 15 Jan 2015 11:50:06 +0000 (13:50 +0200)]
virtio_rng: drop extra empty line

makes code look a bit prettier.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_ring: coding style fix
Michael S. Tsirkin [Thu, 15 Jan 2015 11:33:31 +0000 (13:33 +0200)]
virtio_ring: coding style fix

Most of our code has
struct foo {
}

Fix one instances where ring is inconsistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_blk: coding style fixes
Michael S. Tsirkin [Thu, 15 Jan 2015 11:33:31 +0000 (13:33 +0200)]
virtio_blk: coding style fixes

Most of our code has
struct foo {
}

Fix two instances where blk is inconsistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_balloon: coding style fixes
Michael S. Tsirkin [Thu, 15 Jan 2015 11:33:31 +0000 (13:33 +0200)]
virtio_balloon: coding style fixes

Most of our code has
struct foo {
}

Fix two instances where balloon is inconsistent.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci_modern: support devices with no config
Michael S. Tsirkin [Tue, 13 Jan 2015 14:34:58 +0000 (16:34 +0200)]
virtio_pci_modern: support devices with no config

Virtio 1.0 spec lists device config as optional.
Set get/set callbacks to NULL. Drivers can check that
and fail gracefully.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci_modern: reduce number of mappings
Michael S. Tsirkin [Wed, 14 Jan 2015 16:50:55 +0000 (18:50 +0200)]
virtio_pci_modern: reduce number of mappings

We don't know the # of VQs that drivers are going to use so it's hard to
predict how much memory we'll need to map. However, the relevant
capability does give us an upper limit.
If that's below a page, we can reduce the number of required
mappings by mapping it all once ahead of the time.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: macros for PCI layout offsets
Rusty Russell [Thu, 30 May 2013 06:59:32 +0000 (16:29 +0930)]
virtio_pci: macros for PCI layout offsets

QEMU wants it, so why not?  Trust, but verify.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio_pci: modern driver
Michael S. Tsirkin [Thu, 11 Dec 2014 11:59:51 +0000 (13:59 +0200)]
virtio_pci: modern driver

Lightly tested against qemu.

One thing *not* implemented here is separate mappings
for descriptor/avail/used rings. That's nice to have,
will be done later after we have core support.

This also exposes the PCI layout to userspace, and
adds macros for PCI layout offsets:

QEMU wants it, so why not?  Trust, but verify.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio-pci: define layout for virtio 1.0
Rusty Russell [Wed, 29 May 2013 02:22:22 +0000 (11:52 +0930)]
virtio-pci: define layout for virtio 1.0

Based on patches by Michael S. Tsirkin <mst@redhat.com>, but I found it
hard to follow so changed to use structures which are more
self-documenting.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
9 years agovirtio_pci: move probe/remove code to common
Michael S. Tsirkin [Tue, 13 Jan 2015 09:23:32 +0000 (11:23 +0200)]
virtio_pci: move probe/remove code to common

Most of initialization is device-independent.
Let's move it to common.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: drop useless del_vqs call
Sasha Levin [Fri, 2 Jan 2015 19:47:39 +0000 (14:47 -0500)]
virtio_pci: drop useless del_vqs call

Device VQs were getting freed twice: once in every device's removal
functions, and then again in virtio_pci_legacy_remove().  The ones in
devices are called first, so drop the useless second call.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agos390: add pci_iomap_range
Michael S. Tsirkin [Wed, 29 May 2013 02:22:21 +0000 (11:52 +0930)]
s390: add pci_iomap_range

Virtio drivers should map the part of the range they need, not
necessarily all of it.
To this end, support mapping ranges within BAR on s390.
Since multiple ranges can now be mapped within a BAR, we keep track of
the number of mappings created, and only clear out the mapping for a BAR
when this number reaches 0.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agopci: add pci_iomap_range
Michael S. Tsirkin [Wed, 29 May 2013 02:22:21 +0000 (11:52 +0930)]
pci: add pci_iomap_range

Virtio drivers should map the part of the BAR they need, not necessarily
all of it.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agomn10300: drop dead code
Michael S. Tsirkin [Sun, 14 Dec 2014 12:54:17 +0000 (14:54 +0200)]
mn10300: drop dead code

pci-iomap.c was (apparently, mistakenly) reintroduced as part of
commit 83c2dc15ce824450e7044b9f90cd529c25747ae0
    MN10300: Handle cacheable PCI regions in pci_iomap()
probably as side-effect of forward-porting the patch
from an old kernel.

It's not really needed: the generic pci_iomap does the right thing here.

The new file isn't compiled so it's safe to drop.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: trivial@kernel.org
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio/balloon: verify device has config space
Michael S. Tsirkin [Mon, 12 Jan 2015 14:23:37 +0000 (16:23 +0200)]
virtio/balloon: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/balloon needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio/scsi: verify device has config space
Michael S. Tsirkin [Mon, 12 Jan 2015 14:23:37 +0000 (16:23 +0200)]
virtio/scsi: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/scsi needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio/net: verify device has config space
Michael S. Tsirkin [Mon, 12 Jan 2015 14:23:37 +0000 (16:23 +0200)]
virtio/net: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/net needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio/console: verify device has config space
Michael S. Tsirkin [Mon, 12 Jan 2015 14:23:37 +0000 (16:23 +0200)]
virtio/console: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/console needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio/blk: verify device has config space
Michael S. Tsirkin [Mon, 12 Jan 2015 14:23:37 +0000 (16:23 +0200)]
virtio/blk: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/blk needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio/9p: verify device has config space
Michael S. Tsirkin [Mon, 12 Jan 2015 14:23:37 +0000 (16:23 +0200)]
virtio/9p: verify device has config space

Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/9p needs config space access so make it
fail gracefully if not there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agovirtio_pci: drop virtio_config dependency
Michael S. Tsirkin [Sun, 28 Dec 2014 10:35:05 +0000 (12:35 +0200)]
virtio_pci: drop virtio_config dependency

virtio_pci does not depend on virtio_config:
let's not include it, users can pull it in as necessary.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9 years agoMerge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Tue, 20 Jan 2015 19:54:16 +0000 (07:54 +1200)]
Merge branch 'for-3.19-fixes' of git://git./linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:

 - Bartlomiej will be co-maintaining PATA portion of libata.  git
   workflow will stay the same.

 - sata_sil24 wasn't happy with tag ordered submission.  An option to
   restore the old tag allocation behavior is implemented for sil24.

 - a very old race condition in PIO host state machine which can trigger
   BUG fixed.

 - other driver-specific changes

* 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: prevent HSM state change race between ISR and PIO
  libata: allow sata_sil24 to opt-out of tag ordered submission
  ata: pata_at91: depend on !ARCH_MULTIPLATFORM
  ahci: Remove Device ID for Intel Sunrise Point PCH
  ahci: Use dev_info() to inform about the lack of Device Sleep support
  libata: Whitelist SSDs that are known to properly return zeroes after TRIM
  sata_dwc_460ex: fix resource leak on error path
  ata: add MAINTAINERS entry for libata PATA drivers
  libata: clean up MAINTAINERS entries
  libata: export ata_get_cmd_descript()
  ahci_xgene: Fix the DMA state machine lockup for the ATA_CMD_PACKET PIO mode command.
  ahci_xgene: Fix the endianess issue in APM X-Gene SoC AHCI SATA controller driver.

9 years agoMerge branch 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Tue, 20 Jan 2015 19:51:46 +0000 (07:51 +1200)]
Merge branch 'for-3.19-fixes' of git://git./linux/kernel/git/tj/wq

Pull workqueue fix from Tejun Heo:
 "The xfs folks have been running into weird and very rare lockups for
  some time now.  I didn't think this could have been from workqueue
  side because no one else was reporting it.  This time, Eric had a
  kdump which we looked into and it turned out this actually was a
  workqueue bug and the bug has been there since the beginning of
  concurrency managed workqueue.

  A worker pool ensures forward progress of the workqueues associated
  with it by always having at least one worker reserved from executing
  work items.  When the pool is under contention, the idle one tries to
  create more workers for the pool and if that doesn't succeed quickly
  enough, it calls the rescuers to the pool.

  This logic had a subtle race condition in an early exit path.  When a
  worker invokes this manager function, the function may return %false
  indicating that the caller may proceed to executing work items either
  because another worker is already performing the role or conditions
  have changed and the pool is no longer under contention.

  The latter part depended on the assumption that whether more workers
  are necessary or not remains stable while the pool is locked; however,
  pool->nr_running (concurrency count) may change asynchronously and it
  getting bumped from zero asynchronously could send off the last idle
  worker to execute work items.

  The race window is fairly narrow, and, even when it gets triggered,
  the pool deadlocks iff if all work items get blocked on pending work
  items of the pool, which is highly unlikely but can be triggered by
  xfs.

  The patch removes the race window by removing the early exit path,
  which doesn't server any purpose anymore anyway"

* 'for-3.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix subtle pool management issue which can stall whole worker_pool

9 years agoMerge tag 'pinctrl-v3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Tue, 20 Jan 2015 09:23:41 +0000 (21:23 +1200)]
Merge tag 'pinctrl-v3.19-3' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Here is a (hopefully final) slew of pin control fixes for the v3.19
  series.  The deadlock fix is kind of serious and tagged for stable,
  the rest is business as usual.

   - Fix two deadlocks around the pin control mutexes, a long-standing
     issue that manifest itself in plug/unplug of pin controllers.
     (Tagged for stable.)

   - Handle an error path with zero functions in the Qualcomm pin
     controller.

   - Drop a bogus second GPIO chip added in the Lantiq driver.

   - Fix sudden IRQ loss on Rockchip pin controllers.

   - Register the GIT tree in MAINTAINERS"

* tag 'pinctrl-v3.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: MAINTAINERS: add git tree reference
  pinctrl: qcom: Don't iterate past end of function array
  pinctrl: lantiq: remove bogus of_gpio_chip_add
  pinctrl: Fix two deadlocks
  pinctrl: rockchip: Avoid losing interrupts when supporting both edges

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 20 Jan 2015 06:19:31 +0000 (18:19 +1200)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Socket addresses returned in the error queue need to be fully
    initialized before being passed on to userspace, fix from Willem de
    Bruijn.

 2) Interrupt handling fixes to davinci_emac driver from Tony Lindgren.

 3) Fix races between receive packet steering and cpu hotplug, from Eric
    Dumazet.

 4) Allowing netlink sockets to subscribe to unknown multicast groups
    leads to crashes, don't allow it.  From Johannes Berg.

 5) One to many socket races in SCTP fixed by Daniel Borkmann.

 6) Put in a guard against the mis-use of ipv6 atomic fragments, from
    Hagen Paul Pfeifer.

 7) Fix promisc mode and ethtool crashes in sh_eth driver, from Ben
    Hutchings.

 8) NULL deref and double kfree fix in sxgbe driver from Girish K.S and
    Byungho An.

 9) cfg80211 deadlock fix from Arik Nemtsov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
  s2io: use snprintf() as a safety feature
  r8152: remove sram_read
  r8152: remove generic_ocp_read before writing
  bgmac: activate irqs only if there is nothing to poll
  bgmac: register napi before the device
  sh_eth: Fix ethtool operation crash when net device is down
  sh_eth: Fix promiscuous mode on chips without TSU
  ipv6: stop sending PTB packets for MTU < 1280
  net: sctp: fix race for one-to-many sockets in sendmsg's auto associate
  genetlink: synchronize socket closing and family removal
  genetlink: disallow subscribing to unknown mcast groups
  genetlink: document parallel_ops
  net: rps: fix cpu unplug
  net: davinci_emac: Add support for emac on dm816x
  net: davinci_emac: Fix ioremap for devices with MDIO within the EMAC address space
  net: davinci_emac: Fix incomplete code for getting the phy from device tree
  net: davinci_emac: Free clock after checking the frequency
  net: davinci_emac: Fix runtime pm calls for davinci_emac
  net: davinci_emac: Fix hangs with interrupts
  ip: zero sockaddr returned on error queue
  ...

9 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Tue, 20 Jan 2015 06:17:34 +0000 (18:17 +1200)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fix from Herbert Xu:
 "This fixes a regression that arose from the change to add a crypto
  prefix to module names which was done to prevent the loading of
  arbitrary modules through the Crypto API.

  In particular, a number of modules were missing the crypto prefix
  which meant that they could no longer be autoloaded"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: add missing crypto module aliases

9 years agos2io: use snprintf() as a safety feature
Dan Carpenter [Mon, 19 Jan 2015 19:34:51 +0000 (22:34 +0300)]
s2io: use snprintf() as a safety feature

"sp->desc[i]" has 25 characters.  "dev->name" has 15 characters.  If we
used all 15 characters then the sprintf() would overflow.

I changed the "sprintf(sp->name, "%s Neterion %s"" to snprintf(), as
well, even though it can't overflow just to be consistent.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'r8152'
David S. Miller [Mon, 19 Jan 2015 21:16:36 +0000 (16:16 -0500)]
Merge branch 'r8152'

Hayes Wang says:

====================
r8152: couldn't read OCP_SRAM_DATA

Read OCP_SRAM_DATA would read additional bytes and may let
the hw abnormal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: remove sram_read
hayeswang [Mon, 19 Jan 2015 09:02:46 +0000 (17:02 +0800)]
r8152: remove sram_read

Read OCP register 0xa43a~0xa43b would clear some flags which the hw
would use, and it may let the device lost. However, the unit of
reading is 4 bytes. That is, it would read 0xa438~0xa43b when calling
sram_read() to read OCP_SRAM_DATA.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agor8152: remove generic_ocp_read before writing
hayeswang [Mon, 19 Jan 2015 09:02:45 +0000 (17:02 +0800)]
r8152: remove generic_ocp_read before writing

For ocp_write_word() and ocp_write_byte(), there is a generic_ocp_read()
which is used to read the whole 4 byte data, keep the unchanged bytes,
and modify the expected bytes. However, the "byen" could be used to
determine which bytes of the 4 bytes to write, so the action could be
removed.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'bgmac'
David S. Miller [Mon, 19 Jan 2015 21:00:02 +0000 (16:00 -0500)]
Merge branch 'bgmac'

Hauke Mehrtens says:

====================
bgmac: some fixes to napi usage

I compared the napi documentation with the bgmac driver and found some
problems in that driver. These two patches should fix the problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobgmac: activate irqs only if there is nothing to poll
Hauke Mehrtens [Sun, 18 Jan 2015 18:49:59 +0000 (19:49 +0100)]
bgmac: activate irqs only if there is nothing to poll

IRQs should only get activated when there is nothing to poll in the
queue any more and to after every poll.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agobgmac: register napi before the device
Hauke Mehrtens [Sun, 18 Jan 2015 18:49:58 +0000 (19:49 +0100)]
bgmac: register napi before the device

napi should get registered before the netdev and not after.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge branch 'sh_eth'
David S. Miller [Mon, 19 Jan 2015 20:37:44 +0000 (15:37 -0500)]
Merge branch 'sh_eth'

Ben Hutchings says:

====================
sh_eth fixes

I'm currently looking at Ethernet support on the R-Car H2 chip,
reviewing and testing the sh_eth driver.  Here are fixes for two fairly
obvious bugs in the driver; I will probably have some more later.

These are not tested on any of the other supported chips.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosh_eth: Fix ethtool operation crash when net device is down
Ben Hutchings [Fri, 16 Jan 2015 17:51:25 +0000 (17:51 +0000)]
sh_eth: Fix ethtool operation crash when net device is down

The driver connects and disconnects the PHY device whenever the
net device is brought up and down.  The ethtool get_settings,
set_settings and nway_reset operations will dereference a null
or dangling pointer if called while it is down.

I think it would be preferable to keep the PHY connected, but there
may be good reasons not to.

As an immediate fix for this bug:
- Set the phydev pointer to NULL after disconnecting the PHY
- Change those three operations to return -ENODEV while the PHY is
  not connected

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agosh_eth: Fix promiscuous mode on chips without TSU
Ben Hutchings [Fri, 16 Jan 2015 17:51:12 +0000 (17:51 +0000)]
sh_eth: Fix promiscuous mode on chips without TSU

Currently net_device_ops::set_rx_mode is only implemented for
chips with a TSU (multiple address table).  However we do need
to turn the PRM (promiscuous) flag on and off for other chips.

- Remove the unlikely() from the TSU functions that we may safely
  call for chips without a TSU
- Make setting of the MCT flag conditional on the tsu capability flag
- Rename sh_eth_set_multicast_list() to sh_eth_set_rx_mode() and plumb
  it into both net_device_ops structures
- Remove the previously-unreachable branch in sh_eth_rx_mode() that
  would otherwise reset the flags to defaults for non-TSU chips

Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoipv6: stop sending PTB packets for MTU < 1280
Hagen Paul Pfeifer [Thu, 15 Jan 2015 21:34:25 +0000 (22:34 +0100)]
ipv6: stop sending PTB packets for MTU < 1280

Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 byte) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.

[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00

Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agolibata: prevent HSM state change race between ISR and PIO
David Jeffery [Mon, 19 Jan 2015 19:03:25 +0000 (13:03 -0600)]
libata: prevent HSM state change race between ISR and PIO

It is possible for ata_sff_flush_pio_task() to set ap->hsm_task_state to
HSM_ST_IDLE in between the time __ata_sff_port_intr() checks for HSM_ST_IDLE
and before it calls ata_sff_hsm_move() causing ata_sff_hsm_move() to BUG().

This problem is hard to reproduce making this patch hard to verify, but this
fix will prevent the race.

I have not been able to reproduce the problem, but here is a crash dump from
a 2.6.32 kernel.

On examining the ata port's state, its hsm_task_state field has a value of HSM_ST_IDLE:

crash> struct ata_port.hsm_task_state ffff881c1121c000
  hsm_task_state = 0

Normally, this should not be possible as ata_sff_hsm_move() was called from ata_sff_host_intr(),
which checks hsm_task_state and won't call ata_sff_hsm_move() if it has a HSM_ST_IDLE value.

PID: 11053  TASK: ffff8816e846cae0  CPU: 0   COMMAND: "sshd"
 #0 [ffff88008ba03960] machine_kexec at ffffffff81038f3b
 #1 [ffff88008ba039c0] crash_kexec at ffffffff810c5d92
 #2 [ffff88008ba03a90] oops_end at ffffffff8152b510
 #3 [ffff88008ba03ac0] die at ffffffff81010e0b
 #4 [ffff88008ba03af0] do_trap at ffffffff8152ad74
 #5 [ffff88008ba03b50] do_invalid_op at ffffffff8100cf95
 #6 [ffff88008ba03bf0] invalid_op at ffffffff8100bf9b
    [exception RIP: ata_sff_hsm_move+317]
    RIP: ffffffff813a77ad  RSP: ffff88008ba03ca0  RFLAGS: 00010097
    RAX: 0000000000000000  RBX: ffff881c1121dc60  RCX: 0000000000000000
    RDX: ffff881c1121dd10  RSI: ffff881c1121dc60  RDI: ffff881c1121c000
    RBP: ffff88008ba03d00   R8: 0000000000000000   R9: 000000000000002e
    R10: 000000000001003f  R11: 000000000000009b  R12: ffff881c1121c000
    R13: 0000000000000000  R14: 0000000000000050  R15: ffff881c1121dd78
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88008ba03d08] ata_sff_host_intr at ffffffff813a7fbd
 #8 [ffff88008ba03d38] ata_sff_interrupt at ffffffff813a821e
 #9 [ffff88008ba03d78] handle_IRQ_event at ffffffff810e6ec0
--- <IRQ stack> ---
    [exception RIP: pipe_poll+48]
    RIP: ffffffff81192780  RSP: ffff880f26d459b8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff880f26d459c8  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff881a0539fa80
    RBP: ffffffff8100bb8e   R8: ffff8803b23324a0   R9: 0000000000000000
    R10: ffff880f26d45dd0  R11: 0000000000000008  R12: ffffffff8109b646
    R13: ffff880f26d45948  R14: 0000000000000246  R15: 0000000000000246
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
    RIP: 00007f26017435c3  RSP: 00007fffe020c420  RFLAGS: 00000206
    RAX: 0000000000000017  RBX: ffffffff8100b072  RCX: 00007fffe020c45c
    RDX: 00007f2604a3f120  RSI: 00007f2604a3f140  RDI: 000000000000000d
    RBP: 0000000000000000   R8: 00007fffe020e570   R9: 0101010101010101
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fffe020e5f0
    R13: 00007fffe020e5f4  R14: 00007f26045f373c  R15: 00007fffe020e5e0
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

Somewhere between the ata_sff_hsm_move() check and the ata_sff_host_intr() check, the value changed.
On examining the other cpus to see what else was running, another cpu was running the error handler
routines:

PID: 326    TASK: ffff881c11014aa0  CPU: 1   COMMAND: "scsi_eh_1"
 #0 [ffff88008ba27e90] crash_nmi_callback at ffffffff8102fee6
 #1 [ffff88008ba27ea0] notifier_call_chain at ffffffff8152d515
 #2 [ffff88008ba27ee0] atomic_notifier_call_chain at ffffffff8152d57a
 #3 [ffff88008ba27ef0] notify_die at ffffffff810a154e
 #4 [ffff88008ba27f20] do_nmi at ffffffff8152b1db
 #5 [ffff88008ba27f50] nmi at ffffffff8152aaa0
    [exception RIP: _spin_lock_irqsave+47]
    RIP: ffffffff8152a1ff  RSP: ffff881c11a73aa0  RFLAGS: 00000006
    RAX: 0000000000000001  RBX: ffff881c1121deb8  RCX: 0000000000000000
    RDX: 0000000000000246  RSI: 0000000000000020  RDI: ffff881c122612d8
    RBP: ffff881c11a73aa0   R8: ffff881c17083800   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff881c1121c000
    R13: 000000000000001f  R14: ffff881c1121dd50  R15: ffff881c1121dc60
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #6 [ffff881c11a73aa0] _spin_lock_irqsave at ffffffff8152a1ff
 #7 [ffff881c11a73aa8] ata_exec_internal_sg at ffffffff81396fb5
 #8 [ffff881c11a73b58] ata_exec_internal at ffffffff81397109
 #9 [ffff881c11a73bd8] atapi_eh_request_sense at ffffffff813a34eb

Before it tried to acquire a spinlock, ata_exec_internal_sg() called ata_sff_flush_pio_task().
This function will set ap->hsm_task_state to HSM_ST_IDLE, and has no locking around setting this
value. ata_sff_flush_pio_task() can then race with the interrupt handler and potentially set
HSM_ST_IDLE at a fatal moment, which will trigger a kernel BUG.

v2: Fixup comment in ata_sff_flush_pio_task()

tj: Further updated comment.  Use ap->lock instead of shost lock and
    use the [un]lock_irq variant instead of the irqsave/restore one.

Signed-off-by: David Milburn <dmilburn@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
9 years agolibata: allow sata_sil24 to opt-out of tag ordered submission
Dan Williams [Fri, 16 Jan 2015 23:13:02 +0000 (15:13 -0800)]
libata: allow sata_sil24 to opt-out of tag ordered submission

Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101
    "Since commit 8a4aeec8d "libata/ahci: accommodate tag ordered
    controllers" the access to the harddisk on the first SATA-port is
    failing on its first access. The access to the harddisk on the
    second port is working normal.

    When reverting the above commit, access to both harddisks is working
    fine again."

Maintain tag ordered submission as the default, but allow sata_sil24 to
continue with the old behavior.

Cc: <stable@vger.kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Reported-by: Ronny Hegewald <Ronny.Hegewald@online.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
9 years agopinctrl: MAINTAINERS: add git tree reference
Linus Walleij [Mon, 19 Jan 2015 10:27:19 +0000 (11:27 +0100)]
pinctrl: MAINTAINERS: add git tree reference

Reference my pinctrl GIT tree @kernel.org

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
9 years agopinctrl: qcom: Don't iterate past end of function array
Stephen Boyd [Mon, 19 Jan 2015 10:17:45 +0000 (11:17 +0100)]
pinctrl: qcom: Don't iterate past end of function array

Timur reports that this code crashes if nfunctions is 0. Fix the
loop iteration to only consider valid elements of the functions
array.

Reported-by: Timur Tabi <timur@codeaurora.org>
Cc: Pramod Gurav <pramod.gurav@smartplayin.com>
Cc: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Cc: Ivan T. Ivanov <iivanov@mm-sol.com>
Cc: Andy Gross <agross@codeaurora.org>
Fixes: 327455817a92 "pinctrl: qcom: Add support for reset for apq8064"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
9 years agoMerge tag 'gpio-v3.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sun, 18 Jan 2015 17:03:13 +0000 (05:03 +1200)]
Merge tag 'gpio-v3.19-4' of git://git./linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "Here is a set of fixes that mainly appeared when Johan Hovold started
  exercising the removal path of the GPIO library, dealing with
  hotplugging of GPIO controllers. Details from tag:

  A slew of fixes dealing with some irritating bugs (non-regressions)
  that have been around forever in the GPIO subsystem, most of them also
  tagged for stable:

   - A large slew of fixes from Johan Hovold who is finally testing and
     reviewing the removal path of the GPIO drivers.

   - Fix of_get_named_gpiod_flags() so it works as expected.

   - Fix an IRQ handling bug in the crystalcove driver"

* tag 'gpio-v3.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpiolib: of: Correct error handling in of_get_named_gpiod_flags
  gpio: sysfs: fix gpio attribute-creation race
  gpio: sysfs: fix gpio device-attribute leak
  gpio: sysfs: fix gpio-chip device-attribute leak
  gpio: unregister gpiochip device before removing it
  gpio: fix sleep-while-atomic in gpiochip_remove
  gpio: fix memory leak and sleep-while-atomic
  gpio: clean up gpiochip_add error handling
  gpio: fix gpio-chip list corruption
  gpio: fix memory and reference leaks in gpiochip_add error path
  gpio: crystalcove: use handle_nested_irq

9 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sun, 18 Jan 2015 16:55:23 +0000 (04:55 +1200)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

Pull input subsystem fixes from Dmitry Torokhov.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: uinput - fix ioctl nr overflow for UI_GET_SYSNAME/VERSION
  Input: I8042 - add Acer Aspire 7738 to the nomux list
  Input: elantech - support new ICs types for version 4
  Input: i8042 - reset keyboard to fix Elantech touchpad detection
  MAINTAINERS: remove Dmitry Torokhov's alternate address

9 years agoLinux 3.19-rc5 v3.19-rc5
Linus Torvalds [Sun, 18 Jan 2015 06:02:20 +0000 (18:02 +1200)]
Linux 3.19-rc5

9 years agoMerge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sun, 18 Jan 2015 06:00:40 +0000 (18:00 +1200)]
Merge tag 'armsoc-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We've been sitting on our fixes branch for a while, so this batch is
  unfortunately on the large side.

  A lot of these are tweaks and fixes to device trees, fixing various
  bugs around clocks, reg ranges, etc.  There's also a few defconfig
  updates (which are on the late side, no more of those).

  All in all the diffstat is bigger than ideal at this time, but nothing
  in here seems particularly risky"

* tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
  reset: sunxi: fix spinlock initialization
  ARM: dts: disable CCI on exynos5420 based arndale-octa
  drivers: bus: check cci device tree node status
  ARM: rockchip: disable jtag/sdmmc autoswitching on rk3288
  ARM: nomadik: fix up leftover device tree pins
  ARM: at91: board-dt-sama5: add phy_fixup to override NAND_Tree
  ARM: at91/dt: sam9263: Add missing clocks to lcdc node
  ARM: at91: sama5d3: dt: correct the sound route
  ARM: at91/dt: sama5d4: fix the timer reg length
  ARM: exynos_defconfig: Enable LM90 driver
  ARM: exynos_defconfig: Enable options for display panel support
  arm: dts: Use pmu_system_controller phandle for dp phy
  ARM: shmobile: sh73a0 legacy: Set .control_parent for all irqpin instances
  ARM: dts: berlin: correct BG2Q's SM GPIO location.
  ARM: dts: berlin: add broken-cd and set bus width for eMMC in Marvell DMP DT
  ARM: dts: berlin: fix io clk and add missing core clk for BG2Q sdhci2 host
  ARM: dts: Revert disabling of smc91x for n900
  ARM: dts: imx51-babbage: Fix ULPI PHY reset modelling
  ARM: dts: dra7-evm: fix qspi device tree partition size
  ARM: omap2plus_defconfig: use CONFIG_CPUFREQ_DT
  ...

9 years agonet: sctp: fix race for one-to-many sockets in sendmsg's auto associate
Daniel Borkmann [Thu, 15 Jan 2015 15:34:35 +0000 (16:34 +0100)]
net: sctp: fix race for one-to-many sockets in sendmsg's auto associate

I.e. one-to-many sockets in SCTP are not required to explicitly
call into connect(2) or sctp_connectx(2) prior to data exchange.
Instead, they can directly invoke sendmsg(2) and the SCTP stack
will automatically trigger connection establishment through 4WHS
via sctp_primitive_ASSOCIATE(). However, this in its current
implementation is racy: INIT is being sent out immediately (as
it cannot be bundled anyway) and the rest of the DATA chunks are
queued up for later xmit when connection is established, meaning
sendmsg(2) will return successfully. This behaviour can result
in an undesired side-effect that the kernel made the application
think the data has already been transmitted, although none of it
has actually left the machine, worst case even after close(2)'ing
the socket.

Instead, when the association from client side has been shut down
e.g. first gracefully through SCTP_EOF and then close(2), the
client could afterwards still receive the server's INIT_ACK due
to a connection with higher latency. This INIT_ACK is then considered
out of the blue and hence responded with ABORT as there was no
alive assoc found anymore. This can be easily reproduced f.e.
with sctp_test application from lksctp. One way to fix this race
is to wait for the handshake to actually complete.

The fix defers waiting after sctp_primitive_ASSOCIATE() and
sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
from sctp_sendmsg() have already been placed into the output
queue through the side-effect interpreter, and therefore can then
be bundeled together with COOKIE_ECHO control chunks.

strace from example application (shortened):

socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
           msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
           msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
close(3) = 0

tcpdump before patch (fooling the application):

22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]

tcpdump after patch:

14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]

Looks like this bug is from the pre-git history museum. ;)

Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9 years agoMerge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux
Linus Torvalds [Sun, 18 Jan 2015 03:29:11 +0000 (15:29 +1200)]
Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux

Pull clock driver fixes from Mike Turquette:
 "Small number of fixes for clock drivers and a single null pointer
  dereference fix in the framework core code.

  The driver fixes vary from fixing section mismatch warnings to
  preventing machines from hanging (and preventing developers from
  crying)"

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
  clk: fix possible null pointer dereference
  Revert "clk: ppc-corenet: Fix Section mismatch warning"
  clk: rockchip: fix deadlock possibility in cpuclk
  clk: berlin: bg2q: remove non-exist "smemc" gate clock
  clk: at91: keep slow clk enabled to prevent system hang
  clk: rockchip: fix rk3288 cpuclk core dividers
  clk: rockchip: fix rk3066 pll lock bit location
  clk: rockchip: Fix clock gate for rk3188 hclk_emem_peri
  clk: rockchip: add CLK_IGNORE_UNUSED flag to fix rk3066/rk3188 USB Host