S: Maintained
F: drivers/hwmon/abituguru3.c
+ACCES 104-IDIO-16 GPIO DRIVER
+M: "William Breathitt Gray" <vilhelm.gray@gmail.com>
+L: linux-gpio@vger.kernel.org
+S: Maintained
+F: drivers/gpio/gpio-104-idio-16.c
+
ACENIC DRIVER
M: Jes Sorensen <jes@trained-monkey.org>
L: linux-acenic@sunsite.dk
F: drivers/gpu/drm/radeon/radeon_kfd.h
F: include/uapi/linux/kfd_ioctl.h
-AMD MICROCODE UPDATE SUPPORT
-M: Borislav Petkov <bp@alien8.de>
-S: Maintained
-F: arch/x86/kernel/cpu/microcode/amd*
-
AMD XGBE DRIVER
M: Tom Lendacky <thomas.lendacky@amd.com>
L: netdev@vger.kernel.org
ARM PMU PROFILING AND DEBUGGING
M: Will Deacon <will.deacon@arm.com>
+R: Mark Rutland <mark.rutland@arm.com>
S: Maintained
-F: arch/arm/kernel/perf_*
+F: arch/arm*/kernel/perf_*
F: arch/arm/oprofile/common.c
-F: arch/arm/kernel/hw_breakpoint.c
-F: arch/arm/include/asm/hw_breakpoint.h
-F: arch/arm/include/asm/perf_event.h
+F: arch/arm*/kernel/hw_breakpoint.c
+F: arch/arm*/include/asm/hw_breakpoint.h
+F: arch/arm*/include/asm/perf_event.h
F: drivers/perf/arm_pmu.c
F: include/linux/perf/arm_pmu.h
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
-ARM/Allwinner A1X SoC support
+ARM/Allwinner sunXi SoC support
M: Maxime Ripard <maxime.ripard@free-electrons.com>
+M: Chen-Yu Tsai <wens@csie.org>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
-N: sun[x4567]i
+N: sun[x456789]i
ARM/Allwinner SoC Clock Support
M: Emilio López <emilio@elopez.com.ar>
N: mtk
K: mediatek
+ARM/Mediatek USB3 PHY DRIVER
+M: Chunfeng Yun <chunfeng.yun@mediatek.com>
+L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L: linux-mediatek@lists.infradead.org (moderated for non-subscribers)
+S: Maintained
+F: drivers/phy/phy-mt65xx-usb3.c
+
ARM/MICREL KS8695 ARCHITECTURE
M: Greg Ungerer <gerg@uclinux.org>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
S: Maintained
F: drivers/media/platform/s5p-tv/
+ARM/SAMSUNG S5P SERIES JPEG CODEC SUPPORT
+M: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
+M: Jacek Anaszewski <j.anaszewski@samsung.com>
+L: linux-arm-kernel@lists.infradead.org
+L: linux-media@vger.kernel.org
+S: Maintained
+F: drivers/media/platform/s5p-jpeg/
+
ARM/SHMOBILE ARM ARCHITECTURE
M: Simon Horman <horms@verge.net.au>
M: Magnus Damm <magnus.damm@gmail.com>
S: Maintained
F: arch/arm/mach-sti/
F: arch/arm/boot/dts/sti*
+F: drivers/char/hw_random/st-rng.c
F: drivers/clocksource/arm_global_timer.c
F: drivers/clocksource/clksrc_st_lpc.c
F: drivers/i2c/busses/i2c-st.c
F: Documentation/aoe/
F: drivers/block/aoe/
+ATHEROS 71XX/9XXX GPIO DRIVER
+M: Alban Bedel <albeu@free.fr>
+W: https://github.com/AlbanBedel/linux
+T: git git://github.com/AlbanBedel/linux
+S: Maintained
+F: drivers/gpio/gpio-ath79.c
+F: Documentation/devicetree/bindings/gpio/gpio-ath79.txt
+
ATHEROS ATH GENERIC UTILITIES
M: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
L: linux-wireless@vger.kernel.org
F: drivers/net/ethernet/cisco/enic/
CISCO VIC LOW LATENCY NIC DRIVER
- M: Upinder Malhi <umalhi@cisco.com>
+ M: Christian Benvenuti <benve@cisco.com>
+ M: Dave Goodell <dgoodell@cisco.com>
S: Supported
- F: drivers/infiniband/hw/usnic
+ F: drivers/infiniband/hw/usnic/
CIRRUS LOGIC EP93XX ETHERNET DRIVER
M: Hartley Sweeten <hsweeten@visionengravers.com>
F: Documentation/powerpc/cxl.txt
F: Documentation/ABI/testing/sysfs-class-cxl
+CXLFLASH (IBM Coherent Accelerator Processor Interface CAPI Flash) SCSI DRIVER
+M: Manoj N. Kumar <manoj@linux.vnet.ibm.com>
+M: Matthew R. Ochs <mrochs@linux.vnet.ibm.com>
+L: linux-scsi@vger.kernel.org
+S: Supported
+F: drivers/scsi/cxlflash/
+F: include/uapi/scsi/cxlflash_ioctls.h
+F: Documentation/powerpc/cxlflash.txt
+
STMMAC ETHERNET DRIVER
M: Giuseppe Cavallaro <peppe.cavallaro@st.com>
L: netdev@vger.kernel.org
W: http://www.dialog-semiconductor.com/products
S: Supported
F: Documentation/hwmon/da90??
+F: Documentation/devicetree/bindings/sound/da[79]*.txt
F: drivers/gpio/gpio-da90??.c
F: drivers/hwmon/da90??-hwmon.c
F: drivers/iio/adc/da91??-*.c
L: linux-doc@vger.kernel.org
S: Maintained
F: Documentation/
+F: scripts/docproc.c
+F: scripts/kernel-doc*
X: Documentation/ABI/
X: Documentation/devicetree/
X: Documentation/acpi
X: Documentation/power
X: Documentation/spi
X: Documentation/DocBook/media
-T: git git://git.lwn.net/linux-2.6.git docs-next
+T: git git://git.lwn.net/linux.git docs-next
DOUBLETALK DRIVER
M: "James R. Van Zandt" <jrv@vanzandt.mv.com>
F: drivers/gpu/drm/drm_panel.c
F: drivers/gpu/drm/panel/
F: include/drm/drm_panel.h
-F: Documentation/devicetree/bindings/panel/
+F: Documentation/devicetree/bindings/display/panel/
INTEL DRM DRIVERS (excluding Poulsbo, Moorestown and derivative chipsets)
M: Daniel Vetter <daniel.vetter@intel.com>
F: include/drm/i915*
F: include/uapi/drm/i915*
+DRM DRIVERS FOR ATMEL HLCDC
+M: Boris Brezillon <boris.brezillon@free-electrons.com>
+L: dri-devel@lists.freedesktop.org
+S: Supported
+F: drivers/gpu/drm/atmel-hlcdc/
+F: Documentation/devicetree/bindings/drm/atmel/
+
DRM DRIVERS FOR EXYNOS
M: Inki Dae <inki.dae@samsung.com>
M: Joonyoung Shim <jy0922.shim@samsung.com>
L: dri-devel@lists.freedesktop.org
S: Supported
F: drivers/gpu/drm/fsl-dcu/
-F: Documentation/devicetree/bindings/video/fsl,dcu.txt
-F: Documentation/devicetree/bindings/panel/nec,nl4827hc19_05b.txt
+F: Documentation/devicetree/bindings/display/fsl,dcu.txt
+F: Documentation/devicetree/bindings/display/panel/nec,nl4827hc19_05b.txt
DRM DRIVERS FOR FREESCALE IMX
M: Philipp Zabel <p.zabel@pengutronix.de>
L: dri-devel@lists.freedesktop.org
S: Maintained
F: drivers/gpu/drm/imx/
-F: Documentation/devicetree/bindings/drm/imx/
+F: Documentation/devicetree/bindings/display/imx/
+
+DRM DRIVERS FOR GMA500 (Poulsbo, Moorestown and derivative chipsets)
+M: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
+L: dri-devel@lists.freedesktop.org
+T: git git://github.com/patjak/drm-gma500
+S: Maintained
+F: drivers/gpu/drm/gma500
+F: include/drm/gma500*
DRM DRIVERS FOR NVIDIA TEGRA
M: Thierry Reding <thierry.reding@gmail.com>
F: drivers/gpu/host1x/
F: include/linux/host1x.h
F: include/uapi/drm/tegra_drm.h
-F: Documentation/devicetree/bindings/gpu/nvidia,tegra20-host1x.txt
+F: Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.txt
DRM DRIVERS FOR RENESAS
M: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
L: dri-devel@lists.freedesktop.org
S: Maintained
F: drivers/gpu/drm/rockchip/
-F: Documentation/devicetree/bindings/video/rockchip*
+F: Documentation/devicetree/bindings/display/rockchip*
DRM DRIVERS FOR STI
M: Benjamin Gaignard <benjamin.gaignard@linaro.org>
T: git http://git.linaro.org/people/benjamin.gaignard/kernel.git
S: Maintained
F: drivers/gpu/drm/sti
-F: Documentation/devicetree/bindings/gpu/st,stih4xx.txt
+F: Documentation/devicetree/bindings/display/st,stih4xx.txt
DSBR100 USB FM RADIO DRIVER
M: Alexey Klimov <klimov.linux@gmail.com>
F: sound/usb/misc/ua101.c
EXTENSIBLE FIRMWARE INTERFACE (EFI)
-M: Matt Fleming <matt.fleming@intel.com>
+M: Matt Fleming <matt@codeblueprint.co.uk>
L: linux-efi@vger.kernel.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git
S: Maintained
EFI VARIABLE FILESYSTEM
M: Matthew Garrett <matthew.garrett@nebula.com>
M: Jeremy Kerr <jk@ozlabs.org>
-M: Matt Fleming <matt.fleming@intel.com>
+M: Matt Fleming <matt@codeblueprint.co.uk>
T: git git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi.git
L: linux-efi@vger.kernel.org
S: Maintained
F: include/linux/ipmi-fru.h
K: fmc_d.*register
+FPGA MANAGER FRAMEWORK
+M: Alan Tull <atull@opensource.altera.com>
+S: Maintained
+F: drivers/fpga/
+F: include/linux/fpga/fpga-mgr.h
+W: http://www.rocketboards.org
+
FPU EMULATOR
M: Bill Metzenthen <billm@melbpc.org.au>
W: http://floatingpoint.sourceforge.net/emulator/index.html
T: git git://git.kernel.org/pub/scm/linux/kernel/git/plagnioj/linux-fbdev.git
S: Maintained
F: Documentation/fb/
-F: Documentation/devicetree/bindings/fb/
F: drivers/video/
F: include/video/
F: include/linux/fb.h
S: Maintained
F: drivers/net/ethernet/freescale/ucc_geth*
+FREESCALE eTSEC ETHERNET DRIVER (GIANFAR)
+M: Claudiu Manoil <claudiu.manoil@freescale.com>
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/ethernet/freescale/gianfar*
+X: drivers/net/ethernet/freescale/gianfar_ptp.c
+F: Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
+
FREESCALE QUICC ENGINE UCC UART DRIVER
M: Timur Tabi <timur@tabi.org>
L: linuxppc-dev@lists.ozlabs.org
S: Supported
F: drivers/platform/x86/intel_menlow.c
-INTEL IA32 MICROCODE UPDATE SUPPORT
-M: Borislav Petkov <bp@alien8.de>
-S: Maintained
-F: arch/x86/kernel/cpu/microcode/core*
-F: arch/x86/kernel/cpu/microcode/intel*
-
INTEL I/OAT DMA DRIVER
M: Dave Jiang <dave.jiang@intel.com>
R: Dan Williams <dan.j.williams@intel.com>
F: Documentation/networking/README.ipw2200
F: drivers/net/wireless/ipw2x00/
+INTEL(R) TRACE HUB
+M: Alexander Shishkin <alexander.shishkin@linux.intel.com>
+S: Supported
+F: Documentation/trace/intel_th.txt
+F: drivers/hwtracing/intel_th/
+
INTEL(R) TRUSTED EXECUTION TECHNOLOGY (TXT)
M: Richard L Maliszewski <richard.l.maliszewski@intel.com>
M: Gang Wei <gang.wei@intel.com>
INTEL WIRELESS WIFI LINK (iwlwifi)
M: Johannes Berg <johannes.berg@intel.com>
M: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
-M: Intel Linux Wireless <ilw@linux.intel.com>
+M: Intel Linux Wireless <linuxwifi@intel.com>
L: linux-wireless@vger.kernel.org
W: http://intellinuxwireless.org
T: git git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi.git
F: drivers/misc/mei/*
F: Documentation/misc-devices/mei/*
+INTEL MIC DRIVERS (mic)
+M: Sudeep Dutt <sudeep.dutt@intel.com>
+M: Ashutosh Dixit <ashutosh.dixit@intel.com>
+S: Supported
+W: https://github.com/sudeepdutt/mic
+W: http://software.intel.com/en-us/mic-developer
+F: include/linux/mic_bus.h
+F: include/linux/scif.h
+F: include/uapi/linux/mic_common.h
+F: include/uapi/linux/mic_ioctl.h
+F include/uapi/linux/scif_ioctl.h
+F: drivers/misc/mic/
+F: drivers/dma/mic_x100_dma.c
+F: drivers/dma/mic_x100_dma.h
+F Documentation/mic/
+
INTEL PMC IPC DRIVER
M: Zha Qipeng<qipeng.zha@intel.com>
L: platform-driver-x86@vger.kernel.org
KERNEL VIRTUAL MACHINE (KVM) FOR AMD-V
M: Joerg Roedel <joro@8bytes.org>
L: kvm@vger.kernel.org
-W: http://kvm.qumranet.com
+W: http://www.linux-kvm.org/
S: Maintained
F: arch/x86/include/asm/svm.h
F: arch/x86/kvm/svm.c
KERNEL VIRTUAL MACHINE (KVM) FOR POWERPC
M: Alexander Graf <agraf@suse.com>
L: kvm-ppc@vger.kernel.org
-W: http://kvm.qumranet.com
+W: http://www.linux-kvm.org/
T: git git://github.com/agraf/linux-2.6.git
S: Supported
F: arch/powerpc/include/asm/kvm*
F: drivers/auxdisplay/ks0108.c
F: include/linux/ks0108.h
+L3MDEV
+M: David Ahern <dsa@cumulusnetworks.com>
+L: netdev@vger.kernel.org
+S: Maintained
+F: net/l3mdev
+F: include/net/l3mdev.h
+
LAPB module
L: linux-x25@vger.kernel.org
S: Orphan
F: include/linux/pmem.h
F: arch/*/include/asm/pmem.h
+LIGHTNVM PLATFORM SUPPORT
+M: Matias Bjorling <mb@lightnvm.io>
+W: http://github/OpenChannelSSD
+S: Maintained
+F: drivers/lightnvm/
+F: include/linux/lightnvm.h
+F: include/uapi/linux/lightnvm.h
+
LINUX FOR IBM pSERIES (RS/6000)
M: Paul Mackerras <paulus@au.ibm.com>
W: http://www.ibm.com/linux/ltc/projects/ppc
S: Maintained
F: drivers/net/dsa/mv88e6352.c
+MARVELL CRYPTO DRIVER
+M: Boris Brezillon <boris.brezillon@free-electrons.com>
+M: Arnaud Ebalard <arno@natisbad.org>
+F: drivers/crypto/marvell/
+S: Maintained
+L: linux-crypto@vger.kernel.org
+
MARVELL GIGABIT ETHERNET DRIVERS (skge/sky2)
M: Mirko Lindner <mlindner@marvell.com>
M: Stephen Hemminger <stephen@networkplumber.org>
S: Maintained
F: drivers/media/radio/radio-maxiradio*
+MCP4531 MICROCHIP DIGITAL POTENTIOMETER DRIVER
+M: Peter Rosin <peda@axentia.se>
+L: linux-iio@vger.kernel.org
+S: Maintained
+F: drivers/iio/potentiometer/mcp4531.c
+
MEDIA DRIVERS FOR RENESAS - VSP1
M: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
L: linux-media@vger.kernel.org
MELLANOX ETHERNET DRIVER (mlx4_en)
M: Amir Vadai <amirv@mellanox.com>
-M: Ido Shamay <idos@mellanox.com>
L: netdev@vger.kernel.org
S: Supported
W: http://www.mellanox.com
F: arch/metag/
F: Documentation/metag/
F: Documentation/devicetree/bindings/metag/
+F: Documentation/devicetree/bindings/interrupt-controller/img,*
F: drivers/clocksource/metag_generic.c
F: drivers/irqchip/irq-metag.c
F: drivers/irqchip/irq-metag-ext.c
F: include/linux/mlx5/
F: drivers/infiniband/hw/mlx5/
+MELEXIS MLX90614 DRIVER
+M: Crt Mori <cmo@melexis.com>
+L: linux-iio@vger.kernel.org
+W: http://www.melexis.com
+S: Supported
+F: drivers/iio/temperature/mlx90614.c
+
MN88472 MEDIA DRIVER
M: Antti Palosaari <crope@iki.fi>
L: linux-media@vger.kernel.org
L: linux-wpan@vger.kernel.org
S: Maintained
F: drivers/net/ieee802154/mrf24j40.c
+F: Documentation/devicetree/bindings/net/ieee802154/mrf24j40.txt
MSI LAPTOP SUPPORT
M: "Lee, Chun-Yi" <jlee@suse.com>
F: include/media/mt9v032.h
MULTIFUNCTION DEVICES (MFD)
-M: Samuel Ortiz <sameo@linux.intel.com>
M: Lee Jones <lee.jones@linaro.org>
T: git git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
S: Supported
F: drivers/net/
F: include/linux/if_*
F: include/linux/netdevice.h
-F: include/linux/arcdevice.h
F: include/linux/etherdevice.h
F: include/linux/fcdevice.h
F: include/linux/fddidevice.h
M: Pali Rohár <pali.rohar@gmail.com>
S: Maintained
F: include/linux/power/bq2415x_charger.h
-F: include/linux/power/bq27x00_battery.h
+F: include/linux/power/bq27xxx_battery.h
F: include/linux/power/isp1704_charger.h
F: drivers/power/bq2415x_charger.c
-F: drivers/power/bq27x00_battery.c
+F: drivers/power/bq27xxx_battery.c
F: drivers/power/isp1704_charger.c
F: drivers/power/rx51_battery.c
F: drivers/video/fbdev/nvidia/
NVM EXPRESS DRIVER
-M: Matthew Wilcox <willy@linux.intel.com>
+M: Keith Busch <keith.busch@intel.com>
+M: Jens Axboe <axboe@fb.com>
L: linux-nvme@lists.infradead.org
-T: git git://git.infradead.org/users/willy/linux-nvme.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
+W: https://kernel.googlesource.com/pub/scm/linux/kernel/git/axboe/linux-block/
S: Supported
-F: drivers/block/nvme*
+F: drivers/nvme/host/
F: include/linux/nvme.h
NVMEM FRAMEWORK
F: arch/x86/pci/
F: arch/x86/kernel/quirks.c
+PCI DRIVER FOR ALTERA PCIE IP
+M: Ley Foon Tan <lftan@altera.com>
+L: rfi@lists.rocketboards.org (moderated for non-subscribers)
+L: linux-pci@vger.kernel.org
+S: Supported
+F: Documentation/devicetree/bindings/pci/altera-pcie.txt
+F: drivers/pci/host/pcie-altera.c
+
PCI DRIVER FOR ARM VERSATILE PLATFORM
M: Rob Herring <robh@kernel.org>
L: linux-pci@vger.kernel.org
S: Maintained
F: drivers/pci/host/*spear*
+PCI MSI DRIVER FOR ALTERA MSI IP
+M: Ley Foon Tan <lftan@altera.com>
+L: rfi@lists.rocketboards.org (moderated for non-subscribers)
+L: linux-pci@vger.kernel.org
+S: Supported
+F: Documentation/devicetree/bindings/pci/altera-pcie-msi.txt
+F: drivers/pci/host/pcie-altera-msi.c
+
PCI MSI DRIVER FOR APPLIEDMICRO XGENE
M: Duc Dang <dhdang@apm.com>
L: linux-pci@vger.kernel.org
F: Documentation/devicetree/bindings/pci/xgene-pci-msi.txt
F: drivers/pci/host/pci-xgene-msi.c
+PCIE DRIVER FOR HISILICON
+M: Zhou Wang <wangzhou1@hisilicon.com>
+L: linux-pci@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/pci/hisilicon-pcie.txt
+F: drivers/pci/host/pcie-hisi.c
+
PCMCIA SUBSYSTEM
P: Linux PCMCIA Team
L: linux-pcmcia@lists.infradead.org
S: Maintained
F: drivers/pinctrl/pinctrl-at91.*
+PIN CONTROLLER - ATMEL AT91 PIO4
+M: Ludovic Desroches <ludovic.desroches@atmel.com>
+L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L: linux-gpio@vger.kernel.org
+S: Supported
+F: drivers/pinctrl/pinctrl-at91-pio4.*
+
PIN CONTROLLER - INTEL
M: Mika Westerberg <mika.westerberg@linux.intel.com>
M: Heikki Krogerus <heikki.krogerus@linux.intel.com>
S: Supported
F: drivers/net/ethernet/qlogic/qlge/
+QLOGIC QL4xxx ETHERNET DRIVER
+M: Yuval Mintz <Yuval.Mintz@qlogic.com>
+M: Ariel Elior <Ariel.Elior@qlogic.com>
+M: everest-linux-l2@qlogic.com
+L: netdev@vger.kernel.org
+S: Supported
+F: drivers/net/ethernet/qlogic/qed/
+F: include/linux/qed/
+F: drivers/net/ethernet/qlogic/qede/
+
QNX4 FILESYSTEM
M: Anders Larsen <al@alarsen.net>
W: http://www.alarsen.net/linux/qnx4fs/
F: drivers/net/wireless/rtlwifi/
F: drivers/net/wireless/rtlwifi/rtl8192ce/
+RTL8XXXU WIRELESS DRIVER (rtl8xxxu)
+M: Jes Sorensen <Jes.Sorensen@redhat.com>
+L: linux-wireless@vger.kernel.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/jes/linux.git rtl8723au-mac80211
+S: Maintained
+F: drivers/net/wireless/realtek/rtl8xxxu/
+
S3 SAVAGE FRAMEBUFFER DRIVER
M: Antonino Daplas <adaplas@gmail.com>
L: linux-fbdev@vger.kernel.org
F: include/net/iucv/
F: net/iucv/
+S390 IOMMU (PCI)
+M: Gerald Schaefer <gerald.schaefer@de.ibm.com>
+L: linux-s390@vger.kernel.org
+W: http://www.ibm.com/developerworks/linux/linux390/
+S: Supported
+F: drivers/iommu/s390-iommu.c
+
S3C24XX SD/MMC Driver
M: Ben Dooks <ben-linux@fluff.org>
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
F: Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
F: drivers/net/ethernet/synopsys/dwc_eth_qos.c
+SYNOPSYS DESIGNWARE I2C DRIVER
+M: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
+M: Jarkko Nikula <jarkko.nikula@linux.intel.com>
+M: Mika Westerberg <mika.westerberg@linux.intel.com>
+L: linux-i2c@vger.kernel.org
+S: Maintained
+F: drivers/i2c/busses/i2c-designware-*
+F: include/linux/platform_data/i2c-designware.h
+
SYNOPSYS DESIGNWARE MMC/SD/SDIO DRIVER
M: Seungwon Jeon <tgih.jun@samsung.com>
M: Jaehoon Chung <jh80.chung@samsung.com>
F: include/linux/mmc/dw_mmc.h
F: drivers/mmc/host/dw_mmc*
+SYSTEM TRACE MODULE CLASS
+M: Alexander Shishkin <alexander.shishkin@linux.intel.com>
+S: Maintained
+F: Documentation/trace/stm.txt
+F: drivers/hwtracing/stm/
+F: include/linux/stm.h
+F: include/uapi/linux/stm.h
+
THUNDERBOLT DRIVER
M: Andreas Noever <andreas.noever@gmail.com>
S: Maintained
SERVER ENGINES 10Gbps iSCSI - BladeEngine 2 DRIVER
M: Jayamohan Kallickal <jayamohan.kallickal@avagotech.com>
-M: Minh Tran <minh.tran@avagotech.com>
-M: John Soni Jose <sony.john-n@avagotech.com>
+M: Ketan Mukadam <ketan.mukadam@avagotech.com>
+M: John Soni Jose <sony.john@avagotech.com>
L: linux-scsi@vger.kernel.org
W: http://www.avagotech.com
S: Supported
M: Hans de Goede <hdegoede@redhat.com>
L: linux-fbdev@vger.kernel.org
S: Maintained
-F: Documentation/devicetree/bindings/video/simple-framebuffer.txt
+F: Documentation/devicetree/bindings/display/simple-framebuffer.txt
F: drivers/video/fbdev/simplefb.c
F: include/linux/platform_data/simplefb.h
F: drivers/staging/lustre
STAGING - NVIDIA COMPLIANT EMBEDDED CONTROLLER INTERFACE (nvec)
-M: Julian Andres Klode <jak@jak-linux.org>
M: Marc Dietrich <marvin24@gmx.de>
L: ac100@lists.launchpad.net (moderated for non-subscribers)
L: linux-tegra@vger.kernel.org
STAGING - WILC1000 WIFI DRIVER
M: Johnny Kim <johnny.kim@atmel.com>
-M: Rachel Kim <rachel.kim@atmel.com>
-M: Dean Lee <dean.lee@atmel.com>
+M: Austin Shin <austin.shin@atmel.com>
M: Chris Park <chris.park@atmel.com>
+M: Tony Cho <tony.cho@atmel.com>
+M: Glen Lee <glen.lee@atmel.com>
+M: Leo Kim <leo.kim@atmel.com>
L: linux-wireless@vger.kernel.org
S: Supported
F: drivers/staging/wilc1000/
SYNOPSYS ARC ARCHITECTURE
M: Vineet Gupta <vgupta@synopsys.com>
+L: linux-snps-arc@lists.infraded.org
S: Supported
F: arch/arc/
F: Documentation/devicetree/bindings/arc/*
+F: Documentation/devicetree/bindings/interrupt-controller/snps,arc*
F: drivers/tty/serial/arc_uart.c
T: git git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc.git
S: Maintained
F: drivers/platform/x86/toshiba_haps.c
+TOSHIBA WMI HOTKEYS DRIVER
+M: Azael Avalos <coproscefalo@gmail.com>
+L: platform-driver-x86@vger.kernel.org
+S: Maintained
+F: drivers/platform/x86/toshiba-wmi.c
+
TOSHIBA SMM DRIVER
M: Jonathan Buzzard <jonathan@buzzard.org.uk>
W: http://www.buzzard.org.uk/toshiba/
TPM DEVICE DRIVER
M: Peter Huewe <peterhuewe@gmx.de>
M: Marcel Selhorst <tpmdd@selhorst.net>
+M: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
R: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
W: http://tpmdd.sourceforge.net
L: tpmdd-devel@lists.sourceforge.net (moderated for non-subscribers)
F: Documentation/fb/uvesafb.txt
F: drivers/video/fbdev/uvesafb.*
+VF610 NAND DRIVER
+M: Stefan Agner <stefan@agner.ch>
+L: linux-mtd@lists.infradead.org
+S: Supported
+F: drivers/mtd/nand/vf610_nfc.c
+
VFAT/FAT/MSDOS FILESYSTEM
M: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
S: Maintained
F: drivers/media/v4l2-core/videobuf2-*
F: include/media/videobuf2-*
+VIRTUAL SERIO DEVICE DRIVER
+M: Stephen Chandler Paul <thatslyude@gmail.com>
+S: Maintained
+F: drivers/input/serio/userio.c
+F: include/uapi/linux/userio.h
+
VIRTIO CONSOLE DRIVER
M: Amit Shah <amit.shah@redhat.com>
L: virtualization@lists.linux-foundation.org
S: Maintained
F: drivers/net/ethernet/via/via-velocity.*
+VIRT LIB
+M: Alex Williamson <alex.williamson@redhat.com>
+M: Paolo Bonzini <pbonzini@redhat.com>
+L: kvm@vger.kernel.org
+S: Supported
+F: virt/lib/
+
VIVID VIRTUAL VIDEO DRIVER
M: Hans Verkuil <hverkuil@xs4all.nl>
L: linux-media@vger.kernel.org
L: netdev@vger.kernel.org
S: Maintained
F: drivers/net/vrf.c
-F: include/net/vrf.h
F: Documentation/networking/vrf.txt
VT1211 HARDWARE MONITOR DRIVER
S: Maintained
F: drivers/net/wireless/wl3501*
-WM97XX TOUCHSCREEN DRIVERS
-M: Mark Brown <broonie@kernel.org>
-M: Liam Girdwood <lrg@slimlogic.co.uk>
-L: linux-input@vger.kernel.org
-W: https://github.com/CirrusLogic/linux-drivers/wiki
-S: Supported
-F: drivers/input/touchscreen/*wm97*
-F: include/linux/wm97xx.h
-
WOLFSON MICROELECTRONICS DRIVERS
L: patches@opensource.wolfsonmicro.com
T: git https://github.com/CirrusLogic/linux-drivers.git
W: https://github.com/CirrusLogic/linux-drivers/wiki
S: Supported
F: Documentation/hwmon/wm83??
+F: Documentation/devicetree/bindings/extcon/extcon-arizona.txt
+F: Documentation/devicetree/bindings/regulator/arizona-regulator.txt
+F: Documentation/devicetree/bindings/mfd/arizona.txt
F: arch/arm/mach-s3c64xx/mach-crag6410*
F: drivers/clk/clk-wm83*.c
F: drivers/extcon/extcon-arizona.c
T: git git://git.infradead.org/users/dvhart/linux-platform-drivers-x86.git
S: Maintained
F: drivers/platform/x86/
+F: drivers/platform/olpc/
X86 MCE INFRASTRUCTURE
M: Tony Luck <tony.luck@intel.com>
S: Maintained
F: arch/x86/kernel/cpu/mcheck/*
+X86 MICROCODE UPDATE SUPPORT
+M: Borislav Petkov <bp@alien8.de>
+S: Maintained
+F: arch/x86/kernel/cpu/microcode/*
+
X86 VDSO
M: Andy Lutomirski <luto@amacapital.net>
L: linux-kernel@vger.kernel.org
ZSMALLOC COMPRESSED SLAB MEMORY ALLOCATOR
M: Minchan Kim <minchan@kernel.org>
M: Nitin Gupta <ngupta@vflare.org>
+R: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
L: linux-mm@kvack.org
S: Maintained
F: mm/zsmalloc.c
* SOFTWARE.
*/
-#include <asm-generic/kmap_types.h>
+#include <linux/highmem.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/errno.h>
dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach;
dev->ib_dev.process_mad = mlx5_ib_process_mad;
dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
- dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
- dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
+ dev->ib_dev.map_mr_sg = mlx5_ib_map_mr_sg;
dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status;
dev->ib_dev.get_port_immutable = mlx5_port_immutable;
#include <linux/timer.h>
#include <linux/vmalloc.h>
#include <linux/etherdevice.h>
+#include <linux/net_tstamp.h>
#include <asm/io.h>
+ #include "t4_chip_type.h"
#include "cxgb4_uld.h"
#define CH_WARN(adap, fmt, ...) dev_warn(adap->pdev_dev, fmt, ## __VA_ARGS__)
unsigned char width;
};
- #define CHELSIO_CHIP_CODE(version, revision) (((version) << 4) | (revision))
- #define CHELSIO_CHIP_FPGA 0x100
- #define CHELSIO_CHIP_VERSION(code) (((code) >> 4) & 0xf)
- #define CHELSIO_CHIP_RELEASE(code) ((code) & 0xf)
-
- #define CHELSIO_T4 0x4
- #define CHELSIO_T5 0x5
- #define CHELSIO_T6 0x6
-
- enum chip_type {
- T4_A1 = CHELSIO_CHIP_CODE(CHELSIO_T4, 1),
- T4_A2 = CHELSIO_CHIP_CODE(CHELSIO_T4, 2),
- T4_FIRST_REV = T4_A1,
- T4_LAST_REV = T4_A2,
-
- T5_A0 = CHELSIO_CHIP_CODE(CHELSIO_T5, 0),
- T5_A1 = CHELSIO_CHIP_CODE(CHELSIO_T5, 1),
- T5_FIRST_REV = T5_A0,
- T5_LAST_REV = T5_A1,
-
- T6_A0 = CHELSIO_CHIP_CODE(CHELSIO_T6, 0),
- T6_FIRST_REV = T6_A0,
- T6_LAST_REV = T6_A0,
- };
-
struct devlog_params {
u32 memtype; /* which memory (EDC0, EDC1, MC) */
u32 start; /* start of log in firmware memory */
#ifdef CONFIG_CHELSIO_T4_FCOE
struct cxgb_fcoe fcoe;
#endif /* CONFIG_CHELSIO_T4_FCOE */
+ bool rxtstamp; /* Enable TS */
+ struct hwtstamp_config tstamp_config;
};
struct dentry;
/* A packet gather list */
struct pkt_gl {
+ u64 sgetstamp; /* SGE Time Stamp for Ingress Packet */
struct page_frag frags[MAX_SKB_FRAGS];
void *va; /* virtual address of first byte */
unsigned int nfrags; /* # of fragments */
bool tid_release_task_busy;
struct dentry *debugfs_root;
- u32 use_bd; /* Use SGE Back Door intfc for reading SGE Contexts */
- u32 trace_rss; /* 1 implies that different RSS flit per filter is
+ bool use_bd; /* Use SGE Back Door intfc for reading SGE Contexts */
+ bool trace_rss; /* 1 implies that different RSS flit per filter is
* used per filter else if 0 default RSS flit is
* used for all 4 filters.
*/
return adap->params.offload;
}
- static inline int is_t6(enum chip_type chip)
- {
- return CHELSIO_CHIP_VERSION(chip) == CHELSIO_T6;
- }
-
- static inline int is_t5(enum chip_type chip)
- {
- return CHELSIO_CHIP_VERSION(chip) == CHELSIO_T5;
- }
-
- static inline int is_t4(enum chip_type chip)
- {
- return CHELSIO_CHIP_VERSION(chip) == CHELSIO_T4;
- }
-
static inline u32 t4_read_reg(struct adapter *adap, u32 reg_addr)
{
return readl(adap->regs + reg_addr);
#endif
#define DRV_VERSION "2.0.0-ko"
const char cxgb4_driver_version[] = DRV_VERSION;
-#define DRV_DESC "Chelsio T4/T5 Network Driver"
+#define DRV_DESC "Chelsio T4/T5/T6 Network Driver"
/* Host shadow copy of ingress filter entry. This is in host native format
* and doesn't match the ordering or bit order, etc. of the hardware of the
MODULE_DEVICE_TABLE(pci, cxgb4_pci_tbl);
MODULE_FIRMWARE(FW4_FNAME);
MODULE_FIRMWARE(FW5_FNAME);
+MODULE_FIRMWARE(FW6_FNAME);
/*
* Normally we're willing to become the firmware's Master PF but will be happy
else {
static const char *fc[] = { "no", "Rx", "Tx", "Tx/Rx" };
- const char *s = "10Mbps";
+ const char *s;
const struct port_info *p = netdev_priv(dev);
switch (p->link_cfg.speed) {
case 40000:
s = "40Gbps";
break;
+ default:
+ pr_info("%s: unsupported speed: %d\n",
+ dev->name, p->link_cfg.speed);
+ return;
}
netdev_info(dev, "link up, %s, full-duplex, %s PAUSE\n", s,
}
EXPORT_SYMBOL(cxgb4_best_aligned_mtu);
+ /**
+ * cxgb4_tp_smt_idx - Get the Source Mac Table index for this VI
+ * @chip: chip type
+ * @viid: VI id of the given port
+ *
+ * Return the SMT index for this VI.
+ */
+ unsigned int cxgb4_tp_smt_idx(enum chip_type chip, unsigned int viid)
+ {
+ /* In T4/T5, SMT contains 256 SMAC entries organized in
+ * 128 rows of 2 entries each.
+ * In T6, SMT contains 256 SMAC entries in 256 rows.
+ * TODO: The below code needs to be updated when we add support
+ * for 256 VFs.
+ */
+ if (CHELSIO_CHIP_VERSION(chip) <= CHELSIO_T5)
+ return ((viid & 0x7f) << 1);
+ else
+ return (viid & 0x7f);
+ }
+ EXPORT_SYMBOL(cxgb4_tp_smt_idx);
+
/**
* cxgb4_port_chan - get the HW channel of a port
* @dev: the net device for the port
ret = t4_mdio_wr(pi->adapter, mbox, prtad, devad,
data->reg_num, data->val_in);
break;
+ case SIOCGHWTSTAMP:
+ return copy_to_user(req->ifr_data, &pi->tstamp_config,
+ sizeof(pi->tstamp_config)) ?
+ -EFAULT : 0;
+ case SIOCSHWTSTAMP:
+ if (copy_from_user(&pi->tstamp_config, req->ifr_data,
+ sizeof(pi->tstamp_config)))
+ return -EFAULT;
+
+ switch (pi->tstamp_config.rx_filter) {
+ case HWTSTAMP_FILTER_NONE:
+ pi->rxtstamp = false;
+ break;
+ case HWTSTAMP_FILTER_ALL:
+ pi->rxtstamp = true;
+ break;
+ default:
+ pi->tstamp_config.rx_filter = HWTSTAMP_FILTER_NONE;
+ return -ERANGE;
+ }
+
+ return copy_to_user(req->ifr_data, &pi->tstamp_config,
+ sizeof(pi->tstamp_config)) ?
+ -EFAULT : 0;
default:
return -EOPNOTSUPP;
}
t4_get_tp_version(adap, &adap->params.tp_vers);
ret = t4_check_fw_version(adap);
/* If firmware is too old (not supported by driver) force an update. */
- if (ret == -EFAULT)
+ if (ret)
state = DEV_STATE_UNINIT;
if ((adap->flags & MASTER_PF) && state != DEV_STATE_INIT) {
struct fw_info *fw_info;
}
for (i = 0; i < allocated; ++i)
adap->msix_info[i].vec = entries[i].vector;
+ dev_info(adap->pdev_dev, "%d MSI-X vectors allocated, "
+ "nic %d iscsi %d rdma cpl %d rdma ciq %d\n",
+ allocated, s->max_ethqsets, s->ofldqsets, s->rdmaqs,
+ s->rdmaciqs);
kfree(entries);
return 0;
[27] = "Port beacon support",
[28] = "RX-ALL support",
[29] = "802.1ad offload support",
+ [31] = "Modifying loopback source checks using UPDATE_QP support",
+ [32] = "Loopback source checks support",
};
int i;
MLX4_GET(field32, outbox, QUERY_DEV_CAP_EXT_2_FLAGS_OFFSET);
if (field32 & (1 << 16))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP;
+ if (field32 & (1 << 18))
+ dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB;
+ if (field32 & (1 << 19))
+ dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_LB_SRC_CHK;
if (field32 & (1 << 26))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_VLAN_CONTROL;
if (field32 & (1 << 20))
return -EOPNOTSUPP;
}
EXPORT_SYMBOL(set_phv_bit);
+
+void mlx4_replace_zero_macs(struct mlx4_dev *dev)
+{
+ int i;
+ u8 mac_addr[ETH_ALEN];
+
+ dev->port_random_macs = 0;
+ for (i = 1; i <= dev->caps.num_ports; ++i)
+ if (!dev->caps.def_mac[i] &&
+ dev->caps.port_type[i] == MLX4_PORT_TYPE_ETH) {
+ eth_random_addr(mac_addr);
+ dev->port_random_macs |= 1 << i;
+ dev->caps.def_mac[i] = mlx4_mac_to_u64(mac_addr);
+ }
+}
+EXPORT_SYMBOL_GPL(mlx4_replace_zero_macs);
u64 qp_mask = 0;
int err = 0;
+ if (!attr || (attr & ~MLX4_UPDATE_QP_SUPPORTED_ATTRS))
+ return -EINVAL;
+
mailbox = mlx4_alloc_cmd_mailbox(dev);
if (IS_ERR(mailbox))
return PTR_ERR(mailbox);
cmd = (struct mlx4_update_qp_context *)mailbox->buf;
- if (!attr || (attr & ~MLX4_UPDATE_QP_SUPPORTED_ATTRS))
- return -EINVAL;
-
if (attr & MLX4_UPDATE_QP_SMAC) {
pri_addr_path_mask |= 1ULL << MLX4_UPD_QP_PATH_MASK_MAC_INDEX;
cmd->qp_context.pri_path.grh_mylmc = params->smac_index;
}
+ if (attr & MLX4_UPDATE_QP_ETH_SRC_CHECK_MC_LB) {
+ if (!(dev->caps.flags2
+ & MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB)) {
+ mlx4_warn(dev,
+ "Trying to set src check LB, but it isn't supported\n");
+ err = -ENOTSUPP;
+ goto out;
+ }
+ pri_addr_path_mask |=
+ 1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB;
+ if (params->flags &
+ MLX4_UPDATE_QP_PARAMS_FLAGS_ETH_CHECK_MC_LB) {
+ cmd->qp_context.pri_path.fl |=
+ MLX4_FL_ETH_SRC_CHECK_MC_LB;
+ }
+ }
+
if (attr & MLX4_UPDATE_QP_VSD) {
qp_mask |= 1ULL << MLX4_UPD_QP_MASK_VSD;
if (params->flags & MLX4_UPDATE_QP_PARAMS_FLAGS_VSD_ENABLE)
err = mlx4_cmd(dev, mailbox->dma, qpn & 0xffffff, 0,
MLX4_CMD_UPDATE_QP, MLX4_CMD_TIME_CLASS_A,
MLX4_CMD_NATIVE);
-
+ out:
mlx4_free_cmd_mailbox(dev, mailbox);
return err;
}
}
}
+ /* preserve IF_COUNTER flag */
+ qpc->pri_path.vlan_control &=
+ MLX4_CTRL_ETH_SRC_CHECK_IF_COUNTER;
if (vp_oper->state.link_state == IFLA_VF_LINK_STATE_DISABLE &&
dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_UPDATE_QP) {
- qpc->pri_path.vlan_control =
+ qpc->pri_path.vlan_control |=
MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
MLX4_VLAN_CTRL_ETH_TX_BLOCK_PRIO_TAGGED |
MLX4_VLAN_CTRL_ETH_TX_BLOCK_UNTAGGED |
MLX4_VLAN_CTRL_ETH_RX_BLOCK_UNTAGGED |
MLX4_VLAN_CTRL_ETH_RX_BLOCK_TAGGED;
} else if (0 != vp_oper->state.default_vlan) {
- qpc->pri_path.vlan_control =
+ qpc->pri_path.vlan_control |=
MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
MLX4_VLAN_CTRL_ETH_RX_BLOCK_PRIO_TAGGED |
MLX4_VLAN_CTRL_ETH_RX_BLOCK_UNTAGGED;
} else { /* priority tagged */
- qpc->pri_path.vlan_control =
+ qpc->pri_path.vlan_control |=
MLX4_VLAN_CTRL_ETH_TX_BLOCK_TAGGED |
MLX4_VLAN_CTRL_ETH_RX_BLOCK_TAGGED;
}
return 0;
undo:
- for (--i; i >= base; --i)
+ for (--i; i >= 0; --i) {
rb_erase(&res_arr[i]->node, root);
+ list_del_init(&res_arr[i]->list);
+ }
spin_unlock_irq(mlx4_tlock(dev));
update_gid(dev, inbox, (u8)slave);
adjust_proxy_tun_qkey(dev, vhcr, qpc);
orig_sched_queue = qpc->pri_path.sched_queue;
- err = update_vport_qp_param(dev, inbox, slave, qpn);
- if (err)
- return err;
err = get_res(dev, slave, qpn, RES_QP, &qp);
if (err)
goto out;
}
+ err = update_vport_qp_param(dev, inbox, slave, qpn);
+ if (err)
+ goto out;
+
err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
out:
/* if no error, save sched queue value passed in by VF. This is
}
- #define MLX4_UPD_QP_PATH_MASK_SUPPORTED (1ULL << MLX4_UPD_QP_PATH_MASK_MAC_INDEX)
+ #define MLX4_UPD_QP_PATH_MASK_SUPPORTED ( \
+ 1ULL << MLX4_UPD_QP_PATH_MASK_MAC_INDEX |\
+ 1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB)
int mlx4_UPDATE_QP_wrapper(struct mlx4_dev *dev, int slave,
struct mlx4_vhcr *vhcr,
struct mlx4_cmd_mailbox *inbox,
(pri_addr_path_mask & ~MLX4_UPD_QP_PATH_MASK_SUPPORTED))
return -EPERM;
+ if ((pri_addr_path_mask &
+ (1ULL << MLX4_UPD_QP_PATH_MASK_ETH_SRC_CHECK_MC_LB)) &&
+ !(dev->caps.flags2 &
+ MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB)) {
+ mlx4_warn(dev,
+ "Src check LB for slave %d isn't supported\n",
+ slave);
+ return -ENOTSUPP;
+ }
+
/* Just change the smac for the QP */
err = get_res(dev, slave, qpn, RES_QP, &rqp);
if (err) {
#define IBLND_N_SCHED_HIGH 4
typedef struct {
- int *kib_dev_failover; /* HCA failover */
- unsigned int *kib_service; /* IB service number */
- int *kib_min_reconnect_interval; /* first failed connection
- * retry... */
- int *kib_max_reconnect_interval; /* ...exponentially increasing
- * to this */
- int *kib_cksum; /* checksum kib_msg_t? */
- int *kib_timeout; /* comms timeout (seconds) */
- int *kib_keepalive; /* keepalive timeout (seconds) */
- int *kib_ntx; /* # tx descs */
- int *kib_credits; /* # concurrent sends */
- int *kib_peertxcredits; /* # concurrent sends to 1 peer */
- int *kib_peerrtrcredits; /* # per-peer router buffer
- * credits */
- int *kib_peercredits_hiw; /* # when eagerly to return
- * credits */
- int *kib_peertimeout; /* seconds to consider peer dead */
- char **kib_default_ipif; /* default IPoIB interface */
- int *kib_retry_count;
- int *kib_rnr_retry_count;
- int *kib_concurrent_sends; /* send work queue sizing */
- int *kib_ib_mtu; /* IB MTU */
- int *kib_map_on_demand; /* map-on-demand if RD has more
- * fragments than this value, 0
- * disable map-on-demand */
- int *kib_fmr_pool_size; /* # FMRs in pool */
- int *kib_fmr_flush_trigger; /* When to trigger FMR flush */
- int *kib_fmr_cache; /* enable FMR pool cache? */
- int *kib_require_priv_port; /* accept only privileged ports */
- int *kib_use_priv_port; /* use privileged port for active
- * connect */
- int *kib_nscheds; /* # threads on each CPT */
+ int *kib_dev_failover; /* HCA failover */
+ unsigned int *kib_service; /* IB service number */
+ int *kib_min_reconnect_interval; /* first failed connection retry... */
+ int *kib_max_reconnect_interval; /* exponentially increasing to this */
+ int *kib_cksum; /* checksum kib_msg_t? */
+ int *kib_timeout; /* comms timeout (seconds) */
+ int *kib_keepalive; /* keepalive timeout (seconds) */
+ int *kib_ntx; /* # tx descs */
+ int *kib_credits; /* # concurrent sends */
+ int *kib_peertxcredits; /* # concurrent sends to 1 peer */
+ int *kib_peerrtrcredits; /* # per-peer router buffer credits */
+ int *kib_peercredits_hiw; /* # when eagerly to return credits */
+ int *kib_peertimeout; /* seconds to consider peer dead */
+ char **kib_default_ipif; /* default IPoIB interface */
+ int *kib_retry_count;
+ int *kib_rnr_retry_count;
+ int *kib_concurrent_sends; /* send work queue sizing */
+ int *kib_ib_mtu; /* IB MTU */
+ int *kib_map_on_demand; /* map-on-demand if RD has more */
+ /* fragments than this value, 0 */
+ /* disable map-on-demand */
+ int *kib_fmr_pool_size; /* # FMRs in pool */
+ int *kib_fmr_flush_trigger; /* When to trigger FMR flush */
+ int *kib_fmr_cache; /* enable FMR pool cache? */
+ int *kib_require_priv_port; /* accept only privileged ports */
+ int *kib_use_priv_port; /* use privileged port for active connect */
+ int *kib_nscheds; /* # threads on each CPT */
} kib_tunables_t;
extern kib_tunables_t kiblnd_tunables;
IBLND_CREDIT_HIGHWATER_V1 : \
*kiblnd_tunables.kib_peercredits_hiw) /* when eagerly to return credits */
- #define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(cb, dev, ps, qpt)
+ #define kiblnd_rdma_create_id(cb, dev, ps, qpt) rdma_create_id(&init_net, \
+ cb, dev, \
+ ps, qpt)
static inline int
kiblnd_concurrent_sends_v1(void)
unsigned long ibd_next_failover;
int ibd_failed_failover; /* # failover failures */
unsigned int ibd_failover; /* failover in progress */
- unsigned int ibd_can_failover; /* IPoIB interface is a bonding
- * master */
+ unsigned int ibd_can_failover; /* IPoIB interface is a bonding master */
struct list_head ibd_nets;
struct kib_hca_dev *ibd_hdev;
} kib_dev_t;
char ps_name[IBLND_POOL_NAME_LEN]; /* pool set name */
struct list_head ps_pool_list; /* list of pools */
struct list_head ps_failed_pool_list;/* failed pool list */
- unsigned long ps_next_retry; /* time stamp for retry if
- * failed to allocate */
+ unsigned long ps_next_retry; /* time stamp for retry if */
+ /* failed to allocate */
int ps_increasing; /* is allocating new pool */
int ps_pool_size; /* new pool size */
int ps_cpt; /* CPT id */
kib_ps_pool_create_t ps_pool_create; /* create a new pool */
kib_ps_pool_destroy_t ps_pool_destroy; /* destroy a pool */
- kib_ps_node_init_t ps_node_init; /* initialize new allocated
- * node */
+ kib_ps_node_init_t ps_node_init; /* initialize new allocated node */
kib_ps_node_fini_t ps_node_fini; /* finalize node */
} kib_poolset_t;
typedef struct kib_pool {
- struct list_head po_list; /* chain on pool list */
- struct list_head po_free_list; /* pre-allocated node */
- kib_poolset_t *po_owner; /* pool_set of this pool */
- unsigned long po_deadline; /* deadline of this pool */
- int po_allocated; /* # of elements in use */
- int po_failed; /* pool is created on failed
- * HCA */
- int po_size; /* # of pre-allocated elements */
+ struct list_head po_list; /* chain on pool list */
+ struct list_head po_free_list; /* pre-allocated node */
+ kib_poolset_t *po_owner; /* pool_set of this pool */
+ unsigned long po_deadline; /* deadline of this pool */
+ int po_allocated; /* # of elements in use */
+ int po_failed; /* pool is created on failed HCA */
+ int po_size; /* # of pre-allocated elements */
} kib_pool_t;
typedef struct {
int fps_pool_size;
int fps_flush_trigger;
int fps_increasing; /* is allocating new pool */
- unsigned long fps_next_retry; /* time stamp for retry if
- * failed to allocate */
+ unsigned long fps_next_retry; /* time stamp for retry if*/
+ /* failed to allocate */
} kib_fmr_poolset_t;
typedef struct {
};
typedef struct {
- int kib_init; /* initialisation state */
- int kib_shutdown; /* shut down? */
- struct list_head kib_devs; /* IB devices extant */
- struct list_head kib_failed_devs; /* list head of failed
- * devices */
- wait_queue_head_t kib_failover_waitq; /* schedulers sleep here */
- atomic_t kib_nthreads; /* # live threads */
- rwlock_t kib_global_lock; /* stabilize net/dev/peer/conn
- * ops */
- struct list_head *kib_peers; /* hash table of all my known
- * peers */
- int kib_peer_hash_size; /* size of kib_peers */
- void *kib_connd; /* the connd task
- * (serialisation assertions)
- */
- struct list_head kib_connd_conns; /* connections to
- * setup/teardown */
- struct list_head kib_connd_zombies; /* connections with zero
- * refcount */
- wait_queue_head_t kib_connd_waitq; /* connection daemon sleeps
- * here */
- spinlock_t kib_connd_lock; /* serialise */
- struct ib_qp_attr kib_error_qpa; /* QP->ERROR */
- struct kib_sched_info **kib_scheds; /* percpt data for schedulers
- */
+ int kib_init; /* initialisation state */
+ int kib_shutdown; /* shut down? */
+ struct list_head kib_devs; /* IB devices extant */
+ struct list_head kib_failed_devs; /* list head of failed devices */
+ wait_queue_head_t kib_failover_waitq; /* schedulers sleep here */
+ atomic_t kib_nthreads; /* # live threads */
+ rwlock_t kib_global_lock; /* stabilize net/dev/peer/conn ops */
+ struct list_head *kib_peers; /* hash table of all my known peers */
+ int kib_peer_hash_size; /* size of kib_peers */
+ void *kib_connd; /* the connd task (serialisation assertions) */
+ struct list_head kib_connd_conns; /* connections to setup/teardown */
+ struct list_head kib_connd_zombies; /* connections with zero refcount */
+ wait_queue_head_t kib_connd_waitq; /* connection daemon sleeps here */
+ spinlock_t kib_connd_lock; /* serialise */
+ struct ib_qp_attr kib_error_qpa; /* QP->ERROR */
+ struct kib_sched_info **kib_scheds; /* percpt data for schedulers */
} kib_data_t;
#define IBLND_INIT_NOTHING 0
#define IBLND_REJECT_FATAL 3 /* Anything else */
#define IBLND_REJECT_CONN_UNCOMPAT 4 /* incompatible version peer */
#define IBLND_REJECT_CONN_STALE 5 /* stale peer */
-#define IBLND_REJECT_RDMA_FRAGS 6 /* Fatal: peer's rdma frags can't match
- * mine */
-#define IBLND_REJECT_MSG_QUEUE_SIZE 7 /* Fatal: peer's msg queue size can't
- * match mine */
+#define IBLND_REJECT_RDMA_FRAGS 6 /* Fatal: peer's rdma frags can't match */
+ /* mine */
+#define IBLND_REJECT_MSG_QUEUE_SIZE 7 /* Fatal: peer's msg queue size can't */
+ /* match mine */
/***********************************************************************/
{
struct list_head rx_list; /* queue for attention */
struct kib_conn *rx_conn; /* owning conn */
- int rx_nob; /* # bytes received (-1 while
- * posted) */
+ int rx_nob; /* # bytes received (-1 while posted) */
enum ib_wc_status rx_status; /* completion status */
kib_msg_t *rx_msg; /* message buffer (host vaddr) */
__u64 rx_msgaddr; /* message buffer (I/O addr) */
struct ib_sge rx_sge; /* ...and its memory */
} kib_rx_t;
-#define IBLND_POSTRX_DONT_POST 0 /* don't post */
-#define IBLND_POSTRX_NO_CREDIT 1 /* post: no credits */
-#define IBLND_POSTRX_PEER_CREDIT 2 /* post: give peer back 1 credit */
-#define IBLND_POSTRX_RSRVD_CREDIT 3 /* post: give myself back 1 reserved
- * credit */
+#define IBLND_POSTRX_DONT_POST 0 /* don't post */
+#define IBLND_POSTRX_NO_CREDIT 1 /* post: no credits */
+#define IBLND_POSTRX_PEER_CREDIT 2 /* post: give peer back 1 credit */
+#define IBLND_POSTRX_RSRVD_CREDIT 3 /* post: give self back 1 reserved credit */
typedef struct kib_tx /* transmit message */
{
- struct list_head tx_list; /* queue on idle_txs ibc_tx_queue
- * etc. */
- kib_tx_pool_t *tx_pool; /* pool I'm from */
- struct kib_conn *tx_conn; /* owning conn */
- short tx_sending; /* # tx callbacks outstanding */
- short tx_queued; /* queued for sending */
- short tx_waiting; /* waiting for peer */
- int tx_status; /* LNET completion status */
- unsigned long tx_deadline; /* completion deadline */
- __u64 tx_cookie; /* completion cookie */
- lnet_msg_t *tx_lntmsg[2]; /* lnet msgs to finalize on
- * completion */
- kib_msg_t *tx_msg; /* message buffer (host vaddr) */
- __u64 tx_msgaddr; /* message buffer (I/O addr) */
+ struct list_head tx_list; /* queue on idle_txs ibc_tx_queue etc. */
+ kib_tx_pool_t *tx_pool; /* pool I'm from */
+ struct kib_conn *tx_conn; /* owning conn */
+ short tx_sending; /* # tx callbacks outstanding */
+ short tx_queued; /* queued for sending */
+ short tx_waiting; /* waiting for peer */
+ int tx_status; /* LNET completion status */
+ unsigned long tx_deadline; /* completion deadline */
+ __u64 tx_cookie; /* completion cookie */
+ lnet_msg_t *tx_lntmsg[2]; /* lnet msgs to finalize on completion */
+ kib_msg_t *tx_msg; /* message buffer (host vaddr) */
+ __u64 tx_msgaddr; /* message buffer (I/O addr) */
DECLARE_PCI_UNMAP_ADDR(tx_msgunmap); /* for dma_unmap_single() */
- int tx_nwrq; /* # send work items */
- struct ib_rdma_wr *tx_wrq; /* send work items... */
- struct ib_sge *tx_sge; /* ...and their memory */
- kib_rdma_desc_t *tx_rd; /* rdma descriptor */
- int tx_nfrags; /* # entries in... */
- struct scatterlist *tx_frags; /* dma_map_sg descriptor */
- __u64 *tx_pages; /* rdma phys page addrs */
- kib_fmr_t fmr; /* FMR */
- int tx_dmadir; /* dma direction */
+ int tx_nwrq; /* # send work items */
- struct ib_send_wr *tx_wrq; /* send work items... */
++ struct ib_rdma_wr *tx_wrq; /* send work items... */
+ struct ib_sge *tx_sge; /* ...and their memory */
+ kib_rdma_desc_t *tx_rd; /* rdma descriptor */
+ int tx_nfrags; /* # entries in... */
+ struct scatterlist *tx_frags; /* dma_map_sg descriptor */
+ __u64 *tx_pages; /* rdma phys page addrs */
+ kib_fmr_t fmr; /* FMR */
+ int tx_dmadir; /* dma direction */
} kib_tx_t;
typedef struct kib_connvars {
} kib_connvars_t;
typedef struct kib_conn {
- struct kib_sched_info *ibc_sched; /* scheduler information */
- struct kib_peer *ibc_peer; /* owning peer */
- kib_hca_dev_t *ibc_hdev; /* HCA bound on */
- struct list_head ibc_list; /* stash on peer's conn
- * list */
- struct list_head ibc_sched_list; /* schedule for attention */
- __u16 ibc_version; /* version of connection */
- __u64 ibc_incarnation; /* which instance of the
- * peer */
- atomic_t ibc_refcount; /* # users */
- int ibc_state; /* what's happening */
- int ibc_nsends_posted; /* # uncompleted sends */
- int ibc_noops_posted; /* # uncompleted NOOPs */
- int ibc_credits; /* # credits I have */
+ struct kib_sched_info *ibc_sched; /* scheduler information */
+ struct kib_peer *ibc_peer; /* owning peer */
+ kib_hca_dev_t *ibc_hdev; /* HCA bound on */
+ struct list_head ibc_list; /* stash on peer's conn list */
+ struct list_head ibc_sched_list; /* schedule for attention */
+ __u16 ibc_version; /* version of connection */
+ __u64 ibc_incarnation; /* which instance of the peer */
+ atomic_t ibc_refcount; /* # users */
+ int ibc_state; /* what's happening */
+ int ibc_nsends_posted; /* # uncompleted sends */
+ int ibc_noops_posted; /* # uncompleted NOOPs */
+ int ibc_credits; /* # credits I have */
int ibc_outstanding_credits; /* # credits to return */
int ibc_reserved_credits; /* # ACK/DONE msg credits */
- int ibc_comms_error; /* set on comms error */
- unsigned int ibc_nrx:16; /* receive buffers owned */
- unsigned int ibc_scheduled:1; /* scheduled for attention
- */
- unsigned int ibc_ready:1; /* CQ callback fired */
- unsigned long ibc_last_send; /* time of last send */
- struct list_head ibc_connd_list; /* link chain for
- * kiblnd_check_conns only
- */
- struct list_head ibc_early_rxs; /* rxs completed before
- * ESTABLISHED */
- struct list_head ibc_tx_noops; /* IBLND_MSG_NOOPs for
- * IBLND_MSG_VERSION_1 */
- struct list_head ibc_tx_queue; /* sends that need a credit
- */
- struct list_head ibc_tx_queue_nocred; /* sends that don't need a
- * credit */
- struct list_head ibc_tx_queue_rsrvd; /* sends that need to
- * reserve an ACK/DONE msg
- */
- struct list_head ibc_active_txs; /* active tx awaiting
- * completion */
- spinlock_t ibc_lock; /* serialise */
- kib_rx_t *ibc_rxs; /* the rx descs */
- kib_pages_t *ibc_rx_pages; /* premapped rx msg pages */
-
- struct rdma_cm_id *ibc_cmid; /* CM id */
- struct ib_cq *ibc_cq; /* completion queue */
-
- kib_connvars_t *ibc_connvars; /* in-progress connection
- * state */
+ int ibc_comms_error; /* set on comms error */
+ unsigned int ibc_nrx:16; /* receive buffers owned */
+ unsigned int ibc_scheduled:1; /* scheduled for attention */
+ unsigned int ibc_ready:1; /* CQ callback fired */
+ unsigned long ibc_last_send; /* time of last send */
+ struct list_head ibc_connd_list; /* link chain for */
+ /* kiblnd_check_conns only */
+ struct list_head ibc_early_rxs; /* rxs completed before ESTABLISHED */
+ struct list_head ibc_tx_noops; /* IBLND_MSG_NOOPs for */
+ /* IBLND_MSG_VERSION_1 */
+ struct list_head ibc_tx_queue; /* sends that need a credit */
+ struct list_head ibc_tx_queue_nocred; /* sends that don't need a */
+ /* credit */
+ struct list_head ibc_tx_queue_rsrvd; /* sends that need to */
+ /* reserve an ACK/DONE msg */
+ struct list_head ibc_active_txs; /* active tx awaiting completion */
+ spinlock_t ibc_lock; /* serialise */
+ kib_rx_t *ibc_rxs; /* the rx descs */
+ kib_pages_t *ibc_rx_pages; /* premapped rx msg pages */
+
+ struct rdma_cm_id *ibc_cmid; /* CM id */
+ struct ib_cq *ibc_cq; /* completion queue */
+
+ kib_connvars_t *ibc_connvars; /* in-progress connection state */
} kib_conn_t;
#define IBLND_CONN_INIT 0 /* being initialised */
return NULL;
}
-/* CAVEAT EMPTOR: We rely on descriptor alignment to allow us to use the
- * lowest bits of the work request id to stash the work item type. */
+/* CAVEAT EMPTOR: We rely on descriptor alignment to allow us to use the */
+/* lowest bits of the work request id to stash the work item type. */
#define IBLND_WID_TX 0
#define IBLND_WID_RDMA 1
offsetof(kib_putack_msg_t, ibpam_rd.rd_frags[n]);
}
-
static inline __u64
kiblnd_dma_mapping_error(struct ib_device *dev, u64 dma_addr)
{
return ib_sg_dma_len(dev, sg);
}
-/* XXX We use KIBLND_CONN_PARAM(e) as writable buffer, it's not strictly
- * right because OFED1.2 defines it as const, to use it we have to add
- * (void *) cast to overcome "const" */
+/* XXX We use KIBLND_CONN_PARAM(e) as writable buffer, it's not strictly */
+/* right because OFED1.2 defines it as const, to use it we have to add */
+/* (void *) cast to overcome "const" */
#define KIBLND_CONN_PARAM(e) ((e)->param.conn.private_data)
#define KIBLND_CONN_PARAM_LEN(e) ((e)->param.conn.private_data_len)
-
struct ib_mr *kiblnd_find_rd_dma_mr(kib_hca_dev_t *hdev,
kib_rdma_desc_t *rd);
struct ib_mr *kiblnd_find_dma_mr(kib_hca_dev_t *hdev,
__u64 addr, __u64 size);
void kiblnd_map_rx_descs(kib_conn_t *conn);
void kiblnd_unmap_rx_descs(kib_conn_t *conn);
-int kiblnd_map_tx(lnet_ni_t *ni, kib_tx_t *tx,
- kib_rdma_desc_t *rd, int nfrags);
-void kiblnd_unmap_tx(lnet_ni_t *ni, kib_tx_t *tx);
void kiblnd_pool_free_node(kib_pool_t *pool, struct list_head *node);
struct list_head *kiblnd_pool_alloc_node(kib_poolset_t *ps);
#include "o2iblnd.h"
+static void kiblnd_unmap_tx(lnet_ni_t *ni, kib_tx_t *tx);
+
static void
kiblnd_tx_done(lnet_ni_t *ni, kib_tx_t *tx)
{
rx->rx_nob = -1; /* flag posted */
+ /* NB: need an extra reference after ib_post_recv because we don't
+ * own this rx (and rx::rx_conn) anymore, LU-5678.
+ */
+ kiblnd_conn_addref(conn);
rc = ib_post_recv(conn->ibc_cmid->qp, &rx->rx_wrq, &bad_wrq);
- if (rc != 0) {
+ if (unlikely(rc != 0)) {
CERROR("Can't post rx for %s: %d, bad_wrq: %p\n",
libcfs_nid2str(conn->ibc_peer->ibp_nid), rc, bad_wrq);
rx->rx_nob = 0;
}
if (conn->ibc_state < IBLND_CONN_ESTABLISHED) /* Initial post */
- return rc;
+ goto out;
- if (rc != 0) {
+ if (unlikely(rc != 0)) {
kiblnd_close_conn(conn, rc);
kiblnd_drop_rx(rx); /* No more posts for this rx */
- return rc;
+ goto out;
}
if (credit == IBLND_POSTRX_NO_CREDIT)
- return 0;
+ goto out;
spin_lock(&conn->ibc_lock);
if (credit == IBLND_POSTRX_PEER_CREDIT)
spin_unlock(&conn->ibc_lock);
kiblnd_check_sends(conn);
- return 0;
+out:
+ kiblnd_conn_decref(conn);
+ return rc;
}
static kib_tx_t *
}
if (tx->tx_status == 0) { /* success so far */
- if (status < 0) { /* failed? */
+ if (status < 0) /* failed? */
tx->tx_status = status;
- } else if (txtype == IBLND_MSG_GET_REQ) {
+ else if (txtype == IBLND_MSG_GET_REQ)
lnet_set_reply_msg_len(ni, tx->tx_lntmsg[1], status);
- }
}
tx->tx_waiting = 0;
return 0;
}
-void
-kiblnd_unmap_tx(lnet_ni_t *ni, kib_tx_t *tx)
+static void kiblnd_unmap_tx(lnet_ni_t *ni, kib_tx_t *tx)
{
kib_net_t *net = ni->ni_data;
}
}
-int
-kiblnd_map_tx(lnet_ni_t *ni, kib_tx_t *tx,
- kib_rdma_desc_t *rd, int nfrags)
+static int kiblnd_map_tx(lnet_ni_t *ni, kib_tx_t *tx, kib_rdma_desc_t *rd,
+ int nfrags)
{
kib_hca_dev_t *hdev = tx->tx_pool->tpo_hdev;
kib_net_t *net = ni->ni_data;
return -EINVAL;
}
-
static int
kiblnd_setup_rd_iov(lnet_ni_t *ni, kib_tx_t *tx, kib_rdma_desc_t *rd,
unsigned int niov, struct kvec *iov, int offset, int nob)
/* close_conn will launch failover */
rc = -ENETDOWN;
} else {
- rc = ib_post_send(conn->ibc_cmid->qp, tx->tx_wrq, &bad_wrq);
+ rc = ib_post_send(conn->ibc_cmid->qp, &tx->tx_wrq->wr, &bad_wrq);
}
conn->ibc_last_send = jiffies;
{
kib_hca_dev_t *hdev = tx->tx_pool->tpo_hdev;
struct ib_sge *sge = &tx->tx_sge[tx->tx_nwrq];
- struct ib_send_wr *wrq = &tx->tx_wrq[tx->tx_nwrq];
+ struct ib_rdma_wr *wrq = &tx->tx_wrq[tx->tx_nwrq];
int nob = offsetof(kib_msg_t, ibm_u) + body_nob;
struct ib_mr *mr;
memset(wrq, 0, sizeof(*wrq));
- wrq->next = NULL;
- wrq->wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_TX);
- wrq->sg_list = sge;
- wrq->num_sge = 1;
- wrq->opcode = IB_WR_SEND;
- wrq->send_flags = IB_SEND_SIGNALED;
+ wrq->wr.next = NULL;
+ wrq->wr.wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_TX);
+ wrq->wr.sg_list = sge;
+ wrq->wr.num_sge = 1;
+ wrq->wr.opcode = IB_WR_SEND;
+ wrq->wr.send_flags = IB_SEND_SIGNALED;
tx->tx_nwrq++;
}
kib_msg_t *ibmsg = tx->tx_msg;
kib_rdma_desc_t *srcrd = tx->tx_rd;
struct ib_sge *sge = &tx->tx_sge[0];
- struct ib_send_wr *wrq = &tx->tx_wrq[0];
+ struct ib_rdma_wr *wrq = &tx->tx_wrq[0], *next;
int rc = resid;
int srcidx;
int dstidx;
sge->length = wrknob;
wrq = &tx->tx_wrq[tx->tx_nwrq];
+ next = wrq + 1;
- wrq->next = wrq + 1;
- wrq->wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_RDMA);
- wrq->sg_list = sge;
- wrq->num_sge = 1;
- wrq->opcode = IB_WR_RDMA_WRITE;
- wrq->send_flags = 0;
+ wrq->wr.next = &next->wr;
+ wrq->wr.wr_id = kiblnd_ptr2wreqid(tx, IBLND_WID_RDMA);
+ wrq->wr.sg_list = sge;
+ wrq->wr.num_sge = 1;
+ wrq->wr.opcode = IB_WR_RDMA_WRITE;
+ wrq->wr.send_flags = 0;
- wrq->wr.rdma.remote_addr = kiblnd_rd_frag_addr(dstrd, dstidx);
- wrq->wr.rdma.rkey = kiblnd_rd_frag_key(dstrd, dstidx);
+ wrq->remote_addr = kiblnd_rd_frag_addr(dstrd, dstidx);
+ wrq->rkey = kiblnd_rd_frag_key(dstrd, dstidx);
srcidx = kiblnd_rd_consume_frag(srcrd, srcidx, wrknob);
dstidx = kiblnd_rd_consume_frag(dstrd, dstidx, wrknob);
unsigned int payload_offset = lntmsg->msg_offset;
unsigned int payload_nob = lntmsg->msg_len;
kib_msg_t *ibmsg;
+ kib_rdma_desc_t *rd;
kib_tx_t *tx;
int nob;
int rc;
}
ibmsg = tx->tx_msg;
-
+ rd = &ibmsg->ibm_u.get.ibgm_rd;
if ((lntmsg->msg_md->md_options & LNET_MD_KIOV) == 0)
- rc = kiblnd_setup_rd_iov(ni, tx,
- &ibmsg->ibm_u.get.ibgm_rd,
+ rc = kiblnd_setup_rd_iov(ni, tx, rd,
lntmsg->msg_md->md_niov,
lntmsg->msg_md->md_iov.iov,
0, lntmsg->msg_md->md_length);
else
- rc = kiblnd_setup_rd_kiov(ni, tx,
- &ibmsg->ibm_u.get.ibgm_rd,
+ rc = kiblnd_setup_rd_kiov(ni, tx, rd,
lntmsg->msg_md->md_niov,
lntmsg->msg_md->md_iov.kiov,
0, lntmsg->msg_md->md_length);
return -EIO;
}
- nob = offsetof(kib_get_msg_t, ibgm_rd.rd_frags[tx->tx_nfrags]);
+ nob = offsetof(kib_get_msg_t, ibgm_rd.rd_frags[rd->rd_nfrags]);
ibmsg->ibm_u.get.ibgm_cookie = tx->tx_cookie;
ibmsg->ibm_u.get.ibgm_hdr = *hdr;
kib_msg_t *rxmsg = rx->rx_msg;
kib_conn_t *conn = rx->rx_conn;
kib_tx_t *tx;
- kib_msg_t *txmsg;
int nob;
int post_credit = IBLND_POSTRX_PEER_CREDIT;
int rc = 0;
lnet_finalize(ni, lntmsg, 0);
break;
- case IBLND_MSG_PUT_REQ:
+ case IBLND_MSG_PUT_REQ: {
+ kib_msg_t *txmsg;
+ kib_rdma_desc_t *rd;
+
if (mlen == 0) {
lnet_finalize(ni, lntmsg, 0);
kiblnd_send_completion(rx->rx_conn, IBLND_MSG_PUT_NAK, 0,
}
txmsg = tx->tx_msg;
+ rd = &txmsg->ibm_u.putack.ibpam_rd;
if (kiov == NULL)
- rc = kiblnd_setup_rd_iov(ni, tx,
- &txmsg->ibm_u.putack.ibpam_rd,
+ rc = kiblnd_setup_rd_iov(ni, tx, rd,
niov, iov, offset, mlen);
else
- rc = kiblnd_setup_rd_kiov(ni, tx,
- &txmsg->ibm_u.putack.ibpam_rd,
+ rc = kiblnd_setup_rd_kiov(ni, tx, rd,
niov, kiov, offset, mlen);
if (rc != 0) {
CERROR("Can't setup PUT sink for %s: %d\n",
break;
}
- nob = offsetof(kib_putack_msg_t, ibpam_rd.rd_frags[tx->tx_nfrags]);
+ nob = offsetof(kib_putack_msg_t, ibpam_rd.rd_frags[rd->rd_nfrags]);
txmsg->ibm_u.putack.ibpam_src_cookie = rxmsg->ibm_u.putreq.ibprm_cookie;
txmsg->ibm_u.putack.ibpam_dst_cookie = tx->tx_cookie;
/* reposted buffer reserved for PUT_DONE */
post_credit = IBLND_POSTRX_NO_CREDIT;
break;
+ }
case IBLND_MSG_GET_REQ:
if (lntmsg != NULL) {
unsigned long flags;
int rc;
struct sockaddr_in *peer_addr;
+
LASSERT(!in_interrupt());
/* cmid inherits 'context' from the corresponding listener id */
if (*kiblnd_tunables.kib_require_priv_port &&
ntohs(peer_addr->sin_port) >= PROT_SOCK) {
__u32 ip = ntohl(peer_addr->sin_addr.s_addr);
+
CERROR("Peer's port (%pI4h:%hu) is not privileged\n",
&ip, ntohs(peer_addr->sin_port));
goto failed;
* consuming my CQ I could be called after all completions have
* occurred. But in this case, ibc_nrx == 0 && ibc_nsends_posted == 0
* and this CQ is about to be destroyed so I NOOP. */
- kib_conn_t *conn = (kib_conn_t *)arg;
+ kib_conn_t *conn = arg;
struct kib_sched_info *sched = conn->ibc_sched;
unsigned long flags;
spin_unlock_irqrestore(&qp->lock, flags);
vq_repbuf_free(c2dev, reply);
- bail0:
+bail0:
vq_req_free(c2dev, vq_req);
pr_debug("%s:%d qp=%p, cur_state=%s\n",
err = c2_errno(reply);
vq_repbuf_free(c2dev, reply);
- bail0:
+bail0:
vq_req_free(c2dev, vq_req);
return err;
}
spin_unlock_irqrestore(&qp->lock, flags);
vq_repbuf_free(c2dev, reply);
- bail0:
+bail0:
vq_req_free(c2dev, vq_req);
return err;
}
return 0;
- bail6:
+bail6:
iounmap(qp->sq_mq.peer);
- bail5:
+bail5:
destroy_qp(c2dev, qp);
- bail4:
+bail4:
vq_repbuf_free(c2dev, reply);
- bail3:
+bail3:
vq_req_free(c2dev, vq_req);
- bail2:
+bail2:
c2_free_mqsp(qp->rq_mq.shared);
- bail1:
+bail1:
c2_free_mqsp(qp->sq_mq.shared);
- bail0:
+bail0:
c2_free_qpn(c2dev, qp->qpn);
return err;
}
flags |= SQ_READ_FENCE;
}
wr.sqwr.rdma_write.remote_stag =
- cpu_to_be32(ib_wr->wr.rdma.rkey);
+ cpu_to_be32(rdma_wr(ib_wr)->rkey);
wr.sqwr.rdma_write.remote_to =
- cpu_to_be64(ib_wr->wr.rdma.remote_addr);
+ cpu_to_be64(rdma_wr(ib_wr)->remote_addr);
err = move_sgl((struct c2_data_addr *)
& (wr.sqwr.rdma_write.data),
ib_wr->sg_list,
wr.sqwr.rdma_read.local_to =
cpu_to_be64(ib_wr->sg_list->addr);
wr.sqwr.rdma_read.remote_stag =
- cpu_to_be32(ib_wr->wr.rdma.rkey);
+ cpu_to_be32(rdma_wr(ib_wr)->rkey);
wr.sqwr.rdma_read.remote_to =
- cpu_to_be64(ib_wr->wr.rdma.remote_addr);
+ cpu_to_be64(rdma_wr(ib_wr)->remote_addr);
wr.sqwr.rdma_read.length =
cpu_to_be32(ib_wr->sg_list->length);
break;
m = 0;
n = 0;
for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
- void *vaddr;
-
- vaddr = page_address(sg_page(sg));
- if (!vaddr) {
- ret = ERR_PTR(-EINVAL);
- goto bail;
- }
- mr->mr.map[m]->segs[n].vaddr = vaddr;
- mr->mr.map[m]->segs[n].length = umem->page_size;
- n++;
- if (n == HFI1_SEGSZ) {
- m++;
- n = 0;
- }
+ void *vaddr;
+
+ vaddr = page_address(sg_page(sg));
+ if (!vaddr) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+ mr->mr.map[m]->segs[n].vaddr = vaddr;
+ mr->mr.map[m]->segs[n].length = umem->page_size;
+ n++;
+ if (n == HFI1_SEGSZ) {
+ m++;
+ n = 0;
+ }
}
ret = &mr->ibmr;
/*
* Allocate a memory region usable with the
- * IB_WR_FAST_REG_MR send work request.
+ * IB_WR_REG_MR send work request.
*
* Return the memory region on success, otherwise return an errno.
+ * FIXME: IB_WR_REG_MR is not supported
*/
struct ib_mr *hfi1_alloc_mr(struct ib_pd *pd,
enum ib_mr_type mr_type,
return &mr->ibmr;
}
- struct ib_fast_reg_page_list *
- hfi1_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
- {
- unsigned size = page_list_len * sizeof(u64);
- struct ib_fast_reg_page_list *pl;
-
- if (size > PAGE_SIZE)
- return ERR_PTR(-EINVAL);
-
- pl = kzalloc(sizeof(*pl), GFP_KERNEL);
- if (!pl)
- return ERR_PTR(-ENOMEM);
-
- pl->page_list = kzalloc(size, GFP_KERNEL);
- if (!pl->page_list)
- goto err_free;
-
- return pl;
-
- err_free:
- kfree(pl);
- return ERR_PTR(-ENOMEM);
- }
-
- void hfi1_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl)
- {
- kfree(pl->page_list);
- kfree(pl);
- }
-
/**
* hfi1_alloc_fmr - allocate a fast memory region
* @pd: the protection domain for this memory region
goto bail;
}
ohdr->u.rc.reth.vaddr =
- cpu_to_be64(wqe->wr.wr.rdma.remote_addr);
+ cpu_to_be64(wqe->rdma_wr.remote_addr);
ohdr->u.rc.reth.rkey =
- cpu_to_be32(wqe->wr.wr.rdma.rkey);
+ cpu_to_be32(wqe->rdma_wr.rkey);
ohdr->u.rc.reth.length = cpu_to_be32(len);
hwords += sizeof(struct ib_reth) / sizeof(u32);
wqe->lpsn = wqe->psn;
wqe->lpsn = qp->s_next_psn++;
}
ohdr->u.rc.reth.vaddr =
- cpu_to_be64(wqe->wr.wr.rdma.remote_addr);
+ cpu_to_be64(wqe->rdma_wr.remote_addr);
ohdr->u.rc.reth.rkey =
- cpu_to_be32(wqe->wr.wr.rdma.rkey);
+ cpu_to_be32(wqe->rdma_wr.rkey);
ohdr->u.rc.reth.length = cpu_to_be32(len);
qp->s_state = OP(RDMA_READ_REQUEST);
hwords += sizeof(ohdr->u.rc.reth) / sizeof(u32);
if (wqe->wr.opcode == IB_WR_ATOMIC_CMP_AND_SWP) {
qp->s_state = OP(COMPARE_SWAP);
ohdr->u.atomic_eth.swap_data = cpu_to_be64(
- wqe->wr.wr.atomic.swap);
+ wqe->atomic_wr.swap);
ohdr->u.atomic_eth.compare_data = cpu_to_be64(
- wqe->wr.wr.atomic.compare_add);
+ wqe->atomic_wr.compare_add);
} else {
qp->s_state = OP(FETCH_ADD);
ohdr->u.atomic_eth.swap_data = cpu_to_be64(
- wqe->wr.wr.atomic.compare_add);
+ wqe->atomic_wr.compare_add);
ohdr->u.atomic_eth.compare_data = 0;
}
ohdr->u.atomic_eth.vaddr[0] = cpu_to_be32(
- wqe->wr.wr.atomic.remote_addr >> 32);
+ wqe->atomic_wr.remote_addr >> 32);
ohdr->u.atomic_eth.vaddr[1] = cpu_to_be32(
- wqe->wr.wr.atomic.remote_addr);
+ wqe->atomic_wr.remote_addr);
ohdr->u.atomic_eth.rkey = cpu_to_be32(
- wqe->wr.wr.atomic.rkey);
+ wqe->atomic_wr.rkey);
hwords += sizeof(struct ib_atomic_eth) / sizeof(u32);
ss = NULL;
len = 0;
*/
len = (delta_psn(qp->s_psn, wqe->psn)) * pmtu;
ohdr->u.rc.reth.vaddr =
- cpu_to_be64(wqe->wr.wr.rdma.remote_addr + len);
+ cpu_to_be64(wqe->rdma_wr.remote_addr + len);
ohdr->u.rc.reth.rkey =
- cpu_to_be32(wqe->wr.wr.rdma.rkey);
+ cpu_to_be32(wqe->rdma_wr.rkey);
ohdr->u.rc.reth.length = cpu_to_be32(wqe->length - len);
qp->s_state = OP(RDMA_READ_REQUEST);
hwords += sizeof(ohdr->u.rc.reth) / sizeof(u32);
struct pio_buf *pbuf;
struct hfi1_ib_header hdr;
struct hfi1_other_headers *ohdr;
+ unsigned long flags;
/* Don't send ACK or NAK if a RDMA read or atomic is pending. */
if (qp->s_flags & HFI1_S_RESP_PENDING)
queue_ack:
this_cpu_inc(*ibp->rc_qacks);
- spin_lock(&qp->s_lock);
+ spin_lock_irqsave(&qp->s_lock, flags);
qp->s_flags |= HFI1_S_ACK_PENDING | HFI1_S_RESP_PENDING;
qp->s_nak_state = qp->r_nak_state;
qp->s_ack_psn = qp->r_ack_psn;
/* Schedule the send tasklet. */
hfi1_schedule_send(qp);
- spin_unlock(&qp->s_lock);
+ spin_unlock_irqrestore(&qp->s_lock, flags);
}
/**
ibp->n_rc_timeouts++;
qp->s_flags &= ~HFI1_S_TIMER;
del_timer(&qp->s_timer);
+ trace_hfi1_rc_timeout(qp, qp->s_last_psn + 1);
restart_rc(qp, qp->s_last_psn + 1, 1);
hfi1_schedule_send(qp);
}
*
* This is called from rc_rcv_resp() to process an incoming RC ACK
* for the given QP.
- * Called at interrupt level with the QP s_lock held.
+ * May be called at interrupt level, with the QP s_lock held.
* Returns 1 if OK, 0 if current operation should be aborted (NAK).
*/
static int do_rc_ack(struct hfi1_qp *qp, u32 aeth, u32 psn, int opcode,
spin_lock_irqsave(&qp->s_lock, flags);
+ trace_hfi1_rc_ack(qp, psn);
+
/* Ignore invalid responses. */
if (cmp_psn(psn, qp->s_next_psn) >= 0)
goto ack_done;
u8 i, prev;
int old_req;
+ trace_hfi1_rc_rcv_error(qp, psn);
if (diff > 0) {
/*
* Packet sequence error.
u32 lqpn, u32 rqpn, u8 svc_type)
{
struct opa_hfi1_cong_log_event_internal *cc_event;
+ unsigned long flags;
if (sl >= OPA_MAX_SLS)
return;
- spin_lock(&ppd->cc_log_lock);
+ spin_lock_irqsave(&ppd->cc_log_lock, flags);
ppd->threshold_cong_event_map[sl/8] |= 1 << (sl % 8);
ppd->threshold_event_counter++;
/* keep timestamp in units of 1.024 usec */
cc_event->timestamp = ktime_to_ns(ktime_get()) / 1024;
- spin_unlock(&ppd->cc_log_lock);
+ spin_unlock_irqrestore(&ppd->cc_log_lock, flags);
}
void process_becn(struct hfi1_pportdata *ppd, u8 sl, u16 rlid, u32 lqpn,
u16 ccti, ccti_incr, ccti_timer, ccti_limit;
u8 trigger_threshold;
struct cc_state *cc_state;
+ unsigned long flags;
if (sl >= OPA_MAX_SLS)
return;
trigger_threshold =
cc_state->cong_setting.entries[sl].trigger_threshold;
- spin_lock(&ppd->cca_timer_lock);
+ spin_lock_irqsave(&ppd->cca_timer_lock, flags);
if (cca_timer->ccti < ccti_limit) {
if (cca_timer->ccti + ccti_incr <= ccti_limit)
set_link_ipg(ppd);
}
- spin_unlock(&ppd->cca_timer_lock);
+ spin_unlock_irqrestore(&ppd->cca_timer_lock, flags);
ccti = cca_timer->ccti;
*
* This is called from qp_rcv() to process an incoming RC packet
* for the given QP.
- * Called at interrupt level.
+ * May be called at interrupt level.
*/
void hfi1_rc_rcv(struct hfi1_packet *packet)
{
struct hfi1_other_headers *ohdr;
struct hfi1_ibport *ibp = to_iport(qp->ibqp.device, qp->port_num);
int diff;
- u8 opcode;
+ u32 opcode;
u32 psn;
/* Check for GRH */
if (wqe->length == 0)
break;
if (unlikely(!hfi1_rkey_ok(qp, &qp->r_sge.sge, wqe->length,
- wqe->wr.wr.rdma.remote_addr,
- wqe->wr.wr.rdma.rkey,
+ wqe->rdma_wr.remote_addr,
+ wqe->rdma_wr.rkey,
IB_ACCESS_REMOTE_WRITE)))
goto acc_err;
qp->r_sge.sg_list = NULL;
if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ)))
goto inv_err;
if (unlikely(!hfi1_rkey_ok(qp, &sqp->s_sge.sge, wqe->length,
- wqe->wr.wr.rdma.remote_addr,
- wqe->wr.wr.rdma.rkey,
+ wqe->rdma_wr.remote_addr,
+ wqe->rdma_wr.rkey,
IB_ACCESS_REMOTE_READ)))
goto acc_err;
release = 0;
if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC)))
goto inv_err;
if (unlikely(!hfi1_rkey_ok(qp, &qp->r_sge.sge, sizeof(u64),
- wqe->wr.wr.atomic.remote_addr,
- wqe->wr.wr.atomic.rkey,
+ wqe->atomic_wr.remote_addr,
+ wqe->atomic_wr.rkey,
IB_ACCESS_REMOTE_ATOMIC)))
goto acc_err;
/* Perform atomic OP and save result. */
maddr = (atomic64_t *) qp->r_sge.sge.vaddr;
- sdata = wqe->wr.wr.atomic.compare_add;
+ sdata = wqe->atomic_wr.compare_add;
*(u64 *) sqp->s_sge.sge.vaddr =
(wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) ?
(u64) atomic64_add_return(sdata, maddr) - sdata :
(u64) cmpxchg((u64 *) qp->r_sge.sge.vaddr,
- sdata, wqe->wr.wr.atomic.swap);
+ sdata, wqe->atomic_wr.swap);
hfi1_put_mr(qp->r_sge.sge.mr);
qp->r_sge.num_sge = 0;
goto send_comp;
return sizeof(struct ib_grh) / sizeof(u32);
}
-/*
- * free_ahg - clear ahg from QP
- */
-void clear_ahg(struct hfi1_qp *qp)
-{
- qp->s_hdr->ahgcount = 0;
- qp->s_flags &= ~(HFI1_S_AHG_VALID | HFI1_S_AHG_CLEAR);
- if (qp->s_sde)
- sdma_ahg_free(qp->s_sde, qp->s_ahgidx);
- qp->s_ahgidx = -1;
- qp->s_sde = NULL;
-}
-
#define BTH2_OFFSET (offsetof(struct hfi1_pio_header, hdr.u.oth.bth[2]) / 4)
/**
ohdr->bth[2] = cpu_to_be32(bth2);
}
+/* when sending, force a reschedule every one of these periods */
+#define SEND_RESCHED_TIMEOUT (5 * HZ) /* 5s in jiffies */
+
/**
* hfi1_do_send - perform a send on a QP
* @work: contains a pointer to the QP
struct hfi1_pportdata *ppd = ppd_from_ibp(ibp);
int (*make_req)(struct hfi1_qp *qp);
unsigned long flags;
+ unsigned long timeout;
if ((qp->ibqp.qp_type == IB_QPT_RC ||
qp->ibqp.qp_type == IB_QPT_UC) &&
spin_unlock_irqrestore(&qp->s_lock, flags);
+ timeout = jiffies + SEND_RESCHED_TIMEOUT;
do {
/* Check for a constructed packet to be sent. */
if (qp->s_hdrwords != 0) {
/* Record that s_hdr is empty. */
qp->s_hdrwords = 0;
}
+
+ /* allow other tasks to run */
+ if (unlikely(time_after(jiffies, timeout))) {
+ cond_resched();
+ ppd->dd->verbs_dev.n_send_schedule++;
+ timeout = jiffies + SEND_RESCHED_TIMEOUT;
+ }
} while (make_req(qp));
}
if (qp->ibqp.qp_type == IB_QPT_UD ||
qp->ibqp.qp_type == IB_QPT_SMI ||
qp->ibqp.qp_type == IB_QPT_GSI)
- atomic_dec(&to_iah(wqe->wr.wr.ud.ah)->refcount);
+ atomic_dec(&to_iah(wqe->ud_wr.ah)->refcount);
/* See ch. 11.2.4.1 and 10.7.3.1 */
if (!(qp->s_flags & HFI1_S_SIGNAL_REQ_WR) ||
int status,
int drained);
+/* Length of buffer to create verbs txreq cache name */
+#define TXREQ_NAME_LEN 24
+
/*
* Note that it is OK to post send work requests in the SQE and ERR
* states; hfi1_do_send() will process them and generate error
* undefined operations.
* Make sure buffer is large enough to hold the result for atomics.
*/
- if (wr->opcode == IB_WR_FAST_REG_MR) {
- return -EINVAL;
- } else if (qp->ibqp.qp_type == IB_QPT_UC) {
+ if (qp->ibqp.qp_type == IB_QPT_UC) {
if ((unsigned) wr->opcode >= IB_WR_RDMA_READ)
return -EINVAL;
} else if (qp->ibqp.qp_type != IB_QPT_RC) {
wr->opcode != IB_WR_SEND_WITH_IMM)
return -EINVAL;
/* Check UD destination address PD */
- if (qp->ibqp.pd != wr->wr.ud.ah->pd)
+ if (qp->ibqp.pd != ud_wr(wr)->ah->pd)
return -EINVAL;
} else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD)
return -EINVAL;
rkt = &to_idev(qp->ibqp.device)->lk_table;
pd = to_ipd(qp->ibqp.pd);
wqe = get_swqe_ptr(qp, qp->s_head);
- wqe->wr = *wr;
+
+
+ if (qp->ibqp.qp_type != IB_QPT_UC &&
+ qp->ibqp.qp_type != IB_QPT_RC)
+ memcpy(&wqe->ud_wr, ud_wr(wr), sizeof(wqe->ud_wr));
+ else if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
+ wr->opcode == IB_WR_RDMA_WRITE ||
+ wr->opcode == IB_WR_RDMA_READ)
+ memcpy(&wqe->rdma_wr, rdma_wr(wr), sizeof(wqe->rdma_wr));
+ else if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+ wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
+ memcpy(&wqe->atomic_wr, atomic_wr(wr), sizeof(wqe->atomic_wr));
+ else
+ memcpy(&wqe->wr, wr, sizeof(wqe->wr));
+
wqe->length = 0;
j = 0;
if (wr->num_sge) {
if (wqe->length > 0x80000000U)
goto bail_inval_free;
} else {
- struct hfi1_ah *ah = to_iah(wr->wr.ud.ah);
+ struct hfi1_ah *ah = to_iah(ud_wr(wr)->ah);
atomic_inc(&ah->refcount);
}
u32 tlen = packet->tlen;
struct hfi1_pportdata *ppd = rcd->ppd;
struct hfi1_ibport *ibp = &ppd->ibport_data;
+ unsigned long flags;
u32 qp_num;
int lnh;
u8 opcode;
goto drop;
list_for_each_entry_rcu(p, &mcast->qp_list, list) {
packet->qp = p->qp;
- spin_lock(&packet->qp->r_lock);
+ spin_lock_irqsave(&packet->qp->r_lock, flags);
if (likely((qp_ok(opcode, packet))))
opcode_handler_tbl[opcode](packet);
- spin_unlock(&packet->qp->r_lock);
+ spin_unlock_irqrestore(&packet->qp->r_lock, flags);
}
/*
* Notify hfi1_multicast_detach() if it is waiting for us
rcu_read_unlock();
goto drop;
}
- spin_lock(&packet->qp->r_lock);
+ spin_lock_irqsave(&packet->qp->r_lock, flags);
if (likely((qp_ok(opcode, packet))))
opcode_handler_tbl[opcode](packet);
- spin_unlock(&packet->qp->r_lock);
+ spin_unlock_irqrestore(&packet->qp->r_lock, flags);
rcu_read_unlock();
}
return;
}
return 0;
}
+
/*
* egress_pkey_matches_entry - return 1 if the pkey matches ent (ent
* being an entry from the ingress partition key table), return 0
static void verbs_txreq_kmem_cache_ctor(void *obj)
{
- struct verbs_txreq *tx = (struct verbs_txreq *)obj;
+ struct verbs_txreq *tx = obj;
memset(tx, 0, sizeof(*tx));
}
int ret;
size_t lcpysz = IB_DEVICE_NAME_MAX;
u16 descq_cnt;
+ char buf[TXREQ_NAME_LEN];
ret = hfi1_qp_init(dev);
if (ret)
descq_cnt = sdma_get_descq_cnt();
+ snprintf(buf, sizeof(buf), "hfi1_%u_vtxreq_cache", dd->unit);
/* SLAB_HWCACHE_ALIGN for AHG */
- dev->verbs_txreq_cache = kmem_cache_create("hfi1_vtxreq_cache",
+ dev->verbs_txreq_cache = kmem_cache_create(buf,
sizeof(struct verbs_txreq),
0, SLAB_HWCACHE_ALIGN,
verbs_txreq_kmem_cache_ctor);
ibdev->reg_user_mr = hfi1_reg_user_mr;
ibdev->dereg_mr = hfi1_dereg_mr;
ibdev->alloc_mr = hfi1_alloc_mr;
- ibdev->alloc_fast_reg_page_list = hfi1_alloc_fast_reg_page_list;
- ibdev->free_fast_reg_page_list = hfi1_free_fast_reg_page_list;
ibdev->alloc_fmr = hfi1_alloc_fmr;
ibdev->map_phys_fmr = hfi1_map_phys_fmr;
ibdev->unmap_fmr = hfi1_unmap_fmr;
* in qp->s_max_sge.
*/
struct hfi1_swqe {
- struct ib_send_wr wr; /* don't use wr.sg_list */
+ union {
+ struct ib_send_wr wr; /* don't use wr.sg_list */
+ struct ib_rdma_wr rdma_wr;
+ struct ib_atomic_wr atomic_wr;
+ struct ib_ud_wr ud_wr;
+ };
u32 psn; /* first packet sequence number */
u32 lpsn; /* last packet sequence number */
u32 ssn; /* send sequence number */
u64 n_piowait;
u64 n_txwait;
u64 n_kmem_wait;
+ u64 n_send_schedule;
u32 n_pds_allocated; /* number of PDs allocated for device */
spinlock_t n_pds_lock;
enum ib_mr_type mr_type,
u32 max_entries);
- struct ib_fast_reg_page_list *hfi1_alloc_fast_reg_page_list(
- struct ib_device *ibdev, int page_list_len);
-
- void hfi1_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);
-
- int hfi1_fast_reg_mr(struct hfi1_qp *qp, struct ib_send_wr *wr);
-
struct ib_fmr *hfi1_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
struct ib_fmr_attr *fmr_attr);
u32 hfi1_make_grh(struct hfi1_ibport *ibp, struct ib_grh *hdr,
struct ib_global_route *grh, u32 hwords, u32 nwords);
-void clear_ahg(struct hfi1_qp *qp);
-
void hfi1_make_ruc_header(struct hfi1_qp *qp, struct hfi1_other_headers *ohdr,
u32 bth0, u32 bth2, int middle);
* SOFTWARE.
*/
-#include <linux/sched.h>
#include <linux/spinlock.h>
#include "ipath_verbs.h"
if (wqe->length == 0)
break;
if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, wqe->length,
- wqe->wr.wr.rdma.remote_addr,
- wqe->wr.wr.rdma.rkey,
+ wqe->rdma_wr.remote_addr,
+ wqe->rdma_wr.rkey,
IB_ACCESS_REMOTE_WRITE)))
goto acc_err;
break;
if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_READ)))
goto inv_err;
if (unlikely(!ipath_rkey_ok(qp, &sqp->s_sge, wqe->length,
- wqe->wr.wr.rdma.remote_addr,
- wqe->wr.wr.rdma.rkey,
+ wqe->rdma_wr.remote_addr,
+ wqe->rdma_wr.rkey,
IB_ACCESS_REMOTE_READ)))
goto acc_err;
qp->r_sge.sge = wqe->sg_list[0];
if (unlikely(!(qp->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC)))
goto inv_err;
if (unlikely(!ipath_rkey_ok(qp, &qp->r_sge, sizeof(u64),
- wqe->wr.wr.atomic.remote_addr,
- wqe->wr.wr.atomic.rkey,
+ wqe->atomic_wr.remote_addr,
+ wqe->atomic_wr.rkey,
IB_ACCESS_REMOTE_ATOMIC)))
goto acc_err;
/* Perform atomic OP and save result. */
maddr = (atomic64_t *) qp->r_sge.sge.vaddr;
- sdata = wqe->wr.wr.atomic.compare_add;
+ sdata = wqe->atomic_wr.compare_add;
*(u64 *) sqp->s_sge.sge.vaddr =
(wqe->wr.opcode == IB_WR_ATOMIC_FETCH_AND_ADD) ?
(u64) atomic64_add_return(sdata, maddr) - sdata :
(u64) cmpxchg((u64 *) qp->r_sge.sge.vaddr,
- sdata, wqe->wr.wr.atomic.swap);
+ sdata, wqe->atomic_wr.swap);
goto send_comp;
default:
* SOFTWARE.
*/
-#include <linux/sched.h>
#include <rdma/ib_smi.h>
#include "ipath_verbs.h"
u32 rlen;
u32 length;
- qp = ipath_lookup_qpn(&dev->qp_table, swqe->wr.wr.ud.remote_qpn);
+ qp = ipath_lookup_qpn(&dev->qp_table, swqe->ud_wr.remote_qpn);
if (!qp || !(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_RECV_OK)) {
dev->n_pkt_drops++;
goto done;
* qkey from the QP context instead of the WR (see 10.2.5).
*/
if (unlikely(qp->ibqp.qp_num &&
- ((int) swqe->wr.wr.ud.remote_qkey < 0 ?
- sqp->qkey : swqe->wr.wr.ud.remote_qkey) != qp->qkey)) {
+ ((int) swqe->ud_wr.remote_qkey < 0 ?
+ sqp->qkey : swqe->ud_wr.remote_qkey) != qp->qkey)) {
/* XXX OK to lose a count once in a while. */
dev->qkey_violations++;
dev->n_pkt_drops++;
} else
spin_unlock_irqrestore(&rq->lock, flags);
- ah_attr = &to_iah(swqe->wr.wr.ud.ah)->attr;
+ ah_attr = &to_iah(swqe->ud_wr.ah)->attr;
if (ah_attr->ah_flags & IB_AH_GRH) {
ipath_copy_sge(&rsge, &ah_attr->grh, sizeof(struct ib_grh));
wc.wc_flags |= IB_WC_GRH;
wc.port_num = 1;
/* Signal completion event if the solicited bit is set. */
ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
- swqe->wr.send_flags & IB_SEND_SOLICITED);
+ swqe->ud_wr.wr.send_flags & IB_SEND_SOLICITED);
drop:
if (atomic_dec_and_test(&qp->refcount))
wake_up(&qp->wait);
next_cur = 0;
/* Construct the header. */
- ah_attr = &to_iah(wqe->wr.wr.ud.ah)->attr;
+ ah_attr = &to_iah(wqe->ud_wr.ah)->attr;
if (ah_attr->dlid >= IPATH_MULTICAST_LID_BASE) {
if (ah_attr->dlid != IPATH_PERMISSIVE_LID)
dev->n_multicast_xmit++;
qp->s_wqe = wqe;
qp->s_sge.sge = wqe->sg_list[0];
qp->s_sge.sg_list = wqe->sg_list + 1;
- qp->s_sge.num_sge = wqe->wr.num_sge;
+ qp->s_sge.num_sge = wqe->ud_wr.wr.num_sge;
if (ah_attr->ah_flags & IB_AH_GRH) {
/* Header size in 32-bit words. */
lrh0 = IPATH_LRH_BTH;
ohdr = &qp->s_hdr.u.oth;
}
- if (wqe->wr.opcode == IB_WR_SEND_WITH_IMM) {
+ if (wqe->ud_wr.wr.opcode == IB_WR_SEND_WITH_IMM) {
qp->s_hdrwords++;
- ohdr->u.ud.imm_data = wqe->wr.ex.imm_data;
+ ohdr->u.ud.imm_data = wqe->ud_wr.wr.ex.imm_data;
bth0 = IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE << 24;
} else
bth0 = IB_OPCODE_UD_SEND_ONLY << 24;
qp->s_hdr.lrh[3] = cpu_to_be16(lid);
} else
qp->s_hdr.lrh[3] = IB_LID_PERMISSIVE;
- if (wqe->wr.send_flags & IB_SEND_SOLICITED)
+ if (wqe->ud_wr.wr.send_flags & IB_SEND_SOLICITED)
bth0 |= 1 << 23;
bth0 |= extra_bytes << 20;
bth0 |= qp->ibqp.qp_type == IB_QPT_SMI ? IPATH_DEFAULT_P_KEY :
ohdr->bth[1] = ah_attr->dlid >= IPATH_MULTICAST_LID_BASE &&
ah_attr->dlid != IPATH_PERMISSIVE_LID ?
cpu_to_be32(IPATH_MULTICAST_QPN) :
- cpu_to_be32(wqe->wr.wr.ud.remote_qpn);
+ cpu_to_be32(wqe->ud_wr.remote_qpn);
ohdr->bth[2] = cpu_to_be32(qp->s_next_psn++ & IPATH_PSN_MASK);
/*
* Qkeys with the high order bit set mean use the
* qkey from the QP context instead of the WR (see 10.2.5).
*/
- ohdr->u.ud.deth[0] = cpu_to_be32((int)wqe->wr.wr.ud.remote_qkey < 0 ?
- qp->qkey : wqe->wr.wr.ud.remote_qkey);
+ ohdr->u.ud.deth[0] = cpu_to_be32((int)wqe->ud_wr.remote_qkey < 0 ?
+ qp->qkey : wqe->ud_wr.remote_qkey);
ohdr->u.ud.deth[1] = cpu_to_be32(qp->ibqp.qp_num);
done:
wr->opcode != IB_WR_SEND_WITH_IMM)
goto bail_inval;
/* Check UD destination address PD */
- if (qp->ibqp.pd != wr->wr.ud.ah->pd)
+ if (qp->ibqp.pd != ud_wr(wr)->ah->pd)
goto bail_inval;
} else if ((unsigned) wr->opcode > IB_WR_ATOMIC_FETCH_AND_ADD)
goto bail_inval;
}
wqe = get_swqe_ptr(qp, qp->s_head);
- wqe->wr = *wr;
+
+ if (qp->ibqp.qp_type != IB_QPT_UC &&
+ qp->ibqp.qp_type != IB_QPT_RC)
+ memcpy(&wqe->ud_wr, ud_wr(wr), sizeof(wqe->ud_wr));
+ else if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
+ wr->opcode == IB_WR_RDMA_WRITE ||
+ wr->opcode == IB_WR_RDMA_READ)
+ memcpy(&wqe->rdma_wr, rdma_wr(wr), sizeof(wqe->rdma_wr));
+ else if (wr->opcode == IB_WR_ATOMIC_CMP_AND_SWP ||
+ wr->opcode == IB_WR_ATOMIC_FETCH_AND_ADD)
+ memcpy(&wqe->atomic_wr, atomic_wr(wr), sizeof(wqe->atomic_wr));
+ else
+ memcpy(&wqe->wr, wr, sizeof(wqe->wr));
+
wqe->length = 0;
if (wr->num_sge) {
acc = wr->opcode >= IB_WR_RDMA_READ ?
dev->ipath_spkts = tc - dev->ipath_spkts;
dev->ipath_rpkts = td - dev->ipath_rpkts;
dev->ipath_xmit_wait = te - dev->ipath_xmit_wait;
- }
- else
+ } else {
dev->pma_sample_interval--;
+ }
}
spin_unlock_irqrestore(&dev->pending_lock, flags);
dd->ipath_gpio_mask);
}
- init_timer(&dd->verbs_timer);
- dd->verbs_timer.function = __verbs_timer;
- dd->verbs_timer.data = (unsigned long)dd;
+ setup_timer(&dd->verbs_timer, __verbs_timer, (unsigned long)dd);
+
dd->verbs_timer.expires = jiffies + 1;
add_timer(&dd->verbs_timer);
dev = &idev->ibdev;
if (dd->ipath_sdma_descq_cnt) {
- tx = kmalloc(dd->ipath_sdma_descq_cnt * sizeof *tx,
- GFP_KERNEL);
+ tx = kmalloc_array(dd->ipath_sdma_descq_cnt, sizeof *tx,
+ GFP_KERNEL);
if (tx == NULL) {
ret = -ENOMEM;
goto err_tx;
* the LKEY). The remaining bits act as a generation number or tag.
*/
idev->lk_table.max = 1 << ib_ipath_lkey_table_size;
- idev->lk_table.table = kzalloc(idev->lk_table.max *
+ idev->lk_table.table = kcalloc(idev->lk_table.max,
sizeof(*idev->lk_table.table),
GFP_KERNEL);
if (idev->lk_table.table == NULL) {
MLX4_DEV_CAP_FLAG2_IGNORE_FCS = 1LL << 28,
MLX4_DEV_CAP_FLAG2_PHV_EN = 1LL << 29,
MLX4_DEV_CAP_FLAG2_SKIP_OUTER_VLAN = 1LL << 30,
+ MLX4_DEV_CAP_FLAG2_UPDATE_QP_SRC_CHECK_LB = 1ULL << 31,
+ MLX4_DEV_CAP_FLAG2_LB_SRC_CHK = 1ULL << 32,
};
enum {
struct mlx4_quotas quotas;
struct radix_tree_root qp_table_tree;
u8 rev_id;
+ u8 port_random_macs;
char board_id[MLX4_BOARD_ID_LEN];
int numa_node;
int oper_log_mgm_entry_size;
#include "rds.h"
#include "ib.h"
-static unsigned int fmr_pool_size = RDS_FMR_POOL_SIZE;
-unsigned int fmr_message_size = RDS_FMR_SIZE + 1; /* +1 allows for unaligned MRs */
+unsigned int rds_ib_fmr_1m_pool_size = RDS_FMR_1M_POOL_SIZE;
+unsigned int rds_ib_fmr_8k_pool_size = RDS_FMR_8K_POOL_SIZE;
unsigned int rds_ib_retry_count = RDS_IB_DEFAULT_RETRY_COUNT;
-module_param(fmr_pool_size, int, 0444);
-MODULE_PARM_DESC(fmr_pool_size, " Max number of fmr per HCA");
-module_param(fmr_message_size, int, 0444);
-MODULE_PARM_DESC(fmr_message_size, " Max size of a RDMA transfer");
+module_param(rds_ib_fmr_1m_pool_size, int, 0444);
+MODULE_PARM_DESC(rds_ib_fmr_1m_pool_size, " Max number of 1M fmr per HCA");
+module_param(rds_ib_fmr_8k_pool_size, int, 0444);
+MODULE_PARM_DESC(rds_ib_fmr_8k_pool_size, " Max number of 8K fmr per HCA");
module_param(rds_ib_retry_count, int, 0444);
MODULE_PARM_DESC(rds_ib_retry_count, " Number of hw retries before reporting an error");
struct rds_ib_device *rds_ibdev = container_of(work,
struct rds_ib_device, free_work);
- if (rds_ibdev->mr_pool)
- rds_ib_destroy_mr_pool(rds_ibdev->mr_pool);
+ if (rds_ibdev->mr_8k_pool)
+ rds_ib_destroy_mr_pool(rds_ibdev->mr_8k_pool);
+ if (rds_ibdev->mr_1m_pool)
+ rds_ib_destroy_mr_pool(rds_ibdev->mr_1m_pool);
if (rds_ibdev->pd)
ib_dealloc_pd(rds_ibdev->pd);
rds_ibdev->max_sge = min(dev_attr->max_sge, RDS_IB_MAX_SGE);
rds_ibdev->fmr_max_remaps = dev_attr->max_map_per_fmr?: 32;
- rds_ibdev->max_fmrs = dev_attr->max_fmr ?
- min_t(unsigned int, dev_attr->max_fmr, fmr_pool_size) :
- fmr_pool_size;
+ rds_ibdev->max_1m_fmrs = dev_attr->max_mr ?
+ min_t(unsigned int, (dev_attr->max_mr / 2),
+ rds_ib_fmr_1m_pool_size) : rds_ib_fmr_1m_pool_size;
+
+ rds_ibdev->max_8k_fmrs = dev_attr->max_mr ?
+ min_t(unsigned int, ((dev_attr->max_mr / 2) * RDS_MR_8K_SCALE),
+ rds_ib_fmr_8k_pool_size) : rds_ib_fmr_8k_pool_size;
rds_ibdev->max_initiator_depth = dev_attr->max_qp_init_rd_atom;
rds_ibdev->max_responder_resources = dev_attr->max_qp_rd_atom;
goto put_dev;
}
- rds_ibdev->mr_pool = rds_ib_create_mr_pool(rds_ibdev);
- if (IS_ERR(rds_ibdev->mr_pool)) {
- rds_ibdev->mr_pool = NULL;
+ rds_ibdev->mr_1m_pool =
+ rds_ib_create_mr_pool(rds_ibdev, RDS_IB_MR_1M_POOL);
+ if (IS_ERR(rds_ibdev->mr_1m_pool)) {
+ rds_ibdev->mr_1m_pool = NULL;
goto put_dev;
}
+ rds_ibdev->mr_8k_pool =
+ rds_ib_create_mr_pool(rds_ibdev, RDS_IB_MR_8K_POOL);
+ if (IS_ERR(rds_ibdev->mr_8k_pool)) {
+ rds_ibdev->mr_8k_pool = NULL;
+ goto put_dev;
+ }
+
+ rdsdebug("RDS/IB: max_mr = %d, max_wrs = %d, max_sge = %d, fmr_max_remaps = %d, max_1m_fmrs = %d, max_8k_fmrs = %d\n",
+ dev_attr->max_fmr, rds_ibdev->max_wrs, rds_ibdev->max_sge,
+ rds_ibdev->fmr_max_remaps, rds_ibdev->max_1m_fmrs,
+ rds_ibdev->max_8k_fmrs);
+
INIT_LIST_HEAD(&rds_ibdev->ipaddr_list);
INIT_LIST_HEAD(&rds_ibdev->conn_list);
/* Create a CMA ID and try to bind it. This catches both
* IB and iWARP capable NICs.
*/
- cm_id = rdma_create_id(NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
+ cm_id = rdma_create_id(&init_net, NULL, NULL, RDMA_PS_TCP, IB_QPT_RC);
if (IS_ERR(cm_id))
return PTR_ERR(cm_id);
#include "rds.h"
#include "rdma_transport.h"
-#define RDS_FMR_SIZE 256
-#define RDS_FMR_POOL_SIZE 8192
+#define RDS_FMR_1M_POOL_SIZE (8192 / 2)
+#define RDS_FMR_1M_MSG_SIZE 256
+#define RDS_FMR_8K_MSG_SIZE 2
+#define RDS_MR_8K_SCALE (256 / (RDS_FMR_8K_MSG_SIZE + 1))
+#define RDS_FMR_8K_POOL_SIZE (RDS_MR_8K_SCALE * (8192 / 2))
#define RDS_IB_MAX_SGE 8
#define RDS_IB_RECV_SGE 2
#define RDS_IB_RECYCLE_BATCH_COUNT 32
+#define RDS_IB_WC_MAX 32
+#define RDS_IB_SEND_OP BIT_ULL(63)
+
extern struct rw_semaphore rds_ib_devices_lock;
extern struct list_head rds_ib_devices;
struct rds_ib_send_work {
void *s_op;
- struct ib_send_wr s_wr;
+ union {
+ struct ib_send_wr s_wr;
+ struct ib_rdma_wr s_rdma_wr;
+ struct ib_atomic_wr s_atomic_wr;
+ };
struct ib_sge s_sge[RDS_IB_MAX_SGE];
unsigned long s_queued;
};
atomic_t w_free_ctr;
};
+/* Rings are posted with all the allocations they'll need to queue the
+ * incoming message to the receiving socket so this can't fail.
+ * All fragments start with a header, so we can make sure we're not receiving
+ * garbage, and we can tell a small 8 byte fragment from an ACK frame.
+ */
+struct rds_ib_ack_state {
+ u64 ack_next;
+ u64 ack_recv;
+ unsigned int ack_required:1;
+ unsigned int ack_next_valid:1;
+ unsigned int ack_recv_valid:1;
+};
+
+
struct rds_ib_device;
struct rds_ib_connection {
struct ib_pd *i_pd;
struct ib_cq *i_send_cq;
struct ib_cq *i_recv_cq;
+ struct ib_wc i_send_wc[RDS_IB_WC_MAX];
+ struct ib_wc i_recv_wc[RDS_IB_WC_MAX];
+
+ /* interrupt handling */
+ struct tasklet_struct i_send_tasklet;
+ struct tasklet_struct i_recv_tasklet;
/* tx */
struct rds_ib_work_ring i_send_ring;
atomic_t i_signaled_sends;
/* rx */
- struct tasklet_struct i_recv_tasklet;
struct mutex i_recv_mutex;
struct rds_ib_work_ring i_recv_ring;
struct rds_ib_incoming *i_ibinc;
struct rds_ib_ipaddr {
struct list_head list;
__be32 ipaddr;
+ struct rcu_head rcu;
+};
+
+enum {
+ RDS_IB_MR_8K_POOL,
+ RDS_IB_MR_1M_POOL,
};
struct rds_ib_device {
struct list_head conn_list;
struct ib_device *dev;
struct ib_pd *pd;
- struct rds_ib_mr_pool *mr_pool;
- unsigned int fmr_max_remaps;
unsigned int max_fmrs;
+ struct rds_ib_mr_pool *mr_1m_pool;
+ struct rds_ib_mr_pool *mr_8k_pool;
+ unsigned int fmr_max_remaps;
+ unsigned int max_8k_fmrs;
+ unsigned int max_1m_fmrs;
int max_sge;
unsigned int max_wrs;
unsigned int max_initiator_depth;
struct rds_ib_statistics {
uint64_t s_ib_connect_raced;
uint64_t s_ib_listen_closed_stale;
- uint64_t s_ib_tx_cq_call;
+ uint64_t s_ib_evt_handler_call;
+ uint64_t s_ib_tasklet_call;
uint64_t s_ib_tx_cq_event;
uint64_t s_ib_tx_ring_full;
uint64_t s_ib_tx_throttle;
uint64_t s_ib_tx_sg_mapping_failure;
uint64_t s_ib_tx_stalled;
uint64_t s_ib_tx_credit_updates;
- uint64_t s_ib_rx_cq_call;
uint64_t s_ib_rx_cq_event;
uint64_t s_ib_rx_ring_empty;
uint64_t s_ib_rx_refill_from_cq;
uint64_t s_ib_ack_send_delayed;
uint64_t s_ib_ack_send_piggybacked;
uint64_t s_ib_ack_received;
- uint64_t s_ib_rdma_mr_alloc;
- uint64_t s_ib_rdma_mr_free;
- uint64_t s_ib_rdma_mr_used;
- uint64_t s_ib_rdma_mr_pool_flush;
- uint64_t s_ib_rdma_mr_pool_wait;
- uint64_t s_ib_rdma_mr_pool_depleted;
+ uint64_t s_ib_rdma_mr_8k_alloc;
+ uint64_t s_ib_rdma_mr_8k_free;
+ uint64_t s_ib_rdma_mr_8k_used;
+ uint64_t s_ib_rdma_mr_8k_pool_flush;
+ uint64_t s_ib_rdma_mr_8k_pool_wait;
+ uint64_t s_ib_rdma_mr_8k_pool_depleted;
+ uint64_t s_ib_rdma_mr_1m_alloc;
+ uint64_t s_ib_rdma_mr_1m_free;
+ uint64_t s_ib_rdma_mr_1m_used;
+ uint64_t s_ib_rdma_mr_1m_pool_flush;
+ uint64_t s_ib_rdma_mr_1m_pool_wait;
+ uint64_t s_ib_rdma_mr_1m_pool_depleted;
uint64_t s_ib_atomic_cswp;
uint64_t s_ib_atomic_fadd;
};
void rds_ib_dev_put(struct rds_ib_device *rds_ibdev);
extern struct ib_client rds_ib_client;
-extern unsigned int fmr_message_size;
+extern unsigned int rds_ib_fmr_1m_pool_size;
+extern unsigned int rds_ib_fmr_8k_pool_size;
extern unsigned int rds_ib_retry_count;
extern spinlock_t ib_nodev_conns_lock;
void rds_ib_add_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
void rds_ib_remove_conn(struct rds_ib_device *rds_ibdev, struct rds_connection *conn);
void rds_ib_destroy_nodev_conns(void);
-struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *);
+struct rds_ib_mr_pool *rds_ib_create_mr_pool(struct rds_ib_device *rds_dev,
+ int npages);
void rds_ib_get_mr_info(struct rds_ib_device *rds_ibdev, struct rds_info_rdma_connection *iinfo);
void rds_ib_destroy_mr_pool(struct rds_ib_mr_pool *);
void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp);
void rds_ib_inc_free(struct rds_incoming *inc);
int rds_ib_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
-void rds_ib_recv_cq_comp_handler(struct ib_cq *cq, void *context);
+void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc,
+ struct rds_ib_ack_state *state);
void rds_ib_recv_tasklet_fn(unsigned long data);
void rds_ib_recv_init_ring(struct rds_ib_connection *ic);
void rds_ib_recv_clear_ring(struct rds_ib_connection *ic);
void rds_ib_attempt_ack(struct rds_ib_connection *ic);
void rds_ib_ack_send_complete(struct rds_ib_connection *ic);
u64 rds_ib_piggyb_ack(struct rds_ib_connection *ic);
+void rds_ib_set_ack(struct rds_ib_connection *ic, u64 seq, int ack_required);
/* ib_ring.c */
void rds_ib_ring_init(struct rds_ib_work_ring *ring, u32 nr);
void rds_ib_xmit_complete(struct rds_connection *conn);
int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
unsigned int hdr_off, unsigned int sg, unsigned int off);
-void rds_ib_send_cq_comp_handler(struct ib_cq *cq, void *context);
+void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
void rds_ib_send_init_ring(struct rds_ib_connection *ic);
void rds_ib_send_clear_ring(struct rds_ib_connection *ic);
int rds_ib_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op);
event->event, ib_event_msg(event->event), data);
}
+/* Plucking the oldest entry from the ring can be done concurrently with
+ * the thread refilling the ring. Each ring operation is protected by
+ * spinlocks and the transient state of refilling doesn't change the
+ * recording of which entry is oldest.
+ *
+ * This relies on IB only calling one cq comp_handler for each cq so that
+ * there will only be one caller of rds_recv_incoming() per RDS connection.
+ */
+static void rds_ib_cq_comp_handler_recv(struct ib_cq *cq, void *context)
+{
+ struct rds_connection *conn = context;
+ struct rds_ib_connection *ic = conn->c_transport_data;
+
+ rdsdebug("conn %p cq %p\n", conn, cq);
+
+ rds_ib_stats_inc(s_ib_evt_handler_call);
+
+ tasklet_schedule(&ic->i_recv_tasklet);
+}
+
+static void poll_cq(struct rds_ib_connection *ic, struct ib_cq *cq,
+ struct ib_wc *wcs,
+ struct rds_ib_ack_state *ack_state)
+{
+ int nr;
+ int i;
+ struct ib_wc *wc;
+
+ while ((nr = ib_poll_cq(cq, RDS_IB_WC_MAX, wcs)) > 0) {
+ for (i = 0; i < nr; i++) {
+ wc = wcs + i;
+ rdsdebug("wc wr_id 0x%llx status %u byte_len %u imm_data %u\n",
+ (unsigned long long)wc->wr_id, wc->status,
+ wc->byte_len, be32_to_cpu(wc->ex.imm_data));
+
+ if (wc->wr_id & RDS_IB_SEND_OP)
+ rds_ib_send_cqe_handler(ic, wc);
+ else
+ rds_ib_recv_cqe_handler(ic, wc, ack_state);
+ }
+ }
+}
+
+static void rds_ib_tasklet_fn_send(unsigned long data)
+{
+ struct rds_ib_connection *ic = (struct rds_ib_connection *)data;
+ struct rds_connection *conn = ic->conn;
+ struct rds_ib_ack_state state;
+
+ rds_ib_stats_inc(s_ib_tasklet_call);
+
+ memset(&state, 0, sizeof(state));
+ poll_cq(ic, ic->i_send_cq, ic->i_send_wc, &state);
+ ib_req_notify_cq(ic->i_send_cq, IB_CQ_NEXT_COMP);
+ poll_cq(ic, ic->i_send_cq, ic->i_send_wc, &state);
+
+ if (rds_conn_up(conn) &&
+ (!test_bit(RDS_LL_SEND_FULL, &conn->c_flags) ||
+ test_bit(0, &conn->c_map_queued)))
+ rds_send_xmit(ic->conn);
+}
+
+static void rds_ib_tasklet_fn_recv(unsigned long data)
+{
+ struct rds_ib_connection *ic = (struct rds_ib_connection *)data;
+ struct rds_connection *conn = ic->conn;
+ struct rds_ib_device *rds_ibdev = ic->rds_ibdev;
+ struct rds_ib_ack_state state;
+
+ if (!rds_ibdev)
+ rds_conn_drop(conn);
+
+ rds_ib_stats_inc(s_ib_tasklet_call);
+
+ memset(&state, 0, sizeof(state));
+ poll_cq(ic, ic->i_recv_cq, ic->i_recv_wc, &state);
+ ib_req_notify_cq(ic->i_recv_cq, IB_CQ_SOLICITED);
+ poll_cq(ic, ic->i_recv_cq, ic->i_recv_wc, &state);
+
+ if (state.ack_next_valid)
+ rds_ib_set_ack(ic, state.ack_next, state.ack_required);
+ if (state.ack_recv_valid && state.ack_recv > ic->i_ack_recv) {
+ rds_send_drop_acked(conn, state.ack_recv, NULL);
+ ic->i_ack_recv = state.ack_recv;
+ }
+
+ if (rds_conn_up(conn))
+ rds_ib_attempt_ack(ic);
+}
+
static void rds_ib_qp_event_handler(struct ib_event *event, void *data)
{
struct rds_connection *conn = data;
}
}
+static void rds_ib_cq_comp_handler_send(struct ib_cq *cq, void *context)
+{
+ struct rds_connection *conn = context;
+ struct rds_ib_connection *ic = conn->c_transport_data;
+
+ rdsdebug("conn %p cq %p\n", conn, cq);
+
+ rds_ib_stats_inc(s_ib_evt_handler_call);
+
+ tasklet_schedule(&ic->i_send_tasklet);
+}
+
/*
* This needs to be very careful to not leave IS_ERR pointers around for
* cleanup to trip over.
ic->i_pd = rds_ibdev->pd;
cq_attr.cqe = ic->i_send_ring.w_nr + 1;
- ic->i_send_cq = ib_create_cq(dev, rds_ib_send_cq_comp_handler,
+
+ ic->i_send_cq = ib_create_cq(dev, rds_ib_cq_comp_handler_send,
rds_ib_cq_event_handler, conn,
&cq_attr);
if (IS_ERR(ic->i_send_cq)) {
}
cq_attr.cqe = ic->i_recv_ring.w_nr;
- ic->i_recv_cq = ib_create_cq(dev, rds_ib_recv_cq_comp_handler,
+ ic->i_recv_cq = ib_create_cq(dev, rds_ib_cq_comp_handler_recv,
rds_ib_cq_event_handler, conn,
&cq_attr);
if (IS_ERR(ic->i_recv_cq)) {
/* XXX I wonder what affect the port space has */
/* delegate cm event handler to rdma_transport */
- ic->i_cm_id = rdma_create_id(rds_rdma_cm_event_handler, conn,
+ ic->i_cm_id = rdma_create_id(&init_net, rds_rdma_cm_event_handler, conn,
RDMA_PS_TCP, IB_QPT_RC);
if (IS_ERR(ic->i_cm_id)) {
ret = PTR_ERR(ic->i_cm_id);
wait_event(rds_ib_ring_empty_wait,
rds_ib_ring_empty(&ic->i_recv_ring) &&
(atomic_read(&ic->i_signaled_sends) == 0));
+ tasklet_kill(&ic->i_send_tasklet);
tasklet_kill(&ic->i_recv_tasklet);
/* first destroy the ib state that generates callbacks */
}
INIT_LIST_HEAD(&ic->ib_node);
- tasklet_init(&ic->i_recv_tasklet, rds_ib_recv_tasklet_fn,
- (unsigned long) ic);
+ tasklet_init(&ic->i_send_tasklet, rds_ib_tasklet_fn_send,
+ (unsigned long)ic);
+ tasklet_init(&ic->i_recv_tasklet, rds_ib_tasklet_fn_recv,
+ (unsigned long)ic);
mutex_init(&ic->i_recv_mutex);
#ifndef KERNEL_HAS_ATOMIC64
spin_lock_init(&ic->i_ack_lock);
send->s_op = NULL;
- send->s_wr.wr_id = i;
+ send->s_wr.wr_id = i | RDS_IB_SEND_OP;
send->s_wr.sg_list = send->s_sge;
send->s_wr.ex.imm_data = 0;
* unallocs the next free entry in the ring it doesn't alter which is
* the next to be freed, which is what this is concerned with.
*/
-void rds_ib_send_cq_comp_handler(struct ib_cq *cq, void *context)
+void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc)
{
- struct rds_connection *conn = context;
- struct rds_ib_connection *ic = conn->c_transport_data;
struct rds_message *rm = NULL;
- struct ib_wc wc;
+ struct rds_connection *conn = ic->conn;
struct rds_ib_send_work *send;
u32 completed;
u32 oldest;
u32 i = 0;
- int ret;
int nr_sig = 0;
- rdsdebug("cq %p conn %p\n", cq, conn);
- rds_ib_stats_inc(s_ib_tx_cq_call);
- ret = ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
- if (ret)
- rdsdebug("ib_req_notify_cq send failed: %d\n", ret);
-
- while (ib_poll_cq(cq, 1, &wc) > 0) {
- rdsdebug("wc wr_id 0x%llx status %u (%s) byte_len %u imm_data %u\n",
- (unsigned long long)wc.wr_id, wc.status,
- ib_wc_status_msg(wc.status), wc.byte_len,
- be32_to_cpu(wc.ex.imm_data));
- rds_ib_stats_inc(s_ib_tx_cq_event);
-
- if (wc.wr_id == RDS_IB_ACK_WR_ID) {
- if (time_after(jiffies, ic->i_ack_queued + HZ/2))
- rds_ib_stats_inc(s_ib_tx_stalled);
- rds_ib_ack_send_complete(ic);
- continue;
- }
- oldest = rds_ib_ring_oldest(&ic->i_send_ring);
+ rdsdebug("wc wr_id 0x%llx status %u (%s) byte_len %u imm_data %u\n",
+ (unsigned long long)wc->wr_id, wc->status,
+ ib_wc_status_msg(wc->status), wc->byte_len,
+ be32_to_cpu(wc->ex.imm_data));
+ rds_ib_stats_inc(s_ib_tx_cq_event);
- completed = rds_ib_ring_completed(&ic->i_send_ring, wc.wr_id, oldest);
+ if (wc->wr_id == RDS_IB_ACK_WR_ID) {
+ if (time_after(jiffies, ic->i_ack_queued + HZ / 2))
+ rds_ib_stats_inc(s_ib_tx_stalled);
+ rds_ib_ack_send_complete(ic);
+ return;
+ }
- for (i = 0; i < completed; i++) {
- send = &ic->i_sends[oldest];
- if (send->s_wr.send_flags & IB_SEND_SIGNALED)
- nr_sig++;
+ oldest = rds_ib_ring_oldest(&ic->i_send_ring);
- rm = rds_ib_send_unmap_op(ic, send, wc.status);
+ completed = rds_ib_ring_completed(&ic->i_send_ring,
+ (wc->wr_id & ~RDS_IB_SEND_OP),
+ oldest);
- if (time_after(jiffies, send->s_queued + HZ/2))
- rds_ib_stats_inc(s_ib_tx_stalled);
+ for (i = 0; i < completed; i++) {
+ send = &ic->i_sends[oldest];
+ if (send->s_wr.send_flags & IB_SEND_SIGNALED)
+ nr_sig++;
- if (send->s_op) {
- if (send->s_op == rm->m_final_op) {
- /* If anyone waited for this message to get flushed out, wake
- * them up now */
- rds_message_unmapped(rm);
- }
- rds_message_put(rm);
- send->s_op = NULL;
- }
+ rm = rds_ib_send_unmap_op(ic, send, wc->status);
+
+ if (time_after(jiffies, send->s_queued + HZ / 2))
+ rds_ib_stats_inc(s_ib_tx_stalled);
- oldest = (oldest + 1) % ic->i_send_ring.w_nr;
+ if (send->s_op) {
+ if (send->s_op == rm->m_final_op) {
+ /* If anyone waited for this message to get
+ * flushed out, wake them up now
+ */
+ rds_message_unmapped(rm);
+ }
+ rds_message_put(rm);
+ send->s_op = NULL;
}
- rds_ib_ring_free(&ic->i_send_ring, completed);
- rds_ib_sub_signaled(ic, nr_sig);
- nr_sig = 0;
+ oldest = (oldest + 1) % ic->i_send_ring.w_nr;
+ }
- if (test_and_clear_bit(RDS_LL_SEND_FULL, &conn->c_flags) ||
- test_bit(0, &conn->c_map_queued))
- queue_delayed_work(rds_wq, &conn->c_send_w, 0);
+ rds_ib_ring_free(&ic->i_send_ring, completed);
+ rds_ib_sub_signaled(ic, nr_sig);
+ nr_sig = 0;
- /* We expect errors as the qp is drained during shutdown */
- if (wc.status != IB_WC_SUCCESS && rds_conn_up(conn)) {
- rds_ib_conn_error(conn, "send completion on %pI4 had status "
- "%u (%s), disconnecting and reconnecting\n",
- &conn->c_faddr, wc.status,
- ib_wc_status_msg(wc.status));
- }
+ if (test_and_clear_bit(RDS_LL_SEND_FULL, &conn->c_flags) ||
+ test_bit(0, &conn->c_map_queued))
+ queue_delayed_work(rds_wq, &conn->c_send_w, 0);
+
+ /* We expect errors as the qp is drained during shutdown */
+ if (wc->status != IB_WC_SUCCESS && rds_conn_up(conn)) {
+ rds_ib_conn_error(conn, "send completion on %pI4 had status %u (%s), disconnecting and reconnecting\n",
+ &conn->c_faddr, wc->status,
+ ib_wc_status_msg(wc->status));
}
}
send->s_queued = jiffies;
if (op->op_type == RDS_ATOMIC_TYPE_CSWP) {
- send->s_wr.opcode = IB_WR_MASKED_ATOMIC_CMP_AND_SWP;
- send->s_wr.wr.atomic.compare_add = op->op_m_cswp.compare;
- send->s_wr.wr.atomic.swap = op->op_m_cswp.swap;
- send->s_wr.wr.atomic.compare_add_mask = op->op_m_cswp.compare_mask;
- send->s_wr.wr.atomic.swap_mask = op->op_m_cswp.swap_mask;
+ send->s_atomic_wr.wr.opcode = IB_WR_MASKED_ATOMIC_CMP_AND_SWP;
+ send->s_atomic_wr.compare_add = op->op_m_cswp.compare;
+ send->s_atomic_wr.swap = op->op_m_cswp.swap;
+ send->s_atomic_wr.compare_add_mask = op->op_m_cswp.compare_mask;
+ send->s_atomic_wr.swap_mask = op->op_m_cswp.swap_mask;
} else { /* FADD */
- send->s_wr.opcode = IB_WR_MASKED_ATOMIC_FETCH_AND_ADD;
- send->s_wr.wr.atomic.compare_add = op->op_m_fadd.add;
- send->s_wr.wr.atomic.swap = 0;
- send->s_wr.wr.atomic.compare_add_mask = op->op_m_fadd.nocarry_mask;
- send->s_wr.wr.atomic.swap_mask = 0;
+ send->s_atomic_wr.wr.opcode = IB_WR_MASKED_ATOMIC_FETCH_AND_ADD;
+ send->s_atomic_wr.compare_add = op->op_m_fadd.add;
+ send->s_atomic_wr.swap = 0;
+ send->s_atomic_wr.compare_add_mask = op->op_m_fadd.nocarry_mask;
+ send->s_atomic_wr.swap_mask = 0;
}
nr_sig = rds_ib_set_wr_signal_state(ic, send, op->op_notify);
- send->s_wr.num_sge = 1;
- send->s_wr.next = NULL;
- send->s_wr.wr.atomic.remote_addr = op->op_remote_addr;
- send->s_wr.wr.atomic.rkey = op->op_rkey;
+ send->s_atomic_wr.wr.num_sge = 1;
+ send->s_atomic_wr.wr.next = NULL;
+ send->s_atomic_wr.remote_addr = op->op_remote_addr;
+ send->s_atomic_wr.rkey = op->op_rkey;
send->s_op = op;
rds_message_addref(container_of(send->s_op, struct rds_message, atomic));
if (nr_sig)
atomic_add(nr_sig, &ic->i_signaled_sends);
- failed_wr = &send->s_wr;
- ret = ib_post_send(ic->i_cm_id->qp, &send->s_wr, &failed_wr);
+ failed_wr = &send->s_atomic_wr.wr;
+ ret = ib_post_send(ic->i_cm_id->qp, &send->s_atomic_wr.wr, &failed_wr);
rdsdebug("ic %p send %p (wr %p) ret %d wr %p\n", ic,
- send, &send->s_wr, ret, failed_wr);
- BUG_ON(failed_wr != &send->s_wr);
+ send, &send->s_atomic_wr, ret, failed_wr);
+ BUG_ON(failed_wr != &send->s_atomic_wr.wr);
if (ret) {
printk(KERN_WARNING "RDS/IB: atomic ib_post_send to %pI4 "
"returned %d\n", &conn->c_faddr, ret);
goto out;
}
- if (unlikely(failed_wr != &send->s_wr)) {
+ if (unlikely(failed_wr != &send->s_atomic_wr.wr)) {
printk(KERN_WARNING "RDS/IB: atomic ib_post_send() rc=%d, but failed_wqe updated!\n", ret);
- BUG_ON(failed_wr != &send->s_wr);
+ BUG_ON(failed_wr != &send->s_atomic_wr.wr);
}
out:
nr_sig += rds_ib_set_wr_signal_state(ic, send, op->op_notify);
send->s_wr.opcode = op->op_write ? IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
- send->s_wr.wr.rdma.remote_addr = remote_addr;
- send->s_wr.wr.rdma.rkey = op->op_rkey;
+ send->s_rdma_wr.remote_addr = remote_addr;
+ send->s_rdma_wr.rkey = op->op_rkey;
if (num_sge > max_sge) {
- send->s_wr.num_sge = max_sge;
+ send->s_rdma_wr.wr.num_sge = max_sge;
num_sge -= max_sge;
} else {
- send->s_wr.num_sge = num_sge;
+ send->s_rdma_wr.wr.num_sge = num_sge;
}
- send->s_wr.next = NULL;
+ send->s_rdma_wr.wr.next = NULL;
if (prev)
- prev->s_wr.next = &send->s_wr;
+ prev->s_rdma_wr.wr.next = &send->s_rdma_wr.wr;
- for (j = 0; j < send->s_wr.num_sge && scat != &op->op_sg[op->op_count]; j++) {
+ for (j = 0; j < send->s_rdma_wr.wr.num_sge &&
+ scat != &op->op_sg[op->op_count]; j++) {
len = ib_sg_dma_len(ic->i_cm_id->device, scat);
send->s_sge[j].addr =
ib_sg_dma_address(ic->i_cm_id->device, scat);
}
rdsdebug("send %p wr %p num_sge %u next %p\n", send,
- &send->s_wr, send->s_wr.num_sge, send->s_wr.next);
+ &send->s_rdma_wr.wr,
+ send->s_rdma_wr.wr.num_sge,
+ send->s_rdma_wr.wr.next);
prev = send;
if (++send == &ic->i_sends[ic->i_send_ring.w_nr])
if (nr_sig)
atomic_add(nr_sig, &ic->i_signaled_sends);
- failed_wr = &first->s_wr;
- ret = ib_post_send(ic->i_cm_id->qp, &first->s_wr, &failed_wr);
+ failed_wr = &first->s_rdma_wr.wr;
+ ret = ib_post_send(ic->i_cm_id->qp, &first->s_rdma_wr.wr, &failed_wr);
rdsdebug("ic %p first %p (wr %p) ret %d wr %p\n", ic,
- first, &first->s_wr, ret, failed_wr);
- BUG_ON(failed_wr != &first->s_wr);
+ first, &first->s_rdma_wr.wr, ret, failed_wr);
+ BUG_ON(failed_wr != &first->s_rdma_wr.wr);
if (ret) {
printk(KERN_WARNING "RDS/IB: rdma ib_post_send to %pI4 "
"returned %d\n", &conn->c_faddr, ret);
goto out;
}
- if (unlikely(failed_wr != &first->s_wr)) {
+ if (unlikely(failed_wr != &first->s_rdma_wr.wr)) {
printk(KERN_WARNING "RDS/IB: ib_post_send() rc=%d, but failed_wqe updated!\n", ret);
- BUG_ON(failed_wr != &first->s_wr);
+ BUG_ON(failed_wr != &first->s_rdma_wr.wr);
}
struct rdma_cm_id *cm_id;
struct ib_mr *mr;
- struct ib_fast_reg_page_list *page_list;
struct rds_iw_mapping mapping;
unsigned char remap_count;
int max_pages;
};
-static int rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all);
+static void rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all);
static void rds_iw_mr_pool_flush_worker(struct work_struct *work);
- static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
- static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
+ static int rds_iw_init_reg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
+ static int rds_iw_map_reg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr,
struct scatterlist *sg, unsigned int nents);
static void rds_iw_free_fastreg(struct rds_iw_mr_pool *pool, struct rds_iw_mr *ibmr);
sg->bytes = 0;
}
- static u64 *rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
- struct rds_iw_scatterlist *sg)
+ static int rds_iw_map_scatterlist(struct rds_iw_device *rds_iwdev,
+ struct rds_iw_scatterlist *sg)
{
struct ib_device *dev = rds_iwdev->dev;
- u64 *dma_pages = NULL;
- int i, j, ret;
+ int i, ret;
WARN_ON(sg->dma_len);
sg->dma_len = ib_dma_map_sg(dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
if (unlikely(!sg->dma_len)) {
printk(KERN_WARNING "RDS/IW: dma_map_sg failed!\n");
- return ERR_PTR(-EBUSY);
+ return -EBUSY;
}
sg->bytes = 0;
if (sg->dma_npages > fastreg_message_size)
goto out_unmap;
- dma_pages = kmalloc(sizeof(u64) * sg->dma_npages, GFP_ATOMIC);
- if (!dma_pages) {
- ret = -ENOMEM;
- goto out_unmap;
- }
- for (i = j = 0; i < sg->dma_len; ++i) {
- unsigned int dma_len = ib_sg_dma_len(dev, &sg->list[i]);
- u64 dma_addr = ib_sg_dma_address(dev, &sg->list[i]);
- u64 end_addr;
- end_addr = dma_addr + dma_len;
- dma_addr &= ~PAGE_MASK;
- for (; dma_addr < end_addr; dma_addr += PAGE_SIZE)
- dma_pages[j++] = dma_addr;
- BUG_ON(j > sg->dma_npages);
- }
-
- return dma_pages;
+ return 0;
out_unmap:
ib_dma_unmap_sg(rds_iwdev->dev, sg->list, sg->len, DMA_BIDIRECTIONAL);
sg->dma_len = 0;
- kfree(dma_pages);
- return ERR_PTR(ret);
+ return ret;
}
INIT_LIST_HEAD(&ibmr->mapping.m_list);
ibmr->mapping.m_mr = ibmr;
- err = rds_iw_init_fastreg(pool, ibmr);
+ err = rds_iw_init_reg(pool, ibmr);
if (err)
goto out_no_cigar;
* If the number of MRs allocated exceeds the limit, we also try
* to free as many MRs as needed to get back to this limit.
*/
-static int rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all)
+static void rds_iw_flush_mr_pool(struct rds_iw_mr_pool *pool, int free_all)
{
struct rds_iw_mr *ibmr, *next;
LIST_HEAD(unmap_list);
LIST_HEAD(kill_list);
unsigned long flags;
unsigned int nfreed = 0, ncleaned = 0, unpinned = 0;
- int ret = 0;
rds_iw_stats_inc(s_iw_rdma_mr_pool_flush);
atomic_sub(nfreed, &pool->item_count);
mutex_unlock(&pool->flush_lock);
- return ret;
}
static void rds_iw_mr_pool_flush_worker(struct work_struct *work)
ibmr->cm_id = cm_id;
ibmr->device = rds_iwdev;
- ret = rds_iw_map_fastreg(rds_iwdev->mr_pool, ibmr, sg, nents);
+ ret = rds_iw_map_reg(rds_iwdev->mr_pool, ibmr, sg, nents);
if (ret == 0)
*key_ret = ibmr->mr->rkey;
else
}
/*
- * iWARP fastreg handling
+ * iWARP reg handling
*
* The life cycle of a fastreg registration is a bit different from
* FMRs.
* This creates a bit of a problem for us, as we do not have the destination
* IP in GET_MR, so the connection must be setup prior to the GET_MR call for
* RDMA to be correctly setup. If a fastreg request is present, rds_iw_xmit
- * will try to queue a LOCAL_INV (if needed) and a FAST_REG_MR work request
+ * will try to queue a LOCAL_INV (if needed) and a REG_MR work request
* before queuing the SEND. When completions for these arrive, they are
* dispatched to the MR has a bit set showing that RDMa can be performed.
*
* The expectation there is that this invalidation step includes ALL
* PREVIOUSLY FREED MRs.
*/
- static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool,
- struct rds_iw_mr *ibmr)
+ static int rds_iw_init_reg(struct rds_iw_mr_pool *pool,
+ struct rds_iw_mr *ibmr)
{
struct rds_iw_device *rds_iwdev = pool->device;
- struct ib_fast_reg_page_list *page_list = NULL;
struct ib_mr *mr;
int err;
return err;
}
- /* FIXME - this is overkill, but mapping->m_sg.dma_len/mapping->m_sg.dma_npages
- * is not filled in.
- */
- page_list = ib_alloc_fast_reg_page_list(rds_iwdev->dev, pool->max_message_size);
- if (IS_ERR(page_list)) {
- err = PTR_ERR(page_list);
-
- printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_page_list failed (err=%d)\n", err);
- ib_dereg_mr(mr);
- return err;
- }
-
- ibmr->page_list = page_list;
ibmr->mr = mr;
return 0;
}
- static int rds_iw_rdma_build_fastreg(struct rds_iw_mapping *mapping)
+ static int rds_iw_rdma_reg_mr(struct rds_iw_mapping *mapping)
{
struct rds_iw_mr *ibmr = mapping->m_mr;
- struct ib_send_wr f_wr, *failed_wr;
- int ret;
+ struct rds_iw_scatterlist *m_sg = &mapping->m_sg;
+ struct ib_reg_wr reg_wr;
+ struct ib_send_wr *failed_wr;
+ int ret, n;
+
+ n = ib_map_mr_sg_zbva(ibmr->mr, m_sg->list, m_sg->len, PAGE_SIZE);
+ if (unlikely(n != m_sg->len))
+ return n < 0 ? n : -EINVAL;
+
+ reg_wr.wr.next = NULL;
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = RDS_IW_REG_WR_ID;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = ibmr->mr;
+ reg_wr.key = mapping->m_rkey;
+ reg_wr.access = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_READ |
+ IB_ACCESS_REMOTE_WRITE;
/*
- * Perform a WR for the fast_reg_mr. Each individual page
+ * Perform a WR for the reg_mr. Each individual page
* in the sg list is added to the fast reg page list and placed
- * inside the fast_reg_mr WR. The key used is a rolling 8bit
+ * inside the reg_mr WR. The key used is a rolling 8bit
* counter, which should guarantee uniqueness.
*/
ib_update_fast_reg_key(ibmr->mr, ibmr->remap_count++);
mapping->m_rkey = ibmr->mr->rkey;
- memset(&f_wr, 0, sizeof(f_wr));
- f_wr.wr_id = RDS_IW_FAST_REG_WR_ID;
- f_wr.opcode = IB_WR_FAST_REG_MR;
- f_wr.wr.fast_reg.length = mapping->m_sg.bytes;
- f_wr.wr.fast_reg.rkey = mapping->m_rkey;
- f_wr.wr.fast_reg.page_list = ibmr->page_list;
- f_wr.wr.fast_reg.page_list_len = mapping->m_sg.dma_len;
- f_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- f_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_READ |
- IB_ACCESS_REMOTE_WRITE;
- f_wr.wr.fast_reg.iova_start = 0;
- f_wr.send_flags = IB_SEND_SIGNALED;
-
- failed_wr = &f_wr;
- ret = ib_post_send(ibmr->cm_id->qp, &f_wr, &failed_wr);
- BUG_ON(failed_wr != &f_wr);
+ failed_wr = ®_wr.wr;
+ ret = ib_post_send(ibmr->cm_id->qp, ®_wr.wr, &failed_wr);
+ BUG_ON(failed_wr != ®_wr.wr);
if (ret)
printk_ratelimited(KERN_WARNING "RDS/IW: %s:%d ib_post_send returned %d\n",
__func__, __LINE__, ret);
return ret;
}
- static int rds_iw_map_fastreg(struct rds_iw_mr_pool *pool,
- struct rds_iw_mr *ibmr,
- struct scatterlist *sg,
- unsigned int sg_len)
+ static int rds_iw_map_reg(struct rds_iw_mr_pool *pool,
+ struct rds_iw_mr *ibmr,
+ struct scatterlist *sg,
+ unsigned int sg_len)
{
struct rds_iw_device *rds_iwdev = pool->device;
struct rds_iw_mapping *mapping = &ibmr->mapping;
u64 *dma_pages;
- int i, ret = 0;
+ int ret = 0;
rds_iw_set_scatterlist(&mapping->m_sg, sg, sg_len);
- dma_pages = rds_iw_map_scatterlist(rds_iwdev, &mapping->m_sg);
- if (IS_ERR(dma_pages)) {
- ret = PTR_ERR(dma_pages);
+ ret = rds_iw_map_scatterlist(rds_iwdev, &mapping->m_sg);
+ if (ret) {
dma_pages = NULL;
goto out;
}
goto out;
}
- for (i = 0; i < mapping->m_sg.dma_npages; ++i)
- ibmr->page_list->page_list[i] = dma_pages[i];
-
- ret = rds_iw_rdma_build_fastreg(mapping);
+ ret = rds_iw_rdma_reg_mr(mapping);
if (ret)
goto out;
static void rds_iw_destroy_fastreg(struct rds_iw_mr_pool *pool,
struct rds_iw_mr *ibmr)
{
- if (ibmr->page_list)
- ib_free_fast_reg_page_list(ibmr->page_list);
if (ibmr->mr)
ib_dereg_mr(ibmr->mr);
}
u64 rs_offset,
bool last)
{
- struct ib_send_wr read_wr;
+ struct ib_rdma_wr read_wr;
int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
struct svc_rdma_op_ctxt *ctxt = svc_rdma_get_context(xprt);
int ret, read, pno;
ctxt->direction = DMA_FROM_DEVICE;
ctxt->read_hdr = head;
pages_needed = min_t(int, pages_needed, xprt->sc_max_sge_rd);
- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
+ read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
+ rs_length);
for (pno = 0; pno < pages_needed; pno++) {
int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
clear_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags);
memset(&read_wr, 0, sizeof(read_wr));
- read_wr.wr_id = (unsigned long)ctxt;
- read_wr.opcode = IB_WR_RDMA_READ;
- ctxt->wr_op = read_wr.opcode;
- read_wr.send_flags = IB_SEND_SIGNALED;
- read_wr.wr.rdma.rkey = rs_handle;
- read_wr.wr.rdma.remote_addr = rs_offset;
- read_wr.sg_list = ctxt->sge;
- read_wr.num_sge = pages_needed;
-
- ret = svc_rdma_send(xprt, &read_wr);
+ read_wr.wr.wr_id = (unsigned long)ctxt;
+ read_wr.wr.opcode = IB_WR_RDMA_READ;
+ ctxt->wr_op = read_wr.wr.opcode;
+ read_wr.wr.send_flags = IB_SEND_SIGNALED;
+ read_wr.rkey = rs_handle;
+ read_wr.remote_addr = rs_offset;
+ read_wr.wr.sg_list = ctxt->sge;
+ read_wr.wr.num_sge = pages_needed;
+
+ ret = svc_rdma_send(xprt, &read_wr.wr);
if (ret) {
pr_err("svcrdma: Error %d posting RDMA_READ\n", ret);
set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
u64 rs_offset,
bool last)
{
- struct ib_send_wr read_wr;
+ struct ib_rdma_wr read_wr;
struct ib_send_wr inv_wr;
- struct ib_send_wr fastreg_wr;
+ struct ib_reg_wr reg_wr;
u8 key;
- int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
+ int nents = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
struct svc_rdma_op_ctxt *ctxt = svc_rdma_get_context(xprt);
struct svc_rdma_fastreg_mr *frmr = svc_rdma_get_frmr(xprt);
- int ret, read, pno;
+ int ret, read, pno, dma_nents, n;
u32 pg_off = *page_offset;
u32 pg_no = *page_no;
ctxt->direction = DMA_FROM_DEVICE;
ctxt->frmr = frmr;
- pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
- read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
- rs_length);
+ nents = min_t(unsigned int, nents, xprt->sc_frmr_pg_list_len);
- read = min_t(int, nents << PAGE_SHIFT, rs_length);
++ read = min_t(int, (nents << PAGE_SHIFT) - *page_offset, rs_length);
- frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
frmr->direction = DMA_FROM_DEVICE;
frmr->access_flags = (IB_ACCESS_LOCAL_WRITE|IB_ACCESS_REMOTE_WRITE);
- frmr->map_len = pages_needed << PAGE_SHIFT;
- frmr->page_list_len = pages_needed;
+ frmr->sg_nents = nents;
- for (pno = 0; pno < pages_needed; pno++) {
+ for (pno = 0; pno < nents; pno++) {
int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
head->arg.pages[pg_no] = rqstp->rq_arg.pages[pg_no];
head->arg.len += len;
if (!pg_off)
head->count++;
+
+ sg_set_page(&frmr->sg[pno], rqstp->rq_arg.pages[pg_no],
+ len, pg_off);
+
rqstp->rq_respages = &rqstp->rq_arg.pages[pg_no+1];
rqstp->rq_next_page = rqstp->rq_respages + 1;
- frmr->page_list->page_list[pno] =
- ib_dma_map_page(xprt->sc_cm_id->device,
- head->arg.pages[pg_no], 0,
- PAGE_SIZE, DMA_FROM_DEVICE);
- ret = ib_dma_mapping_error(xprt->sc_cm_id->device,
- frmr->page_list->page_list[pno]);
- if (ret)
- goto err;
- atomic_inc(&xprt->sc_dma_used);
/* adjust offset and wrap to next page if needed */
pg_off += len;
else
clear_bit(RDMACTXT_F_LAST_CTXT, &ctxt->flags);
+ dma_nents = ib_dma_map_sg(xprt->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents,
+ frmr->direction);
+ if (!dma_nents) {
+ pr_err("svcrdma: failed to dma map sg %p\n",
+ frmr->sg);
+ return -ENOMEM;
+ }
+ atomic_inc(&xprt->sc_dma_used);
+
+ n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+ if (unlikely(n != frmr->sg_nents)) {
+ pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
+ frmr->mr, n, frmr->sg_nents);
+ return n < 0 ? n : -EINVAL;
+ }
+
/* Bump the key */
key = (u8)(frmr->mr->lkey & 0x000000FF);
ib_update_fast_reg_key(frmr->mr, ++key);
- ctxt->sge[0].addr = (unsigned long)frmr->kva + *page_offset;
+ ctxt->sge[0].addr = frmr->mr->iova;
ctxt->sge[0].lkey = frmr->mr->lkey;
- ctxt->sge[0].length = read;
+ ctxt->sge[0].length = frmr->mr->length;
ctxt->count = 1;
ctxt->read_hdr = head;
- /* Prepare FASTREG WR */
- memset(&fastreg_wr, 0, sizeof(fastreg_wr));
- fastreg_wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.send_flags = IB_SEND_SIGNALED;
- fastreg_wr.wr.fast_reg.iova_start = (unsigned long)frmr->kva;
- fastreg_wr.wr.fast_reg.page_list = frmr->page_list;
- fastreg_wr.wr.fast_reg.page_list_len = frmr->page_list_len;
- fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- fastreg_wr.wr.fast_reg.length = frmr->map_len;
- fastreg_wr.wr.fast_reg.access_flags = frmr->access_flags;
- fastreg_wr.wr.fast_reg.rkey = frmr->mr->lkey;
- fastreg_wr.next = &read_wr;
+ /* Prepare REG WR */
+ reg_wr.wr.opcode = IB_WR_REG_MR;
+ reg_wr.wr.wr_id = 0;
+ reg_wr.wr.send_flags = IB_SEND_SIGNALED;
+ reg_wr.wr.num_sge = 0;
+ reg_wr.mr = frmr->mr;
+ reg_wr.key = frmr->mr->lkey;
+ reg_wr.access = frmr->access_flags;
+ reg_wr.wr.next = &read_wr.wr;
/* Prepare RDMA_READ */
memset(&read_wr, 0, sizeof(read_wr));
- read_wr.send_flags = IB_SEND_SIGNALED;
- read_wr.wr.rdma.rkey = rs_handle;
- read_wr.wr.rdma.remote_addr = rs_offset;
- read_wr.sg_list = ctxt->sge;
- read_wr.num_sge = 1;
+ read_wr.wr.send_flags = IB_SEND_SIGNALED;
+ read_wr.rkey = rs_handle;
+ read_wr.remote_addr = rs_offset;
+ read_wr.wr.sg_list = ctxt->sge;
+ read_wr.wr.num_sge = 1;
if (xprt->sc_dev_caps & SVCRDMA_DEVCAP_READ_W_INV) {
- read_wr.opcode = IB_WR_RDMA_READ_WITH_INV;
- read_wr.wr_id = (unsigned long)ctxt;
- read_wr.ex.invalidate_rkey = ctxt->frmr->mr->lkey;
+ read_wr.wr.opcode = IB_WR_RDMA_READ_WITH_INV;
+ read_wr.wr.wr_id = (unsigned long)ctxt;
+ read_wr.wr.ex.invalidate_rkey = ctxt->frmr->mr->lkey;
} else {
- read_wr.opcode = IB_WR_RDMA_READ;
- read_wr.next = &inv_wr;
+ read_wr.wr.opcode = IB_WR_RDMA_READ;
+ read_wr.wr.next = &inv_wr;
/* Prepare invalidate */
memset(&inv_wr, 0, sizeof(inv_wr));
inv_wr.wr_id = (unsigned long)ctxt;
inv_wr.send_flags = IB_SEND_SIGNALED | IB_SEND_FENCE;
inv_wr.ex.invalidate_rkey = frmr->mr->lkey;
}
- ctxt->wr_op = read_wr.opcode;
+ ctxt->wr_op = read_wr.wr.opcode;
/* Post the chain */
- ret = svc_rdma_send(xprt, &fastreg_wr);
+ ret = svc_rdma_send(xprt, ®_wr.wr);
if (ret) {
pr_err("svcrdma: Error %d posting RDMA_READ\n", ret);
set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
atomic_inc(&rdma_stat_read);
return ret;
err:
- svc_rdma_unmap_dma(ctxt);
+ ib_dma_unmap_sg(xprt->sc_cm_id->device,
+ frmr->sg, frmr->sg_nents, frmr->direction);
svc_rdma_put_context(ctxt, 0);
svc_rdma_put_frmr(xprt, frmr);
return ret;
rqstp->rq_arg.page_base = head->arg.page_base;
/* rq_respages starts after the last arg page */
- rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
+ rqstp->rq_respages = &rqstp->rq_pages[page_no];
rqstp->rq_next_page = rqstp->rq_respages + 1;
/* Rebuild rq_arg head and tail. */
init_completion(&ia->ri_done);
- id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP, IB_QPT_RC);
+ id = rdma_create_id(&init_net, rpcrdma_conn_upcall, xprt, RDMA_PS_TCP,
+ IB_QPT_RC);
if (IS_ERR(id)) {
rc = PTR_ERR(id);
dprintk("RPC: %s: rdma_create_id() failed %i\n",
cancel_delayed_work_sync(&ep->rep_connect_worker);
- if (ia->ri_id->qp) {
+ if (ia->ri_id->qp)
rpcrdma_ep_disconnect(ep, ia);
+
+ rpcrdma_clean_cq(ep->rep_attr.recv_cq);
+ rpcrdma_clean_cq(ep->rep_attr.send_cq);
+
+ if (ia->ri_id->qp) {
rdma_destroy_qp(ia->ri_id);
ia->ri_id->qp = NULL;
}
- rpcrdma_clean_cq(ep->rep_attr.recv_cq);
rc = ib_destroy_cq(ep->rep_attr.recv_cq);
if (rc)
dprintk("RPC: %s: ib_destroy_cq returned %i\n",
__func__, rc);
- rpcrdma_clean_cq(ep->rep_attr.send_cq);
rc = ib_destroy_cq(ep->rep_attr.send_cq);
if (rc)
dprintk("RPC: %s: ib_destroy_cq returned %i\n",