Bjorn Helgaas [Thu, 27 Mar 2025 18:14:46 +0000 (13:14 -0500)]
Merge branch 'pci/dt-bindings'
- Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)
- Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)
- Add fsl i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
Stein)
- Drop deprecated layerscape 'num-ib-windows' and 'num-ob-windows' from
example (Krzysztof Kozlowski)
- Drop unnecessary layerscape 'status' from example (Krzysztof Kozlowski)
- Add common pci-ep-bus.yaml schema for exporting several peripherals of a
single PCI function via devicetree (Andrea della Porta)
* pci/dt-bindings:
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop unnecessary status from example
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop deprecated windows
dt-bindings: PCI: fsl,imx6q-pcie: Add optional DMA interrupt
dt-bindings: PCI: Convert fsl,mpc83xx-pcie to YAML
dt-bindings: PCI: qcom: Document the IPQ5332 PCIe controller
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:45 +0000 (13:14 -0500)]
Merge branch 'pci/devtree-create'
- Add device_add_of_node() to set dev->of_node and dev->fwnode only if they
haven't been set already (Herve Codina)
- Allow of_pci_set_address() to set the DT address property for root bus
nodes, where there is no PCI bridge to supply the PCI bus/device/function
part of the property (Herve Codina)
- Create DT nodes for PCI host bridges to enable loading device tree
overlays to create platform devices for PCI devices that have several
features that require multiple drivers (Herve Codina)
* pci/devtree-create:
PCI: of: Create device tree PCI host bridge node
PCI: of_property: Constify parameter in of_pci_get_addr_flags()
PCI: of_property: Add support for NULL pdev in of_pci_set_address()
PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
driver core: Introduce device_{add,remove}_of_node()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:45 +0000 (13:14 -0500)]
Merge branch 'pci/resource'
- Use pci_resource_n() to simplify BAR/window resource lookup (Ilpo
Järvinen)
- Fix typo that repeatedly distributed resources to a bridge instead of
iterating over subordinate bridges, which resulted in too little space to
assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV BARs,
to avoid failures when removing and re-adding devices (Ilpo Järvinen)
- Fix a double counting error for I/O resources, as we previously did for
memory resources (Ilpo Järvinen)
- Use resource_set_{range,size}() helpers in more places (Ilpo Järvinen)
- Add pci_resource_is_iov() to identify IOV resources (Ilpo Järvinen)
- Add pci_resource_num() to look up the BAR number from the resource
pointer (Ilpo Järvinen)
- Add restore_dev_resource() to simplify code that restores saved device
resources (Ilpo Järvinen)
- Allow drivers to enable devices even if we haven't assigned optional IOV
resources to them (Ilpo Järvinen)
- Improve debug output during resource reallocation (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce failures
if we can't allocate them (Ilpo Järvinen)
- Move declarations of pci_rescan_bus_bridge_resize(),
pci_reassign_bridge_resources(), and CardBus-related sizes from
include/linux/pci.h to drivers/pci/pci.h since they're not used outside
the PCI core (Ilpo Järvinen)
- Make pci_setup_bridge() static (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults in some
cases, which broke vfio-pci lazy mapping on first access (Niklas
Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which was
disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices (Niklas
Schnelle)
* pci/resource:
s390/pci: Support mmap() of PCI resources except for ISM devices
s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
s390/pci: Fix s390_mmio_read/write syscall page fault handling
PCI: Fix NULL dereference in SR-IOV VF creation error path
PCI: Move cardbus IO size declarations into pci/pci.h
PCI: Make pci_setup_bridge() static
PCI: Move resource reassignment func declarations into pci/pci.h
PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
PCI: Fix BAR resizing when VF BARs are assigned
PCI: Do not claim to release resource falsely
PCI: Increase Resizable BAR support from 512 GB to 128 TB
PCI: Rework optional resource handling
PCI: Perform reset_resource() and build fail list in sync
PCI: Use res->parent to check if resource is assigned
PCI: Add debug print when releasing resources before retry
PCI: Indicate optional resource assignment failures
PCI: Always have realloc_head in __assign_resources_sorted()
PCI: Extend enable to check for any optional resource
PCI: Add restore_dev_resource()
PCI: Remove incorrect comment from pci_reassign_resource()
PCI: Consolidate assignment loop next round preparation
PCI: Rename retval to ret
PCI: Use while loop and break instead of gotos
PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
PCI: Converge return paths in __assign_resources_sorted()
PCI: Add dev & res local variables to resource assignment funcs
PCI: Add pci_resource_num() helper
PCI: Check resource_size() separately
PCI: Add pci_resource_is_iov() to identify IOV resources
PCI: Use resource_set_{range,size}() helpers
PCI: Use SZ_* instead of literals in setup-bus.c
PCI: Fix old_size lower bound in calculate_iosize() too
PCI: Allow relaxed bridge window tail sizing for optional resources
PCI: Simplify size1 assignment logic
PCI: Use min_align, not unrelated add_align, for size0
PCI: Remove add_align overwrite unrelated to size0
PCI: Use downstream bridges for distributing resources
PCI: Cleanup dev->resource + resno to use pci_resource_n()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:45 +0000 (13:14 -0500)]
Merge branch 'pci/reset'
- Log debug messages about reset methods being used (Bjorn Helgaas)
- Avoid reset when it has been disabled via sysfs (Nishanth Aravamudan)
* pci/reset:
PCI: Avoid reset when disabled via sysfs
PCI: Log debug messages about reset method
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:44 +0000 (13:14 -0500)]
Merge branch 'pci/pwrctrl'
- Create pwrctrl devices in pci_scan_device() to make it more symmetric
with pci_pwrctrl_unregister() and make pwrctrl devices for PCI bridges
possible (Manivannan Sadhasivam)
- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc. can
still access devices after pci_stop_dev() (Manivannan Sadhasivam)
- If there's a pwrctrl device for a PCI device, skip scanning it because
the pwrctrl core will rescan the bus after the device is powered on
(Manivannan Sadhasivam)
- Add a pwrctrl driver for PCI slots based on voltage regulators described
via devicetree (Manivannan Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Add pwrctrl driver for PCI slots
dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:44 +0000 (13:14 -0500)]
Merge branch 'pci/pm'
- Allow PCI bridges to go to D3Hot on all non-x86 systems (Manivannan
Sadhasivam)
* pci/pm:
PCI: Allow PCI bridges to go to D3Hot on all non-x86
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:44 +0000 (13:14 -0500)]
Merge branch 'pci/hotplug'
- Drop shpchp module init/exit logging (Ilpo Järvinen)
- Replace shpchp dbg() with ctrl_dbg() and remove unused dbg(), err(),
info(), warn() wrappers (Ilpo Järvinen)
- Drop 'shpchp_debug' module parameter in favor of standard dynamic
debugging (Ilpo Järvinen)
- Drop unused .get_power(), .set_power() function pointers (Guilherme
Giacomo Simoes)
- Drop superfluous pci_hotplug_slot_list (Lukas Wunner)
- Drop superfluous try_module_get() calls (Lukas Wunner)
- Drop superfluous NULL pointer checks (Lukas Wunner)
- Pass struct hotplug_slot pointers directly to avoid backpointer
dereferencing in has_*_file() (Lukas Wunner)
- Inline pci_hp_{create,remove}_module_link() to reduce exported symbols
(Lukas Wunner)
- Disable hotplug interrupts in portdrv only when pciehp is not enabled to
prevent issuing two hotplug commands too close together (Feng Tang)
- Skip pciehp 'device replaced' check if the device has been removed to
address a common deadlock when resuming after a device was removed during
system sleep (Lukas Wunner)
- Don't enable pciehp hotplug interrupt when resuming in poll mode (Ilpo
Järvinen)
* pci/hotplug:
PCI: pciehp: Don't enable HPIE when resuming in poll mode
PCI: pciehp: Avoid unnecessary device replacement check
PCI/portdrv: Only disable pciehp interrupts early when needed
PCI: hotplug: Inline pci_hp_{create,remove}_module_link()
PCI: hotplug: Avoid backpointer dereferencing in has_*_file()
PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file()
PCI: hotplug: Drop superfluous try_module_get() calls
PCI: hotplug: Drop superfluous pci_hotplug_slot_list
PCI: cpcihp: Remove unused .get_power() and .set_power()
PCI: shpchp: Remove 'shpchp_debug' module parameter
PCI: shpchp: Remove unused logging wrappers
PCI: shpchp: Change dbg() -> ctrl_dbg()
PCI: shpchp: Remove logging from module init/exit functions
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:43 +0000 (13:14 -0500)]
Merge branch 'pci/enumeration'
- Enable Configuration RRS SV early instead of during child bus scanning
(Bjorn Helgaas)
- Cache offset of Resizable BAR capability to avoid redundant searches for
it (Bjorn Helgaas)
- Fix reference leaks in pci_register_host_bridge() and
pci_alloc_child_bus() (Ma Ke)
- Drop put_device() in pci_register_host_bridge() left over from converting
device_register() to device_add() (Dan Carpenter)
* pci/enumeration:
PCI: Remove stray put_device() in pci_register_host_bridge()
PCI: Fix reference leak in pci_alloc_child_bus()
PCI: Fix reference leak in pci_register_host_bridge()
PCI: Cache offset of Resizable BAR capability
PCI: Enable Configuration RRS SV early
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:43 +0000 (13:14 -0500)]
Merge branch 'pci/doe'
- Rename DOE 'protocol' to 'feature' to follow spec terminology (Alistair
Francis)
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
Francis)
* pci/doe:
PCI/DOE: Allow enabling DOE without CXL
PCI/DOE: Expose DOE features via sysfs
PCI/DOE: Rename Discovery Response Data Object Contents to type
PCI/DOE: Rename DOE protocol to feature
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:43 +0000 (13:14 -0500)]
Merge branch 'pci/devres'
- Enlarge the devres table[] to accommodate bridge windows, ROM, IOV BARs,
etc (Philipp Stanner)
- Validate BAR index in devres interfaces (Philipp Stanner)
* pci/devres:
PCI: Check BAR index for validity
PCI: Fix wrong length of devres array
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:42 +0000 (13:14 -0500)]
Merge branch 'pci/bwctrl'
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
set_pcie_cooling_state.sh test case (Yi Lai)
- Fix the pcie_bwctrl_select_speed() return value in cases where a
non-compliant device doesn't advertise valid supported speeds (Ilpo
Järvinen)
- Avoid a NULL pointer dereference when we run out of bus numbers to assign
for a bridge secondary bus (Lukas Wunner)
* pci/bwctrl:
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:42 +0000 (13:14 -0500)]
Merge branch 'pci/aspm'
- Delay pcie_link_state deallocation to avoid dangling pointers that cause
invalid references during hot-unplug (Daniel Stodden)
* pci/aspm:
PCI/ASPM: Fix link state exit during switch upstream function removal
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:42 +0000 (13:14 -0500)]
Merge branch 'pci/aer'
- Implement local aer_printk() since AER is the only place that prints a
message with level depending on the error severity (Ilpo Järvinen)
* pci/aer:
PCI/ERR: Handle TLP Log in Flit mode
PCI: Track Flit Mode Status & print it with link status
PCI/AER: Descope pci_printk() to aer_printk()
Bjorn Helgaas [Thu, 27 Mar 2025 18:14:41 +0000 (13:14 -0500)]
Merge branch 'pci/acs'
- Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar Dave)
* pci/acs:
PCI/ACS: Fix 'pci=config_acs=' parameter
Andrea della Porta [Wed, 19 Mar 2025 21:52:24 +0000 (22:52 +0100)]
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
Common YAML schema for devices that export internal peripherals through
PCI BARs. The BARs are exposed as simple-buses through which the
peripherals can be accessed.
This is not intended to be used as a standalone binding, but should be
included by device specific bindings.
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix typo]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/096ab7addb39e498e28ac2526c07157cc9327c42.1742418429.git.andrea.porta@suse.com
Lukas Wunner [Sat, 22 Mar 2025 18:52:08 +0000 (19:52 +0100)]
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
When BIOS neglects to assign bus numbers to PCI bridges, the kernel
attempts to correct that during PCI device enumeration. If it runs out
of bus numbers, no pci_bus is allocated and the "subordinate" pointer in
the bridge's pci_dev remains NULL.
The PCIe bandwidth controller erroneously does not check for a NULL
subordinate pointer and dereferences it on probe.
Bandwidth control of unusable devices below the bridge is of questionable
utility, so simply error out instead. This mirrors what PCIe hotplug does
since commit 62e4492c3063 ("PCI: Prevent NULL dereference during pciehp
probe").
The PCI core emits a message with KERN_INFO severity if it has run out of
bus numbers. PCIe hotplug emits an additional message with KERN_ERR
severity to inform the user that hotplug functionality is disabled at the
bridge. A similar message for bandwidth control does not seem merited,
given that its only purpose so far is to expose an up-to-date link speed
in sysfs and throttle the link speed on certain laptops with limited
Thermal Design Power. So error out silently.
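The guard described above can be sketched in userspace C. This is a simplified model, not the kernel patch: struct bus, struct bridge, and bwctrl_probe() are hypothetical stand-ins for pci_bus, pci_dev, and the bandwidth controller's probe routine, and the -ENODEV return value is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_bus and struct pci_dev. */
struct bus {
	int cur_speed;
};

struct bridge {
	struct bus *subordinate;	/* NULL if no bus number could be assigned */
};

/*
 * Model of a probe routine that needs the secondary bus.  Returns 0 on
 * success or a negative errno.  It fails silently (no message) when the
 * bridge has no subordinate bus.
 */
static int bwctrl_probe(struct bridge *port, int link_speed)
{
	if (!port->subordinate)
		return -ENODEV;		/* assumed error code */

	port->subordinate->cur_speed = link_speed;
	return 0;
}
```

Checking the pointer once at probe time keeps the later link speed update free of NULL checks.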
User-visible messages:
pci 0000:16:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[...]
pci_bus 0000:45: busn_res: [bus 45-74] end is updated to 74
pci 0000:16:02.0: devices behind bridge are unusable because [bus 45-74] cannot be assigned for them
[...]
pcieport 0000:16:02.0: pciehp: Hotplug bridge without secondary bus, ignoring
[...]
BUG: kernel NULL pointer dereference
RIP: pcie_update_link_speed
pcie_bwnotif_enable
pcie_bwnotif_probe
pcie_port_probe_service
really_probe
Fixes: 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller")
Reported-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219906
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Tested-by: Wouter Bijlsma <wouter@wouterbijlsma.nl>
Cc: stable@vger.kernel.org # v6.13+
Link: https://lore.kernel.org/r/3b6c8d973aedc48860640a9d75d20528336f1f3c.1742669372.git.lukas@wunner.de
Alistair Francis [Thu, 6 Mar 2025 07:52:11 +0000 (17:52 +1000)]
PCI/DOE: Allow enabling DOE without CXL
PCIe devices (not CXL) can support DOE as well, so allow DOE to be enabled
even if CXL isn't.
Link: https://lore.kernel.org/r/20250306075211.1855177-4-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Alistair Francis [Thu, 6 Mar 2025 07:52:10 +0000 (17:52 +1000)]
PCI/DOE: Expose DOE features via sysfs
PCIe r6.0 added support for Data Object Exchange (DOE). When DOE is
supported, the DOE Discovery Feature must be implemented per PCIe r6.1, sec
6.30.1.1. DOE allows a requester to obtain information about the other DOE
features supported by the device.
The kernel already queries the DOE features supported and caches the
values. Expose the values in sysfs to allow user space to determine which
DOE features are supported by the PCIe device.
By exposing the information to userspace, tools like lspci can relay the
information to users. By listing all of the supported features we can allow
userspace to parse the list, which might include vendor specific features
as well as yet to be supported features.
As the DOE Discovery feature must always be supported we treat it as a
special named attribute case. This allows the usual PCI attribute_group
handling to correctly create the doe_features directory when registering
pci_doe_sysfs_group (otherwise it doesn't and sysfs_add_file_to_group()
will seg fault).
With this patch applied, you can see something like this when
attaching a DOE device:
$ ls /sys/devices/pci0000:00/0000:00:02.0//doe*
0001:01 0001:02 doe_discovery
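A user-space tool could split such entry names into vendor ID and feature type. The sketch below assumes the names have the form "VVVV:TT" (hex vendor ID, hex type), as the listing above suggests; parse_doe_feature() is a hypothetical helper, not part of any existing tool.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse a doe_features entry name of the assumed form "VVVV:TT"
 * (hex vendor ID, hex feature type).  Returns false for names that do
 * not match, such as the special "doe_discovery" attribute.
 */
static bool parse_doe_feature(const char *name, unsigned int *vendor,
			      unsigned int *type)
{
	char extra;

	/* A third successful conversion would mean trailing characters. */
	return sscanf(name, "%4x:%2x%c", vendor, type, &extra) == 2;
}
```

On a failed parse the output parameters may be partially written, so callers should use them only when the function returns true.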
Link: https://lore.kernel.org/r/20250306075211.1855177-3-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
[bhelgaas: drop pci_doe_sysfs_init() stub return, make
DEVICE_ATTR_RO(doe_discovery) static]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Niklas Schnelle [Wed, 26 Feb 2025 12:07:47 +0000 (13:07 +0100)]
s390/pci: Support mmap() of PCI resources except for ISM devices
So far s390 does not allow mmap() of PCI resources to user-space via the
usual mechanisms, though it does use it for RDMA. For the PCI sysfs
resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor
ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on
disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars
for all devices.
This is partly because access to mapped PCI resources from user-space
requires special PCI load/store memory-I/O (MIO) instructions, or the
special MMIO syscalls when these are not available. Still, such access is
possible and useful not just for RDMA; in fact, not being able to mmap() PCI
resources has previously caused extra work when testing devices.
One thing that doesn't work with PCI resources mapped to user-space though
is the s390 specific virtual ISM device. Not only because the BAR size of
256 TiB prevents mapping the whole BAR but also because access requires use
of the legacy PCI instructions which are not accessible to user-space on
systems with the newer MIO PCI instructions.
Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping
its resources while making this functionality available for all other PCI
devices. To this end introduce a minimal implementation of PCI_QUIRKS and
use that to set pdev->non_mappable_bars for ISM devices only. Then also set
ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic
implementation of pci_mmap_resource_range() enabling only the newer sysfs
mmap() interface. This follows the recommendation in
Documentation/PCI/sysfs-pci.rst.
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Niklas Schnelle [Wed, 26 Feb 2025 12:07:46 +0000 (13:07 +0100)]
s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP
The ability to map PCI resources to user-space is controlled by global
defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and
controls mapping of PCI resources using vfio-pci with a fallback option via
the pread()/pwrite() interface.
For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a
generic implementation for mapping PCI resources plus the newer sysfs
interface. Then there is HAVE_PCI_MMAP which can be used with custom
definitions of pci_mmap_resource_range() and the historical /proc/bus/pci
interface. Both mechanisms are all or nothing.
For s390 mapping PCI resources is possible and useful for testing and
certain applications such as QEMU's vfio-pci based user-space NVMe driver.
For certain devices, however, access to PCI resources via mappings to
user-space is not possible and these must be excluded from the general PCI
resource mapping mechanisms.
Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can
not be accessed via mappings to user-space. In the future this enables
per-device restrictions of PCI resource mapping.
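The difference between a global switch and a per-device flag can be sketched as follows. This is a reduced model only: struct fake_pci_dev and fake_mmap_resource() are hypothetical, and the -EINVAL return value is an assumption.

```c
#include <errno.h>

/* Hypothetical, heavily reduced stand-in for struct pci_dev. */
struct fake_pci_dev {
	unsigned int non_mappable_bars:1;	/* BARs can't be mapped to user space */
};

/* Per-device run-time check instead of a global compile-time switch. */
static int fake_mmap_resource(const struct fake_pci_dev *pdev)
{
	if (pdev->non_mappable_bars)
		return -EINVAL;		/* assumed error code */

	return 0;			/* mapping would proceed here */
}
```

Because the decision is taken per device, the same kernel can refuse mappings for one device and allow them for another.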
For now, set this flag for all PCI devices on s390 in line with the
existing, general disable of PCI resource mapping. As s390 is the only user
of the VFIO_PCI_MMAP Kconfig option, it can already be replaced with a
check of this new flag. Also add similar checks in the other code protected
by HAVE_PCI_MMAP or ARCH_GENERIC_PCI_MMAP_RESOURCE in preparation for
enabling these for supported devices.
Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Niklas Schnelle [Wed, 26 Feb 2025 12:07:45 +0000 (13:07 +0100)]
s390/pci: Fix s390_mmio_read/write syscall page fault handling
The s390 MMIO syscalls when using the classic PCI instructions do not
cause a page fault when follow_pfnmap_start() fails due to the page not
being present. Besides being a general deficiency this breaks vfio-pci's
mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on
first access. Fix this by following a failed follow_pfnmap_start() with
fixup_user_fault() and retrying the follow_pfnmap_start(). Also fix
a VM_READ vs VM_WRITE mixup in the read syscall.
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-1-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Shay Drory [Mon, 10 Mar 2025 08:45:24 +0000 (10:45 +0200)]
PCI: Fix NULL dereference in SR-IOV VF creation error path
Clean up when virtfn setup fails to prevent NULL pointer dereference
during device removal. The kernel oops below occurred due to incorrect
error handling flow when pci_setup_device() fails.
Add pci_iov_scan_device(), which handles virtfn allocation and setup and
cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need
to call pci_stop_and_remove_bus_device(). This prevents accessing
partially initialized virtfn devices during removal.
BUG: kernel NULL pointer dereference, address: 00000000000000d0
RIP: 0010:device_del+0x3d/0x3d0
Call Trace:
pci_remove_bus_device+0x7c/0x100
pci_iov_add_virtfn+0xfa/0x200
sriov_enable+0x208/0x420
mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core]
sriov_numvfs_store+0xae/0x1a0
Link: https://lore.kernel.org/r/20250310084524.599225-1-shayd@nvidia.com
Fixes: e3f30d563a38 ("PCI: Make pci_destroy_dev() concurrent safe")
Signed-off-by: Shay Drory <shayd@nvidia.com>
[bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Ilpo Järvinen [Fri, 21 Mar 2025 16:31:03 +0000 (18:31 +0200)]
PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type
pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return
it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply
assign the fallback speed (2.5GT/s) into supported_speeds variable to share
the normal return path that calls pcie_supported_speeds2target_speed() to
calculate __fls().
This code path is not very likely to execute because
pcie_get_supported_speeds() should provide valid ->supported_speeds but a
spec violating device could fail to synthesize any speed in
pcie_get_supported_speeds(). It could also happen in case the
supported_speeds intersection is empty (also a violation of the current
PCIe specs).
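The distinction between a speed bit and its __fls() index can be illustrated in userspace. This is a sketch, not the kernel code: fls_index() and select_speed() are hypothetical names, and the SLS_* values are the Supported Link Speeds Vector bits as I recall them from pci_regs.h.

```c
/* Supported Link Speeds Vector bits (values recalled, treat as assumed). */
#define SLS_2_5GB	0x02u
#define SLS_5_0GB	0x04u
#define SLS_8_0GB	0x08u

/* Index of the most significant set bit; word must be non-zero. */
static unsigned int fls_index(unsigned int word)
{
	unsigned int idx = 0;

	while (word >>= 1)
		idx++;
	return idx;
}

/*
 * Model of the fixed logic: substitute the 2.5GT/s fallback into the
 * speeds vector, then share the one return path that converts the
 * highest supported speed bit into a target speed value.
 */
static unsigned int select_speed(unsigned int supported_speeds)
{
	if (!supported_speeds)
		supported_speeds = SLS_2_5GB;

	return fls_index(supported_speeds);
}
```

Returning the raw SLS_2_5GB bit (0x02) instead of its index (1) would be read as the encoding of the next higher speed, which is why the fallback has to go through the same conversion.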
Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com
Fixes: de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Fri, 21 Mar 2025 16:21:14 +0000 (18:21 +0200)]
PCI: pciehp: Don't enable HPIE when resuming in poll mode
PCIe hotplug can operate in poll mode without interrupt handlers using a
polling kthread only. eb34da60edee ("PCI: pciehp: Disable hotplug
interrupt during suspend") failed to consider that and enables HPIE
(Hot-Plug Interrupt Enable) unconditionally when resuming the Port.
Only set HPIE if non-poll mode is in use. This makes
pcie_enable_interrupt() match how pcie_enable_notification() already
handles HPIE.
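A minimal model of the rule "set HPIE only when not polling" might look like this. resume_slot_ctrl() is a hypothetical helper, and the HPIE bit value is the Slot Control bit as I recall it from pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLTCTL_HPIE	0x0020	/* Hot-Plug Interrupt Enable (value recalled) */

/*
 * Compute the Slot Control value to write on resume.  In poll mode a
 * kthread polls for events, so the interrupt enable bit is left alone.
 */
static uint16_t resume_slot_ctrl(uint16_t slot_ctrl, bool poll_mode)
{
	if (!poll_mode)
		slot_ctrl |= SLTCTL_HPIE;

	return slot_ctrl;
}
```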
Link: https://lore.kernel.org/r/20250321162114.3939-1-ilpo.jarvinen@linux.intel.com
Fixes: eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Ilpo Järvinen [Tue, 11 Mar 2025 17:47:01 +0000 (19:47 +0200)]
PCI: Move cardbus IO size declarations into pci/pci.h
For some reason, cardbus related io/mem size declarations are in
linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h.
Move them all into one place in pci/pci.h.
Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Tue, 11 Mar 2025 17:47:00 +0000 (19:47 +0200)]
PCI: Make pci_setup_bridge() static
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a
static function.
Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Tue, 11 Mar 2025 17:46:59 +0000 (19:46 +0200)]
PCI: Move resource reassignment func declarations into pci/pci.h
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used
outside of the PCI subsystem. They seem to be naturally static functions
but since resource fitting/assignment is split between setup-bus.c and
setup-res.c, they fall into different sides of the divide and need to be
declared.
Move the declarations of pci_reassign_bridge_resources() and
pci_reassign_resource() into pci/pci.h to keep them internal to PCI
subsystem.
Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Tue, 11 Mar 2025 17:46:58 +0000 (19:46 +0200)]
PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.h
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem.
The comment also falsely advertises it to be for hotplug drivers, yet the
only caller is from sysfs store function. Move the function declaration
into pci/pci.h.
Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Thu, 20 Mar 2025 14:28:37 +0000 (16:28 +0200)]
PCI: Fix BAR resizing when VF BARs are assigned
__resource_resize_store() attempts to release all resources of the device
before attempting the resize. The loop, however, only covers standard BARs
(< PCI_STD_NUM_BARS). If a device has VF BARs that are assigned,
pci_reassign_bridge_resources() finds the bridge window still has some
assigned child resources and returns -ENOENT, which makes
pci_resize_resource() detect an error and abort the resize.
Change the release loop to cover all resources up to VF BARs, which allows
the resize operation to release the bridge windows and attempt to assign
them again with the different size.
If SR-IOV is enabled, disallow resize as it requires releasing also IOV
resources.
Link: https://lore.kernel.org/r/20250320142837.8027-1-ilpo.jarvinen@linux.intel.com
Fixes: 91fa127794ac ("PCI: Expose PCIe Resizable BAR support via sysfs")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Manivannan Sadhasivam [Thu, 20 Mar 2025 18:06:04 +0000 (11:06 -0700)]
PCI: Allow PCI bridges to go to D3Hot on all non-x86
Currently, pci_bridge_d3_possible() encodes a variety of decision factors
when deciding whether a given bridge can be put into D3. A particular one
of note is for "recent enough PCIe ports." Per Rafael [0]:
"There were hardware issues related to PM on x86 platforms predating
the introduction of Connected Standby in Windows. For instance,
programming a port into D3hot by writing to its PMCSR might cause the
PCIe link behind it to go down and the only way to revive it was to
power cycle the Root Complex. And similar."
Thus, this function contains a DMI-based check for post-2015 BIOS.
The above factors (Windows, x86) don't really apply to non-x86 systems, and
also, many such systems don't have BIOS or DMI. However, we'd like to be
able to suspend bridges on non-x86 systems too.
Restrict the "recent enough" check to x86. If we find further
incompatibilities, it probably makes sense to expand on the deny-list
approach (i.e., bridge_d3_blacklist or similar).
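The restricted check can be modeled as a small predicate. This is only a sketch of the decision described above: bridge_recent_enough() is a hypothetical name, the is_x86 parameter stands in for a compile-time architecture test, and treating 2015 itself as recent enough is an assumption.

```c
#include <stdbool.h>

/*
 * Model of the "recent enough" test: the BIOS year only matters on x86.
 * Other architectures skip the DMI-based check entirely.
 */
static bool bridge_recent_enough(bool is_x86, int bios_year)
{
	if (!is_x86)
		return true;

	return bios_year >= 2015;	/* assumed threshold semantics */
}
```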
Link: https://lore.kernel.org/r/20250320110604.v6.1.Id0a0e78ab0421b6bce51c4b0b87e6aebdfc69ec7@changeid
Link: https://lore.kernel.org/linux-pci/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/
Link: https://lore.kernel.org/linux-pci/20240227225442.GA249898@bhelgaas/
Link: https://lore.kernel.org/linux-pci/20240828210705.GA37859@bhelgaas/
[Brian: rewrite to !X86 based on Rafael's suggestions]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Philipp Stanner [Wed, 12 Mar 2025 08:06:35 +0000 (09:06 +0100)]
PCI: Check BAR index for validity
Many functions in PCI use accessor macros such as pci_resource_len(),
which take a BAR index. That index, however, is never checked for
validity, potentially resulting in undefined behavior by overflowing the
array pci_dev.resource in the macro pci_resource_n().
Since many users of those macros directly assign the accessed value to
an unsigned integer, the macros cannot be changed easily anymore to
return -EINVAL for invalid indexes. Consequently, the problem has to be
mitigated in higher layers.
Add pci_bar_index_is_valid(). Use it where appropriate.
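The idea of validating the index in a higher layer can be sketched in userspace as follows. The helper names and the lens[] array are hypothetical; the limit of six standard BARs (0 to 5) reflects the PCI header layout.

```c
#include <errno.h>
#include <stdbool.h>

#define STD_NUM_BARS	6	/* standard BARs 0..5 */

/* True if bar is a usable index into a STD_NUM_BARS-sized array. */
static bool bar_index_is_valid(int bar)
{
	return bar >= 0 && bar < STD_NUM_BARS;
}

/*
 * Checked accessor: the length goes through an output parameter so the
 * return value stays free for a negative errno.
 */
static int checked_bar_len(const unsigned long *lens, int bar,
			   unsigned long *out)
{
	if (!bar_index_is_valid(bar))
		return -EINVAL;

	*out = lens[bar];
	return 0;
}
```

Keeping the error code separate from the value avoids the problem noted above, where callers assign the macro result straight to an unsigned integer.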
Link: https://lore.kernel.org/r/20250312080634.13731-4-phasta@kernel.org
Closes: https://lore.kernel.org/all/adb53b1f-29e1-3d14-0e61-351fd2d3ff0d@linux.intel.com/
Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com>
Signed-off-by: Philipp Stanner <phasta@kernel.org>
[kwilczynski: correct if-statement condition the pci_bar_index_is_valid()
helper function uses, tidy up code comments]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: fix typo]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Lukas Wunner [Tue, 11 Mar 2025 06:27:32 +0000 (07:27 +0100)]
PCI: pciehp: Avoid unnecessary device replacement check
Hot-removal of nested PCI hotplug ports suffers from a long-standing race
condition which can lead to a deadlock: A parent hotplug port acquires
pci_lock_rescan_remove(), then waits for pciehp to unbind from a child
hotplug port. Meanwhile that child hotplug port tries to acquire
pci_lock_rescan_remove() as well in order to remove its own children.
The deadlock only occurs if the parent acquires pci_lock_rescan_remove()
first, not if the child happens to acquire it first.
Several workarounds to avoid the issue have been proposed and discarded
over the years, e.g.:
https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@wunner.de/
A proper fix is being worked on, but needs more time as it is nontrivial
and necessarily intrusive.
Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement during
system sleep") provokes more frequent occurrence of the deadlock when
removing more than one Thunderbolt device during system sleep. The commit
sought to detect device replacement, but also triggered on device removal.
Differentiating reliably between replacement and removal is impossible
because pci_get_dsn() returns 0 both if the device was removed, as well as
if it was replaced with one lacking a Device Serial Number.
Avoid the more frequent occurrence of the deadlock by checking whether the
hotplug port itself was hot-removed. If so, there's no sense in checking
whether its child device was replaced.
This works because the ->resume_noirq() callback is invoked in top-down
order for the entire hierarchy: A parent hotplug port detecting device
replacement (or removal) marks all children as removed using
pci_dev_set_disconnected() and a child hotplug port can then reliably
detect being removed.
Link: https://lore.kernel.org/r/02f166e24c87d6cde4085865cce9adfdfd969688.1741674172.git.lukas@wunner.de
Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@panix.com/
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@canonical.com/
Tested-by: Kenneth Crudup <kenny@panix.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: stable@vger.kernel.org # v6.11+
Philipp Stanner [Wed, 12 Mar 2025 08:06:34 +0000 (09:06 +0100)]
PCI: Fix wrong length of devres array
The array for the iomapping cookie addresses has a length of
PCI_STD_NUM_BARS. This constant, however, only describes standard BARs,
while PCI allows for additional, special BARs.
The total number of PCI resources is described by constant
PCI_NUM_RESOURCES, which is also used in, e.g., pci_select_bars().
Thus, the devres array has so far been too small.
Change the length of the devres array to PCI_NUM_RESOURCES.
Link: https://lore.kernel.org/r/20250312080634.13731-3-phasta@kernel.org
Fixes: bbaff68bf4a4 ("PCI: Add managed partial-BAR request and map infrastructure")
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Cc: stable@vger.kernel.org # v6.11+
Dan Carpenter [Fri, 7 Mar 2025 08:46:34 +0000 (11:46 +0300)]
PCI: Remove stray put_device() in pci_register_host_bridge()
This put_device() was accidentally left over from when we changed the code
from using device_register() to calling device_add(). Delete it.
Link: https://lore.kernel.org/r/55b24870-89fb-4c91-b85d-744e35db53c2@stanley.mountain
Fixes: 9885440b16b8 ("PCI: Fix pci_host_bridge struct device release/free handling")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ma Ke [Sun, 2 Feb 2025 06:23:57 +0000 (14:23 +0800)]
PCI: Fix reference leak in pci_alloc_child_bus()
If device_register(&child->dev) fails, call put_device() to explicitly
release child->dev, per the comment at device_register().
Found by code review.
Link: https://lore.kernel.org/r/20250202062357.872971-1-make24@iscas.ac.cn
Fixes: 4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: stable@vger.kernel.org
Ma Ke [Tue, 25 Feb 2025 02:14:40 +0000 (10:14 +0800)]
PCI: Fix reference leak in pci_register_host_bridge()
If device_register() fails, call put_device() to give up the reference to
avoid a memory leak, per the comment at device_register().
Found by code review.
Link: https://lore.kernel.org/r/20250225021440.3130264-1-make24@iscas.ac.cn
Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
[bhelgaas: squash Dan Carpenter's double free fix from
https://lore.kernel.org/r/db806a6c-a91b-4e5a-a84b-6b7e01bdac85@stanley.mountain]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Bjorn Helgaas [Sat, 15 Feb 2025 00:03:01 +0000 (18:03 -0600)]
PCI: Cache offset of Resizable BAR capability
Previously most resizable BAR interfaces (pci_rebar_get_possible_sizes(),
pci_rebar_set_size(), etc) as well as pci_restore_state() searched config
space for a Resizable BAR capability. Most devices don't have such a
capability, so this is wasted effort, especially for pci_restore_state().
Search for a Resizable BAR capability once at enumeration-time and cache
the offset so we don't have to search every time we need it. No functional
change intended.
Link: https://lore.kernel.org/r/20250215000301.175097-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Bjorn Helgaas [Mon, 3 Mar 2025 21:02:17 +0000 (15:02 -0600)]
PCI: Enable Configuration RRS SV early
Following a reset, a Function may respond to Config Requests with Request
Retry Status (RRS) Completion Status to indicate that it is temporarily
unable to process the Request, but will be able to process the Request in
the future (PCIe r6.0, sec 2.3.1).
If the Configuration RRS Software Visibility feature is enabled and a Root
Complex receives RRS for a config read of the Vendor ID, the Root Complex
completes the Request to the host by returning PCI_VENDOR_ID_PCI_SIG,
0x0001 (sec 2.3.2).
The Config RRS SV feature applies only to Root Ports and is not directly
related to pci_scan_bridge_extend(). Move the RRS SV enable to
set_pcie_port_type() where we handle other PCIe-specific configuration.
Link: https://lore.kernel.org/r/20250303210217.199504-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Krzysztof Kozlowski [Fri, 7 Mar 2025 08:13:27 +0000 (09:13 +0100)]
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop unnecessary status from example
Device nodes in the examples are supposed to be enabled, so the schema
will be validated against them. Keeping them disabled hides potential
errors.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250307081327.35153-2-krzysztof.kozlowski@linaro.org
Krzysztof Kozlowski [Fri, 7 Mar 2025 08:13:26 +0000 (09:13 +0100)]
dt-bindings: PCI: fsl,layerscape-pcie-ep: Drop deprecated windows
The example DTS uses 'num-ib-windows' and 'num-ob-windows' properties
but these are not defined in the binding. Binding also does not
reference snps,dw-pcie-common.yaml, probably because it is quite
different even though the device is based on Synopsys controller.
The properties are actually deprecated, so simply drop them from the
example.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20250307081327.35153-1-krzysztof.kozlowski@linaro.org
Ilpo Järvinen [Fri, 7 Mar 2025 14:09:22 +0000 (16:09 +0200)]
PCI: Do not claim to release resource falsely
pci_release_resource() will print "... releasing" regardless of the
resource being assigned or not. Move the print after the res->parent check
to avoid claiming the kernel would be releasing an unassigned resource.
Likely, none of the current callers pass a resource that is unassigned, so
this change is more about correcting the nonsensical order than about
removing erroneous printouts.
Link: https://lore.kernel.org/r/20250307140922.5776-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Zhiyuan Dai [Fri, 7 Mar 2025 05:35:29 +0000 (13:35 +0800)]
PCI: Increase Resizable BAR support from 512 GB to 128 TB
Per PCIe r6.0, sec 7.8.6.2, devices can advertise Resizable BAR sizes up to
128 TB in the Resizable BAR Capability register. Larger sizes can be
advertised via the Control register, but that requires an API change.
Update pci_rebar_get_possible_sizes() and pbus_size_mem() to increase the
sizes we currently support from 512 GB to 128 TB.
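The arithmetic behind the two limits can be sketched as follows. This is a userspace illustration, assuming the Resizable BAR encoding in which size index n stands for 2^(n + 20) bytes; the demo_* names are hypothetical and are not the kernel helpers named above.

```c
#include <stdint.h>

/* Size index n encodes a BAR size of 2^(n + 20) bytes, so index 0 is 1 MB. */
uint64_t demo_rebar_index_to_bytes(unsigned int index)
{
	return UINT64_C(1) << (index + 20);
}

/*
 * Largest size index whose bit is set in a bitmask of supported sizes
 * (bit n set means index n is supported). Returns -1 for an empty mask.
 */
int demo_rebar_max_index(uint32_t sizes)
{
	int index = -1;

	for (unsigned int n = 0; n < 32; n++) {
		if (sizes & (UINT32_C(1) << n))
			index = (int)n;
	}
	return index;
}
```

Under that encoding, 512 GB is index 19 (2^39 bytes) and 128 TB is index 27 (2^47 bytes), so raising the limit amounts to accepting eight more bits of the supported-sizes mask.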
Link: https://lore.kernel.org/r/20250307053535.44918-1-daizhiyuan@phytium.com.cn
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Alistair Francis [Thu, 6 Mar 2025 07:52:09 +0000 (17:52 +1000)]
PCI/DOE: Rename Discovery Response Data Object Contents to type
PCIe r6.1, sec 6.30.1.1, describes a "Vendor ID", a "Data Object Type" and
"Next Index" as the fields in the DOE Discovery Response Data Object. The
DOE driver currently uses both the terms 'type' and 'prot' for the second
element.
Rename all uses of the DOE Discovery Response Data Object to use 'type' as
the second element of the object header, instead of type/prot as it
currently is.
Link: https://lore.kernel.org/r/20250306075211.1855177-2-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Alistair Francis [Thu, 6 Mar 2025 07:52:08 +0000 (17:52 +1000)]
PCI/DOE: Rename DOE protocol to feature
DOE r1.1 replaced all occurrences of "protocol" with the term "feature" or
"Data Object Type". PCIe r6.1 incorporated that change.
Rename existing uses of the term "protocol" to "feature".
Link: https://lore.kernel.org/r/20250306075211.1855177-1-alistair@alistair23.me
Signed-off-by: Alistair Francis <alistair@alistair23.me>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Alexander Stein [Tue, 25 Feb 2025 10:27:21 +0000 (11:27 +0100)]
dt-bindings: PCI: fsl,imx6q-pcie: Add optional DMA interrupt
The i.MX8QM and i.MX8QXP/DXP have an additional interrupt for DMA.
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250225102726.654070-2-alexander.stein@ew.tq-group.com
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
J. Neuschäfer [Thu, 20 Feb 2025 12:29:58 +0000 (13:29 +0100)]
dt-bindings: PCI: Convert fsl,mpc83xx-pcie to YAML
Formalise the binding for the PCI controllers in the Freescale MPC8xxx
chip family. Information about PCI-X-specific properties was taken from
fsl,pci.txt. The examples were taken from mpc8315erdb.dts and
xpedite5200_xmon.dts.
Signed-off-by: J. Neuschäfer <j.ne@posteo.net>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250220-ppcyaml-pci-v3-1-ca94a4f62a85@posteo.net
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Varadarajan Narayanan [Thu, 20 Feb 2025 09:42:49 +0000 (15:12 +0530)]
dt-bindings: PCI: qcom: Document the IPQ5332 PCIe controller
Document the PCIe controller on IPQ5332 platform. IPQ5332 will use
IPQ9574 as the compatible fallback in the future.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com>
Link: https://lore.kernel.org/r/20250220094251.230936-6-quic_varada@quicinc.com
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Nishanth Aravamudan [Fri, 7 Feb 2025 20:56:00 +0000 (14:56 -0600)]
PCI: Avoid reset when disabled via sysfs
After d88f521da3ef ("PCI: Allow userspace to query and set device reset
mechanism"), userspace can disable reset of specific PCI devices by writing
an empty string to the sysfs reset_method file.
However, pci_slot_resettable() does not check pci_reset_supported(), which
means that pci_reset_function() will still reset the device even if
userspace has disabled all the reset methods.
I was able to reproduce this issue with a vfio device passed to a qemu
guest, where I had disabled PCI reset via sysfs.
Add an explicit check of pci_reset_supported() in both
pci_slot_resettable() and pci_bus_resettable() to ensure both the reset
status and reset execution are bypassed if an administrator disables it for
a device.
Link: https://lore.kernel.org/r/20250207205600.1846178-1-naravamudan@nvidia.com
Fixes: d88f521da3ef ("PCI: Allow userspace to query and set device reset mechanism")
Signed-off-by: Nishanth Aravamudan <naravamudan@nvidia.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Amey Narkhede <ameynarkhede03@gmail.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Yishai Hadas <yishaih@nvidia.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Feng Tang [Mon, 3 Mar 2025 02:36:30 +0000 (10:36 +0800)]
PCI/portdrv: Only disable pciehp interrupts early when needed
Firmware developers reported that Linux issues two PCIe hotplug commands in
very short intervals on an ARM server, which doesn't comply with the PCIe
spec. According to PCIe r6.1, sec 6.7.3.2, if the Command Completed event
is supported, software must wait for a command to complete or wait at
least 1 second before sending a new command.
In the failure case, the first PCIe hotplug command is from
get_port_device_capability(), which sends a command to disable PCIe hotplug
interrupts without waiting for its completion, and the second command comes
from pcie_enable_notification() of pciehp driver, which enables hotplug
interrupts again.
Fix this by only disabling the hotplug interrupts when the pciehp driver is
not enabled.
Link: https://lore.kernel.org/r/20250303023630.78397-1-feng.tang@linux.alibaba.com
Fixes: 2bd50dd800b5 ("PCI: PCIe: Disable PCIe port services during port initialization")
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Feng Tang <feng.tang@linux.alibaba.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Lukas Wunner [Tue, 25 Feb 2025 17:06:05 +0000 (18:06 +0100)]
PCI: hotplug: Inline pci_hp_{create,remove}_module_link()
For no apparent reason, the pci_hp_{create,remove}_module_link() helpers
live in slot.c, even though they're only called from two functions in
pci_hotplug_core.c.
Inline the helpers to reduce code size and number of exported symbols.
Link: https://lore.kernel.org/r/c207f03cfe32ae9002d9b453001a1dd63d9ab3fb.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Lukas Wunner [Tue, 25 Feb 2025 17:06:04 +0000 (18:06 +0100)]
PCI: hotplug: Avoid backpointer dereferencing in has_*_file()
The PCI hotplug core contains five has_*_file() functions to determine
whether a certain sysfs file shall be added (or removed) for a given
hotplug slot.
The functions receive a struct pci_slot pointer which they have to
dereference back to a struct hotplug_slot.
Avoid that by passing them a struct hotplug_slot pointer directly.
Link: https://lore.kernel.org/r/5b2f5b4ac45285953d00fd7637732a93fd40d26e.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Lukas Wunner [Tue, 25 Feb 2025 17:06:03 +0000 (18:06 +0100)]
PCI: hotplug: Drop superfluous NULL pointer checks in has_*_file()
The PCI hotplug core contains five has_*_file() functions to determine
whether a certain sysfs file shall be added (or removed) for a given
hotplug slot.
The functions perform NULL pointer checks for the hotplug_slot and its
hotplug_slot_ops. However the callers already perform these checks:
  pci_hp_register()
    __pci_hp_register()
      __pci_hp_initialize()
  pci_hp_deregister()
    pci_hp_del()
The only way to actually trigger these checks is to call pci_hp_add()
without having called pci_hp_initialize().
Amend pci_hp_add() to catch that and drop the now superfluous NULL
pointer checks in has_*_file().
Drop the same superfluous checks from pci_hp_create_module_link(),
which is (only) called from pci_hp_add().
Link: https://lore.kernel.org/r/37d1928edf8c3201a8b10794f1db3142e16e02b9.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Lukas Wunner [Tue, 25 Feb 2025 17:06:02 +0000 (18:06 +0100)]
PCI: hotplug: Drop superfluous try_module_get() calls
In December 2002, historic commit
https://git.kernel.org/tglx/history/c/bec7aa00ffe5
("[PATCH] more module warning fixes")
amended the PCI hotplug core to acquire a reference on the hotplug
driver module when a sysfs attribute is accessed. That was necessary
because back in the day, sysfs code did not take any precautions to
prevent module unloading when an attribute was accessed.
Soon after in July 2003, historic commit
https://git.kernel.org/tglx/history/c/1cf6d20f6078
("[PATCH] SYSFS: add module referencing to sysfs attribute files.")
addressed that deficiency. But the commit neglected to remove the now
unnecessary reference acquisition from the PCI hotplug core.
The commit acquired a module reference for the entire duration between
open() and close() of a sysfs attribute. This made it impossible to
unload a module while attributes were kept open by user space.
That's possible today:
When a hotplug driver module is unloaded, it removes sysfs attributes of
all its hotplug slots by calling pci_hp_del(). This will wait for any
concurrent user space operation to finish:
  pci_hp_del()
    fs_remove_slot()
      sysfs_remove_file()
        sysfs_remove_file_ns()
          kernfs_remove_by_name_ns()
            __kernfs_remove()
              kernfs_drain()
A user space operation such as read() briefly acquires a reference on
the attribute with kernfs_get_active(). kernfs_drain() waits until all
such references are released before allowing attribute removal. Once
the attribute is removed, any subsequent user space operation on a still
open attribute file will return -ENODEV.
Thus, reference acquisition by the PCI hotplug core is still unnecessary
today. So drop it at long last.
Link: https://lore.kernel.org/r/ed950fa2722967be4491146c7b867c1e7be11d37.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Lukas Wunner [Tue, 25 Feb 2025 17:06:01 +0000 (18:06 +0100)]
PCI: hotplug: Drop superfluous pci_hotplug_slot_list
The PCI hotplug core keeps a list of all registered slots. Its sole
purpose is to WARN() on slot removal if another slot is using the same
name.
But this can never happen because already on slot creation, an error is
returned and multiple messages are emitted if a slot's name is
duplicated:
  pci_hp_register()
    __pci_hp_register()
      __pci_hp_initialize()
        pci_create_slot()
          kobject_init_and_add()
            kobject_add_varg()
              kobject_add_internal()
                create_dir()
                  sysfs_create_dir_ns()
                    kernfs_create_dir_ns()
                      sysfs_warn_dup()
                        pr_warn("cannot create duplicate filename ...")
                pr_err("%s failed for %s with -EEXIST, ...");
Drop the superfluous list.
Link: https://lore.kernel.org/r/603735bc50eb370bc7f1c358441ac671360bab25.1740501868.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Yi Lai [Fri, 28 Feb 2025 07:00:59 +0000 (15:00 +0800)]
selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS
The test shell script "set_pcie_speed.sh" is not installed in INSTALL_PATH.
Attempting to execute set_pcie_cooling_state.sh shows warning:
./set_pcie_cooling_state.sh: line 119: ./set_pcie_speed.sh: No such file or directory
Add "set_pcie_speed.sh" to TEST_PROGS.
Link: https://lore.kernel.org/r/Z8FfK8rN30lKzvVV@ly-workstation
Fixes: 838f12c3d551 ("selftests/pcie_bwctrl: Create selftests")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Bjorn Helgaas [Mon, 3 Mar 2025 20:42:20 +0000 (14:42 -0600)]
PCI: Log debug messages about reset method
Log pci_dbg() messages about the reset methods we attempt and any errors
(-ENOTTY means "try the next method").
Set CONFIG_DYNAMIC_DEBUG=y and enable by booting with
dyndbg="file drivers/pci/* +p" or enable at runtime:
# echo "file drivers/pci/* +p" > /sys/kernel/debug/dynamic_debug/control
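The "-ENOTTY means try the next method" convention mentioned above can be sketched in userspace C. This is an illustration only; demo_try_reset() and the demo_* callbacks are hypothetical and do not mirror the kernel's function signatures.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*demo_reset_fn)(void);

/*
 * Try each reset method in order. -ENOTTY means "this method does not
 * apply, try the next one"; any other result (success or a real error)
 * ends the search. If no method applies, report -ENOTTY.
 */
int demo_try_reset(const demo_reset_fn *methods, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int rc = methods[i]();

		if (rc != -ENOTTY)
			return rc;
	}
	return -ENOTTY;
}

/* Example callbacks standing in for individual reset methods. */
int demo_unsupported(void) { return -ENOTTY; }
int demo_busy(void)        { return -EAGAIN; }
int demo_ok(void)          { return 0; }
```

A debug message per attempt, as the commit adds, would sit inside the loop, logging the method tried and its return code.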
Link: https://lore.kernel.org/r/20250303204220.197172-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Herve Codina [Mon, 24 Feb 2025 14:13:55 +0000 (15:13 +0100)]
PCI: of: Create device tree PCI host bridge node
Device tree nodes for PCI devices can already be created. This was
introduced by commit 407d1a51921e ("PCI: Create device tree node for bridge").
In order to have device tree nodes related to PCI devices attached on their
PCI root bus (the PCI bus handled by the PCI host bridge), a PCI root bus
device tree node is needed. This root bus node will be used as the parent
node of the first level devices scanned on the bus. On device tree based
systems, this PCI root bus device tree node is set to the node of the
related PCI host bridge. The PCI host bridge node is available in the
device tree used to describe the hardware passed at boot.
On non-device-tree-based systems (such as ACPI), a device tree node for the
PCI host bridge or for the root bus does not exist. Indeed, the PCI host
bridge is not described in a device tree used at boot simply because no
device tree is passed at boot.
The device tree PCI host bridge node creation needs to be done at runtime.
This is done in the same way as for the creation of the PCI device nodes.
I.e. node and properties are created based on computed information done by
the PCI core. Also, as is done on device tree based systems, this PCI host
bridge node is used for the PCI root bus.
With this done, hardware in a PCI device that doesn't follow the usual PCI
model, in which one PCI function is handled by one driver, can be described
by a device tree overlay loaded by the PCI device driver on
non-device-tree-based systems. Such PCI devices provide a single PCI
function that includes several functionalities requiring different drivers.
The device tree overlay describes the internal devices and their
relationships, which allows the drivers needed by those devices to be
loaded.
Link: https://lore.kernel.org/r/20250224141356.36325-6-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Herve Codina [Mon, 24 Feb 2025 14:13:54 +0000 (15:13 +0100)]
PCI: of_property: Constify parameter in of_pci_get_addr_flags()
The res parameter has no reason to be a pointer to an un-const struct
resource. Indeed, struct resource is not supposed to be modified by the
function.
Constify the res parameter.
Link: https://lore.kernel.org/r/20250224141356.36325-5-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Herve Codina [Mon, 24 Feb 2025 14:13:53 +0000 (15:13 +0100)]
PCI: of_property: Add support for NULL pdev in of_pci_set_address()
The pdev (pointer to a struct pci_dev) parameter of of_pci_set_address()
cannot be NULL.
In order to use of_pci_set_address() when creating the PCI root bus node,
it needs to support a NULL pdev parameter. Indeed, in the case of the PCI
root bus node creation, no pdev is available and of_pci_set_address() will
be used with the bridge windows.
Allow of_pci_set_address() to be called with a NULL pdev.
Link: https://lore.kernel.org/r/20250224141356.36325-4-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Herve Codina [Mon, 24 Feb 2025 14:13:52 +0000 (15:13 +0100)]
PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device
Commit 407d1a51921e ("PCI: Create device tree node for bridge") creates an
of_node for PCI devices. The newly created of_node is attached to an
existing device by setting pdev->dev.of_node directly in the code.
Even if pdev->dev.of_node cannot be previously set, this doesn't handle
the fwnode field of the struct device. Indeed, this field needs to be
set if it hasn't already been set.
device_{add,remove}_of_node() have been introduced to handle this case.
Use them instead of the direct setting.
Link: https://lore.kernel.org/r/20250224141356.36325-3-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Herve Codina [Mon, 24 Feb 2025 14:13:51 +0000 (15:13 +0100)]
driver core: Introduce device_{add,remove}_of_node()
An of_node can be set to a device using device_set_node(), which does not
prevent any of_node and/or fwnode overwrites.
When adding an of_node on an already present device, the following
operations need to be done:
- Attach the of_node only if no of_node is already attached
- Attach the of_node as a fwnode if no fwnode was already attached
This is the purpose of device_add_of_node(). device_remove_of_node()
reverts the operations done by device_add_of_node().
Link: https://lore.kernel.org/r/20250224141356.36325-2-herve.codina@bootlin.com
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guilherme Giacomo Simoes [Mon, 17 Feb 2025 18:56:38 +0000 (15:56 -0300)]
PCI: cpcihp: Remove unused .get_power() and .set_power()
The .get_power() and .set_power() function pointers in struct
cpci_hp_controller_ops were declared but never implemented by any
driver.
Thus, to improve code readability and reduce resource usage,
remove these pointers and the code that has never been used.
Link: https://lore.kernel.org/r/20250217185638.398925-1-trintaeoitogc@gmail.com
Signed-off-by: Guilherme Giacomo Simoes <trintaeoitogc@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Ilpo Järvinen [Fri, 7 Feb 2025 16:18:36 +0000 (18:18 +0200)]
PCI/ERR: Handle TLP Log in Flit mode
Flit mode introduced in PCIe r6.0 alters how the TLP Header Log is
presented through AER and DPC Capability registers. The TLP Prefix Log
Register is not present with Flit mode, and the register becomes an
extension of the TLP Header Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13).
Adapt pcie_read_tlp_log() and struct pcie_tlp_log to read and store the
extended TLP Header Log when the Link is in Flit mode. As the Prefix Log
and Extended TLP Header are not present at the same time, a C union can be
used.
Determining whether the error occurred while the Link was in Flit mode is a
bit complicated. In case of AER, the Advanced Error Capabilities and
Control Register directly tells whether the error was logged in Flit mode
or not (PCIe r6.1 sec 7.8.4.7). The DPC Capability (PCIe r6.1 sec 7.9.14),
unfortunately, does not contain the same information.
Unlike AER, the DPC Capability does not provide a way to discern whether
the error was logged in Flit mode (this is confirmed by PCI WG to be an
oversight in the spec). DPC will bring the Link down immediately following
an error, which makes it impossible to acquire the Flit Mode Status
directly from the Link Status 2 register because Flit Mode Status is only
set in certain Link states (PCIe r6.1 sec 7.5.3.20). As a workaround, use
the flit_mode value stored into the struct pci_bus.
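The union mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the actual struct pcie_tlp_log: the demo_* names and the DWORD counts are hypothetical placeholders chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes in DWORDs, for illustration only. */
#define DEMO_HEADER_DW		4
#define DEMO_PREFIX_DW		4
#define DEMO_EXT_HEADER_DW	10

struct demo_tlp_log {
	uint32_t header[DEMO_HEADER_DW];
	/* Only one of these exists for a given error, so they can share storage. */
	union {
		uint32_t prefix[DEMO_PREFIX_DW];		/* non-Flit mode */
		uint32_t ext_header[DEMO_EXT_HEADER_DW];	/* Flit mode */
	};
	bool flit;	/* which union member is valid */
};

/* Number of valid DWORDs in the log, depending on the mode it was captured in. */
unsigned int demo_tlp_log_dwords(const struct demo_tlp_log *log)
{
	return DEMO_HEADER_DW +
	       (log->flit ? DEMO_EXT_HEADER_DW : DEMO_PREFIX_DW);
}
```

The flit flag is what a reader of the log must consult before interpreting the shared storage, which is why the commit needs a reliable way to determine the mode for both AER and DPC.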
Link: https://lore.kernel.org/r/20250207161836.2755-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Fri, 7 Feb 2025 16:18:35 +0000 (18:18 +0200)]
PCI: Track Flit Mode Status & print it with link status
PCIe r6.0 added Flit mode, which mainly alters HW behavior, but there are
some OS visible changes. The OS visible changes include differences in the
layout of some capabilities and interpretation of the TLP headers (in
diagnostics situations).
To be able to determine which mode the PCIe Link is using, store the Flit
Mode Status (PCIe r6.1 sec 7.5.3.20) information in addition to the Link
speed into struct pci_bus in pcie_update_link_speed().
Link: https://lore.kernel.org/r/20250207161836.2755-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: use unsigned int:1 instead of bool, update flit_mode setting]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Mon, 16 Dec 2024 16:10:12 +0000 (18:10 +0200)]
PCI/AER: Descope pci_printk() to aer_printk()
include/linux/pci.h provides a low-level pci_printk() interface that is
only used by AER because it needs to print the same message with
different levels depending on the error severity. No other PCI code
uses that functionality; it calls the pci_<level>() logging functions
directly with the appropriate level.
Descope pci_printk() into AER as aer_printk().
Link: https://lore.kernel.org/r/20241216161012.1774-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: retain pci_printk() for now since shpchp still uses it and I
moved those patches to a different branch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tushar Dave [Fri, 7 Feb 2025 03:03:38 +0000 (19:03 -0800)]
PCI/ACS: Fix 'pci=config_acs=' parameter
Commit 47c8846a49ba ("PCI: Extend ACS configurability") introduced bugs
that fail to configure ACS ctrl to the value specified by the kernel
parameter. Essentially there are two bugs:
1) When ACS is configured for multiple PCI devices using 'config_acs'
kernel parameter, it results in the error "PCI: Can't parse ACS command
line parameter". This is due to a bug that doesn't preserve the ACS
mask, but instead overwrites the mask with value 0.
For example, using 'config_acs' to configure ACS ctrl for multiple BDFs
fails:
Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
PCI: Can't parse ACS command line parameter
pci 0020:02:00.0: ACS mask = 0x007f
pci 0020:02:00.0: ACS flags = 0x007b
pci 0020:02:00.0: Configured ACS to 0x007b
After this fix:
Kernel command line: pci=config_acs=1111011@0020:02:00.0;101xxxx@0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
pci 0020:02:00.0: ACS mask = 0x007f
pci 0020:02:00.0: ACS flags = 0x007b
pci 0020:02:00.0: ACS control = 0x005f
pci 0020:02:00.0: ACS fw_ctrl = 0x0053
pci 0020:02:00.0: Configured ACS to 0x007b
pci 0039:00:00.0: ACS mask = 0x0070
pci 0039:00:00.0: ACS flags = 0x0050
pci 0039:00:00.0: ACS control = 0x001d
pci 0039:00:00.0: ACS fw_ctrl = 0x0000
pci 0039:00:00.0: Configured ACS to 0x0050
2) In the bit manipulation logic, we copy the bit from the firmware
settings when the mask bit is 0.
For example, 'disable_acs_redir' fails to clear all three ACS P2P redir
bits due to the wrong bit fiddling:
Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
pci 0020:02:00.0: ACS mask = 0x002c
pci 0020:02:00.0: ACS flags = 0xffd3
pci 0020:02:00.0: Configured ACS to 0xfffb
pci 0030:02:00.0: ACS mask = 0x002c
pci 0030:02:00.0: ACS flags = 0xffd3
pci 0030:02:00.0: Configured ACS to 0xffdf
pci 0039:00:00.0: ACS mask = 0x002c
pci 0039:00:00.0: ACS flags = 0xffd3
pci 0039:00:00.0: Configured ACS to 0xffd3
After this fix:
Kernel command line: pci=disable_acs_redir=0020:02:00.0;0030:02:00.0;0039:00:00.0 "dyndbg=file drivers/pci/pci.c +p"
pci 0020:02:00.0: ACS mask = 0x002c
pci 0020:02:00.0: ACS flags = 0xffd3
pci 0020:02:00.0: ACS control = 0x007f
pci 0020:02:00.0: ACS fw_ctrl = 0x007b
pci 0020:02:00.0: Configured ACS to 0x0053
pci 0030:02:00.0: ACS mask = 0x002c
pci 0030:02:00.0: ACS flags = 0xffd3
pci 0030:02:00.0: ACS control = 0x005f
pci 0030:02:00.0: ACS fw_ctrl = 0x005f
pci 0030:02:00.0: Configured ACS to 0x0053
pci 0039:00:00.0: ACS mask = 0x002c
pci 0039:00:00.0: ACS flags = 0xffd3
pci 0039:00:00.0: ACS control = 0x001d
pci 0039:00:00.0: ACS fw_ctrl = 0x0000
pci 0039:00:00.0: Configured ACS to 0x0000
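The logged values above can be reproduced with a small standalone sketch.
This is not the kernel code: acs_apply_buggy(), acs_apply_fixed() and
acs_selfcheck() are hypothetical names, and both expressions are inferred
from the logs, assuming "control" is the value being updated, "fw_ctrl" is
the firmware setting, and the control/fw_ctrl values were the same before
and after the fix.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the old behaviour: bits selected by the mask are
 * taken from the firmware value, then every flag bit is OR-ed in, including
 * flag bits that lie outside the mask.
 */
uint16_t acs_apply_buggy(uint16_t ctrl, uint16_t fw_ctrl,
			 uint16_t mask, uint16_t flags)
{
	ctrl = (uint16_t)((ctrl & ~mask) | (fw_ctrl & mask));
	return (uint16_t)(ctrl | flags);
}

/*
 * Hypothetical model of the new behaviour: where the mask bit is 0 keep the
 * firmware bit, where the mask bit is 1 take the bit from flags.
 */
uint16_t acs_apply_fixed(uint16_t fw_ctrl, uint16_t mask, uint16_t flags)
{
	return (uint16_t)((fw_ctrl & ~mask) | (flags & mask));
}

/* Replays the 'disable_acs_redir' and 'config_acs' values from the logs. */
void acs_selfcheck(void)
{
	/* 0020:02:00.0 with disable_acs_redir, before and after the fix */
	assert(acs_apply_buggy(0x007f, 0x007b, 0x002c, 0xffd3) == 0xfffb);
	assert(acs_apply_fixed(0x007b, 0x002c, 0xffd3) == 0x0053);

	/* 0020:02:00.0 and 0039:00:00.0 with config_acs, after the fix */
	assert(acs_apply_fixed(0x0053, 0x007f, 0x007b) == 0x007b);
	assert(acs_apply_fixed(0x0000, 0x0070, 0x0050) == 0x0050);
}
```

In this model the three ACS P2P redirect bits (mask 0x002c) are cleared
only by the second form, because the first form re-copies them from the
firmware value and then ORs in flag bits outside the mask.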
Link: https://lore.kernel.org/r/20250207030338.456887-1-tdave@nvidia.com
Fixes: 47c8846a49ba ("PCI: Extend ACS configurability")
Signed-off-by: Tushar Dave <tdave@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Manivannan Sadhasivam [Thu, 16 Jan 2025 14:09:15 +0000 (19:39 +0530)]
PCI/pwrctrl: Add pwrctrl driver for PCI slots
This driver is used to control the power state of the devices attached to
the PCI slots. Currently, it controls the voltage rails of the PCI slots
defined in the devicetree node of the root port.
The voltage rails for PCI slots are documented in the DT-schema:
https://github.com/devicetree-org/dt-schema/blob/v2024.11/dtschema/schemas/pci/pci-bus-common.yaml#L153
Since this driver has to work with different kinds of slots (PCIe
x1/x4/x8/x16, Mini PCIe, PCI, etc.), the driver uses the
of_regulator_bulk_get_all() API to obtain the voltage regulators defined
in the DT node, instead of hardcoding them.
As such, the DT node of the root port should define the relevant supply
properties corresponding to the voltage rails of the PCI slot.
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-5-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Manivannan Sadhasivam [Thu, 16 Jan 2025 14:09:14 +0000 (19:39 +0530)]
dt-bindings: vendor-prefixes: Document the 'pciclass' prefix
The "pciclass" is an existing prefix used to identify the PCI bridge
devices, but it is not a vendor prefix. So document it in the non-vendor
prefix list.
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-4-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Manivannan Sadhasivam [Thu, 16 Jan 2025 14:09:13 +0000 (19:39 +0530)]
PCI/pwrctrl: Skip scanning for the device further if pwrctrl device is created
The pwrctrl core will rescan the bus once the device is powered on. So
there is no need to continue scanning for the device further.
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-3-827473c8fbf4@linaro.org
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Manivannan Sadhasivam [Thu, 16 Jan 2025 14:09:12 +0000 (19:39 +0530)]
PCI/pwrctrl: Move pci_pwrctrl_unregister() to pci_destroy_dev()
The PCI core will try to access the devices even after pci_stop_dev()
for things like Data Object Exchange (DOE), ASPM, etc.
So, move pci_pwrctrl_unregister() to the near end of pci_destroy_dev()
to make sure that the devices are powered down only after the PCI core
is done with them.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-2-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Manivannan Sadhasivam [Thu, 16 Jan 2025 14:09:11 +0000 (19:39 +0530)]
PCI/pwrctrl: Move creation of pwrctrl devices to pci_scan_device()
The current way of creating pwrctrl devices requires iterating through the
child devicetree nodes of the PCI bridge in pci_pwrctrl_create_devices().
Even though it works, it creates confusion as there is no symmetry between
this and the pci_pwrctrl_unregister() function that removes the pwrctrl
devices.
So to make these two functions symmetric, move the creation of pwrctrl
devices to pci_scan_device(). During the scan of each device in a slot,
the devicetree node (if it exists) for the PCI device will be checked. If it
has the supplies populated, then the pwrctrl device will be created.
Since the PCI device scan happens so early, there would be no "struct
pci_dev" available for the device. So the host bridge is used as the
parent of all pwrctrl devices.
One nice side effect of this move is that it is now possible to have
pwrctrl devices for PCI bridges as well (to control the supplies of PCI
slots).
Suggested-by: Lukas Wunner <lukas@wunner.de>
Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20250116-pci-pwrctrl-slot-v3-1-827473c8fbf4@linaro.org
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Daniel Stodden [Mon, 23 Dec 2024 03:39:08 +0000 (19:39 -0800)]
PCI/ASPM: Fix link state exit during switch upstream function removal
Before 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to
avoid use-after-free"), we would free the ASPM link only after the last
function on the bus pertaining to the given link was removed.
That was too late. If function 0 was removed before a sibling function,
link->downstream would point to freed memory afterwards.
After that change, we freed the ASPM parent link state upon any function
removal on the bus pertaining to a given link.
That is too early. If the link is to a PCIe switch with MFD on the upstream
port, then removing functions other than 0 first would free a link which
still remains parent_link to the remaining downstream ports.
The resulting GPFs are especially frequent during hot-unplug, because
pciehp removes devices on the link bus in reverse order.
On that switch, function 0 is the virtual P2P bridge to the internal bus.
Free exactly when function 0 is removed -- before the parent link is
obsolete, but after all subordinate links are gone.
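The rule can be pictured with a toy predicate. This is only a sketch
under stated assumptions, not the ASPM code: toy_should_free_link() and
TOY_PCI_FUNC() are hypothetical names, and the only fact taken from the
text above is that the link state is freed exactly when function 0 goes
away. The macro mirrors the usual devfn encoding, where the function
number is the low three bits.

```c
#include <stdbool.h>

/* Function number is the low 3 bits of devfn (same encoding as PCI_FUNC()). */
#define TOY_PCI_FUNC(devfn)	((devfn) & 0x07)

/*
 * Hypothetical helper: should removing the function with this devfn free
 * the link state? Only function 0 qualifies; removing any other function
 * leaves the link state alone, whatever the removal order is.
 */
bool toy_should_free_link(unsigned int devfn)
{
	return TOY_PCI_FUNC(devfn) == 0;
}
```

With reverse-order removal (function 1 first, then function 0), the
predicate is false for the first removal and true for the second, so the
parent link outlives the removal of the sibling function.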
Link: https://lore.kernel.org/r/e12898835f25234561c9d7de4435590d957b85d9.1734924854.git.dns@arista.com
Fixes: 456d8aa37d0f ("PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free")
Signed-off-by: Daniel Stodden <dns@arista.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Ilpo Järvinen [Mon, 17 Feb 2025 09:55:50 +0000 (11:55 +0200)]
PCI: shpchp: Remove 'shpchp_debug' module parameter
The "shpchp_debug" module parameter is used to enable debug logging. The
generic ability to turn on/off debug prints dynamically covers this use
case already so there is no need for module specific debug handling. The
ctrl_dbg() wrapper also uses a low-level pci_printk() despite always using
KERN_DEBUG level.
Remove "shpchp_debug" parameter and convert ctrl_dbg() to use pci_dbg().
From now on, shpchp can be debugged using the normal dynamic debugger by
setting CONFIG_DYNAMIC_DEBUG=y and then either adding to kernel cmdline:
dyndbg="file drivers/pci/hotplug/shpchp* +p"
or using this command on a running kernel:
echo 'file drivers/pci/hotplug/shpchp* +p' > /sys/kernel/debug/dynamic_debug/control
Link: https://lore.kernel.org/r/20250217095550.2789-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Mon, 17 Feb 2025 09:55:49 +0000 (11:55 +0200)]
PCI: shpchp: Remove unused logging wrappers
The shpchp hotplug driver defines logging wrappers with generic names which
are just duplicates of existing generic printk() wrappers. They are also
unused so remove them.
Link: https://lore.kernel.org/r/20250217095550.2789-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Mon, 16 Dec 2024 16:10:10 +0000 (18:10 +0200)]
PCI: shpchp: Change dbg() -> ctrl_dbg()
Convert the last user of dbg() to use ctrl_dbg().
Link: https://lore.kernel.org/r/20241216161012.1774-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Mon, 16 Dec 2024 16:10:09 +0000 (18:10 +0200)]
PCI: shpchp: Remove logging from module init/exit functions
The logging in shpchp module init/exit functions is not very useful.
Remove it.
Link: https://lore.kernel.org/r/20241216161012.1774-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:32 +0000 (19:56 +0200)]
PCI: Rework optional resource handling
A remove and rescan cycle can result in failure to assign a bridge window if
it becomes larger than before the remove. The bridge window size will
include space for a disabled Expansion ROM, which can cause the bridge
window to not fit anymore into the same address space slot on rescan if the
Expansion ROM resource was not assigned before the remove. In addition, the
optional resource handling is not internally consistent.
The resource fitting logic supports three main types of optional resources:
- IOV BARs
- Expansion ROMs
- Bridge window size variation due to optional resources
In addition to the above, resizable BARs beyond their current size will
require handling optional variation in resource sizes within the resource
fitting algorithm (not yet done by the resource fitting code).
There are multiple inconsistencies related to optional resource handling:
a) The allocation failure of a disabled Expansion ROM requires a special
case inside assign_requested_resources_sorted().
b) The optionality of a disabled Expansion ROM is not considered during
bridge window sizing in pbus_size_mem().
c) Setting resource size to zero for an optional resource in
pbus_size_mem() is problematic because it also makes the alignment
invalid, which is checked by pdev_sort_resources().
Optional IOV resources have their size set to zero by pbus_size_mem()
but the information about size is stored externally in struct pci_sriov
and complex call-chain trickery in pci_resource_alignment() ensures IOV
resources return a valid alignment despite having zero resource size. A
solution that is specific to IOV resources makes it hard to use the same
solution for other types of resources such as expansion ROM.
Simply changing pbus_size_mem() is not sufficient to fully address the main
issue because it would introduce disparity between bridge window sizing and
resource allocation. Due to size-based ordering of the resource list during
assignment loop, an Expansion ROM resource could steal space from some
other resource and make the other resource not fit if the Expansion ROM is
larger than the other resource. Thus, the resource assignment functions
need to be changed as well.
Make optional resource handling more straightforward. Use
pci_resource_is_optional() to determine if a resource is optional in both
bridge window sizing and assignment failure classification to ensure they
always align. Indicate with a parameter to
assign_requested_resources_sorted() whether it should attempt to allocate
optional resources or not.
Always try first to assign all resources (also when realloc_head is not
provided). This is required for calls from
pci_assign_unassigned_root_bus_resources() that provide realloc_head only
with some of its iterations.
Non-bridge-window optional resources in realloc_head now have add_size 0.
This condition has to be detected in reassign_resources_sorted() before
reassigning them (which would fail as there is no size change). Removing
add_size=0 optional resources entirely from realloc_head might eventually
be doable but further rework in __assign_resources_sorted() is needed first
to support such a change.
Link: https://lore.kernel.org/r/20241216175632.4175-26-ilpo.jarvinen@linux.intel.com
Reported-by: Jia Yao <jia.yao@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219547
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Jia Yao <jia.yao@intel.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:31 +0000 (19:56 +0200)]
PCI: Perform reset_resource() and build fail list in sync
Resetting a resource is problematic as it prevents attempting to allocate
the resource later, unless something in between restores the resource.
Similarly, if fail_head does not contain all resources that were reset,
those resources cannot be restored later.
The entire reset/restore cycle adds complexity and leaving resources in the
reset state causes issues to other code such as for checks done in
pci_enable_resources(). Take a small step towards not resetting resources
by delaying reset until the end of resource assignment and build failure
list (fail_head) in sync with the reset to avoid leaving behind resources
that cannot be restored (for the case where the caller provides fail_head
in the first place to allow restore somewhere in the callchain, as not
all callers pass a non-NULL fail_head).
Leave the Expansion ROM check temporarily in place while building the
failure list until an upcoming change that reworks optional resource
handling.
Ideally, the whole resource reset could be removed, but doing that in one
step would be intractable due to the complexity of all the related code.
Link: https://lore.kernel.org/r/20241216175632.4175-25-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:30 +0000 (19:56 +0200)]
PCI: Use res->parent to check if resource is assigned
reassign_resources_sorted() uses resource_size() to select between
pci_assign_resource() and pci_reassign_resource(). Due to the twisted way
bridge window sizing in pbus_size_mem() sets resource sizes to 0, this
happens to match IOV resources, but that is going to be changed by an
upcoming change.
Replace the resource_size() check with a res->parent check, which is the
true dividing line between whether the assign or the reassign function
should be used for the resource.
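A minimal sketch of the selection rule, assuming a toy resource type:
struct toy_resource, enum toy_action and toy_pick_action() are
hypothetical names used only for illustration and do not appear in
setup-bus.c. The point is that the choice depends on whether the
resource has a parent, not on its size.

```c
#include <stddef.h>

/* Toy stand-in for a resource; 'parent' is set once it has been assigned. */
struct toy_resource {
	unsigned long start;
	unsigned long end;
	struct toy_resource *parent;
};

enum toy_action {
	TOY_ASSIGN,	/* resource has no parent yet: assign from scratch */
	TOY_REASSIGN,	/* resource already has a parent: reassign it */
};

/* Pick the action from res->parent alone, ignoring the resource size. */
enum toy_action toy_pick_action(const struct toy_resource *res)
{
	return res->parent ? TOY_REASSIGN : TOY_ASSIGN;
}
```

A resource with a nonzero size but no parent still gets TOY_ASSIGN here,
which is the case a size-based check would get wrong.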
Link: https://lore.kernel.org/r/20241216175632.4175-24-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:29 +0000 (19:56 +0200)]
PCI: Add debug print when releasing resources before retry
PCI resource fitting is somewhat hard to track because it performs many
actions without logging them. In the case inside
__assign_resources_sorted(), the resources are released before resource
assignment is going to be retried in a different order. That is just one
level of retries the resource fitting performs overall so tracking it
through repeated assignments or failures of a resource gets messy rather
quickly.
Simply announce the release explicitly using pci_dbg() so it is clear what
is going on with each resource.
Link: https://lore.kernel.org/r/20241216175632.4175-23-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:28 +0000 (19:56 +0200)]
PCI: Indicate optional resource assignment failures
Add pci_dbg() to note that an assignment failure was for an optional
resource and reword existing message about resource resize to say the
change was optional.
Link: https://lore.kernel.org/r/20241216175632.4175-22-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:27 +0000 (19:56 +0200)]
PCI: Always have realloc_head in __assign_resources_sorted()
Add a dummy list to always have a non-NULL realloc head in
__assign_resources_sorted() as it allows only checking list_empty().
In future, it would be good to ensure all callers provide a valid
realloc_head but that is relatively complex to do in practice and not
necessary for the subsequent optional resource handling fix.
Link: https://lore.kernel.org/r/20241216175632.4175-21-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:26 +0000 (19:56 +0200)]
PCI: Extend enable to check for any optional resource
pci_enable_resources() checks if a device's io and mem resources are all
assigned and disallows enable if any resource failed to assign (*) but
makes an exception for the case of a disabled Expansion ROM. There are
other optional resources, however.
Add pci_resource_is_optional() and use it instead of
pci_resource_is_disabled_rom() to also cover IOV resources, which are
optional as per pbus_size_mem().
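The classification can be sketched as follows. This is a toy model, not
the kernel helper: the index values, the TOY_ROM_ENABLE flag and
toy_resource_is_optional() are assumptions made for illustration. Only
the two categories (IOV resources and a disabled Expansion ROM) come from
the description above.

```c
#include <stdbool.h>

/* Hypothetical resource indices for the sketch. */
enum toy_resno {
	TOY_BAR0 = 0,
	TOY_ROM = 6,
	TOY_IOV_FIRST = 7,
	TOY_IOV_LAST = 12,
};

/* Hypothetical flag bit meaning "Expansion ROM decoding is enabled". */
#define TOY_ROM_ENABLE	0x1UL

/* A resource is optional if it is an IOV resource or a disabled ROM. */
bool toy_resource_is_optional(int resno, unsigned long flags)
{
	if (resno >= TOY_IOV_FIRST && resno <= TOY_IOV_LAST)
		return true;
	if (resno == TOY_ROM && !(flags & TOY_ROM_ENABLE))
		return true;
	return false;
}
```

An enable check built on such a predicate would skip unassigned optional
resources and refuse to enable the device only for the remaining ones.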
As there will be more users of pci_resource_is_optional() inside
setup-bus.c in changes coming up after this one, the function is placed
there.
(*) In practice, resource fitting code calls reset_resource() for any
resource it fails to assign, which clears the resource's ->flags and
causes pci_enable_resources() to never detect failed resource
assignments. This seems an undesirable internal logic inconsistency:
effectively, reset_resource() prevents pci_enable_resources() from
functioning as intended. This is one step of many that will be needed
towards removing reset_resource().
Link: https://lore.kernel.org/r/20241216175632.4175-20-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:25 +0000 (19:56 +0200)]
PCI: Add restore_dev_resource()
Resource fitting needs to restore the saved dev resources in a few places.
Add a restore_dev_resource() helper for that.
Link: https://lore.kernel.org/r/20241216175632.4175-19-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:24 +0000 (19:56 +0200)]
PCI: Remove incorrect comment from pci_reassign_resource()
Commit a4ac9fea016f ("PCI : Calculate right add_size") stopped including
min_align in new_size in pci_reassign_resource(), which is the correct
thing to do. However, it also added a snakeoil comment that the resource
would already be aligned with min_align, which is incorrect.
A resource that is assigned earlier is aligned with the old alignment, NOT
with the new requested alignment (min_align) until later deep within the
reassignment callchain. Thus, remove the incorrect comment.
Link: https://lore.kernel.org/r/20241216175632.4175-18-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:23 +0000 (19:56 +0200)]
PCI: Consolidate assignment loop next round preparation
pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() have a loop that may perform
several rounds to assign resources. The code to prepare for the next round
is identical.
Consolidate the code that prepares for the next assignment round into
pci_prepare_next_assign_round().
Link: https://lore.kernel.org/r/20241216175632.4175-17-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:22 +0000 (19:56 +0200)]
PCI: Rename retval to ret
Rename 'retval' to 'ret' in pci_assign_unassigned_bridge_resources().
Link: https://lore.kernel.org/r/20241216175632.4175-16-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:21 +0000 (19:56 +0200)]
PCI: Use while loop and break instead of gotos
pci_assign_unassigned_root_bus_resources() and
pci_assign_unassigned_bridge_resources() contain ad hoc loops using
backwards goto and gotos out of the loop. Replace them with while loops
and break statements.
While reindenting the loop bodies, add braces & remove parentheses.
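The shape of the conversion can be sketched with a toy retry loop.
Nothing here is taken from setup-bus.c: toy_try_assign() and
toy_assign_with_retries() are hypothetical, and the "succeeds on the
third attempt" behaviour exists only to make the example self-contained.

```c
#include <stdbool.h>

/* Hypothetical single assignment attempt: succeeds from the third try on. */
static bool toy_try_assign(int attempt)
{
	return attempt >= 2;
}

/*
 * Retry loop written with while/break instead of a backwards goto.
 * Returns the index of the successful attempt, or -1 if every attempt
 * within max_tries failed.
 */
int toy_assign_with_retries(int max_tries)
{
	int tries = 0;
	int ret = -1;

	while (tries < max_tries) {
		if (toy_try_assign(tries)) {
			ret = tries;
			break;		/* replaces a goto out of the loop */
		}
		tries++;		/* replaces the backwards goto */
	}

	return ret;
}
```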
Link: https://lore.kernel.org/r/20241216175632.4175-15-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:20 +0000 (19:56 +0200)]
PCI: Refactor pdev_sort_resources() & __dev_sort_resources()
Reduce level of call nesting by calling pdev_sort_resources() directly
and by moving the tests done inside __dev_sort_resources() into
pdev_resources_assignable() helper.
Link: https://lore.kernel.org/r/20241216175632.4175-14-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:19 +0000 (19:56 +0200)]
PCI: Converge return paths in __assign_resources_sorted()
All return paths want to free head list in __assign_resources_sorted(), so
add a label and use goto.
Link: https://lore.kernel.org/r/20241216175632.4175-13-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:18 +0000 (19:56 +0200)]
PCI: Add dev & res local variables to resource assignment funcs
Many PCI resource allocation related functions process struct
pci_dev_resource items which hold the struct pci_dev and resource pointers.
Reduce the number of lines that need indirection by adding 'dev' and 'res'
local variables to hold the pointers.
Link: https://lore.kernel.org/r/20241216175632.4175-12-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:17 +0000 (19:56 +0200)]
PCI: Add pci_resource_num() helper
A few places in PCI code, mainly in setup-bus.c, need to reverse lookup the
index of a resource in pci_dev's resource array. Create pci_resource_num()
helper to avoid repeating the pointer arithmetic trick used to calculate
the index.
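The pointer arithmetic in question looks roughly like this. The sketch
uses toy types: struct toy_bar, struct toy_pci_dev, TOY_NUM_RESOURCES and
toy_resource_num() are hypothetical, and the range check is an
illustrative sanity check rather than a statement about what the real
helper does.

```c
#include <stddef.h>

#define TOY_NUM_RESOURCES	8

/* Toy stand-ins for struct resource and struct pci_dev. */
struct toy_bar {
	unsigned long start;
	unsigned long end;
};

struct toy_pci_dev {
	struct toy_bar resource[TOY_NUM_RESOURCES];
};

/*
 * Reverse lookup: 'res' must point into dev->resource[], so subtracting
 * the address of element 0 yields its index in the array.
 */
int toy_resource_num(const struct toy_pci_dev *dev, const struct toy_bar *res)
{
	ptrdiff_t resno = res - &dev->resource[0];

	/* Sanity check only; callers are expected to pass a valid pointer. */
	if (resno < 0 || resno >= TOY_NUM_RESOURCES)
		return -1;

	return (int)resno;
}
```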
Link: https://lore.kernel.org/r/20241216175632.4175-11-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:16 +0000 (19:56 +0200)]
PCI: Check resource_size() separately
Instead of chaining logic inside if () condition so that multiple lines are
required, make !resource_size() a separate check and use continue.
Link: https://lore.kernel.org/r/20241216175632.4175-10-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Michał Winiarski [Mon, 16 Dec 2024 17:56:15 +0000 (19:56 +0200)]
PCI: Add pci_resource_is_iov() to identify IOV resources
There are multiple places where special handling is required for IOV
resources.
Extract the identification of IOV resources to pci_resource_is_iov() and
drop a few ifdefs.
Link: https://lore.kernel.org/r/20241216175632.4175-9-ilpo.jarvinen@linux.intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:14 +0000 (19:56 +0200)]
PCI: Use resource_set_{range,size}() helpers
A few sites that could use resource_set_range/size() in setup-bus.c were
not picked up earlier due to them not matching the usual pattern. Convert
them now.
These are more cases similar to 783602c920e9 ("PCI: Use
resource_set_{range,size}() helpers").
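For readers unfamiliar with the pattern, the helpers replace open-coded
start/end arithmetic. The sketch below uses toy equivalents:
struct toy_range and the toy_*() functions are hypothetical stand-ins
written for illustration, assuming the usual convention that 'end' is
inclusive.

```c
/* Toy stand-in for struct resource with an inclusive end address. */
struct toy_range {
	unsigned long long start;
	unsigned long long end;
};

/* Keep 'start' and set 'end' so the range covers 'size' bytes. */
void toy_set_size(struct toy_range *res, unsigned long long size)
{
	res->end = res->start + size - 1;
}

/* Set both the start address and the size in one call. */
void toy_set_range(struct toy_range *res, unsigned long long start,
		   unsigned long long size)
{
	res->start = start;
	toy_set_size(res, size);
}

/* Size of an inclusive range. */
unsigned long long toy_range_size(const struct toy_range *res)
{
	return res->end - res->start + 1;
}
```

Using a helper keeps the "- 1" for the inclusive end in one place instead
of repeating it at every call site.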
Link: https://lore.kernel.org/r/20241216175632.4175-8-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: add 783602c920e9 history]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:13 +0000 (19:56 +0200)]
PCI: Use SZ_* instead of literals in setup-bus.c
Convert literals in setup-bus.c to SZ_* defines that make the size more
human readable.
As the code is now self-explanatory, eliminate comments about the size.
Link: https://lore.kernel.org/r/20241216175632.4175-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:12 +0000 (19:56 +0200)]
PCI: Fix old_size lower bound in calculate_iosize() too
Commit 903534fa7d30 ("PCI: Fix resource double counting on remove &
rescan") fixed double counting of mem resources because of old_size being
applied too early.
Fix a similar counting bug on the io resource side.
Link: https://lore.kernel.org/r/20241216175632.4175-6-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:11 +0000 (19:56 +0200)]
PCI: Allow relaxed bridge window tail sizing for optional resources
Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
relaxed the bridge window requirements for non-optional size (size0)
but pbus_size_mem() also handles optional sizes (IOV resources) using
size1. This can manifest, e.g., as a failure to resize a BAR back to
its original size after it was first shrunk when the device has a VF BAR
resource because the bridge window (size1) is enlarged beyond what is
strictly required to fit the downstream resources.
Allow using relaxed bridge window tail sizing rules also with the optional
resources (size1) so that the remove/realloc cycle during BAR resize
(smaller and back to the original size) does not fail unexpectedly due to
increase in bridge window size demand.
Also move the add_align calculation to a more logical place next to size1
assignment as they are strongly related to each other.
Link: https://lore.kernel.org/r/20241216175632.4175-5-ilpo.jarvinen@linux.intel.com
Fixes: 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:10 +0000 (19:56 +0200)]
PCI: Simplify size1 assignment logic
In pbus_size_io() and pbus_size_mem(), a complex ?: operation is performed
to set size1. Decompose this so it's easier to read.
In the case of pbus_size_mem(), simply initializing size1 to zero ensures
the size1 checks work as expected.
Link: https://lore.kernel.org/r/20241216175632.4175-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>
Ilpo Järvinen [Mon, 16 Dec 2024 17:56:09 +0000 (19:56 +0200)]
PCI: Use min_align, not unrelated add_align, for size0
Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules")
relaxed the bridge window tail alignment rule for the non-optional part
(size0, no add_size/add_align). The required alignment given to
pbus_upstream_space_available(), however, was add_align, which relates
only to size1 alignment.
As pbus_upstream_space_available() only selects between normal and relaxed
tail alignment of the bridge window, the different alignment only causes
relaxed tail alignment to be used more often than intended, which
should be harmless because relaxed tail alignment itself should work in all
cases.
For consistency, change pbus_upstream_space_available() call to use
min_align which is the alignment that is going to be used for the bridge
window in the case where size0 sized allocation is attempted.
Link: https://lore.kernel.org/r/20241216175632.4175-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xiaochun Lee <lixc17@lenovo.com>