From: Gregory Price <gourry@gourry.net>
Date: Mon, 12 May 2025 16:21:18 +0000 (-0400)
Subject: cxl: update documentation structure in prep for new docs
X-Git-Tag: v6.16-rc1~60^2~9^2~16
X-Git-Url: https://git.kernel.dk/?a=commitdiff_plain;h=a770647294bb679c48b5fbc49388f1d85e9f2e9f;p=linux-block.git

cxl: update documentation structure in prep for new docs

Restructure the cxl folder to make adding docs per-page cleaner.

Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-2-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---

diff --git a/Documentation/driver-api/cxl/access-coordinates.rst b/Documentation/driver-api/cxl/access-coordinates.rst
deleted file mode 100644
index b07950ea30c9..000000000000
--- a/Documentation/driver-api/cxl/access-coordinates.rst
+++ /dev/null
@@ -1,91 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-.. include:: <isonum.txt>
-
-==================================
-CXL Access Coordinates Computation
-==================================
-
-Shared Upstream Link Calculation
-================================
-For certain CXL region construction with endpoints behind CXL switches (SW) or
-Root Ports (RP), there is the possibility of the total bandwidth for all
-the endpoints behind a switch being more than the switch upstream link.
-A similar situation can occur within the host, upstream of the root ports.
-The CXL driver performs an additional pass after all the targets have
-arrived for a region in order to recalculate the bandwidths with possible
-upstream link being a limiting factor in mind.
-
-The algorithm assumes the configuration is a symmetric topology as that
-maximizes performance. When asymmetric topology is detected, the calculation
-is aborted. An asymmetric topology is detected during topology walk where the
-number of RPs detected as a grandparent is not equal to the number of devices
-iterated in the same iteration loop. The assumption is made that subtle
-asymmetry in properties does not happen and all paths to EPs are equal.
-
-There can be multiple switches under an RP. There can be multiple RPs under
-a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory
-Window Structure (CFMWS).
-
-An example hierarchy:
-
-> CFMWS 0
->   |
->  _________|_________
-> |                   |
-> ACPI0017-0          ACPI0017-1
-> GP0/HB0/ACPI0016-0  GP1/HB1/ACPI0016-1
->   |          |        |           |
->  RP0        RP1      RP2         RP3
->   |          |        |           |
-> SW 0       SW 1     SW 2        SW 3
-> |   |      |   |    |   |       |   |
-> EP0 EP1    EP2 EP3  EP4 EP5     EP6 EP7
-
-Computation for the example hierarchy:
-
-Min (GP0 to CPU BW,
-     Min(SW 0 Upstream Link to RP0 BW,
-         Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) +
-         Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) +
-     Min(SW 1 Upstream Link to RP1 BW,
-         Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) +
-         Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) +
-Min (GP1 to CPU BW,
-     Min(SW 2 Upstream Link to RP2 BW,
-         Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) +
-         Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) +
-     Min(SW 3 Upstream Link to RP3 BW,
-         Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) +
-         Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link))))
-
-The calculation starts at cxl_region_shared_upstream_perf_update(). A xarray
-is created to collect all the endpoint bandwidths via the
-cxl_endpoint_gather_bandwidth() function. The min() of bandwidth from the
-endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint
-has a CXL switch as a parent, then min() of calculated bandwidth and the
-bandwidth from the SSLBIS for the switch downstream port that is associated
-with the endpoint is calculated. The final bandwidth is stored in a
-'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the
-endpoint is direct attached to a root port (RP), the device pointer would be an
-RP device. If the endpoint is behind a switch, the device pointer would be the
-upstream device of the parent switch.
-
-At the next stage, the code walks through one or more switches if they exist
-in the topology. For endpoints directly attached to RPs, this step is skipped.
-If there is another switch upstream, the code takes the min() of the current
-gathered bandwidth and the upstream link bandwidth. If there's a switch
-upstream, then the SSLBIS of the upstream switch.
-
-Once the topology walk reaches the RP, whether it's direct attached endpoints
-or walking through the switch(es), cxl_rp_gather_bandwidth() is called. At
-this point all the bandwidths are aggregated per each host bridge, which is
-also the index for the resulting xarray.
-
-The next step is to take the min() of the per host bridge bandwidth and the
-bandwidth from the Generic Port (GP). The bandwidths for the GP is retrieved
-via ACPI tables SRAT/HMAT. The min bandwidth are aggregated under the same
-ACPI0017 device to form a new xarray.
-
-Finally, the cxl_region_update_bandwidth() is called and the aggregated
-bandwidth from all the members of the last xarray is updated for the
-access coordinates residing in the cxl region (cxlr) context.
diff --git a/Documentation/driver-api/cxl/index.rst b/Documentation/driver-api/cxl/index.rst
index 965ba90e8fb7..fe1594dc6778 100644
--- a/Documentation/driver-api/cxl/index.rst
+++ b/Documentation/driver-api/cxl/index.rst
@@ -4,12 +4,22 @@ Compute Express Link
 ====================
 
+CXL device configuration has a complex handoff between platform (Hardware,
+BIOS, EFI), OS (early boot, core kernel, driver), and user policy decisions
+that all have impacts on each other. The docs here break up the configuration
+steps.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Overview
+
+   theory-of-operation
+   maturity-map
+
 .. toctree::
    :maxdepth: 1
+   :caption: Linux Kernel Configuration
 
-   memory-devices
-   access-coordinates
+   linux/access-coordinates
 
-   maturity-map
 
 .. only:: subproject and html
diff --git a/Documentation/driver-api/cxl/linux/access-coordinates.rst b/Documentation/driver-api/cxl/linux/access-coordinates.rst
new file mode 100644
index 000000000000..b07950ea30c9
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/access-coordinates.rst
@@ -0,0 +1,91 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==================================
+CXL Access Coordinates Computation
+==================================
+
+Shared Upstream Link Calculation
+================================
+For certain CXL region constructions with endpoints behind CXL switches (SW)
+or Root Ports (RP), the total bandwidth of all the endpoints behind a switch
+can exceed the bandwidth of the switch upstream link.
+A similar situation can occur within the host, upstream of the root ports.
+The CXL driver performs an additional pass after all the targets for a region
+have arrived in order to recalculate the bandwidths, treating a shared
+upstream link as a possible limiting factor.
+
+The algorithm assumes the configuration is a symmetric topology, as that
+maximizes performance. When an asymmetric topology is detected, the
+calculation is aborted. An asymmetric topology is detected during the
+topology walk when the number of RPs detected as grandparents is not equal
+to the number of devices iterated in the same iteration loop. The assumption
+is made that subtle asymmetry in properties does not happen and all paths to
+EPs are equal.
+
+There can be multiple switches under an RP. There can be multiple RPs under
+a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory
+Window Structure (CFMWS).
+
+An example hierarchy:
+
+> CFMWS 0
+>   |
+>  _________|_________
+> |                   |
+> ACPI0017-0          ACPI0017-1
+> GP0/HB0/ACPI0016-0  GP1/HB1/ACPI0016-1
+>   |          |        |           |
+>  RP0        RP1      RP2         RP3
+>   |          |        |           |
+> SW 0       SW 1     SW 2        SW 3
+> |   |      |   |    |   |       |   |
+> EP0 EP1    EP2 EP3  EP4 EP5     EP6 EP7
+
+Computation for the example hierarchy:
+
+Min (GP0 to CPU BW,
+     Min(SW 0 Upstream Link to RP0 BW,
+         Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) +
+         Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) +
+     Min(SW 1 Upstream Link to RP1 BW,
+         Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) +
+         Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) +
+Min (GP1 to CPU BW,
+     Min(SW 2 Upstream Link to RP2 BW,
+         Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) +
+         Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) +
+     Min(SW 3 Upstream Link to RP3 BW,
+         Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) +
+         Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link))))
+
+The calculation starts at cxl_region_shared_upstream_perf_update(). An xarray
+is created to collect all the endpoint bandwidths via the
+cxl_endpoint_gather_bandwidth() function. The min() of the bandwidth from the
+endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint
+has a CXL switch as a parent, then the min() of the calculated bandwidth and
+the bandwidth from the SSLBIS for the switch downstream port that is
+associated with the endpoint is calculated. The final bandwidth is stored in
+a 'struct cxl_perf_ctx' in the xarray, indexed by a device pointer. If the
+endpoint is directly attached to a root port (RP), the device pointer is an
+RP device. If the endpoint is behind a switch, the device pointer is the
+upstream device of the parent switch.
+
+At the next stage, the code walks through one or more switches if they exist
+in the topology. For endpoints directly attached to RPs, this step is skipped.
+If there is another switch upstream, the code takes the min() of the current
+gathered bandwidth, the upstream link bandwidth, and the bandwidth from the
+SSLBIS of that upstream switch.
+
+Once the topology walk reaches the RP, whether from directly attached
+endpoints or by walking through the switch(es), cxl_rp_gather_bandwidth() is
+called. At this point all the bandwidths are aggregated per host bridge,
+which is also the index for the resulting xarray.
+
+The next step is to take the min() of the per-host-bridge bandwidth and the
+bandwidth from the Generic Port (GP). The bandwidths for the GP are retrieved
+via the ACPI SRAT/HMAT tables. The min bandwidths are aggregated under the
+same ACPI0017 device to form a new xarray.
+
+Finally, cxl_region_update_bandwidth() is called and the aggregated
+bandwidth from all the members of the last xarray is updated for the
+access coordinates residing in the cxl region (cxlr) context.
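+
+To make the arithmetic concrete, below is a minimal user-space sketch of the
+reduction for the EP0/EP1 branch of the example hierarchy. The helper names
+and MB/s figures are made up for illustration only; the in-kernel code
+operates on xarrays of 'struct cxl_perf_ctx' keyed by device pointers, as
+described above::
+
+  /*
+   * Illustrative only -- not the kernel implementation. Models
+   * min(SSLBIS, DSLBIS, link) per endpoint, then clamps the sum at
+   * each shared upstream link.
+   */
+  #include <stdio.h>
+
+  static unsigned int min_bw(unsigned int a, unsigned int b)
+  {
+          return a < b ? a : b;
+  }
+
+  /* One endpoint: min(SSLBIS for its DSP, its DSLBIS, its link BW). */
+  static unsigned int ep_bw(unsigned int sslbis, unsigned int dslbis,
+                            unsigned int link)
+  {
+          return min_bw(sslbis, min_bw(dslbis, link));
+  }
+
+  int main(void)
+  {
+          unsigned int ep0 = ep_bw(8000, 12000, 16000);   /* -> 8000 */
+          unsigned int ep1 = ep_bw(8000, 6000, 16000);    /* -> 6000 */
+          unsigned int sw0_up = 10000;  /* SW 0 upstream link to RP0 */
+          unsigned int gp0 = 20000;     /* GP0 to CPU bandwidth */
+
+          /* Sum the endpoint contributions, clamp at each shared link. */
+          unsigned int rp0 = min_bw(sw0_up, ep0 + ep1);   /* -> 10000 */
+          unsigned int hb0 = min_bw(gp0, rp0);            /* -> 10000 */
+
+          printf("bandwidth through HB0: %u MB/s\n", hb0);
+          return 0;
+  }
+
+Note how the two endpoints individually sum to 14000 MB/s, but the shared
+SW 0 upstream link caps the contribution at 10000 MB/s, which is the effect
+the recalculation pass exists to capture.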
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst deleted file mode 100644 index d732c42526df..000000000000 --- a/Documentation/driver-api/cxl/memory-devices.rst +++ /dev/null @@ -1,398 +0,0 @@ -.. SPDX-License-Identifier: GPL-2.0 -.. include:: - -=================================== -Compute Express Link Memory Devices -=================================== - -A Compute Express Link Memory Device is a CXL component that implements the -CXL.mem protocol. It contains some amount of volatile memory, persistent memory, -or both. It is enumerated as a PCI device for configuration and passing -messages over an MMIO mailbox. Its contribution to the System Physical -Address space is handled via HDM (Host Managed Device Memory) decoders -that optionally define a device's contribution to an interleaved address -range across multiple devices underneath a host-bridge or interleaved -across host-bridges. - -CXL Bus: Theory of Operation -============================ -Similar to how a RAID driver takes disk objects and assembles them into a new -logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and -assemble them into a CXL.mem decode topology. The need for runtime configuration -of the CXL.mem topology is also similar to RAID in that different environments -with the same hardware configuration may decide to assemble the topology in -contrasting ways. One may choose performance (RAID0) striping memory across -multiple Host Bridges and endpoints while another may opt for fault tolerance -and disable any striping in the CXL.mem topology. - -Platform firmware enumerates a menu of interleave options at the "CXL root port" -(Linux term for the top of the CXL decode topology). From there, PCIe topology -dictates which endpoints can participate in which Host Bridge decode regimes. -Each PCIe Switch in the path between the root and an endpoint introduces a point -at which the interleave can be split. For example platform firmware may say at a -given range only decodes to 1 one Host Bridge, but that Host Bridge may in turn -interleave cycles across multiple Root Ports. An intervening Switch between a -port and an endpoint may interleave cycles across multiple Downstream Switch -Ports, etc. - -Here is a sample listing of a CXL topology defined by 'cxl_test'. The 'cxl_test' -module generates an emulated CXL topology of 2 Host Bridges each with 2 Root -Ports. 
Each of those Root Ports are connected to 2-way switches with endpoints -connected to those downstream ports for a total of 8 endpoints:: - - # cxl list -BEMPu -b cxl_test - { - "bus":"root3", - "provider":"cxl_test", - "ports:root3":[ - { - "port":"port5", - "host":"cxl_host_bridge.1", - "ports:port5":[ - { - "port":"port8", - "host":"cxl_switch_uport.1", - "endpoints:port8":[ - { - "endpoint":"endpoint9", - "host":"mem2", - "memdev":{ - "memdev":"mem2", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x1", - "numa_node":1, - "host":"cxl_mem.1" - } - }, - { - "endpoint":"endpoint15", - "host":"mem6", - "memdev":{ - "memdev":"mem6", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x5", - "numa_node":1, - "host":"cxl_mem.5" - } - } - ] - }, - { - "port":"port12", - "host":"cxl_switch_uport.3", - "endpoints:port12":[ - { - "endpoint":"endpoint17", - "host":"mem8", - "memdev":{ - "memdev":"mem8", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x7", - "numa_node":1, - "host":"cxl_mem.7" - } - }, - { - "endpoint":"endpoint13", - "host":"mem4", - "memdev":{ - "memdev":"mem4", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x3", - "numa_node":1, - "host":"cxl_mem.3" - } - } - ] - } - ] - }, - { - "port":"port4", - "host":"cxl_host_bridge.0", - "ports:port4":[ - { - "port":"port6", - "host":"cxl_switch_uport.0", - "endpoints:port6":[ - { - "endpoint":"endpoint7", - "host":"mem1", - "memdev":{ - "memdev":"mem1", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0", - "numa_node":0, - "host":"cxl_mem.0" - } - }, - { - "endpoint":"endpoint14", - "host":"mem5", - "memdev":{ - "memdev":"mem5", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x4", - "numa_node":0, - "host":"cxl_mem.4" - } - } - ] - }, - { - "port":"port10", - "host":"cxl_switch_uport.2", - "endpoints:port10":[ - { - "endpoint":"endpoint16", - "host":"mem7", - "memdev":{ - "memdev":"mem7", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x6", - "numa_node":0, - "host":"cxl_mem.6" - } - }, - { - "endpoint":"endpoint11", - "host":"mem3", - "memdev":{ - "memdev":"mem3", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x2", - "numa_node":0, - "host":"cxl_mem.2" - } - } - ] - } - ] - } - ] - } - -In that listing each "root", "port", and "endpoint" object correspond a kernel -'struct cxl_port' object. A 'cxl_port' is a device that can decode CXL.mem to -its descendants. So "root" claims non-PCIe enumerable platform decode ranges and -decodes them to "ports", "ports" decode to "endpoints", and "endpoints" -represent the decode from SPA (System Physical Address) to DPA (Device Physical -Address). - -Continuing the RAID analogy, disks have both topology metadata and on device -metadata that determine RAID set assembly. CXL Port topology and CXL Port link -status is metadata for CXL.mem set assembly. The CXL Port topology is enumerated -by the arrival of a CXL.mem device. I.e. unless and until the PCIe core attaches -the cxl_pci driver to a CXL Memory Expander there is no role for CXL Port -objects. 
Conversely for hot-unplug / removal scenarios, there is no need for -the Linux PCI core to tear down switch-level CXL resources because the endpoint -->remove() event cleans up the port data that was established to support that -Memory Expander. - -The port metadata and potential decode schemes that a give memory device may -participate can be determined via a command like:: - - # cxl list -BDMu -d root -m mem3 - { - "bus":"root3", - "provider":"cxl_test", - "decoders:root3":[ - { - "decoder":"decoder3.1", - "resource":"0x8030000000", - "size":"512.00 MiB (536.87 MB)", - "volatile_capable":true, - "nr_targets":2 - }, - { - "decoder":"decoder3.3", - "resource":"0x8060000000", - "size":"512.00 MiB (536.87 MB)", - "pmem_capable":true, - "nr_targets":2 - }, - { - "decoder":"decoder3.0", - "resource":"0x8020000000", - "size":"256.00 MiB (268.44 MB)", - "volatile_capable":true, - "nr_targets":1 - }, - { - "decoder":"decoder3.2", - "resource":"0x8050000000", - "size":"256.00 MiB (268.44 MB)", - "pmem_capable":true, - "nr_targets":1 - } - ], - "memdevs:root3":[ - { - "memdev":"mem3", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x2", - "numa_node":0, - "host":"cxl_mem.2" - } - ] - } - -...which queries the CXL topology to ask "given CXL Memory Expander with a kernel -device name of 'mem3' which platform level decode ranges may this device -participate". A given expander can participate in multiple CXL.mem interleave -sets simultaneously depending on how many decoder resource it has. In this -example mem3 can participate in one or more of a PMEM interleave that spans to -Host Bridges, a PMEM interleave that targets a single Host Bridge, a Volatile -memory interleave that spans 2 Host Bridges, and a Volatile memory interleave -that only targets a single Host Bridge. - -Conversely the memory devices that can participate in a given platform level -decode scheme can be determined via a command like the following:: - - # cxl list -MDu -d 3.2 - [ - { - "memdevs":[ - { - "memdev":"mem1", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0", - "numa_node":0, - "host":"cxl_mem.0" - }, - { - "memdev":"mem5", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x4", - "numa_node":0, - "host":"cxl_mem.4" - }, - { - "memdev":"mem7", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x6", - "numa_node":0, - "host":"cxl_mem.6" - }, - { - "memdev":"mem3", - "pmem_size":"256.00 MiB (268.44 MB)", - "ram_size":"256.00 MiB (268.44 MB)", - "serial":"0x2", - "numa_node":0, - "host":"cxl_mem.2" - } - ] - }, - { - "root decoders":[ - { - "decoder":"decoder3.2", - "resource":"0x8050000000", - "size":"256.00 MiB (268.44 MB)", - "pmem_capable":true, - "nr_targets":1 - } - ] - } - ] - -...where the naming scheme for decoders is "decoder.". - -Driver Infrastructure -===================== - -This section covers the driver infrastructure for a CXL memory device. - -CXL Memory Device ------------------ - -.. kernel-doc:: drivers/cxl/pci.c - :doc: cxl pci - -.. kernel-doc:: drivers/cxl/pci.c - :internal: - -.. kernel-doc:: drivers/cxl/mem.c - :doc: cxl mem - -.. kernel-doc:: drivers/cxl/cxlmem.h - :internal: - -.. kernel-doc:: drivers/cxl/core/memdev.c - :identifiers: - -CXL Port --------- -.. kernel-doc:: drivers/cxl/port.c - :doc: cxl port - -CXL Core --------- -.. kernel-doc:: drivers/cxl/cxl.h - :doc: cxl objects - -.. 
kernel-doc:: drivers/cxl/cxl.h
-   :internal:
-
-.. kernel-doc:: drivers/cxl/core/hdm.c
-   :doc: cxl core hdm
-
-.. kernel-doc:: drivers/cxl/core/hdm.c
-   :identifiers:
-
-.. kernel-doc:: drivers/cxl/core/cdat.c
-   :identifiers:
-
-.. kernel-doc:: drivers/cxl/core/port.c
-   :doc: cxl core
-
-.. kernel-doc:: drivers/cxl/core/port.c
-   :identifiers:
-
-.. kernel-doc:: drivers/cxl/core/pci.c
-   :doc: cxl core pci
-
-.. kernel-doc:: drivers/cxl/core/pci.c
-   :identifiers:
-
-.. kernel-doc:: drivers/cxl/core/pmem.c
-   :doc: cxl pmem
-
-.. kernel-doc:: drivers/cxl/core/regs.c
-   :doc: cxl registers
-
-.. kernel-doc:: drivers/cxl/core/mbox.c
-   :doc: cxl mbox
-
-CXL Regions
------------
-.. kernel-doc:: drivers/cxl/core/region.c
-   :doc: cxl core region
-
-.. kernel-doc:: drivers/cxl/core/region.c
-   :identifiers:
-
-External Interfaces
-===================
-
-CXL IOCTL Interface
--------------------
-
-.. kernel-doc:: include/uapi/linux/cxl_mem.h
-   :doc: UAPI
-
-.. kernel-doc:: include/uapi/linux/cxl_mem.h
-   :internal:
diff --git a/Documentation/driver-api/cxl/theory-of-operation.rst b/Documentation/driver-api/cxl/theory-of-operation.rst
new file mode 100644
index 000000000000..32739e253453
--- /dev/null
+++ b/Documentation/driver-api/cxl/theory-of-operation.rst
@@ -0,0 +1,398 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================================
+Compute Express Link Driver Theory of Operation
+===============================================
+
+A Compute Express Link Memory Device is a CXL component that implements the
+CXL.mem protocol. It contains some amount of volatile memory, persistent
+memory, or both. It is enumerated as a PCI device for configuration and
+passing messages over an MMIO mailbox. Its contribution to the System
+Physical Address space is handled via HDM (Host Managed Device Memory)
+decoders that optionally define a device's contribution to an interleaved
+address range across multiple devices underneath a host-bridge or interleaved
+across host-bridges.
+
+The CXL Bus
+===========
+Similar to how a RAID driver takes disk objects and assembles them into a new
+logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and
+assemble them into a CXL.mem decode topology. The need for runtime
+configuration of the CXL.mem topology is also similar to RAID in that
+different environments with the same hardware configuration may decide to
+assemble the topology in contrasting ways. One may choose performance
+(RAID0), striping memory across multiple Host Bridges and endpoints, while
+another may opt for fault tolerance and disable any striping in the CXL.mem
+topology.
+
+Platform firmware enumerates a menu of interleave options at the "CXL root port"
+(Linux term for the top of the CXL decode topology). From there, PCIe topology
+dictates which endpoints can participate in which Host Bridge decode regimes.
+Each PCIe Switch in the path between the root and an endpoint introduces a point
+at which the interleave can be split. For example, platform firmware may say a
+given range only decodes to one Host Bridge, but that Host Bridge may in turn
+interleave cycles across multiple Root Ports. An intervening Switch between a
+port and an endpoint may interleave cycles across multiple Downstream Switch
+Ports, etc.
+
+Here is a sample listing of a CXL topology defined by 'cxl_test'. The 'cxl_test'
+module generates an emulated CXL topology of 2 Host Bridges each with 2 Root
+Ports. Each of those Root Ports is connected to a 2-way switch with endpoints
+connected to those downstream ports for a total of 8 endpoints::
+
+   # cxl list -BEMPu -b cxl_test
+   {
+     "bus":"root3",
+     "provider":"cxl_test",
+     "ports:root3":[
+       {
+         "port":"port5",
+         "host":"cxl_host_bridge.1",
+         "ports:port5":[
+           {
+             "port":"port8",
+             "host":"cxl_switch_uport.1",
+             "endpoints:port8":[
+               {
+                 "endpoint":"endpoint9",
+                 "host":"mem2",
+                 "memdev":{
+                   "memdev":"mem2",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x1",
+                   "numa_node":1,
+                   "host":"cxl_mem.1"
+                 }
+               },
+               {
+                 "endpoint":"endpoint15",
+                 "host":"mem6",
+                 "memdev":{
+                   "memdev":"mem6",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x5",
+                   "numa_node":1,
+                   "host":"cxl_mem.5"
+                 }
+               }
+             ]
+           },
+           {
+             "port":"port12",
+             "host":"cxl_switch_uport.3",
+             "endpoints:port12":[
+               {
+                 "endpoint":"endpoint17",
+                 "host":"mem8",
+                 "memdev":{
+                   "memdev":"mem8",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x7",
+                   "numa_node":1,
+                   "host":"cxl_mem.7"
+                 }
+               },
+               {
+                 "endpoint":"endpoint13",
+                 "host":"mem4",
+                 "memdev":{
+                   "memdev":"mem4",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x3",
+                   "numa_node":1,
+                   "host":"cxl_mem.3"
+                 }
+               }
+             ]
+           }
+         ]
+       },
+       {
+         "port":"port4",
+         "host":"cxl_host_bridge.0",
+         "ports:port4":[
+           {
+             "port":"port6",
+             "host":"cxl_switch_uport.0",
+             "endpoints:port6":[
+               {
+                 "endpoint":"endpoint7",
+                 "host":"mem1",
+                 "memdev":{
+                   "memdev":"mem1",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0",
+                   "numa_node":0,
+                   "host":"cxl_mem.0"
+                 }
+               },
+               {
+                 "endpoint":"endpoint14",
+                 "host":"mem5",
+                 "memdev":{
+                   "memdev":"mem5",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x4",
+                   "numa_node":0,
+                   "host":"cxl_mem.4"
+                 }
+               }
+             ]
+           },
+           {
+             "port":"port10",
+             "host":"cxl_switch_uport.2",
+             "endpoints:port10":[
+               {
+                 "endpoint":"endpoint16",
+                 "host":"mem7",
+                 "memdev":{
+                   "memdev":"mem7",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x6",
+                   "numa_node":0,
+                   "host":"cxl_mem.6"
+                 }
+               },
+               {
+                 "endpoint":"endpoint11",
+                 "host":"mem3",
+                 "memdev":{
+                   "memdev":"mem3",
+                   "pmem_size":"256.00 MiB (268.44 MB)",
+                   "ram_size":"256.00 MiB (268.44 MB)",
+                   "serial":"0x2",
+                   "numa_node":0,
+                   "host":"cxl_mem.2"
+                 }
+               }
+             ]
+           }
+         ]
+       }
+     ]
+   }
+
+In that listing each "root", "port", and "endpoint" object corresponds to a
+kernel 'struct cxl_port' object. A 'cxl_port' is a device that can decode
+CXL.mem to its descendants. So "root" claims non-PCIe enumerable platform
+decode ranges and decodes them to "ports", "ports" decode to "endpoints",
+and "endpoints" represent the decode from SPA (System Physical Address) to
+DPA (Device Physical Address).
+
+Continuing the RAID analogy, disks have both topology metadata and on-device
+metadata that determine RAID set assembly. CXL Port topology and CXL Port
+link status are metadata for CXL.mem set assembly. The CXL Port topology is
+enumerated by the arrival of a CXL.mem device. I.e. unless and until the
+PCIe core attaches the cxl_pci driver to a CXL Memory Expander there is no
+role for CXL Port objects. Conversely for hot-unplug / removal scenarios,
+there is no need for the Linux PCI core to tear down switch-level CXL
+resources because the endpoint ->remove() event cleans up the port data
+that was established to support that Memory Expander.
+
+The port metadata and potential decode schemes in which a given memory device
+may participate can be determined via a command like::
+
+   # cxl list -BDMu -d root -m mem3
+   {
+     "bus":"root3",
+     "provider":"cxl_test",
+     "decoders:root3":[
+       {
+         "decoder":"decoder3.1",
+         "resource":"0x8030000000",
+         "size":"512.00 MiB (536.87 MB)",
+         "volatile_capable":true,
+         "nr_targets":2
+       },
+       {
+         "decoder":"decoder3.3",
+         "resource":"0x8060000000",
+         "size":"512.00 MiB (536.87 MB)",
+         "pmem_capable":true,
+         "nr_targets":2
+       },
+       {
+         "decoder":"decoder3.0",
+         "resource":"0x8020000000",
+         "size":"256.00 MiB (268.44 MB)",
+         "volatile_capable":true,
+         "nr_targets":1
+       },
+       {
+         "decoder":"decoder3.2",
+         "resource":"0x8050000000",
+         "size":"256.00 MiB (268.44 MB)",
+         "pmem_capable":true,
+         "nr_targets":1
+       }
+     ],
+     "memdevs:root3":[
+       {
+         "memdev":"mem3",
+         "pmem_size":"256.00 MiB (268.44 MB)",
+         "ram_size":"256.00 MiB (268.44 MB)",
+         "serial":"0x2",
+         "numa_node":0,
+         "host":"cxl_mem.2"
+       }
+     ]
+   }
+
+...which queries the CXL topology to ask "given the CXL Memory Expander with
+a kernel device name of 'mem3', which platform level decode ranges may this
+device participate in?". A given expander can participate in multiple CXL.mem
+interleave sets simultaneously depending on how many decoder resources it
+has. In this example mem3 can participate in one or more of a PMEM interleave
+that spans 2 Host Bridges, a PMEM interleave that targets a single Host
+Bridge, a Volatile memory interleave that spans 2 Host Bridges, and a
+Volatile memory interleave that only targets a single Host Bridge.
+
+Conversely the memory devices that can participate in a given platform level
+decode scheme can be determined via a command like the following::
+
+   # cxl list -MDu -d 3.2
+   [
+     {
+       "memdevs":[
+         {
+           "memdev":"mem1",
+           "pmem_size":"256.00 MiB (268.44 MB)",
+           "ram_size":"256.00 MiB (268.44 MB)",
+           "serial":"0",
+           "numa_node":0,
+           "host":"cxl_mem.0"
+         },
+         {
+           "memdev":"mem5",
+           "pmem_size":"256.00 MiB (268.44 MB)",
+           "ram_size":"256.00 MiB (268.44 MB)",
+           "serial":"0x4",
+           "numa_node":0,
+           "host":"cxl_mem.4"
+         },
+         {
+           "memdev":"mem7",
+           "pmem_size":"256.00 MiB (268.44 MB)",
+           "ram_size":"256.00 MiB (268.44 MB)",
+           "serial":"0x6",
+           "numa_node":0,
+           "host":"cxl_mem.6"
+         },
+         {
+           "memdev":"mem3",
+           "pmem_size":"256.00 MiB (268.44 MB)",
+           "ram_size":"256.00 MiB (268.44 MB)",
+           "serial":"0x2",
+           "numa_node":0,
+           "host":"cxl_mem.2"
+         }
+       ]
+     },
+     {
+       "root decoders":[
+         {
+           "decoder":"decoder3.2",
+           "resource":"0x8050000000",
+           "size":"256.00 MiB (268.44 MB)",
+           "pmem_capable":true,
+           "nr_targets":1
+         }
+       ]
+     }
+   ]
+
+...where the naming scheme for decoders is "decoder<port_id>.<instance_id>".
+
+Driver Infrastructure
+=====================
+
+This section covers the driver infrastructure for a CXL memory device.
+
+CXL Memory Device
+-----------------
+
+.. kernel-doc:: drivers/cxl/pci.c
+   :doc: cxl pci
+
+.. kernel-doc:: drivers/cxl/pci.c
+   :internal:
+
+.. kernel-doc:: drivers/cxl/mem.c
+   :doc: cxl mem
+
+.. kernel-doc:: drivers/cxl/cxlmem.h
+   :internal:
+
+.. kernel-doc:: drivers/cxl/core/memdev.c
+   :identifiers:
+
+CXL Port
+--------
+.. kernel-doc:: drivers/cxl/port.c
+   :doc: cxl port
+
+CXL Core
+--------
+.. kernel-doc:: drivers/cxl/cxl.h
+   :doc: cxl objects
+
+.. kernel-doc:: drivers/cxl/cxl.h
+   :internal:
+
+.. kernel-doc:: drivers/cxl/core/hdm.c
+   :doc: cxl core hdm
+
+.. kernel-doc:: drivers/cxl/core/hdm.c
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/cdat.c
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/port.c
+   :doc: cxl core
+
+.. kernel-doc:: drivers/cxl/core/port.c
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/pci.c
+   :doc: cxl core pci
+
+.. kernel-doc:: drivers/cxl/core/pci.c
+   :identifiers:
+
+.. kernel-doc:: drivers/cxl/core/pmem.c
+   :doc: cxl pmem
+
+.. kernel-doc:: drivers/cxl/core/regs.c
+   :doc: cxl registers
+
+.. kernel-doc:: drivers/cxl/core/mbox.c
+   :doc: cxl mbox
+
+CXL Regions
+-----------
+.. kernel-doc:: drivers/cxl/core/region.c
+   :doc: cxl core region
+
+.. kernel-doc:: drivers/cxl/core/region.c
+   :identifiers:
+
+External Interfaces
+===================
+
+CXL IOCTL Interface
+-------------------
+
+.. kernel-doc:: include/uapi/linux/cxl_mem.h
+   :doc: UAPI
+
+.. kernel-doc:: include/uapi/linux/cxl_mem.h
+   :internal:
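+
+As a usage illustration (a sketch, not an authoritative reference), the
+following user-space snippet drives the command-query ioctl declared in
+include/uapi/linux/cxl_mem.h. The device node path is an assumption for a
+first memdev, and error handling is trimmed for brevity::
+
+  /*
+   * Sketch: enumerate the mailbox commands a memdev supports via
+   * CXL_MEM_QUERY_COMMANDS. Per the UAPI header, calling with
+   * n_commands == 0 reports the supported count without filling
+   * the array.
+   */
+  #include <fcntl.h>
+  #include <stdio.h>
+  #include <stdlib.h>
+  #include <sys/ioctl.h>
+  #include <unistd.h>
+  #include <linux/cxl_mem.h>
+
+  int main(void)
+  {
+          struct cxl_mem_query_commands count = { .n_commands = 0 };
+          struct cxl_mem_query_commands *q;
+          int fd = open("/dev/cxl/mem0", O_RDWR);  /* assumed node */
+          __u32 i;
+
+          if (fd < 0 || ioctl(fd, CXL_MEM_QUERY_COMMANDS, &count) < 0)
+                  return 1;
+
+          /* Second call with the reported count fills the array. */
+          q = calloc(1, sizeof(*q) +
+                        count.n_commands * sizeof(q->commands[0]));
+          q->n_commands = count.n_commands;
+          if (ioctl(fd, CXL_MEM_QUERY_COMMANDS, q) == 0)
+                  for (i = 0; i < q->n_commands; i++)
+                          printf("supported command id: %u\n",
+                                 q->commands[i].id);
+
+          free(q);
+          close(fd);
+          return 0;
+  }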