Commit | Line | Data |
---|---|---|
a6546f89 | 1 | .. SPDX-License-Identifier: GPL-2.0-only |
2a26ed8e MCC |
2 | .. include:: <isonum.txt> |
3 | ||
4 | ===================== | |
5 | VFIO Mediated devices | |
6 | ===================== | |
7 | ||
8 | :Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved. | |
9 | :Author: Neo Jia <cjia@nvidia.com> | |
10 | :Author: Kirti Wankhede <kwankhede@nvidia.com> | |
11 | ||
2a26ed8e | 12 | |
8e1c5a40 KW |
13 | |
14 | Virtual Function I/O (VFIO) Mediated devices[1] | |
15 | =============================================== | |
16 | ||
17 | The number of use cases for virtualizing DMA devices that do not have built-in | |
18 | SR-IOV capability is increasing. Previously, to virtualize such devices, | |
19 | developers had to create their own management interfaces and APIs, and then | |
20 | integrate them with user space software. To simplify integration with user space | |
21 | software, we have identified common requirements and a unified management | |
22 | interface for such devices. | |
23 | ||
24 | The VFIO driver framework provides unified APIs for direct device access. It is | |
25 | an IOMMU/device-agnostic framework for exposing direct device access to user | |
26 | space in a secure, IOMMU-protected environment. This framework is used for | |
27 | multiple devices, such as GPUs, network adapters, and compute accelerators. With | |
28 | direct device access, virtual machines or user space applications have direct | |
29 | access to the physical device. This framework is reused for mediated devices. | |
30 | ||
31 | The mediated core driver provides a common interface for mediated device | |
32 | management that can be used by drivers of different devices. This module | |
33 | provides a generic interface to perform these operations: | |
34 | ||
35 | * Create and destroy a mediated device | |
36 | * Add a mediated device to and remove it from a mediated bus driver | |
37 | * Add a mediated device to and remove it from an IOMMU group | |
38 | ||
39 | The mediated core driver also provides an interface to register a bus driver. | |
40 | For example, the mediated VFIO mdev driver is designed for mediated devices and | |
41 | supports VFIO APIs. The mediated bus driver adds a mediated device to and | |
42 | removes it from a VFIO group. | |
43 | ||
44 | The following high-level block diagram shows the main components and interfaces | |
45 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM | |
2a26ed8e | 46 | devices as examples, as these devices are the first devices to use this module:: |
8e1c5a40 KW |
47 | |
48 | +---------------+ | |
49 | | | | |
50 | | +-----------+ | mdev_register_driver() +--------------+ | |
51 | | | | +<------------------------+ | | |
52 | | | mdev | | | | | |
53 | | | bus | +------------------------>+ vfio_mdev.ko |<-> VFIO user | |
54 | | | driver | | probe()/remove() | | APIs | |
55 | | | | | +--------------+ | |
56 | | +-----------+ | | |
57 | | | | |
58 | | MDEV CORE | | |
59 | | MODULE | | |
60 | | mdev.ko | | |
61 | | +-----------+ | mdev_register_device() +--------------+ | |
62 | | | | +<------------------------+ | | |
63 | | | | | | nvidia.ko |<-> physical | |
64 | | | | +------------------------>+ | device | |
65 | | | | | callbacks +--------------+ | |
66 | | | Physical | | | |
67 | | | device | | mdev_register_device() +--------------+ | |
68 | | | interface | +<------------------------+ | |
69 | | | | | | i915.ko |<-> physical | |
70 | | | | +------------------------>+ | device | |
71 | | | | | callbacks +--------------+ | |
72 | | | | | | |
73 | | | | | mdev_register_device() +--------------+ | |
74 | | | | +<------------------------+ | | |
75 | | | | | | ccw_device.ko|<-> physical | |
76 | | | | +------------------------>+ | device | |
77 | | | | | callbacks +--------------+ | |
78 | | +-----------+ | | |
79 | +---------------+ | |
80 | ||
81 | ||
82 | Registration Interfaces | |
83 | ======================= | |
84 | ||
85 | The mediated core driver provides the following types of registration | |
86 | interfaces: | |
87 | ||
88 | * Registration interface for a mediated bus driver | |
89 | * Physical device driver interface | |
90 | ||
91 | Registration Interface for a Mediated Bus Driver | |
92 | ------------------------------------------------ | |
93 | ||
88a21f26 | 94 | The registration interface for a mediated device driver provides the following |
2a26ed8e | 95 | structure to represent a mediated device's driver:: |
8e1c5a40 KW |
96 | |
97 | /* | |
98 | * struct mdev_driver [2] - Mediated device's driver | |
8e1c5a40 KW |
99 | * @probe: called when new device created |
100 | * @remove: called when device removed | |
101 | * @driver: device driver structure | |
102 | */ | |
103 | struct mdev_driver { | |
2a3d15f2 JG |
104 | int (*probe) (struct mdev_device *dev); |
105 | void (*remove) (struct mdev_device *dev); | |
6b42f491 | 106 | struct attribute_group **supported_type_groups; |
8e1c5a40 KW |
107 | struct device_driver driver; |
108 | }; | |
109 | ||
110 | A mediated bus driver for mdev should use this structure in the function calls | |
111 | to register and unregister itself with the core driver: | |
112 | ||
2a26ed8e | 113 | * Register:: |
8e1c5a40 | 114 | |
d1877e63 | 115 | int mdev_register_driver(struct mdev_driver *drv); |
8e1c5a40 | 116 | |
2a26ed8e | 117 | * Unregister:: |
8e1c5a40 | 118 | |
d1877e63 | 119 | void mdev_unregister_driver(struct mdev_driver *drv); |
8e1c5a40 | 120 | |
6b42f491 JG |
121 | The mediated bus driver's probe function should create a vfio_device on top of |
122 | the mdev_device and connect it to an appropriate implementation of | |
123 | vfio_device_ops. | |
8e1c5a40 | 124 | |
88a21f26 JG |
125 | When a driver wants to add the GUID creation sysfs files to an existing device |
126 | that it has already probed, it should call:: | |
8e1c5a40 | 127 | |
d1877e63 AW |
128 | int mdev_register_device(struct device *dev, |
129 | struct mdev_driver *mdev_driver); | |
8e1c5a40 | 130 | |
88a21f26 JG |
131 | This will provide the 'mdev_supported_types/XX/create' files which can then be |
132 | used to trigger the creation of an mdev_device. The created mdev_device will be | |
133 | attached to the specified driver. | |
134 | ||
135 | When the driver needs to remove itself it calls:: | |
8e1c5a40 | 136 | |
d1877e63 | 137 | void mdev_unregister_device(struct device *dev); |
8e1c5a40 | 138 | |
88a21f26 | 139 | This call unbinds and destroys all the created mdevs and removes the sysfs files. |
8e1c5a40 KW |
140 | |
141 | Mediated Device Management Interface Through sysfs | |
142 | ================================================== | |
143 | ||
144 | The management interface through sysfs enables user space software, such as | |
145 | libvirt, to query and configure mediated devices in a hardware-agnostic fashion. | |
146 | This management interface provides flexibility to the underlying physical | |
147 | device's driver to support features such as: | |
148 | ||
149 | * Mediated device hot plug | |
150 | * Multiple mediated devices in a single virtual machine | |
151 | * Multiple mediated devices from different physical devices | |
152 | ||
153 | Links in the mdev_bus Class Directory | |
154 | ------------------------------------- | |
155 | The /sys/class/mdev_bus/ directory contains links to devices that are registered | |
156 | with the mdev core driver. | |
157 | ||
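As an illustration of how user space software might enumerate the registered
parent devices, the following is a minimal user-space sketch in C. The helper
name list_mdev_parents() and the MDEV_BUS_CLASS_DIR macro are hypothetical and
are not part of the mdev API; only the /sys/class/mdev_bus/ path comes from the
text above.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Path taken from the text above. */
#define MDEV_BUS_CLASS_DIR "/sys/class/mdev_bus"

/*
 * Hypothetical helper (not part of the mdev API): print each entry of
 * class_dir to out (if out is not NULL) and return the number of entries
 * found, or -1 if the directory cannot be opened.
 */
static int list_mdev_parents(const char *class_dir, FILE *out)
{
	DIR *dir = opendir(class_dir);
	struct dirent *ent;
	int count = 0;

	if (!dir)
		return -1;

	while ((ent = readdir(dir)) != NULL) {
		/* Skip the "." and ".." entries present in every directory. */
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		if (out)
			fprintf(out, "%s/%s\n", class_dir, ent->d_name);
		count++;
	}
	closedir(dir);
	return count;
}
```

Calling list_mdev_parents(MDEV_BUS_CLASS_DIR, stdout) returns -1 on a host
where the mdev core driver is not loaded, because the class directory does not
exist there.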
158 | Directories and Files Under the sysfs for Each Physical Device | |
159 | -------------------------------------------------------------- | |
160 | ||
2a26ed8e MCC |
161 | :: |
162 | ||
163 | |- [parent physical device] | |
164 | |--- Vendor-specific-attributes [optional] | |
165 | |--- [mdev_supported_types] | |
166 | | |--- [<type-id>] | |
167 | | | |--- create | |
168 | | | |--- name | |
169 | | | |--- available_instances | |
170 | | | |--- device_api | |
171 | | | |--- description | |
172 | | | |--- [devices] | |
173 | | |--- [<type-id>] | |
174 | | | |--- create | |
175 | | | |--- name | |
176 | | | |--- available_instances | |
177 | | | |--- device_api | |
178 | | | |--- description | |
179 | | | |--- [devices] | |
180 | | |--- [<type-id>] | |
181 | | |--- create | |
182 | | |--- name | |
183 | | |--- available_instances | |
184 | | |--- device_api | |
185 | | |--- description | |
186 | | |--- [devices] | |
8e1c5a40 KW |
187 | |
188 | * [mdev_supported_types] | |
189 | ||
190 | The list of currently supported mediated device types and their details. | |
191 | ||
192 | [<type-id>], device_api, and available_instances are mandatory attributes | |
193 | that should be provided by the vendor driver. | |
194 | ||
195 | * [<type-id>] | |
196 | ||
1c4f128e SD |
197 | The [<type-id>] name is created by adding the device driver string as a prefix |
198 | to the string provided by the vendor driver. The format of this name is as | |
2a26ed8e | 199 | follows:: |
8e1c5a40 KW |
200 | |
201 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); | |
202 | ||
9372e6fe | 203 | (or using mdev_parent_dev(mdev) to arrive at the parent device outside |
2a26ed8e | 204 | of the core mdev code) |
9372e6fe | 205 | |
8e1c5a40 KW |
206 | * device_api |
207 | ||
208 | This attribute should show which device API is being created, for example, | |
209 | "vfio-pci" for a PCI device. | |
210 | ||
211 | * available_instances | |
212 | ||
213 | This attribute should show the number of devices of type <type-id> that can be | |
214 | created. | |
215 | ||
216 | * [devices] | |
217 | ||
218 | This directory contains links to the devices of type <type-id> that have been | |
2a26ed8e | 219 | created. |
8e1c5a40 KW |
220 | |
221 | * name | |
222 | ||
223 | This attribute should show a human-readable name. This attribute is optional. | |
224 | ||
225 | * description | |
226 | ||
227 | This attribute should show a brief description of the type and its features. | |
228 | This attribute is optional. | |
229 | ||
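The [<type-id>] naming rule described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the helper name
format_type_id() is hypothetical, and the driver string and type name are
passed in as plain strings rather than obtained from dev_driver_string().

```c
#include <stdio.h>

/*
 * Hypothetical user-space helper mirroring the "%s-%s" format shown above.
 * Returns 0 on success, or -1 if the result does not fit in buf.
 */
static int format_type_id(char *buf, size_t size,
			  const char *driver_string, const char *group_name)
{
	int n = snprintf(buf, size, "%s-%s", driver_string, group_name);

	/* snprintf() returns the length it wanted to write; detect truncation. */
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, a driver string of "mtty" and a vendor-provided name of "1" give
the type-id "mtty-1".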
230 | Directories and Files Under the sysfs for Each mdev Device | |
231 | ---------------------------------------------------------- | |
232 | ||
2a26ed8e MCC |
233 | :: |
234 | ||
235 | |- [parent physical device] | |
236 | |--- [$MDEV_UUID] | |
8e1c5a40 KW |
237 | |--- remove |
238 | |--- mdev_type {link to its type} | |
239 | |--- vendor-specific-attributes [optional] | |
240 | ||
241 | * remove (write only) | |
2a26ed8e | 242 | |
8e1c5a40 KW |
243 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can |
244 | fail the remove() callback if that device is active and the vendor driver | |
245 | doesn't support hot unplug. | |
246 | ||
2a26ed8e MCC |
247 | Example:: |
248 | ||
8e1c5a40 KW |
249 | # echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove |
250 | ||
2a26ed8e | 251 | Mediated Device Hot Plug |
8e1c5a40 KW |
252 | ------------------------ |
253 | ||
254 | Mediated devices can be created and assigned at runtime. The procedure to hot | |
255 | plug a mediated device is the same as the procedure to hot plug a PCI device. | |
256 | ||
257 | Translation APIs for Mediated Devices | |
258 | ===================================== | |
259 | ||
260 | The following APIs are provided for translating user pfn to host pfn in a VFIO | |
2a26ed8e | 261 | driver:: |
8e1c5a40 | 262 | |
44abdd16 | 263 | int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova, |
34a255e6 | 264 | int npage, int prot, struct page **pages); |
8e1c5a40 | 265 | |
44abdd16 | 266 | void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, |
2a26ed8e | 267 | int npage); |
8e1c5a40 KW |
268 | |
269 | These functions call back into the back-end IOMMU module by using the pin_pages | |
270 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently | |
271 | these callbacks are supported in the TYPE1 IOMMU module. To enable them for | |
272 | other IOMMU backend modules, such as the PPC64 sPAPR module, those modules need to provide | |
273 | these two callback functions. | |
274 | ||
9d1a546c KW |
275 | Using the Sample Code |
276 | ===================== | |
277 | ||
278 | mtty.c in the samples/vfio-mdev/ directory is a sample driver program to | |
279 | demonstrate how to use the mediated device framework. | |
280 | ||
281 | The sample driver creates an mdev device that simulates a serial port over a PCI | |
282 | card. | |
283 | ||
284 | 1. Build and load the mtty.ko module. | |
285 | ||
286 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ | |
287 | ||
2a26ed8e MCC |
288 | Files in this device directory in sysfs are similar to the following:: |
289 | ||
290 | # tree /sys/devices/virtual/mtty/mtty/ | |
291 | /sys/devices/virtual/mtty/mtty/ | |
292 | |-- mdev_supported_types | |
293 | | |-- mtty-1 | |
294 | | | |-- available_instances | |
295 | | | |-- create | |
296 | | | |-- device_api | |
297 | | | |-- devices | |
298 | | | `-- name | |
299 | | `-- mtty-2 | |
300 | | |-- available_instances | |
301 | | |-- create | |
302 | | |-- device_api | |
303 | | |-- devices | |
304 | | `-- name | |
305 | |-- mtty_dev | |
306 | | `-- sample_mtty_dev | |
307 | |-- power | |
308 | | |-- autosuspend_delay_ms | |
309 | | |-- control | |
310 | | |-- runtime_active_time | |
311 | | |-- runtime_status | |
312 | | `-- runtime_suspended_time | |
313 | |-- subsystem -> ../../../../class/mtty | |
314 | `-- uevent | |
9d1a546c KW |
315 | |
316 | 2. Create a mediated device by using the dummy device that you created in the | |
2a26ed8e | 317 | previous step:: |
9d1a546c | 318 | |
2a26ed8e | 319 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ |
9d1a546c KW |
320 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create |
321 | ||
2a26ed8e | 322 | 3. Add parameters to qemu-kvm:: |
9d1a546c | 323 | |
2a26ed8e MCC |
324 | -device vfio-pci,\ |
325 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 | |
9d1a546c KW |
326 | |
327 | 4. Boot the VM. | |
328 | ||
329 | In the Linux guest VM, with no hardware on the host, the device appears | |
2a26ed8e MCC |
330 | as follows:: |
331 | ||
332 | # lspci -s 00:05.0 -xxvv | |
333 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) | |
334 | Subsystem: Device 4348:3253 | |
335 | Physical Slot: 5 | |
336 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- | |
337 | Stepping- SERR- FastB2B- DisINTx- | |
338 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- | |
339 | <TAbort- <MAbort- >SERR- <PERR- INTx- | |
340 | Interrupt: pin A routed to IRQ 10 | |
341 | Region 0: I/O ports at c150 [size=8] | |
342 | Region 1: I/O ports at c158 [size=8] | |
343 | Kernel driver in use: serial | |
344 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 | |
345 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 | |
346 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 | |
347 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 | |
348 | ||
349 | In the Linux guest VM, dmesg output for the device is as follows:: | |
350 | ||
351 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10 | |
352 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A | |
353 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A | |
354 | ||
355 | ||
356 | 5. In the Linux guest VM, check the serial ports:: | |
357 | ||
358 | # setserial -g /dev/ttyS* | |
359 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 | |
360 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 | |
361 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 | |
9d1a546c | 362 | |
ce8cd407 | 363 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or |
9d1a546c KW |
364 | /dev/ttyS2 with hardware flow control disabled. |
365 | ||
366 | 7. Type data on the minicom terminal or send data to the terminal emulation | |
367 | program and read the data. | |
368 | ||
369 | Data is looped back from the host's mtty driver. | |
370 | ||
2a26ed8e | 371 | 8. Destroy the mediated device that you created:: |
9d1a546c | 372 | |
2a26ed8e | 373 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove |
9d1a546c | 374 | |
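The create and remove steps above use plain sysfs writes, so a management
program could perform them directly. The following is a minimal user-space
sketch in C, assuming the sysfs layout described in this document. The helper
names mdev_create_path(), mdev_remove_path(), and write_attr() are hypothetical
and are not part of any kernel or library API.

```c
#include <stdio.h>

/* Build "<parent>/mdev_supported_types/<type-id>/create"; -1 if truncated. */
static int mdev_create_path(char *buf, size_t size,
			    const char *parent, const char *type_id)
{
	int n = snprintf(buf, size, "%s/mdev_supported_types/%s/create",
			 parent, type_id);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Build "/sys/bus/mdev/devices/<uuid>/remove"; -1 if truncated. */
static int mdev_remove_path(char *buf, size_t size, const char *uuid)
{
	int n = snprintf(buf, size, "/sys/bus/mdev/devices/%s/remove", uuid);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Write value to an attribute file, similar to "echo -n value > path".
 * Returns 0 on success, or -1 if the file cannot be opened or written.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* fclose() flushes the buffer, so a rejected write surfaces here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

With these helpers, step 2 corresponds to writing the UUID to the path built by
mdev_create_path(), and step 8 corresponds to writing "1" to the path built by
mdev_remove_path(). Both writes require root privileges and a loaded parent
driver such as mtty.ko.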
8e1c5a40 | 375 | References |
9d1a546c | 376 | ========== |
8e1c5a40 | 377 | |
baa293e9 | 378 | 1. See Documentation/driver-api/vfio.rst for more information on VFIO. |
2a26ed8e MCC |
379 | 2. struct mdev_driver in include/linux/mdev.h |
380 | 3. struct mdev_parent_ops in include/linux/mdev.h | |
381 | 4. struct vfio_iommu_driver_ops in include/linux/vfio.h |