Commit | Line | Data |
---|---|---|
2a26ed8e MCC |
1 | .. include:: <isonum.txt> |
2 | ||
3 | ===================== | |
4 | VFIO Mediated devices | |
5 | ===================== | |
6 | ||
7 | :Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved. | |
8 | :Author: Neo Jia <cjia@nvidia.com> | |
9 | :Author: Kirti Wankhede <kwankhede@nvidia.com> | |
10 | ||
11 | This program is free software; you can redistribute it and/or modify | |
12 | it under the terms of the GNU General Public License version 2 as | |
13 | published by the Free Software Foundation. | |
14 | ||
8e1c5a40 KW |
15 | |
16 | Virtual Function I/O (VFIO) Mediated devices[1] | |
17 | =============================================== | |
18 | ||
19 | The number of use cases for virtualizing DMA devices that do not have built-in | |
20 | SR-IOV capability is increasing. Previously, to virtualize such devices, | |
21 | developers had to create their own management interfaces and APIs, and then | |
22 | integrate them with user space software. To simplify integration with user space | |
23 | software, we have identified common requirements and a unified management | |
24 | interface for such devices. | |
25 | ||
26 | The VFIO driver framework provides unified APIs for direct device access. It is | |
27 | an IOMMU/device-agnostic framework for exposing direct device access to user | |
28 | space in a secure, IOMMU-protected environment. This framework is used for | |
29 | multiple devices, such as GPUs, network adapters, and compute accelerators. With | |
30 | direct device access, virtual machines or user space applications have direct | |
31 | access to the physical device. This framework is reused for mediated devices. | |
32 | ||
33 | The mediated core driver provides a common interface for mediated device | |
34 | management that can be used by drivers of different devices. This module | |
35 | provides a generic interface to perform these operations: | |
36 | ||
37 | * Create and destroy a mediated device | |
38 | * Add a mediated device to and remove it from a mediated bus driver | |
39 | * Add a mediated device to and remove it from an IOMMU group | |
40 | ||
41 | The mediated core driver also provides an interface to register a bus driver. | |
42 | For example, the mediated VFIO mdev driver is designed for mediated devices and | |
43 | supports VFIO APIs. The mediated bus driver adds a mediated device to and | |
44 | removes it from a VFIO group. | |
45 | ||
46 | The following high-level block diagram shows the main components and interfaces | |
47 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM | |
2a26ed8e | 48 | devices as examples, as these devices are the first devices to use this module:: |
8e1c5a40 KW |
49 | |
50 | +---------------+ | |
51 | | | | |
52 | | +-----------+ | mdev_register_driver() +--------------+ | |
53 | | | | +<------------------------+ | | |
54 | | | mdev | | | | | |
55 | | | bus | +------------------------>+ vfio_mdev.ko |<-> VFIO user | |
56 | | | driver | | probe()/remove() | | APIs | |
57 | | | | | +--------------+ | |
58 | | +-----------+ | | |
59 | | | | |
60 | | MDEV CORE | | |
61 | | MODULE | | |
62 | | mdev.ko | | |
63 | | +-----------+ | mdev_register_device() +--------------+ | |
64 | | | | +<------------------------+ | | |
65 | | | | | | nvidia.ko |<-> physical | |
66 | | | | +------------------------>+ | device | |
67 | | | | | callbacks +--------------+ | |
68 | | | Physical | | | |
69 | | | device | | mdev_register_device() +--------------+ | |
70 | | | interface | |<------------------------+ | | |
71 | | | | | | i915.ko |<-> physical | |
72 | | | | +------------------------>+ | device | |
73 | | | | | callbacks +--------------+ | |
74 | | | | | | |
75 | | | | | mdev_register_device() +--------------+ | |
76 | | | | +<------------------------+ | | |
77 | | | | | | ccw_device.ko|<-> physical | |
78 | | | | +------------------------>+ | device | |
79 | | | | | callbacks +--------------+ | |
80 | | +-----------+ | | |
81 | +---------------+ | |
82 | ||
83 | ||
84 | Registration Interfaces | |
85 | ======================= | |
86 | ||
87 | The mediated core driver provides the following types of registration | |
88 | interfaces: | |
89 | ||
90 | * Registration interface for a mediated bus driver | |
91 | * Physical device driver interface | |
92 | ||
93 | Registration Interface for a Mediated Bus Driver | |
94 | ------------------------------------------------ | |
95 | ||
96 | The registration interface for a mediated bus driver provides the following | |
2a26ed8e | 97 | structure to represent a mediated device's driver:: |
8e1c5a40 KW |
98 | |
99 | /* | |
100 | * struct mdev_driver [2] - Mediated device's driver | |
101 | * @name: driver name | |
102 | * @probe: called when a new device is created | |
103 | * @remove: called when a device is removed | |
104 | * @driver: device driver structure | |
105 | */ | |
106 | struct mdev_driver { | |
107 | const char *name; | |
108 | int (*probe) (struct device *dev); | |
109 | void (*remove) (struct device *dev); | |
110 | struct device_driver driver; | |
111 | }; | |
112 | ||
113 | A mediated bus driver for mdev should use this structure in the function calls | |
114 | to register and unregister itself with the core driver: | |
115 | ||
2a26ed8e | 116 | * Register:: |
8e1c5a40 | 117 | |
2a26ed8e | 118 | extern int mdev_register_driver(struct mdev_driver *drv, |
8e1c5a40 KW |
119 | struct module *owner); |
120 | ||
2a26ed8e | 121 | * Unregister:: |
8e1c5a40 | 122 | |
2a26ed8e | 123 | extern void mdev_unregister_driver(struct mdev_driver *drv); |
8e1c5a40 KW |
124 | |
125 | The mediated bus driver is responsible for adding mediated devices to the VFIO | |
126 | group when devices are bound to the driver and removing mediated devices from | |
127 | the VFIO group when devices are unbound from the driver. | |
128 | ||
129 | ||
130 | Physical Device Driver Interface | |
131 | -------------------------------- | |
132 | ||
42930553 AW |
133 | The physical device driver interface provides the mdev_parent_ops[3] structure |
134 | to define the APIs to manage work in the mediated core driver that is related | |
135 | to the physical device. | |
8e1c5a40 | 136 | |
42930553 | 137 | The structures in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
138 | |
139 | * dev_attr_groups: attributes of the parent device | |
140 | * mdev_attr_groups: attributes of the mediated device | |
141 | * supported_config: attributes to define supported configurations | |
142 | ||
42930553 | 143 | The functions in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
144 | |
145 | * create: allocate basic resources in a driver for a mediated device | |
146 | * remove: free resources in a driver when a mediated device is destroyed | |
147 | ||
002fe996 AW |
148 | (Note that mdev-core provides no implicit serialization of create/remove |
149 | callbacks per mdev parent device, per mdev type, or any other categorization. | |
150 | Vendor drivers are expected to be fully asynchronous in this respect or | |
151 | provide their own internal resource protection.) | |
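
One way a vendor driver might provide its own protection is to track free instances with an atomic counter, so that concurrent create and remove paths never hand out more instances than exist. The following is a minimal user-space sketch of that idea only. The `instance_pool` type and the `pool_*` helpers are hypothetical names and are not part of the mdev core API; a real driver would use kernel primitives and may need a mutex to protect more than a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-parent pool; not part of the mdev core API. */
struct instance_pool {
	atomic_int available;	/* what available_instances would report */
};

void pool_init(struct instance_pool *pool, int total)
{
	assert(pool && total >= 0);
	atomic_init(&pool->available, total);
}

/* For a create() path: claim one instance, or fail if none are left. */
bool pool_reserve(struct instance_pool *pool)
{
	int cur = atomic_load(&pool->available);

	while (cur > 0) {
		/* On failure, cur is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(&pool->available, &cur, cur - 1))
			return true;
	}
	return false;
}

/* For a remove() path: hand the instance back. */
void pool_release(struct instance_pool *pool)
{
	atomic_fetch_add(&pool->available, 1);
}

int pool_available(struct instance_pool *pool)
{
	return atomic_load(&pool->available);
}
```

With a pool initialized to two instances, two reservations succeed and a third fails until one instance is released.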
152 | ||
42930553 | 153 | The callbacks in the mdev_parent_ops structure are as follows: |
8e1c5a40 KW |
154 | |
155 | * open: open callback of mediated device | |
156 | * close: close callback of mediated device | |
157 | * ioctl: ioctl callback of mediated device | |
158 | * read: read emulation callback | |
159 | * write: write emulation callback | |
160 | * mmap: mmap emulation callback | |
161 | ||
42930553 | 162 | A driver should use the mdev_parent_ops structure in the function call to |
2a26ed8e | 163 | register itself with the mdev core driver:: |
8e1c5a40 | 164 | |
2a26ed8e MCC |
165 | extern int mdev_register_device(struct device *dev, |
166 | const struct mdev_parent_ops *ops); | |
8e1c5a40 | 167 | |
42930553 | 168 | However, the mdev_parent_ops structure is not required in the function call |
2a26ed8e | 169 | that a driver should use to unregister itself with the mdev core driver:: |
8e1c5a40 | 170 | |
2a26ed8e | 171 | extern void mdev_unregister_device(struct device *dev); |
8e1c5a40 KW |
172 | |
173 | ||
174 | Mediated Device Management Interface Through sysfs | |
175 | ================================================== | |
176 | ||
177 | The management interface through sysfs enables user space software, such as | |
178 | libvirt, to query and configure mediated devices in a hardware-agnostic fashion. | |
179 | This management interface provides flexibility to the underlying physical | |
180 | device's driver to support features such as: | |
181 | ||
182 | * Mediated device hot plug | |
183 | * Multiple mediated devices in a single virtual machine | |
184 | * Multiple mediated devices from different physical devices | |
185 | ||
186 | Links in the mdev_bus Class Directory | |
187 | ------------------------------------- | |
188 | The /sys/class/mdev_bus/ directory contains links to devices that are registered | |
189 | with the mdev core driver. | |
190 | ||
191 | Directories and Files Under the sysfs for Each Physical Device | |
192 | -------------------------------------------------------------- | |
193 | ||
2a26ed8e MCC |
194 | :: |
195 | ||
196 | |- [parent physical device] | |
197 | |--- Vendor-specific-attributes [optional] | |
198 | |--- [mdev_supported_types] | |
199 | | |--- [<type-id>] | |
200 | | | |--- create | |
201 | | | |--- name | |
202 | | | |--- available_instances | |
203 | | | |--- device_api | |
204 | | | |--- description | |
205 | | | |--- [devices] | |
206 | | |--- [<type-id>] | |
207 | | | |--- create | |
208 | | | |--- name | |
209 | | | |--- available_instances | |
210 | | | |--- device_api | |
211 | | | |--- description | |
212 | | | |--- [devices] | |
213 | | |--- [<type-id>] | |
214 | | |--- create | |
215 | | |--- name | |
216 | | |--- available_instances | |
217 | | |--- device_api | |
218 | | |--- description | |
219 | | |--- [devices] | |
8e1c5a40 KW |
220 | |
221 | * [mdev_supported_types] | |
222 | ||
223 | The list of currently supported mediated device types and their details. | |
224 | ||
225 | [<type-id>], device_api, and available_instances are mandatory attributes | |
226 | that should be provided by the vendor driver. | |
227 | ||
228 | * [<type-id>] | |
229 | ||
1c4f128e SD |
230 | The [<type-id>] name is created by adding the device driver string as a prefix |
231 | to the string provided by the vendor driver. The format of this name is as | |
2a26ed8e | 232 | follows:: |
8e1c5a40 KW |
233 | |
234 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); | |
235 | ||
9372e6fe | 236 | (or using mdev_parent_dev(mdev) to arrive at the parent device outside |
2a26ed8e | 237 | of the core mdev code) |
9372e6fe | 238 | |
8e1c5a40 KW |
239 | * device_api |
240 | ||
241 | This attribute should show which device API is being created, for example, | |
242 | "vfio-pci" for a PCI device. | |
243 | ||
244 | * available_instances | |
245 | ||
246 | This attribute should show the number of devices of type <type-id> that can be | |
247 | created. | |
248 | ||
249 | * [devices] | |
250 | ||
251 | This directory contains links to the devices of type <type-id> that have been | |
2a26ed8e | 252 | created. |
8e1c5a40 KW |
253 | |
254 | * name | |
255 | ||
256 | This attribute should show a human-readable name. This is an optional attribute. | |
257 | ||
258 | * description | |
259 | ||
260 | This attribute should show a brief description of the features of the type. | |
261 | This is an optional attribute. | |
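
The [<type-id>] naming rule described above can be tried out in user space. The following is a minimal sketch that applies the same "%s-%s" format with `snprintf`. The helper name `format_type_id` is hypothetical, and the strings passed to it are assumed to stand in for the values that `dev_driver_string()` and `group->name` would supply in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical user-space helper mirroring the "%s-%s" type-id format.
 * Returns 0 on success, or -1 if the result does not fit in buf.
 */
int format_type_id(char *buf, size_t len, const char *driver,
		   const char *group)
{
	int n;

	assert(buf && driver && group);
	n = snprintf(buf, len, "%s-%s", driver, group);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a driver string of "mtty" and a group name of "2", this yields a name of the same form as the mtty-2 directory in the sample driver's sysfs tree.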
262 | ||
263 | Directories and Files Under the sysfs for Each mdev Device | |
264 | ---------------------------------------------------------- | |
265 | ||
2a26ed8e MCC |
266 | :: |
267 | ||
268 | |- [parent phy device] | |
269 | |--- [$MDEV_UUID] | |
8e1c5a40 KW |
270 | |--- remove |
271 | |--- mdev_type {link to its type} | |
272 | |--- vendor-specific-attributes [optional] | |
273 | ||
274 | * remove (write only) | |
2a26ed8e | 275 | |
8e1c5a40 KW |
276 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can |
277 | fail the remove() callback if that device is active and the vendor driver | |
278 | doesn't support hot unplug. | |
279 | ||
2a26ed8e MCC |
280 | Example:: |
281 | ||
8e1c5a40 KW |
282 | # echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove |
283 | ||
2a26ed8e | 284 | Mediated Device Hot Plug |
8e1c5a40 KW |
285 | ------------------------ |
286 | ||
287 | Mediated devices can be created and assigned at runtime. The procedure to hot | |
288 | plug a mediated device is the same as the procedure to hot plug a PCI device. | |
289 | ||
290 | Translation APIs for Mediated Devices | |
291 | ===================================== | |
292 | ||
293 | The following APIs are provided for translating user pfn to host pfn in a VFIO | |
2a26ed8e | 294 | driver:: |
8e1c5a40 | 295 | |
2a26ed8e MCC |
296 | extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, |
297 | int npage, int prot, unsigned long *phys_pfn); | |
8e1c5a40 | 298 | |
2a26ed8e MCC |
299 | extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, |
300 | int npage); | |
8e1c5a40 KW |
301 | |
302 | These functions call back into the back-end IOMMU module by using the pin_pages | |
303 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently | |
304 | these callbacks are supported in the TYPE1 IOMMU module. To enable them for | |
305 | other IOMMU backend modules, such as the PPC64 sPAPR module, they need to provide | |
306 | these two callback functions. | |
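
Both prototypes take an array of user pfns together with a page count. The following user-space sketch shows only how a caller might build such an array from an address range. It does not call the kernel API, it assumes 4 KiB pages, and the `fill_user_pfns` helper and `SKETCH_*` macros are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Fill pfns[] with the page frame numbers covering [addr, addr + len).
 * Returns the page count, or -1 if the range is empty, wraps around,
 * or needs more than max entries.
 */
int fill_user_pfns(uint64_t addr, uint64_t len, unsigned long *pfns, int max)
{
	uint64_t first, last, count, i;

	assert(pfns);
	if (len == 0 || addr + len < addr)
		return -1;

	first = addr >> SKETCH_PAGE_SHIFT;
	last = (addr + len - 1) >> SKETCH_PAGE_SHIFT;
	count = last - first + 1;
	if (max < 0 || count > (uint64_t)max)
		return -1;

	for (i = 0; i < count; i++)
		pfns[i] = (unsigned long)(first + i);
	return (int)count;
}
```

A range that starts in the middle of a page and crosses a page boundary covers two pages even when its length is exactly one page, which is why the count is derived from the first and last pfn rather than from the length alone.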
307 | ||
9d1a546c KW |
308 | Using the Sample Code |
309 | ===================== | |
310 | ||
311 | mtty.c in the samples/vfio-mdev/ directory is a sample driver program to | |
312 | demonstrate how to use the mediated device framework. | |
313 | ||
314 | The sample driver creates an mdev device that simulates a serial port over a PCI | |
315 | card. | |
316 | ||
317 | 1. Build and load the mtty.ko module. | |
318 | ||
319 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ | |
320 | ||
2a26ed8e MCC |
321 | Files in this device directory in sysfs are similar to the following:: |
322 | ||
323 | # tree /sys/devices/virtual/mtty/mtty/ | |
324 | /sys/devices/virtual/mtty/mtty/ | |
325 | |-- mdev_supported_types | |
326 | | |-- mtty-1 | |
327 | | | |-- available_instances | |
328 | | | |-- create | |
329 | | | |-- device_api | |
330 | | | |-- devices | |
331 | | | `-- name | |
332 | | `-- mtty-2 | |
333 | | |-- available_instances | |
334 | | |-- create | |
335 | | |-- device_api | |
336 | | |-- devices | |
337 | | `-- name | |
338 | |-- mtty_dev | |
339 | | `-- sample_mtty_dev | |
340 | |-- power | |
341 | | |-- autosuspend_delay_ms | |
342 | | |-- control | |
343 | | |-- runtime_active_time | |
344 | | |-- runtime_status | |
345 | | `-- runtime_suspended_time | |
346 | |-- subsystem -> ../../../../class/mtty | |
347 | `-- uevent | |
9d1a546c KW |
348 | |
349 | 2. Create a mediated device by using the dummy device that you created in the | |
2a26ed8e | 350 | previous step:: |
9d1a546c | 351 | |
2a26ed8e | 352 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ |
9d1a546c KW |
353 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create |
354 | ||
2a26ed8e | 355 | 3. Add parameters to qemu-kvm:: |
9d1a546c | 356 | |
2a26ed8e MCC |
357 | -device vfio-pci,\ |
358 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 | |
9d1a546c KW |
359 | |
360 | 4. Boot the VM. | |
361 | ||
362 | In the Linux guest VM, with no hardware on the host, the device appears | |
2a26ed8e MCC |
363 | as follows:: |
364 | ||
365 | # lspci -s 00:05.0 -xxvv | |
366 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) | |
367 | Subsystem: Device 4348:3253 | |
368 | Physical Slot: 5 | |
369 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- | |
370 | Stepping- SERR- FastB2B- DisINTx- | |
371 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- | |
372 | <TAbort- <MAbort- >SERR- <PERR- INTx- | |
373 | Interrupt: pin A routed to IRQ 10 | |
374 | Region 0: I/O ports at c150 [size=8] | |
375 | Region 1: I/O ports at c158 [size=8] | |
376 | Kernel driver in use: serial | |
377 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 | |
378 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 | |
379 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 | |
380 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 | |
381 | ||
382 | In the Linux guest VM, dmesg output for the device is as follows:: | |
383 | ||
384 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10 | |
385 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A | |
386 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A | |
387 | ||
388 | ||
389 | 5. In the Linux guest VM, check the serial ports:: | |
390 | ||
391 | # setserial -g /dev/ttyS* | |
392 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 | |
393 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 | |
394 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 | |
9d1a546c | 395 | |
ce8cd407 | 396 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or |
9d1a546c KW |
397 | /dev/ttyS2 with hardware flow control disabled. |
398 | ||
399 | 7. Type data on the minicom terminal or send data to the terminal emulation | |
400 | program and read the data. | |
401 | ||
402 | Data is looped back from the host's mtty driver. | |
403 | ||
2a26ed8e | 404 | 8. Destroy the mediated device that you created:: |
9d1a546c | 405 | |
2a26ed8e | 406 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove |
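
Steps 2 and 8 above can also be driven from a small user-space program instead of the shell. The following is a minimal sketch under that assumption: it builds the `create` and `remove` sysfs paths shown in the steps and writes a value to them. The helper names `mdev_create_path`, `mdev_remove_path`, and `sysfs_write` are hypothetical and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build <parent>/mdev_supported_types/<type-id>/create. Returns 0 or -1. */
int mdev_create_path(char *buf, size_t len, const char *parent,
		     const char *type_id)
{
	int n;

	assert(buf && parent && type_id);
	n = snprintf(buf, len, "%s/mdev_supported_types/%s/create",
		     parent, type_id);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build <devices_dir>/<uuid>/remove. Returns 0 or -1. */
int mdev_remove_path(char *buf, size_t len, const char *devices_dir,
		     const char *uuid)
{
	int n;

	assert(buf && devices_dir && uuid);
	n = snprintf(buf, len, "%s/%s/remove", devices_dir, uuid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs attribute. Returns 0 on success, -1 on error. */
int sysfs_write(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* fclose() flushes the buffer, so a rejected write can surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Creating the device would then amount to writing the UUID to the path built by `mdev_create_path()`, and destroying it to writing "1" to the path built by `mdev_remove_path()`. Both writes need the same privileges as the shell commands.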
9d1a546c | 407 | |
8e1c5a40 | 408 | References |
9d1a546c | 409 | ========== |
8e1c5a40 | 410 | |
2a26ed8e MCC |
411 | 1. See Documentation/vfio.txt for more information on VFIO. |
412 | 2. struct mdev_driver in include/linux/mdev.h | |
413 | 3. struct mdev_parent_ops in include/linux/mdev.h | |
414 | 4. struct vfio_iommu_driver_ops in include/linux/vfio.h |