Commit | Line | Data |
---|---|---|
8b4a503d MCC |
1 | =============================== |
2 | Adjunct Processor (AP) facility | |
3 | =============================== | |
4 | ||
5 | ||
6 | Introduction | |
492a6be1 TK |
7 | ============ |
8 | The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised | |
9 | of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards. | |
10 | The AP devices provide cryptographic functions to all CPUs assigned to a | |
11 | linux system running in an IBM Z system LPAR. | |
12 | ||
13 | The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap | |
14 | is to make AP cards available to KVM guests using the VFIO mediated device | |
15 | framework. This implementation relies considerably on the s390 virtualization | |
16 | facilities which do most of the hard work of providing direct access to AP | |
17 | devices. | |
18 | ||
8b4a503d | 19 | AP Architectural Overview |
492a6be1 TK |
20 | ========================= |
21 | To facilitate the comprehension of the design, let's start with some | |
22 | definitions: | |
23 | ||
24 | * AP adapter | |
25 | ||
26 | An AP adapter is an IBM Z adapter card that can perform cryptographic | |
27 | functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters | |
28 | assigned to the LPAR in which a linux host is running will be available to | |
29 | the linux host. Each adapter is identified by a number from 0 to 255; however, | |
30 | the maximum adapter number is determined by machine model and/or adapter type. | |
31 | When installed, an AP adapter is accessed by AP instructions executed by any | |
32 | CPU. | |
33 | ||
34 | The AP adapter cards are assigned to a given LPAR via the system's Activation | |
35 | Profile which can be edited via the HMC. When the linux host system is IPL'd | |
36 | in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and | |
37 | creates a sysfs device for each assigned adapter. For example, if AP adapters | |
38 | 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following | |
8b4a503d | 39 | sysfs device entries:: |
492a6be1 TK |
40 | |
41 | /sys/devices/ap/card04 | |
42 | /sys/devices/ap/card0a | |
43 | ||
44 | Symbolic links to these devices will also be created in the AP bus devices | |
8b4a503d | 45 | sub-directory:: |
492a6be1 TK |
46 | |
47 | /sys/bus/ap/devices/[card04] | |
48 | /sys/bus/ap/devices/[card0a] | |
49 | ||
50 | * AP domain | |
51 | ||
52 | An adapter is partitioned into domains. An adapter can hold up to 256 domains | |
53 | depending upon the adapter type and hardware configuration. A domain is | |
54 | identified by a number from 0 to 255; however, the maximum domain number is | |
55 | determined by machine model and/or adapter type. A domain can be thought of | |
56 | as a set of hardware registers and memory used for processing AP commands. A | |
57 | domain can be configured with a secure private key used for clear key | |
58 | encryption. A domain is classified in one of two ways depending upon how it | |
59 | may be accessed: | |
60 | ||
61 | * Usage domains are domains that are targeted by an AP instruction to | |
62 | process an AP command. | |
63 | ||
64 | * Control domains are domains that are changed by an AP command sent to a | |
65 | usage domain; for example, to set the secure private key for the control | |
66 | domain. | |
67 | ||
68 | The AP usage and control domains are assigned to a given LPAR via the system's | |
69 | Activation Profile which can be edited via the HMC. When a linux host system | |
70 | is IPL'd in the LPAR, the AP bus module detects the AP usage and control | |
71 | domains assigned to the LPAR. The domain number of each usage domain and | |
72 | adapter number of each AP adapter are combined to create AP queue devices | |
73 | (see AP Queue section below). The domain number of each control domain will be | |
74 | represented in a bitmask and stored in a sysfs file | |
75 | /sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least | |
76 | significant bit, correspond to domains 0-255. | |
77 | ||
78 | * AP Queue | |
79 | ||
80 | An AP queue is the means by which an AP command is sent to a usage domain | |
81 | inside a specific adapter. An AP queue is identified by a tuple | |
82 | comprised of an AP adapter ID (APID) and an AP queue index (APQI). The | |
83 | APQI corresponds to a given usage domain number within the adapter. This tuple | |
84 | forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP | |
85 | instructions include a field containing the APQN to identify the AP queue to | |
86 | which the AP command is to be sent for processing. | |
87 | ||
88 | The AP bus will create a sysfs device for each APQN that can be derived from | |
89 | the cross product of the AP adapter and usage domain numbers detected when the | |
90 | AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage | |
91 | domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the | |
8b4a503d | 92 | following sysfs entries:: |
492a6be1 TK |
93 | |
94 | /sys/devices/ap/card04/04.0006 | |
95 | /sys/devices/ap/card04/04.0047 | |
96 | /sys/devices/ap/card0a/0a.0006 | |
97 | /sys/devices/ap/card0a/0a.0047 | |
98 | ||
99 | The following symbolic links to these devices will be created in the AP bus | |
8b4a503d | 100 | devices subdirectory:: |
492a6be1 TK |
101 | |
102 | /sys/bus/ap/devices/[04.0006] | |
103 | /sys/bus/ap/devices/[04.0047] | |
104 | /sys/bus/ap/devices/[0a.0006] | |
105 | /sys/bus/ap/devices/[0a.0047] | |
106 | ||
107 | * AP Instructions: | |
108 | ||
109 | There are three AP instructions: | |
110 | ||
111 | * NQAP: to enqueue an AP command-request message to a queue | |
112 | * DQAP: to dequeue an AP command-reply message from a queue | |
113 | * PQAP: to administer the queues | |
114 | ||
115 | AP instructions identify the domain that is targeted to process the AP | |
116 | command; this must be one of the usage domains. An AP command may modify a | |
117 | domain that is not one of the usage domains, but the modified domain | |
118 | must be one of the control domains. | |
119 | ||
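As a minimal sketch of the naming scheme described above (not the AP bus implementation), the sysfs device names can be derived from the adapter and usage domain numbers. The function names `card_device_names` and `queue_device_names` are hypothetical and exist only for this illustration; the two-digit and four-digit hexadecimal formats are inferred from the example paths shown earlier.

```python
from itertools import product


def card_device_names(adapters):
    """Return the card device name (cardXX, XX in hex) for each adapter number."""
    return [f"card{apid:02x}" for apid in sorted(adapters)]


def queue_device_names(adapters, usage_domains):
    """Return one queue device name (xx.yyyy) per APQN.

    The APQNs are the cross product of the adapter numbers (APID) and the
    usage domain numbers (APQI), both limited to the range 0-255.
    """
    for number in (*adapters, *usage_domains):
        if not 0 <= number <= 255:
            raise ValueError(f"ID out of range 0-255: {number}")
    return [
        f"{apid:02x}.{apqi:04x}"
        for apid, apqi in product(sorted(adapters), sorted(usage_domains))
    ]


# Adapters 4 and 10 (0x0a) with usage domains 6 and 71 (0x47)
print(card_device_names([4, 10]))
print(queue_device_names([4, 10], [6, 71]))
```

The sketch only formats names; it does not read or create anything under /sys.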
8b4a503d | 120 | AP and SIE |
492a6be1 TK |
121 | ========== |
122 | Let's now take a look at how AP instructions executed on a guest are interpreted | |
123 | by the hardware. | |
124 | ||
125 | A satellite control block called the Crypto Control Block (CRYCB) is attached to | |
126 | our main hardware virtualization control block. The CRYCB contains three fields | |
127 | to identify the adapters, usage domains and control domains assigned to the KVM | |
128 | guest: | |
129 | ||
130 | * The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned | |
131 | to the KVM guest. Each bit in the mask, from left to right (i.e. from most | |
132 | significant to least significant bit in big endian order), corresponds to | |
133 | an APID from 0-255. If a bit is set, the corresponding adapter is valid for | |
134 | use by the KVM guest. | |
135 | ||
136 | * The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains | |
137 | assigned to the KVM guest. Each bit in the mask, from left to right (i.e. from | |
138 | most significant to least significant bit in big endian order), corresponds to | |
139 | an AP queue index (APQI) from 0-255. If a bit is set, the corresponding queue | |
140 | is valid for use by the KVM guest. | |
141 | ||
142 | * The AP Domain Mask (ADM) field is a bit mask that identifies the AP control domains | |
143 | assigned to the KVM guest. The ADM bit mask controls which domains can be | |
144 | changed by an AP command-request message sent to a usage domain from the | |
145 | guest. Each bit in the mask, from left to right (i.e. from most significant to | |
146 | least significant bit in big endian order), corresponds to a domain from | |
147 | 0-255. If a bit is set, the corresponding domain can be modified by an AP | |
148 | command-request message sent to a usage domain. | |
149 | ||
150 | If you recall from the description of an AP Queue, AP instructions include | |
151 | an APQN to identify the AP queue to which an AP command-request message is to be | |
152 | sent (NQAP and PQAP instructions), or from which a command-reply message is to | |
153 | be received (DQAP instruction). The validity of an APQN is defined by the matrix | |
154 | calculated from the APM and AQM; it is the cross product of all assigned adapter | |
155 | numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1 | |
156 | and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6), | |
157 | (2,5) and (2,6) will be valid for the guest. | |
158 | ||
159 | The APQNs can provide secure key functionality - i.e., a private key is stored | |
160 | on the adapter card for each of its domains - so each APQN must be assigned to | |
8b4a503d | 161 | at most one guest or to the linux host:: |
492a6be1 TK |
162 | |
163 | Example 1: Valid configuration: | |
164 | ------------------------------ | |
165 | Guest1: adapters 1,2 domains 5,6 | |
166 | Guest2: adapters 1,2 domain 7 | |
167 | ||
168 | This is valid because both guests have a unique set of APQNs: | |
169 | Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); | |
170 | Guest2 has APQNs (1,7), (2,7) | |
171 | ||
172 | Example 2: Valid configuration: | |
173 | ------------------------------ | |
174 | Guest1: adapters 1,2 domains 5,6 | |
175 | Guest2: adapters 3,4 domains 5,6 | |
176 | ||
177 | This is also valid because both guests have a unique set of APQNs: | |
178 | Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); | |
179 | Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) | |
180 | ||
181 | Example 3: Invalid configuration: | |
182 | -------------------------------- | |
183 | Guest1: adapters 1,2 domains 5,6 | |
184 | Guest2: adapter 1 domains 6,7 | |
185 | ||
186 | This is an invalid configuration because both guests have access to | |
187 | APQN (1,6). | |
188 | ||
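The rule illustrated by the three examples can be checked mechanically. The following is a small sketch under the assumption that each guest is described by a pair of adapter and domain lists; the helper `shared_apqns` is hypothetical and is not part of the vfio_ap driver.

```python
from itertools import product


def shared_apqns(guests):
    """Map each APQN claimed by more than one guest to the guests claiming it.

    `guests` maps a guest name to an (adapters, domains) pair. An empty
    result means every APQN is assigned to at most one guest.
    """
    owners = {}
    for name, (adapters, domains) in guests.items():
        # The guest's APQNs are the cross product of its adapters and domains.
        for apqn in set(product(adapters, domains)):
            owners.setdefault(apqn, []).append(name)
    return {apqn: sorted(names) for apqn, names in owners.items() if len(names) > 1}


example1 = {"Guest1": ([1, 2], [5, 6]), "Guest2": ([1, 2], [7])}
example3 = {"Guest1": ([1, 2], [5, 6]), "Guest2": ([1], [6, 7])}

print(shared_apqns(example1))  # no shared APQNs, so the configuration is valid
print(shared_apqns(example3))  # APQN (1, 6) is claimed by both guests
```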
8b4a503d MCC |
189 | The Design |
190 | ========== | |
492a6be1 TK |
191 | The design introduces three new objects: |
192 | ||
193 | 1. AP matrix device | |
194 | 2. VFIO AP device driver (vfio_ap.ko) | |
195 | 3. VFIO AP mediated matrix pass-through device | |
196 | ||
197 | The VFIO AP device driver | |
198 | ------------------------- | |
199 | The VFIO AP (vfio_ap) device driver serves the following purposes: | |
200 | ||
201 | 1. Provides the interfaces to secure APQNs for exclusive use of KVM guests. | |
202 | ||
203 | 2. Sets up the VFIO mediated device interfaces to manage a mediated matrix | |
204 | device and creates the sysfs interfaces for assigning adapters, usage | |
205 | domains, and control domains comprising the matrix for a KVM guest. | |
206 | ||
207 | 3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's | |
208 | SIE state description to grant the guest access to a matrix of AP devices. | |
209 | ||
210 | Reserve APQNs for exclusive use of KVM guests | |
211 | --------------------------------------------- | |
212 | The following block diagram illustrates the mechanism by which APQNs are | |
8b4a503d MCC |
213 | reserved:: |
214 | ||
215 | +------------------+ | |
216 | 7 remove | | | |
217 | +--------------------> cex4queue driver | | |
218 | | | | | |
219 | | +------------------+ | |
220 | | | |
221 | | | |
222 | | +------------------+ +----------------+ | |
223 | | 5 register driver | | 3 create | | | |
224 | | +----------------> Device core +----------> matrix device | | |
225 | | | | | | | | |
226 | | | +--------^---------+ +----------------+ | |
227 | | | | | |
228 | | | +-------------------+ | |
229 | | | +-----------------------------------+ | | |
230 | | | | 4 register AP driver | | 2 register device | |
231 | | | | | | | |
232 | +--------+---+-v---+ +--------+-------+-+ | |
233 | | | | | | |
234 | | ap_bus +--------------------- > vfio_ap driver | | |
235 | | | 8 probe | | | |
236 | +--------^---------+ +--^--^------------+ | |
237 | 6 edit | | | | |
238 | apmask | +-----------------------------+ | 9 mdev create | |
239 | aqmask | | 1 modprobe | | |
240 | +--------+-----+---+ +----------------+-+ +----------------+ | |
241 | | | | |8 create | mediated | | |
242 | | admin | | VFIO device core |---------> matrix | | |
243 | | + | | | device | | |
244 | +------+-+---------+ +--------^---------+ +--------^-------+ | |
245 | | | | | | |
246 | | | 9 create vfio_ap-passthrough | | | |
247 | | +------------------------------+ | | |
248 | +-------------------------------------------------------------+ | |
249 | 10 assign adapter/domain/control domain | |
492a6be1 TK |
250 | |
251 | The process for reserving an AP queue for use by a KVM guest is: | |
252 | ||
253 | 1. The administrator loads the vfio_ap device driver | |
254 | 2. During its initialization, the vfio_ap driver will register a single 'matrix' | |
255 | device with the device core. This will serve as the parent device for | |
256 | all mediated matrix devices used to configure an AP matrix for a guest. | |
257 | 3. The /sys/devices/vfio_ap/matrix device is created by the device core | |
8b4a503d | 258 | 4. The vfio_ap device driver will register with the AP bus for AP queue devices |
492a6be1 TK |
259 | of type 10 and higher (CEX4 and newer). The driver will provide the vfio_ap |
260 | driver's probe and remove callback interfaces. Devices older than CEX4 queues | |
261 | are not supported. Excluding them keeps the design and implementation | |
262 | simple: these older devices will go out of service in the relatively near | |
263 | future, and few systems that still use them are available on which to | |
264 | test. | |
265 | 5. The AP bus registers the vfio_ap device driver with the device core | |
266 | 6. The administrator edits the AP adapter and queue masks to reserve AP queues | |
267 | for use by the vfio_ap device driver. | |
268 | 7. The AP bus removes the AP queues reserved for the vfio_ap driver from the | |
269 | default zcrypt cex4queue driver. | |
270 | 8. The AP bus probes the vfio_ap device driver to bind the queues reserved for | |
271 | it. | |
272 | 9. The administrator creates a passthrough type mediated matrix device to be | |
273 | used by a guest | |
8b4a503d MCC |
274 | 10. The administrator assigns the adapters, usage domains and control domains |
275 | to be exclusively used by a guest. | |
492a6be1 TK |
276 | |
277 | Set up the VFIO mediated device interfaces | |
278 | ------------------------------------------ | |
279 | The VFIO AP device driver utilizes the common interface of the VFIO mediated | |
280 | device core driver to: | |
8b4a503d | 281 | |
492a6be1 TK |
282 | * Register an AP mediated bus driver to add a mediated matrix device to and |
283 | remove it from a VFIO group. | |
284 | * Create and destroy a mediated matrix device | |
285 | * Add a mediated matrix device to and remove it from the AP mediated bus driver | |
286 | * Add a mediated matrix device to and remove it from an IOMMU group | |
287 | ||
288 | The following high-level block diagram shows the main components and interfaces | |
8b4a503d MCC |
289 | of the VFIO AP mediated matrix device driver:: |
290 | ||
291 | +-------------+ | |
292 | | | | |
293 | | +---------+ | mdev_register_driver() +--------------+ | |
294 | | | Mdev | +<-----------------------+ | | |
295 | | | bus | | | vfio_mdev.ko | | |
296 | | | driver | +----------------------->+ |<-> VFIO user | |
297 | | +---------+ | probe()/remove() +--------------+ APIs | |
298 | | | | |
299 | | MDEV CORE | | |
300 | | MODULE | | |
301 | | mdev.ko | | |
302 | | +---------+ | mdev_register_device() +--------------+ | |
303 | | |Physical | +<-----------------------+ | | |
304 | | | device | | | vfio_ap.ko |<-> matrix | |
305 | | |interface| +----------------------->+ | device | |
306 | | +---------+ | callback +--------------+ | |
307 | +-------------+ | |
492a6be1 TK |
308 | |
309 | During initialization of the vfio_ap module, the matrix device is registered | |
310 | with an 'mdev_parent_ops' structure that provides the sysfs attribute | |
311 | structures, mdev functions and callback interfaces for managing the mediated | |
312 | matrix device. | |
313 | ||
314 | * sysfs attribute structures: | |
8b4a503d MCC |
315 | |
316 | supported_type_groups | |
492a6be1 TK |
317 | The VFIO mediated device framework supports creation of user-defined |
318 | mediated device types. These mediated device types are specified | |
319 | via the 'supported_type_groups' structure when a device is registered | |
320 | with the mediated device framework. The registration process creates the | |
321 | sysfs structures for each mediated device type specified in the | |
322 | 'mdev_supported_types' sub-directory of the device being registered. Along | |
323 | with the device type, the sysfs attributes of the mediated device type are | |
324 | provided. | |
325 | ||
326 | The VFIO AP device driver will register one mediated device type for | |
327 | passthrough devices: | |
8b4a503d | 328 | |
492a6be1 | 329 | /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough |
8b4a503d | 330 | |
492a6be1 | 331 | Only the read-only attributes required by the VFIO mdev framework will |
8b4a503d MCC |
332 | be provided:: |
333 | ||
334 | ... name | |
335 | ... device_api | |
336 | ... available_instances | |
338 | ||
339 | Where: | |
340 | ||
341 | * name: | |
342 | specifies the name of the mediated device type | |
343 | * device_api: | |
344 | the mediated device type's API; specifies the VFIO API | |
345 | * available_instances: | |
346 | the number of mediated matrix passthrough devices | |
347 | that can be created | |
350 | mdev_attr_groups | |
492a6be1 TK |
351 | This attribute group identifies the user-defined sysfs attributes of the |
352 | mediated device. When a device is registered with the VFIO mediated device | |
353 | framework, the sysfs attribute files identified in the 'mdev_attr_groups' | |
354 | structure will be created in the mediated matrix device's directory. The | |
355 | sysfs attributes for a mediated matrix device are: | |
8b4a503d MCC |
356 | |
357 | assign_adapter / unassign_adapter: | |
492a6be1 TK |
358 | Write-only attributes for assigning/unassigning an AP adapter to/from the |
359 | mediated matrix device. To assign/unassign an adapter, the APID of the | |
360 | adapter is echoed to the respective attribute file. | |
8b4a503d | 361 | assign_domain / unassign_domain: |
492a6be1 TK |
362 | Write-only attributes for assigning/unassigning an AP usage domain to/from |
363 | the mediated matrix device. To assign/unassign a domain, the domain | |
364 | number of the usage domain is echoed to the respective attribute | |
365 | file. | |
8b4a503d | 366 | matrix: |
492a6be1 TK |
367 | A read-only file for displaying the APQNs derived from the cross product |
368 | of the adapter and domain numbers assigned to the mediated matrix device. | |
8b4a503d | 369 | assign_control_domain / unassign_control_domain: |
492a6be1 TK |
370 | Write-only attributes for assigning/unassigning an AP control domain |
371 | to/from the mediated matrix device. To assign/unassign a control domain, | |
372 | the ID of the domain to be assigned/unassigned is echoed to the respective | |
373 | attribute file. | |
8b4a503d | 374 | control_domains: |
492a6be1 TK |
375 | A read-only file for displaying the control domain numbers assigned to the |
376 | mediated matrix device. | |
377 | ||
378 | * functions: | |
8b4a503d MCC |
379 | |
380 | create: | |
492a6be1 | 381 | allocates the ap_matrix_mdev structure used by the vfio_ap driver to: |
8b4a503d | 382 | |
492a6be1 TK |
383 | * Store the reference to the KVM structure for the guest using the mdev |
384 | * Store the AP matrix configuration for the adapters, domains, and control | |
385 | domains assigned via the corresponding sysfs attributes files | |
8b4a503d MCC |
386 | |
387 | remove: | |
492a6be1 TK |
388 | deallocates the mediated matrix device's ap_matrix_mdev structure. This will |
389 | be allowed only if a running guest is not using the mdev. | |
390 | ||
391 | * callback interfaces | |
8b4a503d MCC |
392 | |
393 | open: | |
492a6be1 TK |
394 | The vfio_ap driver uses this callback to register a |
395 | VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix | |
396 | device. The open is invoked when QEMU connects the VFIO iommu group | |
397 | for the mdev matrix device to the MDEV bus. Access to the KVM structure used | |
398 | to configure the KVM guest is provided via this callback. The KVM structure | |
399 | is used to configure the guest's access to the AP matrix defined via the | |
400 | mediated matrix device's sysfs attribute files. | |
8b4a503d | 401 | release: |
492a6be1 TK |
402 | unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the |
403 | mdev matrix device and deconfigures the guest's AP matrix. | |
404 | ||
8b4a503d | 405 | Configure the APM, AQM and ADM in the CRYCB |
492a6be1 TK |
406 | ------------------------------------------- |
407 | Configuring the AP matrix for a KVM guest will be performed when the | |
408 | VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier | |
409 | function is called when QEMU connects to KVM. The guest's AP matrix is | |
410 | configured via its CRYCB by: | |
8b4a503d | 411 | |
492a6be1 TK |
412 | * Setting the bits in the APM corresponding to the APIDs assigned to the |
413 | mediated matrix device via its 'assign_adapter' interface. | |
414 | * Setting the bits in the AQM corresponding to the domains assigned to the | |
415 | mediated matrix device via its 'assign_domain' interface. | |
416 | * Setting the bits in the ADM corresponding to the domain IDs assigned to the | |
417 | mediated matrix device via its 'assign_control_domain' interface. | |
418 | ||
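The three steps above amount to turning lists of IDs into 256-bit masks in which ID 0 is the most significant bit. The sketch below models that with plain Python integers. It is an illustration only: the names `ids_to_mask`, `crycb_masks` and `mask_to_hex` are hypothetical, and the real CRYCB layout is defined by the hardware architecture, not by this code.

```python
MASK_BITS = 256


def ids_to_mask(ids):
    """Build a 256-bit mask in which ID 0 maps to the most significant bit."""
    mask = 0
    for ident in ids:
        if not 0 <= ident < MASK_BITS:
            raise ValueError(f"ID out of range 0-255: {ident}")
        mask |= 1 << (MASK_BITS - 1 - ident)
    return mask


def crycb_masks(adapters, domains, control_domains):
    """Return the APM, AQM and ADM values for one mediated matrix device."""
    return {
        "apm": ids_to_mask(adapters),         # from 'assign_adapter'
        "aqm": ids_to_mask(domains),          # from 'assign_domain'
        "adm": ids_to_mask(control_domains),  # from 'assign_control_domain'
    }


def mask_to_hex(mask):
    """Format a mask as 64 hex digits, most significant digit first."""
    return f"0x{mask:064x}"


masks = crycb_masks(adapters=[5, 6], domains=[4, 0xab], control_domains=[4, 0xab])
for name, value in masks.items():
    print(name, mask_to_hex(value))
```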
419 | The CPU model features for AP | |
420 | ----------------------------- | |
421 | The AP stack relies on the presence of the AP instructions as well as two | |
422 | facilities: The AP Facilities Test (APFT) facility; and the AP Query | |
423 | Configuration Information (QCI) facility. These features/facilities are made | |
424 | available to a KVM guest via the following CPU model features: | |
425 | ||
426 | 1. ap: Indicates whether the AP instructions are installed on the guest. This | |
427 | feature will be enabled by KVM only if the AP instructions are installed | |
428 | on the host. | |
429 | ||
430 | 2. apft: Indicates the APFT facility is available on the guest. This facility | |
431 | can be made available to the guest only if it is available on the host (i.e., | |
432 | facility bit 15 is set). | |
433 | ||
434 | 3. apqci: Indicates the AP QCI facility is available on the guest. This facility | |
435 | can be made available to the guest only if it is available on the host (i.e., | |
436 | facility bit 12 is set). | |
437 | ||
438 | Note: If the user chooses to specify a CPU model different than the 'host' | |
439 | model to QEMU, the CPU model features and facilities need to be turned on | |
8b4a503d | 440 | explicitly; for example:: |
492a6be1 TK |
441 | |
442 | /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on | |
443 | ||
444 | A guest can be precluded from using AP features/facilities by turning them off | |
8b4a503d | 445 | explicitly; for example:: |
492a6be1 TK |
446 | |
447 | /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off | |
448 | ||
449 | Note: If the APFT facility is turned off (apft=off) for the guest, the guest | |
450 | will not see any AP devices. The zcrypt device drivers that register for type 10 | |
451 | and newer AP devices - i.e., the cex4card and cex4queue device drivers - need | |
452 | the APFT facility to ascertain the facilities installed on a given AP device. If | |
453 | the APFT facility is not installed on the guest, then the probe of device | |
454 | drivers will fail since only type 10 and newer devices can be configured for | |
455 | guest use. | |
456 | ||
8b4a503d | 457 | Example |
492a6be1 TK |
458 | ======= |
459 | Let's now provide an example to illustrate how KVM guests may be given | |
460 | access to AP facilities. For this example, we will show how to configure | |
461 | three guests such that executing the lszcrypt command on the guests would | |
462 | look like this: | |
463 | ||
464 | Guest1 | |
465 | ------ | |
8b4a503d | 466 | =========== ===== ============ |
492a6be1 | 467 | CARD.DOMAIN TYPE MODE |
8b4a503d | 468 | =========== ===== ============ |
492a6be1 TK |
469 | 05 CEX5C CCA-Coproc |
470 | 05.0004 CEX5C CCA-Coproc | |
471 | 05.00ab CEX5C CCA-Coproc | |
472 | 06 CEX5A Accelerator | |
473 | 06.0004 CEX5A Accelerator | |
474 | 06.00ab CEX5A Accelerator | |
8b4a503d | 475 | =========== ===== ============ |
492a6be1 TK |
476 | |
477 | Guest2 | |
478 | ------ | |
8b4a503d | 479 | =========== ===== ============ |
492a6be1 | 480 | CARD.DOMAIN TYPE MODE |
8b4a503d | 481 | =========== ===== ============ |
492a6be1 TK |
482 | 05 CEX5C CCA-Coproc |
483 | 05.0047 CEX5C CCA-Coproc | |
484 | 05.00ff CEX5C CCA-Coproc | |
8b4a503d | 485 | =========== ===== ============ |
492a6be1 TK |
486 | |
487 | Guest3 | |
488 | ------ | |
8b4a503d | 489 | =========== ===== ============ |
492a6be1 | 490 | CARD.DOMAIN TYPE MODE |
8b4a503d | 491 | =========== ===== ============ |
492a6be1 TK |
492 | 06 CEX5A Accelerator |
493 | 06.0047 CEX5A Accelerator | |
494 | 06.00ff CEX5A Accelerator | |
8b4a503d | 495 | =========== ===== ============ |
492a6be1 TK |
496 | |
497 | These are the steps: | |
498 | ||
499 | 1. Install the vfio_ap module on the linux host. The dependency chain for the | |
500 | vfio_ap module is: | |
501 | * iommu | |
502 | * s390 | |
503 | * zcrypt | |
504 | * vfio | |
505 | * vfio_mdev | |
506 | * vfio_mdev_device | |
507 | * KVM | |
508 | ||
509 | To build the vfio_ap module, the kernel build must be configured with the | |
510 | following Kconfig elements selected: | |
511 | * IOMMU_SUPPORT | |
512 | * S390 | |
513 | * ZCRYPT | |
514 | * S390_AP_IOMMU | |
515 | * VFIO | |
516 | * VFIO_MDEV | |
517 | * VFIO_MDEV_DEVICE | |
518 | * KVM | |
519 | ||
8b4a503d MCC |
520 | If using make menuconfig select the following to build the vfio_ap module:: |
521 | ||
522 | -> Device Drivers | |
523 | -> IOMMU Hardware Support | |
524 | select S390 AP IOMMU Support | |
525 | -> VFIO Non-Privileged userspace driver framework | |
526 | -> Mediated device driver framework | |
527 | -> VFIO driver for Mediated devices | |
528 | -> I/O subsystem | |
529 | -> VFIO support for AP devices | |
492a6be1 TK |
530 | |
531 | 2. Secure the AP queues to be used by the three guests so that the host cannot | |
532 | access them. Two sysfs files specify bitmasks that mark each part of the | |
533 | APQN range as either 'usable by the default AP queue device drivers' or | |
534 | 'not usable by the default device drivers', the latter being available | |
535 | for use by the vfio_ap device driver. The sysfs files containing the | |
8b4a503d | 536 | masks are located at:: |
492a6be1 | 537 | |
8b4a503d MCC |
538 | /sys/bus/ap/apmask |
539 | /sys/bus/ap/aqmask | |
492a6be1 TK |
540 | |
541 | The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs | |
542 | (APID). Each bit in the mask, from left to right (i.e., from most significant | |
543 | to least significant bit in big endian order), corresponds to an APID from | |
544 | 0-255. If a bit is set, the APID is marked as usable only by the default AP | |
545 | queue device drivers; otherwise, the APID is usable by the vfio_ap | |
546 | device driver. | |
547 | ||
548 | The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes | |
549 | (APQI). Each bit in the mask, from left to right (i.e., from most significant | |
550 | to least significant bit in big endian order), corresponds to an APQI from | |
551 | 0-255. If a bit is set, the APQI is marked as usable only by the default AP | |
552 | queue device drivers; otherwise, the APQI is usable by the vfio_ap device | |
553 | driver. | |
554 | ||
8b4a503d | 555 | Take, for example, the following mask:: |
492a6be1 TK |
556 | |
557 | 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff | |
558 | ||
559 | It indicates: | |
560 | ||
561 | IDs 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and IDs 0 | |
562 | and 6 belong to the vfio_ap device driver's pool. | |
563 | ||
564 | The APQN of each AP queue device assigned to the linux host is checked by the | |
565 | AP bus against the set of APQNs derived from the cross product of APIDs | |
566 | and APQIs marked as usable only by the default AP queue device drivers. If a | |
567 | match is detected, only the default AP queue device drivers will be probed; | |
568 | otherwise, the vfio_ap device driver will be probed. | |
569 | ||
570 | By default, the two masks are set to reserve all APQNs for use by the default | |
571 | AP queue device drivers. There are two ways the default masks can be changed: | |
572 | ||
573 | 1. The sysfs mask files can be edited by echoing a string into the | |
574 | respective sysfs mask file in one of two formats: | |
575 | ||
576 | * An absolute hex string starting with 0x - like "0x12345678" - sets | |
8b4a503d MCC |
577 | the mask. If the given string is shorter than the mask, it is padded |
578 | with 0s on the right; for example, specifying a mask value of 0x41 is | |
579 | the same as specifying:: | |
492a6be1 | 580 | |
8b4a503d | 581 | 0x4100000000000000000000000000000000000000000000000000000000000000 |
492a6be1 | 582 | |
8b4a503d MCC |
583 | Keep in mind that the mask reads from left to right (i.e., most |
584 | significant to least significant bit in big endian order), so the mask | |
585 | above identifies device numbers 1 and 7 (01000001). | |
492a6be1 | 586 | |
8b4a503d MCC |
587 | If the string is longer than the mask, the operation is terminated with |
588 | an error (EINVAL). | |
492a6be1 TK |
589 | |
590 | * Individual bits in the mask can be switched on and off by specifying | |
8b4a503d MCC |
591 | each bit number to be switched in a comma separated list. Each bit |
592 | number string must be prefixed with a plus ('+') or minus ('-') to indicate | |
593 | the corresponding bit is to be switched on ('+') or off ('-'). Some | |
594 | valid values are: | |
492a6be1 | 595 | |
8b4a503d MCC |
596 | - "+0" switches bit 0 on |
597 | - "-13" switches bit 13 off | |
598 | - "+0x41" switches bit 65 on | |
599 | - "-0xff" switches bit 255 off | |
492a6be1 | 600 | |
8b4a503d | 601 | The following example: |
492a6be1 | 602 | |
8b4a503d | 603 | +0,-6,+0x47,-0xf0 |
492a6be1 | 604 | |
8b4a503d MCC |
605 | Switches bits 0 and 71 (0x47) on |
606 | ||
607 | Switches bits 6 and 240 (0xf0) off | |
608 | ||
609 | Note that the bits not specified in the list remain as they were before | |
610 | the operation. | |
492a6be1 TK |
611 | |
612 | 2. The masks can also be changed at boot time via parameters on the kernel | |
613 | command line like this:: | |
614 | ||
8b4a503d | 615 | ap.apmask=0xffff ap.aqmask=0x40 |
492a6be1 | 616 | |
8b4a503d | 617 | This would create the following masks:: |
492a6be1 | 618 | |
8b4a503d MCC |
619 | apmask: |
620 | 0xffff000000000000000000000000000000000000000000000000000000000000 | |
492a6be1 | 621 | |
8b4a503d MCC |
622 | aqmask: |
623 | 0x4000000000000000000000000000000000000000000000000000000000000000 | |
492a6be1 | 624 | |
8b4a503d | 625 | Resulting in these two pools:: |
492a6be1 | 626 | |
8b4a503d MCC |
627 | default drivers pool: adapter 0-15, domain 1 |
628 | alternate drivers pool: adapter 16-255, domains 0, 2-255 | |
492a6be1 | 629 | |
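To make the two mask formats concrete, here is a minimal sketch of a parser that follows the rules as described above: an absolute hex string is padded with 0s on the right, and a relative list switches individual bits on or off. The functions `bit_value`, `apply_mask_update` and `set_bits` are hypothetical names, `ValueError` stands in for the EINVAL error, and the kernel's actual parsing code may differ in details that this text does not describe.

```python
MASK_BITS = 256
HEX_DIGITS = MASK_BITS // 4  # a full mask is 64 hex digits


def bit_value(bit):
    """Return the integer weight of a bit number; bit 0 is the most significant."""
    if not 0 <= bit < MASK_BITS:
        raise ValueError(f"bit number out of range: {bit}")
    return 1 << (MASK_BITS - 1 - bit)


def apply_mask_update(current, spec):
    """Return the mask that results from writing `spec` over the mask `current`."""
    spec = spec.strip()
    if not spec:
        raise ValueError("empty mask specification")
    if spec[0] in "+-":
        # Relative format: comma separated bit numbers, each with a sign.
        mask = current
        for token in spec.split(","):
            token = token.strip()
            if len(token) < 2 or token[0] not in "+-":
                raise ValueError(f"bad token: {token!r}")
            bit = bit_value(int(token[1:], 0))  # base 0 accepts "13" and "0x47"
            mask = mask | bit if token[0] == "+" else mask & ~bit
        return mask
    # Absolute format: hex string, padded with 0s on the right.
    if not spec.lower().startswith("0x"):
        raise ValueError("absolute mask must start with 0x")
    digits = spec[2:]
    if not digits or len(digits) > HEX_DIGITS:
        raise ValueError("mask string is empty or longer than the mask")
    return int(digits.ljust(HEX_DIGITS, "0"), 16)


def set_bits(mask):
    """List the bit numbers that are switched on in `mask`."""
    return [i for i in range(MASK_BITS) if mask & bit_value(i)]


print(set_bits(apply_mask_update(0, "0x41")))    # the padded 0x41 example
print(set_bits(apply_mask_update(0, "0xffff")))  # the ap.apmask boot example
print(set_bits(apply_mask_update(0, "0x40")))    # the ap.aqmask boot example
```

Bits that a relative update does not mention are carried over from `current` unchanged, which matches the note above about unspecified bits.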
8b4a503d MCC |
630 | Securing the APQNs for our example |
631 | ---------------------------------- | |
492a6be1 TK |
632 | To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047, |
633 | 06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding | |
8b4a503d | 634 | APQNs can either be removed from the default masks:: |
492a6be1 TK |
635 | |
636 | echo -5,-6 > /sys/bus/ap/apmask | |
637 | ||
638 | echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask | |
639 | ||
8b4a503d | 640 | Or the masks can be set as follows:: |
492a6be1 TK |
641 | |
642 | echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \ | |
643 | > apmask | |
644 | ||
645 | echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \ | |
646 | > aqmask | |
647 | ||
648 | This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, | |
649 | 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The | |
650 | sysfs directory for the vfio_ap device driver will now contain symbolic links | |
8b4a503d MCC |
651 | to the AP queue devices bound to it:: |
652 | ||
653 | /sys/bus/ap | |
654 | ... [drivers] | |
655 | ...... [vfio_ap] | |
656 | ......... [05.0004] | |
657 | ......... [05.0047] | |
658 | ......... [05.00ab] | |
659 | ......... [05.00ff] | |
660 | ......... [06.0004] | |
661 | ......... [06.0047] | |
662 | ......... [06.00ab] | |
663 | ......... [06.00ff] | |
492a6be1 TK |
664 | |
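The absolute mask values used above can be derived mechanically from the list of bits to clear. The sketch below is one way to do so, assuming the numbering shown in the mask examples, where bit 0 is the leftmost bit of the 256-bit value. The function name ``mask_without`` is hypothetical and nothing is written to sysfs.

```shell
# mask_without BIT...
# Hypothetical helper: print a 256-bit hex mask with every bit switched on
# except the listed ones. Bit 0 is the leftmost (most significant) bit.
mask_without() {
    mask="0x"
    nibble=0
    while [ "$nibble" -lt 64 ]; do
        value=15                                   # all four bits of this hex digit on
        for bit in "$@"; do
            bit=$(($bit))                          # accept decimal or 0x-prefixed hex
            if [ $((bit / 4)) -eq "$nibble" ]; then
                value=$((value & ~(8 >> (bit % 4))))   # clear the bit within the digit
            fi
        done
        mask="${mask}$(printf '%x' "$value")"
        nibble=$((nibble + 1))
    done
    printf '%s\n' "$mask"
}
```

For example, ``mask_without 5 6`` yields a value starting with ``0xf9`` followed only by ``f`` digits, which corresponds to the apmask written above. ``mask_without 4 0x47 0xab 0xff`` can be used to cross-check the aqmask in the same way.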
665 | Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) | |
666 | can be bound to the vfio_ap device driver. This restriction keeps the | |
667 | implementation simple: supporting older devices would complicate the design | |
668 | for hardware that will go out of service in the relatively near future and | |
669 | for which few systems are available for testing. | |
670 | ||
671 | The administrator, therefore, must take care to secure only AP queues that | |
672 | can be bound to the vfio_ap device driver. The device type for a given AP | |
673 | queue device can be read from the parent card's sysfs directory. For example, | |
674 | to see the hardware type of the queue 05.0004:: | |
675 | ||
8b4a503d | 676 | cat /sys/bus/ap/devices/card05/hwtype |
492a6be1 TK |
677 | |
678 | The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the | |
679 | vfio_ap device driver. | |
680 | ||
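When several cards are involved, the hwtype check can be scripted. The following is a minimal sketch, assuming each card directory exposes a ``hwtype`` attribute containing a decimal number as in the ``cat`` example above. The function name ``list_unsupported_cards`` and its optional directory argument are hypothetical; the argument exists only so the function can be tried against a test directory.

```shell
# list_unsupported_cards [DEVICES_DIR]
# Hypothetical helper: print every card whose hwtype is below 10 and which
# therefore cannot be bound to the vfio_ap device driver.
list_unsupported_cards() {
    devdir="${1:-/sys/bus/ap/devices}"
    for card in "$devdir"/card*; do
        [ -r "$card/hwtype" ] || continue          # skip entries without a readable hwtype
        hwtype=$(cat "$card/hwtype")
        if [ "$hwtype" -lt 10 ]; then
            printf '%s hwtype=%s\n' "${card##*/}" "$hwtype"
        fi
    done
}
```

An empty result suggests that every card found under the directory meets the hwtype requirement.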
681 | 3. Create the mediated devices needed to configure the AP matrixes for the | |
682 | three guests and to provide an interface to the vfio_ap driver for | |
8b4a503d | 683 | use by the guests:: |
492a6be1 | 684 | |
8b4a503d MCC |
685 | /sys/devices/vfio_ap/matrix/ |
686 | --- [mdev_supported_types] | |
687 | ------ [vfio_ap-passthrough] (passthrough mediated matrix device type) | |
688 | --------- create | |
689 | --------- [devices] | |
492a6be1 | 690 | |
8b4a503d | 691 | To create the mediated devices for the three guests:: |
492a6be1 TK |
692 | |
693 | uuidgen > create | |
694 | uuidgen > create | |
695 | uuidgen > create | |
696 | ||
8b4a503d | 697 | or |
492a6be1 | 698 | |
8b4a503d MCC |
699 | echo $uuid1 > create |
700 | echo $uuid2 > create | |
701 | echo $uuid3 > create | |
492a6be1 TK |
702 | |
703 | This will create three mediated devices in the [devices] subdirectory, each | |
704 | named after the UUID written to the create attribute file. We call them $uuid1, | |
8b4a503d MCC |
705 | $uuid2 and $uuid3; this is the sysfs directory structure after creation:: |
706 | ||
707 | /sys/devices/vfio_ap/matrix/ | |
708 | --- [mdev_supported_types] | |
709 | ------ [vfio_ap-passthrough] | |
710 | --------- [devices] | |
711 | ------------ [$uuid1] | |
712 | --------------- assign_adapter | |
713 | --------------- assign_control_domain | |
714 | --------------- assign_domain | |
715 | --------------- matrix | |
716 | --------------- unassign_adapter | |
717 | --------------- unassign_control_domain | |
718 | --------------- unassign_domain | |
719 | ||
720 | ------------ [$uuid2] | |
721 | --------------- assign_adapter | |
722 | --------------- assign_control_domain | |
723 | --------------- assign_domain | |
724 | --------------- matrix | |
725 | --------------- unassign_adapter | |
726 | --------------- unassign_control_domain | |
727 | --------------- unassign_domain | |
728 | ||
729 | ------------ [$uuid3] | |
730 | --------------- assign_adapter | |
731 | --------------- assign_control_domain | |
732 | --------------- assign_domain | |
733 | --------------- matrix | |
734 | --------------- unassign_adapter | |
735 | --------------- unassign_control_domain | |
736 | --------------- unassign_domain | |
492a6be1 TK |
737 | |
738 | 4. The administrator now needs to configure the matrixes for the mediated | |
739 | devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). | |
740 | ||
8b4a503d | 741 | This is how the matrix is configured for Guest1:: |
492a6be1 TK |
742 | |
743 | echo 5 > assign_adapter | |
744 | echo 6 > assign_adapter | |
745 | echo 4 > assign_domain | |
746 | echo 0xab > assign_domain | |
747 | ||
8b4a503d MCC |
748 | Control domains can similarly be assigned using the assign_control_domain |
749 | sysfs file. | |
492a6be1 | 750 | |
8b4a503d MCC |
751 | If a mistake is made configuring an adapter, domain or control domain, |
752 | you can use the unassign_xxx files to unassign the adapter, domain or | |
753 | control domain. | |
492a6be1 | 754 | |
8b4a503d | 755 | To display the matrix configuration for Guest1:: |
492a6be1 | 756 | |
8b4a503d | 757 | cat matrix |
492a6be1 | 758 | |
8b4a503d | 759 | This is how the matrix is configured for Guest2:: |
492a6be1 TK |
760 | |
761 | echo 5 > assign_adapter | |
762 | echo 0x47 > assign_domain | |
763 | echo 0xff > assign_domain | |
764 | ||
8b4a503d | 765 | This is how the matrix is configured for Guest3:: |
492a6be1 TK |
766 | |
767 | echo 6 > assign_adapter | |
768 | echo 0x47 > assign_domain | |
769 | echo 0xff > assign_domain | |
770 | ||
771 | In order to successfully assign an adapter: | |
772 | ||
773 | * The adapter number specified must represent a value from 0 up to the | |
774 | maximum adapter number configured for the system. If an adapter number | |
775 | higher than the maximum is specified, the operation will terminate with | |
776 | an error (ENODEV). | |
777 | ||
778 | * All APQNs that can be derived from the adapter ID and the IDs of | |
779 | the previously assigned domains must be bound to the vfio_ap device | |
780 | driver. If no domains have yet been assigned, then there must be at least | |
781 | one APQN with the specified APID bound to the vfio_ap driver. If no such | |
782 | APQNs are bound to the driver, the operation will terminate with an | |
783 | error (EADDRNOTAVAIL). | |
784 | ||
785 | * No APQN that can be derived from the adapter ID and the IDs of the | |
786 | previously assigned domains can be assigned to another mediated matrix | |
787 | device. If an APQN is assigned to another mediated matrix device, the | |
788 | operation will terminate with an error (EADDRINUSE). | |
789 | ||
790 | In order to successfully assign a domain: | |
791 | ||
792 | * The domain number specified must represent a value from 0 up to the | |
793 | maximum domain number configured for the system. If a domain number | |
794 | higher than the maximum is specified, the operation will terminate with | |
795 | an error (ENODEV). | |
796 | ||
797 | * All APQNs that can be derived from the domain ID and the IDs of | |
798 | the previously assigned adapters must be bound to the vfio_ap device | |
799 | driver. If no adapters have yet been assigned, then there must be at least | |
800 | one APQN with the specified APQI bound to the vfio_ap driver. If no such | |
801 | APQNs are bound to the driver, the operation will terminate with an | |
802 | error (EADDRNOTAVAIL). | |
803 | ||
804 | * No APQN that can be derived from the domain ID and the IDs of the | |
805 | previously assigned adapters can be assigned to another mediated matrix | |
806 | device. If an APQN is assigned to another mediated matrix device, the | |
807 | operation will terminate with an error (EADDRINUSE). | |
808 | ||
809 | In order to successfully assign a control domain, the domain number | |
810 | specified must represent a value from 0 up to the maximum domain number | |
811 | configured for the system. If a control domain number higher than the maximum | |
812 | is specified, the operation will terminate with an error (ENODEV). | |
813 | ||
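The set of APQNs considered by the checks above is the cross product of the assigned adapters and the assigned domains. The sketch below enumerates that set in the ``xx.yyyy`` form used for the AP queue devices in this document. The function name ``apqns`` is hypothetical and the function performs no sysfs access.

```shell
# apqns "ADAPTERS" "DOMAINS"
# Hypothetical helper: print every APQN derived from a space separated list
# of adapter numbers and a space separated list of domain numbers.
# Numbers may be decimal or 0x-prefixed hexadecimal.
apqns() {
    for ap in $1; do
        for dom in $2; do
            printf '%02x.%04x\n' "$(($ap))" "$(($dom))"
        done
    done
}
```

For the Guest1 matrix, ``apqns "5 6" "4 0xab"`` lists 05.0004, 05.00ab, 06.0004 and 06.00ab. Each of these must be bound to the vfio_ap device driver, and none may be assigned to another mediated matrix device.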
8b4a503d | 814 | 5. Start Guest1:: |
492a6be1 | 815 | |
8b4a503d MCC |
816 | /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ |
817 | -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... | |
492a6be1 | 818 | |
8b4a503d | 819 | 6. Start Guest2:: |
492a6be1 | 820 | |
8b4a503d MCC |
821 | /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ |
822 | -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... | |
492a6be1 | 823 | |
8b4a503d | 824 | 7. Start Guest3:: |
492a6be1 | 825 | |
8b4a503d MCC |
826 | /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ |
827 | -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... | |
492a6be1 TK |
828 | |
829 | When the guest is shut down, the mediated matrix devices may be removed. | |
830 | ||
8b4a503d | 831 | Using our example again, to remove the mediated matrix device $uuid1:: |
492a6be1 TK |
832 | |
833 | /sys/devices/vfio_ap/matrix/ | |
834 | --- [mdev_supported_types] | |
835 | ------ [vfio_ap-passthrough] | |
836 | --------- [devices] | |
837 | ------------ [$uuid1] | |
838 | --------------- remove | |
839 | ||
8b4a503d | 840 | :: |
492a6be1 TK |
841 | |
842 | echo 1 > remove | |
843 | ||
8b4a503d MCC |
844 | This will remove all of the mdev matrix device's sysfs structures including |
845 | the mdev device itself. To recreate and reconfigure the mdev matrix device, | |
846 | all of the steps starting with step 3 will have to be performed again. Note | |
847 | that the remove will fail if a guest using the mdev is still running. | |
492a6be1 | 848 | |
8b4a503d MCC |
849 | It is not necessary to remove an mdev matrix device, but one may want to |
850 | remove it if no guest will use it during the remaining lifetime of the linux | |
851 | host. If the mdev matrix device is removed, one may want to also reconfigure | |
852 | the pool of adapters and queues reserved for use by the default drivers. | |
492a6be1 TK |
853 | |
854 | Limitations | |
855 | =========== | |
856 | * The KVM/kernel interfaces do not provide a way to prevent an APQN from being | |
857 | restored to the default drivers pool while its queue is still assigned to a | |
858 | mediated device in use by a guest. It is incumbent upon the administrator to | |
859 | ensure that no mediated device to which the APQN is assigned is in use by a | |
860 | guest, lest the host be given access to the private data of the AP queue | |
861 | device, such as a private key configured specifically for the guest. | |
862 | ||
863 | * Dynamically modifying the AP matrix for a running guest (which would amount to | |
864 | hot(un)plug of AP devices for the guest) is currently not supported. | |
865 | ||
866 | * Live guest migration is not supported for guests using AP devices. |