Commit | Line | Data |
---|---|---|
106ee47d MCC |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | =================================================================== | |
9c1b96e3 AK |
4 | The Definitive KVM (Kernel-based Virtual Machine) API Documentation |
5 | =================================================================== | |
6 | ||
7 | 1. General description | |
106ee47d | 8 | ====================== |
9c1b96e3 AK |
9 | |
10 | The kvm API is a set of ioctls that are issued to control various aspects | |
80b10aa9 | 11 | of a virtual machine. The ioctls belong to the following classes: |
9c1b96e3 AK |
12 | |
13 | - System ioctls: These query and set global attributes which affect the | |
14 | whole kvm subsystem. In addition a system ioctl is used to create | |
5e124900 | 15 | virtual machines. |
9c1b96e3 AK |
16 | |
17 | - VM ioctls: These query and set attributes that affect an entire virtual | |
18 | machine, for example memory layout. In addition a VM ioctl is used to | |
ddba9180 | 19 | create virtual cpus (vcpus) and devices. |
9c1b96e3 | 20 | |
5e124900 SC |
21 | VM ioctls must be issued from the same process (address space) that was |
22 | used to create the VM. | |
9c1b96e3 AK |
23 | |
24 | - vcpu ioctls: These query and set attributes that control the operation | |
25 | of a single virtual cpu. | |
26 | ||
5e124900 SC |
27 | vcpu ioctls should be issued from the same thread that was used to create |
28 | the vcpu, except for asynchronous vcpu ioctls that are marked as such in | |
29 | the documentation. Otherwise, the first ioctl after switching threads | |
30 | could see a performance impact. | |
9c1b96e3 | 31 | |
ddba9180 SC |
32 | - device ioctls: These query and set attributes that control the operation |
33 | of a single device. | |
34 | ||
35 | device ioctls must be issued from the same process (address space) that | |
36 | was used to create the VM. | |
414fa985 | 37 | |
2044892d | 38 | 2. File descriptors |
106ee47d | 39 | =================== |
9c1b96e3 AK |
40 | |
41 | The kvm API is centered around file descriptors. An initial | |
42 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle | |
43 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this | |
2044892d | 44 | handle will create a VM file descriptor which can be used to issue VM |
ddba9180 SC |
45 | ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will |
46 | create a virtual cpu or device and return a file descriptor pointing to | |
47 | the new resource. Finally, ioctls on a vcpu or device fd can be used | |
48 | to control the vcpu or device. For vcpus, this includes the important | |
49 | task of actually running guest code. | |
9c1b96e3 AK |
50 | |
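The descriptor hierarchy described above can be sketched in C as follows. This is a minimal illustration, not a reference implementation: the function name `kvm_fd_walkthrough` is hypothetical, and the sketch assumes a Linux host with `<linux/kvm.h>` installed where machine type 0 and vcpu id 0 are acceptable. It returns -1 rather than aborting when /dev/kvm is missing or inaccessible.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Walk the fd hierarchy: /dev/kvm -> VM fd -> vcpu fd.
 * Returns 0 on success, -1 if any step fails (for example when
 * /dev/kvm is absent or the caller lacks permission).
 */
int kvm_fd_walkthrough(void)
{
    int ret = -1;
    int vm_fd = -1;
    int vcpu_fd = -1;

    /* Handle to the kvm subsystem; accepts system ioctls. */
    int kvm_fd = open("/dev/kvm", O_RDWR);
    if (kvm_fd < 0) {
        perror("open /dev/kvm");
        return -1;
    }

    /* System ioctl: the stable API reports KVM_API_VERSION (12). */
    if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0UL) != KVM_API_VERSION) {
        fprintf(stderr, "unexpected KVM API version\n");
        goto out;
    }

    /* System ioctl: machine type 0 yields a VM fd. */
    vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0UL);
    if (vm_fd < 0) {
        perror("KVM_CREATE_VM");
        goto out;
    }

    /* VM ioctl: vcpu id 0 yields a vcpu fd. */
    vcpu_fd = ioctl(vm_fd, KVM_CREATE_VCPU, 0UL);
    if (vcpu_fd < 0) {
        perror("KVM_CREATE_VCPU");
        goto out;
    }

    ret = 0;
out:
    /* Release descriptors in reverse order of creation. */
    if (vcpu_fd >= 0)
        close(vcpu_fd);
    if (vm_fd >= 0)
        close(vm_fd);
    close(kvm_fd);
    return ret;
}
```

All three descriptors are created and closed by the same thread, which matches the usage model from "General description".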
51 | In general file descriptors can be migrated among processes by means | |
52 | of fork() and the SCM_RIGHTS facility of unix domain sockets. These | |
53 | kinds of tricks are explicitly not supported by kvm. While they will | |
54 | not cause harm to the host, their actual behavior is not guaranteed by | |
5e124900 SC |
55 | the API. See "General description" for details on the ioctl usage |
56 | model that is supported by KVM. | |
eca6be56 | 57 | |
c44456f2 | 58 | It is important to note that although VM ioctls may only be issued from |
919f6cd8 SC |
59 | the process that created the VM, a VM's lifecycle is associated with its |
60 | file descriptor, not its creator (process). In other words, the VM and | |
61 | its resources, *including the associated address space*, are not freed | |
62 | until the last reference to the VM's file descriptor has been released. | |
63 | For example, if fork() is issued after ioctl(KVM_CREATE_VM), the VM will | |
64 | not be freed until both the parent (original) process and its child have | |
65 | put their references to the VM's file descriptor. | |
66 | ||
67 | Because a VM's resources are not freed until the last reference to its | |
3747c5d3 | 68 | file descriptor is released, creating additional references to a VM |
919f6cd8 SC |
69 | via fork(), dup(), etc... without careful consideration is strongly |
70 | discouraged and may have unwanted side effects, e.g. memory allocated | |
71 | by and on behalf of the VM's process may not be freed/unaccounted when | |
72 | the VM is shut down. | |
73 | ||
74 | ||
9c1b96e3 | 75 | 3. Extensions |
106ee47d | 76 | ============= |
9c1b96e3 AK |
77 | |
78 | As of Linux 2.6.22, the KVM ABI has been stabilized: no backward | |
79 | incompatible changes are allowed. However, there is an extension | |
80 | facility that allows backward-compatible extensions to the API to be | |
81 | queried and used. | |
82 | ||
c9f3f2d8 | 83 | The extension mechanism is not based on the Linux version number. |
9c1b96e3 AK |
84 | Instead, kvm defines extension identifiers and a facility to query |
85 | whether a particular extension identifier is available. If it is, a | |
86 | set of ioctls is available for application use. | |
87 | ||
414fa985 | 88 | |
9c1b96e3 | 89 | 4. API description |
106ee47d | 90 | ================== |
9c1b96e3 AK |
91 | |
92 | This section describes ioctls that can be used to control kvm guests. | |
93 | For each ioctl, the following information is provided along with a | |
94 | description: | |
95 | ||
106ee47d MCC |
96 | Capability: |
97 | which KVM extension provides this ioctl. Can be 'basic', | |
9c1b96e3 | 98 | which means that it will be provided by any kernel that supports |
7f05db6a | 99 | API version 12 (see section 4.1), a KVM_CAP_xyz constant, which |
9c1b96e3 | 100 | means availability needs to be checked with KVM_CHECK_EXTENSION |
7f05db6a MT |
101 | (see section 4.4), or 'none' which means that while not all kernels |
102 | support this ioctl, there's no capability bit to check its | |
103 | availability: for kernels that don't support the ioctl, | |
104 | the ioctl returns -ENOTTY. | |
9c1b96e3 | 105 | |
106ee47d MCC |
106 | Architectures: |
107 | which instruction set architectures provide this ioctl. | |
9c1b96e3 AK |
108 | x86 includes both i386 and x86_64. |
109 | ||
106ee47d MCC |
110 | Type: |
111 | system, vm, or vcpu. | |
9c1b96e3 | 112 | |
106ee47d MCC |
113 | Parameters: |
114 | what parameters are accepted by the ioctl. | |
9c1b96e3 | 115 | |
106ee47d MCC |
116 | Returns: |
117 | the return value. General error numbers (EBADF, ENOMEM, EINVAL) | |
9c1b96e3 AK |
118 | are not detailed, but errors with specific meanings are. |
119 | ||
414fa985 | 120 | |
9c1b96e3 | 121 | 4.1 KVM_GET_API_VERSION |
106ee47d | 122 | ----------------------- |
9c1b96e3 | 123 | |
106ee47d MCC |
124 | :Capability: basic |
125 | :Architectures: all | |
126 | :Type: system ioctl | |
127 | :Parameters: none | |
128 | :Returns: the constant KVM_API_VERSION (=12) | |
9c1b96e3 AK |
129 | |
130 | This identifies the API version as the stable kvm API. It is not | |
131 | expected that this number will change. However, Linux 2.6.20 and | |
132 | 2.6.21 report earlier versions; these are not documented and not | |
133 | supported. Applications should refuse to run if KVM_GET_API_VERSION | |
134 | returns a value other than 12. If this check passes, all ioctls | |
135 | described as 'basic' will be available. | |
136 | ||
414fa985 | 137 | |
9c1b96e3 | 138 | 4.2 KVM_CREATE_VM |
106ee47d | 139 | ----------------- |
9c1b96e3 | 140 | |
106ee47d MCC |
141 | :Capability: basic |
142 | :Architectures: all | |
143 | :Type: system ioctl | |
144 | :Parameters: machine type identifier (KVM_VM_*) | |
145 | :Returns: a VM fd that can be used to control the new virtual machine. | |
9c1b96e3 | 146 | |
bcb85c88 | 147 | The new VM has no virtual cpus and no memory. |
a8a3c426 | 148 | You probably want to use 0 as machine type. |
e08b9637 CO |
149 | |
150 | In order to create user controlled virtual machines on S390, check | |
151 | KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as | |
152 | privileged user (CAP_SYS_ADMIN). | |
9c1b96e3 | 153 | |
233a7cb2 SP |
154 | On arm64, the physical address size for a VM (IPA Size limit) is limited |
155 | to 40 bits by default. The limit can be configured if the host supports the | |
156 | extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use | |
157 | KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type | |
158 | identifier, where IPA_Bits is the maximum width of any physical | |
159 | address used by the VM. The IPA_Bits is encoded in bits[7-0] of the | |
160 | machine type identifier. | |
161 | ||
106ee47d | 162 | For example, to configure a guest to use a 48-bit physical address size:: |
233a7cb2 SP |
163 | |
164 | vm_fd = ioctl(dev_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_IPA_SIZE(48)); | |
165 | ||
106ee47d | 166 | The requested size (IPA_Bits) must be: |
233a7cb2 | 167 | |
106ee47d MCC |
168 | == ========================================================= |
169 | 0 Implies default size, 40bits (for backward compatibility) | |
170 | N Implies N bits, where N is a positive integer such that, | |
233a7cb2 | 171 | 32 <= N <= Host_IPA_Limit |
106ee47d | 172 | == ========================================================= |
233a7cb2 SP |
173 | |
174 | Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and | |
175 | is dependent on the CPU capability and the kernel configuration. The limit can | |
176 | be retrieved using KVM_CAP_ARM_VM_IPA_SIZE of the KVM_CHECK_EXTENSION | |
177 | ioctl() at run-time. | |
178 | ||
7d717558 MZ |
179 | Creation of the VM will fail if the requested IPA size (whether it is |
180 | implicit or explicit) is unsupported on the host. | |
181 | ||
233a7cb2 SP |
182 | Please note that configuring the IPA size does not affect the capability |
183 | exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects | |
184 | the size of the address translated by the stage2 level (guest physical to | |
185 | host physical address translations). | |
186 | ||
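The IPA size rules above can be checked in userspace before calling KVM_CREATE_VM. The helpers below are a hedged sketch: `ipa_bits_valid` and `ipa_machine_type` are hypothetical names, `host_ipa_limit` is assumed to be the value returned by KVM_CHECK_EXTENSION for KVM_CAP_ARM_VM_IPA_SIZE, and the encoding simply places IPA_Bits in bits[7-0] as the text describes.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if ipa_bits may be requested on a host whose limit is
 * host_ipa_limit. A request of 0 means the default of 40 bits,
 * which must itself fit within the host limit.
 */
bool ipa_bits_valid(unsigned int ipa_bits, unsigned int host_ipa_limit)
{
    unsigned int effective = (ipa_bits == 0) ? 40 : ipa_bits;

    if (ipa_bits != 0 && ipa_bits < 32)
        return false;
    return effective <= host_ipa_limit;
}

/* Encode IPA_Bits into bits[7-0] of the machine type identifier. */
uint64_t ipa_machine_type(unsigned int ipa_bits)
{
    return (uint64_t)(ipa_bits & 0xffu);
}
```

Checking first gives a clearer error message than a failed KVM_CREATE_VM, although the ioctl remains the authority on what the host accepts.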
187 | ||
801e459a | 188 | 4.3 KVM_GET_MSR_INDEX_LIST, KVM_GET_MSR_FEATURE_INDEX_LIST |
106ee47d MCC |
189 | ---------------------------------------------------------- |
190 | ||
191 | :Capability: basic, KVM_CAP_GET_MSR_FEATURES for KVM_GET_MSR_FEATURE_INDEX_LIST | |
192 | :Architectures: x86 | |
193 | :Type: system ioctl | |
194 | :Parameters: struct kvm_msr_list (in/out) | |
195 | :Returns: 0 on success; -1 on error | |
9c1b96e3 | 196 | |
9c1b96e3 | 197 | Errors: |
106ee47d MCC |
198 | |
199 | ====== ============================================================ | |
200 | EFAULT the msr index list cannot be read from or written to | |
24e7475f | 201 | E2BIG the msr index list is too big to fit in the array specified by |
9c1b96e3 | 202 | the user. |
106ee47d | 203 | ====== ============================================================ |
9c1b96e3 | 204 | |
106ee47d MCC |
205 | :: |
206 | ||
207 | struct kvm_msr_list { | |
9c1b96e3 AK |
208 | __u32 nmsrs; /* number of msrs in entries */ |
209 | __u32 indices[0]; | |
106ee47d | 210 | }; |
9c1b96e3 | 211 | |
801e459a TL |
212 | The user fills in the size of the indices array in nmsrs, and in return |
213 | kvm adjusts nmsrs to reflect the actual number of msrs and fills in the | |
214 | indices array with their numbers. | |
215 | ||
216 | KVM_GET_MSR_INDEX_LIST returns the guest msrs that are supported. The list | |
217 | varies by kvm version and host processor, but does not change otherwise. | |
9c1b96e3 | 218 | |
2e2602ca AK |
219 | Note: if kvm indicates support for MCE (KVM_CAP_MCE), then the MCE bank MSRs are |
220 | not returned in the MSR list, as different vcpus can have a different number | |
221 | of banks, as set via the KVM_X86_SETUP_MCE ioctl. | |
222 | ||
801e459a TL |
223 | KVM_GET_MSR_FEATURE_INDEX_LIST returns the list of MSRs that can be passed |
224 | to the KVM_GET_MSRS system ioctl. This lets userspace probe host capabilities | |
225 | and processor features that are exposed via MSRs (e.g., VMX capabilities). | |
226 | This list also varies by kvm version and host processor, but does not change | |
227 | otherwise. | |
228 | ||
414fa985 | 229 | |
9c1b96e3 | 230 | 4.4 KVM_CHECK_EXTENSION |
106ee47d | 231 | ----------------------- |
9c1b96e3 | 232 | |
106ee47d MCC |
233 | :Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl |
234 | :Architectures: all | |
235 | :Type: system ioctl, vm ioctl | |
236 | :Parameters: extension identifier (KVM_CAP_*) | |
237 | :Returns: 0 if unsupported; 1 (or some other positive integer) if supported | |
9c1b96e3 AK |
238 | |
239 | The API allows the application to query about extensions to the core | |
240 | kvm API. Userspace passes an extension identifier (an integer) and | |
241 | receives an integer that describes the extension availability. | |
242 | Generally 0 means no and 1 means yes, but some extensions may report | |
243 | additional information in the integer return value. | |
244 | ||
92b591a4 AG |
245 | Based on their initialization different VMs may have different capabilities. |
246 | It is thus encouraged to use the vm ioctl to query for capabilities (available | |
247 | with KVM_CAP_CHECK_EXTENSION_VM on the vm fd). | |
414fa985 | 248 | |
9c1b96e3 | 249 | 4.5 KVM_GET_VCPU_MMAP_SIZE |
106ee47d | 250 | -------------------------- |
9c1b96e3 | 251 | |
106ee47d MCC |
252 | :Capability: basic |
253 | :Architectures: all | |
254 | :Type: system ioctl | |
255 | :Parameters: none | |
256 | :Returns: size of vcpu mmap area, in bytes | |
9c1b96e3 AK |
257 | |
258 | The KVM_RUN ioctl (see section 4.10) communicates with userspace via a shared | |
259 | memory region. This ioctl returns the size of that region. See the | |
260 | KVM_RUN documentation for details. | |
261 | ||
fb04a1ed PX |
262 | Besides the size of the KVM_RUN communication region, other areas of |
263 | the VCPU file descriptor can be mmap-ed, including: | |
264 | ||
265 | - if KVM_CAP_COALESCED_MMIO is available, a page at | |
266 | KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons, | |
267 | this page is included in the result of KVM_GET_VCPU_MMAP_SIZE. | |
268 | KVM_CAP_COALESCED_MMIO is not documented yet. | |
269 | ||
270 | - if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at | |
271 | KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on | |
272 | KVM_CAP_DIRTY_LOG_RING, see section 8.3. | |
273 | ||
414fa985 | 274 | |
9c1b96e3 | 275 | 4.6 KVM_SET_MEMORY_REGION |
106ee47d | 276 | ------------------------- |
9c1b96e3 | 277 | |
106ee47d MCC |
278 | :Capability: basic |
279 | :Architectures: all | |
280 | :Type: vm ioctl | |
281 | :Parameters: struct kvm_memory_region (in) | |
282 | :Returns: 0 on success, -1 on error | |
9c1b96e3 | 283 | |
b74a07be | 284 | This ioctl is obsolete and has been removed. |
9c1b96e3 | 285 | |
414fa985 | 286 | |
68ba6974 | 287 | 4.7 KVM_CREATE_VCPU |
106ee47d | 288 | ------------------- |
9c1b96e3 | 289 | |
106ee47d MCC |
290 | :Capability: basic |
291 | :Architectures: all | |
292 | :Type: vm ioctl | |
293 | :Parameters: vcpu id (apic id on x86) | |
294 | :Returns: vcpu fd on success, -1 on error | |
9c1b96e3 | 295 | |
0b1b1dfd GK |
296 | This API adds a vcpu to a virtual machine. No more than max_vcpus may be added. |
297 | The vcpu id is an integer in the range [0, max_vcpu_id). | |
8c3ba334 SL |
298 | |
299 | The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of | |
300 | the KVM_CHECK_EXTENSION ioctl() at run-time. | |
301 | The maximum possible value for max_vcpus can be retrieved using the | |
302 | KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time. | |
303 | ||
76d25402 PE |
304 | If the KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is |
305 | at most 4. | |
8c3ba334 SL |
306 | If the KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is |
307 | the same as the value returned from KVM_CAP_NR_VCPUS. | |
9c1b96e3 | 308 | |
0b1b1dfd GK |
309 | The maximum possible value for max_vcpu_id can be retrieved using the |
310 | KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time. | |
311 | ||
312 | If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that max_vcpu_id | |
313 | is the same as the value returned from KVM_CAP_MAX_VCPUS. | |
314 | ||
371fefd6 PM |
315 | On powerpc using book3s_hv mode, the vcpus are mapped onto virtual |
316 | threads in one or more virtual CPU cores. (This is because the | |
317 | hardware requires all the hardware threads in a CPU core to be in the | |
318 | same partition.) The KVM_CAP_PPC_SMT capability indicates the number | |
36442687 AK |
319 | of vcpus per virtual core (vcore). The vcore id is obtained by |
320 | dividing the vcpu id by the number of vcpus per vcore. The vcpus in a | |
321 | given vcore will always be in the same physical core as each other | |
322 | (though that might be a different physical core from time to time). | |
323 | Userspace can control the threading (SMT) mode of the guest by its | |
324 | allocation of vcpu ids. For example, if userspace wants | |
325 | single-threaded guest vcpus, it should make all vcpu ids be a multiple | |
326 | of the number of vcpus per vcore. | |
327 | ||
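The vcore arithmetic for book3s_hv can be illustrated with two small helpers. Both names are hypothetical, and `threads_per_vcore` is assumed to be the non-zero value reported by KVM_CAP_PPC_SMT.

```c
/* vcore id = vcpu id divided by the number of vcpus per vcore. */
unsigned int vcore_of(unsigned int vcpu_id, unsigned int threads_per_vcore)
{
    return vcpu_id / threads_per_vcore;
}

/*
 * vcpu id to use for the n-th guest cpu when the guest should be
 * single-threaded: every id is a multiple of threads_per_vcore, so
 * each vcpu lands in its own vcore.
 */
unsigned int single_threaded_vcpu_id(unsigned int n,
                                     unsigned int threads_per_vcore)
{
    return n * threads_per_vcore;
}
```

With 8 vcpus per vcore, for instance, guest cpu 3 gets vcpu id 24 and occupies vcore 3 alone.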
5b1c1493 CO |
328 | For virtual cpus that have been created with S390 user controlled virtual |
329 | machines, the resulting vcpu fd can be memory mapped at page offset | |
330 | KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual | |
331 | cpu's hardware control block. | |
332 | ||
414fa985 | 333 | |
68ba6974 | 334 | 4.8 KVM_GET_DIRTY_LOG (vm ioctl) |
106ee47d | 335 | -------------------------------- |
9c1b96e3 | 336 | |
106ee47d MCC |
337 | :Capability: basic |
338 | :Architectures: all | |
339 | :Type: vm ioctl | |
340 | :Parameters: struct kvm_dirty_log (in/out) | |
341 | :Returns: 0 on success, -1 on error | |
9c1b96e3 | 342 | |
106ee47d MCC |
343 | :: |
344 | ||
345 | /* for KVM_GET_DIRTY_LOG */ | |
346 | struct kvm_dirty_log { | |
9c1b96e3 AK |
347 | __u32 slot; |
348 | __u32 padding; | |
349 | union { | |
350 | void __user *dirty_bitmap; /* one bit per page */ | |
351 | __u64 padding; | |
352 | }; | |
106ee47d | 353 | }; |
9c1b96e3 AK |
354 | |
355 | Given a memory slot, return a bitmap containing any pages dirtied | |
356 | since the last call to this ioctl. Bit 0 is the first page in the | |
357 | memory slot. Ensure the entire structure is cleared to avoid padding | |
358 | issues. | |
359 | ||
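Once KVM_GET_DIRTY_LOG has filled the bitmap, userspace has to walk it. The helpers below are a sketch under two assumptions that the text above does not state: that page n is stored at bit (n % 8) of byte (n / 8), and that the buffer is sized in multiples of 64 bits. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buffer size in bytes for npages, rounded up to whole 64-bit words. */
size_t dirty_bitmap_bytes(size_t npages)
{
    return ((npages + 63) / 64) * 8;
}

/* Bit 0 is the first page in the memory slot. */
bool page_is_dirty(const uint8_t *bitmap, size_t page)
{
    return (bitmap[page / 8] >> (page % 8)) & 1u;
}

/* Count the dirty pages among the first npages of the slot. */
size_t count_dirty_pages(const uint8_t *bitmap, size_t npages)
{
    size_t count = 0;

    for (size_t page = 0; page < npages; page++) {
        if (page_is_dirty(bitmap, page))
            count++;
    }
    return count;
}
```

A migration loop would typically replace the counter with code that queues each dirty page for retransmission.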
01ead84c ZY |
360 | If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field specify |
361 | the address space for which you want to return the dirty bitmap. See | |
362 | KVM_SET_USER_MEMORY_REGION for details on the usage of slot field. | |
f481b069 | 363 | |
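Packing the address space into the slot field is a plain shift and mask. In this sketch the helper names are hypothetical, and the low 16 bits are assumed to carry the slot number, which follows from bits 16-31 carrying the address space.

```c
#include <stdint.h>

/* Address space id in bits 16-31, slot number in bits 0-15. */
uint32_t make_slot(uint16_t as_id, uint16_t slot_id)
{
    return ((uint32_t)as_id << 16) | slot_id;
}

/* Extract the address space id from a combined slot value. */
uint16_t slot_address_space(uint32_t slot)
{
    return (uint16_t)(slot >> 16);
}

/* Extract the slot number from a combined slot value. */
uint16_t slot_index(uint32_t slot)
{
    return (uint16_t)(slot & 0xffffu);
}
```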
2a31b9db | 364 | The bits in the dirty bitmap are cleared before the ioctl returns, unless |
d7547c55 | 365 | KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information, |
2a31b9db | 366 | see the description of the capability. |
414fa985 | 367 | |
1cfc9c4b DW |
368 | Note that the Xen shared info page, if configured, shall always be assumed |
369 | to be dirty. KVM will not explicitly mark it as such. | |
370 | ||
68ba6974 | 371 | 4.9 KVM_SET_MEMORY_ALIAS |
106ee47d | 372 | ------------------------ |
9c1b96e3 | 373 | |
106ee47d MCC |
374 | :Capability: basic |
375 | :Architectures: x86 | |
376 | :Type: vm ioctl | |
377 | :Parameters: struct kvm_memory_alias (in) | |
378 | :Returns: 0 (success), -1 (error) | |
9c1b96e3 | 379 | |
a1f4d395 | 380 | This ioctl is obsolete and has been removed. |
9c1b96e3 | 381 | |
414fa985 | 382 | |
68ba6974 | 383 | 4.10 KVM_RUN |
106ee47d MCC |
384 | ------------ |
385 | ||
386 | :Capability: basic | |
387 | :Architectures: all | |
388 | :Type: vcpu ioctl | |
389 | :Parameters: none | |
390 | :Returns: 0 on success, -1 on error | |
9c1b96e3 | 391 | |
9c1b96e3 | 392 | Errors: |
106ee47d | 393 | |
3557ae18 | 394 | ======= ============================================================== |
106ee47d | 395 | EINTR an unmasked signal is pending |
3557ae18 AE |
396 | ENOEXEC the vcpu hasn't been initialized or the guest tried to execute |
397 | instructions from device memory (arm64) | |
398 | ENOSYS data abort outside memslots with no syndrome info and | |
399 | KVM_CAP_ARM_NISV_TO_USER not enabled (arm64) | |
400 | EPERM SVE feature set but not finalized (arm64) | |
401 | ======= ============================================================== | |
9c1b96e3 AK |
402 | |
403 | This ioctl is used to run a guest virtual cpu. While there are no | |
404 | explicit parameters, there is an implicit parameter block that can be | |
405 | obtained by mmap()ing the vcpu fd at offset 0, with the size given by | |
406 | KVM_GET_VCPU_MMAP_SIZE. The parameter block is formatted as a 'struct | |
407 | kvm_run' (see below). | |
408 | ||
414fa985 | 409 | |
68ba6974 | 410 | 4.11 KVM_GET_REGS |
106ee47d | 411 | ----------------- |
9c1b96e3 | 412 | |
106ee47d | 413 | :Capability: basic |
3fbf4207 | 414 | :Architectures: all except arm64 |
106ee47d MCC |
415 | :Type: vcpu ioctl |
416 | :Parameters: struct kvm_regs (out) | |
417 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
418 | |
419 | Reads the general purpose registers from the vcpu. | |
420 | ||
106ee47d MCC |
421 | :: |
422 | ||
423 | /* x86 */ | |
424 | struct kvm_regs { | |
9c1b96e3 AK |
425 | /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ |
426 | __u64 rax, rbx, rcx, rdx; | |
427 | __u64 rsi, rdi, rsp, rbp; | |
428 | __u64 r8, r9, r10, r11; | |
429 | __u64 r12, r13, r14, r15; | |
430 | __u64 rip, rflags; | |
106ee47d | 431 | }; |
9c1b96e3 | 432 | |
106ee47d MCC |
433 | /* mips */ |
434 | struct kvm_regs { | |
c2d2c21b JH |
435 | /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */ |
436 | __u64 gpr[32]; | |
437 | __u64 hi; | |
438 | __u64 lo; | |
439 | __u64 pc; | |
106ee47d | 440 | }; |
c2d2c21b | 441 | |
414fa985 | 442 | |
68ba6974 | 443 | 4.12 KVM_SET_REGS |
106ee47d | 444 | ----------------- |
9c1b96e3 | 445 | |
106ee47d | 446 | :Capability: basic |
3fbf4207 | 447 | :Architectures: all except arm64 |
106ee47d MCC |
448 | :Type: vcpu ioctl |
449 | :Parameters: struct kvm_regs (in) | |
450 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
451 | |
452 | Writes the general purpose registers into the vcpu. | |
453 | ||
454 | See KVM_GET_REGS for the data structure. | |
455 | ||
414fa985 | 456 | |
68ba6974 | 457 | 4.13 KVM_GET_SREGS |
106ee47d | 458 | ------------------ |
9c1b96e3 | 459 | |
106ee47d MCC |
460 | :Capability: basic |
461 | :Architectures: x86, ppc | |
462 | :Type: vcpu ioctl | |
463 | :Parameters: struct kvm_sregs (out) | |
464 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
465 | |
466 | Reads special registers from the vcpu. | |
467 | ||
106ee47d MCC |
468 | :: |
469 | ||
470 | /* x86 */ | |
471 | struct kvm_sregs { | |
9c1b96e3 AK |
472 | struct kvm_segment cs, ds, es, fs, gs, ss; |
473 | struct kvm_segment tr, ldt; | |
474 | struct kvm_dtable gdt, idt; | |
475 | __u64 cr0, cr2, cr3, cr4, cr8; | |
476 | __u64 efer; | |
477 | __u64 apic_base; | |
478 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; | |
106ee47d | 479 | }; |
9c1b96e3 | 480 | |
106ee47d | 481 | /* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */ |
5ce941ee | 482 | |
9c1b96e3 AK |
483 | interrupt_bitmap is a bitmap of pending external interrupts. At most |
484 | one bit may be set. This interrupt has been acknowledged by the APIC | |
485 | but not yet injected into the cpu core. | |
486 | ||
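Because at most one bit of interrupt_bitmap may be set, reading or writing it reduces to locating a single bit. The sketch below uses hypothetical helper names and assumes that KVM_NR_INTERRUPTS is 256 and that vector v lives at bit (v % 64) of word (v / 64). Neither assumption is stated in the text above.

```c
#include <stdint.h>

#define NR_INTERRUPTS 256 /* assumed value of KVM_NR_INTERRUPTS */
#define BITMAP_WORDS ((NR_INTERRUPTS + 63) / 64)

/* Return the pending vector, or -1 if no bit is set. */
int pending_vector(const uint64_t bitmap[BITMAP_WORDS])
{
    for (int w = 0; w < BITMAP_WORDS; w++) {
        if (bitmap[w] == 0)
            continue;
        for (int b = 0; b < 64; b++) {
            if (bitmap[w] & (UINT64_C(1) << b))
                return w * 64 + b;
        }
    }
    return -1;
}

/* Clear the bitmap, then mark exactly one vector as pending. */
void set_pending_vector(uint64_t bitmap[BITMAP_WORDS], int vector)
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        bitmap[w] = 0;
    bitmap[vector / 64] = UINT64_C(1) << (vector % 64);
}
```

Clearing before setting keeps the at-most-one-bit rule intact when the structure is passed back through KVM_SET_SREGS.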
414fa985 | 487 | |
68ba6974 | 488 | 4.14 KVM_SET_SREGS |
106ee47d | 489 | ------------------ |
9c1b96e3 | 490 | |
106ee47d MCC |
491 | :Capability: basic |
492 | :Architectures: x86, ppc | |
493 | :Type: vcpu ioctl | |
494 | :Parameters: struct kvm_sregs (in) | |
495 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
496 | |
497 | Writes special registers into the vcpu. See KVM_GET_SREGS for the | |
498 | data structures. | |
499 | ||
414fa985 | 500 | |
68ba6974 | 501 | 4.15 KVM_TRANSLATE |
106ee47d | 502 | ------------------ |
9c1b96e3 | 503 | |
106ee47d MCC |
504 | :Capability: basic |
505 | :Architectures: x86 | |
506 | :Type: vcpu ioctl | |
507 | :Parameters: struct kvm_translation (in/out) | |
508 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
509 | |
510 | Translates a virtual address according to the vcpu's current address | |
511 | translation mode. | |
512 | ||
106ee47d MCC |
513 | :: |
514 | ||
515 | struct kvm_translation { | |
9c1b96e3 AK |
516 | /* in */ |
517 | __u64 linear_address; | |
518 | ||
519 | /* out */ | |
520 | __u64 physical_address; | |
521 | __u8 valid; | |
522 | __u8 writeable; | |
523 | __u8 usermode; | |
524 | __u8 pad[5]; | |
106ee47d | 525 | }; |
9c1b96e3 | 526 | |
414fa985 | 527 | |
68ba6974 | 528 | 4.16 KVM_INTERRUPT |
106ee47d | 529 | ------------------ |
9c1b96e3 | 530 | |
106ee47d | 531 | :Capability: basic |
da40d858 | 532 | :Architectures: x86, ppc, mips, riscv |
106ee47d MCC |
533 | :Type: vcpu ioctl |
534 | :Parameters: struct kvm_interrupt (in) | |
535 | :Returns: 0 on success, negative on failure. | |
9c1b96e3 | 536 | |
1c1a9ce9 | 537 | Queues a hardware interrupt vector to be injected. |
9c1b96e3 | 538 | |
106ee47d MCC |
539 | :: |
540 | ||
541 | /* for KVM_INTERRUPT */ | |
542 | struct kvm_interrupt { | |
9c1b96e3 AK |
543 | /* in */ |
544 | __u32 irq; | |
106ee47d | 545 | }; |
9c1b96e3 | 546 | |
6f7a2bd4 | 547 | X86: |
106ee47d MCC |
548 | ^^^^ |
549 | ||
550 | :Returns: | |
6f7a2bd4 | 551 | |
106ee47d MCC |
552 | ========= =================================== |
553 | 0 on success, | |
554 | -EEXIST if an interrupt is already enqueued | |
3747c5d3 | 555 | -EINVAL the irq number is invalid |
106ee47d MCC |
556 | -ENXIO if the PIC is in the kernel |
557 | -EFAULT if the pointer is invalid | |
558 | ========= =================================== | |
1c1a9ce9 SR |
559 | |
560 | Note 'irq' is an interrupt vector, not an interrupt pin or line. This | |
561 | ioctl is useful if the in-kernel PIC is not used. | |
9c1b96e3 | 562 | |
6f7a2bd4 | 563 | PPC: |
106ee47d | 564 | ^^^^ |
6f7a2bd4 AG |
565 | |
566 | Queues an external interrupt to be injected. This ioctl is overloaded | |
567 | with 3 different irq values: | |
568 | ||
569 | a) KVM_INTERRUPT_SET | |
570 | ||
106ee47d MCC |
571 | This injects an edge type external interrupt into the guest once it's ready |
572 | to receive interrupts. When injected, the interrupt is done. | |
6f7a2bd4 AG |
573 | |
574 | b) KVM_INTERRUPT_UNSET | |
575 | ||
106ee47d | 576 | This unsets any pending interrupt. |
6f7a2bd4 | 577 | |
106ee47d | 578 | Only available with KVM_CAP_PPC_UNSET_IRQ. |
6f7a2bd4 AG |
579 | |
580 | c) KVM_INTERRUPT_SET_LEVEL | |
581 | ||
106ee47d MCC |
582 | This injects a level type external interrupt into the guest context. The |
583 | interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET | |
584 | is triggered. | |
6f7a2bd4 | 585 | |
106ee47d | 586 | Only available with KVM_CAP_PPC_IRQ_LEVEL. |
6f7a2bd4 AG |
587 | |
588 | Note that any value for 'irq' other than the ones stated above is invalid | |
589 | and incurs unexpected behavior. | |
590 | ||
5e124900 SC |
591 | This is an asynchronous vcpu ioctl and can be invoked from any thread. |
592 | ||
c2d2c21b | 593 | MIPS: |
106ee47d | 594 | ^^^^^ |
c2d2c21b JH |
595 | |
596 | Queues an external interrupt to be injected into the virtual CPU. A negative | |
597 | interrupt number dequeues the interrupt. | |
598 | ||
5e124900 SC |
599 | This is an asynchronous vcpu ioctl and can be invoked from any thread. |
600 | ||
da40d858 AP |
601 | RISC-V: |
602 | ^^^^^^^ | |
603 | ||
604 | Queues an external interrupt to be injected into the virtual CPU. This ioctl | |
605 | is overloaded with 2 different irq values: | |
606 | ||
607 | a) KVM_INTERRUPT_SET | |
608 | ||
609 | This sets an external interrupt for a virtual CPU, which will receive | |
610 | it once it is ready. | |
611 | ||
612 | b) KVM_INTERRUPT_UNSET | |
613 | ||
614 | This clears the pending external interrupt for a virtual CPU. | |
615 | ||
616 | This is an asynchronous vcpu ioctl and can be invoked from any thread. | |
617 | ||
414fa985 | 618 | |
68ba6974 | 619 | 4.17 KVM_DEBUG_GUEST |
106ee47d | 620 | -------------------- |
9c1b96e3 | 621 | |
106ee47d MCC |
622 | :Capability: basic |
623 | :Architectures: none | |
624 | :Type: vcpu ioctl | |
625 | :Parameters: none | |
626 | :Returns: -1 on error | |
9c1b96e3 AK |
627 | |
628 | Support for this has been removed. Use KVM_SET_GUEST_DEBUG instead. | |
629 | ||
414fa985 | 630 | |
68ba6974 | 631 | 4.18 KVM_GET_MSRS |
106ee47d | 632 | ----------------- |
9c1b96e3 | 633 | |
106ee47d MCC |
634 | :Capability: basic (vcpu), KVM_CAP_GET_MSR_FEATURES (system) |
635 | :Architectures: x86 | |
636 | :Type: system ioctl, vcpu ioctl | |
637 | :Parameters: struct kvm_msrs (in/out) | |
638 | :Returns: number of msrs successfully returned; | |
639 | -1 on error | |
801e459a TL |
640 | |
641 | When used as a system ioctl: | |
642 | Reads the values of MSR-based features that are available for the VM. This | |
643 | is similar to KVM_GET_SUPPORTED_CPUID, but it returns MSR indices and values. | |
644 | The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST | |
645 | in a system ioctl. | |
9c1b96e3 | 646 | |
801e459a | 647 | When used as a vcpu ioctl: |
9c1b96e3 | 648 | Reads model-specific registers from the vcpu. Supported msr indices can |
801e459a | 649 | be obtained using KVM_GET_MSR_INDEX_LIST in a system ioctl. |
9c1b96e3 | 650 | |
106ee47d MCC |
651 | :: |
652 | ||
653 | struct kvm_msrs { | |
9c1b96e3 AK |
654 | __u32 nmsrs; /* number of msrs in entries */ |
655 | __u32 pad; | |
656 | ||
657 | struct kvm_msr_entry entries[0]; | |
106ee47d | 658 | }; |
9c1b96e3 | 659 | |
106ee47d | 660 | struct kvm_msr_entry { |
9c1b96e3 AK |
661 | __u32 index; |
662 | __u32 reserved; | |
663 | __u64 data; | |
106ee47d | 664 | }; |
9c1b96e3 AK |
665 | |
666 | Application code should set the 'nmsrs' member (which indicates the | |
667 | size of the entries array) and the 'index' member of each array entry. | |
668 | kvm will fill in the 'data' member. | |
669 | ||
414fa985 | 670 | |
68ba6974 | 671 | 4.19 KVM_SET_MSRS |
106ee47d | 672 | ----------------- |
9c1b96e3 | 673 | |
106ee47d MCC |
674 | :Capability: basic |
675 | :Architectures: x86 | |
676 | :Type: vcpu ioctl | |
677 | :Parameters: struct kvm_msrs (in) | |
678 | :Returns: number of msrs successfully set (see below), -1 on error | |
9c1b96e3 AK |
679 | |
680 | Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the | |
681 | data structures. | |
682 | ||
683 | Application code should set the 'nmsrs' member (which indicates the | |
684 | size of the entries array), and the 'index' and 'data' members of each | |
685 | array entry. | |
686 | ||
b274a290 XL |
687 | It tries to set the MSRs in array entries[] one by one. If setting an MSR |
688 | fails, e.g., due to setting reserved bits, the MSR isn't supported/emulated | |
689 | by KVM, etc..., it stops processing the MSR list and returns the number of | |
690 | MSRs that have been set successfully. | |
691 | ||
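Since KVM_SET_MSRS stops at the first MSR it cannot set, the return value doubles as the index of the failing entry. The sketch below shows one way a caller might interpret it. The enum and the function name `classify_set_msrs` are hypothetical, and `ret` is assumed to be the raw ioctl() return value.

```c
#include <stddef.h>
#include <stdint.h>

enum msr_write_status {
    MSR_WRITE_ERROR,    /* ioctl failed outright (returned -1) */
    MSR_WRITE_PARTIAL,  /* stopped early at *failed_index */
    MSR_WRITE_COMPLETE, /* every entry was set */
};

/*
 * Interpret the KVM_SET_MSRS return value for a list of nmsrs
 * entries. On a partial write, entries[ret] is the first MSR that
 * was not set, and its index is stored in *failed_index.
 */
enum msr_write_status classify_set_msrs(int ret, uint32_t nmsrs,
                                        uint32_t *failed_index)
{
    if (ret < 0)
        return MSR_WRITE_ERROR;
    if ((uint32_t)ret < nmsrs) {
        if (failed_index != NULL)
            *failed_index = (uint32_t)ret;
        return MSR_WRITE_PARTIAL;
    }
    return MSR_WRITE_COMPLETE;
}
```

Reporting the failing index lets a VMM log which MSR was rejected instead of a generic failure.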
414fa985 | 692 | |
68ba6974 | 693 | 4.20 KVM_SET_CPUID |
106ee47d | 694 | ------------------ |
9c1b96e3 | 695 | |
106ee47d MCC |
696 | :Capability: basic |
697 | :Architectures: x86 | |
698 | :Type: vcpu ioctl | |
699 | :Parameters: struct kvm_cpuid (in) | |
700 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
701 | |
702 | Defines the vcpu responses to the cpuid instruction. Applications | |
703 | should use the KVM_SET_CPUID2 ioctl if available. | |
704 | ||
63f5a190 SC |
705 | Caveat emptor: |
706 | - If this IOCTL fails, KVM gives no guarantees that previous valid CPUID | |
707 | configuration (if there is one) is not corrupted. Userspace can get a copy | |
708 | of the resulting CPUID configuration through KVM_GET_CPUID2 if needed. | |
709 | - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model | |
710 | after running the guest, may cause guest instability. | |
711 | - Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc... | |
712 | may cause guest instability. | |
18964092 | 713 | |
106ee47d | 714 | :: |
9c1b96e3 | 715 | |
106ee47d | 716 | struct kvm_cpuid_entry { |
9c1b96e3 AK |
717 | __u32 function; |
718 | __u32 eax; | |
719 | __u32 ebx; | |
720 | __u32 ecx; | |
721 | __u32 edx; | |
722 | __u32 padding; | |
106ee47d | 723 | }; |
9c1b96e3 | 724 | |
106ee47d MCC |
725 | /* for KVM_SET_CPUID */ |
726 | struct kvm_cpuid { | |
9c1b96e3 AK |
727 | __u32 nent; |
728 | __u32 padding; | |
729 | struct kvm_cpuid_entry entries[0]; | |
106ee47d | 730 | }; |
9c1b96e3 | 731 | |
414fa985 | 732 | |
68ba6974 | 733 | 4.21 KVM_SET_SIGNAL_MASK |
106ee47d | 734 | ------------------------ |
9c1b96e3 | 735 | |
106ee47d MCC |
736 | :Capability: basic |
737 | :Architectures: all | |
738 | :Type: vcpu ioctl | |
739 | :Parameters: struct kvm_signal_mask (in) | |
740 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
741 | |
742 | Defines which signals are blocked during execution of KVM_RUN. This | |
743 | signal mask temporarily overrides the thread's signal mask. Any | |
744 | unblocked signal received (except SIGKILL and SIGSTOP, which retain | |
745 | their traditional behaviour) will cause KVM_RUN to return with -EINTR. | |
746 | ||
747 | Note the signal will only be delivered if not blocked by the original | |
748 | signal mask. | |
749 | ||
106ee47d MCC |
750 | :: |
751 | ||
752 | /* for KVM_SET_SIGNAL_MASK */ | |
753 | struct kvm_signal_mask { | |
9c1b96e3 AK |
754 | __u32 len; |
755 | __u8 sigset[0]; | |
106ee47d | 756 | }; |
9c1b96e3 | 757 | |
414fa985 | 758 | |
68ba6974 | 759 | 4.22 KVM_GET_FPU |
106ee47d | 760 | ---------------- |
9c1b96e3 | 761 | |
106ee47d MCC |
762 | :Capability: basic |
763 | :Architectures: x86 | |
764 | :Type: vcpu ioctl | |
765 | :Parameters: struct kvm_fpu (out) | |
766 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
767 | |
768 | Reads the floating point state from the vcpu. | |
769 | ||
106ee47d MCC |
770 | :: |
771 | ||
772 | /* for KVM_GET_FPU and KVM_SET_FPU */ | |
773 | struct kvm_fpu { | |
9c1b96e3 AK |
774 | __u8 fpr[8][16]; |
775 | __u16 fcw; | |
776 | __u16 fsw; | |
777 | __u8 ftwx; /* in fxsave format */ | |
778 | __u8 pad1; | |
779 | __u16 last_opcode; | |
780 | __u64 last_ip; | |
781 | __u64 last_dp; | |
782 | __u8 xmm[16][16]; | |
783 | __u32 mxcsr; | |
784 | __u32 pad2; | |
106ee47d | 785 | }; |
9c1b96e3 | 786 | |
414fa985 | 787 | |
68ba6974 | 788 | 4.23 KVM_SET_FPU |
106ee47d | 789 | ---------------- |
9c1b96e3 | 790 | |
106ee47d MCC |
791 | :Capability: basic |
792 | :Architectures: x86 | |
793 | :Type: vcpu ioctl | |
794 | :Parameters: struct kvm_fpu (in) | |
795 | :Returns: 0 on success, -1 on error | |
9c1b96e3 AK |
796 | |
797 | Writes the floating point state to the vcpu. | |
798 | ||
106ee47d MCC |
799 | :: |
800 | ||
801 | /* for KVM_GET_FPU and KVM_SET_FPU */ | |
802 | struct kvm_fpu { | |
9c1b96e3 AK |
803 | __u8 fpr[8][16]; |
804 | __u16 fcw; | |
805 | __u16 fsw; | |
806 | __u8 ftwx; /* in fxsave format */ | |
807 | __u8 pad1; | |
808 | __u16 last_opcode; | |
809 | __u64 last_ip; | |
810 | __u64 last_dp; | |
811 | __u8 xmm[16][16]; | |
812 | __u32 mxcsr; | |
813 | __u32 pad2; | |
106ee47d | 814 | }; |
9c1b96e3 | 815 | |
414fa985 | 816 | |
68ba6974 | 817 | 4.24 KVM_CREATE_IRQCHIP |
106ee47d | 818 | ----------------------- |
5dadbfd6 | 819 | |
106ee47d | 820 | :Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) |
3fbf4207 | 821 | :Architectures: x86, arm64, s390 |
106ee47d MCC |
822 | :Type: vm ioctl |
823 | :Parameters: none | |
824 | :Returns: 0 on success, -1 on error | |
5dadbfd6 | 825 | |
ac3d3735 AP |
826 | Creates an interrupt controller model in the kernel. |
827 | On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up | |
828 | future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both | |
829 | PIC and IOAPIC; GSI 16-23 only go to the IOAPIC. | |
3fbf4207 | 830 | On arm64, a GICv2 is created. Any other GIC versions require the usage of |
ac3d3735 AP |
831 | KVM_CREATE_DEVICE, which also supports creating a GICv2. Using |
832 | KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2. | |
833 | On s390, a dummy irq routing table is created. | |
84223598 CH |
834 | |
835 | Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled | |
836 | before KVM_CREATE_IRQCHIP can be used. | |
5dadbfd6 | 837 | |
414fa985 | 838 | |
68ba6974 | 839 | 4.25 KVM_IRQ_LINE |
106ee47d | 840 | ----------------- |
5dadbfd6 | 841 | |
106ee47d | 842 | :Capability: KVM_CAP_IRQCHIP |
3fbf4207 | 843 | :Architectures: x86, arm64 |
106ee47d MCC |
844 | :Type: vm ioctl |
845 | :Parameters: struct kvm_irq_level | |
846 | :Returns: 0 on success, -1 on error | |
5dadbfd6 AK |
847 | |
848 | Sets the level of a GSI input to the interrupt controller model in the kernel. | |
86ce8535 CD |
849 | On some architectures it is required that an interrupt controller model has |
850 | been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered | |
851 | interrupts require the level to be set to 1 and then back to 0. | |
852 | ||
100943c5 GS |
853 | On real hardware, interrupt pins can be active-low or active-high. This |
854 | does not matter for the level field of struct kvm_irq_level: 1 always | |
855 | means active (asserted), 0 means inactive (deasserted). | |
856 | ||
857 | x86 allows the operating system to program the interrupt polarity | |
858 | (active-low/active-high) for level-triggered interrupts, and KVM used | |
859 | to consider the polarity. However, due to bitrot in the handling of | |
860 | active-low interrupts, the above convention is now valid on x86 too. | |
861 | This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED. Userspace | |
862 | should not present interrupts to the guest as active-low unless this | |
863 | capability is present (or unless it is not using the in-kernel irqchip, | |
864 | of course). | |
865 | ||
866 | ||
3fbf4207 | 867 | arm64 can signal an interrupt either at the CPU level, or at the |
379e04c7 MZ |
868 | in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to |
869 | use PPIs designated for specific cpus. The irq field is interpreted | |
106ee47d | 870 | like this:: |
86ce8535 | 871 | |
3b1c8c56 | 872 | bits: | 31 ... 28 | 27 ... 24 | 23 ... 16 | 15 ... 0 | |
92f35b75 | 873 | field: | vcpu2_index | irq_type | vcpu_index | irq_id | |
86ce8535 CD |
874 | |
875 | The irq_type field has the following values: | |
106ee47d MCC |
876 | |
877 | - irq_type[0]: | |
878 | out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ | |
879 | - irq_type[1]: | |
880 | in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.) | |
86ce8535 | 881 | (the vcpu_index field is ignored) |
106ee47d MCC |
882 | - irq_type[2]: |
883 | in-kernel GIC: PPI, irq_id between 16 and 31 (incl.) | |
86ce8535 CD |
884 | |
885 | (The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs) | |
886 | ||
100943c5 | 887 | In both cases, level is used to assert/deassert the line. |
5dadbfd6 | 888 | |
92f35b75 MZ |
889 | When KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 is supported, the target vcpu is |
890 | identified as (256 * vcpu2_index + vcpu_index). Otherwise, vcpu2_index | |
891 | must be zero. | |
892 | ||
3fbf4207 | 893 | Note that on arm64, the KVM_CAP_IRQCHIP capability only conditions |
92f35b75 MZ |
894 | injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always |
895 | be used for a userspace interrupt controller. | |
896 | ||
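The arm64 layout of the irq field described above can be packed as in the
following sketch. The function name and the EX_* constants are illustrative
assumptions made for this example; real code should prefer the definitions
exported by the kernel headers.

```c
#include <stdint.h>

/* Illustrative names for the irq_type values listed above. */
enum {
    EX_IRQ_TYPE_CPU = 0,    /* out-of-kernel GIC: irq_id 0 = IRQ, 1 = FIQ */
    EX_IRQ_TYPE_SPI = 1,    /* in-kernel GIC: SPI, irq_id 32..1019 */
    EX_IRQ_TYPE_PPI = 2,    /* in-kernel GIC: PPI, irq_id 16..31 */
};

/*
 * Build the value for kvm_irq_level.irq on arm64.  "vcpu" is the target
 * vcpu number; values of 256 and above are only meaningful when
 * KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 is supported, since they make
 * vcpu2_index non-zero.
 */
static uint32_t arm64_irq_field(uint32_t irq_type, uint32_t vcpu,
                                uint32_t irq_id)
{
    uint32_t vcpu_index  = vcpu % 256;  /* bits 23..16 */
    uint32_t vcpu2_index = vcpu / 256;  /* bits 31..28 */

    return ((vcpu2_index & 0xf) << 28) |
           ((irq_type & 0xf) << 24) |
           (vcpu_index << 16) |
           (irq_id & 0xffff);
}
```

For example, PPI 27 on vcpu 300 splits the vcpu number into vcpu2_index 1
and vcpu_index 44, since 300 = 256 * 1 + 44.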
106ee47d MCC |
897 | :: |
898 | ||
899 | struct kvm_irq_level { | |
5dadbfd6 AK |
900 | union { |
901 | __u32 irq; /* GSI */ | |
902 | __s32 status; /* not used for KVM_IRQ_LINE */ | |
903 | }; | |
904 | __u32 level; /* 0 or 1 */ | |
106ee47d | 905 | }; |
5dadbfd6 | 906 | |
414fa985 | 907 | |
68ba6974 | 908 | 4.26 KVM_GET_IRQCHIP |
106ee47d | 909 | -------------------- |
5dadbfd6 | 910 | |
106ee47d MCC |
911 | :Capability: KVM_CAP_IRQCHIP |
912 | :Architectures: x86 | |
913 | :Type: vm ioctl | |
914 | :Parameters: struct kvm_irqchip (in/out) | |
915 | :Returns: 0 on success, -1 on error | |
5dadbfd6 AK |
916 | |
917 | Reads the state of a kernel interrupt controller created with | |
918 | KVM_CREATE_IRQCHIP into a buffer provided by the caller. | |
919 | ||
106ee47d MCC |
920 | :: |
921 | ||
922 | struct kvm_irqchip { | |
5dadbfd6 AK |
923 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ |
924 | __u32 pad; | |
925 | union { | |
926 | char dummy[512]; /* reserving space */ | |
927 | struct kvm_pic_state pic; | |
928 | struct kvm_ioapic_state ioapic; | |
929 | } chip; | |
106ee47d | 930 | }; |
5dadbfd6 | 931 | |
414fa985 | 932 | |
68ba6974 | 933 | 4.27 KVM_SET_IRQCHIP |
106ee47d | 934 | -------------------- |
5dadbfd6 | 935 | |
106ee47d MCC |
936 | :Capability: KVM_CAP_IRQCHIP |
937 | :Architectures: x86 | |
938 | :Type: vm ioctl | |
939 | :Parameters: struct kvm_irqchip (in) | |
940 | :Returns: 0 on success, -1 on error | |
5dadbfd6 AK |
941 | |
942 | Sets the state of a kernel interrupt controller created with | |
943 | KVM_CREATE_IRQCHIP from a buffer provided by the caller. | |
944 | ||
106ee47d MCC |
945 | :: |
946 | ||
947 | struct kvm_irqchip { | |
5dadbfd6 AK |
948 | __u32 chip_id; /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */ |
949 | __u32 pad; | |
950 | union { | |
951 | char dummy[512]; /* reserving space */ | |
952 | struct kvm_pic_state pic; | |
953 | struct kvm_ioapic_state ioapic; | |
954 | } chip; | |
106ee47d | 955 | }; |
5dadbfd6 | 956 | |
414fa985 | 957 | |
68ba6974 | 958 | 4.28 KVM_XEN_HVM_CONFIG |
106ee47d | 959 | ----------------------- |
ffde22ac | 960 | |
106ee47d MCC |
961 | :Capability: KVM_CAP_XEN_HVM |
962 | :Architectures: x86 | |
963 | :Type: vm ioctl | |
964 | :Parameters: struct kvm_xen_hvm_config (in) | |
965 | :Returns: 0 on success, -1 on error | |
ffde22ac ES |
966 | |
967 | Sets the MSR that the Xen HVM guest uses to initialize its hypercall | |
968 | page, and provides the starting address and size of the hypercall | |
969 | blobs in userspace. When the guest writes the MSR, kvm copies one | |
970 | page of a blob (32- or 64-bit, depending on the vcpu mode) to guest | |
971 | memory. | |
972 | ||
106ee47d MCC |
973 | :: |
974 | ||
975 | struct kvm_xen_hvm_config { | |
ffde22ac ES |
976 | __u32 flags; |
977 | __u32 msr; | |
978 | __u64 blob_addr_32; | |
979 | __u64 blob_addr_64; | |
980 | __u8 blob_size_32; | |
981 | __u8 blob_size_64; | |
982 | __u8 pad2[30]; | |
106ee47d | 983 | }; |
ffde22ac | 984 | |
661a20fa DW |
985 | If certain flags are returned from the KVM_CAP_XEN_HVM check, they may |
986 | be set in the flags field of this ioctl: | |
987 | ||
988 | The KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL flag requests KVM to generate | |
989 | the contents of the hypercall page automatically; hypercalls will be | |
990 | intercepted and passed to userspace through KVM_EXIT_XEN. In this | |
991 | case, all of the blob size and address fields must be zero. | |
992 | ||
993 | The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates to KVM that userspace | |
994 | will always use the KVM_XEN_HVM_EVTCHN_SEND ioctl to deliver event | |
995 | channel interrupts rather than manipulating the guest's shared_info | |
996 | structures directly. This, in turn, may allow KVM to enable features | |
997 | such as intercepting the SCHEDOP_poll hypercall to accelerate PV | |
998 | spinlock operation for the guest. Userspace may still use the ioctl | |
999 | to deliver events if it was advertised, even if userspace does not | |
1000 | send this indication that it will always do so. | |
e1f68169 DW |
1001 | |
1002 | No other flags are currently valid in the struct kvm_xen_hvm_config. | |
414fa985 | 1003 | |
68ba6974 | 1004 | 4.29 KVM_GET_CLOCK |
106ee47d | 1005 | ------------------ |
afbcf7ab | 1006 | |
106ee47d MCC |
1007 | :Capability: KVM_CAP_ADJUST_CLOCK |
1008 | :Architectures: x86 | |
1009 | :Type: vm ioctl | |
1010 | :Parameters: struct kvm_clock_data (out) | |
1011 | :Returns: 0 on success, -1 on error | |
afbcf7ab GC |
1012 | |
1013 | Gets the current timestamp of kvmclock as seen by the current guest. In | |
1014 | conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios | |
1015 | such as migration. | |
1016 | ||
e3fd9a93 PB |
1017 | When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the |
1018 | set of bits that KVM can return in struct kvm_clock_data's flags member. | |
1019 | ||
c68dc1b5 OU |
1020 | The following flags are defined: |
1021 | ||
1022 | KVM_CLOCK_TSC_STABLE | |
1023 | If set, the returned value is the exact kvmclock | |
1024 | value seen by all VCPUs at the instant when KVM_GET_CLOCK was called. | |
1025 | If clear, the returned value is simply CLOCK_MONOTONIC plus a constant | |
1026 | offset; the offset can be modified with KVM_SET_CLOCK. KVM will try | |
1027 | to make all VCPUs follow this clock, but the exact value read by each | |
1028 | VCPU could differ, because the host TSC is not stable. | |
1029 | ||
1030 | KVM_CLOCK_REALTIME | |
1031 | If set, the `realtime` field in the kvm_clock_data | |
1032 | structure is populated with the value of the host's real time | |
1033 | clocksource at the instant when KVM_GET_CLOCK was called. If clear, | |
1034 | the `realtime` field does not contain a value. | |
1035 | ||
1036 | KVM_CLOCK_HOST_TSC | |
1037 | If set, the `host_tsc` field in the kvm_clock_data | |
1038 | structure is populated with the value of the host's timestamp counter (TSC) | |
1039 | at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field | |
1040 | does not contain a value. | |
e3fd9a93 | 1041 | |
106ee47d MCC |
1042 | :: |
1043 | ||
1044 | struct kvm_clock_data { | |
afbcf7ab GC |
1045 | __u64 clock; /* kvmclock current value */ |
1046 | __u32 flags; | |
c68dc1b5 OU |
1047 | __u32 pad0; |
1048 | __u64 realtime; | |
1049 | __u64 host_tsc; | |
1050 | __u32 pad[4]; | |
106ee47d | 1051 | }; |
afbcf7ab | 1052 | |
414fa985 | 1053 | |
68ba6974 | 1054 | 4.30 KVM_SET_CLOCK |
106ee47d | 1055 | ------------------ |
afbcf7ab | 1056 | |
106ee47d MCC |
1057 | :Capability: KVM_CAP_ADJUST_CLOCK |
1058 | :Architectures: x86 | |
1059 | :Type: vm ioctl | |
1060 | :Parameters: struct kvm_clock_data (in) | |
1061 | :Returns: 0 on success, -1 on error | |
afbcf7ab | 1062 | |
2044892d | 1063 | Sets the current timestamp of kvmclock to the value specified in its parameter. |
afbcf7ab GC |
1064 | In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
1065 | such as migration. | |
1066 | ||
c68dc1b5 OU |
1067 | The following flags can be passed: |
1068 | ||
1069 | KVM_CLOCK_REALTIME | |
1070 | If set, KVM will compare the value of the `realtime` field | |
1071 | with the value of the host's real time clocksource at the instant when | |
1072 | KVM_SET_CLOCK was called. The difference in elapsed time is added to the final | |
1073 | kvmclock value that will be provided to guests. | |
1074 | ||
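The effect of KVM_CLOCK_REALTIME can be pictured with the following
arithmetic sketch. It is not the kernel implementation: the function name is
hypothetical, all values are assumed to be in nanoseconds, and the clamping
of a negative difference to zero is an assumption of this example.

```c
#include <stdint.h>

/*
 * Hypothetical model of the adjustment described above.
 *
 * saved_clock:    kvm_clock_data.clock as saved by KVM_GET_CLOCK
 * saved_realtime: kvm_clock_data.realtime as saved by KVM_GET_CLOCK
 * now_realtime:   host real time when KVM_SET_CLOCK is called
 */
static uint64_t kvmclock_after_restore(uint64_t saved_clock,
                                       uint64_t saved_realtime,
                                       uint64_t now_realtime)
{
    /* Time that elapsed between save and restore is added on top. */
    if (now_realtime > saved_realtime)
        return saved_clock + (now_realtime - saved_realtime);

    /* Assumption: never step the guest clock backwards. */
    return saved_clock;
}
```

With a saved clock of 1000 and 250 units of elapsed real time, the guest
would observe 1250 after the restore in this model.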
1075 | Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored. | |
1076 | ||
106ee47d MCC |
1077 | :: |
1078 | ||
1079 | struct kvm_clock_data { | |
afbcf7ab GC |
1080 | __u64 clock; /* kvmclock current value */ |
1081 | __u32 flags; | |
c68dc1b5 OU |
1082 | __u32 pad0; |
1083 | __u64 realtime; | |
1084 | __u64 host_tsc; | |
1085 | __u32 pad[4]; | |
106ee47d | 1086 | }; |
afbcf7ab | 1087 | |
414fa985 | 1088 | |
68ba6974 | 1089 | 4.31 KVM_GET_VCPU_EVENTS |
106ee47d | 1090 | ------------------------ |
3cfc3092 | 1091 | |
106ee47d MCC |
1092 | :Capability: KVM_CAP_VCPU_EVENTS |
1093 | :Extended by: KVM_CAP_INTR_SHADOW | |
3fbf4207 | 1094 | :Architectures: x86, arm64 |
106ee47d MCC |
1095 | :Type: vcpu ioctl |
1096 | :Parameters: struct kvm_vcpu_events (out) | |
1097 | :Returns: 0 on success, -1 on error | |
3cfc3092 | 1098 | |
b7b27fac | 1099 | X86: |
106ee47d | 1100 | ^^^^ |
b7b27fac | 1101 | |
3cfc3092 JK |
1102 | Gets currently pending exceptions, interrupts, and NMIs as well as related |
1103 | states of the vcpu. | |
1104 | ||
106ee47d MCC |
1105 | :: |
1106 | ||
1107 | struct kvm_vcpu_events { | |
3cfc3092 JK |
1108 | struct { |
1109 | __u8 injected; | |
1110 | __u8 nr; | |
1111 | __u8 has_error_code; | |
59073aaf | 1112 | __u8 pending; |
3cfc3092 JK |
1113 | __u32 error_code; |
1114 | } exception; | |
1115 | struct { | |
1116 | __u8 injected; | |
1117 | __u8 nr; | |
1118 | __u8 soft; | |
48005f64 | 1119 | __u8 shadow; |
3cfc3092 JK |
1120 | } interrupt; |
1121 | struct { | |
1122 | __u8 injected; | |
1123 | __u8 pending; | |
1124 | __u8 masked; | |
1125 | __u8 pad; | |
1126 | } nmi; | |
1127 | __u32 sipi_vector; | |
dab4b911 | 1128 | __u32 flags; |
f077825a PB |
1129 | struct { |
1130 | __u8 smm; | |
1131 | __u8 pending; | |
1132 | __u8 smm_inside_nmi; | |
1133 | __u8 latched_init; | |
1134 | } smi; | |
59073aaf JM |
1135 | __u8 reserved[27]; |
1136 | __u8 exception_has_payload; | |
1137 | __u64 exception_payload; | |
106ee47d | 1138 | }; |
3cfc3092 | 1139 | |
59073aaf | 1140 | The following bits are defined in the flags field: |
f077825a | 1141 | |
59073aaf | 1142 | - KVM_VCPUEVENT_VALID_SHADOW may be set to signal that |
f077825a | 1143 | interrupt.shadow contains a valid state. |
48005f64 | 1144 | |
59073aaf JM |
1145 | - KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a |
1146 | valid state. | |
1147 | ||
1148 | - KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the | |
1149 | exception_has_payload, exception_payload, and exception.pending | |
1150 | fields contain a valid state. This bit will be set whenever | |
1151 | KVM_CAP_EXCEPTION_PAYLOAD is enabled. | |
414fa985 | 1152 | |
3fbf4207 OU |
1153 | ARM64: |
1154 | ^^^^^^ | |
b7b27fac DG |
1155 | |
1156 | If the guest accesses a device that is being emulated by the host kernel in | |
1157 | such a way that a real device would generate a physical SError, KVM may make | |
1158 | a virtual SError pending for that VCPU. This system error interrupt remains | |
1159 | pending until the guest takes the exception by unmasking PSTATE.A. | |
1160 | ||
1161 | Running the VCPU may cause it to take a pending SError, or make an access that | |
1162 | causes an SError to become pending. The event's description is only valid while | |
1163 | the VCPU is not running. | |
1164 | ||
1165 | This API provides a way to read and write the pending 'event' state that is not | |
1166 | visible to the guest. To save, restore or migrate a VCPU the struct representing | |
1167 | the state can be read then written using this GET/SET API, along with the other | |
1168 | guest-visible registers. It is not possible to 'cancel' an SError that has been | |
1169 | made pending. | |
1170 | ||
1171 | A device being emulated in user-space may also wish to generate an SError. To do | |
1172 | this the events structure can be populated by user-space. The current state | |
1173 | should be read first, to ensure no existing SError is pending. If an existing | |
1174 | SError is pending, the architecture's 'Multiple SError interrupts' rules should | |
1175 | be followed. (2.5.3 of DDI0587.a "ARM Reliability, Availability, and | |
1176 | Serviceability (RAS) Specification"). | |
1177 | ||
be26b3a7 DG |
1178 | SError exceptions always have an ESR value. Some CPUs have the ability to |
1179 | specify what the virtual SError's ESR value should be. These systems will | |
688e0581 | 1180 | advertise KVM_CAP_ARM_INJECT_SERROR_ESR. In this case exception.serror_has_esr will
be26b3a7 DG |
1181 | always have a non-zero value when read, and the agent making an SError pending |
1182 | should specify the ISS field in the lower 24 bits of exception.serror_esr. If | |
688e0581 | 1183 | the system supports KVM_CAP_ARM_INJECT_SERROR_ESR, but user-space sets the events |
be26b3a7 DG |
1184 | with exception.serror_has_esr as zero, KVM will choose an ESR.
1185 | ||
1186 | Specifying exception.serror_has_esr on a system that does not support it will | |
1187 | return -EINVAL. Setting anything other than the lower 24 bits of exception.serror_esr | |
1188 | will return -EINVAL. | |
1189 | ||
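Userspace can check these constraints before calling KVM_SET_VCPU_EVENTS.
The following is a minimal sketch under stated assumptions: the helper name
and the EX_ESR_ISS_MASK constant are invented for this example, and the
capability state is passed in as a boolean that the caller is assumed to
have obtained from KVM_CHECK_EXTENSION.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mask covering the lower 24 bits (the ISS field). */
#define EX_ESR_ISS_MASK UINT64_C(0xffffff)

/*
 * Returns true if the (serror_has_esr, serror_esr) pair is expected to
 * be accepted according to the rules above, false if -EINVAL is the
 * documented outcome.
 */
static bool serror_esr_acceptable(bool cap_inject_serror_esr,
                                  bool serror_has_esr,
                                  uint64_t serror_esr)
{
    if (!serror_has_esr)
        return true;    /* KVM chooses an ESR itself */
    if (!cap_inject_serror_esr)
        return false;   /* specifying an ESR is not supported */
    return (serror_esr & ~EX_ESR_ISS_MASK) == 0;
}
```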
da345174 CD |
1190 | It is not possible to read back a pending external abort (injected via |
1191 | KVM_SET_VCPU_EVENTS or otherwise) because such an exception is always delivered | |
1192 | directly to the virtual CPU. | |
1193 | ||
106ee47d | 1194 | :: |
da345174 | 1195 | |
106ee47d | 1196 | struct kvm_vcpu_events { |
b7b27fac DG |
1197 | struct { |
1198 | __u8 serror_pending; | |
1199 | __u8 serror_has_esr; | |
da345174 | 1200 | __u8 ext_dabt_pending; |
b7b27fac | 1201 | /* Align it to 8 bytes */ |
da345174 | 1202 | __u8 pad[5]; |
b7b27fac DG |
1203 | __u64 serror_esr; |
1204 | } exception; | |
1205 | __u32 reserved[12]; | |
106ee47d | 1206 | }; |
b7b27fac | 1207 | |
68ba6974 | 1208 | 4.32 KVM_SET_VCPU_EVENTS |
106ee47d | 1209 | ------------------------ |
3cfc3092 | 1210 | |
106ee47d MCC |
1211 | :Capability: KVM_CAP_VCPU_EVENTS |
1212 | :Extended by: KVM_CAP_INTR_SHADOW | |
3fbf4207 | 1213 | :Architectures: x86, arm64 |
106ee47d MCC |
1214 | :Type: vcpu ioctl |
1215 | :Parameters: struct kvm_vcpu_events (in) | |
1216 | :Returns: 0 on success, -1 on error | |
3cfc3092 | 1217 | |
b7b27fac | 1218 | X86: |
106ee47d | 1219 | ^^^^ |
b7b27fac | 1220 | |
3cfc3092 JK |
1221 | Set pending exceptions, interrupts, and NMIs as well as related states of the |
1222 | vcpu. | |
1223 | ||
1224 | See KVM_GET_VCPU_EVENTS for the data structure. | |
1225 | ||
dab4b911 | 1226 | Fields that may be modified asynchronously by running VCPUs can be excluded |
f077825a PB |
1227 | from the update. These fields are nmi.pending, sipi_vector, smi.smm, |
1228 | smi.pending. Keep the corresponding bits in the flags field cleared to | |
1229 | suppress overwriting the current in-kernel state. The bits are: | |
dab4b911 | 1230 | |
106ee47d MCC |
1231 | =============================== ================================== |
1232 | KVM_VCPUEVENT_VALID_NMI_PENDING transfer nmi.pending to the kernel | |
1233 | KVM_VCPUEVENT_VALID_SIPI_VECTOR transfer sipi_vector | |
1234 | KVM_VCPUEVENT_VALID_SMM transfer the smi sub-struct. | |
1235 | =============================== ================================== | |
dab4b911 | 1236 | |
48005f64 JK |
1237 | If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in |
1238 | the flags field to signal that interrupt.shadow contains a valid state and | |
1239 | shall be written into the VCPU. | |
1240 | ||
f077825a PB |
1241 | KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available. |
1242 | ||
59073aaf JM |
1243 | If KVM_CAP_EXCEPTION_PAYLOAD is enabled, KVM_VCPUEVENT_VALID_PAYLOAD |
1244 | can be set in the flags field to signal that the | |
1245 | exception_has_payload, exception_payload, and exception.pending fields | |
1246 | contain a valid state and shall be written into the VCPU. | |
1247 | ||
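The rules above for the x86 flags field can be combined as in the following
sketch. The EX_* values are illustrative copies of the flag bits and the
function is hypothetical; real code should use the KVM_VCPUEVENT_VALID_*
definitions from the kernel headers. The capability booleans are assumed to
come from KVM_CHECK_EXTENSION or KVM_ENABLE_CAP as appropriate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the KVM_VCPUEVENT_VALID_* bits. */
#define EX_VALID_NMI_PENDING 0x01u
#define EX_VALID_SIPI_VECTOR 0x02u
#define EX_VALID_SHADOW      0x04u
#define EX_VALID_SMM         0x08u
#define EX_VALID_PAYLOAD     0x10u

/*
 * overwrite_async: also transfer the fields a running VCPU may change
 * asynchronously (nmi.pending, sipi_vector, and the smi sub-struct).
 */
static uint32_t vcpu_events_flags(bool overwrite_async,
                                  bool cap_intr_shadow,
                                  bool cap_x86_smm,
                                  bool payload_enabled)
{
    uint32_t flags = 0;

    if (overwrite_async)
        flags |= EX_VALID_NMI_PENDING | EX_VALID_SIPI_VECTOR;
    if (overwrite_async && cap_x86_smm)
        flags |= EX_VALID_SMM;      /* only valid with KVM_CAP_X86_SMM */
    if (cap_intr_shadow)
        flags |= EX_VALID_SHADOW;   /* interrupt.shadow is valid */
    if (payload_enabled)
        flags |= EX_VALID_PAYLOAD;  /* payload fields are valid */

    return flags;
}
```

Leaving overwrite_async false keeps the current in-kernel values of the
asynchronously modified fields, which is what the text above recommends
when those fields should not be overwritten.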
3fbf4207 OU |
1248 | ARM64: |
1249 | ^^^^^^ | |
b7b27fac | 1250 | |
da345174 CD |
1251 | User space may need to inject several types of events to the guest. |
1252 | ||
b7b27fac DG |
1253 | Set the pending SError exception state for this VCPU. It is not possible to |
1254 | 'cancel' an SError that has been made pending. | |
1255 | ||
da345174 CD |
1256 | If the guest performed an access to I/O memory which could not be handled by |
1257 | userspace, for example because of missing instruction syndrome decode | |
1258 | information or because there is no device mapped at the accessed IPA, then | |
1259 | userspace can ask the kernel to inject an external abort using the address | |
1260 | from the exiting fault on the VCPU. It is a programming error to set | |
1261 | ext_dabt_pending after an exit which was not either KVM_EXIT_MMIO or | |
1262 | KVM_EXIT_ARM_NISV. This feature is only available if the system supports | |
1263 | KVM_CAP_ARM_INJECT_EXT_DABT. This is a helper which provides commonality in | |
1264 | how userspace reports accesses for the above cases to guests, across different | |
1265 | userspace implementations. Nevertheless, userspace can still emulate all Arm | |
1266 | exceptions by manipulating individual registers using the KVM_SET_ONE_REG API. | |
1267 | ||
b7b27fac DG |
1268 | See KVM_GET_VCPU_EVENTS for the data structure. |
1269 | ||
414fa985 | 1270 | |
68ba6974 | 1271 | 4.33 KVM_GET_DEBUGREGS |
106ee47d | 1272 | ---------------------- |
a1efbe77 | 1273 | |
106ee47d MCC |
1274 | :Capability: KVM_CAP_DEBUGREGS |
1275 | :Architectures: x86 | |
1276 | :Type: vcpu ioctl | |
1277 | :Parameters: struct kvm_debugregs (out) | |
1278 | :Returns: 0 on success, -1 on error | |
a1efbe77 JK |
1279 | |
1280 | Reads debug registers from the vcpu. | |
1281 | ||
106ee47d MCC |
1282 | :: |
1283 | ||
1284 | struct kvm_debugregs { | |
a1efbe77 JK |
1285 | __u64 db[4]; |
1286 | __u64 dr6; | |
1287 | __u64 dr7; | |
1288 | __u64 flags; | |
1289 | __u64 reserved[9]; | |
106ee47d | 1290 | }; |
a1efbe77 | 1291 | |
414fa985 | 1292 | |
68ba6974 | 1293 | 4.34 KVM_SET_DEBUGREGS |
106ee47d | 1294 | ---------------------- |
a1efbe77 | 1295 | |
106ee47d MCC |
1296 | :Capability: KVM_CAP_DEBUGREGS |
1297 | :Architectures: x86 | |
1298 | :Type: vcpu ioctl | |
1299 | :Parameters: struct kvm_debugregs (in) | |
1300 | :Returns: 0 on success, -1 on error | |
a1efbe77 JK |
1301 | |
1302 | Writes debug registers into the vcpu. | |
1303 | ||
1304 | See KVM_GET_DEBUGREGS for the data structure. The flags field is not yet | |
1305 | used and must be cleared on entry. | |
1306 | ||
414fa985 | 1307 | |
68ba6974 | 1308 | 4.35 KVM_SET_USER_MEMORY_REGION |
106ee47d MCC |
1309 | ------------------------------- |
1310 | ||
1311 | :Capability: KVM_CAP_USER_MEMORY | |
1312 | :Architectures: all | |
1313 | :Type: vm ioctl | |
1314 | :Parameters: struct kvm_userspace_memory_region (in) | |
1315 | :Returns: 0 on success, -1 on error | |
0f2d8f4d | 1316 | |
106ee47d | 1317 | :: |
0f2d8f4d | 1318 | |
106ee47d | 1319 | struct kvm_userspace_memory_region { |
0f2d8f4d AK |
1320 | __u32 slot; |
1321 | __u32 flags; | |
1322 | __u64 guest_phys_addr; | |
1323 | __u64 memory_size; /* bytes */ | |
1324 | __u64 userspace_addr; /* start of the userspace allocated memory */ | |
106ee47d | 1325 | }; |
0f2d8f4d | 1326 | |
106ee47d MCC |
1327 | /* for kvm_userspace_memory_region::flags */
1328 | #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) | |
1329 | #define KVM_MEM_READONLY (1UL << 1) | |
0f2d8f4d | 1330 | |
e2788c4a PB |
1331 | This ioctl allows the user to create, modify or delete a guest physical |
1332 | memory slot. Bits 0-15 of "slot" specify the slot id and this value | |
1333 | should be less than the maximum number of user memory slots supported per | |
c110ae57 PB |
1334 | VM. The maximum allowed slots can be queried using KVM_CAP_NR_MEMSLOTS. |
1335 | Slots may not overlap in guest physical address space. | |
0f2d8f4d | 1336 | |
f481b069 PB |
1337 | If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot" |
1338 | specify the address space which is being modified. They must be | |
1339 | less than the value that KVM_CHECK_EXTENSION returns for the | |
1340 | KVM_CAP_MULTI_ADDRESS_SPACE capability. Slots in separate address spaces | |
1341 | are unrelated; the restriction on overlapping slots only applies within | |
1342 | each address space. | |
1343 | ||
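The split of the "slot" field into a slot id and an address space can be
expressed as in this small sketch. The helper name is hypothetical; the
caller is assumed to have checked that both values are within the limits
reported by KVM_CAP_NR_MEMSLOTS and KVM_CAP_MULTI_ADDRESS_SPACE.

```c
#include <stdint.h>

/*
 * Compose kvm_userspace_memory_region.slot:
 *   bits  0-15: slot id
 *   bits 16-31: address space (only with KVM_CAP_MULTI_ADDRESS_SPACE)
 */
static uint32_t memslot_field(uint16_t address_space, uint16_t slot_id)
{
    return ((uint32_t)address_space << 16) | slot_id;
}
```

Slot 3 in address space 1 is therefore encoded as 0x00010003, while slot 3
in address space 0 is simply 3.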
e2788c4a PB |
1344 | Deleting a slot is done by passing zero for memory_size. When changing |
1345 | an existing slot, it may be moved in the guest physical memory space, | |
1346 | or its flags may be modified, but it may not be resized. | |
1347 | ||
0f2d8f4d AK |
1348 | Memory for the region is taken starting at the address denoted by the |
1349 | field userspace_addr, which must point at user addressable memory for | |
1350 | the entire memory slot size. Any object may back this memory, including | |
1351 | anonymous memory, ordinary files, and hugetlbfs. | |
1352 | ||
139bc8a6 MZ |
1353 | On architectures that support a form of address tagging, userspace_addr must |
1354 | be an untagged address. | |
1355 | ||
0f2d8f4d AK |
1356 | It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr |
1357 | be identical. This allows large pages in the guest to be backed by large | |
1358 | pages in the host. | |
1359 | ||
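The alignment recommendation above can be checked before registering a
slot, as in the following sketch. The helper and the EX_LOW_21_MASK
constant are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting the lower 21 bits of an address. */
#define EX_LOW_21_MASK ((UINT64_C(1) << 21) - 1)

/*
 * Returns true if guest_phys_addr and userspace_addr agree in their
 * lower 21 bits, so that large pages in the guest can be backed by
 * large pages in the host.
 */
static bool low_21_bits_match(uint64_t guest_phys_addr,
                              uint64_t userspace_addr)
{
    return ((guest_phys_addr ^ userspace_addr) & EX_LOW_21_MASK) == 0;
}
```

A userspace mapping whose offset within a 2 MiB region differs from that of
the guest physical address fails this check; the slot can still be
registered, but the recommendation is not met.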
75d61fbc TY |
1360 | The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and |
1361 | KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of | |
1362 | writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to | |
1363 | use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it, | |
1364 | to make a new slot read-only. In this case, writes to this memory will be | |
1365 | posted to userspace as KVM_EXIT_MMIO exits. | |
7efd8fa1 JK |
1366 | |
1367 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of | |
1368 | the memory region are automatically reflected into the guest. For example, an | |
1369 | mmap() that affects the region will be made visible immediately. Another | |
1370 | example is madvise(MADV_DONTNEED). | |
0f2d8f4d AK |
1371 | |
1372 | It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. | |
1373 | The KVM_SET_MEMORY_REGION does not allow fine grained control over memory | |
1374 | allocation and is deprecated. | |
3cfc3092 | 1375 | |
414fa985 | 1376 | |
68ba6974 | 1377 | 4.36 KVM_SET_TSS_ADDR |
106ee47d | 1378 | --------------------- |
8a5416db | 1379 | |
106ee47d MCC |
1380 | :Capability: KVM_CAP_SET_TSS_ADDR |
1381 | :Architectures: x86 | |
1382 | :Type: vm ioctl | |
1383 | :Parameters: unsigned long tss_address (in) | |
1384 | :Returns: 0 on success, -1 on error | |
8a5416db AK |
1385 | |
1386 | This ioctl defines the physical address of a three-page region in the guest | |
1387 | physical address space. The region must be within the first 4GB of the | |
1388 | guest physical address space and must not conflict with any memory slot | |
1389 | or any mmio address. The guest may malfunction if it accesses this memory | |
1390 | region. | |
1391 | ||
1392 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware | |
1393 | because of a quirk in the virtualization implementation (see the internals | |
1394 | documentation when it pops into existence). | |
1395 | ||
414fa985 | 1396 | |
68ba6974 | 1397 | 4.37 KVM_ENABLE_CAP |
106ee47d | 1398 | ------------------- |
71fbfd5f | 1399 | |
106ee47d | 1400 | :Capability: KVM_CAP_ENABLE_CAP |
127770ac | 1401 | :Architectures: mips, ppc, s390, x86 |
106ee47d MCC |
1402 | :Type: vcpu ioctl |
1403 | :Parameters: struct kvm_enable_cap (in) | |
1404 | :Returns: 0 on success; -1 on error | |
e5d83c74 | 1405 | |
106ee47d MCC |
1406 | :Capability: KVM_CAP_ENABLE_CAP_VM |
1407 | :Architectures: all | |
a10f373a | 1408 | :Type: vm ioctl |
106ee47d MCC |
1409 | :Parameters: struct kvm_enable_cap (in) |
1410 | :Returns: 0 on success; -1 on error | |
1411 | ||
1412 | .. note:: | |
71fbfd5f | 1413 | |
106ee47d MCC |
1414 | Not all extensions are enabled by default. Using this ioctl the application |
1415 | can enable an extension, making it available to the guest. | |
71fbfd5f AG |
1416 | |
1417 | On systems that do not support this ioctl, it always fails. On systems that | |
1418 | do support it, it only works for extensions that are supported for enablement. | |
1419 | ||
1420 | To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should | |
1421 | be used. | |
1422 | ||
106ee47d MCC |
1423 | :: |
1424 | ||
1425 | struct kvm_enable_cap { | |
71fbfd5f AG |
1426 | /* in */ |
1427 | __u32 cap; | |
1428 | ||
1429 | The capability that is supposed to get enabled. | |
1430 | ||
106ee47d MCC |
1431 | :: |
1432 | ||
71fbfd5f AG |
1433 | __u32 flags; |
1434 | ||
1435 | A bitfield indicating future enhancements. Has to be 0 for now. | |
1436 | ||
106ee47d MCC |
1437 | :: |
1438 | ||
71fbfd5f AG |
1439 | __u64 args[4]; |
1440 | ||
1441 | Arguments for enabling a feature. If a feature needs initial values to | |
1442 | function properly, this is the place to put them. | |
1443 | ||
106ee47d MCC |
1444 | :: |
1445 | ||
71fbfd5f | 1446 | __u8 pad[64]; |
106ee47d | 1447 | }; |
71fbfd5f | 1448 | |
d938dc55 CH |
1449 | The vcpu ioctl should be used for vcpu-specific capabilities; the vm ioctl | |
1450 | should be used for vm-wide capabilities. | |
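As a minimal sketch of how the structure above might be filled in, the helper
below zeroes the whole request so that ``flags`` and ``pad`` are 0, then sets
``cap`` and up to four arguments. The struct ``enable_cap_sketch`` and the
helper ``make_enable_cap()`` are hypothetical names introduced for this
illustration; real code would use ``struct kvm_enable_cap`` from
``<linux/kvm.h>``.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_enable_cap as laid out above (assumption:
 * real code includes <linux/kvm.h> and uses the kernel's definition). */
struct enable_cap_sketch {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/* Hypothetical helper: build a request for capability `cap`. */
static struct enable_cap_sketch make_enable_cap(uint32_t cap,
						const uint64_t *args,
						unsigned int nargs)
{
	struct enable_cap_sketch req;
	unsigned int i;

	/* flags has to be 0 for now; clearing everything also zeroes pad. */
	memset(&req, 0, sizeof(req));
	req.cap = cap;
	for (i = 0; i < nargs && i < 4; i++)
		req.args[i] = args[i];
	return req;
}
```

The resulting structure would then be passed to KVM_ENABLE_CAP on either the
vcpu or the vm file descriptor, depending on the scope of the capability.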
414fa985 | 1451 | |
68ba6974 | 1452 | 4.38 KVM_GET_MP_STATE |
106ee47d | 1453 | --------------------- |
b843f065 | 1454 | |
106ee47d | 1455 | :Capability: KVM_CAP_MP_STATE |
3fbf4207 | 1456 | :Architectures: x86, s390, arm64, riscv |
106ee47d MCC |
1457 | :Type: vcpu ioctl |
1458 | :Parameters: struct kvm_mp_state (out) | |
1459 | :Returns: 0 on success; -1 on error | |
1460 | ||
1461 | :: | |
b843f065 | 1462 | |
106ee47d | 1463 | struct kvm_mp_state { |
b843f065 | 1464 | __u32 mp_state; |
106ee47d | 1465 | }; |
b843f065 AK |
1466 | |
1467 | Returns the vcpu's current "multiprocessing state" (though also valid on | |
1468 | uniprocessor guests). | |
1469 | ||
1470 | Possible values are: | |
1471 | ||
106ee47d | 1472 | ========================== =============================================== |
da40d858 | 1473 | KVM_MP_STATE_RUNNABLE the vcpu is currently running |
3fbf4207 | 1474 | [x86,arm64,riscv] |
106ee47d | 1475 | KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP) |
c32a4272 | 1476 | which has not yet received an INIT signal [x86] |
106ee47d | 1477 | KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is |
c32a4272 | 1478 | now ready for a SIPI [x86] |
106ee47d | 1479 | KVM_MP_STATE_HALTED the vcpu has executed a HLT instruction and |
c32a4272 | 1480 | is waiting for an interrupt [x86] |
106ee47d | 1481 | KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector |
c32a4272 | 1482 | accessible via KVM_GET_VCPU_EVENTS) [x86] |
3fbf4207 | 1483 | KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm64,riscv] |
106ee47d MCC |
1484 | KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390] |
1485 | KVM_MP_STATE_OPERATING the vcpu is operating (running or halted) | |
6352e4d2 | 1486 | [s390] |
106ee47d | 1487 | KVM_MP_STATE_LOAD the vcpu is in a special load/startup state |
6352e4d2 | 1488 | [s390] |
7b33a09d OU |
1489 | KVM_MP_STATE_SUSPENDED the vcpu is in a suspend state and is waiting |
1490 | for a wakeup event [arm64] | |
106ee47d | 1491 | ========================== =============================================== |
b843f065 | 1492 | |
c32a4272 | 1493 | On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an |
0b4820d6 DH |
1494 | in-kernel irqchip, the multiprocessing state must be maintained by | |
1495 | userspace. | |
b843f065 | 1496 | |
7b33a09d OU |
1497 | For arm64: |
1498 | ^^^^^^^^^^ | |
1499 | ||
1500 | If a vCPU is in the KVM_MP_STATE_SUSPENDED state, KVM will emulate the | |
1501 | architectural execution of a WFI instruction. | |
1502 | ||
1503 | If a wakeup event is recognized, KVM will exit to userspace with a | |
1504 | KVM_SYSTEM_EVENT exit, where the event type is KVM_SYSTEM_EVENT_WAKEUP. If | |
1505 | userspace wants to honor the wakeup, it must set the vCPU's MP state to | |
1506 | KVM_MP_STATE_RUNNABLE. If it does not, KVM will continue to await a wakeup | |
1507 | event in subsequent calls to KVM_RUN. | |
1508 | ||
1509 | .. warning:: | |
1510 | ||
1511 | If userspace intends to keep the vCPU in a SUSPENDED state, it is | |
1512 | strongly recommended that userspace take action to suppress the | |
1513 | wakeup event (such as masking an interrupt). Otherwise, subsequent | |
1514 | calls to KVM_RUN will immediately exit with a KVM_SYSTEM_EVENT_WAKEUP | |
1515 | event and inadvertently waste CPU cycles. | |
1516 | ||
1517 | Additionally, if userspace takes action to suppress a wakeup event, | |
1518 | it is strongly recommended that it also restores the vCPU to its | |
1519 | original state when the vCPU is made RUNNABLE again. For example, | |
1520 | if userspace masked a pending interrupt to suppress the wakeup, | |
1521 | the interrupt should be unmasked before returning control to the | |
1522 | guest. | |
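The choice userspace faces after a KVM_SYSTEM_EVENT_WAKEUP exit can be
sketched as a small decision function. This is only an illustration of the
rules described above: the enum values and the names ``wakeup_decision`` and
``on_wakeup()`` are hypothetical stand-ins, not part of the KVM API.

```c
#include <stdbool.h>

/* Illustrative stand-ins; real code uses the KVM_MP_STATE_* constants
 * from <linux/kvm.h>. */
enum mp_state_sketch { MP_RUNNABLE, MP_SUSPENDED };

/* What userspace decides after a KVM_SYSTEM_EVENT_WAKEUP exit. */
struct wakeup_decision {
	enum mp_state_sketch next_state; /* state to hand to KVM_SET_MP_STATE */
	bool suppress_event;             /* e.g. mask the waking interrupt */
};

static struct wakeup_decision on_wakeup(bool honor)
{
	struct wakeup_decision d;

	if (honor) {
		/* Honoring the wakeup: make the vCPU runnable again. */
		d.next_state = MP_RUNNABLE;
		d.suppress_event = false;
	} else {
		/* Staying suspended: suppress the event so that KVM_RUN
		 * does not exit again immediately. */
		d.next_state = MP_SUSPENDED;
		d.suppress_event = true;
	}
	return d;
}
```

Whatever was done to suppress the event would need to be undone before the
vCPU is made RUNNABLE again, as the warning above describes.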
1523 | ||
1524 | For riscv: | |
1525 | ^^^^^^^^^^ | |
ecccf0cc AB |
1526 | |
1527 | The only states that are valid are KVM_MP_STATE_STOPPED and | |
1528 | KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu is paused or not. | |
414fa985 | 1529 | |
68ba6974 | 1530 | 4.39 KVM_SET_MP_STATE |
106ee47d | 1531 | --------------------- |
b843f065 | 1532 | |
106ee47d | 1533 | :Capability: KVM_CAP_MP_STATE |
3fbf4207 | 1534 | :Architectures: x86, s390, arm64, riscv |
106ee47d MCC |
1535 | :Type: vcpu ioctl |
1536 | :Parameters: struct kvm_mp_state (in) | |
1537 | :Returns: 0 on success; -1 on error | |
b843f065 AK |
1538 | |
1539 | Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for | |
1540 | arguments. | |
1541 | ||
c32a4272 | 1542 | On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an |
0b4820d6 DH |
1543 | in-kernel irqchip, the multiprocessing state must be maintained by | |
1544 | userspace. | |
b843f065 | 1545 | |
3fbf4207 OU |
1546 | For arm64/riscv: |
1547 | ^^^^^^^^^^^^^^^^ | |
ecccf0cc AB |
1548 | |
1549 | The only states that are valid are KVM_MP_STATE_STOPPED and | |
1550 | KVM_MP_STATE_RUNNABLE, which reflect whether the vcpu should be paused or not. | |
414fa985 | 1551 | |
68ba6974 | 1552 | 4.40 KVM_SET_IDENTITY_MAP_ADDR |
106ee47d | 1553 | ------------------------------ |
47dbb84f | 1554 | |
106ee47d MCC |
1555 | :Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR |
1556 | :Architectures: x86 | |
1557 | :Type: vm ioctl | |
1558 | :Parameters: unsigned long identity (in) | |
1559 | :Returns: 0 on success, -1 on error | |
47dbb84f AK |
1560 | |
1561 | This ioctl defines the physical address of a one-page region in the guest | |
1562 | physical address space. The region must be within the first 4GB of the | |
1563 | guest physical address space and must not conflict with any memory slot | |
1564 | or any mmio address. The guest may malfunction if it accesses this memory | |
1565 | region. | |
1566 | ||
726b99c4 DH |
1567 | Setting the address to 0 will result in resetting the address to its default |
1568 | (0xfffbc000). | |
1569 | ||
47dbb84f AK |
1570 | This ioctl is required on Intel-based hosts. This is needed on Intel hardware |
1571 | because of a quirk in the virtualization implementation (see the internals | |
1572 | documentation when it pops into existence). | |
1573 | ||
1af1ac91 | 1574 | Fails if any VCPU has already been created. |
414fa985 | 1575 | |
68ba6974 | 1576 | 4.41 KVM_SET_BOOT_CPU_ID |
106ee47d | 1577 | ------------------------ |
57bc24cf | 1578 | |
106ee47d MCC |
1579 | :Capability: KVM_CAP_SET_BOOT_CPU_ID |
1580 | :Architectures: x86 | |
1581 | :Type: vm ioctl | |
1582 | :Parameters: unsigned long vcpu_id | |
1583 | :Returns: 0 on success, -1 on error | |
57bc24cf AK |
1584 | |
1585 | Define which vcpu is the Bootstrap Processor (BSP). Values are the same | |
1586 | as the vcpu id in KVM_CREATE_VCPU. If this ioctl is not called, the default | |
9ce3746d EGE |
1587 | is vcpu 0. This ioctl has to be called before any vcpu is created; | |
1588 | otherwise it fails with an EBUSY error. | |
57bc24cf | 1589 | |
414fa985 | 1590 | |
68ba6974 | 1591 | 4.42 KVM_GET_XSAVE |
106ee47d | 1592 | ------------------ |
2d5b5a66 | 1593 | |
106ee47d MCC |
1594 | :Capability: KVM_CAP_XSAVE |
1595 | :Architectures: x86 | |
1596 | :Type: vcpu ioctl | |
1597 | :Parameters: struct kvm_xsave (out) | |
1598 | :Returns: 0 on success, -1 on error | |
1599 | ||
1600 | ||
1601 | :: | |
2d5b5a66 | 1602 | |
106ee47d | 1603 | struct kvm_xsave { |
2d5b5a66 | 1604 | __u32 region[1024]; |
be50b206 | 1605 | __u32 extra[0]; |
106ee47d | 1606 | }; |
2d5b5a66 SY |
1607 | |
1608 | This ioctl copies the current vcpu's xsave struct to userspace. | |
1609 | ||
414fa985 | 1610 | |
68ba6974 | 1611 | 4.43 KVM_SET_XSAVE |
106ee47d | 1612 | ------------------ |
2d5b5a66 | 1613 | |
be50b206 | 1614 | :Capability: KVM_CAP_XSAVE and KVM_CAP_XSAVE2 |
106ee47d MCC |
1615 | :Architectures: x86 |
1616 | :Type: vcpu ioctl | |
1617 | :Parameters: struct kvm_xsave (in) | |
1618 | :Returns: 0 on success, -1 on error | |
1619 | ||
1620 | :: | |
2d5b5a66 | 1621 | |
106ee47d MCC |
1622 | |
1623 | struct kvm_xsave { | |
2d5b5a66 | 1624 | __u32 region[1024]; |
be50b206 | 1625 | __u32 extra[0]; |
106ee47d | 1626 | }; |
2d5b5a66 | 1627 | |
be50b206 GZ |
1628 | This ioctl copies userspace's xsave struct to the kernel. It copies | |
1629 | as many bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2), | |
1630 | when invoked on the vm file descriptor. The size value returned by | |
1631 | KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will always be at least 4096. | |
1632 | Currently, it is only greater than 4096 if a dynamic feature has been | |
1633 | enabled with ``arch_prctl()``, but this may change in the future. | |
1634 | ||
1635 | The offsets of the state save areas in struct kvm_xsave follow the | |
1636 | contents of CPUID leaf 0xD on the host. | |
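A minimal sketch of sizing the userspace buffer from the value returned by
KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) follows. The helper name
``xsave_buffer_size()`` is hypothetical, and treating a non-positive return
value as "KVM_CAP_XSAVE2 unavailable, use the legacy 4096-byte layout" is an
assumption of this sketch.

```c
#include <stddef.h>

/* cap_ret: value returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) when
 * invoked on the vm file descriptor. */
static size_t xsave_buffer_size(int cap_ret)
{
	const size_t legacy = 4096;	/* sizeof(__u32) * 1024, i.e. region[] */

	/* Assumption: no KVM_CAP_XSAVE2 means only region[] is copied. */
	if (cap_ret <= 0)
		return legacy;

	/* The documented minimum is 4096; never allocate less than that. */
	return (size_t)cap_ret > legacy ? (size_t)cap_ret : legacy;
}
```

Allocating this many bytes and casting the buffer to ``struct kvm_xsave *``
leaves room for the ``extra[]`` area when a dynamic feature is enabled.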
2d5b5a66 | 1637 | |
414fa985 | 1638 | |
68ba6974 | 1639 | 4.44 KVM_GET_XCRS |
106ee47d | 1640 | ----------------- |
2d5b5a66 | 1641 | |
106ee47d MCC |
1642 | :Capability: KVM_CAP_XCRS |
1643 | :Architectures: x86 | |
1644 | :Type: vcpu ioctl | |
1645 | :Parameters: struct kvm_xcrs (out) | |
1646 | :Returns: 0 on success, -1 on error | |
1647 | ||
1648 | :: | |
2d5b5a66 | 1649 | |
106ee47d | 1650 | struct kvm_xcr { |
2d5b5a66 SY |
1651 | __u32 xcr; |
1652 | __u32 reserved; | |
1653 | __u64 value; | |
106ee47d | 1654 | }; |
2d5b5a66 | 1655 | |
106ee47d | 1656 | struct kvm_xcrs { |
2d5b5a66 SY |
1657 | __u32 nr_xcrs; |
1658 | __u32 flags; | |
1659 | struct kvm_xcr xcrs[KVM_MAX_XCRS]; | |
1660 | __u64 padding[16]; | |
106ee47d | 1661 | }; |
2d5b5a66 SY |
1662 | |
1663 | This ioctl copies the current vcpu's xcrs to userspace. | |
1664 | ||
414fa985 | 1665 | |
68ba6974 | 1666 | 4.45 KVM_SET_XCRS |
106ee47d | 1667 | ----------------- |
2d5b5a66 | 1668 | |
106ee47d MCC |
1669 | :Capability: KVM_CAP_XCRS |
1670 | :Architectures: x86 | |
1671 | :Type: vcpu ioctl | |
1672 | :Parameters: struct kvm_xcrs (in) | |
1673 | :Returns: 0 on success, -1 on error | |
1674 | ||
1675 | :: | |
2d5b5a66 | 1676 | |
106ee47d | 1677 | struct kvm_xcr { |
2d5b5a66 SY |
1678 | __u32 xcr; |
1679 | __u32 reserved; | |
1680 | __u64 value; | |
106ee47d | 1681 | }; |
2d5b5a66 | 1682 | |
106ee47d | 1683 | struct kvm_xcrs { |
2d5b5a66 SY |
1684 | __u32 nr_xcrs; |
1685 | __u32 flags; | |
1686 | struct kvm_xcr xcrs[KVM_MAX_XCRS]; | |
1687 | __u64 padding[16]; | |
106ee47d | 1688 | }; |
2d5b5a66 SY |
1689 | |
1690 | This ioctl sets the vcpu's xcrs to the values userspace specified. | |
1691 | ||
414fa985 | 1692 | |
68ba6974 | 1693 | 4.46 KVM_GET_SUPPORTED_CPUID |
106ee47d MCC |
1694 | ---------------------------- |
1695 | ||
1696 | :Capability: KVM_CAP_EXT_CPUID | |
1697 | :Architectures: x86 | |
1698 | :Type: system ioctl | |
1699 | :Parameters: struct kvm_cpuid2 (in/out) | |
1700 | :Returns: 0 on success, -1 on error | |
d153513d | 1701 | |
106ee47d | 1702 | :: |
d153513d | 1703 | |
106ee47d | 1704 | struct kvm_cpuid2 { |
d153513d AK |
1705 | __u32 nent; |
1706 | __u32 padding; | |
1707 | struct kvm_cpuid_entry2 entries[0]; | |
106ee47d | 1708 | }; |
d153513d | 1709 | |
106ee47d | 1710 | #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) |
7ff6c035 SC |
1711 | #define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) /* deprecated */ |
1712 | #define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) /* deprecated */ | |
d153513d | 1713 | |
106ee47d | 1714 | struct kvm_cpuid_entry2 { |
d153513d AK |
1715 | __u32 function; |
1716 | __u32 index; | |
1717 | __u32 flags; | |
1718 | __u32 eax; | |
1719 | __u32 ebx; | |
1720 | __u32 ecx; | |
1721 | __u32 edx; | |
1722 | __u32 padding[3]; | |
106ee47d | 1723 | }; |
d153513d | 1724 | |
df9cb9cc JM |
1725 | This ioctl returns x86 cpuid features which are supported by both the |
1726 | hardware and kvm in its default configuration. Userspace can use the | |
1727 | information returned by this ioctl to construct cpuid information (for | |
1728 | KVM_SET_CPUID2) that is consistent with hardware, kernel, and | |
1729 | userspace capabilities, and with user requirements (for example, the | |
1730 | user may wish to constrain cpuid to emulate older hardware, or for | |
1731 | feature consistency across a cluster). | |
1732 | ||
445ecdf7 JL |
1733 | Dynamically-enabled feature bits need to be requested with |
1734 | ``arch_prctl()`` before calling this ioctl. Feature bits that have not | |
1735 | been requested are excluded from the result. | |
1736 | ||
df9cb9cc JM |
1737 | Note that certain capabilities, such as KVM_CAP_X86_DISABLE_EXITS, may |
1738 | expose cpuid features (e.g. MONITOR) which are not supported by kvm in | |
1739 | its default configuration. If userspace enables such capabilities, it | |
1740 | is responsible for modifying the results of this ioctl appropriately. | |
d153513d AK |
1741 | |
1742 | Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure | |
1743 | with the 'nent' field indicating the number of entries in the variable-size | |
1744 | array 'entries'. If the number of entries is too low to describe the cpu | |
1745 | capabilities, an error (E2BIG) is returned. If the number is too high, | |
1746 | the 'nent' field is adjusted and an error (ENOMEM) is returned. If the | |
1747 | number is just right, the 'nent' field is adjusted to the number of valid | |
1748 | entries in the 'entries' array, which is then filled. | |
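Because ``entries`` is a variable-size array, the caller has to allocate the
header plus ``nent`` entries in one buffer. The sketch below computes that
size and a next guess after an E2BIG failure. The struct names with the
``_sketch`` suffix and the helpers ``cpuid2_bytes()`` and ``cpuid2_grow()``
are hypothetical; the doubling policy is an assumption, not something KVM
prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the structures shown above (assumption: real code
 * uses struct kvm_cpuid2 / kvm_cpuid_entry2 from <linux/kvm.h>). */
struct cpuid_entry2_sketch {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

struct cpuid2_sketch {
	uint32_t nent;
	uint32_t padding;
	struct cpuid_entry2_sketch entries[];
};

/* Bytes needed for a request that can hold `nent` entries. */
static size_t cpuid2_bytes(uint32_t nent)
{
	return sizeof(struct cpuid2_sketch) +
	       (size_t)nent * sizeof(struct cpuid_entry2_sketch);
}

/* Next `nent` to try after the ioctl failed with E2BIG. */
static uint32_t cpuid2_grow(uint32_t nent)
{
	return nent ? nent * 2 : 8;
}
```

A caller would allocate ``cpuid2_bytes(nent)`` bytes, set ``nent``, issue the
ioctl, and retry with ``cpuid2_grow(nent)`` while it fails with E2BIG.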
1749 | ||
1750 | The entries returned are the host cpuid as returned by the cpuid instruction, | |
c39cbd2a AK |
1751 | with unknown or unsupported features masked out. Some features (for example, |
1752 | x2apic), may not be present in the host cpu, but are exposed by kvm if it can | |
1753 | emulate them efficiently. The fields in each entry are defined as follows: | |
d153513d | 1754 | |
106ee47d MCC |
1755 | function: |
1756 | the eax value used to obtain the entry | |
1757 | ||
1758 | index: | |
1759 | the ecx value used to obtain the entry (for entries that are | |
d153513d | 1760 | affected by ecx) |
106ee47d MCC |
1761 | |
1762 | flags: | |
1763 | an OR of zero or more of the following: | |
1764 | ||
d153513d AK |
1765 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX: |
1766 | if the index field is valid | |
106ee47d MCC |
1767 | |
1768 | eax, ebx, ecx, edx: | |
1769 | the values returned by the cpuid instruction for | |
d153513d AK |
1770 | this function/index combination |
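The role of the index flag can be illustrated with a lookup over a returned
array: the ``index`` field is only compared when the entry marks it as
significant. The struct ``cpuid_entry2_sketch``, the macro
``SKETCH_FLAG_SIGNIFICANT_INDEX`` and the helper ``find_cpuid_entry()`` are
hypothetical names for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors bit 0, i.e. KVM_CPUID_FLAG_SIGNIFCANT_INDEX above. */
#define SKETCH_FLAG_SIGNIFICANT_INDEX (1u << 0)

/* Local mirror of struct kvm_cpuid_entry2 (assumption: real code uses
 * the definition from <linux/kvm.h>). */
struct cpuid_entry2_sketch {
	uint32_t function, index, flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/* Return the entry for (function, index), or NULL if there is none. */
static const struct cpuid_entry2_sketch *
find_cpuid_entry(const struct cpuid_entry2_sketch *e, size_t n,
		 uint32_t function, uint32_t index)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (e[i].function != function)
			continue;
		/* Only compare index when the entry says it is valid. */
		if ((e[i].flags & SKETCH_FLAG_SIGNIFICANT_INDEX) &&
		    e[i].index != index)
			continue;
		return &e[i];
	}
	return NULL;
}
```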
1771 | ||
4d25a066 JK |
1772 | The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned |
1773 | as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC | |
106ee47d | 1774 | support. Instead it is reported via:: |
4d25a066 JK |
1775 | |
1776 | ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER) | |
1777 | ||
1778 | If that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the | |
1779 | feature in userspace, then you can enable the feature for KVM_SET_CPUID2. | |
1780 | ||
414fa985 | 1781 | |
68ba6974 | 1782 | 4.47 KVM_PPC_GET_PVINFO |
106ee47d MCC |
1783 | ----------------------- |
1784 | ||
1785 | :Capability: KVM_CAP_PPC_GET_PVINFO | |
1786 | :Architectures: ppc | |
1787 | :Type: vm ioctl | |
1788 | :Parameters: struct kvm_ppc_pvinfo (out) | |
1789 | :Returns: 0 on success, !0 on error | |
15711e9c | 1790 | |
106ee47d | 1791 | :: |
15711e9c | 1792 | |
106ee47d | 1793 | struct kvm_ppc_pvinfo { |
15711e9c AG |
1794 | __u32 flags; |
1795 | __u32 hcall[4]; | |
1796 | __u8 pad[108]; | |
106ee47d | 1797 | }; |
15711e9c AG |
1798 | |
1799 | This ioctl fetches PV-specific information that needs to be passed to the guest | |
1800 | using the device tree or other means from vm context. | |
1801 | ||
9202e076 | 1802 | The hcall array defines 4 instructions that make up a hypercall. |
15711e9c AG |
1803 | |
1804 | If any additional field gets added to this structure later on, a bit for that | |
1805 | additional piece of information will be set in the flags bitmap. | |
1806 | ||
106ee47d | 1807 | The flags bitmap is defined as:: |
9202e076 LYB |
1808 | |
1809 | /* the host supports the ePAPR idle hcall */ | |
1810 | #define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0) | |
414fa985 | 1811 | |
68ba6974 | 1812 | 4.52 KVM_SET_GSI_ROUTING |
106ee47d | 1813 | ------------------------ |
49f48172 | 1814 | |
106ee47d | 1815 | :Capability: KVM_CAP_IRQ_ROUTING |
3fbf4207 | 1816 | :Architectures: x86, s390, arm64 |
106ee47d MCC |
1817 | :Type: vm ioctl |
1818 | :Parameters: struct kvm_irq_routing (in) | |
1819 | :Returns: 0 on success, -1 on error | |
49f48172 JK |
1820 | |
1821 | Sets the GSI routing table entries, overwriting any previously set entries. | |
1822 | ||
3fbf4207 | 1823 | On arm64, GSI routing has the following limitation: |
106ee47d | 1824 | |
180ae7b1 EA |
1825 | - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. |
1826 | ||
106ee47d MCC |
1827 | :: |
1828 | ||
1829 | struct kvm_irq_routing { | |
49f48172 JK |
1830 | __u32 nr; |
1831 | __u32 flags; | |
1832 | struct kvm_irq_routing_entry entries[0]; | |
106ee47d | 1833 | }; |
49f48172 JK |
1834 | |
1835 | No flags are specified so far; the corresponding field must be set to zero. | |
1836 | ||
106ee47d MCC |
1837 | :: |
1838 | ||
1839 | struct kvm_irq_routing_entry { | |
49f48172 JK |
1840 | __u32 gsi; |
1841 | __u32 type; | |
1842 | __u32 flags; | |
1843 | __u32 pad; | |
1844 | union { | |
1845 | struct kvm_irq_routing_irqchip irqchip; | |
1846 | struct kvm_irq_routing_msi msi; | |
84223598 | 1847 | struct kvm_irq_routing_s390_adapter adapter; |
5c919412 | 1848 | struct kvm_irq_routing_hv_sint hv_sint; |
14243b38 | 1849 | struct kvm_irq_routing_xen_evtchn xen_evtchn; |
49f48172 JK |
1850 | __u32 pad[8]; |
1851 | } u; | |
106ee47d | 1852 | }; |
49f48172 | 1853 | |
106ee47d MCC |
1854 | /* gsi routing entry types */ |
1855 | #define KVM_IRQ_ROUTING_IRQCHIP 1 | |
1856 | #define KVM_IRQ_ROUTING_MSI 2 | |
1857 | #define KVM_IRQ_ROUTING_S390_ADAPTER 3 | |
1858 | #define KVM_IRQ_ROUTING_HV_SINT 4 | |
14243b38 | 1859 | #define KVM_IRQ_ROUTING_XEN_EVTCHN 5 |
49f48172 | 1860 | |
76a10b86 | 1861 | flags: |
106ee47d | 1862 | |
6f49b2f3 PB |
1863 | - KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry |
1864 | type, specifies that the devid field contains a valid value. The per-VM | |
1865 | KVM_CAP_MSI_DEVID capability advertises the requirement to provide | |
1866 | the device ID. If this capability is not available, userspace should | |
1867 | never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail. | |
76a10b86 | 1868 | - zero otherwise |
49f48172 | 1869 | |
106ee47d MCC |
1870 | :: |
1871 | ||
1872 | struct kvm_irq_routing_irqchip { | |
49f48172 JK |
1873 | __u32 irqchip; |
1874 | __u32 pin; | |
106ee47d | 1875 | }; |
49f48172 | 1876 | |
106ee47d | 1877 | struct kvm_irq_routing_msi { |
49f48172 JK |
1878 | __u32 address_lo; |
1879 | __u32 address_hi; | |
1880 | __u32 data; | |
76a10b86 EA |
1881 | union { |
1882 | __u32 pad; | |
1883 | __u32 devid; | |
1884 | }; | |
106ee47d | 1885 | }; |
49f48172 | 1886 | |
6f49b2f3 PB |
1887 | If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier |
1888 | for the device that wrote the MSI message. For PCI, this is usually a | |
1889 | BDF (bus/device/function) identifier in the lower 16 bits. | |
76a10b86 | 1890 | |
37131313 RK |
1891 | On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS |
1892 | feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled, | |
1893 | address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of | |
1894 | address_hi must be zero. | |
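As a sketch of the x86 encoding described above, the helper below splits a
32-bit destination id across ``address_lo`` and ``address_hi``. It assumes
KVM_X2APIC_API_USE_32BIT_IDS is enabled (otherwise ``address_hi`` is
ignored), and it assumes the usual x86 MSI layout in which ``address_lo`` is
0xFEExxxxx with destination id bits 7-0 in bits 19-12. The names
``msi_route_sketch`` and ``make_msi()`` are hypothetical.

```c
#include <stdint.h>

/* The three fields of struct kvm_irq_routing_msi used here. */
struct msi_route_sketch {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Encode a fixed-delivery, edge-triggered MSI for `dest_id`. */
static struct msi_route_sketch make_msi(uint32_t dest_id, uint8_t vector)
{
	struct msi_route_sketch m;

	/* x86 MSI address window, destination id bits 7-0 in bits 19-12. */
	m.address_lo = 0xfee00000u | ((dest_id & 0xffu) << 12);
	/* Bits 31-8 carry destination id bits 31-8; bits 7-0 stay zero. */
	m.address_hi = dest_id & 0xffffff00u;
	/* Vector in bits 7-0; delivery mode 0 (fixed). */
	m.data = vector;
	return m;
}
```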
1895 | ||
106ee47d MCC |
1896 | :: |
1897 | ||
1898 | struct kvm_irq_routing_s390_adapter { | |
84223598 CH |
1899 | __u64 ind_addr; |
1900 | __u64 summary_addr; | |
1901 | __u64 ind_offset; | |
1902 | __u32 summary_offset; | |
1903 | __u32 adapter_id; | |
106ee47d | 1904 | }; |
84223598 | 1905 | |
106ee47d | 1906 | struct kvm_irq_routing_hv_sint { |
5c919412 AS |
1907 | __u32 vcpu; |
1908 | __u32 sint; | |
106ee47d | 1909 | }; |
414fa985 | 1910 | |
14243b38 DW |
1911 | struct kvm_irq_routing_xen_evtchn { |
1912 | __u32 port; | |
1913 | __u32 vcpu; | |
1914 | __u32 priority; | |
1915 | }; | |
1916 | ||
1917 | ||
1918 | When KVM_CAP_XEN_HVM includes the KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL bit | |
1919 | in its indication of supported features, routing to Xen event channels | |
1920 | is supported. Although the priority field is present, only the value | |
1921 | KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL is supported, which means delivery by | |
1922 | 2 level event channels. FIFO event channel support may be added in | |
1923 | the future. | |
1924 | ||
414fa985 JK |
1925 | |
1926 | 4.55 KVM_SET_TSC_KHZ | |
106ee47d | 1927 | -------------------- |
92a1f12d | 1928 | |
ffbb61d0 | 1929 | :Capability: KVM_CAP_TSC_CONTROL / KVM_CAP_VM_TSC_CONTROL |
106ee47d | 1930 | :Architectures: x86 |
ffbb61d0 | 1931 | :Type: vcpu ioctl / vm ioctl |
106ee47d MCC |
1932 | :Parameters: virtual tsc_khz |
1933 | :Returns: 0 on success, -1 on error | |
92a1f12d JR |
1934 | |
1935 | Specifies the tsc frequency for the virtual machine. The unit of the | |
1936 | frequency is kHz. | |
1937 | ||
ffbb61d0 DW |
1938 | If the KVM_CAP_VM_TSC_CONTROL capability is advertised, this can also |
1939 | be used as a vm ioctl to set the initial tsc frequency of subsequently | |
1940 | created vCPUs. | |
414fa985 JK |
1941 | |
1942 | 4.56 KVM_GET_TSC_KHZ | |
106ee47d | 1943 | -------------------- |
92a1f12d | 1944 | |
ffbb61d0 | 1945 | :Capability: KVM_CAP_GET_TSC_KHZ / KVM_CAP_VM_TSC_CONTROL |
106ee47d | 1946 | :Architectures: x86 |
ffbb61d0 | 1947 | :Type: vcpu ioctl / vm ioctl |
106ee47d MCC |
1948 | :Parameters: none |
1949 | :Returns: virtual tsc-khz on success, negative value on error | |
92a1f12d JR |
1950 | |
1951 | Returns the tsc frequency of the guest. The unit of the return value is | |
1952 | kHz. If the host has an unstable tsc, this ioctl returns -EIO as an | |
1953 | error instead. | |
1954 | ||
414fa985 JK |
1955 | |
1956 | 4.57 KVM_GET_LAPIC | |
106ee47d | 1957 | ------------------ |
e7677933 | 1958 | |
106ee47d MCC |
1959 | :Capability: KVM_CAP_IRQCHIP |
1960 | :Architectures: x86 | |
1961 | :Type: vcpu ioctl | |
1962 | :Parameters: struct kvm_lapic_state (out) | |
1963 | :Returns: 0 on success, -1 on error | |
e7677933 | 1964 | |
106ee47d MCC |
1965 | :: |
1966 | ||
1967 | #define KVM_APIC_REG_SIZE 0x400 | |
1968 | struct kvm_lapic_state { | |
e7677933 | 1969 | char regs[KVM_APIC_REG_SIZE]; |
106ee47d | 1970 | }; |
e7677933 AK |
1971 | |
1972 | Reads the Local APIC registers and copies them into the input argument. The | |
1973 | data format and layout are the same as documented in the architecture manual. | |
1974 | ||
37131313 RK |
1975 | If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is |
1976 | enabled, then the format of APIC_ID register depends on the APIC mode | |
1977 | (reported by MSR_IA32_APICBASE) of its VCPU. x2APIC stores APIC ID in | |
1978 | the APIC_ID register (bytes 32-35). xAPIC only allows an 8-bit APIC ID | |
1979 | which is stored in bits 31-24 of the APIC register, or equivalently in | |
1980 | byte 35 of struct kvm_lapic_state's regs field. KVM_GET_LAPIC must then | |
1981 | be called after MSR_IA32_APICBASE has been set with KVM_SET_MSRS. | |
1982 | ||
1983 | If KVM_X2APIC_API_USE_32BIT_IDS feature is disabled, struct kvm_lapic_state | |
1984 | always uses xAPIC format. | |
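The two layouts can be sketched as a small accessor over the ``regs`` array.
The helper name ``apic_id_from_regs()`` is hypothetical, and the caller is
assumed to already know the APIC mode of the vcpu (from MSR_IA32_APICBASE)
and whether KVM_X2APIC_API_USE_32BIT_IDS is enabled.

```c
#include <stdint.h>

/* regs: the regs[] array of struct kvm_lapic_state, viewed as bytes.
 * x2apic_format: nonzero if the APIC_ID register holds a 32-bit id. */
static uint32_t apic_id_from_regs(const unsigned char *regs, int x2apic_format)
{
	if (x2apic_format) {
		/* x2APIC: the full id lives in bytes 32-35 (little-endian). */
		return (uint32_t)regs[32] |
		       ((uint32_t)regs[33] << 8) |
		       ((uint32_t)regs[34] << 16) |
		       ((uint32_t)regs[35] << 24);
	}
	/* xAPIC: 8-bit id in bits 31-24 of the register, i.e. byte 35. */
	return regs[35];
}
```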
1985 | ||
414fa985 JK |
1986 | |
1987 | 4.58 KVM_SET_LAPIC | |
106ee47d | 1988 | ------------------ |
e7677933 | 1989 | |
106ee47d MCC |
1990 | :Capability: KVM_CAP_IRQCHIP |
1991 | :Architectures: x86 | |
1992 | :Type: vcpu ioctl | |
1993 | :Parameters: struct kvm_lapic_state (in) | |
1994 | :Returns: 0 on success, -1 on error | |
e7677933 | 1995 | |
106ee47d MCC |
1996 | :: |
1997 | ||
1998 | #define KVM_APIC_REG_SIZE 0x400 | |
1999 | struct kvm_lapic_state { | |
e7677933 | 2000 | char regs[KVM_APIC_REG_SIZE]; |
106ee47d | 2001 | }; |
e7677933 | 2002 | |
df5cbb27 | 2003 | Copies the input argument into the Local APIC registers. The data format |
e7677933 AK |
2004 | and layout are the same as documented in the architecture manual. |
2005 | ||
37131313 RK |
2006 | The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's |
2007 | regs field) depends on the state of the KVM_CAP_X2APIC_API capability. | |
2008 | See the note in KVM_GET_LAPIC. | |
2009 | ||
414fa985 JK |
2010 | |
2011 | 4.59 KVM_IOEVENTFD | |
106ee47d | 2012 | ------------------ |
55399a02 | 2013 | |
106ee47d MCC |
2014 | :Capability: KVM_CAP_IOEVENTFD |
2015 | :Architectures: all | |
2016 | :Type: vm ioctl | |
2017 | :Parameters: struct kvm_ioeventfd (in) | |
2018 | :Returns: 0 on success, !0 on error | |
55399a02 SL |
2019 | |
2020 | This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address | |
2021 | within the guest. A guest write to the registered address will signal the | |
2022 | provided event instead of triggering an exit. | |
2023 | ||
106ee47d MCC |
2024 | :: |
2025 | ||
2026 | struct kvm_ioeventfd { | |
55399a02 SL |
2027 | __u64 datamatch; |
2028 | __u64 addr; /* legal pio/mmio address */ | |
e9ea5069 | 2029 | __u32 len; /* 0, 1, 2, 4, or 8 bytes */ |
55399a02 SL |
2030 | __s32 fd; |
2031 | __u32 flags; | |
2032 | __u8 pad[36]; | |
106ee47d | 2033 | }; |
55399a02 | 2034 | |
2b83451b CH |
2035 | For the special case of virtio-ccw devices on s390, the ioevent is matched |
2036 | to a subchannel/virtqueue tuple instead. | |
2037 | ||
106ee47d | 2038 | The following flags are defined:: |
55399a02 | 2039 | |
106ee47d MCC |
2040 | #define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch) |
2041 | #define KVM_IOEVENTFD_FLAG_PIO (1 << kvm_ioeventfd_flag_nr_pio) | |
2042 | #define KVM_IOEVENTFD_FLAG_DEASSIGN (1 << kvm_ioeventfd_flag_nr_deassign) | |
2043 | #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \ | |
2b83451b | 2044 | (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify) |
55399a02 SL |
2045 | |
2046 | If datamatch flag is set, the event will be signaled only if the written value | |
2047 | to the registered address is equal to datamatch in struct kvm_ioeventfd. | |
2048 | ||
2b83451b CH |
2049 | For virtio-ccw devices, addr contains the subchannel id and datamatch the |
2050 | virtqueue index. | |
2051 | ||
e9ea5069 JW |
2052 | With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and |
2053 | the kernel will ignore the length of the guest write and may get a faster vmexit. | |
2054 | The speedup may only apply to specific architectures, but the ioeventfd will | |
2055 | work anyway. | |
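The length and datamatch rules above can be modelled in userspace as two
predicates. This is a model of the documented behaviour, not the kernel's
implementation, and the helper names ``ioeventfd_len_ok()`` and
``ioeventfd_would_signal()`` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is `len` acceptable for struct kvm_ioeventfd? A zero length is only
 * allowed when KVM_CAP_IOEVENTFD_ANY_LENGTH is available. */
static bool ioeventfd_len_ok(uint32_t len, bool any_length_cap)
{
	switch (len) {
	case 1:
	case 2:
	case 4:
	case 8:
		return true;
	case 0:
		return any_length_cap;
	default:
		return false;
	}
}

/* Would a guest write of `written` signal the eventfd? Without the
 * datamatch flag every write to the registered address signals. */
static bool ioeventfd_would_signal(bool datamatch_flag, uint64_t datamatch,
				   uint64_t written)
{
	return !datamatch_flag || written == datamatch;
}
```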
414fa985 JK |
2056 | |
2057 | 4.60 KVM_DIRTY_TLB | |
106ee47d | 2058 | ------------------ |
dc83b8bc | 2059 | |
106ee47d MCC |
2060 | :Capability: KVM_CAP_SW_TLB |
2061 | :Architectures: ppc | |
2062 | :Type: vcpu ioctl | |
2063 | :Parameters: struct kvm_dirty_tlb (in) | |
2064 | :Returns: 0 on success, -1 on error | |
2065 | ||
2066 | :: | |
dc83b8bc | 2067 | |
106ee47d | 2068 | struct kvm_dirty_tlb { |
dc83b8bc SW |
2069 | __u64 bitmap; |
2070 | __u32 num_dirty; | |
106ee47d | 2071 | }; |
dc83b8bc SW |
2072 | |
2073 | This must be called whenever userspace has changed an entry in the shared | |
2074 | TLB, prior to calling KVM_RUN on the associated vcpu. | |
2075 | ||
2076 | The "bitmap" field is the userspace address of an array. This array | |
2077 | consists of a number of bits, equal to the total number of TLB entries as | |
2078 | determined by the last successful call to KVM_CONFIG_TLB, rounded up to the | |
2079 | nearest multiple of 64. | |
2080 | ||
2081 | Each bit corresponds to one TLB entry, ordered the same as in the shared TLB | |
2082 | array. | |
2083 | ||
2084 | The array is little-endian: bit 0 is the least significant bit of the | |
2085 | first byte, bit 8 is the least significant bit of the second byte, etc. | |
2086 | This avoids any complications with differing word sizes. | |
2087 | ||
2088 | The "num_dirty" field is a performance hint for KVM to determine whether it | |
2089 | should skip processing the bitmap and just invalidate everything. It must | |
2090 | be set to the number of set bits in the bitmap. | |
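A minimal sketch of building the bitmap and the ``num_dirty`` hint follows.
Working byte by byte keeps the layout independent of host word size and
endianness, matching the description above. The helper names
``dirty_bitmap_bytes()``, ``mark_tlb_dirty()`` and ``count_dirty()`` are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for `tlb_entries` bits, rounded up to a multiple of 64 bits. */
static size_t dirty_bitmap_bytes(size_t tlb_entries)
{
	return ((tlb_entries + 63) / 64) * 8;
}

/* Mark TLB entry `entry` dirty: bit 0 is the LSB of byte 0, bit 8 is
 * the LSB of byte 1, and so on. */
static void mark_tlb_dirty(uint8_t *bitmap, size_t entry)
{
	bitmap[entry / 8] |= (uint8_t)(1u << (entry % 8));
}

/* Number of set bits, i.e. the value to store in num_dirty. */
static uint32_t count_dirty(const uint8_t *bitmap, size_t bytes)
{
	uint32_t n = 0;
	size_t i;

	for (i = 0; i < bytes; i++) {
		uint8_t b = bitmap[i];

		while (b) {
			n += b & 1u;
			b >>= 1;
		}
	}
	return n;
}
```

The address of the byte array would then be stored in the ``bitmap`` field
and the count in ``num_dirty`` before issuing KVM_DIRTY_TLB.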
2091 | ||
414fa985 | 2092 | |
54738c09 | 2093 | 4.62 KVM_CREATE_SPAPR_TCE |
106ee47d | 2094 | ------------------------- |
54738c09 | 2095 | |
106ee47d MCC |
2096 | :Capability: KVM_CAP_SPAPR_TCE |
2097 | :Architectures: powerpc | |
2098 | :Type: vm ioctl | |
2099 | :Parameters: struct kvm_create_spapr_tce (in) | |
2100 | :Returns: file descriptor for manipulating the created TCE table | |
54738c09 DG |
2101 | |
2102 | This creates a virtual TCE (translation control entry) table, which | |
2103 | is an IOMMU for PAPR-style virtual I/O. It is used to translate | |
2104 | logical addresses used in virtual I/O into guest physical addresses, | |
2105 | and provides a scatter/gather capability for PAPR virtual I/O. | |
2106 | ||
106ee47d MCC |
2107 | :: |
2108 | ||
2109 | /* for KVM_CAP_SPAPR_TCE */ | |
2110 | struct kvm_create_spapr_tce { | |
54738c09 DG |
2111 | __u64 liobn; |
2112 | __u32 window_size; | |
106ee47d | 2113 | }; |
54738c09 DG |
2114 | |
2115 | The liobn field gives the logical IO bus number for which to create a | |
2116 | TCE table. The window_size field specifies the size of the DMA window | |
2117 | which this TCE table will translate - the table will contain one 64 | |
2118 | bit TCE entry for every 4 KiB of the DMA window. | |
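The relationship between the DMA window and the table can be worked through
in a few lines. The helper names ``tce_entries()`` and ``tce_table_bytes()``
are hypothetical, and the sketch assumes ``window_size`` is a multiple of
4 KiB.

```c
#include <stdint.h>

/* One TCE covers 4 KiB of the DMA window. */
static uint64_t tce_entries(uint32_t window_size)
{
	return window_size / 4096u;
}

/* Each TCE is 64 bits wide, so the table needs 8 bytes per entry. */
static uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_entries(window_size) * sizeof(uint64_t);
}
```

For example, a 1 GiB window needs 262144 entries, which is a 2 MiB table.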
2119 | ||
2120 | When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE | |
2121 | table has been created using this ioctl(), the kernel will handle it | |
2122 | in real mode, updating the TCE table. H_PUT_TCE calls for other | |
2123 | liobns will cause a vm exit and must be handled by userspace. | |
2124 | ||
2125 | The return value is a file descriptor which can be passed to mmap(2) | |
2126 | to map the created TCE table into userspace. This lets userspace read | |
2127 | the entries written by kernel-handled H_PUT_TCE calls, and also lets | |
2128 | userspace update the TCE table directly which is useful in some | |
2129 | circumstances. | |
2130 | ||
414fa985 | 2131 | |
aa04b4cc | 2132 | 4.63 KVM_ALLOCATE_RMA |
106ee47d | 2133 | --------------------- |
aa04b4cc | 2134 | |
106ee47d MCC |
2135 | :Capability: KVM_CAP_PPC_RMA |
2136 | :Architectures: powerpc | |
2137 | :Type: vm ioctl | |
2138 | :Parameters: struct kvm_allocate_rma (out) | |
2139 | :Returns: file descriptor for mapping the allocated RMA | |
aa04b4cc PM |
2140 | |
2141 | This allocates a Real Mode Area (RMA) from the pool allocated at boot | |
2142 | time by the kernel. An RMA is a physically-contiguous, aligned region | |
2143 | of memory used on older POWER processors to provide the memory which | |
2144 | will be accessed by real-mode (MMU off) accesses in a KVM guest. | |
2145 | POWER processors support a set of sizes for the RMA that usually | |
2146 | includes 64MB, 128MB, 256MB and some larger powers of two. | |
2147 | ||
106ee47d MCC |
2148 | :: |
2149 | ||
2150 | /* for KVM_ALLOCATE_RMA */ | |
2151 | struct kvm_allocate_rma { | |
aa04b4cc | 2152 | __u64 rma_size; |
106ee47d | 2153 | }; |
aa04b4cc PM |
2154 | |
2155 | The return value is a file descriptor which can be passed to mmap(2) | |
2156 | to map the allocated RMA into userspace. The mapped area can then be | |
2157 | passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the | |
2158 | RMA for a virtual machine. The size of the RMA in bytes (which is | |
2159 | fixed at host kernel boot time) is returned in the rma_size field of | |
2160 | the argument structure. | |
2161 | ||
2162 | The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl | |
2163 | is supported; 2 if the processor requires all virtual machines to have | |
2164 | an RMA, or 1 if the processor can use an RMA but doesn't require it, | |
2165 | because it supports the Virtual RMA (VRMA) facility. | |
2166 | ||
414fa985 | 2167 | |
3f745f1e | 2168 | 4.64 KVM_NMI |
106ee47d | 2169 | ------------ |
3f745f1e | 2170 | |
106ee47d MCC |
2171 | :Capability: KVM_CAP_USER_NMI |
2172 | :Architectures: x86 | |
2173 | :Type: vcpu ioctl | |
2174 | :Parameters: none | |
2175 | :Returns: 0 on success, -1 on error | |
3f745f1e AK |
2176 | |
2177 | Queues an NMI on the thread's vcpu. Note this is well defined only | |
2178 | when KVM_CREATE_IRQCHIP has not been called, since this is an interface | |
2179 | between the virtual cpu core and virtual local APIC. After KVM_CREATE_IRQCHIP | |
2180 | has been called, this interface is completely emulated within the kernel. | |
2181 | ||
2182 | To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the | |
2183 | following algorithm: | |
2184 | ||
5d4f6f3d | 2185 | - pause the vcpu |
3f745f1e AK |
2186 | - read the local APIC's state (KVM_GET_LAPIC) |
2187 | - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1) | |
2188 | - if so, issue KVM_NMI | |
2189 | - resume the vcpu | |
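
The "check whether changing LINT1 will queue an NMI" step above can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: it assumes the buffer is the register page returned by KVM_GET_LAPIC, that the LVT LINT1 entry lives at offset 0x360 of that page, that delivery mode 100b in bits 10:8 means NMI, and that bit 16 is the mask bit (the layout described in the x86 local APIC documentation). The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define APIC_LVT_LINT1_OFFSET 0x360u       /* assumed LVT LINT1 register offset */
#define APIC_LVT_MODE_MASK    (0x7u << 8)  /* delivery mode, bits 10:8 */
#define APIC_LVT_MODE_NMI     (0x4u << 8)  /* delivery mode 100b = NMI */
#define APIC_LVT_MASKED       (1u << 16)   /* entry is masked */

/*
 * Return true if asserting LINT1 would deliver an NMI, given the raw
 * local APIC register page (at least 0x364 bytes long).
 */
static inline bool lint1_delivers_nmi(const void *lapic_regs)
{
    uint32_t lvt;

    /* memcpy avoids alignment and aliasing problems on the raw buffer. */
    memcpy(&lvt, (const unsigned char *)lapic_regs + APIC_LVT_LINT1_OFFSET,
           sizeof(lvt));

    if (lvt & APIC_LVT_MASKED)
        return false;
    return (lvt & APIC_LVT_MODE_MASK) == APIC_LVT_MODE_NMI;
}
```

A caller would issue KVM_NMI only when this helper returns true, between pausing and resuming the vcpu.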
2190 | ||
2191 | Some guests configure the LINT1 NMI input to cause a panic, aiding in | |
2192 | debugging. | |
2193 | ||
414fa985 | 2194 | |
e24ed81f | 2195 | 4.65 KVM_S390_UCAS_MAP |
106ee47d | 2196 | ---------------------- |
27e0393f | 2197 | |
106ee47d MCC |
2198 | :Capability: KVM_CAP_S390_UCONTROL |
2199 | :Architectures: s390 | |
2200 | :Type: vcpu ioctl | |
2201 | :Parameters: struct kvm_s390_ucas_mapping (in) | |
2202 | :Returns: 0 in case of success | |
2203 | ||
2204 | The parameter is defined like this:: | |
27e0393f | 2205 | |
27e0393f CO |
2206 | struct kvm_s390_ucas_mapping { |
2207 | __u64 user_addr; | |
2208 | __u64 vcpu_addr; | |
2209 | __u64 length; | |
2210 | }; | |
2211 | ||
2212 | This ioctl maps the memory at "user_addr" with the length "length" to | |
2213 | the vcpu's address space starting at "vcpu_addr". All parameters must be | |
f884ab15 | 2214 | aligned to a 1 megabyte boundary. |
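
Userspace may want to validate the alignment requirement before issuing the ioctl. The sketch below uses a local mirror of the structure (named ucas_mapping here so that it does not clash with the real header) and a hypothetical helper name; the 1 megabyte constant is the only value taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct kvm_s390_ucas_mapping, for illustration only. */
struct ucas_mapping {
    uint64_t user_addr;
    uint64_t vcpu_addr;
    uint64_t length;
};

#define UCAS_ALIGNMENT (1ULL << 20)  /* 1 megabyte */

/* True if all three fields are multiples of 1 megabyte. */
static inline bool ucas_mapping_is_aligned(const struct ucas_mapping *m)
{
    /* OR-ing the fields lets one mask test cover all of them. */
    return ((m->user_addr | m->vcpu_addr | m->length) &
            (UCAS_ALIGNMENT - 1)) == 0;
}
```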
27e0393f | 2215 | |
414fa985 | 2216 | |
e24ed81f | 2217 | 4.66 KVM_S390_UCAS_UNMAP |
106ee47d | 2218 | ------------------------ |
27e0393f | 2219 | |
106ee47d MCC |
2220 | :Capability: KVM_CAP_S390_UCONTROL |
2221 | :Architectures: s390 | |
2222 | :Type: vcpu ioctl | |
2223 | :Parameters: struct kvm_s390_ucas_mapping (in) | |
2224 | :Returns: 0 in case of success | |
2225 | ||
2226 | The parameter is defined like this:: | |
27e0393f | 2227 | |
27e0393f CO |
2228 | struct kvm_s390_ucas_mapping { |
2229 | __u64 user_addr; | |
2230 | __u64 vcpu_addr; | |
2231 | __u64 length; | |
2232 | }; | |
2233 | ||
2234 | This ioctl unmaps the memory in the vcpu's address space starting at | |
2235 | "vcpu_addr" with the length "length". The field "user_addr" is ignored. | |
f884ab15 | 2236 | All parameters must be aligned to a 1 megabyte boundary. |
27e0393f | 2237 | |
414fa985 | 2238 | |
e24ed81f | 2239 | 4.67 KVM_S390_VCPU_FAULT |
106ee47d | 2240 | ------------------------ |
ccc7910f | 2241 | |
106ee47d MCC |
2242 | :Capability: KVM_CAP_S390_UCONTROL |
2243 | :Architectures: s390 | |
2244 | :Type: vcpu ioctl | |
2245 | :Parameters: vcpu absolute address (in) | |
2246 | :Returns: 0 in case of success | |
ccc7910f CO |
2247 | |
2248 | This call creates a page table entry on the virtual cpu's address space | |
2249 | (for user controlled virtual machines) or the virtual machine's address | |
2250 | space (for regular virtual machines). This only works for minor faults, | |
2251 | so it is recommended to first access the memory page in question via the | |
2252 | user page table. This is useful to handle validity intercepts for user | |
2253 | controlled virtual machines to fault in the virtual cpu's lowcore pages | |
2254 | prior to calling the KVM_RUN ioctl. | |
2255 | ||
414fa985 | 2256 | |
e24ed81f | 2257 | 4.68 KVM_SET_ONE_REG |
106ee47d MCC |
2258 | -------------------- |
2259 | ||
2260 | :Capability: KVM_CAP_ONE_REG | |
2261 | :Architectures: all | |
2262 | :Type: vcpu ioctl | |
2263 | :Parameters: struct kvm_one_reg (in) | |
2264 | :Returns: 0 on success, negative value on failure | |
e24ed81f | 2265 | |
395f562f | 2266 | Errors: |
106ee47d MCC |
2267 | |
2268 | ====== ============================================================ | |
3b1c8c56 MCC |
2269 | ENOENT no such register |
2270 | EINVAL invalid register ID, or no such register, or used with VMs in | |
68cf7b1f | 2271 | protected virtualization mode on s390 |
3b1c8c56 | 2272 | EPERM (arm64) register access not allowed before vcpu finalization |
106ee47d MCC |
2273 | ====== ============================================================ |
2274 | ||
fe365b4e DM |
2275 | (These error codes are indicative only: do not rely on a specific error |
2276 | code being returned in a specific situation.) | |
e24ed81f | 2277 | |
106ee47d MCC |
2278 | :: |
2279 | ||
2280 | struct kvm_one_reg { | |
e24ed81f AG |
2281 | __u64 id; |
2282 | __u64 addr; | |
106ee47d | 2283 | }; |
e24ed81f AG |
2284 | |
2285 | Using this ioctl, a single vcpu register can be set to a specific value | |
2286 | defined by user space with the passed in struct kvm_one_reg, where id | |
2287 | refers to the register identifier as described below and addr is a pointer | |
2288 | to a variable of the respective size. There can be architecture-agnostic | |
2289 | and architecture-specific registers. Each has its own range of operation | |
2290 | and its own constants and width. The implemented registers are listed | |
2291 | below: | |
2292 | ||
106ee47d MCC |
2293 | ======= =============================== ============ |
2294 | Arch Register Width (bits) | |
2295 | ======= =============================== ============ | |
2296 | PPC KVM_REG_PPC_HIOR 64 | |
2297 | PPC KVM_REG_PPC_IAC1 64 | |
2298 | PPC KVM_REG_PPC_IAC2 64 | |
2299 | PPC KVM_REG_PPC_IAC3 64 | |
2300 | PPC KVM_REG_PPC_IAC4 64 | |
2301 | PPC KVM_REG_PPC_DAC1 64 | |
2302 | PPC KVM_REG_PPC_DAC2 64 | |
2303 | PPC KVM_REG_PPC_DABR 64 | |
2304 | PPC KVM_REG_PPC_DSCR 64 | |
2305 | PPC KVM_REG_PPC_PURR 64 | |
2306 | PPC KVM_REG_PPC_SPURR 64 | |
2307 | PPC KVM_REG_PPC_DAR 64 | |
2308 | PPC KVM_REG_PPC_DSISR 32 | |
2309 | PPC KVM_REG_PPC_AMR 64 | |
2310 | PPC KVM_REG_PPC_UAMOR 64 | |
2311 | PPC KVM_REG_PPC_MMCR0 64 | |
2312 | PPC KVM_REG_PPC_MMCR1 64 | |
2313 | PPC KVM_REG_PPC_MMCRA 64 | |
2314 | PPC KVM_REG_PPC_MMCR2 64 | |
2315 | PPC KVM_REG_PPC_MMCRS 64 | |
5752fe0b | 2316 | PPC KVM_REG_PPC_MMCR3 64 |
106ee47d MCC |
2317 | PPC KVM_REG_PPC_SIAR 64 |
2318 | PPC KVM_REG_PPC_SDAR 64 | |
2319 | PPC KVM_REG_PPC_SIER 64 | |
5752fe0b AR |
2320 | PPC KVM_REG_PPC_SIER2 64 |
2321 | PPC KVM_REG_PPC_SIER3 64 | |
106ee47d MCC |
2322 | PPC KVM_REG_PPC_PMC1 32 |
2323 | PPC KVM_REG_PPC_PMC2 32 | |
2324 | PPC KVM_REG_PPC_PMC3 32 | |
2325 | PPC KVM_REG_PPC_PMC4 32 | |
2326 | PPC KVM_REG_PPC_PMC5 32 | |
2327 | PPC KVM_REG_PPC_PMC6 32 | |
2328 | PPC KVM_REG_PPC_PMC7 32 | |
2329 | PPC KVM_REG_PPC_PMC8 32 | |
2330 | PPC KVM_REG_PPC_FPR0 64 | |
2331 | ... | |
2332 | PPC KVM_REG_PPC_FPR31 64 | |
2333 | PPC KVM_REG_PPC_VR0 128 | |
2334 | ... | |
2335 | PPC KVM_REG_PPC_VR31 128 | |
2336 | PPC KVM_REG_PPC_VSR0 128 | |
2337 | ... | |
2338 | PPC KVM_REG_PPC_VSR31 128 | |
2339 | PPC KVM_REG_PPC_FPSCR 64 | |
2340 | PPC KVM_REG_PPC_VSCR 32 | |
2341 | PPC KVM_REG_PPC_VPA_ADDR 64 | |
2342 | PPC KVM_REG_PPC_VPA_SLB 128 | |
2343 | PPC KVM_REG_PPC_VPA_DTL 128 | |
2344 | PPC KVM_REG_PPC_EPCR 32 | |
2345 | PPC KVM_REG_PPC_EPR 32 | |
2346 | PPC KVM_REG_PPC_TCR 32 | |
2347 | PPC KVM_REG_PPC_TSR 32 | |
2348 | PPC KVM_REG_PPC_OR_TSR 32 | |
2349 | PPC KVM_REG_PPC_CLEAR_TSR 32 | |
2350 | PPC KVM_REG_PPC_MAS0 32 | |
2351 | PPC KVM_REG_PPC_MAS1 32 | |
2352 | PPC KVM_REG_PPC_MAS2 64 | |
2353 | PPC KVM_REG_PPC_MAS7_3 64 | |
2354 | PPC KVM_REG_PPC_MAS4 32 | |
2355 | PPC KVM_REG_PPC_MAS6 32 | |
2356 | PPC KVM_REG_PPC_MMUCFG 32 | |
2357 | PPC KVM_REG_PPC_TLB0CFG 32 | |
2358 | PPC KVM_REG_PPC_TLB1CFG 32 | |
2359 | PPC KVM_REG_PPC_TLB2CFG 32 | |
2360 | PPC KVM_REG_PPC_TLB3CFG 32 | |
2361 | PPC KVM_REG_PPC_TLB0PS 32 | |
2362 | PPC KVM_REG_PPC_TLB1PS 32 | |
2363 | PPC KVM_REG_PPC_TLB2PS 32 | |
2364 | PPC KVM_REG_PPC_TLB3PS 32 | |
2365 | PPC KVM_REG_PPC_EPTCFG 32 | |
2366 | PPC KVM_REG_PPC_ICP_STATE 64 | |
2367 | PPC KVM_REG_PPC_VP_STATE 128 | |
2368 | PPC KVM_REG_PPC_TB_OFFSET 64 | |
2369 | PPC KVM_REG_PPC_SPMC1 32 | |
2370 | PPC KVM_REG_PPC_SPMC2 32 | |
2371 | PPC KVM_REG_PPC_IAMR 64 | |
2372 | PPC KVM_REG_PPC_TFHAR 64 | |
2373 | PPC KVM_REG_PPC_TFIAR 64 | |
2374 | PPC KVM_REG_PPC_TEXASR 64 | |
2375 | PPC KVM_REG_PPC_FSCR 64 | |
2376 | PPC KVM_REG_PPC_PSPB 32 | |
2377 | PPC KVM_REG_PPC_EBBHR 64 | |
2378 | PPC KVM_REG_PPC_EBBRR 64 | |
2379 | PPC KVM_REG_PPC_BESCR 64 | |
2380 | PPC KVM_REG_PPC_TAR 64 | |
2381 | PPC KVM_REG_PPC_DPDES 64 | |
2382 | PPC KVM_REG_PPC_DAWR 64 | |
2383 | PPC KVM_REG_PPC_DAWRX 64 | |
2384 | PPC KVM_REG_PPC_CIABR 64 | |
2385 | PPC KVM_REG_PPC_IC 64 | |
2386 | PPC KVM_REG_PPC_VTB 64 | |
2387 | PPC KVM_REG_PPC_CSIGR 64 | |
2388 | PPC KVM_REG_PPC_TACR 64 | |
2389 | PPC KVM_REG_PPC_TCSCR 64 | |
2390 | PPC KVM_REG_PPC_PID 64 | |
2391 | PPC KVM_REG_PPC_ACOP 64 | |
2392 | PPC KVM_REG_PPC_VRSAVE 32 | |
2393 | PPC KVM_REG_PPC_LPCR 32 | |
2394 | PPC KVM_REG_PPC_LPCR_64 64 | |
2395 | PPC KVM_REG_PPC_PPR 64 | |
2396 | PPC KVM_REG_PPC_ARCH_COMPAT 32 | |
2397 | PPC KVM_REG_PPC_DABRX 32 | |
2398 | PPC KVM_REG_PPC_WORT 64 | |
2399 | PPC KVM_REG_PPC_SPRG9 64 | |
2400 | PPC KVM_REG_PPC_DBSR 32 | |
2401 | PPC KVM_REG_PPC_TIDR 64 | |
2402 | PPC KVM_REG_PPC_PSSCR 64 | |
2403 | PPC KVM_REG_PPC_DEC_EXPIRY 64 | |
2404 | PPC KVM_REG_PPC_PTCR 64 | |
bd1de1a0 RB |
2405 | PPC KVM_REG_PPC_DAWR1 64 |
2406 | PPC KVM_REG_PPC_DAWRX1 64 | |
106ee47d MCC |
2407 | PPC KVM_REG_PPC_TM_GPR0 64 |
2408 | ... | |
2409 | PPC KVM_REG_PPC_TM_GPR31 64 | |
2410 | PPC KVM_REG_PPC_TM_VSR0 128 | |
2411 | ... | |
2412 | PPC KVM_REG_PPC_TM_VSR63 128 | |
2413 | PPC KVM_REG_PPC_TM_CR 64 | |
2414 | PPC KVM_REG_PPC_TM_LR 64 | |
2415 | PPC KVM_REG_PPC_TM_CTR 64 | |
2416 | PPC KVM_REG_PPC_TM_FPSCR 64 | |
2417 | PPC KVM_REG_PPC_TM_AMR 64 | |
2418 | PPC KVM_REG_PPC_TM_PPR 64 | |
2419 | PPC KVM_REG_PPC_TM_VRSAVE 64 | |
2420 | PPC KVM_REG_PPC_TM_VSCR 32 | |
2421 | PPC KVM_REG_PPC_TM_DSCR 64 | |
2422 | PPC KVM_REG_PPC_TM_TAR 64 | |
2423 | PPC KVM_REG_PPC_TM_XER 64 | |
2424 | ||
2425 | MIPS KVM_REG_MIPS_R0 64 | |
2426 | ... | |
2427 | MIPS KVM_REG_MIPS_R31 64 | |
2428 | MIPS KVM_REG_MIPS_HI 64 | |
2429 | MIPS KVM_REG_MIPS_LO 64 | |
2430 | MIPS KVM_REG_MIPS_PC 64 | |
2431 | MIPS KVM_REG_MIPS_CP0_INDEX 32 | |
2432 | MIPS KVM_REG_MIPS_CP0_ENTRYLO0 64 | |
2433 | MIPS KVM_REG_MIPS_CP0_ENTRYLO1 64 | |
2434 | MIPS KVM_REG_MIPS_CP0_CONTEXT 64 | |
2435 | MIPS KVM_REG_MIPS_CP0_CONTEXTCONFIG 32 | |
2436 | MIPS KVM_REG_MIPS_CP0_USERLOCAL 64 | |
2437 | MIPS KVM_REG_MIPS_CP0_XCONTEXTCONFIG 64 | |
2438 | MIPS KVM_REG_MIPS_CP0_PAGEMASK 32 | |
2439 | MIPS KVM_REG_MIPS_CP0_PAGEGRAIN 32 | |
2440 | MIPS KVM_REG_MIPS_CP0_SEGCTL0 64 | |
2441 | MIPS KVM_REG_MIPS_CP0_SEGCTL1 64 | |
2442 | MIPS KVM_REG_MIPS_CP0_SEGCTL2 64 | |
2443 | MIPS KVM_REG_MIPS_CP0_PWBASE 64 | |
2444 | MIPS KVM_REG_MIPS_CP0_PWFIELD 64 | |
2445 | MIPS KVM_REG_MIPS_CP0_PWSIZE 64 | |
2446 | MIPS KVM_REG_MIPS_CP0_WIRED 32 | |
2447 | MIPS KVM_REG_MIPS_CP0_PWCTL 32 | |
2448 | MIPS KVM_REG_MIPS_CP0_HWRENA 32 | |
2449 | MIPS KVM_REG_MIPS_CP0_BADVADDR 64 | |
2450 | MIPS KVM_REG_MIPS_CP0_BADINSTR 32 | |
2451 | MIPS KVM_REG_MIPS_CP0_BADINSTRP 32 | |
2452 | MIPS KVM_REG_MIPS_CP0_COUNT 32 | |
2453 | MIPS KVM_REG_MIPS_CP0_ENTRYHI 64 | |
2454 | MIPS KVM_REG_MIPS_CP0_COMPARE 32 | |
2455 | MIPS KVM_REG_MIPS_CP0_STATUS 32 | |
2456 | MIPS KVM_REG_MIPS_CP0_INTCTL 32 | |
2457 | MIPS KVM_REG_MIPS_CP0_CAUSE 32 | |
2458 | MIPS KVM_REG_MIPS_CP0_EPC 64 | |
2459 | MIPS KVM_REG_MIPS_CP0_PRID 32 | |
2460 | MIPS KVM_REG_MIPS_CP0_EBASE 64 | |
2461 | MIPS KVM_REG_MIPS_CP0_CONFIG 32 | |
2462 | MIPS KVM_REG_MIPS_CP0_CONFIG1 32 | |
2463 | MIPS KVM_REG_MIPS_CP0_CONFIG2 32 | |
2464 | MIPS KVM_REG_MIPS_CP0_CONFIG3 32 | |
2465 | MIPS KVM_REG_MIPS_CP0_CONFIG4 32 | |
2466 | MIPS KVM_REG_MIPS_CP0_CONFIG5 32 | |
2467 | MIPS KVM_REG_MIPS_CP0_CONFIG7 32 | |
2468 | MIPS KVM_REG_MIPS_CP0_XCONTEXT 64 | |
2469 | MIPS KVM_REG_MIPS_CP0_ERROREPC 64 | |
2470 | MIPS KVM_REG_MIPS_CP0_KSCRATCH1 64 | |
2471 | MIPS KVM_REG_MIPS_CP0_KSCRATCH2 64 | |
2472 | MIPS KVM_REG_MIPS_CP0_KSCRATCH3 64 | |
2473 | MIPS KVM_REG_MIPS_CP0_KSCRATCH4 64 | |
2474 | MIPS KVM_REG_MIPS_CP0_KSCRATCH5 64 | |
2475 | MIPS KVM_REG_MIPS_CP0_KSCRATCH6 64 | |
2476 | MIPS KVM_REG_MIPS_CP0_MAAR(0..63) 64 | |
2477 | MIPS KVM_REG_MIPS_COUNT_CTL 64 | |
2478 | MIPS KVM_REG_MIPS_COUNT_RESUME 64 | |
2479 | MIPS KVM_REG_MIPS_COUNT_HZ 64 | |
2480 | MIPS KVM_REG_MIPS_FPR_32(0..31) 32 | |
2481 | MIPS KVM_REG_MIPS_FPR_64(0..31) 64 | |
2482 | MIPS KVM_REG_MIPS_VEC_128(0..31) 128 | |
2483 | MIPS KVM_REG_MIPS_FCR_IR 32 | |
2484 | MIPS KVM_REG_MIPS_FCR_CSR 32 | |
2485 | MIPS KVM_REG_MIPS_MSA_IR 32 | |
2486 | MIPS KVM_REG_MIPS_MSA_CSR 32 | |
2487 | ======= =============================== ============ | |
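
The width of a register is also encoded in its id, so userspace can size the buffer that addr points to without consulting the table. The sketch below assumes the size field occupies bits 55:52 of the id and holds log2 of the size in bytes, mirroring KVM_REG_SIZE_SHIFT and KVM_REG_SIZE_MASK in the uapi header. The struct is a local mirror of struct kvm_one_reg and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define REG_SIZE_SHIFT 52                      /* assumed: KVM_REG_SIZE_SHIFT */
#define REG_SIZE_MASK  0x00f0000000000000ULL   /* assumed: KVM_REG_SIZE_MASK */

/* Local mirror of struct kvm_one_reg, for illustration only. */
struct one_reg {
    uint64_t id;
    uint64_t addr;
};

/* Size in bytes of the register named by id. */
static inline size_t reg_size_bytes(uint64_t id)
{
    return (size_t)1 << ((id & REG_SIZE_MASK) >> REG_SIZE_SHIFT);
}

/*
 * Fill in reg so that it refers to buf. Returns 0 on success, or -1 if
 * buf is too small to hold the register.
 */
static inline int prepare_one_reg(struct one_reg *reg, uint64_t id,
                                  void *buf, size_t buf_len)
{
    if (buf_len < reg_size_bytes(id))
        return -1;
    reg->id = id;
    reg->addr = (uint64_t)(uintptr_t)buf;  /* pointer passed as a 64-bit value */
    return 0;
}
```

The prepared structure would then be passed to KVM_SET_ONE_REG or KVM_GET_ONE_REG on the vcpu file descriptor.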
414fa985 | 2488 | |
749cf76c CD |
2489 | ARM registers are mapped using the lower 32 bits. The upper 16 of that |
2490 | is the register group type, or coprocessor number: | |
2491 | ||
106ee47d MCC |
2492 | ARM core registers have the following id bit patterns:: |
2493 | ||
aa404ddf | 2494 | 0x4020 0000 0010 <index into the kvm_regs struct:16> |
749cf76c | 2495 | |
106ee47d MCC |
2496 | ARM 32-bit CP15 registers have the following id bit patterns:: |
2497 | ||
aa404ddf | 2498 | 0x4020 0000 000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3> |
1138245c | 2499 | |
106ee47d MCC |
2500 | ARM 64-bit CP15 registers have the following id bit patterns:: |
2501 | ||
aa404ddf | 2502 | 0x4030 0000 000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3> |
749cf76c | 2503 | |
106ee47d MCC |
2504 | ARM CCSIDR registers are demultiplexed by CSSELR value:: |
2505 | ||
aa404ddf | 2506 | 0x4020 0000 0011 00 <csselr:8> |
749cf76c | 2507 | |
106ee47d MCC |
2508 | ARM 32-bit VFP control registers have the following id bit patterns:: |
2509 | ||
aa404ddf | 2510 | 0x4020 0000 0012 1 <regno:12> |
4fe21e4c | 2511 | |
106ee47d MCC |
2512 | ARM 64-bit FP registers have the following id bit patterns:: |
2513 | ||
aa404ddf | 2514 | 0x4030 0000 0012 0 <regno:12> |
4fe21e4c | 2515 | |
106ee47d MCC |
2516 | ARM firmware pseudo-registers have the following bit pattern:: |
2517 | ||
85bd0ba1 MZ |
2518 | 0x4030 0000 0014 <regno:16> |
2519 | ||
379e04c7 MZ |
2520 | |
2521 | arm64 registers are mapped using the lower 32 bits. The upper 16 of | |
2522 | that is the register group type, or coprocessor number: | |
2523 | ||
2524 | arm64 core/FP-SIMD registers have the following id bit patterns. Note | |
2525 | that the size of the access is variable, as the kvm_regs structure | |
2526 | contains elements ranging from 32 to 128 bits. The index is a 32bit | |
106ee47d MCC |
2527 | value in the kvm_regs structure seen as a 32bit array:: |
2528 | ||
379e04c7 MZ |
2529 | 0x60x0 0000 0010 <index into the kvm_regs struct:16> |
2530 | ||
fd3bc912 | 2531 | Specifically: |
106ee47d MCC |
2532 | |
2533 | ======================= ========= ===== ======================================= | |
fd3bc912 | 2534 | Encoding Register Bits kvm_regs member |
106ee47d | 2535 | ======================= ========= ===== ======================================= |
fd3bc912 DM |
2536 | 0x6030 0000 0010 0000 X0 64 regs.regs[0] |
2537 | 0x6030 0000 0010 0002 X1 64 regs.regs[1] | |
106ee47d | 2538 | ... |
fd3bc912 DM |
2539 | 0x6030 0000 0010 003c X30 64 regs.regs[30] |
2540 | 0x6030 0000 0010 003e SP 64 regs.sp | |
2541 | 0x6030 0000 0010 0040 PC 64 regs.pc | |
2542 | 0x6030 0000 0010 0042 PSTATE 64 regs.pstate | |
2543 | 0x6030 0000 0010 0044 SP_EL1 64 sp_el1 | |
2544 | 0x6030 0000 0010 0046 ELR_EL1 64 elr_el1 | |
2545 | 0x6030 0000 0010 0048 SPSR_EL1 64 spsr[KVM_SPSR_EL1] (alias SPSR_SVC) | |
2546 | 0x6030 0000 0010 004a SPSR_ABT 64 spsr[KVM_SPSR_ABT] | |
2547 | 0x6030 0000 0010 004c SPSR_UND 64 spsr[KVM_SPSR_UND] | |
2548 | 0x6030 0000 0010 004e SPSR_IRQ 64 spsr[KVM_SPSR_IRQ] | |
2549 | 0x6030 0000 0010 0050 SPSR_FIQ 64 spsr[KVM_SPSR_FIQ] | |
106ee47d MCC |
2550 | 0x6040 0000 0010 0054 V0 128 fp_regs.vregs[0] [1]_ |
2551 | 0x6040 0000 0010 0058 V1 128 fp_regs.vregs[1] [1]_ | |
2552 | ... | |
2553 | 0x6040 0000 0010 00d0 V31 128 fp_regs.vregs[31] [1]_ | |
fd3bc912 DM |
2554 | 0x6020 0000 0010 00d4 FPSR 32 fp_regs.fpsr |
2555 | 0x6020 0000 0010 00d5 FPCR 32 fp_regs.fpcr | |
106ee47d | 2556 | ======================= ========= ===== ======================================= |
fd3bc912 | 2557 | |
106ee47d MCC |
2558 | .. [1] These encodings are not accepted for SVE-enabled vcpus. See |
2559 | KVM_ARM_VCPU_INIT. | |
50036ad0 | 2560 | |
106ee47d MCC |
2561 | The equivalent register content can be accessed via bits [127:0] of |
2562 | the corresponding SVE Zn registers instead for vcpus that have SVE | |
2563 | enabled (see below). | |
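
Because the index counts 32-bit words, a core register id can be derived from the byte offset of the member within struct kvm_regs. The following sketch is illustrative only: the function name and the choice to pass the size as log2 of the byte count are assumptions, while the constants come from the bit pattern and table above.

```c
#include <stdint.h>

#define ARM64_ID_BASE    0x6000000000000000ULL  /* arm64 architecture prefix */
#define ARM64_CORE_GROUP 0x0000000000100000ULL  /* core register group, 0x0010 */
#define ID_SIZE_SHIFT    52                     /* size field, log2(bytes) */

/*
 * Build the id of a core register.
 * log2_bytes:  2 for 32-bit, 3 for 64-bit, 4 for 128-bit registers.
 * byte_offset: offset of the member within struct kvm_regs.
 */
static inline uint64_t arm64_core_reg_id(unsigned int log2_bytes,
                                         unsigned int byte_offset)
{
    /* The index is expressed in units of 32-bit words. */
    uint64_t index = byte_offset / sizeof(uint32_t);

    return ARM64_ID_BASE | ((uint64_t)log2_bytes << ID_SIZE_SHIFT) |
           ARM64_CORE_GROUP | index;
}
```

In real code the byte offset would normally come from offsetof() applied to struct kvm_regs rather than from a literal.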
2564 | ||
2565 | arm64 CCSIDR registers are demultiplexed by CSSELR value:: | |
50036ad0 | 2566 | |
379e04c7 MZ |
2567 | 0x6020 0000 0011 00 <csselr:8> |
2568 | ||
106ee47d MCC |
2569 | arm64 system registers have the following id bit patterns:: |
2570 | ||
379e04c7 MZ |
2571 | 0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3> |
2572 | ||
106ee47d MCC |
2573 | .. warning:: |
2574 | ||
290a6bb0 AJ |
2575 | Two system register IDs do not follow the specified pattern. These |
2576 | are KVM_REG_ARM_TIMER_CVAL and KVM_REG_ARM_TIMER_CNT, which map to | |
2577 | system registers CNTV_CVAL_EL0 and CNTVCT_EL0 respectively. These | |
2578 | two had their values accidentally swapped, which means TIMER_CVAL is | |
2579 | derived from the register encoding for CNTVCT_EL0 and TIMER_CNT is | |
2580 | derived from the register encoding for CNTV_CVAL_EL0. As this is | |
2581 | API, it must remain this way. | |
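
The system register pattern can be packed with a few shifts. The sketch below is a hypothetical helper that follows the field widths given in the pattern (op0:2, op1:3, crn:4, crm:4, op2:3); the kernel headers provide their own macro for this purpose, which this example does not reproduce. The operand values used in the usage note are the architectural encodings of the named registers as the author understands them and should be checked against the Arm architecture reference.

```c
#include <stdint.h>

#define ARM64_SYSREG_BASE 0x6030000000130000ULL  /* arm64, 64-bit, group 0x0013 */

/* Pack op0/op1/crn/crm/op2 into a system register id. */
static inline uint64_t arm64_sys_reg_id(unsigned int op0, unsigned int op1,
                                        unsigned int crn, unsigned int crm,
                                        unsigned int op2)
{
    return ARM64_SYSREG_BASE |
           ((uint64_t)(op0 & 0x3) << 14) |
           ((uint64_t)(op1 & 0x7) << 11) |
           ((uint64_t)(crn & 0xf) << 7) |
           ((uint64_t)(crm & 0xf) << 3) |
           (uint64_t)(op2 & 0x7);
}
```

For example, arm64_sys_reg_id(3, 0, 0, 0, 5), the encoding of MPIDR_EL1, gives 0x603000000013c005. Because of the swap described in the warning, the encoding of CNTVCT_EL0 (3, 3, 14, 0, 2) produces the id that KVM exposes as KVM_REG_ARM_TIMER_CVAL, and the encoding of CNTV_CVAL_EL0 (3, 3, 14, 3, 2) produces KVM_REG_ARM_TIMER_CNT.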
2582 | ||
106ee47d MCC |
2583 | arm64 firmware pseudo-registers have the following bit pattern:: |
2584 | ||
85bd0ba1 MZ |
2585 | 0x6030 0000 0014 <regno:16> |
2586 | ||
106ee47d MCC |
2587 | arm64 SVE registers have the following bit patterns:: |
2588 | ||
50036ad0 DM |
2589 | 0x6080 0000 0015 00 <n:5> <slice:5> Zn bits[2048*slice + 2047 : 2048*slice] |
2590 | 0x6050 0000 0015 04 <n:4> <slice:5> Pn bits[256*slice + 255 : 256*slice] | |
2591 | 0x6050 0000 0015 060 <slice:5> FFR bits[256*slice + 255 : 256*slice] | |
2592 | 0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register | |
2593 | ||
43b8e1f0 DM |
2594 | Access to register IDs where 2048 * slice >= 128 * max_vq will fail with |
2595 | ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit | |
106ee47d | 2596 | quadwords: see [2]_ below. |
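
The SVE patterns and the slice visibility rule can be written out as small helpers. This is a sketch with hypothetical function names; the base constants and field widths are copied from the patterns above, and the visibility check restates the condition 2048 * slice < 128 * max_vq.

```c
#include <stdbool.h>
#include <stdint.h>

#define SVE_ZREG_BASE 0x6080000000150000ULL  /* Zn slices, 2048 bits each */
#define SVE_PREG_BASE 0x6050000000150400ULL  /* Pn slices, 256 bits each */
#define SVE_FFR_BASE  0x6050000000150600ULL  /* FFR slices, 256 bits each */

/* Id of slice `slice` of vector register Zn (n in 0..31). */
static inline uint64_t sve_zreg_id(unsigned int n, unsigned int slice)
{
    return SVE_ZREG_BASE | ((uint64_t)(n & 0x1f) << 5) | (slice & 0x1f);
}

/* Id of slice `slice` of predicate register Pn (n in 0..15). */
static inline uint64_t sve_preg_id(unsigned int n, unsigned int slice)
{
    return SVE_PREG_BASE | ((uint64_t)(n & 0xf) << 5) | (slice & 0x1f);
}

/* Id of slice `slice` of the first-fault register FFR. */
static inline uint64_t sve_ffr_id(unsigned int slice)
{
    return SVE_FFR_BASE | (slice & 0x1f);
}

/* False if an access to this slice is expected to fail with ENOENT. */
static inline bool sve_slice_visible(unsigned int slice, unsigned int max_vq)
{
    return 2048ULL * slice < 128ULL * max_vq;
}
```

With max_vq of 16 or less, only slice 0 is visible under this rule.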
50036ad0 DM |
2597 | |
2598 | These registers are only accessible on vcpus for which SVE is enabled. | |
2599 | See KVM_ARM_VCPU_INIT for details. | |
2600 | ||
2601 | In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not | |
2602 | accessible until the vcpu's SVE configuration has been finalized | |
2603 | using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). See KVM_ARM_VCPU_INIT | |
2604 | and KVM_ARM_VCPU_FINALIZE for more information about this procedure. | |
2605 | ||
2606 | KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector | |
2607 | lengths supported by the vcpu to be discovered and configured by | |
2608 | userspace. When transferred to or from user memory via KVM_GET_ONE_REG | |
4bd774e5 DM |
2609 | or KVM_SET_ONE_REG, the value of this register is of type |
2610 | __u64[KVM_ARM64_SVE_VLS_WORDS], and encodes the set of vector lengths as | |
106ee47d | 2611 | follows:: |
50036ad0 | 2612 | |
106ee47d | 2613 | __u64 vector_lengths[KVM_ARM64_SVE_VLS_WORDS]; |
50036ad0 | 2614 | |
106ee47d MCC |
2615 | if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX && |
2616 | ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >> | |
4bd774e5 | 2617 | ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1)) |
50036ad0 | 2618 | /* Vector length vq * 16 bytes supported */ |
106ee47d | 2619 | else |
50036ad0 DM |
2620 | /* Vector length vq * 16 bytes not supported */ |
2621 | ||
106ee47d MCC |
2622 | .. [2] The maximum value vq for which the above condition is true is |
2623 | max_vq. This is the maximum vector length available to the guest on | |
2624 | this vcpu, and determines which register slices are visible through | |
2625 | this ioctl interface. | |
50036ad0 | 2626 | |
b693d0b3 | 2627 | (See Documentation/arm64/sve.rst for an explanation of the "vq" |
50036ad0 DM |
2628 | nomenclature.) |
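
The pseudo-code above can be turned into compilable helpers, including the max_vq computation described in the footnote. The sketch uses local stand-ins for the header constants: VQ_MIN = 1 and VQ_MAX = 512 are assumed to match SVE_VQ_MIN/KVM_ARM64_SVE_VQ_MIN and SVE_VQ_MAX, and VLS_WORDS is assumed to match KVM_ARM64_SVE_VLS_WORDS. Real code should use the definitions from the uapi headers instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define VQ_MIN    1u    /* assumed value of SVE_VQ_MIN */
#define VQ_MAX    512u  /* assumed value of SVE_VQ_MAX */
#define VLS_WORDS ((VQ_MAX - VQ_MIN) / 64 + 1)

/* True if a vector length of vq * 16 bytes is in the set. */
static inline bool sve_vq_supported(const uint64_t vls[VLS_WORDS],
                                    unsigned int vq)
{
    if (vq < VQ_MIN || vq > VQ_MAX)
        return false;
    return (vls[(vq - VQ_MIN) / 64] >> ((vq - VQ_MIN) % 64)) & 1;
}

/* Largest supported vq, or 0 if the set is empty. */
static inline unsigned int sve_max_vq(const uint64_t vls[VLS_WORDS])
{
    for (unsigned int vq = VQ_MAX; vq >= VQ_MIN; vq--)
        if (sve_vq_supported(vls, vq))
            return vq;
    return 0;
}
```

The array passed in would be the value read from KVM_REG_ARM64_SVE_VLS with KVM_GET_ONE_REG.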
2629 | ||
2630 | KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT. | |
2631 | KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that | |
2632 | the host supports. | |
2633 | ||
2634 | Userspace may subsequently modify it if desired until the vcpu's SVE | |
2635 | configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE). | |
2636 | ||
2637 | Apart from simply removing all vector lengths from the host set that | |
2638 | exceed some value, support for arbitrarily chosen sets of vector lengths | |
2639 | is hardware-dependent and may not be available. Attempting to configure | |
2640 | an invalid set of vector lengths via KVM_SET_ONE_REG will fail with | |
2641 | EINVAL. | |
2642 | ||
2643 | After the vcpu's SVE configuration is finalized, further attempts to | |
2644 | write this register will fail with EPERM. | |
2645 | ||
fa246c68 RRA |
2646 | arm64 bitmap feature firmware pseudo-registers have the following bit pattern:: |
2647 | ||
2648 | 0x6030 0000 0016 <regno:16> | |
2649 | ||
2650 | The bitmap feature firmware registers expose the hypercall services that | |
2651 | are available for userspace to configure. The set bits correspond to the | |
2652 | services that are available for the guests to access. By default, KVM | |
2653 | sets all the supported bits during VM initialization. Userspace can | |
2654 | discover the available services via KVM_GET_ONE_REG, and write back the | |
2655 | bitmap corresponding to the features that it wishes guests to see via | |
2656 | KVM_SET_ONE_REG. | |
2657 | ||
2658 | Note: These registers are immutable once any of the vCPUs of the VM has | |
2659 | run at least once. A KVM_SET_ONE_REG in such a scenario will return | |
2660 | -EBUSY to userspace. | |
2661 | ||
2662 | (See Documentation/virt/kvm/arm/hypercalls.rst for more details.) | |
2663 | ||
c2d2c21b JH |
2664 | |
2665 | MIPS registers are mapped using the lower 32 bits. The upper 16 of that is | |
2666 | the register group type: | |
2667 | ||
106ee47d MCC |
2668 | MIPS core registers (see above) have the following id bit patterns:: |
2669 | ||
c2d2c21b JH |
2670 | 0x7030 0000 0000 <reg:16> |
2671 | ||
2672 | MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit | |
106ee47d MCC |
2673 | patterns depending on whether they're 32-bit or 64-bit registers:: |
2674 | ||
c2d2c21b JH |
2675 | 0x7020 0000 0001 00 <reg:5> <sel:3> (32-bit) |
2676 | 0x7030 0000 0001 00 <reg:5> <sel:3> (64-bit) | |
2677 | ||
013044cc JH |
2678 | Note: KVM_REG_MIPS_CP0_ENTRYLO0 and KVM_REG_MIPS_CP0_ENTRYLO1 are the MIPS64 |
2679 | versions of the EntryLo registers regardless of the word size of the host | |
2680 | hardware, host kernel, guest, and whether XPA is present in the guest, i.e. | |
2681 | with the RI and XI bits (if they exist) in bits 63 and 62 respectively, and | |
2682 | the PFNX field starting at bit 30. | |
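
A CP0 register id follows directly from the register number and select value. The sketch below is a hypothetical helper built from the two patterns above; the kernel headers define their own macros for the named CP0 registers, which are not reproduced here. The register numbers in the usage note (Status is register 12 select 0, EPC is register 14 select 0) are the usual MIPS assignments and are stated as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIPS_ID_BASE   0x7000000000000000ULL  /* MIPS architecture prefix */
#define MIPS_CP0_GROUP 0x0000000000010000ULL  /* CP0 register group, 0x0001 */
#define MIPS_SIZE_U32  0x0020000000000000ULL
#define MIPS_SIZE_U64  0x0030000000000000ULL

/* Id of CP0 register `reg` (0..31), select `sel` (0..7). */
static inline uint64_t mips_cp0_reg_id(bool is_64bit, unsigned int reg,
                                       unsigned int sel)
{
    return MIPS_ID_BASE | (is_64bit ? MIPS_SIZE_U64 : MIPS_SIZE_U32) |
           MIPS_CP0_GROUP | ((uint64_t)(reg & 0x1f) << 3) | (sel & 0x7);
}
```

For example, mips_cp0_reg_id(false, 12, 0) yields 0x7020000000010060 for the 32-bit Status register, and mips_cp0_reg_id(true, 14, 0) yields 0x7030000000010070 for the 64-bit EPC register.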
2683 | ||
d42a008f | 2684 | MIPS MAARs (see KVM_REG_MIPS_CP0_MAAR(*) above) have the following id bit |
106ee47d MCC |
2685 | patterns:: |
2686 | ||
d42a008f JH |
2687 | 0x7030 0000 0001 01 <reg:8> |
2688 | ||
106ee47d MCC |
2689 | MIPS KVM control registers (see above) have the following id bit patterns:: |
2690 | ||
c2d2c21b JH |
2691 | 0x7030 0000 0002 <reg:16> |
2692 | ||
379245cd JH |
2693 | MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following |
2694 | id bit patterns depending on the size of the register being accessed. They are | |
2695 | always accessed according to the current guest FPU mode (Status.FR and | |
2696 | Config5.FRE), i.e. as the guest would see them, and they become unpredictable | |
ab86bd60 JH |
2697 | if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector |
2698 | registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they | |
106ee47d MCC |
2699 | overlap the FPU registers:: |
2700 | ||
379245cd JH |
2701 | 0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers) |
2702 | 0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers) | |
ab86bd60 | 2703 | 0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers) |
379245cd JH |
2704 | |
2705 | MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the | |
106ee47d MCC |
2706 | following id bit patterns:: |
2707 | ||
379245cd JH |
2708 | 0x7020 0000 0003 01 <0:3> <reg:5> |
2709 | ||
ab86bd60 | 2710 | MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the |
106ee47d MCC |
2711 | following id bit patterns:: |
2712 | ||
ab86bd60 JH |
2713 | 0x7020 0000 0003 02 <0:3> <reg:5> |
2714 | ||
da40d858 AP |
2715 | RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of |
2716 | that is the register group type. | |
2717 | ||
2718 | RISC-V config registers are meant for configuring a Guest VCPU and have | |
2719 | the following id bit patterns:: | |
2720 | ||
2721 | 0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host) | |
2722 | 0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host) | |
2723 | ||
2724 | Following are the RISC-V config registers: | |
2725 | ||
2726 | ======================= ========= ============================================= | |
2727 | Encoding Register Description | |
2728 | ======================= ========= ============================================= | |
2729 | 0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU | |
2730 | ======================= ========= ============================================= | |
2731 | ||
2732 | The isa config register can be read at any time but can only be written | |
2733 | before a Guest VCPU runs. By default, its ISA feature bits match those of | |
2734 | the underlying host. | |
2735 | ||
2736 | RISC-V core registers represent the general execution state of a Guest VCPU | |
2737 | and have the following id bit patterns:: | |
2738 | ||
2739 | 0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host) | |
2740 | 0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host) | |
2741 | ||
2742 | Following are the RISC-V core registers: | |
2743 | ||
2744 | ======================= ========= ============================================= | |
2745 | Encoding Register Description | |
2746 | ======================= ========= ============================================= | |
2747 | 0x80x0 0000 0200 0000 regs.pc Program counter | |
2748 | 0x80x0 0000 0200 0001 regs.ra Return address | |
2749 | 0x80x0 0000 0200 0002 regs.sp Stack pointer | |
2750 | 0x80x0 0000 0200 0003 regs.gp Global pointer | |
2751 | 0x80x0 0000 0200 0004 regs.tp Task pointer | |
2752 | 0x80x0 0000 0200 0005 regs.t0 Caller saved register 0 | |
2753 | 0x80x0 0000 0200 0006 regs.t1 Caller saved register 1 | |
2754 | 0x80x0 0000 0200 0007 regs.t2 Caller saved register 2 | |
2755 | 0x80x0 0000 0200 0008 regs.s0 Callee saved register 0 | |
2756 | 0x80x0 0000 0200 0009 regs.s1 Callee saved register 1 | |
2757 | 0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0 | |
2758 | 0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1 | |
2759 | 0x80x0 0000 0200 000c regs.a2 Function argument 2 | |
2760 | 0x80x0 0000 0200 000d regs.a3 Function argument 3 | |
2761 | 0x80x0 0000 0200 000e regs.a4 Function argument 4 | |
2762 | 0x80x0 0000 0200 000f regs.a5 Function argument 5 | |
2763 | 0x80x0 0000 0200 0010 regs.a6 Function argument 6 | |
2764 | 0x80x0 0000 0200 0011 regs.a7 Function argument 7 | |
2765 | 0x80x0 0000 0200 0012 regs.s2 Callee saved register 2 | |
2766 | 0x80x0 0000 0200 0013 regs.s3 Callee saved register 3 | |
2767 | 0x80x0 0000 0200 0014 regs.s4 Callee saved register 4 | |
2768 | 0x80x0 0000 0200 0015 regs.s5 Callee saved register 5 | |
2769 | 0x80x0 0000 0200 0016 regs.s6 Callee saved register 6 | |
2770 | 0x80x0 0000 0200 0017 regs.s7 Callee saved register 7 | |
2771 | 0x80x0 0000 0200 0018 regs.s8 Callee saved register 8 | |
2772 | 0x80x0 0000 0200 0019 regs.s9 Callee saved register 9 | |
2773 | 0x80x0 0000 0200 001a regs.s10 Callee saved register 10 | |
2774 | 0x80x0 0000 0200 001b regs.s11 Callee saved register 11 | |
2775 | 0x80x0 0000 0200 001c regs.t3 Caller saved register 3 | |
2776 | 0x80x0 0000 0200 001d regs.t4 Caller saved register 4 | |
2777 | 0x80x0 0000 0200 001e regs.t5 Caller saved register 5 | |
2778 | 0x80x0 0000 0200 001f regs.t6 Caller saved register 6 | |
2779 | 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode) | |
2780 | ======================= ========= ============================================= | |
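
All RISC-V id patterns in this section share one layout: an architecture prefix, a size field, an 8-bit register group type and a 24-bit index. The sketch below packs those fields with a hypothetical helper; the group numbers (1 = config, 2 = core, 3 = csr, 4 = timer, 5 = F extension, 6 = D extension) are taken from the patterns in this section, and the size is passed as log2 of the byte count.

```c
#include <stdint.h>

#define RISCV_ID_BASE    0x8000000000000000ULL  /* RISC-V architecture prefix */
#define RISCV_SIZE_SHIFT 52                     /* size field, log2(bytes) */
#define RISCV_TYPE_SHIFT 24                     /* register group type */
#define RISCV_INDEX_MASK 0x00ffffffULL          /* 24-bit index */

/*
 * Build a RISC-V register id.
 * log2_bytes: 2 on a 32-bit host, 3 on a 64-bit host (timer registers
 *             always use 3, F-extension registers always use 2).
 * type:       register group, e.g. 2 for core registers.
 * index:      index into the corresponding structure.
 */
static inline uint64_t riscv_reg_id(unsigned int log2_bytes,
                                    unsigned int type, uint32_t index)
{
    return RISCV_ID_BASE |
           ((uint64_t)log2_bytes << RISCV_SIZE_SHIFT) |
           ((uint64_t)(type & 0xff) << RISCV_TYPE_SHIFT) |
           (index & RISCV_INDEX_MASK);
}
```

For example, riscv_reg_id(3, 2, 0x0a) gives 0x803000000200000a, the id of regs.a0 on a 64-bit host.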
2781 | ||
2782 | RISC-V csr registers represent the supervisor mode control/status registers | |
2783 | of a Guest VCPU and have the following id bit patterns:: | |
2784 | ||
2785 | 0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host) | |
2786 | 0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host) | |
2787 | ||
2788 | Following are the RISC-V csr registers: | |
2789 | ||
2790 | ======================= ========= ============================================= | |
2791 | Encoding Register Description | |
2792 | ======================= ========= ============================================= | |
2793 | 0x80x0 0000 0300 0000 sstatus Supervisor status | |
2794 | 0x80x0 0000 0300 0001 sie Supervisor interrupt enable | |
2795 | 0x80x0 0000 0300 0002 stvec Supervisor trap vector base | |
2796 | 0x80x0 0000 0300 0003 sscratch Supervisor scratch register | |
2797 | 0x80x0 0000 0300 0004 sepc Supervisor exception program counter | |
2798 | 0x80x0 0000 0300 0005 scause Supervisor trap cause | |
2799 | 0x80x0 0000 0300 0006 stval Supervisor bad address or instruction | |
2800 | 0x80x0 0000 0300 0007 sip Supervisor interrupt pending | |
2801 | 0x80x0 0000 0300 0008 satp Supervisor address translation and protection | |
2802 | ======================= ========= ============================================= | |
2803 | ||
2804 | RISC-V timer registers represent the timer state of a Guest VCPU and have | |
2805 | the following id bit patterns:: | |
2806 | ||
2807 | 0x8030 0000 04 <index into the kvm_riscv_timer struct:24> | |
2808 | ||
2809 | Following are the RISC-V timer registers: | |
2810 | ||
2811 | ======================= ========= ============================================= | |
2812 | Encoding Register Description | |
2813 | ======================= ========= ============================================= | |
2814 | 0x8030 0000 0400 0000 frequency Time base frequency (read-only) | |
2815 | 0x8030 0000 0400 0001 time Time value visible to Guest | |
2816 | 0x8030 0000 0400 0002 compare Time compare programmed by Guest | |
2817 | 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF) | |
2818 | ======================= ========= ============================================= | |
2819 | ||
2820 | RISC-V F-extension registers represent the single precision floating point | |
2821 | state of a Guest VCPU and have the following id bit patterns:: | |
2822 | ||
2823 | 0x8020 0000 05 <index into the __riscv_f_ext_state struct:24> | |
2824 | ||
2825 | Following are the RISC-V F-extension registers: | |
2826 | ||
2827 | ======================= ========= ============================================= | |
2828 | Encoding Register Description | |
2829 | ======================= ========= ============================================= | |
2830 | 0x8020 0000 0500 0000 f[0] Floating point register 0 | |
2831 | ... | |
2832 | 0x8020 0000 0500 001f f[31] Floating point register 31 | |
2833 | 0x8020 0000 0500 0020 fcsr Floating point control and status register | |
2834 | ======================= ========= ============================================= | |
2835 | ||
2836 | RISC-V D-extension registers represent the double precision floating point | |
2837 | state of a Guest VCPU and have the following id bit patterns:: | |
2838 | ||
2839 | 0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr) | |
2840 | 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr) | |
2841 | ||
2842 | Following are the RISC-V D-extension registers: | |
2843 | ||
2844 | ======================= ========= ============================================= | |
2845 | Encoding Register Description | |
2846 | ======================= ========= ============================================= | |
2847 | 0x8030 0000 0600 0000 f[0] Floating point register 0 | |
2848 | ... | |
2849 | 0x8030 0000 0600 001f f[31] Floating point register 31 | |
2850 | 0x8020 0000 0600 0020 fcsr Floating point control and status register | |
2851 | ======================= ========= ============================================= | |
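The D-extension table uses two different prefixes, which is easy to get wrong. The sketch below picks the prefix from the index, following the two patterns listed above. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RISCV_D_EXT_FCSR_INDEX 0x20u   /* fcsr follows f[0]..f[31] in the table */

static uint64_t riscv_d_ext_reg_id(uint32_t index)
{
    assert(index <= RISCV_D_EXT_FCSR_INDEX);
    /* fcsr uses the 0x8020 prefix, f[n] uses 0x8030 (per the patterns above). */
    uint64_t prefix = (index == RISCV_D_EXT_FCSR_INDEX)
                          ? 0x8020000006000000ULL
                          : 0x8030000006000000ULL;
    return prefix | index;
}
```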
2852 | ||
c2d2c21b | 2853 | |
e24ed81f | 2854 | 4.69 KVM_GET_ONE_REG |
106ee47d MCC |
2855 | -------------------- |
2856 | ||
2857 | :Capability: KVM_CAP_ONE_REG | |
2858 | :Architectures: all | |
2859 | :Type: vcpu ioctl | |
2860 | :Parameters: struct kvm_one_reg (in and out) | |
2861 | :Returns: 0 on success, negative value on failure | |
e24ed81f | 2862 | |
fe365b4e | 2863 | Errors include: |
106ee47d MCC |
2864 | |
2865 | ======== ============================================================ | |
3b1c8c56 MCC |
2866 | ENOENT no such register |
2867 | EINVAL invalid register ID, or no such register or used with VMs in | |
68cf7b1f | 2868 | protected virtualization mode on s390 |
3b1c8c56 | 2869 | EPERM (arm64) register access not allowed before vcpu finalization |
106ee47d MCC |
2870 | ======== ============================================================ |
2871 | ||
fe365b4e DM |
2872 | (These error codes are indicative only: do not rely on a specific error |
2873 | code being returned in a specific situation.) | |
e24ed81f AG |
2874 | |
2875 | This ioctl allows userspace to read the value of a single register implemented | |
2876 | in a vcpu. The register to read is indicated by the "id" field of the | |
2877 | kvm_one_reg struct passed in. On success, the register value can be found | |
2878 | at the memory location pointed to by "addr". | |
2879 | ||
2880 | The list of registers accessible using this interface is identical to the | |
2e232702 | 2881 | list in 4.68. |
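A minimal sketch of a read through this interface is shown below. It assumes a 64-bit register (the size is encoded in the id, so a buffer of the matching size is needed for other widths) and an already created vcpu file descriptor. read_one_reg_u64() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns 0 on success, -1 on failure with errno set by ioctl(). */
static int read_one_reg_u64(int vcpu_fd, uint64_t id, uint64_t *value)
{
    assert(value != NULL);
    struct kvm_one_reg reg = {
        .id   = id,
        .addr = (uint64_t)(uintptr_t)value,  /* the kernel stores the value here */
    };
    return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}
```

As noted above, callers should treat any failure as an error without depending on the specific errno value.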
e24ed81f | 2882 | |
414fa985 | 2883 | |
1c0b28c2 | 2884 | 4.70 KVM_KVMCLOCK_CTRL |
106ee47d | 2885 | ---------------------- |
1c0b28c2 | 2886 | |
106ee47d MCC |
2887 | :Capability: KVM_CAP_KVMCLOCK_CTRL |
2888 | :Architectures: Any that implement pvclocks (currently x86 only) | |
2889 | :Type: vcpu ioctl | |
2890 | :Parameters: None | |
2891 | :Returns: 0 on success, -1 on error | |
1c0b28c2 | 2892 | |
35c59990 JA |
2893 | This ioctl sets a flag accessible to the guest indicating that the specified |
2894 | vCPU has been paused by the host userspace. | |
2895 | ||
2896 | The host will set a flag in the pvclock structure that is checked from the | |
2897 | soft lockup watchdog. The flag is part of the pvclock structure that is | |
2898 | shared between guest and host, specifically the second bit of the flags | |
1c0b28c2 EM |
2899 | field of the pvclock_vcpu_time_info structure. It will be set exclusively by |
2900 | the host and read/cleared exclusively by the guest. The guest operation of | |
35c59990 | 2901 | checking and clearing the flag must be an atomic operation so |
1c0b28c2 EM |
2902 | load-link/store-conditional, or equivalent must be used. There are two cases |
2903 | where the guest will clear the flag: when the soft lockup watchdog timer resets | |
2904 | itself or when a soft lockup is detected. This ioctl can be called any time | |
2905 | after pausing the vcpu, but before it is resumed. | |
2906 | ||
414fa985 | 2907 | |
07975ad3 | 2908 | 4.71 KVM_SIGNAL_MSI |
106ee47d | 2909 | ------------------- |
07975ad3 | 2910 | |
106ee47d | 2911 | :Capability: KVM_CAP_SIGNAL_MSI |
3fbf4207 | 2912 | :Architectures: x86 arm64 |
106ee47d MCC |
2913 | :Type: vm ioctl |
2914 | :Parameters: struct kvm_msi (in) | |
2915 | :Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error | |
07975ad3 JK |
2916 | |
2917 | Directly inject an MSI message. Only valid with an in-kernel irqchip that handles | |
2918 | MSI messages. | |
2919 | ||
106ee47d MCC |
2920 | :: |
2921 | ||
2922 | struct kvm_msi { | |
07975ad3 JK |
2923 | __u32 address_lo; |
2924 | __u32 address_hi; | |
2925 | __u32 data; | |
2926 | __u32 flags; | |
2b8ddd93 AP |
2927 | __u32 devid; |
2928 | __u8 pad[12]; | |
106ee47d | 2929 | }; |
07975ad3 | 2930 | |
106ee47d MCC |
2931 | flags: |
2932 | KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM | |
6f49b2f3 PB |
2933 | KVM_CAP_MSI_DEVID capability advertises the requirement to provide |
2934 | the device ID. If this capability is not available, userspace | |
2935 | should never set the KVM_MSI_VALID_DEVID flag as the ioctl might fail. | |
2b8ddd93 | 2936 | |
6f49b2f3 PB |
2937 | If KVM_MSI_VALID_DEVID is set, devid contains a unique device identifier |
2938 | for the device that wrote the MSI message. For PCI, this is usually a | |
2939 | BDF (bus/device/function) identifier in the lower 16 bits. | |
07975ad3 | 2940 | |
055b6ae9 PB |
2941 | On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS |
2942 | feature of KVM_CAP_X2APIC_API capability is enabled. If it is enabled, | |
2943 | address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of | |
2944 | address_hi must be zero. | |
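The following sketch fills struct kvm_msi for a PCI device. It assumes the conventional PCI packing of bus, device and function into 16 bits, and that the caller has already checked KVM_CAP_MSI_DEVID before asking for a devid. pci_bdf() and make_msi() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/* Conventional PCI packing: bus[15:8], device[7:3], function[2:0]. */
static uint32_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);
    return ((uint32_t)bus << 8) | ((uint32_t)dev << 3) | fn;
}

static struct kvm_msi make_msi(uint64_t addr, uint32_t data,
                               int have_devid, uint32_t devid)
{
    struct kvm_msi msi;

    memset(&msi, 0, sizeof(msi));          /* also clears pad[] */
    msi.address_lo = (uint32_t)addr;
    msi.address_hi = (uint32_t)(addr >> 32);
    msi.data = data;
    if (have_devid) {                      /* only with KVM_CAP_MSI_DEVID */
        msi.flags = KVM_MSI_VALID_DEVID;
        msi.devid = devid;
    }
    return msi;
}
```

The filled structure would then be passed to ioctl(vm_fd, KVM_SIGNAL_MSI, &msi).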
37131313 | 2945 | |
414fa985 | 2946 | |
0589ff6c | 2947 | 4.71 KVM_CREATE_PIT2 |
106ee47d | 2948 | -------------------- |
0589ff6c | 2949 | |
106ee47d MCC |
2950 | :Capability: KVM_CAP_PIT2 |
2951 | :Architectures: x86 | |
2952 | :Type: vm ioctl | |
2953 | :Parameters: struct kvm_pit_config (in) | |
2954 | :Returns: 0 on success, -1 on error | |
0589ff6c JK |
2955 | |
2956 | Creates an in-kernel device model for the i8254 PIT. This call is only valid | |
2957 | after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following | |
106ee47d | 2958 | parameters have to be passed:: |
0589ff6c | 2959 | |
106ee47d | 2960 | struct kvm_pit_config { |
0589ff6c JK |
2961 | __u32 flags; |
2962 | __u32 pad[15]; | |
106ee47d | 2963 | }; |
0589ff6c | 2964 | |
106ee47d | 2965 | Valid flags are:: |
0589ff6c | 2966 | |
106ee47d | 2967 | #define KVM_PIT_SPEAKER_DUMMY 1 /* emulate speaker port stub */ |
0589ff6c | 2968 | |
b6ddf05f | 2969 | PIT timer interrupts may use a per-VM kernel thread for injection. If it |
106ee47d | 2970 | exists, this thread will have a name of the following pattern:: |
b6ddf05f | 2971 | |
106ee47d | 2972 | kvm-pit/<owner-process-pid> |
b6ddf05f JK |
2973 | |
2974 | When running a guest with elevated priorities, the scheduling parameters of | |
2975 | this thread may have to be adjusted accordingly. | |
2976 | ||
0589ff6c JK |
2977 | This IOCTL replaces the obsolete KVM_CREATE_PIT. |
2978 | ||
2979 | ||
2980 | 4.72 KVM_GET_PIT2 | |
106ee47d | 2981 | ----------------- |
0589ff6c | 2982 | |
106ee47d MCC |
2983 | :Capability: KVM_CAP_PIT_STATE2 |
2984 | :Architectures: x86 | |
2985 | :Type: vm ioctl | |
2986 | :Parameters: struct kvm_pit_state2 (out) | |
2987 | :Returns: 0 on success, -1 on error | |
0589ff6c JK |
2988 | |
2989 | Retrieves the state of the in-kernel PIT model. Only valid after | |
106ee47d | 2990 | KVM_CREATE_PIT2. The state is returned in the following structure:: |
0589ff6c | 2991 | |
106ee47d | 2992 | struct kvm_pit_state2 { |
0589ff6c JK |
2993 | struct kvm_pit_channel_state channels[3]; |
2994 | __u32 flags; | |
2995 | __u32 reserved[9]; | |
106ee47d | 2996 | }; |
0589ff6c | 2997 | |
106ee47d | 2998 | Valid flags are:: |
0589ff6c | 2999 | |
106ee47d MCC |
3000 | /* disable PIT in HPET legacy mode */ |
3001 | #define KVM_PIT_FLAGS_HPET_LEGACY 0x00000001 | |
0589ff6c JK |
3002 | |
3003 | This IOCTL replaces the obsolete KVM_GET_PIT. | |
3004 | ||
3005 | ||
3006 | 4.73 KVM_SET_PIT2 | |
106ee47d | 3007 | ----------------- |
0589ff6c | 3008 | |
106ee47d MCC |
3009 | :Capability: KVM_CAP_PIT_STATE2 |
3010 | :Architectures: x86 | |
3011 | :Type: vm ioctl | |
3012 | :Parameters: struct kvm_pit_state2 (in) | |
3013 | :Returns: 0 on success, -1 on error | |
0589ff6c JK |
3014 | |
3015 | Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2. | |
3016 | See KVM_GET_PIT2 for details on struct kvm_pit_state2. | |
3017 | ||
3018 | This IOCTL replaces the obsolete KVM_SET_PIT. | |
3019 | ||
3020 | ||
5b74716e | 3021 | 4.74 KVM_PPC_GET_SMMU_INFO |
106ee47d | 3022 | -------------------------- |
5b74716e | 3023 | |
106ee47d MCC |
3024 | :Capability: KVM_CAP_PPC_GET_SMMU_INFO |
3025 | :Architectures: powerpc | |
3026 | :Type: vm ioctl | |
3027 | :Parameters: None | |
3028 | :Returns: 0 on success, -1 on error | |
5b74716e BH |
3029 | |
3030 | This populates and returns a structure describing the features of | |
3031 | the "Server" class MMU emulation supported by KVM. | |
cc22c354 | 3032 | This can in turn be used by userspace to generate the appropriate |
5b74716e BH |
3033 | device-tree properties for the guest operating system. |
3034 | ||
c98be0c9 | 3035 | The structure contains some global information, followed by an |
106ee47d | 3036 | array of supported segment page sizes:: |
5b74716e BH |
3037 | |
3038 | struct kvm_ppc_smmu_info { | |
3039 | __u64 flags; | |
3040 | __u32 slb_size; | |
3041 | __u32 pad; | |
3042 | struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ]; | |
3043 | }; | |
3044 | ||
3045 | The supported flags are: | |
3046 | ||
3047 | - KVM_PPC_PAGE_SIZES_REAL: | |
3048 | When that flag is set, guest page sizes must "fit" the backing | |
3049 | store page sizes. When not set, any page size in the list can | |
3050 | be used regardless of how they are backed by userspace. | |
3051 | ||
3052 | - KVM_PPC_1T_SEGMENTS | |
3053 | The emulated MMU supports 1T segments in addition to the | |
3054 | standard 256M ones. | |
3055 | ||
901f8c3f PM |
3056 | - KVM_PPC_NO_HASH |
3057 | This flag indicates that HPT guests are not supported by KVM, | |
3058 | thus all guests must use radix MMU mode. | |
3059 | ||
5b74716e BH |
3060 | The "slb_size" field indicates how many SLB entries are supported. | |
3061 | ||
3062 | The "sps" array contains 8 entries indicating the supported base | |
3063 | page sizes for a segment in increasing order. Each entry is defined | |
106ee47d | 3064 | as follows:: |
5b74716e BH |
3065 | |
3066 | struct kvm_ppc_one_seg_page_size { | |
3067 | __u32 page_shift; /* Base page shift of segment (or 0) */ | |
3068 | __u32 slb_enc; /* SLB encoding for BookS */ | |
3069 | struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ]; | |
3070 | }; | |
3071 | ||
3072 | An entry with a "page_shift" of 0 is unused. Because the array is | |
3073 | organized in increasing order, a lookup can stop when encountering | |
3074 | such an entry. | |
3075 | ||
3076 | The "slb_enc" field provides the encoding to use in the SLB for the | |
3077 | page size. The bits are in positions such that the value can directly | |
3078 | be OR'ed into the "vsid" argument of the slbmte instruction. | |
3079 | ||
3080 | The "enc" array gives, for each of those segment base page sizes, the | |
3081 | list of supported actual page sizes (which can only be larger than or | |
3082 | equal to the base page size), along with the | |
f884ab15 | 3083 | corresponding encoding in the hash PTE. Similarly, the array is |
5b74716e | 3084 | 8 entries sorted by increasing sizes and an entry with a "0" shift |
106ee47d | 3085 | is an empty entry and a terminator:: |
5b74716e BH |
3086 | |
3087 | struct kvm_ppc_one_page_size { | |
3088 | __u32 page_shift; /* Page shift (or 0) */ | |
3089 | __u32 pte_enc; /* Encoding in the HPTE (>>12) */ | |
3090 | }; | |
3091 | ||
3092 | The "pte_enc" field provides a value that can be OR'ed into the hash | |
3093 | PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it | |
3094 | into the hash PTE second double word). | |
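The termination rule and the shift by 12 can be sketched as follows. The structures are local mirrors of the ones shown above under hypothetical names, so that the example compiles without powerpc kernel headers. PAGE_SIZES_MAX stands in for KVM_PPC_PAGE_SIZES_MAX_SZ and is assumed to be 8, as the text states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZES_MAX 8   /* the arrays above hold 8 entries */

/* Local mirrors of the structures shown above (hypothetical names). */
struct one_page_size {
    uint32_t page_shift;
    uint32_t pte_enc;
};
struct one_seg_page_size {
    uint32_t page_shift;
    uint32_t slb_enc;
    struct one_page_size enc[PAGE_SIZES_MAX];
};

/* Count used segment entries; a page_shift of 0 terminates the list. */
static size_t count_seg_sizes(const struct one_seg_page_size *sps)
{
    size_t n = 0;

    assert(sps != NULL);
    while (n < PAGE_SIZES_MAX && sps[n].page_shift != 0)
        n++;
    return n;
}

/* pte_enc is stored shifted right by 12; shift it back before OR-ing. */
static uint64_t hpte_enc_bits(const struct one_page_size *ps)
{
    return (uint64_t)ps->pte_enc << 12;
}
```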
3095 | ||
f36992e3 | 3096 | 4.75 KVM_IRQFD |
106ee47d | 3097 | -------------- |
f36992e3 | 3098 | |
106ee47d | 3099 | :Capability: KVM_CAP_IRQFD |
3fbf4207 | 3100 | :Architectures: x86 s390 arm64 |
106ee47d MCC |
3101 | :Type: vm ioctl |
3102 | :Parameters: struct kvm_irqfd (in) | |
3103 | :Returns: 0 on success, -1 on error | |
f36992e3 AW |
3104 | |
3105 | Allows setting an eventfd to directly trigger a guest interrupt. | |
3106 | kvm_irqfd.fd specifies the file descriptor to use as the eventfd and | |
3107 | kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When | |
17180032 | 3108 | an event is triggered on the eventfd, an interrupt is injected into |
f36992e3 AW |
3109 | the guest using the specified gsi pin. The irqfd is removed using |
3110 | the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd | |
3111 | and kvm_irqfd.gsi. | |
3112 | ||
7a84428a AW |
3113 | With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify |
3114 | mechanism allowing emulation of level-triggered, irqfd-based | |
3115 | interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an | |
3116 | additional eventfd in the kvm_irqfd.resamplefd field. When operating | |
3117 | in resample mode, posting of an interrupt through kvm_irqfd.fd asserts | |
3118 | the specified gsi in the irqchip. When the irqchip is resampled, such | |
17180032 | 3119 | as from an EOI, the gsi is de-asserted and the user is notified via |
7a84428a AW |
3120 | kvm_irqfd.resamplefd. It is the user's responsibility to re-queue |
3121 | the interrupt if the device making use of it still requires service. | |
3122 | Note that closing the resamplefd is not sufficient to disable the | |
3123 | irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment | |
3124 | and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. | |
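The flag handling described above can be sketched as two small helpers that only fill in struct kvm_irqfd. The file descriptors are assumed to come from eventfd(2), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/kvm.h>

/* Pass resample_fd < 0 for a plain irqfd without resampling. */
static struct kvm_irqfd make_irqfd(int event_fd, uint32_t gsi, int resample_fd)
{
    struct kvm_irqfd irqfd;

    assert(event_fd >= 0);
    memset(&irqfd, 0, sizeof(irqfd));
    irqfd.fd = (uint32_t)event_fd;
    irqfd.gsi = gsi;
    if (resample_fd >= 0) {
        irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
        irqfd.resamplefd = (uint32_t)resample_fd;
    }
    return irqfd;
}

/* Deassignment names the same fd and gsi; RESAMPLE is not needed here. */
static struct kvm_irqfd make_irqfd_deassign(int event_fd, uint32_t gsi)
{
    struct kvm_irqfd irqfd = make_irqfd(event_fd, gsi, -1);

    irqfd.flags = KVM_IRQFD_FLAG_DEASSIGN;
    return irqfd;
}
```

Either structure would be passed to ioctl(vm_fd, KVM_IRQFD, &irqfd).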
3125 | ||
3fbf4207 | 3126 | On arm64, where gsi routing is supported, the following can happen: |
106ee47d | 3127 | |
180ae7b1 EA |
3128 | - in case no routing entry is associated to this gsi, injection fails |
3129 | - in case the gsi is associated to an irqchip routing entry, | |
3130 | irqchip.pin + 32 corresponds to the injected SPI ID. | |
995a0ee9 EA |
3131 | - in case the gsi is associated to an MSI routing entry, the MSI |
3132 | message and device ID are translated into an LPI (support restricted | |
3133 | to GICv3 ITS in-kernel emulation). | |
174178fe | 3134 | |
5fecc9d8 | 3135 | 4.76 KVM_PPC_ALLOCATE_HTAB |
106ee47d | 3136 | -------------------------- |
32fad281 | 3137 | |
106ee47d MCC |
3138 | :Capability: KVM_CAP_PPC_ALLOC_HTAB |
3139 | :Architectures: powerpc | |
3140 | :Type: vm ioctl | |
3141 | :Parameters: Pointer to u32 containing hash table order (in/out) | |
3142 | :Returns: 0 on success, -1 on error | |
32fad281 PM |
3143 | |
3144 | This requests the host kernel to allocate an MMU hash table for a | |
3145 | guest using the PAPR paravirtualization interface. This only does | |
3146 | anything if the kernel is configured to use the Book 3S HV style of | |
3147 | virtualization. Otherwise the capability doesn't exist and the ioctl | |
3148 | returns an ENOTTY error. The rest of this description assumes Book 3S | |
3149 | HV. | |
3150 | ||
3151 | There must be no vcpus running when this ioctl is called; if there | |
3152 | are, it will do nothing and return an EBUSY error. | |
3153 | ||
3154 | The parameter is a pointer to a 32-bit unsigned integer variable | |
3155 | containing the order (log base 2) of the desired size of the hash | |
3156 | table, which must be between 18 and 46. On successful return from the | |
f98a8bf9 | 3157 | ioctl, the value will not be changed by the kernel. |
32fad281 PM |
3158 | |
3159 | If no hash table has been allocated when any vcpu is asked to run | |
3160 | (with the KVM_RUN ioctl), the host kernel will allocate a | |
3161 | default-sized hash table (16 MB). | |
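Because the parameter is an order rather than a size, a small conversion helper can make the limits explicit. The sketch below assumes the size is counted in bytes, which is consistent with the 16 MB default corresponding to order 24. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HTAB_ORDER_MIN 18u   /* limits as stated in the text above */
#define HTAB_ORDER_MAX 46u

static bool htab_order_valid(uint32_t order)
{
    return order >= HTAB_ORDER_MIN && order <= HTAB_ORDER_MAX;
}

/* The order is log base 2 of the table size. */
static uint64_t htab_size_bytes(uint32_t order)
{
    assert(htab_order_valid(order));
    return 1ULL << order;
}
```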
3162 | ||
3163 | If this ioctl is called when a hash table has already been allocated, | |
f98a8bf9 DG |
3164 | with a different order from the existing hash table, the existing hash |
3165 | table will be freed and a new one allocated. If this ioctl is | |
3166 | called when a hash table has already been allocated of the same order | |
3167 | as specified, the kernel will clear out the existing hash table (zero | |
3168 | all HPTEs). In either case, if the guest is using the virtualized | |
3169 | real-mode area (VRMA) facility, the kernel will re-create the VRMA | |
3170 | HPTEs on the next KVM_RUN of any vcpu. | |
32fad281 | 3171 | |
416ad65f | 3172 | 4.77 KVM_S390_INTERRUPT |
106ee47d | 3173 | ----------------------- |
416ad65f | 3174 | |
106ee47d MCC |
3175 | :Capability: basic |
3176 | :Architectures: s390 | |
3177 | :Type: vm ioctl, vcpu ioctl | |
3178 | :Parameters: struct kvm_s390_interrupt (in) | |
3179 | :Returns: 0 on success, -1 on error | |
416ad65f CH |
3180 | |
3181 | Allows userspace to inject an interrupt into the guest. Interrupts can be floating | |
3182 | (vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type. | |
3183 | ||
106ee47d | 3184 | Interrupt parameters are passed via kvm_s390_interrupt:: |
416ad65f | 3185 | |
106ee47d | 3186 | struct kvm_s390_interrupt { |
416ad65f CH |
3187 | __u32 type; |
3188 | __u32 parm; | |
3189 | __u64 parm64; | |
106ee47d | 3190 | }; |
416ad65f CH |
3191 | |
3192 | type can be one of the following: | |
3193 | ||
106ee47d MCC |
3194 | KVM_S390_SIGP_STOP (vcpu) |
3195 | - sigp stop; optional flags in parm | |
3196 | KVM_S390_PROGRAM_INT (vcpu) | |
3197 | - program check; code in parm | |
3198 | KVM_S390_SIGP_SET_PREFIX (vcpu) | |
3199 | - sigp set prefix; prefix address in parm | |
3200 | KVM_S390_RESTART (vcpu) | |
3201 | - restart | |
3202 | KVM_S390_INT_CLOCK_COMP (vcpu) | |
3203 | - clock comparator interrupt | |
3204 | KVM_S390_INT_CPU_TIMER (vcpu) | |
3205 | - CPU timer interrupt | |
3206 | KVM_S390_INT_VIRTIO (vm) | |
3207 | - virtio external interrupt; external interrupt | |
3208 | parameters in parm and parm64 | |
3209 | KVM_S390_INT_SERVICE (vm) | |
3210 | - sclp external interrupt; sclp parameter in parm | |
3211 | KVM_S390_INT_EMERGENCY (vcpu) | |
3212 | - sigp emergency; source cpu in parm | |
3213 | KVM_S390_INT_EXTERNAL_CALL (vcpu) | |
3214 | - sigp external call; source cpu in parm | |
3215 | KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) | |
3216 | - compound value to indicate an | |
3217 | I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel); | |
3218 | I/O interruption parameters in parm (subchannel) and parm64 (intparm, | |
3219 | interruption subclass) | |
3220 | KVM_S390_MCHK (vm, vcpu) | |
3221 | - machine check interrupt; cr 14 bits in parm, machine check interrupt | |
3222 | code in parm64 (note that machine checks needing further payload are not | |
3223 | supported by this ioctl) | |
416ad65f | 3224 | |
5e124900 | 3225 | This is an asynchronous vcpu ioctl and can be invoked from any thread. |
416ad65f | 3226 | |
a2932923 | 3227 | 4.78 KVM_PPC_GET_HTAB_FD |
106ee47d | 3228 | ------------------------ |
a2932923 | 3229 | |
106ee47d MCC |
3230 | :Capability: KVM_CAP_PPC_HTAB_FD |
3231 | :Architectures: powerpc | |
3232 | :Type: vm ioctl | |
3233 | :Parameters: Pointer to struct kvm_get_htab_fd (in) | |
3234 | :Returns: file descriptor number (>= 0) on success, -1 on error | |
a2932923 PM |
3235 | |
3236 | This returns a file descriptor that can be used either to read out the | |
3237 | entries in the guest's hashed page table (HPT), or to write entries to | |
3238 | initialize the HPT. The returned fd can only be written to if the | |
3239 | KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and | |
3240 | can only be read if that bit is clear. The argument struct looks like | |
106ee47d | 3241 | this:: |
a2932923 | 3242 | |
106ee47d MCC |
3243 | /* For KVM_PPC_GET_HTAB_FD */ |
3244 | struct kvm_get_htab_fd { | |
a2932923 PM |
3245 | __u64 flags; |
3246 | __u64 start_index; | |
3247 | __u64 reserved[2]; | |
106ee47d | 3248 | }; |
a2932923 | 3249 | |
106ee47d MCC |
3250 | /* Values for kvm_get_htab_fd.flags */ |
3251 | #define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1) | |
3252 | #define KVM_GET_HTAB_WRITE ((__u64)0x2) | |
a2932923 | 3253 | |
106ee47d | 3254 | The 'start_index' field gives the index in the HPT of the entry at |
a2932923 PM |
3255 | which to start reading. It is ignored when writing. |
3256 | ||
3257 | Reads on the fd will initially supply information about all | |
3258 | "interesting" HPT entries. Interesting entries are those with the | |
3259 | bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise | |
3260 | all entries. When the end of the HPT is reached, the read() will | |
3261 | return. If read() is called again on the fd, it will start again from | |
3262 | the beginning of the HPT, but will only return HPT entries that have | |
3263 | changed since they were last read. | |
3264 | ||
3265 | Data read or written is structured as a header (8 bytes) followed by a | |
3266 | series of valid HPT entries (16 bytes) each. The header indicates how | |
3267 | many valid HPT entries there are and how many invalid entries follow | |
3268 | the valid entries. The invalid entries are not represented explicitly | |
106ee47d | 3269 | in the stream. The header format is:: |
a2932923 | 3270 | |
106ee47d | 3271 | struct kvm_get_htab_header { |
a2932923 PM |
3272 | __u32 index; |
3273 | __u16 n_valid; | |
3274 | __u16 n_invalid; | |
106ee47d | 3275 | }; |
a2932923 PM |
3276 | |
3277 | Writes to the fd create HPT entries starting at the index given in the | |
106ee47d MCC |
3278 | header; first 'n_valid' valid entries with contents from the data |
3279 | written, then 'n_invalid' invalid entries, invalidating any previously | |
a2932923 PM |
3280 | valid entries found. |
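The stream layout described above implies a simple size calculation per chunk, sketched below. The structure is a local mirror of the header shown above under a hypothetical name, so the example compiles on any architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HPTE_BYTES 16u   /* each valid HPT entry occupies 16 bytes */

/* Local mirror of the 8-byte header shown above (hypothetical name). */
struct htab_header {
    uint32_t index;
    uint16_t n_valid;
    uint16_t n_invalid;
};

/* Bytes occupied in the stream by one header plus its valid entries. */
static size_t htab_chunk_bytes(const struct htab_header *hdr)
{
    assert(hdr != NULL);
    return sizeof(*hdr) + (size_t)hdr->n_valid * HPTE_BYTES;
}

/* Number of HPT slots the chunk describes, valid and invalid together. */
static uint32_t htab_chunk_slots(const struct htab_header *hdr)
{
    return (uint32_t)hdr->n_valid + hdr->n_invalid;
}
```

Invalid entries contribute to the slot count but not to the byte count, since they are not represented in the stream.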
3281 | ||
852b6d57 | 3282 | 4.79 KVM_CREATE_DEVICE |
106ee47d MCC |
3283 | ---------------------- |
3284 | ||
3285 | :Capability: KVM_CAP_DEVICE_CTRL | |
3286 | :Type: vm ioctl | |
3287 | :Parameters: struct kvm_create_device (in/out) | |
3288 | :Returns: 0 on success, -1 on error | |
852b6d57 | 3289 | |
852b6d57 | 3290 | Errors: |
106ee47d MCC |
3291 | |
3292 | ====== ======================================================= | |
3293 | ENODEV The device type is unknown or unsupported | |
3294 | EEXIST Device already created, and this type of device may not | |
852b6d57 | 3295 | be instantiated multiple times |
106ee47d | 3296 | ====== ======================================================= |
852b6d57 SW |
3297 | |
3298 | Other error conditions may be defined by individual device types or | |
3299 | have their standard meanings. | |
3300 | ||
3301 | Creates an emulated device in the kernel. The file descriptor returned | |
3302 | in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR. | |
3303 | ||
3304 | If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the | |
3305 | device type is supported (not necessarily whether it can be created | |
3306 | in the current vm). | |
3307 | ||
3308 | Individual devices should not define flags. Attributes should be used | |
3309 | for specifying any behavior that is not implied by the device type | |
3310 | number. | |
3311 | ||
106ee47d MCC |
3312 | :: |
3313 | ||
3314 | struct kvm_create_device { | |
852b6d57 SW |
3315 | __u32 type; /* in: KVM_DEV_TYPE_xxx */ |
3316 | __u32 fd; /* out: device handle */ | |
3317 | __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */ | |
106ee47d | 3318 | }; |
852b6d57 SW |
3319 | |
3320 | 4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR | |
106ee47d MCC |
3321 | -------------------------------------------- |
3322 | ||
3323 | :Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, | |
3324 | KVM_CAP_VCPU_ATTRIBUTES for vcpu device | |
dd6e6312 | 3325 | KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device (no set) |
106ee47d MCC |
3326 | :Type: device ioctl, vm ioctl, vcpu ioctl |
3327 | :Parameters: struct kvm_device_attr | |
3328 | :Returns: 0 on success, -1 on error | |
852b6d57 | 3329 | |
852b6d57 | 3330 | Errors: |
106ee47d MCC |
3331 | |
3332 | ===== ============================================================= | |
3333 | ENXIO The group or attribute is unknown/unsupported for this device | |
f9cbd9b0 | 3334 | or hardware support is missing. |
106ee47d | 3335 | EPERM The attribute cannot (currently) be accessed this way |
852b6d57 SW |
3336 | (e.g. read-only attribute, or attribute that only makes |
3337 | sense when the device is in a different state) | |
106ee47d | 3338 | ===== ============================================================= |
852b6d57 SW |
3339 | |
3340 | Other error conditions may be defined by individual device types. | |
3341 | ||
3342 | Gets/sets a specified piece of device configuration and/or state. The | |
3343 | semantics are device-specific. See individual device documentation in | |
3344 | the "devices" directory. As with ONE_REG, the size of the data | |
3345 | transferred is defined by the particular attribute. | |
3346 | ||
106ee47d MCC |
3347 | :: |
3348 | ||
3349 | struct kvm_device_attr { | |
852b6d57 SW |
3350 | __u32 flags; /* no flags currently defined */ |
3351 | __u32 group; /* device-defined */ | |
3352 | __u64 attr; /* group-defined */ | |
3353 | __u64 addr; /* userspace address of attr data */ | |
106ee47d | 3354 | }; |
852b6d57 SW |
3355 | |
3356 | 4.81 KVM_HAS_DEVICE_ATTR | |
106ee47d MCC |
3357 | ------------------------ |
3358 | ||
3359 | :Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device, | |
dd6e6312 PB |
3360 | KVM_CAP_VCPU_ATTRIBUTES for vcpu device |
3361 | KVM_CAP_SYS_ATTRIBUTES for system (/dev/kvm) device | |
106ee47d MCC |
3362 | :Type: device ioctl, vm ioctl, vcpu ioctl |
3363 | :Parameters: struct kvm_device_attr | |
3364 | :Returns: 0 on success, -1 on error | |
852b6d57 | 3365 | |
852b6d57 | 3366 | Errors: |
106ee47d MCC |
3367 | |
3368 | ===== ============================================================= | |
3369 | ENXIO The group or attribute is unknown/unsupported for this device | |
f9cbd9b0 | 3370 | or hardware support is missing. |
106ee47d | 3371 | ===== ============================================================= |
852b6d57 SW |
3372 | |
3373 | Tests whether a device supports a particular attribute. A successful | |
3374 | return indicates the attribute is implemented. It does not necessarily | |
3375 | indicate that the attribute can be read or written in the device's | |
3376 | current state. "addr" is ignored. | |
f36992e3 | 3377 | |
d8968f1f | 3378 | 4.82 KVM_ARM_VCPU_INIT |
106ee47d MCC |
3379 | ---------------------- |
3380 | ||
3381 | :Capability: basic | |
3fbf4207 | 3382 | :Architectures: arm64 |
106ee47d MCC |
3383 | :Type: vcpu ioctl |
3384 | :Parameters: struct kvm_vcpu_init (in) | |
3385 | :Returns: 0 on success; -1 on error | |
749cf76c | 3386 | |
749cf76c | 3387 | Errors: |
106ee47d MCC |
3388 | |
3389 | ====== ================================================================= | |
3b1c8c56 MCC |
3390 | EINVAL the target is unknown, or the combination of features is invalid. |
3391 | ENOENT a features bit specified is unknown. | |
106ee47d | 3392 | ====== ================================================================= |
749cf76c CD |
3393 | |
3394 | This tells KVM what type of CPU to present to the guest, and what | |
3b1c8c56 MCC |
3395 | optional features it should have. This will cause a reset of the cpu |
3396 | registers to their initial values. If this is not called, KVM_RUN will | |
749cf76c CD |
3397 | return ENOEXEC for that vcpu. |
3398 | ||
5b32a53d MZ |
3399 | The initial values are defined as: |
3400 | - Processor state: | |
3401 | * AArch64: EL1h, D, A, I and F bits set. All other bits | |
3402 | are cleared. | |
3403 | * AArch32: SVC, A, I and F bits set. All other bits are | |
3404 | cleared. | |
3405 | - General Purpose registers, including PC and SP: set to 0 | |
3406 | - FPSIMD/NEON registers: set to 0 | |
3407 | - SVE registers: set to 0 | |
3408 | - System registers: Reset to their architecturally defined | |
3409 | values as for a warm reset to EL1 (resp. SVC) | |
3410 | ||
749cf76c CD |
3411 | Note that because some registers reflect machine topology, all vcpus |
3412 | should be created before this ioctl is invoked. | |
3413 | ||
f7fa034d CD |
3414 | Userspace can call this function multiple times for a given vcpu, including |
3415 | after the vcpu has been run. This will reset the vcpu to its initial | |
3416 | state. All calls to this function after the initial call must use the same | |
3417 | target and same set of feature flags, otherwise EINVAL will be returned. | |
3418 | ||
aa024c2f | 3419 | Possible features: |
106ee47d | 3420 | |
aa024c2f | 3421 | - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state. |
3ad8b3de CD |
3422 | Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on |
3423 | and execute guest code when KVM_RUN is called. | |
379e04c7 MZ |
3424 | - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32bit mode. |
3425 | Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only). | |
85bd0ba1 MZ |
3426 | - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision |
3427 | backward compatible with v0.2) for the CPU. | |
50bb0c94 | 3428 | Depends on KVM_CAP_ARM_PSCI_0_2. |
808e7381 SZ |
3429 | - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU. |
3430 | Depends on KVM_CAP_ARM_PMU_V3. | |
aa024c2f | 3431 | |
a22fa321 ADK |
3432 | - KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication |
3433 | for arm64 only. | |
a243c16d ADK |
3434 | Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS. |
3435 | If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are | |
3436 | both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and | |
3437 | KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be | |
3438 | requested. | |
a22fa321 ADK |
3439 | |
3440 | - KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication | |
3441 | for arm64 only. | |
a243c16d ADK |
3442 | Depends on KVM_CAP_ARM_PTRAUTH_GENERIC. |
3443 | If KVM_CAP_ARM_PTRAUTH_ADDRESS and KVM_CAP_ARM_PTRAUTH_GENERIC are | |
3444 | both present, then both KVM_ARM_VCPU_PTRAUTH_ADDRESS and | |
3445 | KVM_ARM_VCPU_PTRAUTH_GENERIC must be requested or neither must be | |
3446 | requested. | |
a22fa321 | 3447 | |
50036ad0 DM |
3448 | - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only). |
3449 | Depends on KVM_CAP_ARM_SVE. | |
3450 | Requires KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): | |
3451 | ||
3452 | * After KVM_ARM_VCPU_INIT: | |
3453 | ||
3454 | - KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the | |
3455 | initial value of this pseudo-register indicates the best set of | |
3456 | vector lengths possible for a vcpu on this host. | |
3457 | ||
3458 | * Before KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): | |
3459 | ||
3460 | - KVM_RUN and KVM_GET_REG_LIST are not available; | |
3461 | ||
3462 | - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access | |
3463 | the scalable architectural SVE registers | |
3464 | KVM_REG_ARM64_SVE_ZREG(), KVM_REG_ARM64_SVE_PREG() or | |
3465 | KVM_REG_ARM64_SVE_FFR; | |
3466 | ||
3467 | - KVM_REG_ARM64_SVE_VLS may optionally be written using | |
3468 | KVM_SET_ONE_REG, to modify the set of vector lengths available | |
3469 | for the vcpu. | |
3470 | ||
3471 | * After KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE): | |
3472 | ||
3473 | - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can | |
3474 | no longer be written using KVM_SET_ONE_REG. | |
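The pointer authentication rule in the feature list above (each feature depends on its capability, and when both capabilities are present the two features must be requested together or not at all) can be written as a small check. This is only a sketch of the stated rule; struct ptrauth_choice and ptrauth_request_ok() are hypothetical names, and the kernel remains the authority on what KVM_ARM_VCPU_INIT accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the host offers and what userspace wants. */
struct ptrauth_choice {
    bool cap_address;    /* KVM_CAP_ARM_PTRAUTH_ADDRESS present */
    bool cap_generic;    /* KVM_CAP_ARM_PTRAUTH_GENERIC present */
    bool want_address;   /* request KVM_ARM_VCPU_PTRAUTH_ADDRESS */
    bool want_generic;   /* request KVM_ARM_VCPU_PTRAUTH_GENERIC */
};

static bool ptrauth_request_ok(const struct ptrauth_choice *c)
{
    assert(c != NULL);
    /* Each feature depends on its own capability. */
    if (c->want_address && !c->cap_address)
        return false;
    if (c->want_generic && !c->cap_generic)
        return false;
    /* With both capabilities present, request both features or neither. */
    if (c->cap_address && c->cap_generic)
        return c->want_address == c->want_generic;
    return true;
}
```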
749cf76c | 3475 | |
740edfc0 | 3476 | 4.83 KVM_ARM_PREFERRED_TARGET |
106ee47d MCC |
3477 | ----------------------------- |
3478 | ||
3479 | :Capability: basic | |
3fbf4207 | 3480 | :Architectures: arm64 |
106ee47d | 3481 | :Type: vm ioctl |
a84b757e | 3482 | :Parameters: struct kvm_vcpu_init (out) |
106ee47d | 3483 | :Returns: 0 on success; -1 on error |
740edfc0 | 3484 | |
740edfc0 | 3485 | Errors: |
106ee47d MCC |
3486 | |
3487 | ====== ========================================== | |
3488 | ENODEV no preferred target available for the host | |
3489 | ====== ========================================== | |
740edfc0 AP |
3490 | |
3491 | This queries KVM for the preferred CPU target type which can be emulated | |
3492 | by KVM on the underlying host. | |
3493 | ||
3494 | The ioctl returns a struct kvm_vcpu_init instance containing information | |
3495 | about the preferred CPU target type and recommended features for it. The | |
3496 | kvm_vcpu_init->features bitmap returned will have feature bits set if | |
3497 | the preferred target recommends setting these features, but this is | |
3498 | not mandatory. | |
3499 | ||
3500 | The information returned by this ioctl can be used to prepare an instance | |
3501 | of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result in | |
3747c5d3 | 3502 | a VCPU matching the underlying host. |
740edfc0 AP |
3503 | |
3504 | ||
3505 | 4.84 KVM_GET_REG_LIST | |
106ee47d MCC |
3506 | --------------------- |
3507 | ||
3508 | :Capability: basic | |
3fbf4207 | 3509 | :Architectures: arm64, mips |
106ee47d MCC |
3510 | :Type: vcpu ioctl |
3511 | :Parameters: struct kvm_reg_list (in/out) | |
3512 | :Returns: 0 on success; -1 on error | |
749cf76c | 3513 | |
749cf76c | 3514 | Errors: |
106ee47d MCC |
3515 | |
3516 | ===== ============================================================== | |
3b1c8c56 MCC |
3517 | E2BIG the reg index list is too big to fit in the array specified by |
3518 | the user (the number required will be written into n). | |
106ee47d MCC |
3519 | ===== ============================================================== |
3520 | ||
3521 | :: | |
749cf76c | 3522 | |
106ee47d | 3523 | struct kvm_reg_list { |
749cf76c CD |
3524 | __u64 n; /* number of registers in reg[] */ |
3525 | __u64 reg[0]; | |
106ee47d | 3526 | }; |
749cf76c CD |
3527 | |
3528 | This ioctl returns the guest registers that are supported for the | |
3529 | KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. | |
3530 | ||
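Because E2BIG reports the required count in n, one plausible usage pattern
is to call the ioctl once with n set to 0, read back n, allocate a list of
that size and call again. The sketch below covers only the allocation step.
The struct is a local mirror of the layout shown above, and the names
reg_list_sketch and reg_list_alloc are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct kvm_reg_list, for illustration only. */
struct reg_list_sketch {
    uint64_t n;      /* number of registers in reg[] */
    uint64_t reg[];  /* flexible array of register ids */
};

/*
 * Allocate a zeroed list with room for n register ids and record n in
 * the header. Returns NULL on overflow or allocation failure. The caller
 * releases the list with free().
 */
static struct reg_list_sketch *reg_list_alloc(uint64_t n)
{
    struct reg_list_sketch *list;

    if (n > (SIZE_MAX - sizeof(*list)) / sizeof(list->reg[0]))
        return NULL;

    list = calloc(1, sizeof(*list) + (size_t)n * sizeof(list->reg[0]));
    if (list)
        list->n = n;
    return list;
}
```
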
ce01e4e8 CD |
3531 | |
3532 | 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated) | |
106ee47d MCC |
3533 | ----------------------------------------- |
3534 | ||
3535 | :Capability: KVM_CAP_ARM_SET_DEVICE_ADDR | |
3fbf4207 | 3536 | :Architectures: arm64 |
106ee47d MCC |
3537 | :Type: vm ioctl |
3538 | :Parameters: struct kvm_arm_device_addr (in) | |
3539 | :Returns: 0 on success, -1 on error | |
3401d546 | 3540 | |
3401d546 | 3541 | Errors: |
3401d546 | 3542 | |
106ee47d MCC |
3543 | ====== ============================================ |
3544 | ENODEV The device id is unknown | |
3545 | ENXIO Device not supported on current system | |
3546 | EEXIST Address already set | |
3547 | E2BIG Address outside guest physical address space | |
3548 | EBUSY Address overlaps with other device range | |
3549 | ====== ============================================ | |
3550 | ||
3551 | :: | |
3552 | ||
3553 | struct kvm_arm_device_addr { | |
3401d546 CD |
3554 | __u64 id; |
3555 | __u64 addr; | |
106ee47d | 3556 | }; |
3401d546 CD |
3557 | |
3558 | Specify a device address in the guest's physical address space where guests | |
3559 | can access emulated or directly exposed devices, which the host kernel needs | |
3560 | to know about. The id field is an architecture specific identifier for a | |
3561 | specific device. | |
3562 | ||
3fbf4207 | 3563 | arm64 divides the id field into two parts, a device id and an |
106ee47d | 3564 | address type id specific to the individual device:: |
3401d546 | 3565 | |
3b1c8c56 | 3566 | bits: | 63 ... 32 | 31 ... 16 | 15 ... 0 | |
3401d546 CD |
3567 | field: | 0x00000000 | device id | addr type id | |
3568 | ||
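The id layout above can be composed with a small helper. This is a sketch
only: the function name is hypothetical, and the shift of 16 is taken
directly from the bit diagram rather than from a named kernel constant.

```c
#include <stdint.h>

/*
 * Build the 64-bit id field: bits 63..32 are zero, bits 31..16 hold the
 * device id and bits 15..0 hold the address type id.
 */
static uint64_t arm_device_addr_id(uint16_t device_id, uint16_t addr_type_id)
{
    return ((uint64_t)device_id << 16) | (uint64_t)addr_type_id;
}
```
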
3fbf4207 | 3569 | arm64 currently only requires this when using the in-kernel GIC |
379e04c7 MZ |
3570 | support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 |
3571 | as the device id. When setting the base address for the guest's | |
3572 | mapping of the VGIC virtual CPU and distributor interface, the ioctl | |
3573 | must be called after calling KVM_CREATE_IRQCHIP, but before calling | |
3574 | KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the | |
3575 | base addresses will return -EEXIST. | |
3401d546 | 3576 | |
ce01e4e8 CD |
3577 | Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API |
3578 | should be used instead. | |
3579 | ||
3580 | ||
740edfc0 | 3581 | 4.86 KVM_PPC_RTAS_DEFINE_TOKEN |
106ee47d | 3582 | ------------------------------ |
8e591cb7 | 3583 | |
106ee47d MCC |
3584 | :Capability: KVM_CAP_PPC_RTAS |
3585 | :Architectures: ppc | |
3586 | :Type: vm ioctl | |
3587 | :Parameters: struct kvm_rtas_token_args | |
3588 | :Returns: 0 on success, -1 on error | |
8e591cb7 ME |
3589 | |
3590 | Defines a token value for a RTAS (Run Time Abstraction Services) | |
3591 | service in order to allow it to be handled in the kernel. The | |
3592 | argument struct gives the name of the service, which must be the name | |
3593 | of a service that has a kernel-side implementation. If the token | |
3594 | value is non-zero, it will be associated with that service, and | |
3595 | subsequent RTAS calls by the guest specifying that token will be | |
3596 | handled by the kernel. If the token value is 0, then any token | |
3597 | associated with the service will be forgotten, and subsequent RTAS | |
3598 | calls by the guest for that service will be passed to userspace to be | |
3599 | handled. | |
3600 | ||
4bd9d344 | 3601 | 4.87 KVM_SET_GUEST_DEBUG |
106ee47d | 3602 | ------------------------ |
4bd9d344 | 3603 | |
106ee47d MCC |
3604 | :Capability: KVM_CAP_SET_GUEST_DEBUG |
3605 | :Architectures: x86, s390, ppc, arm64 | |
3606 | :Type: vcpu ioctl | |
3607 | :Parameters: struct kvm_guest_debug (in) | |
3608 | :Returns: 0 on success; -1 on error | |
3609 | ||
3610 | :: | |
4bd9d344 | 3611 | |
106ee47d | 3612 | struct kvm_guest_debug { |
4bd9d344 AB |
3613 | __u32 control; |
3614 | __u32 pad; | |
3615 | struct kvm_guest_debug_arch arch; | |
106ee47d | 3616 | }; |
4bd9d344 AB |
3617 | |
3618 | Set up the processor specific debug registers and configure vcpu for | |
3619 | handling guest debug events. There are two parts to the structure. The | |
3620 | first, a control bitfield, indicates the type of debug events to handle | |
3621 | when running. Common control bits are: | |
3622 | ||
3623 | - KVM_GUESTDBG_ENABLE: guest debugging is enabled | |
3624 | - KVM_GUESTDBG_SINGLESTEP: the next run should single-step | |
3625 | ||
3626 | The top 16 bits of the control field are architecture specific control | |
3627 | flags which can include the following: | |
3628 | ||
4bd611ca | 3629 | - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64] |
feb5dc3d AE |
3630 | - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390] |
3631 | - KVM_GUESTDBG_USE_HW: using hardware debug events [arm64] | |
4bd9d344 AB |
3632 | - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86] |
3633 | - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86] | |
3634 | - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390] | |
61e5f69e | 3635 | - KVM_GUESTDBG_BLOCKIRQ: avoid injecting interrupts/NMI/SMI [x86] |
4bd9d344 AB |
3636 | |
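Since the top 16 bits of the control field are architecture specific, a
control value can be split into its common and architecture-specific parts
with two masks. The helper names below are hypothetical and the masks
follow only from the "top 16 bits" statement above.

```c
#include <stdint.h>

/* Common control bits live in the low 16 bits of the control field. */
static uint32_t guestdbg_common_flags(uint32_t control)
{
    return control & UINT32_C(0x0000ffff);
}

/* Architecture specific control flags live in the top 16 bits. */
static uint32_t guestdbg_arch_flags(uint32_t control)
{
    return control & UINT32_C(0xffff0000);
}
```
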
3637 | For example KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints | |
3638 | are enabled in memory so we need to ensure breakpoint exceptions are | |
3639 | correctly trapped and the KVM run loop exits at the breakpoint rather than | |
3640 | running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP | |
3641 | we need to ensure the guest vCPU's architecture specific registers are | |
3642 | updated to the correct (supplied) values. | |
3643 | ||
3644 | The second part of the structure is architecture specific and | |
3645 | typically contains a set of debug registers. | |
3646 | ||
834bf887 AB |
3647 | For arm64 the number of debug registers is implementation defined and |
3648 | can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and | |
3649 | KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number | |
3650 | indicating the number of supported registers. | |
3651 | ||
1a9167a2 FR |
3652 | For ppc, the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability indicates whether |
3653 | the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported. | |
3654 | ||
8b13c364 PB |
3655 | Also, when supported, the KVM_CAP_SET_GUEST_DEBUG2 capability indicates the | |
3656 | supported KVM_GUESTDBG_* bits in the control field. | |
3657 | ||
4bd9d344 AB |
3658 | When debug events exit the main run loop, the exit reason is | |
3659 | KVM_EXIT_DEBUG and the kvm_debug_exit_arch part of the kvm_run | |
3660 | structure contains architecture specific debug information. | |
3401d546 | 3661 | |
209cf19f | 3662 | 4.88 KVM_GET_EMULATED_CPUID |
106ee47d MCC |
3663 | --------------------------- |
3664 | ||
3665 | :Capability: KVM_CAP_EXT_EMUL_CPUID | |
3666 | :Architectures: x86 | |
3667 | :Type: system ioctl | |
3668 | :Parameters: struct kvm_cpuid2 (in/out) | |
3669 | :Returns: 0 on success, -1 on error | |
209cf19f | 3670 | |
106ee47d | 3671 | :: |
209cf19f | 3672 | |
106ee47d | 3673 | struct kvm_cpuid2 { |
209cf19f AB |
3674 | __u32 nent; |
3675 | __u32 flags; | |
3676 | struct kvm_cpuid_entry2 entries[0]; | |
106ee47d | 3677 | }; |
209cf19f AB |
3678 | |
3679 | The member 'flags' is used for passing flags from userspace. | |
3680 | ||
106ee47d | 3681 | :: |
209cf19f | 3682 | |
106ee47d | 3683 | #define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0) |
7ff6c035 SC |
3684 | #define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1) /* deprecated */ |
3685 | #define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2) /* deprecated */ | |
106ee47d MCC |
3686 | |
3687 | struct kvm_cpuid_entry2 { | |
209cf19f AB |
3688 | __u32 function; |
3689 | __u32 index; | |
3690 | __u32 flags; | |
3691 | __u32 eax; | |
3692 | __u32 ebx; | |
3693 | __u32 ecx; | |
3694 | __u32 edx; | |
3695 | __u32 padding[3]; | |
106ee47d | 3696 | }; |
209cf19f AB |
3697 | |
3698 | This ioctl returns x86 cpuid features which are emulated by | |
3699 | kvm. Userspace can use the information returned by this ioctl to query | |
3700 | which features are emulated by kvm instead of being present natively. | |
3701 | ||
3702 | Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2 | |
3703 | structure with the 'nent' field indicating the number of entries in | |
3704 | the variable-size array 'entries'. If the number of entries is too low | |
3705 | to describe the cpu capabilities, an error (E2BIG) is returned. If the | |
3706 | number is too high, the 'nent' field is adjusted and an error (ENOMEM) | |
3707 | is returned. If the number is just right, the 'nent' field is adjusted | |
3708 | to the number of valid entries in the 'entries' array, which is then | |
3709 | filled. | |
3710 | ||
3711 | The entries returned are the set CPUID bits of the respective features | |
3712 | which kvm emulates, as returned by the CPUID instruction, with unknown | |
3713 | or unsupported feature bits cleared. | |
3714 | ||
3715 | Features like x2apic, for example, may not be present in the host cpu | |
3716 | but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be | |
3717 | emulated efficiently and are thus not included here. | |
3718 | ||
3719 | The fields in each entry are defined as follows: | |
3720 | ||
106ee47d MCC |
3721 | function: |
3722 | the eax value used to obtain the entry | |
3723 | index: | |
3724 | the ecx value used to obtain the entry (for entries that are | |
209cf19f | 3725 | affected by ecx) |
106ee47d MCC |
3726 | flags: |
3727 | an OR of zero or more of the following: | |
3728 | ||
209cf19f AB |
3729 | KVM_CPUID_FLAG_SIGNIFCANT_INDEX: |
3730 | if the index field is valid | |
106ee47d MCC |
3731 | |
3732 | eax, ebx, ecx, edx: | |
3733 | ||
3734 | the values returned by the cpuid instruction for | |
209cf19f AB |
3735 | this function/index combination |
3736 | ||
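Once the entries array has been filled, userspace typically needs to find
the entry for a given function and index. The sketch below shows one way to
do that while honouring the index flag. The struct is a local mirror of
kvm_cpuid_entry2, and the names cpuid_entry_sketch, cpuid_find and
SKETCH_CPUID_FLAG_SIGNIFICANT_INDEX are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors BIT(0), i.e. KVM_CPUID_FLAG_SIGNIFCANT_INDEX in the text above. */
#define SKETCH_CPUID_FLAG_SIGNIFICANT_INDEX (UINT32_C(1) << 0)

/* Local mirror of struct kvm_cpuid_entry2, for illustration only. */
struct cpuid_entry_sketch {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax, ebx, ecx, edx;
    uint32_t padding[3];
};

/*
 * Return the first entry matching function, comparing index only when the
 * entry marks its index field as valid. Returns NULL if nothing matches.
 */
static const struct cpuid_entry_sketch *
cpuid_find(const struct cpuid_entry_sketch *entries, uint32_t nent,
           uint32_t function, uint32_t index)
{
    for (uint32_t i = 0; i < nent; i++) {
        const struct cpuid_entry_sketch *e = &entries[i];

        if (e->function != function)
            continue;
        if ((e->flags & SKETCH_CPUID_FLAG_SIGNIFICANT_INDEX) &&
            e->index != index)
            continue;
        return e;
    }
    return NULL;
}
```
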
41408c28 | 3737 | 4.89 KVM_S390_MEM_OP |
106ee47d | 3738 | -------------------- |
41408c28 | 3739 | |
5e35d0eb | 3740 | :Capability: KVM_CAP_S390_MEM_OP, KVM_CAP_S390_PROTECTED, KVM_CAP_S390_MEM_OP_EXTENSION |
106ee47d | 3741 | :Architectures: s390 |
5e35d0eb | 3742 | :Type: vm ioctl, vcpu ioctl |
106ee47d MCC |
3743 | :Parameters: struct kvm_s390_mem_op (in) |
3744 | :Returns: = 0 on success, | |
3745 | < 0 on generic error (e.g. -EFAULT or -ENOMEM), | |
3746 | > 0 if an exception occurred while walking the page tables | |
41408c28 | 3747 | |
5e35d0eb JSG |
3748 | Read or write data from/to the VM's memory. |
3749 | The KVM_CAP_S390_MEM_OP_EXTENSION capability specifies what functionality is | |
3750 | supported. | |
41408c28 | 3751 | |
106ee47d | 3752 | Parameters are specified via the following structure:: |
41408c28 | 3753 | |
106ee47d | 3754 | struct kvm_s390_mem_op { |
41408c28 TH |
3755 | __u64 gaddr; /* the guest address */ |
3756 | __u64 flags; /* flags */ | |
3757 | __u32 size; /* amount of bytes */ | |
3758 | __u32 op; /* type of operation */ | |
3759 | __u64 buf; /* buffer in userspace */ | |
5e35d0eb JSG |
3760 | union { |
3761 | struct { | |
3762 | __u8 ar; /* the access register number */ | |
3763 | __u8 key; /* access key, ignored if flag unset */ | |
3764 | }; | |
3765 | __u32 sida_offset; /* offset into the sida */ | |
3766 | __u8 reserved[32]; /* ignored */ | |
3767 | }; | |
106ee47d | 3768 | }; |
41408c28 | 3769 | |
41408c28 | 3770 | The start address of the memory region has to be specified in the "gaddr" |
b4d863c3 CH |
3771 | field, and the length of the region in the "size" field (which must not |
3772 | be 0). The maximum value for "size" can be obtained by checking the | |
3773 | KVM_CAP_S390_MEM_OP capability. "buf" is the buffer supplied by the | |
3774 | userspace application where the read data should be written to for | |
5e35d0eb JSG |
3775 | a read access, or where the data that should be written is stored for |
3776 | a write access. The "reserved" field is meant for future extensions. | |
3777 | Reserved and unused values are ignored. Future extensions that add members must | |
3778 | introduce new flags. | |
3779 | ||
3780 | The type of operation is specified in the "op" field. Flags modifying | |
3781 | their behavior can be set in the "flags" field. Undefined flag bits must | |
3782 | be set to 0. | |
3783 | ||
3784 | Possible operations are: | |
3785 | * ``KVM_S390_MEMOP_LOGICAL_READ`` | |
3786 | * ``KVM_S390_MEMOP_LOGICAL_WRITE`` | |
3787 | * ``KVM_S390_MEMOP_ABSOLUTE_READ`` | |
3788 | * ``KVM_S390_MEMOP_ABSOLUTE_WRITE`` | |
3789 | * ``KVM_S390_MEMOP_SIDA_READ`` | |
3790 | * ``KVM_S390_MEMOP_SIDA_WRITE`` | |
3791 | ||
3792 | Logical read/write: | |
3793 | ^^^^^^^^^^^^^^^^^^^ | |
3794 | ||
3795 | Access logical memory, i.e. translate the given guest address to an absolute | |
3796 | address given the state of the VCPU and use the absolute address as target of | |
3797 | the access. "ar" designates the access register number to be used; the valid | |
3798 | range is 0..15. | |
3799 | Logical accesses are permitted for the VCPU ioctl only. | |
3800 | Logical accesses are permitted for non-protected guests only. | |
3801 | ||
3802 | Supported flags: | |
3803 | * ``KVM_S390_MEMOP_F_CHECK_ONLY`` | |
3804 | * ``KVM_S390_MEMOP_F_INJECT_EXCEPTION`` | |
3805 | * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` | |
3806 | ||
3807 | The KVM_S390_MEMOP_F_CHECK_ONLY flag can be set to check whether the | |
3808 | corresponding memory access would cause an access exception; however, | |
3809 | no actual access to the data in memory at the destination is performed. | |
3810 | In this case, "buf" is unused and can be NULL. | |
3811 | ||
3812 | In case an access exception occurred during the access (or would occur | |
3813 | in case of KVM_S390_MEMOP_F_CHECK_ONLY), the ioctl returns a positive | |
3814 | error number indicating the type of exception. This exception is also | |
3815 | raised directly at the corresponding VCPU if the flag | |
3816 | KVM_S390_MEMOP_F_INJECT_EXCEPTION is set. | |
c783631b JSG |
3817 | On protection exceptions, unless specified otherwise, the injected |
3818 | translation-exception identifier (TEID) indicates suppression. | |
5e35d0eb JSG |
3819 | |
3820 | If the KVM_S390_MEMOP_F_SKEY_PROTECTION flag is set, storage key | |
3821 | protection is also in effect and may cause exceptions if accesses are | |
cbf9b810 | 3822 | prohibited given the access key designated by "key"; the valid range is 0..15. |
5e35d0eb JSG |
3823 | KVM_S390_MEMOP_F_SKEY_PROTECTION is available if KVM_CAP_S390_MEM_OP_EXTENSION |
3824 | is > 0. | |
c783631b JSG |
3825 | Since the accessed memory may span multiple pages and those pages might have |
3826 | different storage keys, it is possible that a protection exception occurs | |
3827 | after memory has been modified. In this case, if the exception is injected, | |
3828 | the TEID does not indicate suppression. | |
5e35d0eb JSG |
3829 | |
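The argument constraints for a logical access can be validated in userspace
before issuing the ioctl. The following is a sketch under stated
assumptions: the helper name is hypothetical, max_size stands for the value
obtained by checking KVM_CAP_S390_MEM_OP, and use_key stands for whether
KVM_S390_MEMOP_F_SKEY_PROTECTION will be set in flags.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check the constraints described above for a logical read/write:
 *   - size must not be 0 and must not exceed the advertised maximum
 *   - ar (access register number) must be in 0..15
 *   - key must be in 0..15, but only matters when the storage key
 *     protection flag is set
 */
static bool memop_logical_args_ok(uint32_t size, uint32_t max_size,
                                  unsigned int ar, bool use_key,
                                  unsigned int key)
{
    if (size == 0 || size > max_size)
        return false;
    if (ar > 15)
        return false;
    if (use_key && key > 15)
        return false;
    return true;
}
```
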
3830 | Absolute read/write: | |
3831 | ^^^^^^^^^^^^^^^^^^^^ | |
3832 | ||
3833 | Access absolute memory. This operation is intended to be used with the | |
3834 | KVM_S390_MEMOP_F_SKEY_PROTECTION flag, to allow accessing memory and performing | |
3835 | the checks required for storage key protection as one operation (as opposed to | |
3836 | user space getting the storage keys, performing the checks, and accessing | |
3837 | memory thereafter, which could lead to a delay between check and access). | |
3838 | Absolute accesses are permitted for the VM ioctl if KVM_CAP_S390_MEM_OP_EXTENSION | |
3839 | is > 0. | |
3840 | Currently absolute accesses are not permitted for VCPU ioctls. | |
3841 | Absolute accesses are permitted for non-protected guests only. | |
3842 | ||
3843 | Supported flags: | |
3844 | * ``KVM_S390_MEMOP_F_CHECK_ONLY`` | |
3845 | * ``KVM_S390_MEMOP_F_SKEY_PROTECTION`` | |
3846 | ||
3847 | The semantics of the flags are as for logical accesses. | |
3848 | ||
3849 | SIDA read/write: | |
3850 | ^^^^^^^^^^^^^^^^ | |
3851 | ||
3852 | Access the secure instruction data area which contains memory operands necessary | |
3853 | for instruction emulation for protected guests. | |
3854 | SIDA accesses are available if the KVM_CAP_S390_PROTECTED capability is available. | |
3855 | SIDA accesses are permitted for the VCPU ioctl only. | |
3856 | SIDA accesses are permitted for protected guests only. | |
41408c28 | 3857 | |
5e35d0eb | 3858 | No flags are supported. |
41408c28 | 3859 | |
30ee2a98 | 3860 | 4.90 KVM_S390_GET_SKEYS |
106ee47d | 3861 | ----------------------- |
30ee2a98 | 3862 | |
106ee47d MCC |
3863 | :Capability: KVM_CAP_S390_SKEYS |
3864 | :Architectures: s390 | |
3865 | :Type: vm ioctl | |
3866 | :Parameters: struct kvm_s390_skeys | |
49ae248b | 3867 | :Returns: 0 on success, KVM_S390_GET_SKEYS_NONE if guest is not using storage |
106ee47d | 3868 | keys, negative value on error |
30ee2a98 JH |
3869 | |
3870 | This ioctl is used to get guest storage key values on the s390 | |
106ee47d | 3871 | architecture. The ioctl takes parameters via the kvm_s390_skeys struct:: |
30ee2a98 | 3872 | |
106ee47d | 3873 | struct kvm_s390_skeys { |
30ee2a98 JH |
3874 | __u64 start_gfn; |
3875 | __u64 count; | |
3876 | __u64 skeydata_addr; | |
3877 | __u32 flags; | |
3878 | __u32 reserved[9]; | |
106ee47d | 3879 | }; |
30ee2a98 JH |
3880 | |
3881 | The start_gfn field is the number of the first guest frame whose storage keys | |
3882 | you want to get. | |
3883 | ||
3884 | The count field is the number of consecutive frames (starting from start_gfn) | |
3885 | whose storage keys to get. The count field must be at least 1 and the maximum | |
49ae248b | 3886 | allowed value is defined as KVM_S390_SKEYS_MAX. Values outside this range |
30ee2a98 JH |
3887 | will cause the ioctl to return -EINVAL. |
3888 | ||
3889 | The skeydata_addr field is the address to a buffer large enough to hold count | |
3890 | bytes. This buffer will be filled with storage key data by the ioctl. | |
3891 | ||
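A request structure for this ioctl might be prepared as sketched below. The
struct is a local mirror of kvm_s390_skeys and the names skeys_sketch and
skeys_prepare are hypothetical. The upper bound is passed in as max_count,
so the caller is expected to supply KVM_S390_SKEYS_MAX from the kernel
headers.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_skeys, for illustration only. */
struct skeys_sketch {
    uint64_t start_gfn;
    uint64_t count;
    uint64_t skeydata_addr;
    uint32_t flags;
    uint32_t reserved[9];
};

/*
 * Fill in a request for count frames starting at start_gfn. buf must be
 * able to hold count bytes, one storage key per frame. Returns 0 on
 * success or -EINVAL if count is outside 1..max_count.
 */
static int skeys_prepare(struct skeys_sketch *args, uint64_t start_gfn,
                         uint64_t count, uint64_t max_count, void *buf)
{
    if (count < 1 || count > max_count)
        return -EINVAL;

    memset(args, 0, sizeof(*args));
    args->start_gfn = start_gfn;
    args->count = count;
    args->skeydata_addr = (uint64_t)(uintptr_t)buf;
    return 0;
}
```
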
3892 | 4.91 KVM_S390_SET_SKEYS | |
106ee47d | 3893 | ----------------------- |
30ee2a98 | 3894 | |
106ee47d MCC |
3895 | :Capability: KVM_CAP_S390_SKEYS |
3896 | :Architectures: s390 | |
3897 | :Type: vm ioctl | |
3898 | :Parameters: struct kvm_s390_skeys | |
3899 | :Returns: 0 on success, negative value on error | |
30ee2a98 JH |
3900 | |
3901 | This ioctl is used to set guest storage key values on the s390 | |
3902 | architecture. The ioctl takes parameters via the kvm_s390_skeys struct. | |
3903 | See section on KVM_S390_GET_SKEYS for struct definition. | |
3904 | ||
3905 | The start_gfn field is the number of the first guest frame whose storage keys | |
3906 | you want to set. | |
3907 | ||
3908 | The count field is the number of consecutive frames (starting from start_gfn) | |
3909 | whose storage keys to set. The count field must be at least 1 and the maximum | |
49ae248b | 3910 | allowed value is defined as KVM_S390_SKEYS_MAX. Values outside this range |
30ee2a98 JH |
3911 | will cause the ioctl to return -EINVAL. |
3912 | ||
3913 | The skeydata_addr field is the address to a buffer containing count bytes of | |
3914 | storage keys. Each byte in the buffer will be set as the storage key for a | |
3915 | single frame starting at start_gfn for count frames. | |
3916 | ||
3917 | Note: If any architecturally invalid key value is found in the given data then | |
3918 | the ioctl will return -EINVAL. | |
3919 | ||
47b43c52 | 3920 | 4.92 KVM_S390_IRQ |
106ee47d MCC |
3921 | ----------------- |
3922 | ||
3923 | :Capability: KVM_CAP_S390_INJECT_IRQ | |
3924 | :Architectures: s390 | |
3925 | :Type: vcpu ioctl | |
3926 | :Parameters: struct kvm_s390_irq (in) | |
3927 | :Returns: 0 on success, -1 on error | |
47b43c52 | 3928 | |
47b43c52 | 3929 | Errors: |
106ee47d MCC |
3930 | |
3931 | ||
3932 | ====== ================================================================= | |
3933 | EINVAL interrupt type is invalid | |
3934 | type is KVM_S390_SIGP_STOP and flag parameter is invalid value, | |
47b43c52 | 3935 | type is KVM_S390_INT_EXTERNAL_CALL and code is bigger |
106ee47d MCC |
3936 | than the maximum of VCPUs |
3937 | EBUSY type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped, | |
3938 | type is KVM_S390_SIGP_STOP and a stop irq is already pending, | |
47b43c52 | 3939 | type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt |
106ee47d MCC |
3940 | is already pending |
3941 | ====== ================================================================= | |
47b43c52 JF |
3942 | |
3943 | Allows injecting an interrupt into the guest. | |
3944 | |
3945 | Using struct kvm_s390_irq as a parameter allows | |
3946 | injecting additional payload which is not | |
3947 | possible via KVM_S390_INTERRUPT. | |
3948 | ||
106ee47d | 3949 | Interrupt parameters are passed via kvm_s390_irq:: |
47b43c52 | 3950 | |
106ee47d | 3951 | struct kvm_s390_irq { |
47b43c52 JF |
3952 | __u64 type; |
3953 | union { | |
3954 | struct kvm_s390_io_info io; | |
3955 | struct kvm_s390_ext_info ext; | |
3956 | struct kvm_s390_pgm_info pgm; | |
3957 | struct kvm_s390_emerg_info emerg; | |
3958 | struct kvm_s390_extcall_info extcall; | |
3959 | struct kvm_s390_prefix_info prefix; | |
3960 | struct kvm_s390_stop_info stop; | |
3961 | struct kvm_s390_mchk_info mchk; | |
3962 | char reserved[64]; | |
3963 | } u; | |
106ee47d | 3964 | }; |
47b43c52 JF |
3965 | |
3966 | type can be one of the following: | |
3967 | ||
106ee47d MCC |
3968 | - KVM_S390_SIGP_STOP - sigp stop; parameter in .stop |
3969 | - KVM_S390_PROGRAM_INT - program check; parameters in .pgm | |
3970 | - KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix | |
3971 | - KVM_S390_RESTART - restart; no parameters | |
3972 | - KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters | |
3973 | - KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters | |
3974 | - KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg | |
3975 | - KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall | |
3976 | - KVM_S390_MCHK - machine check interrupt; parameters in .mchk | |
47b43c52 | 3977 | |
5e124900 | 3978 | This is an asynchronous vcpu ioctl and can be invoked from any thread. |
47b43c52 | 3979 | |
816c7667 | 3980 | 4.94 KVM_S390_GET_IRQ_STATE |
106ee47d | 3981 | --------------------------- |
816c7667 | 3982 | |
106ee47d MCC |
3983 | :Capability: KVM_CAP_S390_IRQ_STATE |
3984 | :Architectures: s390 | |
3985 | :Type: vcpu ioctl | |
3986 | :Parameters: struct kvm_s390_irq_state (out) | |
3987 | :Returns: >= 0: number of bytes copied into buffer, | |
3988 | -EINVAL if buffer size is 0, | |
3989 | -ENOBUFS if buffer size is too small to fit all pending interrupts, | |
3990 | -EFAULT if the buffer address was invalid | |
816c7667 JF |
3991 | |
3992 | This ioctl allows userspace to retrieve the complete state of all currently | |
3993 | pending interrupts in a single buffer. Use cases include migration | |
3994 | and introspection. The parameter structure contains the address of a | |
106ee47d | 3995 | userspace buffer and its length:: |
816c7667 | 3996 | |
106ee47d | 3997 | struct kvm_s390_irq_state { |
816c7667 | 3998 | __u64 buf; |
bb64da9a | 3999 | __u32 flags; /* will stay unused for compatibility reasons */ |
816c7667 | 4000 | __u32 len; |
bb64da9a | 4001 | __u32 reserved[4]; /* will stay unused for compatibility reasons */ |
106ee47d | 4002 | }; |
816c7667 JF |
4003 | |
4004 | Userspace passes in the above struct and for each pending interrupt a | |
4005 | struct kvm_s390_irq is copied to the provided buffer. | |
4006 | ||
bb64da9a CB |
4007 | The structure contains a flags and a reserved field for future extensions. As |
4008 | the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and | |
4009 | reserved, these fields can not be used in the future without breaking | |
4010 | compatibility. | |
4011 | ||
816c7667 JF |
4012 | If -ENOBUFS is returned the buffer provided was too small and userspace |
4013 | may retry with a bigger buffer. | |
4014 | ||
4015 | 4.95 KVM_S390_SET_IRQ_STATE | |
106ee47d MCC |
4016 | --------------------------- |
4017 | ||
4018 | :Capability: KVM_CAP_S390_IRQ_STATE | |
4019 | :Architectures: s390 | |
4020 | :Type: vcpu ioctl | |
4021 | :Parameters: struct kvm_s390_irq_state (in) | |
4022 | :Returns: 0 on success, | |
4023 | -EFAULT if the buffer address was invalid, | |
4024 | -EINVAL for an invalid buffer length (see below), | |
4025 | -EBUSY if there were already interrupts pending, | |
4026 | errors occurring when actually injecting the | |
816c7667 JF |
4027 | interrupt. See KVM_S390_IRQ. |
4028 | ||
4029 | This ioctl allows userspace to set the complete state of all cpu-local | |
4030 | interrupts currently pending for the vcpu. It is intended for restoring | |
4031 | interrupt state after a migration. The input parameter is a userspace buffer | |
106ee47d | 4032 | containing a struct kvm_s390_irq_state:: |
816c7667 | 4033 | |
106ee47d | 4034 | struct kvm_s390_irq_state { |
816c7667 | 4035 | __u64 buf; |
bb64da9a | 4036 | __u32 flags; /* will stay unused for compatibility reasons */ |
816c7667 | 4037 | __u32 len; |
bb64da9a | 4038 | __u32 reserved[4]; /* will stay unused for compatibility reasons */ |
106ee47d | 4039 | }; |
816c7667 | 4040 | |
bb64da9a CB |
4041 | The restrictions for flags and reserved apply as well. |
4042 | (see KVM_S390_GET_IRQ_STATE) | |
4043 | ||
816c7667 JF |
4044 | The userspace memory referenced by buf contains a struct kvm_s390_irq |
4045 | for each interrupt to be injected into the guest. | |
4046 | If one of the interrupts could not be injected for some reason the | |
4047 | ioctl aborts. | |
4048 | ||
4049 | len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0 | |
4050 | and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq), | |
4051 | which is the maximum number of possibly pending cpu-local interrupts. | |
47b43c52 | 4052 | |
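The constraints on len can be checked before calling the ioctl. This is a
minimal sketch: the helper name is hypothetical, irq_size stands for
sizeof(struct kvm_s390_irq) and max_vcpus for the VCPU limit of the VM,
both supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * len must be a non-zero multiple of the size of one interrupt record
 * and must describe at most (max_vcpus + 32) records.
 */
static bool irq_state_len_ok(uint64_t len, uint64_t irq_size,
                             uint64_t max_vcpus)
{
    if (irq_size == 0 || len == 0)
        return false;
    if (len % irq_size != 0)
        return false;
    return len / irq_size <= max_vcpus + 32;
}
```
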
ed8e5a24 | 4053 | 4.96 KVM_SMI |
106ee47d | 4054 | ------------ |
f077825a | 4055 | |
106ee47d MCC |
4056 | :Capability: KVM_CAP_X86_SMM |
4057 | :Architectures: x86 | |
4058 | :Type: vcpu ioctl | |
4059 | :Parameters: none | |
4060 | :Returns: 0 on success, -1 on error | |
f077825a PB |
4061 | |
4062 | Queues an SMI on the thread's vcpu. | |
4063 | ||
24e7475f EGE |
4064 | 4.97 KVM_X86_SET_MSR_FILTER |
4065 | ---------------------------- | |
d3695aa4 | 4066 | |
24e7475f EGE |
4067 | :Capability: KVM_CAP_X86_MSR_FILTER | |
4068 | :Architectures: x86 | |
4069 | :Type: vm ioctl | |
4070 | :Parameters: struct kvm_msr_filter | |
4071 | :Returns: 0 on success, < 0 on error | |
d3695aa4 | 4072 | |
24e7475f | 4073 | :: |
d3695aa4 | 4074 | |
24e7475f EGE |
4075 | struct kvm_msr_filter_range { |
4076 | #define KVM_MSR_FILTER_READ (1 << 0) | |
4077 | #define KVM_MSR_FILTER_WRITE (1 << 1) | |
4078 | __u32 flags; | |
4079 | __u32 nmsrs; /* number of msrs in bitmap */ | |
4080 | __u32 base; /* MSR index the bitmap starts at */ | |
4081 | __u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */ | |
4082 | }; | |
d3695aa4 | 4083 | |
24e7475f EGE |
4084 | #define KVM_MSR_FILTER_MAX_RANGES 16 |
4085 | struct kvm_msr_filter { | |
4086 | #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0) | |
4087 | #define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0) | |
4088 | __u32 flags; | |
4089 | struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES]; | |
4090 | }; | |
d3695aa4 | 4091 | |
24e7475f EGE |
4092 | flags values for ``struct kvm_msr_filter_range``: |
4093 | ||
4094 | ``KVM_MSR_FILTER_READ`` | |
4095 | ||
4096 | Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap | |
4097 | indicates that a read should immediately fail, while a 1 indicates that | |
4098 | a read for a particular MSR should be handled regardless of the default | |
4099 | filter action. | |
4100 | ||
4101 | ``KVM_MSR_FILTER_WRITE`` | |
4102 | ||
4103 | Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap | |
4104 | indicates that a write should immediately fail, while a 1 indicates that | |
4105 | a write for a particular MSR should be handled regardless of the default | |
4106 | filter action. | |
4107 | ||
4108 | ``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE`` | |
4109 | ||
4110 | Filter both read and write accesses to MSRs using the given bitmap. A 0 | |
4111 | in the bitmap indicates that both reads and writes should immediately fail, | |
4112 | while a 1 indicates that reads and writes for a particular MSR are not | |
4113 | filtered by this range. | |
4114 | ||
4115 | flags values for ``struct kvm_msr_filter``: | |
4116 | ||
4117 | ``KVM_MSR_FILTER_DEFAULT_ALLOW`` | |
4118 | ||
4119 | If no filter range matches an MSR index that is getting accessed, KVM will | |
4120 | fall back to allowing access to the MSR. | |
4121 | ||
4122 | ``KVM_MSR_FILTER_DEFAULT_DENY`` | |
4123 | ||
4124 | If no filter range matches an MSR index that is getting accessed, KVM will | |
4125 | fall back to rejecting access to the MSR. In this mode, all MSRs that should | |
4126 | be processed by KVM need to explicitly be marked as allowed in the bitmaps. | |
4127 | ||
4128 | This ioctl allows user space to define up to 16 bitmaps of MSR ranges to | |
4129 | specify whether a given MSR access should be allowed or denied. | |
4130 | ||
4131 | If this ioctl has never been invoked, MSR accesses are not guarded and the | |
4132 | default KVM in-kernel emulation behavior is fully preserved. | |
4133 | ||
4134 | Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR | |
4135 | filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes | |
4136 | an error. | |
4137 | ||
4138 | As soon as the filtering is in place, every MSR access is processed through | |
4139 | the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff); | |
4140 | x2APIC MSRs are always allowed, independent of the ``default_allow`` setting, | |
4141 | and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base | |
4142 | register. | |
4143 | ||
ce2f72e2 PB |
4144 | .. warning:: |
4145 | MSR accesses coming from nested vmentry/vmexit are not filtered. | |
4146 | This includes both writes to individual VMCS fields and reads/writes | |
4147 | through the MSR lists pointed to by the VMCS. | |
4148 | ||
24e7475f EGE |
4149 | If an MSR index is within one of the defined ranges, read and write accesses are |
4150 | guarded by the bitmap's value for the MSR index if the kind of access | |
4151 | is included in the ``struct kvm_msr_filter_range`` flags. If no range | |
4152 | covers this particular access, the behavior is determined by the flags | |
4153 | field in ``struct kvm_msr_filter``: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` | |
4154 | and ``KVM_MSR_FILTER_DEFAULT_DENY``. | |
4155 | ||
4156 | Each bitmap range specifies a range of MSRs to potentially allow access on. | |
4157 | The range covers MSR indices [base .. base+nmsrs-1]. The flags field | |
4158 | indicates whether reads, writes or both are filtered by the range; a 1 bit | |
4159 | in the bitmap allows that kind of access to the corresponding MSR index. | |
4160 | ||
4161 | If an MSR access is denied by the filter, it generates a | |
4162 | #GP inside the guest. Combined with KVM_CAP_X86_USER_SPACE_MSR, this | |
4163 | allows denied MSR accesses to be deflected to user space and potentially | |
4164 | handled there. | |
4165 | ||
4166 | If a vCPU is in running state while this ioctl is invoked, the vCPU may | |
4167 | experience inconsistent filtering behavior on MSR accesses. | |
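
The lookup order described above (x2APIC bypass, then the first matching range, then the default action) can be sketched in plain C. This is an illustration only and not the kernel's implementation: ``struct msr_range``, ``msr_access_allowed()`` and the ``MSR_READ``/``MSR_WRITE`` constants are hypothetical stand-ins for the UAPI names, the bitmap is assumed to be addressed byte-wise with the least significant bit first, and a range is assumed to cover exactly nmsrs indices starting at base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UAPI access-type flags (illustration only). */
#define MSR_READ  (1u << 0)
#define MSR_WRITE (1u << 1)

/* Simplified, hypothetical mirror of one filter range. */
struct msr_range {
    uint32_t flags;        /* MSR_READ, MSR_WRITE or both */
    uint32_t nmsrs;        /* number of MSRs covered; 0 marks an unused slot */
    uint32_t base;         /* first MSR index covered */
    const uint8_t *bitmap; /* one bit per MSR, 1 = allow, 0 = deny */
};

/* Decide whether one access (type is MSR_READ or MSR_WRITE) is allowed. */
static bool msr_access_allowed(const struct msr_range *ranges, size_t nranges,
                               bool default_allow, uint32_t index,
                               uint32_t type)
{
    assert(type == MSR_READ || type == MSR_WRITE);

    /* x2APIC MSRs (0x800..0x8ff) are never filtered. */
    if (index >= 0x800 && index <= 0x8ff)
        return true;

    for (size_t i = 0; i < nranges; i++) {
        const struct msr_range *r = &ranges[i];

        /* Skip unused slots and ranges that do not filter this access type. */
        if (r->nmsrs == 0 || !(r->flags & type))
            continue;
        /* Skip ranges that do not contain the index. */
        if (index < r->base || index - r->base >= r->nmsrs)
            continue;

        uint32_t bit = index - r->base;
        return (r->bitmap[bit / 8] >> (bit % 8)) & 1;
    }

    /* No range matched: fall back to the default action. */
    return default_allow;
}
```

With a single write-only range, a read of an MSR inside that range is not covered by it and therefore falls through to the default action.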
d3695aa4 | 4168 | |
58ded420 | 4169 | 4.98 KVM_CREATE_SPAPR_TCE_64 |
106ee47d | 4170 | ---------------------------- |
58ded420 | 4171 | |
106ee47d MCC |
4172 | :Capability: KVM_CAP_SPAPR_TCE_64 |
4173 | :Architectures: powerpc | |
4174 | :Type: vm ioctl | |
4175 | :Parameters: struct kvm_create_spapr_tce_64 (in) | |
4176 | :Returns: file descriptor for manipulating the created TCE table | |
58ded420 AK |
4177 | |
4178 | This is an extension of KVM_CAP_SPAPR_TCE, which only supports 32-bit | |
4179 | windows and is described in 4.62 KVM_CREATE_SPAPR_TCE. | |
4180 | ||
106ee47d | 4181 | This capability uses an extended struct in the ioctl interface:: |
58ded420 | 4182 | |
106ee47d MCC |
4183 | /* for KVM_CAP_SPAPR_TCE_64 */ |
4184 | struct kvm_create_spapr_tce_64 { | |
58ded420 AK |
4185 | __u64 liobn; |
4186 | __u32 page_shift; | |
4187 | __u32 flags; | |
4188 | __u64 offset; /* in pages */ | |
4189 | __u64 size; /* in pages */ | |
106ee47d | 4190 | }; |
58ded420 AK |
4191 | |
4192 | The aim of the extension is to support an additional, bigger DMA window with | |
4193 | a variable page size. | |
4194 | KVM_CREATE_SPAPR_TCE_64 receives a 64-bit window size, an IOMMU page shift and | |
4195 | a bus offset of the corresponding DMA window. @size and @offset are numbers | |
4196 | of IOMMU pages. | |
4197 | ||
4198 | @flags are not used at the moment. | |
4199 | ||
4200 | The rest of the functionality is identical to KVM_CREATE_SPAPR_TCE. | |
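
Because @size and @offset are counted in IOMMU pages rather than bytes, a caller that starts from byte values has to shift them by page_shift. The following is a minimal sketch of that conversion; ``struct create_spapr_tce_64`` is a local copy of the layout shown above using ``<stdint.h>`` types, and ``tce64_args()`` is a hypothetical helper, not part of the KVM API.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the struct shown above, using <stdint.h> types. */
struct create_spapr_tce_64 {
    uint64_t liobn;
    uint32_t page_shift;
    uint32_t flags;
    uint64_t offset; /* in pages */
    uint64_t size;   /* in pages */
};

/*
 * Hypothetical helper: build the ioctl argument from a bus start address
 * and a window length, both given in bytes and aligned to the IOMMU page.
 */
static struct create_spapr_tce_64 tce64_args(uint64_t liobn,
                                             uint32_t page_shift,
                                             uint64_t bus_start,
                                             uint64_t window_bytes)
{
    assert(page_shift < 64);

    uint64_t page_mask = ((uint64_t)1 << page_shift) - 1;

    /* Both byte values must be multiples of the IOMMU page size. */
    assert((bus_start & page_mask) == 0);
    assert((window_bytes & page_mask) == 0);

    struct create_spapr_tce_64 args = {
        .liobn = liobn,
        .page_shift = page_shift,
        .flags = 0,                        /* not used at the moment */
        .offset = bus_start >> page_shift, /* bytes to pages */
        .size = window_bytes >> page_shift,
    };
    return args;
}
```

For example, a 1 GiB window of 64 KiB pages starting at bus address 4 GiB gives an offset of 65536 pages and a size of 16384 pages.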
4201 | ||
ccc4df4e | 4202 | 4.99 KVM_REINJECT_CONTROL |
106ee47d | 4203 | ------------------------- |
107d44a2 | 4204 | |
106ee47d MCC |
4205 | :Capability: KVM_CAP_REINJECT_CONTROL |
4206 | :Architectures: x86 | |
4207 | :Type: vm ioctl | |
4208 | :Parameters: struct kvm_reinject_control (in) | |
4209 | :Returns: 0 on success, | |
107d44a2 RK |
4210 | -EFAULT if struct kvm_reinject_control cannot be read, |
4211 | -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier. | |
4212 | ||
4213 | i8254 (PIT) has two modes, reinject and !reinject. The default is reinject, | |
4214 | where KVM queues elapsed i8254 ticks and monitors completion of interrupt from | |
4215 | vector(s) that i8254 injects. Reinject mode dequeues a tick and injects its | |
4216 | interrupt whenever there isn't a pending interrupt from i8254. | |
4217 | !reinject mode injects an interrupt as soon as a tick arrives. | |
4218 | ||
106ee47d MCC |
4219 | :: |
4220 | ||
4221 | struct kvm_reinject_control { | |
107d44a2 RK |
4222 | __u8 pit_reinject; |
4223 | __u8 reserved[31]; | |
106ee47d | 4224 | }; |
107d44a2 RK |
4225 | |
4226 | pit_reinject = 0 (!reinject mode) is recommended, unless running an old | |
4227 | operating system that uses the PIT for timing (e.g. Linux 2.4.x). | |
4228 | ||
ccc4df4e | 4229 | 4.100 KVM_PPC_CONFIGURE_V3_MMU |
106ee47d | 4230 | ------------------------------ |
c9270132 | 4231 | |
106ee47d MCC |
4232 | :Capability: KVM_CAP_PPC_RADIX_MMU or KVM_CAP_PPC_HASH_MMU_V3 |
4233 | :Architectures: ppc | |
4234 | :Type: vm ioctl | |
4235 | :Parameters: struct kvm_ppc_mmuv3_cfg (in) | |
4236 | :Returns: 0 on success, | |
c9270132 PM |
4237 | -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read, |
4238 | -EINVAL if the configuration is invalid | |
4239 | ||
4240 | This ioctl controls whether the guest will use radix or HPT (hashed | |
4241 | page table) translation, and sets the pointer to the process table for | |
4242 | the guest. | |
4243 | ||
106ee47d MCC |
4244 | :: |
4245 | ||
4246 | struct kvm_ppc_mmuv3_cfg { | |
c9270132 PM |
4247 | __u64 flags; |
4248 | __u64 process_table; | |
106ee47d | 4249 | }; |
c9270132 PM |
4250 | |
4251 | There are two bits that can be set in flags; KVM_PPC_MMUV3_RADIX and | |
4252 | KVM_PPC_MMUV3_GTSE. KVM_PPC_MMUV3_RADIX, if set, configures the guest | |
4253 | to use radix tree translation, and if clear, to use HPT translation. | |
4254 | KVM_PPC_MMUV3_GTSE, if set and if KVM permits it, configures the guest | |
4255 | to be able to use the global TLB and SLB invalidation instructions; | |
4256 | if clear, the guest may not use these instructions. | |
4257 | ||
4258 | The process_table field specifies the address and size of the guest | |
4259 | process table, which is in the guest's space. This field is formatted | |
4260 | as the second doubleword of the partition table entry, as defined in | |
4261 | the Power ISA V3.00, Book III section 5.7.6.1. | |
4262 | ||
ccc4df4e | 4263 | 4.101 KVM_PPC_GET_RMMU_INFO |
106ee47d | 4264 | --------------------------- |
c9270132 | 4265 | |
106ee47d MCC |
4266 | :Capability: KVM_CAP_PPC_RADIX_MMU |
4267 | :Architectures: ppc | |
4268 | :Type: vm ioctl | |
4269 | :Parameters: struct kvm_ppc_rmmu_info (out) | |
4270 | :Returns: 0 on success, | |
c9270132 PM |
4271 | -EFAULT if struct kvm_ppc_rmmu_info cannot be written, |
4272 | -EINVAL if no useful information can be returned | |
4273 | ||
4274 | This ioctl returns a structure containing two things: (a) a list | |
4275 | containing supported radix tree geometries, and (b) a list that maps | |
4276 | page sizes to put in the "AP" (actual page size) field for the tlbie | |
4277 | (TLB invalidate entry) instruction. | |
4278 | ||
106ee47d MCC |
4279 | :: |
4280 | ||
4281 | struct kvm_ppc_rmmu_info { | |
c9270132 PM |
4282 | struct kvm_ppc_radix_geom { |
4283 | __u8 page_shift; | |
4284 | __u8 level_bits[4]; | |
4285 | __u8 pad[3]; | |
4286 | } geometries[8]; | |
4287 | __u32 ap_encodings[8]; | |
106ee47d | 4288 | }; |
c9270132 PM |
4289 | |
4290 | The geometries[] field gives up to 8 supported geometries for the | |
4291 | radix page table, in terms of the log base 2 of the smallest page | |
4292 | size, and the number of bits indexed at each level of the tree, from | |
4293 | the PTE level up to the PGD level in that order. Any unused entries | |
4294 | will have 0 in the page_shift field. | |
4295 | ||
4296 | The ap_encodings gives the supported page sizes and their AP field | |
4297 | encodings, encoded with the AP value in the top 3 bits and the log | |
4298 | base 2 of the page size in the bottom 6 bits. | |
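
The packing of each ap_encodings entry can be illustrated with a small decoder. This is only a sketch of the bit layout stated above (AP value in the top 3 bits, log base 2 of the page size in the bottom 6 bits); ``struct ap_encoding`` and both helper functions are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unpacked form of one ap_encodings[] entry. */
struct ap_encoding {
    unsigned int ap;         /* value for the AP field of tlbie */
    unsigned int page_shift; /* log base 2 of the page size */
};

/* Split an entry into its AP value and page shift. */
static struct ap_encoding decode_ap_encoding(uint32_t enc)
{
    struct ap_encoding e;

    e.ap = enc >> 29;          /* top 3 bits */
    e.page_shift = enc & 0x3f; /* bottom 6 bits */
    return e;
}

/* Inverse of the decoder, useful for building test values. */
static uint32_t encode_ap_encoding(unsigned int ap, unsigned int page_shift)
{
    assert(ap < 8);
    assert(page_shift < 64);
    return ((uint32_t)ap << 29) | page_shift;
}
```

An entry of 0xA0000010, for instance, decodes to an AP value of 5 and a page shift of 16 (64 KiB pages).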
4299 | ||
ef1ead0c | 4300 | 4.102 KVM_PPC_RESIZE_HPT_PREPARE |
106ee47d | 4301 | -------------------------------- |
ef1ead0c | 4302 | |
106ee47d MCC |
4303 | :Capability: KVM_CAP_SPAPR_RESIZE_HPT |
4304 | :Architectures: powerpc | |
4305 | :Type: vm ioctl | |
4306 | :Parameters: struct kvm_ppc_resize_hpt (in) | |
4307 | :Returns: 0 on successful completion, | |
ef1ead0c | 4308 | >0 if a new HPT is being prepared, the value is an estimated |
106ee47d | 4309 | number of milliseconds until preparation is complete, |
ef1ead0c | 4310 | -EFAULT if struct kvm_ppc_resize_hpt cannot be read, |
106ee47d MCC |
4311 | -EINVAL if the supplied shift or flags are invalid, |
4312 | -ENOMEM if unable to allocate the new HPT | |
ef1ead0c DG |
4313 | |
4314 | Used to implement the PAPR extension for runtime resizing of a guest's | |
4315 | Hashed Page Table (HPT). Specifically this starts, stops or monitors | |
4316 | the preparation of a new potential HPT for the guest, essentially | |
4317 | implementing the H_RESIZE_HPT_PREPARE hypercall. | |
4318 | ||
e2a0fcac PB |
4319 | :: |
4320 | ||
4321 | struct kvm_ppc_resize_hpt { | |
4322 | __u64 flags; | |
4323 | __u32 shift; | |
4324 | __u32 pad; | |
4325 | }; | |
4326 | ||
ef1ead0c DG |
4327 | If called with shift > 0 when there is no pending HPT for the guest, |
4328 | this begins preparation of a new pending HPT of size 2^(shift) bytes. | |
4329 | It then returns a positive integer with the estimated number of | |
4330 | milliseconds until preparation is complete. | |
4331 | ||
4332 | If called when there is a pending HPT whose size does not match that | |
4333 | requested in the parameters, discards the existing pending HPT and | |
4334 | creates a new one as above. | |
4335 | ||
4336 | If called when there is a pending HPT of the size requested, will: | |
106ee47d | 4337 | |
ef1ead0c DG |
4338 | * If preparation of the pending HPT is already complete, return 0 |
4339 | * If preparation of the pending HPT has failed, return an error | |
4340 | code, then discard the pending HPT. | |
4341 | * If preparation of the pending HPT is still in progress, return an | |
4342 | estimated number of milliseconds until preparation is complete. | |
4343 | ||
4344 | If called with shift == 0, discards any currently pending HPT and | |
4345 | returns 0 (i.e. cancels any in-progress preparation). | |
4346 | ||
4347 | flags is reserved for future expansion, currently setting any bits in | |
4348 | flags will result in an -EINVAL. | |
4349 | ||
4350 | Normally this will be called repeatedly with the same parameters until | |
4351 | it returns <= 0. The first call will initiate preparation, subsequent | |
4352 | ones will monitor preparation until it completes or fails. | |
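
The calling pattern described above can be sketched as a polling loop. This is a hedged illustration, not code from any VMM: ``hpt_prepare_wait()`` and the ``hpt_prepare_fn`` callback type are hypothetical, and the callback stands in for an ``ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &args)`` call that returns the ioctl's result. ``fake_prepare()`` is a stub used only to exercise the loop without a real VM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical callback wrapping the real ioctl; returns its result. */
typedef int (*hpt_prepare_fn)(void *ctx, unsigned int shift);

/*
 * Call the prepare operation repeatedly with the same shift until it
 * returns <= 0, sleeping for the estimated number of milliseconds in
 * between. Returns 0 when the pending HPT is ready, or a negative error.
 */
static int hpt_prepare_wait(hpt_prepare_fn prepare, void *ctx,
                            unsigned int shift)
{
    assert(shift > 0); /* shift == 0 would cancel instead of prepare */

    for (;;) {
        int ret = prepare(ctx, shift);

        if (ret <= 0)
            return ret;

        /* ret is an estimate in milliseconds; wait that long and retry. */
        struct timespec ts = {
            .tv_sec = ret / 1000,
            .tv_nsec = (long)(ret % 1000) * 1000000L,
        };
        nanosleep(&ts, NULL);
    }
}

/* Stub for demonstration: reports "1 ms left" until the counter hits 0. */
static int fake_prepare(void *ctx, unsigned int shift)
{
    int *calls_left = ctx;

    (void)shift;
    if (*calls_left > 0) {
        (*calls_left)--;
        return 1;
    }
    return 0;
}
```

A real caller would follow a return value of 0 with KVM_PPC_RESIZE_HPT_COMMIT using the same parameters.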
4353 | ||
ef1ead0c | 4354 | 4.103 KVM_PPC_RESIZE_HPT_COMMIT |
106ee47d | 4355 | ------------------------------- |
ef1ead0c | 4356 | |
106ee47d MCC |
4357 | :Capability: KVM_CAP_SPAPR_RESIZE_HPT |
4358 | :Architectures: powerpc | |
4359 | :Type: vm ioctl | |
4360 | :Parameters: struct kvm_ppc_resize_hpt (in) | |
4361 | :Returns: 0 on successful completion, | |
ef1ead0c | 4362 | -EFAULT if struct kvm_ppc_resize_hpt cannot be read, |
106ee47d | 4363 | -EINVAL if the supplied shift or flags are invalid, |
ef1ead0c | 4364 | -ENXIO if there is no pending HPT, or the pending HPT doesn't |
106ee47d MCC |
4365 | have the requested size, |
4366 | -EBUSY if the pending HPT is not fully prepared, | |
ef1ead0c | 4367 | -ENOSPC if there was a hash collision when moving existing |
106ee47d | 4368 | HPT entries to the new HPT, |
ef1ead0c DG |
4369 | -EIO on other error conditions |
4370 | ||
4371 | Used to implement the PAPR extension for runtime resizing of a guest's | |
4372 | Hashed Page Table (HPT). Specifically this requests that the guest be | |
4373 | transferred to working with the new HPT, essentially implementing the | |
4374 | H_RESIZE_HPT_COMMIT hypercall. | |
4375 | ||
e2a0fcac PB |
4376 | :: |
4377 | ||
4378 | struct kvm_ppc_resize_hpt { | |
4379 | __u64 flags; | |
4380 | __u32 shift; | |
4381 | __u32 pad; | |
4382 | }; | |
4383 | ||
ef1ead0c DG |
4384 | This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has |
4385 | returned 0 with the same parameters. In other cases | |
4386 | KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or | |
4387 | -EBUSY, though others may be possible if the preparation was started, | |
4388 | but failed). | |
4389 | ||
4390 | This will have undefined effects on the guest if it has not already | |
4391 | placed itself in a quiescent state where no vcpu will make MMU enabled | |
4392 | memory accesses. | |
4393 | ||
4394 | On successful completion, the pending HPT will become the guest's active | |
4395 | HPT and the previous HPT will be discarded. | |
4396 | ||
4397 | On failure, the guest will still be operating on its previous HPT. | |
4398 | ||
3aa53859 | 4399 | 4.104 KVM_X86_GET_MCE_CAP_SUPPORTED |
106ee47d | 4400 | ----------------------------------- |
3aa53859 | 4401 | |
106ee47d MCC |
4402 | :Capability: KVM_CAP_MCE |
4403 | :Architectures: x86 | |
4404 | :Type: system ioctl | |
4405 | :Parameters: u64 mce_cap (out) | |
4406 | :Returns: 0 on success, -1 on error | |
3aa53859 LC |
4407 | |
4408 | Returns supported MCE capabilities. The u64 mce_cap parameter | |
4409 | has the same format as the MSR_IA32_MCG_CAP register. Supported | |
4410 | capabilities will have the corresponding bits set. | |
4411 | ||
4412 | 4.105 KVM_X86_SETUP_MCE | |
106ee47d | 4413 | ----------------------- |
3aa53859 | 4414 | |
106ee47d MCC |
4415 | :Capability: KVM_CAP_MCE |
4416 | :Architectures: x86 | |
4417 | :Type: vcpu ioctl | |
4418 | :Parameters: u64 mcg_cap (in) | |
4419 | :Returns: 0 on success, | |
3aa53859 LC |
4420 | -EFAULT if u64 mcg_cap cannot be read, |
4421 | -EINVAL if the requested number of banks is invalid, | |
4422 | -EINVAL if requested MCE capability is not supported. | |
4423 | ||
4424 | Initializes MCE support for use. The u64 mcg_cap parameter | |
4425 | has the same format as the MSR_IA32_MCG_CAP register and | |
4426 | specifies which capabilities should be enabled. The maximum | |
4427 | supported number of error-reporting banks can be retrieved when | |
4428 | checking for KVM_CAP_MCE. The supported capabilities can be | |
4429 | retrieved with KVM_X86_GET_MCE_CAP_SUPPORTED. | |
4430 | ||
4431 | 4.106 KVM_X86_SET_MCE | |
106ee47d | 4432 | --------------------- |
3aa53859 | 4433 | |
106ee47d MCC |
4434 | :Capability: KVM_CAP_MCE |
4435 | :Architectures: x86 | |
4436 | :Type: vcpu ioctl | |
4437 | :Parameters: struct kvm_x86_mce (in) | |
4438 | :Returns: 0 on success, | |
3aa53859 LC |
4439 | -EFAULT if struct kvm_x86_mce cannot be read, |
4440 | -EINVAL if the bank number is invalid, | |
4441 | -EINVAL if VAL bit is not set in status field. | |
4442 | ||
4443 | Inject a machine check error (MCE) into the guest. The input | |
106ee47d | 4444 | parameter is:: |
3aa53859 | 4445 | |
106ee47d | 4446 | struct kvm_x86_mce { |
3aa53859 LC |
4447 | __u64 status; |
4448 | __u64 addr; | |
4449 | __u64 misc; | |
4450 | __u64 mcg_status; | |
4451 | __u8 bank; | |
4452 | __u8 pad1[7]; | |
4453 | __u64 pad2[3]; | |
106ee47d | 4454 | }; |
3aa53859 LC |
4455 | |
4456 | If the MCE being reported is an uncorrected error, KVM will | |
4457 | inject it as an MCE exception into the guest. If the guest | |
4458 | MCG_STATUS register reports that an MCE is in progress, KVM | |
4459 | causes a KVM_EXIT_SHUTDOWN vmexit. | |
4460 | ||
4461 | Otherwise, if the MCE is a corrected error, KVM will just | |
4462 | store it in the corresponding bank (provided this bank is | |
4463 | not holding a previously reported uncorrected error). | |
4464 | ||
4036e387 | 4465 | 4.107 KVM_S390_GET_CMMA_BITS |
106ee47d | 4466 | ---------------------------- |
4036e387 | 4467 | |
106ee47d MCC |
4468 | :Capability: KVM_CAP_S390_CMMA_MIGRATION |
4469 | :Architectures: s390 | |
4470 | :Type: vm ioctl | |
4471 | :Parameters: struct kvm_s390_cmma_log (in, out) | |
4472 | :Returns: 0 on success, a negative value on error | |
4036e387 CI |
4473 | |
4474 | This ioctl is used to get the values of the CMMA bits on the s390 | |
4475 | architecture. It is meant to be used in two scenarios: | |
106ee47d | 4476 | |
4036e387 CI |
4477 | - During live migration to save the CMMA values. Live migration needs |
4478 | to be enabled via the KVM_REQ_START_MIGRATION VM property. | |
4479 | - To non-destructively peek at the CMMA values, with the flag | |
4480 | KVM_S390_CMMA_PEEK set. | |
4481 | ||
4482 | The ioctl takes parameters via the kvm_s390_cmma_log struct. The desired | |
4483 | values are written to a buffer whose location is indicated via the "values" | |
4484 | member in the kvm_s390_cmma_log struct. The values in the input struct are | |
4485 | also updated as needed. | |
106ee47d | 4486 | |
4036e387 CI |
4487 | Each CMMA value takes up one byte. |
4488 | ||
106ee47d MCC |
4489 | :: |
4490 | ||
4491 | struct kvm_s390_cmma_log { | |
4036e387 CI |
4492 | __u64 start_gfn; |
4493 | __u32 count; | |
4494 | __u32 flags; | |
4495 | union { | |
4496 | __u64 remaining; | |
4497 | __u64 mask; | |
4498 | }; | |
4499 | __u64 values; | |
106ee47d | 4500 | }; |
4036e387 CI |
4501 | |
4502 | start_gfn is the number of the first guest frame whose CMMA values are | |
4503 | to be retrieved, | |
4504 | ||
4505 | count is the length of the buffer in bytes, | |
4506 | ||
4507 | values points to the buffer where the result will be written to. | |
4508 | ||
4509 | If count is greater than KVM_S390_SKEYS_MAX, then it is considered to be | |
4510 | KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with | |
4511 | other ioctls. | |
4512 | ||
4513 | The result is written in the buffer pointed to by the field values, and | |
4514 | the values of the input parameter are updated as follows. | |
4515 | ||
4516 | Depending on the flags, different actions are performed. The only | |
4517 | supported flag so far is KVM_S390_CMMA_PEEK. | |
4518 | ||
4519 | The default behaviour if KVM_S390_CMMA_PEEK is not set is: | |
4520 | start_gfn will indicate the first page frame whose CMMA bits were dirty. | |
4521 | It is not necessarily the same as the one passed as input, as clean pages | |
4522 | are skipped. | |
4523 | ||
4524 | count will indicate the number of bytes actually written in the buffer. | |
4525 | It can (and very often will) be smaller than the input value, since the | |
4526 | buffer is only filled until 16 bytes of clean values are found (which | |
4527 | are then not copied in the buffer). Since a CMMA migration block needs | |
4528 | the base address and the length, for a total of 16 bytes, we will send | |
4529 | back some clean data if there is some dirty data afterwards, as long as | |
4530 | the size of the clean data does not exceed the size of the header. This | |
4531 | minimizes the amount of data to be saved or transferred over | |
4532 | the network at the expense of more roundtrips to userspace. The next | |
4533 | invocation of the ioctl will skip over all the clean values, saving | |
4534 | potentially more than just the 16 bytes we found. | |
4535 | ||
4536 | If KVM_S390_CMMA_PEEK is set: | |
4537 | the existing storage attributes are read even when not in migration | |
4538 | mode, and no other action is performed; | |
4539 | ||
4540 | the output start_gfn will be equal to the input start_gfn, | |
4541 | ||
4542 | the output count will be equal to the input count, except if the end of | |
4543 | memory has been reached. | |
4544 | ||
4545 | In both cases: | |
4546 | the field "remaining" will indicate the total number of dirty CMMA values | |
4547 | still remaining, or 0 if KVM_S390_CMMA_PEEK is set and migration mode is | |
4548 | not enabled. | |
4549 | ||
4550 | mask is unused. | |
4551 | ||
4552 | values points to the userspace buffer where the result will be stored. | |
4553 | ||
4554 | This ioctl can fail with -ENOMEM if not enough memory can be allocated to | |
4555 | complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if | |
4556 | KVM_S390_CMMA_PEEK is not set but migration mode was not enabled, with | |
4557 | -EFAULT if the userspace address is invalid or if no page table is | |
4558 | present for the addresses (e.g. when using hugepages). | |
4559 | ||
4560 | 4.108 KVM_S390_SET_CMMA_BITS | |
106ee47d | 4561 | ---------------------------- |
4036e387 | 4562 | |
106ee47d MCC |
4563 | :Capability: KVM_CAP_S390_CMMA_MIGRATION |
4564 | :Architectures: s390 | |
4565 | :Type: vm ioctl | |
4566 | :Parameters: struct kvm_s390_cmma_log (in) | |
4567 | :Returns: 0 on success, a negative value on error | |
4036e387 CI |
4568 | |
4569 | This ioctl is used to set the values of the CMMA bits on the s390 | |
4570 | architecture. It is meant to be used during live migration to restore | |
4571 | the CMMA values, but there are no restrictions on its use. | |
4572 | The ioctl takes parameters via the kvm_s390_cmma_log struct. | |
4573 | Each CMMA value takes up one byte. | |
4574 | ||
106ee47d MCC |
4575 | :: |
4576 | ||
4577 | struct kvm_s390_cmma_log { | |
4036e387 CI |
4578 | __u64 start_gfn; |
4579 | __u32 count; | |
4580 | __u32 flags; | |
4581 | union { | |
4582 | __u64 remaining; | |
4583 | __u64 mask; | |
106ee47d | 4584 | }; |
4036e387 | 4585 | __u64 values; |
106ee47d | 4586 | }; |
4036e387 CI |
4587 | |
4588 | start_gfn indicates the starting guest frame number, | |
4589 | ||
4590 | count indicates how many values are to be considered in the buffer, | |
4591 | ||
4592 | flags is not used and must be 0. | |
4593 | ||
4594 | mask indicates which PGSTE bits are to be considered. | |
4595 | ||
4596 | remaining is not used. | |
4597 | ||
4598 | values points to the buffer in userspace that holds the values to be set. | |
4599 | ||
4600 | This ioctl can fail with -ENOMEM if not enough memory can be allocated to | |
4601 | complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if | |
4602 | the count field is too large (e.g. more than KVM_S390_CMMA_SIZE_MAX) or | |
4603 | if the flags field was not 0, with -EFAULT if the userspace address is | |
4604 | invalid, if invalid pages are written to (e.g. after the end of memory) | |
4605 | or if no page table is present for the addresses (e.g. when using | |
4606 | hugepages). | |
4607 | ||
7bf14c28 | 4608 | 4.109 KVM_PPC_GET_CPU_CHAR |
106ee47d | 4609 | -------------------------- |
3214d01f | 4610 | |
106ee47d MCC |
4611 | :Capability: KVM_CAP_PPC_GET_CPU_CHAR |
4612 | :Architectures: powerpc | |
4613 | :Type: vm ioctl | |
4614 | :Parameters: struct kvm_ppc_cpu_char (out) | |
4615 | :Returns: 0 on successful completion, | |
3214d01f PM |
4616 | -EFAULT if struct kvm_ppc_cpu_char cannot be written |
4617 | ||
4618 | This ioctl gives userspace information about certain characteristics | |
4619 | of the CPU relating to speculative execution of instructions and | |
4620 | possible information leakage resulting from speculative execution (see | |
4621 | CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is | |
106ee47d | 4622 | returned in struct kvm_ppc_cpu_char, which looks like this:: |
3214d01f | 4623 | |
106ee47d | 4624 | struct kvm_ppc_cpu_char { |
3214d01f PM |
4625 | __u64 character; /* characteristics of the CPU */ |
4626 | __u64 behaviour; /* recommended software behaviour */ | |
4627 | __u64 character_mask; /* valid bits in character */ | |
4628 | __u64 behaviour_mask; /* valid bits in behaviour */ | |
106ee47d | 4629 | }; |
3214d01f PM |
4630 | |
4631 | For extensibility, the character_mask and behaviour_mask fields | |
4632 | indicate which bits of character and behaviour have been filled in by | |
4633 | the kernel. If the set of defined bits is extended in future then | |
4634 | userspace will be able to tell whether it is running on a kernel that | |
4635 | knows about the new bits. | |
4636 | ||
4637 | The character field describes attributes of the CPU which can help | |
4638 | with preventing inadvertent information disclosure - specifically, | |
4639 | whether there is an instruction to flash-invalidate the L1 data cache | |
4640 | (ori 30,30,0 or mtspr SPRN_TRIG2,rN), whether the L1 data cache is set | |
4641 | to a mode where entries can only be used by the thread that created | |
4642 | them, whether the bcctr[l] instruction prevents speculation, and | |
4643 | whether a speculation barrier instruction (ori 31,31,0) is provided. | |
4644 | ||
4645 | The behaviour field describes actions that software should take to | |
4646 | prevent inadvertent information disclosure, and thus describes which | |
4647 | vulnerabilities the hardware is subject to; specifically whether the | |
4648 | L1 data cache should be flushed when returning to user mode from the | |
4649 | kernel, and whether a speculation barrier should be placed between an | |
4650 | array bounds check and the array access. | |
4651 | ||
4652 | These fields use the same bit definitions as the new | |
4653 | H_GET_CPU_CHARACTERISTICS hypercall. | |
4654 | ||
7bf14c28 | 4655 | 4.110 KVM_MEMORY_ENCRYPT_OP |
106ee47d | 4656 | --------------------------- |
5acc5c06 | 4657 | |
106ee47d MCC |
4658 | :Capability: basic |
4659 | :Architectures: x86 | |
46ca9ee5 | 4660 | :Type: vm ioctl |
106ee47d MCC |
4661 | :Parameters: an opaque platform specific structure (in/out) |
4662 | :Returns: 0 on success; -1 on error | |
5acc5c06 BS |
4663 | |
4664 | If the platform supports creating encrypted VMs then this ioctl can be used | |
4665 | for issuing platform-specific memory encryption commands to manage those | |
4666 | encrypted VMs. | |
4667 | ||
4668 | Currently, this ioctl is used for issuing Secure Encrypted Virtualization | |
4669 | (SEV) commands on AMD Processors. The SEV commands are defined in | |
2f5947df | 4670 | Documentation/virt/kvm/amd-memory-encryption.rst. |
5acc5c06 | 4671 | |
7bf14c28 | 4672 | 4.111 KVM_MEMORY_ENCRYPT_REG_REGION |
106ee47d | 4673 | ----------------------------------- |
69eaedee | 4674 | |
106ee47d MCC |
4675 | :Capability: basic |
4676 | :Architectures: x86 | |
4677 | :Type: system | |
4678 | :Parameters: struct kvm_enc_region (in) | |
4679 | :Returns: 0 on success; -1 on error | |
69eaedee BS |
4680 | |
4681 | This ioctl can be used to register a guest memory region which may | |
4682 | contain encrypted data (e.g. guest RAM, SMRAM etc). | |
4683 | ||
4684 | It is used in the SEV-enabled guest. When encryption is enabled, a guest | |
4685 | memory region may contain encrypted data. The SEV memory encryption | |
4686 | engine uses a tweak such that two identical plaintext pages at | |
4687 | different locations will have differing ciphertexts. Swapping or | |
4688 | moving the ciphertext of those pages therefore does not result in plaintext | |
4689 | being swapped, and relocating (or migrating) physical backing pages for the SEV | |
4690 | guest requires some additional steps. | |
4691 | ||
4692 | Note: The current SEV key management spec does not provide commands to | |
4693 | swap or migrate (move) ciphertext pages. Hence, for now we pin the guest | |
4694 | memory region registered with the ioctl. | |
4695 | ||
7bf14c28 | 4696 | 4.112 KVM_MEMORY_ENCRYPT_UNREG_REGION |
106ee47d | 4697 | ------------------------------------- |
69eaedee | 4698 | |
106ee47d MCC |
4699 | :Capability: basic |
4700 | :Architectures: x86 | |
4701 | :Type: system | |
4702 | :Parameters: struct kvm_enc_region (in) | |
4703 | :Returns: 0 on success; -1 on error | |
69eaedee BS |
4704 | |
4705 | This ioctl can be used to unregister the guest memory region registered | |
4706 | with the KVM_MEMORY_ENCRYPT_REG_REGION ioctl above. | |
4707 | ||
faeb7833 | 4708 | 4.113 KVM_HYPERV_EVENTFD |
106ee47d | 4709 | ------------------------ |
faeb7833 | 4710 | |
106ee47d MCC |
4711 | :Capability: KVM_CAP_HYPERV_EVENTFD |
4712 | :Architectures: x86 | |
4713 | :Type: vm ioctl | |
4714 | :Parameters: struct kvm_hyperv_eventfd (in) | |
faeb7833 RK |
4715 | |
4716 | This ioctl (un)registers an eventfd to receive notifications from the guest on | |
4717 | the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without | |
4718 | causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number | |
4719 | (bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit. | |
4720 | ||
106ee47d MCC |
4721 | :: |
4722 | ||
4723 | struct kvm_hyperv_eventfd { | |
faeb7833 RK |
4724 | __u32 conn_id; |
4725 | __s32 fd; | |
4726 | __u32 flags; | |
4727 | __u32 padding[3]; | |
106ee47d | 4728 | }; |
faeb7833 | 4729 | |
106ee47d | 4730 | The conn_id field should fit within 24 bits:: |
faeb7833 | 4731 | |
106ee47d | 4732 | #define KVM_HYPERV_CONN_ID_MASK 0x00ffffff |
faeb7833 | 4733 | |
106ee47d | 4734 | The acceptable values for the flags field are:: |
faeb7833 | 4735 | |
106ee47d | 4736 | #define KVM_HYPERV_EVENTFD_DEASSIGN (1 << 0) |
faeb7833 | 4737 | |
106ee47d MCC |
4738 | :Returns: 0 on success, |
4739 | -EINVAL if conn_id or flags is outside the allowed range, | |
4740 | -ENOENT on deassign if the conn_id isn't registered, | |
4741 | -EEXIST on assign if the conn_id is already registered | |
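
The range checks behind the -EINVAL case can be mirrored in user space before the ioctl is issued. The sketch below is an assumption-laden illustration: ``struct hyperv_eventfd_args`` is a local copy of the layout shown above, ``CONN_ID_MASK`` and ``EVENTFD_DEASSIGN`` repeat the values of ``KVM_HYPERV_CONN_ID_MASK`` and ``KVM_HYPERV_EVENTFD_DEASSIGN``, and ``hyperv_eventfd_fill()`` is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the values documented above (illustration only). */
#define CONN_ID_MASK     0x00ffffffu /* KVM_HYPERV_CONN_ID_MASK */
#define EVENTFD_DEASSIGN (1u << 0)   /* KVM_HYPERV_EVENTFD_DEASSIGN */

/* Local copy of the struct shown above, using <stdint.h> types. */
struct hyperv_eventfd_args {
    uint32_t conn_id;
    int32_t fd;
    uint32_t flags;
    uint32_t padding[3];
};

/*
 * Hypothetical helper: validate conn_id and flags the way the text
 * describes, then fill a zeroed argument struct. Returns 0 or -EINVAL.
 */
static int hyperv_eventfd_fill(struct hyperv_eventfd_args *args,
                               uint32_t conn_id, int32_t fd, uint32_t flags)
{
    assert(args != NULL);

    if (conn_id & ~CONN_ID_MASK)  /* must fit within 24 bits */
        return -EINVAL;
    if (flags & ~EVENTFD_DEASSIGN) /* only the deassign flag is defined */
        return -EINVAL;

    memset(args, 0, sizeof(*args)); /* clears the padding as well */
    args->conn_id = conn_id;
    args->fd = fd;
    args->flags = flags;
    return 0;
}
```

Checking locally does not replace handling the ioctl's own return value; -ENOENT and -EEXIST can only be detected by the kernel.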
faeb7833 | 4742 | |
8fcc4b59 | 4743 | 4.114 KVM_GET_NESTED_STATE |
106ee47d MCC |
4744 | -------------------------- |
4745 | ||
4746 | :Capability: KVM_CAP_NESTED_STATE | |
4747 | :Architectures: x86 | |
4748 | :Type: vcpu ioctl | |
4749 | :Parameters: struct kvm_nested_state (in/out) | |
4750 | :Returns: 0 on success, -1 on error | |
8fcc4b59 | 4751 | |
8fcc4b59 | 4752 | Errors: |
106ee47d MCC |
4753 | |
4754 | ===== ============================================================= | |
4755 | E2BIG the total state size exceeds the value of 'size' specified by | |
8fcc4b59 | 4756 | the user; the size required will be written into size. |
106ee47d | 4757 | ===== ============================================================= |
8fcc4b59 | 4758 | |
106ee47d MCC |
4759 | :: |
4760 | ||
4761 | struct kvm_nested_state { | |
8fcc4b59 JM |
4762 | __u16 flags; |
4763 | __u16 format; | |
4764 | __u32 size; | |
6ca00dfa | 4765 | |
8fcc4b59 | 4766 | union { |
6ca00dfa LA |
4767 | struct kvm_vmx_nested_state_hdr vmx; |
4768 | struct kvm_svm_nested_state_hdr svm; | |
4769 | ||
4770 | /* Pad the header to 128 bytes. */ | |
8fcc4b59 | 4771 | __u8 pad[120]; |
6ca00dfa LA |
4772 | } hdr; |
4773 | ||
4774 | union { | |
4775 | struct kvm_vmx_nested_state_data vmx[0]; | |
4776 | struct kvm_svm_nested_state_data svm[0]; | |
4777 | } data; | |
106ee47d | 4778 | }; |
8fcc4b59 | 4779 | |
106ee47d MCC |
4780 | #define KVM_STATE_NESTED_GUEST_MODE 0x00000001 |
4781 | #define KVM_STATE_NESTED_RUN_PENDING 0x00000002 | |
4782 | #define KVM_STATE_NESTED_EVMCS 0x00000004 | |
8fcc4b59 | 4783 | |
106ee47d MCC |
4784 | #define KVM_STATE_NESTED_FORMAT_VMX 0 |
4785 | #define KVM_STATE_NESTED_FORMAT_SVM 1 | |
8fcc4b59 | 4786 | |
106ee47d | 4787 | #define KVM_STATE_NESTED_VMX_VMCS_SIZE 0x1000 |
6ca00dfa | 4788 | |
106ee47d MCC |
4789 | #define KVM_STATE_NESTED_VMX_SMM_GUEST_MODE 0x00000001 |
4790 | #define KVM_STATE_NESTED_VMX_SMM_VMXON 0x00000002 | |
6ca00dfa | 4791 | |
3c97f03e | 4792 | #define KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE 0x00000001 |
850448f3 | 4793 | |
106ee47d | 4794 | struct kvm_vmx_nested_state_hdr { |
8fcc4b59 | 4795 | __u64 vmxon_pa; |
6ca00dfa | 4796 | __u64 vmcs12_pa; |
8fcc4b59 JM |
4797 | |
4798 | struct { | |
4799 | __u16 flags; | |
4800 | } smm; | |
83d31e52 PB |
4801 | |
4802 | __u32 flags; | |
4803 | __u64 preemption_timer_deadline; | |
106ee47d | 4804 | }; |
8fcc4b59 | 4805 | |
106ee47d | 4806 | struct kvm_vmx_nested_state_data { |
6ca00dfa LA |
4807 | __u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; |
4808 | __u8 shadow_vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE]; | |
106ee47d | 4809 | }; |
6ca00dfa | 4810 | |
8fcc4b59 JM |
4811 | This ioctl copies the vcpu's nested virtualization state from the kernel to |
4812 | userspace. | |
4813 | ||
6ca00dfa LA |
4814 | The maximum size of the state can be retrieved by passing KVM_CAP_NESTED_STATE |
4815 | to the KVM_CHECK_EXTENSION ioctl(). | |
8fcc4b59 JM |
4816 | |
4817 | 4.115 KVM_SET_NESTED_STATE | |
106ee47d | 4818 | -------------------------- |
8fcc4b59 | 4819 | |
106ee47d MCC |
4820 | :Capability: KVM_CAP_NESTED_STATE |
4821 | :Architectures: x86 | |
4822 | :Type: vcpu ioctl | |
4823 | :Parameters: struct kvm_nested_state (in) | |
4824 | :Returns: 0 on success, -1 on error | |
8fcc4b59 | 4825 | |
6ca00dfa LA |
4826 | This copies the vcpu's kvm_nested_state struct from userspace to the kernel. |
4827 | For the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE. | |
7bf14c28 | 4828 | |
9943450b | 4829 | 4.116 KVM_(UN)REGISTER_COALESCED_MMIO |
106ee47d | 4830 | ------------------------------------- |
9943450b | 4831 | |
106ee47d MCC |
4832 | :Capability: KVM_CAP_COALESCED_MMIO (for coalesced mmio) |
4833 | KVM_CAP_COALESCED_PIO (for coalesced pio) | |
4834 | :Architectures: all | |
4835 | :Type: vm ioctl | |
4836 | :Parameters: struct kvm_coalesced_mmio_zone | |
4837 | :Returns: 0 on success, < 0 on error | |
9943450b | 4838 | |
0804c849 | 4839 | Coalesced I/O is a performance optimization that defers hardware |
9943450b PH |
4840 | register write emulation so that userspace exits are avoided. It is |
4841 | typically used to reduce the overhead of emulating frequently accessed | |
4842 | hardware registers. | |
4843 | ||
0804c849 | 4844 | When a hardware register is configured for coalesced I/O, write accesses |
9943450b PH |
4845 | do not exit to userspace and their value is recorded in a ring buffer |
4846 | that is shared between kernel and userspace. | |
4847 | ||
0804c849 | 4848 | Coalesced I/O is used if one or more write accesses to a hardware |
9943450b PH |
4849 | register can be deferred until a read or a write to another hardware |
4850 | register on the same device. This last access will cause a vmexit and | |
4851 | userspace will process accesses from the ring buffer before emulating | |
0804c849 PH |
4852 | it. That will avoid exiting to userspace on repeated writes. |
4853 | ||
4854 | Coalesced pio is based on coalesced mmio. There is little difference | |
4855 | between coalesced mmio and pio except that coalesced pio records accesses | |
4856 | to I/O ports. | |
9943450b | 4857 | |
2a31b9db | 4858 | 4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl) |
106ee47d MCC |
4859 | ------------------------------------ |
4860 | ||
4861 | :Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 | |
3fbf4207 | 4862 | :Architectures: x86, arm64, mips |
106ee47d | 4863 | :Type: vm ioctl |
01ead84c | 4864 | :Parameters: struct kvm_clear_dirty_log (in) |
106ee47d | 4865 | :Returns: 0 on success, -1 on error |
2a31b9db | 4866 | |
106ee47d | 4867 | :: |
2a31b9db | 4868 | |
106ee47d MCC |
4869 | /* for KVM_CLEAR_DIRTY_LOG */ |
4870 | struct kvm_clear_dirty_log { | |
2a31b9db PB |
4871 | __u32 slot; |
4872 | __u32 num_pages; | |
4873 | __u64 first_page; | |
4874 | union { | |
4875 | void __user *dirty_bitmap; /* one bit per page */ | |
4876 | __u64 padding; | |
4877 | }; | |
106ee47d | 4878 | }; |
2a31b9db PB |
4879 | |
4880 | The ioctl clears the dirty status of pages in a memory slot, according to | |
4881 | the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap | |
4882 | field. Bit 0 of the bitmap corresponds to page "first_page" in the | |
4883 | memory slot, and num_pages is the size in bits of the input bitmap. | |
76d58e0f PB |
4884 | first_page must be a multiple of 64; num_pages must also be a multiple of |
4885 | 64 unless first_page + num_pages is the size of the memory slot. For each | |
4886 | bit that is set in the input bitmap, the corresponding page is marked "clean" | |
2a31b9db PB |
4887 | in KVM's dirty bitmap, and dirty tracking is re-enabled for that page |
4888 | (for example via write-protection, or by clearing the dirty bit in | |
4889 | a page table entry). | |
4890 | ||
01ead84c ZY |
4891 | If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field specify | |
4892 | the address space for which you want to clear the dirty status. See | |
4893 | KVM_SET_USER_MEMORY_REGION for details on the usage of slot field. | |
2a31b9db | 4894 | |
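The argument rules above can be checked in userspace before issuing the ioctl. The following is a minimal sketch, not part of the KVM API: the helper names ``dirty_log_slot`` and ``clear_dirty_log_range_ok`` are hypothetical, ``slot_pages`` is assumed to be the slot size in pages as tracked by the caller, and the bounds check against the end of the slot is an extra sanity check added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack an address-space index and a slot id into the 32-bit slot field:
 * bits 16-31 carry the address space, bits 0-15 the slot id.
 * (Hypothetical helper, not a kernel or libc function.)
 */
static inline uint32_t dirty_log_slot(uint16_t as_id, uint16_t slot_id)
{
	return ((uint32_t)as_id << 16) | slot_id;
}

/*
 * Check the documented constraints on first_page and num_pages.
 * slot_pages is the size of the memory slot in pages (caller-provided).
 */
static inline bool clear_dirty_log_range_ok(uint64_t first_page,
					    uint32_t num_pages,
					    uint64_t slot_pages)
{
	/* first_page must be a multiple of 64. */
	if (first_page % 64 != 0)
		return false;

	/* Illustrative sanity check: the range must fit inside the slot. */
	if (first_page > slot_pages || num_pages > slot_pages - first_page)
		return false;

	/* num_pages must be a multiple of 64 unless the range ends the slot. */
	if (num_pages % 64 != 0 && first_page + num_pages != slot_pages)
		return false;

	return true;
}
```

For a 100-page slot, the range (first_page 64, num_pages 36) is accepted because it ends exactly at the end of the slot, while (first_page 64, num_pages 30) is rejected.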
d7547c55 | 4895 | This ioctl is mostly useful when KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 |
2a31b9db PB |
4896 | is enabled; for more information, see the description of the capability. |
4897 | However, it can always be used as long as KVM_CHECK_EXTENSION confirms | |
d7547c55 | 4898 | that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present. |
2a31b9db | 4899 | |
2bc39970 | 4900 | 4.118 KVM_GET_SUPPORTED_HV_CPUID |
106ee47d | 4901 | -------------------------------- |
2bc39970 | 4902 | |
c21d54f0 | 4903 | :Capability: KVM_CAP_HYPERV_CPUID (vcpu), KVM_CAP_SYS_HYPERV_CPUID (system) |
106ee47d | 4904 | :Architectures: x86 |
c21d54f0 | 4905 | :Type: system ioctl, vcpu ioctl |
106ee47d MCC |
4906 | :Parameters: struct kvm_cpuid2 (in/out) |
4907 | :Returns: 0 on success, -1 on error | |
4908 | ||
4909 | :: | |
2bc39970 | 4910 | |
106ee47d | 4911 | struct kvm_cpuid2 { |
2bc39970 VK |
4912 | __u32 nent; |
4913 | __u32 padding; | |
4914 | struct kvm_cpuid_entry2 entries[0]; | |
106ee47d | 4915 | }; |
2bc39970 | 4916 | |
106ee47d | 4917 | struct kvm_cpuid_entry2 { |
2bc39970 VK |
4918 | __u32 function; |
4919 | __u32 index; | |
4920 | __u32 flags; | |
4921 | __u32 eax; | |
4922 | __u32 ebx; | |
4923 | __u32 ecx; | |
4924 | __u32 edx; | |
4925 | __u32 padding[3]; | |
106ee47d | 4926 | }; |
2bc39970 VK |
4927 | |
4928 | This ioctl returns x86 cpuid feature leaves related to Hyper-V emulation in | |
4929 | KVM. Userspace can use the information returned by this ioctl to construct | |
4930 | cpuid information presented to guests consuming Hyper-V enlightenments (e.g. | |
4931 | Windows or Hyper-V guests). | |
4932 | ||
4933 | CPUID feature leaves returned by this ioctl are defined by Hyper-V Top Level | |
4934 | Functional Specification (TLFS). These leaves can't be obtained with | |
4935 | KVM_GET_SUPPORTED_CPUID ioctl because some of them intersect with KVM feature | |
4936 | leaves (0x40000000, 0x40000001). | |
4937 | ||
4938 | Currently, the following CPUID leaves are returned: | |
356c7558 | 4939 | |
106ee47d MCC |
4940 | - HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS |
4941 | - HYPERV_CPUID_INTERFACE | |
4942 | - HYPERV_CPUID_VERSION | |
4943 | - HYPERV_CPUID_FEATURES | |
4944 | - HYPERV_CPUID_ENLIGHTMENT_INFO | |
4945 | - HYPERV_CPUID_IMPLEMENT_LIMITS | |
4946 | - HYPERV_CPUID_NESTED_FEATURES | |
b44f50d8 VK |
4947 | - HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS |
4948 | - HYPERV_CPUID_SYNDBG_INTERFACE | |
4949 | - HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES | |
2bc39970 | 4950 | |
b44f50d8 | 4951 | Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure |
2bc39970 VK |
4952 | with the 'nent' field indicating the number of entries in the variable-size |
4953 | array 'entries'. If the number of entries is too low to describe all Hyper-V | |
4954 | feature leaves, an error (E2BIG) is returned. If the number is greater than or equal | |
4955 | to the number of Hyper-V feature leaves, the 'nent' field is adjusted to the | |
4956 | number of valid entries in the 'entries' array, which is then filled. | |
4957 | ||
4958 | 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved; | |
4959 | userspace should not expect to get any particular value there. | |
2a31b9db | 4960 | |
c21d54f0 VK |
4961 | Note, vcpu version of KVM_GET_SUPPORTED_HV_CPUID is currently deprecated. Unlike |
4962 | system ioctl which exposes all supported feature bits unconditionally, vcpu | |
4963 | version has the following quirks: | |
356c7558 | 4964 | |
c21d54f0 VK |
4965 | - HYPERV_CPUID_NESTED_FEATURES leaf and HV_X64_ENLIGHTENED_VMCS_RECOMMENDED |
4966 | feature bit are only exposed when Enlightened VMCS was previously enabled | |
4967 | on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS). | |
4968 | - HV_STIMER_DIRECT_MODE_AVAILABLE bit is only exposed with in-kernel LAPIC. | |
4969 | (presumes KVM_CREATE_IRQCHIP has already been called). | |
4970 | ||
50036ad0 | 4971 | 4.119 KVM_ARM_VCPU_FINALIZE |
106ee47d MCC |
4972 | --------------------------- |
4973 | ||
3fbf4207 | 4974 | :Architectures: arm64 |
106ee47d MCC |
4975 | :Type: vcpu ioctl |
4976 | :Parameters: int feature (in) | |
4977 | :Returns: 0 on success, -1 on error | |
50036ad0 | 4978 | |
50036ad0 | 4979 | Errors: |
106ee47d MCC |
4980 | |
4981 | ====== ============================================================== | |
4982 | EPERM feature not enabled, needs configuration, or already finalized | |
4983 | EINVAL feature unknown or not present | |
4984 | ====== ============================================================== | |
50036ad0 DM |
4985 | |
4986 | Recognised values for feature: | |
106ee47d MCC |
4987 | |
4988 | ===== =========================================== | |
9df2d660 | 4989 | arm64 KVM_ARM_VCPU_SVE (requires KVM_CAP_ARM_SVE) |
106ee47d | 4990 | ===== =========================================== |
50036ad0 DM |
4991 | |
4992 | Finalizes the configuration of the specified vcpu feature. | |
4993 | ||
4994 | The vcpu must already have been initialised, enabling the affected feature, by | |
4995 | means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in | |
4996 | features[]. | |
4997 | ||
4998 | For affected vcpu features, this is a mandatory step that must be performed | |
4999 | before the vcpu is fully usable. | |
5000 | ||
5001 | Between KVM_ARM_VCPU_INIT and KVM_ARM_VCPU_FINALIZE, the feature may be | |
5002 | configured by use of ioctls such as KVM_SET_ONE_REG. The exact configuration | |
5003 | that should be performed and how to do it are feature-dependent. | |
5004 | ||
5005 | Other calls that depend on a particular feature being finalized, such as | |
5006 | KVM_RUN, KVM_GET_REG_LIST, KVM_GET_ONE_REG and KVM_SET_ONE_REG, will fail with | |
5007 | -EPERM unless the feature has already been finalized by means of a | |
5008 | KVM_ARM_VCPU_FINALIZE call. | |
5009 | ||
5010 | See KVM_ARM_VCPU_INIT for details of vcpu features that require finalization | |
5011 | using this ioctl. | |
5012 | ||
66bb8a06 | 5013 | 4.120 KVM_SET_PMU_EVENT_FILTER |
106ee47d | 5014 | ------------------------------ |
66bb8a06 | 5015 | |
106ee47d MCC |
5016 | :Capability: KVM_CAP_PMU_EVENT_FILTER |
5017 | :Architectures: x86 | |
5018 | :Type: vm ioctl | |
5019 | :Parameters: struct kvm_pmu_event_filter (in) | |
5020 | :Returns: 0 on success, -1 on error | |
66bb8a06 | 5021 | |
106ee47d MCC |
5022 | :: |
5023 | ||
5024 | struct kvm_pmu_event_filter { | |
30cd8604 EH |
5025 | __u32 action; |
5026 | __u32 nevents; | |
5027 | __u32 fixed_counter_bitmap; | |
5028 | __u32 flags; | |
5029 | __u32 pad[4]; | |
5030 | __u64 events[0]; | |
106ee47d | 5031 | }; |
66bb8a06 EH |
5032 | |
5033 | This ioctl restricts the set of PMU events that the guest can program. | |
5034 | The argument holds a list of events which will be allowed or denied. | |
5035 | The eventsel+umask of each event the guest attempts to program is compared | |
5036 | against the events field to determine whether the guest should have access. | |
30cd8604 EH |
5037 | The events field only controls general purpose counters; fixed purpose |
5038 | counters are controlled by the fixed_counter_bitmap. | |
5039 | ||
5040 | No flags are defined yet; the field must be zero. | |
66bb8a06 | 5041 | |
106ee47d MCC |
5042 | Valid values for 'action':: |
5043 | ||
5044 | #define KVM_PMU_EVENT_ALLOW 0 | |
5045 | #define KVM_PMU_EVENT_DENY 1 | |
66bb8a06 | 5046 | |
22945688 | 5047 | 4.121 KVM_PPC_SVM_OFF |
106ee47d MCC |
5048 | --------------------- |
5049 | ||
5050 | :Capability: basic | |
5051 | :Architectures: powerpc | |
5052 | :Type: vm ioctl | |
5053 | :Parameters: none | |
5054 | :Returns: 0 on successful completion, | |
22945688 | 5055 | |
22945688 | 5056 | Errors: |
106ee47d MCC |
5057 | |
5058 | ====== ================================================================ | |
5059 | EINVAL if ultravisor failed to terminate the secure guest | |
5060 | ENOMEM if hypervisor failed to allocate new radix page tables for guest | |
5061 | ====== ================================================================ | |
22945688 BR |
5062 | |
5063 | This ioctl is used to turn off the secure mode of the guest or transition | |
5064 | the guest from secure mode to normal mode. This is invoked when the guest | |
5065 | is reset. This has no effect if called for a normal guest. | |
5066 | ||
5067 | This ioctl issues an ultravisor call to terminate the secure guest, | |
5068 | unpins the VPA pages and releases all the device pages that are used to | |
5069 | track the secure pages by hypervisor. | |
66bb8a06 | 5070 | |
7de3f142 | 5071 | 4.122 KVM_S390_NORMAL_RESET |
a93236fc | 5072 | --------------------------- |
7de3f142 | 5073 | |
a93236fc CB |
5074 | :Capability: KVM_CAP_S390_VCPU_RESETS |
5075 | :Architectures: s390 | |
5076 | :Type: vcpu ioctl | |
5077 | :Parameters: none | |
5078 | :Returns: 0 | |
7de3f142 JF |
5079 | |
5080 | This ioctl resets VCPU registers and control structures according to | |
5081 | the cpu reset definition in the POP (Principles Of Operation). | |
5082 | ||
5083 | 4.123 KVM_S390_INITIAL_RESET | |
a93236fc | 5084 | ---------------------------- |
7de3f142 | 5085 | |
a93236fc CB |
5086 | :Capability: none |
5087 | :Architectures: s390 | |
5088 | :Type: vcpu ioctl | |
5089 | :Parameters: none | |
5090 | :Returns: 0 | |
7de3f142 JF |
5091 | |
5092 | This ioctl resets VCPU registers and control structures according to | |
5093 | the initial cpu reset definition in the POP. However, the cpu is not | |
5094 | put into ESA mode. This reset is a superset of the normal reset. | |
5095 | ||
5096 | 4.124 KVM_S390_CLEAR_RESET | |
a93236fc | 5097 | -------------------------- |
7de3f142 | 5098 | |
a93236fc CB |
5099 | :Capability: KVM_CAP_S390_VCPU_RESETS |
5100 | :Architectures: s390 | |
5101 | :Type: vcpu ioctl | |
5102 | :Parameters: none | |
5103 | :Returns: 0 | |
7de3f142 JF |
5104 | |
5105 | This ioctl resets VCPU registers and control structures according to | |
5106 | the clear cpu reset definition in the POP. However, the cpu is not put | |
5107 | into ESA mode. This reset is a superset of the initial reset. | |
5108 | ||
5109 | ||
04ed89dc JF |
5110 | 4.125 KVM_S390_PV_COMMAND |
5111 | ------------------------- | |
5112 | ||
5113 | :Capability: KVM_CAP_S390_PROTECTED | |
5114 | :Architectures: s390 | |
5115 | :Type: vm ioctl | |
5116 | :Parameters: struct kvm_pv_cmd | |
5117 | :Returns: 0 on success, < 0 on error | |
5118 | ||
5119 | :: | |
5120 | ||
5121 | struct kvm_pv_cmd { | |
5122 | __u32 cmd; /* Command to be executed */ | |
5123 | __u16 rc; /* Ultravisor return code */ | |
5124 | __u16 rrc; /* Ultravisor return reason code */ | |
5125 | __u64 data; /* Data or address */ | |
5126 | __u32 flags; /* flags for future extensions. Must be 0 for now */ | |
5127 | __u32 reserved[3]; | |
5128 | }; | |
5129 | ||
5130 | cmd values: | |
5131 | ||
5132 | KVM_PV_ENABLE | |
5133 | Allocate memory and register the VM with the Ultravisor, thereby | |
5134 | donating memory to the Ultravisor that will become inaccessible to | |
5135 | KVM. All existing CPUs are converted to protected ones. After this | |
5136 | command has succeeded, any CPU added via hotplug will become | |
5137 | protected during its creation as well. | |
5138 | ||
7a265361 CB |
5139 | Errors: |
5140 | ||
5141 | ===== ============================= | |
5142 | EINTR an unmasked signal is pending | |
5143 | ===== ============================= | |
5144 | ||
04ed89dc JF |
5145 | KVM_PV_DISABLE |
5146 | ||
5147 | Deregister the VM from the Ultravisor and reclaim the memory that | |
5148 | had been donated to the Ultravisor, making it usable by the kernel | |
5149 | again. All registered VCPUs are converted back to non-protected | |
5150 | ones. | |
5151 | ||
5152 | KVM_PV_VM_SET_SEC_PARMS | |
5153 | Pass the image header from VM memory to the Ultravisor in | |
5154 | preparation of image unpacking and verification. | |
5155 | ||
5156 | KVM_PV_VM_UNPACK | |
5157 | Unpack (protect and decrypt) a page of the encrypted boot image. | |
5158 | ||
5159 | KVM_PV_VM_VERIFY | |
5160 | Verify the integrity of the unpacked image. Only if this succeeds, | |
5161 | KVM is allowed to start protected VCPUs. | |
5162 | ||
1a155254 AG |
5163 | 4.126 KVM_X86_SET_MSR_FILTER |
5164 | ---------------------------- | |
5165 | ||
46a63924 | 5166 | :Capability: KVM_CAP_X86_MSR_FILTER |
1a155254 AG |
5167 | :Architectures: x86 |
5168 | :Type: vm ioctl | |
5169 | :Parameters: struct kvm_msr_filter | |
5170 | :Returns: 0 on success, < 0 on error | |
5171 | ||
5172 | :: | |
5173 | ||
5174 | struct kvm_msr_filter_range { | |
5175 | #define KVM_MSR_FILTER_READ (1 << 0) | |
5176 | #define KVM_MSR_FILTER_WRITE (1 << 1) | |
5177 | __u32 flags; | |
5178 | __u32 nmsrs; /* number of msrs in bitmap */ | |
5179 | __u32 base; /* MSR index the bitmap starts at */ | |
5180 | __u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */ | |
5181 | }; | |
5182 | ||
5183 | #define KVM_MSR_FILTER_MAX_RANGES 16 | |
5184 | struct kvm_msr_filter { | |
5185 | #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0) | |
5186 | #define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0) | |
5187 | __u32 flags; | |
5188 | struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES]; | |
5189 | }; | |
5190 | ||
9389b9d5 | 5191 | flags values for ``struct kvm_msr_filter_range``: |
1a155254 | 5192 | |
9389b9d5 | 5193 | ``KVM_MSR_FILTER_READ`` |
1a155254 AG |
5194 | |
5195 | Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap | |
5196 | indicates that a read should immediately fail, while a 1 indicates that | |
5197 | a read for a particular MSR should be handled regardless of the default | |
5198 | filter action. | |
5199 | ||
9389b9d5 | 5200 | ``KVM_MSR_FILTER_WRITE`` |
1a155254 AG |
5201 | |
5202 | Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap | |
5203 | indicates that a write should immediately fail, while a 1 indicates that | |
5204 | a write for a particular MSR should be handled regardless of the default | |
5205 | filter action. | |
5206 | ||
9389b9d5 | 5207 | ``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE`` |
1a155254 AG |
5208 | |
5209 | Filter both read and write accesses to MSRs using the given bitmap. A 0 | |
5210 | in the bitmap indicates that both reads and writes should immediately fail, | |
5211 | while a 1 indicates that reads and writes for a particular MSR are not | |
5212 | filtered by this range. | |
5213 | ||
9389b9d5 | 5214 | flags values for ``struct kvm_msr_filter``: |
1a155254 | 5215 | |
9389b9d5 | 5216 | ``KVM_MSR_FILTER_DEFAULT_ALLOW`` |
1a155254 AG |
5217 | |
5218 | If no filter range matches an MSR index that is getting accessed, KVM will | |
5219 | fall back to allowing access to the MSR. | |
5220 | ||
9389b9d5 | 5221 | ``KVM_MSR_FILTER_DEFAULT_DENY`` |
1a155254 AG |
5222 | |
5223 | If no filter range matches an MSR index that is getting accessed, KVM will | |
5224 | fall back to rejecting access to the MSR. In this mode, all MSRs that should | |
5225 | be processed by KVM need to explicitly be marked as allowed in the bitmaps. | |
5226 | ||
5227 | This ioctl allows user space to define up to 16 bitmaps of MSR ranges to | |
5228 | specify whether a certain MSR access should be explicitly filtered for or not. | |
5229 | ||
5230 | If this ioctl has never been invoked, MSR accesses are not guarded and the | |
9389b9d5 | 5231 | default KVM in-kernel emulation behavior is fully preserved. |
1a155254 | 5232 | |
043248b3 PB |
5233 | Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR |
5234 | filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes | |
5235 | an error. | |
5236 | ||
1a155254 | 5237 | As soon as the filtering is in place, every MSR access is processed through |
9389b9d5 SC |
5238 | the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff); |
5239 | x2APIC MSRs are always allowed, independent of the ``default_allow`` setting, | |
5240 | and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base | |
5241 | register. | |
5242 | ||
043248b3 PB |
5243 | If a bit is within one of the defined ranges, read and write accesses are |
5244 | guarded by the bitmap's value for the MSR index if the kind of access | |
5245 | is included in the ``struct kvm_msr_filter_range`` flags. If no range | |
5246 | covers this particular access, the behavior is determined by the flags | |
5247 | field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW`` | |
5248 | and ``KVM_MSR_FILTER_DEFAULT_DENY``. | |
1a155254 AG |
5249 | |
5250 | Each bitmap range specifies a range of MSRs to potentially allow access on. | |
5251 | The range goes from MSR index [base .. base+nmsrs]. The flags field | |
5252 | indicates whether reads, writes or both reads and writes are filtered | |
5253 | by setting a 1 bit in the bitmap for the corresponding MSR index. | |
5254 | ||
5255 | If an MSR access is not permitted through the filtering, it generates a | |
5256 | #GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that | |
5257 | allows user space to deflect and potentially handle various MSR accesses | |
5258 | into user space. | |
5259 | ||
b318e8de SC |
5260 | Note, invoking this ioctl while a vCPU is running is inherently racy. However, | |
5261 | KVM does guarantee that vCPUs will see either the previous filter or the new | |
5262 | filter, e.g. MSRs with identical settings in both the old and new filter will | |
5263 | have deterministic behavior. | |
1a155254 | 5264 | |
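The lookup rules described for KVM_X86_SET_MSR_FILTER can be modelled in plain C to reason about a filter before installing it. This is only a sketch of the documented semantics, not kernel code: ``struct msr_range_model`` and ``msr_access_allowed`` are hypothetical names, each range is assumed to cover ``nmsrs`` consecutive MSRs starting at ``base``, the first range whose flags include the access type is assumed to decide, and bit N of the bitmap is assumed to be stored LSB-first in byte N / 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_READ  (1u << 0)	/* mirrors KVM_MSR_FILTER_READ */
#define FILTER_WRITE (1u << 1)	/* mirrors KVM_MSR_FILTER_WRITE */

/* Hypothetical userspace mirror of one filter range. */
struct msr_range_model {
	uint32_t flags;		/* FILTER_READ and/or FILTER_WRITE */
	uint32_t nmsrs;		/* number of MSRs covered by the bitmap */
	uint32_t base;		/* first MSR index covered */
	const uint8_t *bitmap;	/* 1 = allow, 0 = deny (assumed LSB-first) */
};

/* Return true if the modelled filter lets the access through. */
static bool msr_access_allowed(const struct msr_range_model *ranges,
			       size_t nranges, bool default_allow,
			       uint32_t msr, uint32_t access)
{
	size_t i;

	/* x2APIC MSRs (0x800 to 0x8ff) are never subject to filtering. */
	if (msr >= 0x800 && msr <= 0x8ff)
		return true;

	for (i = 0; i < nranges; i++) {
		const struct msr_range_model *r = &ranges[i];
		uint32_t idx;

		/* Skip ranges that do not filter this kind of access. */
		if (!(r->flags & access))
			continue;
		/* Skip ranges that do not cover this MSR index. */
		if (msr < r->base || msr - r->base >= r->nmsrs)
			continue;

		idx = msr - r->base;
		return (r->bitmap[idx / 8] >> (idx % 8)) & 1;
	}

	/* No range matched: fall back to the default action. */
	return default_allow;
}
```

With a single read-only range of four MSRs at base 0x10 and bitmap byte 0x05, reads of 0x10 and 0x12 are allowed, a read of 0x11 is denied even under a default-allow policy, and writes fall through to the default action.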
e1f68169 DW |
5265 | 4.127 KVM_XEN_HVM_SET_ATTR |
5266 | -------------------------- | |
5267 | ||
5268 | :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO | |
5269 | :Architectures: x86 | |
5270 | :Type: vm ioctl | |
5271 | :Parameters: struct kvm_xen_hvm_attr | |
5272 | :Returns: 0 on success, < 0 on error | |
5273 | ||
5274 | :: | |
5275 | ||
5276 | struct kvm_xen_hvm_attr { | |
5277 | __u16 type; | |
5278 | __u16 pad[3]; | |
5279 | union { | |
5280 | __u8 long_mode; | |
5281 | __u8 vector; | |
5282 | struct { | |
5283 | __u64 gfn; | |
5284 | } shared_info; | |
661a20fa DW |
5285 | struct { |
5286 | __u32 send_port; | |
5287 | __u32 type; /* EVTCHNSTAT_ipi / EVTCHNSTAT_interdomain */ | |
5288 | __u32 flags; | |
5289 | union { | |
5290 | struct { | |
5291 | __u32 port; | |
5292 | __u32 vcpu; | |
5293 | __u32 priority; | |
5294 | } port; | |
5295 | struct { | |
5296 | __u32 port; /* Zero for eventfd */ | |
5297 | __s32 fd; | |
5298 | } eventfd; | |
5299 | __u32 padding[4]; | |
5300 | } deliver; | |
5301 | } evtchn; | |
5302 | __u32 xen_version; | |
5303 | __u64 pad[8]; | |
e1f68169 DW |
5304 | } u; |
5305 | }; | |
5306 | ||
5307 | type values: | |
5308 | ||
5309 | KVM_XEN_ATTR_TYPE_LONG_MODE | |
5310 | Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This | |
5311 | determines the layout of the shared info pages exposed to the VM. | |
5312 | ||
5313 | KVM_XEN_ATTR_TYPE_SHARED_INFO | |
5314 | Sets the guest physical frame number at which the Xen "shared info" | |
5315 | page resides. Note that although Xen places vcpu_info for the first | |
5316 | 32 vCPUs in the shared_info page, KVM does not automatically do so | |
5317 | and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used | |
5318 | explicitly even when the vcpu_info for a given vCPU resides at the | |
5319 | "default" location in the shared_info page. This is because KVM is | |
5320 | not aware of the Xen CPU id which is used as the index into the | |
5321 | vcpu_info[] array, so cannot know the correct default location. | |
5322 | ||
1cfc9c4b DW |
5323 | Note that the shared info page may be constantly written to by KVM; |
5324 | it contains the event channel bitmap used to deliver interrupts to | |
5325 | a Xen guest, amongst other things. It is exempt from dirty tracking | |
5326 | mechanisms — KVM will not explicitly mark the page as dirty each | |
5327 | time an event channel interrupt is delivered to the guest! Thus, | |
5328 | userspace should always assume that the designated GFN is dirty if | |
5329 | any vCPU has been running or any event channel interrupts can be | |
5330 | routed to the guest. | |
5331 | ||
e1f68169 DW |
5332 | KVM_XEN_ATTR_TYPE_UPCALL_VECTOR |
5333 | Sets the exception vector used to deliver Xen event channel upcalls. | |
661a20fa DW |
5334 | This is the HVM-wide vector injected directly by the hypervisor |
5335 | (not through the local APIC), typically configured by a guest via | |
5336 | HVM_PARAM_CALLBACK_IRQ. | |
5337 | ||
5338 | KVM_XEN_ATTR_TYPE_EVTCHN | |
5339 | This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates | |
5340 | support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures | |
5341 | an outbound port number for interception of EVTCHNOP_send requests | |
5342 | from the guest. A given sending port number may be directed back | |
5343 | to a specified vCPU (by APIC ID) / port / priority on the guest, | |
5344 | or to trigger events on an eventfd. The vCPU and priority can be | |
5345 | changed by setting KVM_XEN_EVTCHN_UPDATE in a subsequent call, | |
5346 | but other fields cannot change for a given sending port. A port | |
5347 | mapping is removed by using KVM_XEN_EVTCHN_DEASSIGN in the flags | |
5348 | field. | |
5349 | ||
5350 | KVM_XEN_ATTR_TYPE_XEN_VERSION | |
5351 | This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates | |
5352 | support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It configures | |
5353 | the 32-bit version code returned to the guest when it invokes the | |
5354 | XENVER_version call; typically (XEN_MAJOR << 16 | XEN_MINOR). PV | |
5355 | Xen guests will often use this as a dummy hypercall to trigger | |
5356 | event channel delivery, so responding within the kernel without | |
5357 | exiting to userspace is beneficial. | |
e1f68169 | 5358 | |
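The version code passed with KVM_XEN_ATTR_TYPE_XEN_VERSION follows the (XEN_MAJOR << 16 | XEN_MINOR) layout mentioned above. A minimal sketch of building it, where ``xen_version_code`` is a hypothetical helper name and the result would be stored in the ``xen_version`` member of the attribute union:

```c
#include <stdint.h>

/*
 * Build the 32-bit value returned to the guest for XENVER_version:
 * major version in the upper 16 bits, minor version in the lower 16.
 * (Hypothetical helper, shown for illustration only.)
 */
static inline uint32_t xen_version_code(uint16_t major, uint16_t minor)
{
	return ((uint32_t)major << 16) | minor;
}
```

For example, presenting Xen 4.17 to the guest corresponds to ``xen_version_code(4, 17)``, which is 0x00040011.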
24e7475f | 5359 | 4.127 KVM_XEN_HVM_GET_ATTR |
e1f68169 DW |
5360 | -------------------------- |
5361 | ||
5362 | :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO | |
5363 | :Architectures: x86 | |
5364 | :Type: vm ioctl | |
5365 | :Parameters: struct kvm_xen_hvm_attr | |
5366 | :Returns: 0 on success, < 0 on error | |
5367 | ||
5368 | Allows Xen VM attributes to be read. For the structure and types, | |
661a20fa DW |
5369 | see KVM_XEN_HVM_SET_ATTR above. The KVM_XEN_ATTR_TYPE_EVTCHN |
5370 | attribute cannot be read. | |
e1f68169 | 5371 | |
24e7475f | 5372 | 4.128 KVM_XEN_VCPU_SET_ATTR |
e1f68169 DW |
5373 | --------------------------- |
5374 | ||
5375 | :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO | |
5376 | :Architectures: x86 | |
5377 | :Type: vcpu ioctl | |
5378 | :Parameters: struct kvm_xen_vcpu_attr | |
5379 | :Returns: 0 on success, < 0 on error | |
5380 | ||
5381 | :: | |
5382 | ||
5383 | struct kvm_xen_vcpu_attr { | |
5384 | __u16 type; | |
5385 | __u16 pad[3]; | |
5386 | union { | |
5387 | __u64 gpa; | |
5388 | __u64 pad[4]; | |
30b5c851 DW |
5389 | struct { |
5390 | __u64 state; | |
5391 | __u64 state_entry_time; | |
5392 | __u64 time_running; | |
5393 | __u64 time_runnable; | |
5394 | __u64 time_blocked; | |
5395 | __u64 time_offline; | |
5396 | } runstate; | |
661a20fa DW |
5397 | __u32 vcpu_id; |
5398 | struct { | |
5399 | __u32 port; | |
5400 | __u32 priority; | |
5401 | __u64 expires_ns; | |
5402 | } timer; | |
5403 | __u8 vector; | |
e1f68169 DW |
5404 | } u; |
5405 | }; | |
5406 | ||
5407 | type values: | |
5408 | ||
5409 | KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO | |
5410 | Sets the guest physical address of the vcpu_info for a given vCPU. | |
cf1d88b3 DW |
5411 | As with the shared_info page for the VM, the corresponding page may be |
5412 | dirtied at any time if event channel interrupt delivery is enabled, so | |
5413 | userspace should always assume that the page is dirty without relying | |
5414 | on dirty logging. | |
e1f68169 DW |
5415 | |
5416 | KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO | |
5417 | Sets the guest physical address of an additional pvclock structure | |
5418 | for a given vCPU. This is typically used for guest vsyscall support. | |
5419 | ||
30b5c851 DW |
5420 | KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR |
5421 | Sets the guest physical address of the vcpu_runstate_info for a given | |
5422 | vCPU. This is how a Xen guest tracks CPU state such as steal time. | |
5423 | ||
5424 | KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT | |
5425 | Sets the runstate (RUNSTATE_running/_runnable/_blocked/_offline) of | |
5426 | the given vCPU from the .u.runstate.state member of the structure. | |
5427 | KVM automatically accounts running and runnable time but blocked | |
5428 | and offline states are only entered explicitly. | |
5429 | ||
5430 | KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA | |
5431 | Sets all fields of the vCPU runstate data from the .u.runstate member | |
5432 | of the structure, including the current runstate. The state_entry_time | |
5433 | must equal the sum of the other four times. | |
5434 | ||
5435 | KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST | |
5436 | This *adds* the contents of the .u.runstate members of the structure | |
5437 | to the corresponding members of the given vCPU's runstate data, thus | |
5438 | permitting atomic adjustments to the runstate times. The adjustment | |
5439 | to the state_entry_time must equal the sum of the adjustments to the | |
5440 | other four times. The state field must be set to -1, or to a valid | |
5441 | runstate value (RUNSTATE_running, RUNSTATE_runnable, RUNSTATE_blocked | |
5442 | or RUNSTATE_offline) to set the current accounted state as of the | |
5443 | adjusted state_entry_time. | |
5444 | ||
661a20fa DW |
5445 | KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID |
5446 | This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates | |
5447 | support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the Xen | |
5448 | vCPU ID of the given vCPU, to allow timer-related VCPU operations to | |
5449 | be intercepted by KVM. | |
5450 | ||
5451 | KVM_XEN_VCPU_ATTR_TYPE_TIMER | |
5452 | This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates | |
5453 | support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the | |
5454 | event channel port/priority for the VIRQ_TIMER of the vCPU, as well | |
5455 | as allowing a pending timer to be saved/restored. | |
5456 | ||
5457 | KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR | |
5458 | This attribute is available when the KVM_CAP_XEN_HVM ioctl indicates | |
5459 | support for KVM_XEN_HVM_CONFIG_EVTCHN_SEND features. It sets the | |
5460 | per-vCPU local APIC upcall vector, configured by a Xen guest with | |
5461 | the HVMOP_set_evtchn_upcall_vector hypercall. This is typically | |
5462 | used by Windows guests, and is distinct from the HVM-wide upcall | |
5463 | vector configured with HVM_PARAM_CALLBACK_IRQ. | |
5464 | ||
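For KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA, the requirement that state_entry_time equals the sum of the four accounted times can be verified before the attribute is set, for example when restoring saved state. The sketch below assumes a hypothetical ``struct runstate_model`` that mirrors the layout of the ``runstate`` member shown above; it is not a kernel structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the runstate member of struct kvm_xen_vcpu_attr. */
struct runstate_model {
	uint64_t state;
	uint64_t state_entry_time;
	uint64_t time_running;
	uint64_t time_runnable;
	uint64_t time_blocked;
	uint64_t time_offline;
};

/* True if state_entry_time equals the sum of the other four times. */
static inline bool runstate_data_consistent(const struct runstate_model *rs)
{
	return rs->state_entry_time ==
	       rs->time_running + rs->time_runnable +
	       rs->time_blocked + rs->time_offline;
}
```

A save/restore path could run this check on the values it read back and refuse to issue the ioctl when the times do not add up.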
5465 | ||
24e7475f | 5466 | 4.129 KVM_XEN_VCPU_GET_ATTR |
9294b8a1 | 5467 | --------------------------- |
e1f68169 DW |
5468 | |
5469 | :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO | |
5470 | :Architectures: x86 | |
5471 | :Type: vcpu ioctl | |
5472 | :Parameters: struct kvm_xen_vcpu_attr | |
5473 | :Returns: 0 on success, < 0 on error | |
5474 | ||
5475 | Allows Xen vCPU attributes to be read. For the structure and types, | |
5476 | see KVM_XEN_VCPU_SET_ATTR above. | |
04ed89dc | 5477 | |
30b5c851 DW |
5478 | The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used |
5479 | with the KVM_XEN_VCPU_GET_ATTR ioctl. | |
5480 | ||
04c02c20 SP |
5481 | 4.130 KVM_ARM_MTE_COPY_TAGS |
5482 | --------------------------- | |
5483 | ||
5484 | :Capability: KVM_CAP_ARM_MTE | |
5485 | :Architectures: arm64 | |
5486 | :Type: vm ioctl | |
5487 | :Parameters: struct kvm_arm_copy_mte_tags | |
5488 | :Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect | |
5489 | arguments, -EFAULT if memory cannot be accessed). | |
5490 | ||
5491 | :: | |
5492 | ||
5493 | struct kvm_arm_copy_mte_tags { | |
5494 | __u64 guest_ipa; | |
5495 | __u64 length; | |
5496 | void __user *addr; | |
5497 | __u64 flags; | |
5498 | __u64 reserved[2]; | |
5499 | }; | |
5500 | ||
5501 | Copies Memory Tagging Extension (MTE) tags to/from guest tag memory. The | |
5502 | ``guest_ipa`` and ``length`` fields must be ``PAGE_SIZE`` aligned. The ``addr`` | |
5503 | field must point to a buffer which the tags will be copied to or from. | |
5504 | ||
5505 | ``flags`` specifies the direction of copy, either ``KVM_ARM_TAGS_TO_GUEST`` or | |
5506 | ``KVM_ARM_TAGS_FROM_GUEST``. | |
5507 | ||
5508 | The size of the buffer to store the tags is ``(length / 16)`` bytes | |
5509 | (granules in MTE are 16 bytes long). Each byte contains a single tag | |
5510 | value. This matches the format of ``PTRACE_PEEKMTETAGS`` and | |
5511 | ``PTRACE_POKEMTETAGS``. | |
5512 | ||
5513 | If an error occurs before any data is copied then a negative error code is | |
5514 | returned. If some tags have been copied before an error occurs then the number | |
5515 | of bytes successfully copied is returned. If the call completes successfully | |
5516 | then ``length`` is returned. | |
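
The alignment and buffer-size rules above can be sketched as a small helper. This is an illustration only: ``mte_tag_buffer_size`` and its ``page_size`` parameter are hypothetical names, not part of the KVM API, and real code would typically obtain the page size from ``sysconf(_SC_PAGESIZE)``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MTE granule size in bytes, as stated above: one tag byte per granule. */
#define MTE_GRANULE_SIZE 16u

/*
 * Returns the tag buffer size in bytes needed to copy the tags for
 * `length` bytes of guest memory, or 0 if `guest_ipa` or `length` is
 * not aligned to `page_size`.
 * Hypothetical helper: the name and the page_size parameter are
 * assumptions made for this sketch.
 */
static size_t mte_tag_buffer_size(uint64_t guest_ipa, uint64_t length,
                                  uint64_t page_size)
{
    /* page_size must be a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if ((guest_ipa | length) & (page_size - 1))
        return 0;

    return (size_t)(length / MTE_GRANULE_SIZE);
}
```

With 4 KiB pages, one page of guest memory needs a 256-byte tag buffer.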
6dba9403 ML |
5517 | |
5518 | 4.131 KVM_GET_SREGS2 | |
8b967164 | 5519 | -------------------- |
6dba9403 ML |
5520 | |
5521 | :Capability: KVM_CAP_SREGS2 | |
5522 | :Architectures: x86 | |
5523 | :Type: vcpu ioctl | |
5524 | :Parameters: struct kvm_sregs2 (out) | |
5525 | :Returns: 0 on success, -1 on error | |
5526 | ||
5527 | Reads special registers from the vcpu. | |
5528 | This ioctl (when supported) replaces KVM_GET_SREGS. | |
5529 | ||
5530 | :: | |
5531 | ||
8b967164 IC |
5532 | struct kvm_sregs2 { |
5533 | /* out (KVM_GET_SREGS2) / in (KVM_SET_SREGS2) */ | |
5534 | struct kvm_segment cs, ds, es, fs, gs, ss; | |
5535 | struct kvm_segment tr, ldt; | |
5536 | struct kvm_dtable gdt, idt; | |
5537 | __u64 cr0, cr2, cr3, cr4, cr8; | |
5538 | __u64 efer; | |
5539 | __u64 apic_base; | |
5540 | __u64 flags; | |
5541 | __u64 pdptrs[4]; | |
5542 | }; | |
6dba9403 ML |
5543 | |
5544 | flags values for ``kvm_sregs2``: | |
5545 | ||
5546 | ``KVM_SREGS2_FLAGS_PDPTRS_VALID`` | |
5547 | ||
5548 | Indicates that the struct contains valid PDPTR values. | |
5549 | ||
5550 | ||
5551 | 4.132 KVM_SET_SREGS2 | |
8b967164 | 5552 | -------------------- |
6dba9403 ML |
5553 | |
5554 | :Capability: KVM_CAP_SREGS2 | |
5555 | :Architectures: x86 | |
5556 | :Type: vcpu ioctl | |
5557 | :Parameters: struct kvm_sregs2 (in) | |
5558 | :Returns: 0 on success, -1 on error | |
5559 | ||
5560 | Writes special registers into the vcpu. | |
5561 | See KVM_GET_SREGS2 for the data structures. | |
5562 | This ioctl (when supported) replaces KVM_SET_SREGS. | |
5563 | ||
fdc09ddd JZ |
5564 | 4.133 KVM_GET_STATS_FD |
5565 | ---------------------- | |
5566 | ||
5567 | :Capability: KVM_CAP_STATS_BINARY_FD | |
5568 | :Architectures: all | |
5569 | :Type: vm ioctl, vcpu ioctl | |
5570 | :Parameters: none | |
5571 | :Returns: statistics file descriptor on success, < 0 on error | |
5572 | ||
5573 | Errors: | |
5574 | ||
5575 | ====== ====================================================== | |
5576 | ENOMEM if the fd could not be created due to lack of memory | |
5577 | EMFILE if the number of opened files exceeds the limit | |
5578 | ====== ====================================================== | |
5579 | ||
5580 | The returned file descriptor can be used to read VM/vCPU statistics data in | |
5581 | binary format. The data in the file descriptor consists of four blocks | |
5582 | organized as follows: | |
5583 | ||
5584 | +-------------+ | |
5585 | | Header | | |
5586 | +-------------+ | |
5587 | | id string | | |
5588 | +-------------+ | |
5589 | | Descriptors | | |
5590 | +-------------+ | |
5591 | | Stats Data | | |
5592 | +-------------+ | |
5593 | ||
5594 | Apart from the header starting at offset 0, please be aware that it is | |
5595 | not guaranteed that the four blocks are adjacent or in the above order; | |
5596 | the offsets of the id, descriptors and data blocks are found in the | |
5597 | header. However, all four blocks are aligned to 64 bit offsets in the | |
5598 | file and they do not overlap. | |
5599 | ||
5600 | All blocks except the data block are immutable. Userspace needs to read them | |
5601 | only once after retrieving the file descriptor, and can then use ``pread`` or | |
5602 | ``lseek`` to read the statistics repeatedly. | |
5603 | ||
5604 | All data is in system endianness. | |
5605 | ||
5606 | The format of the header is as follows:: | |
5607 | ||
5608 | struct kvm_stats_header { | |
5609 | __u32 flags; | |
5610 | __u32 name_size; | |
5611 | __u32 num_desc; | |
5612 | __u32 id_offset; | |
5613 | __u32 desc_offset; | |
5614 | __u32 data_offset; | |
5615 | }; | |
5616 | ||
5617 | The ``flags`` field is not used at the moment. It is always read as 0. | |
5618 | ||
5619 | The ``name_size`` field is the size (in bytes) of the statistics name string | |
5620 | (including trailing '\0') which is contained in the "id string" block and | |
5621 | appended at the end of every descriptor. | |
5622 | ||
5623 | The ``num_desc`` field is the number of descriptors that are included in the | |
5624 | descriptor block. (The actual number of values in the data block may be | |
5625 | larger, since each descriptor may comprise more than one value). | |
5626 | ||
5627 | The ``id_offset`` field is the offset of the id string from the start of the | |
5628 | file indicated by the file descriptor. It is a multiple of 8. | |
5629 | ||
5630 | The ``desc_offset`` field is the offset of the Descriptors block from the start | |
5631 | of the file indicated by the file descriptor. It is a multiple of 8. | |
5632 | ||
5633 | The ``data_offset`` field is the offset of the Stats Data block from the start | |
5634 | of the file indicated by the file descriptor. It is a multiple of 8. | |
5635 | ||
5636 | The id string block contains a string which identifies the file descriptor on | |
5637 | which KVM_GET_STATS_FD was invoked. The size of the block, including the | |
5638 | trailing ``'\0'``, is indicated by the ``name_size`` field in the header. | |
5639 | ||
5640 | The descriptors block only needs to be read once for the lifetime of the | |
5641 | file descriptor. It contains a sequence of ``struct kvm_stats_desc``, each | |
5642 | followed by a string of size ``name_size``. | |
a9fd134b | 5643 | :: |
fdc09ddd JZ |
5644 | |
5645 | #define KVM_STATS_TYPE_SHIFT 0 | |
5646 | #define KVM_STATS_TYPE_MASK (0xF << KVM_STATS_TYPE_SHIFT) | |
5647 | #define KVM_STATS_TYPE_CUMULATIVE (0x0 << KVM_STATS_TYPE_SHIFT) | |
5648 | #define KVM_STATS_TYPE_INSTANT (0x1 << KVM_STATS_TYPE_SHIFT) | |
5649 | #define KVM_STATS_TYPE_PEAK (0x2 << KVM_STATS_TYPE_SHIFT) | |
0176ec51 JZ |
5650 | #define KVM_STATS_TYPE_LINEAR_HIST (0x3 << KVM_STATS_TYPE_SHIFT) |
5651 | #define KVM_STATS_TYPE_LOG_HIST (0x4 << KVM_STATS_TYPE_SHIFT) | |
5652 | #define KVM_STATS_TYPE_MAX KVM_STATS_TYPE_LOG_HIST | |
fdc09ddd JZ |
5653 | |
5654 | #define KVM_STATS_UNIT_SHIFT 4 | |
5655 | #define KVM_STATS_UNIT_MASK (0xF << KVM_STATS_UNIT_SHIFT) | |
5656 | #define KVM_STATS_UNIT_NONE (0x0 << KVM_STATS_UNIT_SHIFT) | |
5657 | #define KVM_STATS_UNIT_BYTES (0x1 << KVM_STATS_UNIT_SHIFT) | |
5658 | #define KVM_STATS_UNIT_SECONDS (0x2 << KVM_STATS_UNIT_SHIFT) | |
5659 | #define KVM_STATS_UNIT_CYCLES (0x3 << KVM_STATS_UNIT_SHIFT) | |
1b870fa5 | 5660 | #define KVM_STATS_UNIT_BOOLEAN (0x4 << KVM_STATS_UNIT_SHIFT) |
450a5639 | 5661 | #define KVM_STATS_UNIT_MAX KVM_STATS_UNIT_BOOLEAN |
fdc09ddd JZ |
5662 | |
5663 | #define KVM_STATS_BASE_SHIFT 8 | |
5664 | #define KVM_STATS_BASE_MASK (0xF << KVM_STATS_BASE_SHIFT) | |
5665 | #define KVM_STATS_BASE_POW10 (0x0 << KVM_STATS_BASE_SHIFT) | |
5666 | #define KVM_STATS_BASE_POW2 (0x1 << KVM_STATS_BASE_SHIFT) | |
0176ec51 | 5667 | #define KVM_STATS_BASE_MAX KVM_STATS_BASE_POW2 |
fdc09ddd JZ |
5668 | |
5669 | struct kvm_stats_desc { | |
5670 | __u32 flags; | |
5671 | __s16 exponent; | |
5672 | __u16 size; | |
5673 | __u32 offset; | |
0176ec51 | 5674 | __u32 bucket_size; |
fdc09ddd JZ |
5675 | char name[]; |
5676 | }; | |
5677 | ||
5678 | The ``flags`` field contains the type and unit of the statistics data described | |
5679 | by this descriptor. Its endianness is CPU native. | |
5680 | The following flags are supported: | |
5681 | ||
5682 | Bits 0-3 of ``flags`` encode the type: | |
a9fd134b | 5683 | |
fdc09ddd | 5684 | * ``KVM_STATS_TYPE_CUMULATIVE`` |
0176ec51 | 5685 | The statistic reports a cumulative count. Its value can only increase. |
fdc09ddd JZ |
5686 | Most of the counters used in KVM are of this type. |
5687 | The corresponding ``size`` field for this type is always 1. | |
5688 | All cumulative statistics data are read/write. | |
5689 | * ``KVM_STATS_TYPE_INSTANT`` | |
0176ec51 | 5690 | The statistic reports an instantaneous value. Its value can be increased or |
fdc09ddd JZ |
5691 | decreased. This type is usually used as a measurement of some resources, |
5692 | like the number of dirty pages, the number of large pages, etc. | |
5693 | All instant statistics are read only. | |
5694 | The corresponding ``size`` field for this type is always 1. | |
5695 | * ``KVM_STATS_TYPE_PEAK`` | |
0176ec51 | 5696 | The statistics data reports a peak value, for example the maximum number |
fdc09ddd | 5697 | of items in a hash table bucket, the longest time waited and so on. |
0176ec51 | 5698 | The value of data can only be increased. |
fdc09ddd | 5699 | The corresponding ``size`` field for this type is always 1. |
0176ec51 JZ |
5700 | * ``KVM_STATS_TYPE_LINEAR_HIST`` |
5701 | The statistic is reported as a linear histogram. The number of | |
5702 | buckets is specified by the ``size`` field. The size of buckets is specified | |
5703 | by the ``bucket_size`` field. The range of the Nth bucket (1 <= N < ``size``) | |
5704 | is [``bucket_size``*(N-1), ``bucket_size``*N), while the range of the last | |
5705 | bucket is [``bucket_size``*(``size``-1), +INF). (+INF means positive infinity | |
942d9e89 | 5706 | value.) |
0176ec51 JZ |
5707 | * ``KVM_STATS_TYPE_LOG_HIST`` |
5708 | The statistic is reported as a logarithmic histogram. The number of | |
5709 | buckets is specified by the ``size`` field. The range of the first bucket is | |
5710 | [0, 1), while the range of the last bucket is [pow(2, ``size``-2), +INF). | |
5711 | Otherwise, the Nth bucket (1 < N < ``size``) covers | |
942d9e89 | 5712 | [pow(2, N-2), pow(2, N-1)). |
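
The bucket ranges of the two histogram types can be computed as in the following sketch. ``struct hist_range``, ``linear_bucket`` and ``log_bucket`` are hypothetical names introduced here for illustration; the only inputs taken from the descriptor are its ``size`` and ``bucket_size`` members. Buckets are numbered from 1, as in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open bucket range [lo, hi). Hypothetical type for this sketch. */
struct hist_range {
    uint64_t lo;
    uint64_t hi;     /* meaningful only when unbounded == 0 */
    int unbounded;   /* non-zero for the last bucket: [lo, +INF) */
};

/* Range of the Nth bucket (1-based) of a linear histogram. */
static struct hist_range linear_bucket(uint32_t n, uint16_t size,
                                       uint32_t bucket_size)
{
    struct hist_range r;

    assert(n >= 1 && n <= size);
    r.lo = (uint64_t)bucket_size * (n - 1);
    r.hi = (uint64_t)bucket_size * n;
    r.unbounded = (n == size);
    return r;
}

/* Range of the Nth bucket (1-based) of a logarithmic histogram. */
static struct hist_range log_bucket(uint32_t n, uint16_t size)
{
    struct hist_range r;

    /* size <= 64 keeps every shift below the width of uint64_t. */
    assert(n >= 1 && n <= size && size <= 64);
    r.lo = (n == 1) ? 0 : (uint64_t)1 << (n - 2);
    r.hi = (n == 1) ? 1 : (uint64_t)1 << (n - 1);
    r.unbounded = (n == size);
    return r;
}
```

For example, in a logarithmic histogram with 8 buckets, bucket 3 covers [2, 4) and bucket 8 covers [64, +INF).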
fdc09ddd JZ |
5713 | |
5714 | Bits 4-7 of ``flags`` encode the unit: | |
a9fd134b | 5715 | |
fdc09ddd JZ |
5716 | * ``KVM_STATS_UNIT_NONE`` |
5717 | There is no unit for the value of statistics data. This usually means that | |
5718 | the value is a simple counter of an event. | |
5719 | * ``KVM_STATS_UNIT_BYTES`` | |
5720 | It indicates that the statistics data is used to measure memory size, in the | |
5721 | unit of Byte, KiByte, MiByte, GiByte, etc. The unit of the data is | |
5722 | determined by the ``exponent`` field in the descriptor. | |
5723 | * ``KVM_STATS_UNIT_SECONDS`` | |
5724 | It indicates that the statistics data is used to measure time or latency. | |
5725 | * ``KVM_STATS_UNIT_CYCLES`` | |
5726 | It indicates that the statistics data is used to measure CPU clock cycles. | |
1b870fa5 PB |
5727 | * ``KVM_STATS_UNIT_BOOLEAN`` |
5728 | It indicates that the statistic will always be either 0 or 1. Boolean | |
5729 | statistics of "peak" type will never go back from 1 to 0. Boolean | |
5730 | statistics can be linear histograms (with two buckets) but not logarithmic | |
5731 | histograms. | |
fdc09ddd | 5732 | |
942d9e89 PB |
5733 | Note that, in the case of histograms, the unit applies to the bucket |
5734 | ranges, while the bucket value indicates how many samples fell in the | |
5735 | bucket's range. | |
5736 | ||
fdc09ddd JZ |
5737 | Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the |
5738 | unit: | |
a9fd134b | 5739 | |
fdc09ddd JZ |
5740 | * ``KVM_STATS_BASE_POW10`` |
5741 | The scale is based on power of 10. It is used for measurement of time and | |
5742 | CPU clock cycles. For example, an exponent of -9 can be used with | |
5743 | ``KVM_STATS_UNIT_SECONDS`` to express that the unit is nanoseconds. | |
5744 | * ``KVM_STATS_BASE_POW2`` | |
5745 | The scale is based on power of 2. It is used for measurement of memory size. | |
5746 | For example, an exponent of 20 can be used with ``KVM_STATS_UNIT_BYTES`` to | |
5747 | express that the unit is MiB. | |
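
As an illustration of how the three bit fields of ``flags`` combine with ``exponent``, the sketch below decodes a descriptor's type and unit and computes the scale factor of its unit. The macro values are copied from the definitions above; the helper names ``stats_type``, ``stats_unit`` and ``stats_scale`` are hypothetical and not part of the KVM headers.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the definitions earlier in this section. */
#define KVM_STATS_TYPE_SHIFT    0
#define KVM_STATS_TYPE_MASK     (0xF << KVM_STATS_TYPE_SHIFT)
#define KVM_STATS_UNIT_SHIFT    4
#define KVM_STATS_UNIT_MASK     (0xF << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_BYTES    (0x1 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_UNIT_SECONDS  (0x2 << KVM_STATS_UNIT_SHIFT)
#define KVM_STATS_BASE_SHIFT    8
#define KVM_STATS_BASE_MASK     (0xF << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW10    (0x0 << KVM_STATS_BASE_SHIFT)
#define KVM_STATS_BASE_POW2     (0x1 << KVM_STATS_BASE_SHIFT)

/* Bits 0-3: statistic type (still shifted, comparable to KVM_STATS_TYPE_*). */
static uint32_t stats_type(uint32_t flags)
{
    return flags & KVM_STATS_TYPE_MASK;
}

/* Bits 4-7: unit (still shifted, comparable to KVM_STATS_UNIT_*). */
static uint32_t stats_unit(uint32_t flags)
{
    return flags & KVM_STATS_UNIT_MASK;
}

/*
 * Scale factor of the unit: base raised to `exponent`.
 * Returns 0.0 for a base this sketch does not know about.
 */
static double stats_scale(uint32_t flags, int16_t exponent)
{
    double base, scale = 1.0;
    int i, n = exponent < 0 ? -exponent : exponent;

    switch (flags & KVM_STATS_BASE_MASK) {
    case KVM_STATS_BASE_POW10:
        base = 10.0;
        break;
    case KVM_STATS_BASE_POW2:
        base = 2.0;
        break;
    default:
        return 0.0;
    }

    for (i = 0; i < n; i++)
        scale *= base;

    return exponent < 0 ? 1.0 / scale : scale;
}
```

A raw value multiplied by the result of ``stats_scale()`` is expressed in the base unit (bytes, seconds or cycles).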
5748 | ||
5749 | The ``size`` field is the number of values of this statistics data. Its | |
5750 | value is 1 for most simple statistics, meaning that the data consists of a | |
5751 | single unsigned 64-bit value. | |
5752 | ||
5753 | The ``offset`` field is the offset from the start of Data Block to the start of | |
5754 | the corresponding statistics data. | |
5755 | ||
0176ec51 JZ |
5756 | The ``bucket_size`` field is used as a parameter for histogram statistics data. |
5757 | It is only used by linear histogram statistics data, specifying the size of a | |
942d9e89 | 5758 | bucket in the unit expressed by bits 4-11 of ``flags`` together with ``exponent``. |
fdc09ddd JZ |
5759 | |
5760 | The ``name`` field is the name string of the statistics data. The name string | |
5761 | starts at the end of ``struct kvm_stats_desc``. The maximum length, including | |
5762 | the trailing ``'\0'``, is indicated by ``name_size`` in the header. | |
5763 | ||
5764 | The Stats Data block contains an array of 64-bit values in the same order | |
5765 | as the descriptors in Descriptors block. | |
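
Putting the header, descriptor and data layouts together, the file offsets that userspace would pass to ``pread`` can be derived as sketched below. The structures mirror ``struct kvm_stats_header`` and ``struct kvm_stats_desc`` with fixed-width types under different names, so that the example compiles without the kernel headers; the structure and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors struct kvm_stats_header as documented above. */
struct stats_header {
    uint32_t flags;
    uint32_t name_size;
    uint32_t num_desc;
    uint32_t id_offset;
    uint32_t desc_offset;
    uint32_t data_offset;
};

/* Mirrors struct kvm_stats_desc as documented above. */
struct stats_desc {
    uint32_t flags;
    int16_t  exponent;
    uint16_t size;
    uint32_t offset;
    uint32_t bucket_size;
    char     name[];
};

/*
 * File offset of the i-th descriptor. Each descriptor is followed by
 * name_size bytes holding its name.
 */
static uint64_t desc_file_offset(const struct stats_header *h, uint32_t i)
{
    assert(i < h->num_desc);
    return h->desc_offset +
           (uint64_t)i * (sizeof(struct stats_desc) + h->name_size);
}

/* File offset of the first value described by d. */
static uint64_t stat_file_offset(const struct stats_header *h,
                                 const struct stats_desc *d)
{
    return (uint64_t)h->data_offset + d->offset;
}

/* Number of bytes occupied by the values described by d. */
static size_t stat_data_len(const struct stats_desc *d)
{
    return (size_t)d->size * sizeof(uint64_t);
}
```

Because only the data block changes, the descriptor offsets can be computed once and cached; repeated reads then need a single ``pread`` per statistic at ``stat_file_offset()``.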
6dba9403 | 5766 | |
e2e83a73 WW |
5767 | 4.134 KVM_GET_XSAVE2 |
5768 | -------------------- | |
be50b206 GZ |
5769 | |
5770 | :Capability: KVM_CAP_XSAVE2 | |
5771 | :Architectures: x86 | |
5772 | :Type: vcpu ioctl | |
5773 | :Parameters: struct kvm_xsave (out) | |
5774 | :Returns: 0 on success, -1 on error | |
5775 | ||
5776 | ||
5777 | :: | |
5778 | ||
5779 | struct kvm_xsave { | |
5780 | __u32 region[1024]; | |
5781 | __u32 extra[0]; | |
5782 | }; | |
5783 | ||
5784 | This ioctl copies the current vcpu's xsave struct to userspace. It | |
5785 | copies as many bytes as are returned by KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) | |
5786 | when invoked on the vm file descriptor. The size value returned by | |
5787 | KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2) will always be at least 4096. | |
5788 | Currently, it is only greater than 4096 if a dynamic feature has been | |
5789 | enabled with ``arch_prctl()``, but this may change in the future. | |
5790 | ||
5791 | The offsets of the state save areas in struct kvm_xsave follow the contents | |
5792 | of CPUID leaf 0xD on the host. | |
5793 | ||
661a20fa DW |
5794 | 4.135 KVM_XEN_HVM_EVTCHN_SEND |
5795 | ----------------------------- | |
5796 | ||
5797 | :Capability: KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_EVTCHN_SEND | |
5798 | :Architectures: x86 | |
5799 | :Type: vm ioctl | |
5800 | :Parameters: struct kvm_irq_routing_xen_evtchn | |
5801 | :Returns: 0 on success, < 0 on error | |
5802 | ||
5803 | ||
5804 | :: | |
5805 | ||
5806 | struct kvm_irq_routing_xen_evtchn { | |
5807 | __u32 port; | |
5808 | __u32 vcpu; | |
5809 | __u32 priority; | |
5810 | }; | |
5811 | ||
5812 | This ioctl injects an event channel interrupt directly to the guest vCPU. | |
be50b206 | 5813 | |
9c1b96e3 | 5814 | 5. The kvm_run structure |
106ee47d | 5815 | ======================== |
9c1b96e3 AK |
5816 | |
5817 | Application code obtains a pointer to the kvm_run structure by | |
5818 | mmap()ing a vcpu fd. From that point, application code can control | |
5819 | execution by changing fields in kvm_run prior to calling the KVM_RUN | |
5820 | ioctl, and obtain information about the reason KVM_RUN returned by | |
5821 | looking up structure members. | |
5822 | ||
106ee47d MCC |
5823 | :: |
5824 | ||
5825 | struct kvm_run { | |
9c1b96e3 AK |
5826 | /* in */ |
5827 | __u8 request_interrupt_window; | |
5828 | ||
5829 | Request that KVM_RUN return when it becomes possible to inject external | |
5830 | interrupts into the guest. Useful in conjunction with KVM_INTERRUPT. | |
5831 | ||
106ee47d MCC |
5832 | :: |
5833 | ||
460df4c1 PB |
5834 | __u8 immediate_exit; |
5835 | ||
5836 | This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN | |
5837 | exits immediately, returning -EINTR. In the common scenario where a | |
5838 | signal is used to "kick" a VCPU out of KVM_RUN, this field can be used | |
5839 | to avoid usage of KVM_SET_SIGNAL_MASK, which has worse scalability. | |
5840 | Rather than blocking the signal outside KVM_RUN, userspace can set up | |
5841 | a signal handler that sets run->immediate_exit to a non-zero value. | |
5842 | ||
5843 | This field is ignored if KVM_CAP_IMMEDIATE_EXIT is not available. | |
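
The signal-handler approach described above might look roughly like the following sketch. ``struct run_prefix`` is a hypothetical stand-in for the first two fields of ``struct kvm_run``, and ``kick_target``, ``kick_handler`` and ``install_kick_handler`` are names invented for this example. A single global pointer is used for brevity; a VMM with several vcpu threads would need one such pointer per thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the leading fields of struct kvm_run. */
struct run_prefix {
    uint8_t request_interrupt_window;
    volatile uint8_t immediate_exit;
};

/* Points at the vcpu's mmap()ed run area; set before the signal can fire. */
static struct run_prefix *volatile kick_target;

/* Signal handler: ask the next (or current) KVM_RUN to return at once. */
static void kick_handler(int signo)
{
    (void)signo;
    if (kick_target)
        kick_target->immediate_exit = 1;
}

/* Installs kick_handler for signo. Returns 0 on success, -1 on error. */
static int install_kick_handler(int signo, struct run_prefix *run)
{
    struct sigaction sa = { 0 };

    kick_target = run;
    sa.sa_handler = kick_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

In this scheme the vcpu thread would clear ``immediate_exit`` again once it has handled the request that prompted the kick, before calling KVM_RUN the next time.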
5844 | ||
106ee47d MCC |
5845 | :: |
5846 | ||
460df4c1 | 5847 | __u8 padding1[6]; |
9c1b96e3 AK |
5848 | |
5849 | /* out */ | |
5850 | __u32 exit_reason; | |
5851 | ||
5852 | When KVM_RUN has returned successfully (return value 0), this informs | |
5853 | application code why KVM_RUN has returned. Allowable values for this | |
5854 | field are detailed below. | |
5855 | ||
106ee47d MCC |
5856 | :: |
5857 | ||
9c1b96e3 AK |
5858 | __u8 ready_for_interrupt_injection; |
5859 | ||
5860 | If request_interrupt_window has been specified, this field indicates | |
5861 | an interrupt can be injected now with KVM_INTERRUPT. | |
5862 | ||
106ee47d MCC |
5863 | :: |
5864 | ||
9c1b96e3 AK |
5865 | __u8 if_flag; |
5866 | ||
5867 | The value of the current interrupt flag. Only valid if in-kernel | |
5868 | local APIC is not used. | |
5869 | ||
106ee47d MCC |
5870 | :: |
5871 | ||
f077825a PB |
5872 | __u16 flags; |
5873 | ||
5874 | More architecture-specific flags detailing state of the VCPU that may | |
96564d77 CQ |
5875 | affect the device's behavior. Currently defined flags:: | |
5876 | ||
c32b1b89 CQ |
5877 | /* x86, set if the VCPU is in system management mode */ |
5878 | #define KVM_RUN_X86_SMM (1 << 0) | |
5879 | /* x86, set if bus lock detected in VM */ | |
5880 | #define KVM_RUN_BUS_LOCK (1 << 1) | |
18f3976f AE |
5881 | /* arm64, set for KVM_EXIT_DEBUG */ |
5882 | #define KVM_DEBUG_ARCH_HSR_HIGH_VALID (1 << 0) | |
9c1b96e3 | 5883 | |
106ee47d MCC |
5884 | :: |
5885 | ||
9c1b96e3 AK |
5886 | /* in (pre_kvm_run), out (post_kvm_run) */ |
5887 | __u64 cr8; | |
5888 | ||
5889 | The value of the cr8 register. Only valid if in-kernel local APIC is | |
5890 | not used. Both input and output. | |
5891 | ||
106ee47d MCC |
5892 | :: |
5893 | ||
9c1b96e3 AK |
5894 | __u64 apic_base; |
5895 | ||
5896 | The value of the APIC BASE msr. Only valid if in-kernel local | |
5897 | APIC is not used. Both input and output. | |
5898 | ||
106ee47d MCC |
5899 | :: |
5900 | ||
9c1b96e3 AK |
5901 | union { |
5902 | /* KVM_EXIT_UNKNOWN */ | |
5903 | struct { | |
5904 | __u64 hardware_exit_reason; | |
5905 | } hw; | |
5906 | ||
5907 | If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown | |
5908 | reasons. Further architecture-specific information is available in | |
5909 | hardware_exit_reason. | |
5910 | ||
106ee47d MCC |
5911 | :: |
5912 | ||
9c1b96e3 AK |
5913 | /* KVM_EXIT_FAIL_ENTRY */ |
5914 | struct { | |
5915 | __u64 hardware_entry_failure_reason; | |
1aa561b1 | 5916 | __u32 cpu; /* if KVM_LAST_CPU */ |
9c1b96e3 AK |
5917 | } fail_entry; |
5918 | ||
5919 | If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due | |
5920 | to unknown reasons. Further architecture-specific information is | |
5921 | available in hardware_entry_failure_reason. | |
5922 | ||
106ee47d MCC |
5923 | :: |
5924 | ||
9c1b96e3 AK |
5925 | /* KVM_EXIT_EXCEPTION */ |
5926 | struct { | |
5927 | __u32 exception; | |
5928 | __u32 error_code; | |
5929 | } ex; | |
5930 | ||
5931 | Unused. | |
5932 | ||
106ee47d MCC |
5933 | :: |
5934 | ||
9c1b96e3 AK |
5935 | /* KVM_EXIT_IO */ |
5936 | struct { | |
106ee47d MCC |
5937 | #define KVM_EXIT_IO_IN 0 |
5938 | #define KVM_EXIT_IO_OUT 1 | |
9c1b96e3 AK |
5939 | __u8 direction; |
5940 | __u8 size; /* bytes */ | |
5941 | __u16 port; | |
5942 | __u32 count; | |
5943 | __u64 data_offset; /* relative to kvm_run start */ | |
5944 | } io; | |
5945 | ||
2044892d | 5946 | If exit_reason is KVM_EXIT_IO, then the vcpu has |
9c1b96e3 AK |
5947 | executed a port I/O instruction which could not be satisfied by kvm. |
5948 | data_offset describes where the data is located (KVM_EXIT_IO_OUT) or | |
5949 | where kvm expects application code to place the data for the next | |
2044892d | 5950 | KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. |
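
A minimal sketch of locating the port I/O data follows, assuming the packed array holds ``count`` elements of ``size`` bytes each. ``struct io_exit`` mirrors the ``io`` member shown above; ``io_data`` and ``io_data_len`` are hypothetical helper names, and ``run_base`` stands for the address returned by mmap()ing the vcpu fd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the io member of struct kvm_run shown above. */
struct io_exit {
    uint8_t  direction;
    uint8_t  size;          /* bytes per element */
    uint16_t port;
    uint32_t count;         /* number of elements */
    uint64_t data_offset;   /* relative to kvm_run start */
};

/* Start of the packed data, given the base of the kvm_run mapping. */
static uint8_t *io_data(void *run_base, const struct io_exit *io)
{
    return (uint8_t *)run_base + io->data_offset;
}

/* Total number of data bytes (assumption: count elements of size bytes). */
static size_t io_data_len(const struct io_exit *io)
{
    return (size_t)io->size * io->count;
}
```

For KVM_EXIT_IO_OUT userspace reads ``io_data_len()`` bytes from ``io_data()``; for KVM_EXIT_IO_IN it writes that many bytes there before calling KVM_RUN again.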
9c1b96e3 | 5951 | |
106ee47d MCC |
5952 | :: |
5953 | ||
8ab30c15 | 5954 | /* KVM_EXIT_DEBUG */ |
9c1b96e3 AK |
5955 | struct { |
5956 | struct kvm_debug_exit_arch arch; | |
5957 | } debug; | |
5958 | ||
8ab30c15 AB |
5959 | If the exit_reason is KVM_EXIT_DEBUG, then a vcpu is processing a debug event |
5960 | for which architecture specific information is returned. | |
9c1b96e3 | 5961 | |
106ee47d MCC |
5962 | :: |
5963 | ||
9c1b96e3 AK |
5964 | /* KVM_EXIT_MMIO */ |
5965 | struct { | |
5966 | __u64 phys_addr; | |
5967 | __u8 data[8]; | |
5968 | __u32 len; | |
5969 | __u8 is_write; | |
5970 | } mmio; | |
5971 | ||
2044892d | 5972 | If exit_reason is KVM_EXIT_MMIO, then the vcpu has |
9c1b96e3 AK |
5973 | executed a memory-mapped I/O instruction which could not be satisfied |
5974 | by kvm. The 'data' member contains the written data if 'is_write' is | |
5975 | true, and should be filled by application code otherwise. | |
5976 | ||
6acdb160 CD |
5977 | The 'data' member contains, in its first 'len' bytes, the value as it would |
5978 | appear if the VCPU performed a load or store of the appropriate width directly | |
5979 | to the byte array. | |
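
The handling of the ``data`` member can be illustrated with a toy device, as sketched below. ``struct mmio_exit`` mirrors the ``mmio`` member shown above; ``struct toy_device`` and ``handle_mmio`` are hypothetical, and the device ignores ``phys_addr`` and simply backs every access with one 8-byte register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the mmio member of struct kvm_run shown above. */
struct mmio_exit {
    uint64_t phys_addr;
    uint8_t  data[8];
    uint32_t len;
    uint8_t  is_write;
};

/* Hypothetical device model: a single 8-byte register. */
struct toy_device {
    uint8_t reg[8];
};

/* Completes one MMIO exit against the toy device. */
static void handle_mmio(struct toy_device *dev, struct mmio_exit *mmio)
{
    assert(mmio->len <= sizeof(mmio->data));

    if (mmio->is_write)
        memcpy(dev->reg, mmio->data, mmio->len);  /* guest store */
    else
        memcpy(mmio->data, dev->reg, mmio->len);  /* guest load */
}
```

Only the first ``len`` bytes of ``data`` are touched in either direction, matching the description above.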
5980 | ||
106ee47d MCC |
5981 | .. note:: |
5982 | ||
e1f68169 | 5983 | For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, |
1ae09954 AG |
5984 | KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding |
5985 | operations are complete (and guest state is consistent) only after userspace | |
5986 | has re-entered the kernel with KVM_RUN. The kernel side will first finish | |
e1f68169 DW |
5987 | incomplete operations and then check for pending signals. |
5988 | ||
5989 | The pending state of the operation is not preserved in state which is | |
5990 | visible to userspace, thus userspace should ensure that the operation is | |
5991 | completed before performing a live migration. Userspace can re-enter the | |
5992 | guest with an unmasked signal pending or with the immediate_exit field set | |
5993 | to complete pending operations without allowing any further instructions | |
5994 | to be executed. | |
67961344 | 5995 | |
106ee47d MCC |
5996 | :: |
5997 | ||
9c1b96e3 AK |
5998 | /* KVM_EXIT_HYPERCALL */ |
5999 | struct { | |
6000 | __u64 nr; | |
6001 | __u64 args[6]; | |
6002 | __u64 ret; | |
6003 | __u32 longmode; | |
6004 | __u32 pad; | |
6005 | } hypercall; | |
6006 | ||
647dc49e AK |
6007 | Unused. This was once used for 'hypercall to userspace'. To implement |
6008 | such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390). | |
106ee47d MCC |
6009 | |
6010 | .. note:: KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO. | |
6011 | ||
6012 | :: | |
9c1b96e3 AK |
6013 | |
6014 | /* KVM_EXIT_TPR_ACCESS */ | |
6015 | struct { | |
6016 | __u64 rip; | |
6017 | __u32 is_write; | |
6018 | __u32 pad; | |
6019 | } tpr_access; | |
6020 | ||
6021 | To be documented (KVM_TPR_ACCESS_REPORTING). | |
6022 | ||
106ee47d MCC |
6023 | :: |
6024 | ||
9c1b96e3 AK |
6025 | /* KVM_EXIT_S390_SIEIC */ |
6026 | struct { | |
6027 | __u8 icptcode; | |
6028 | __u64 mask; /* psw upper half */ | |
6029 | __u64 addr; /* psw lower half */ | |
6030 | __u16 ipa; | |
6031 | __u32 ipb; | |
6032 | } s390_sieic; | |
6033 | ||
6034 | s390 specific. | |
6035 | ||
106ee47d MCC |
6036 | :: |
6037 | ||
9c1b96e3 | 6038 | /* KVM_EXIT_S390_RESET */ |
106ee47d MCC |
6039 | #define KVM_S390_RESET_POR 1 |
6040 | #define KVM_S390_RESET_CLEAR 2 | |
6041 | #define KVM_S390_RESET_SUBSYSTEM 4 | |
6042 | #define KVM_S390_RESET_CPU_INIT 8 | |
6043 | #define KVM_S390_RESET_IPL 16 | |
9c1b96e3 AK |
6044 | __u64 s390_reset_flags; |
6045 | ||
6046 | s390 specific. | |
6047 | ||
106ee47d MCC |
6048 | :: |
6049 | ||
e168bf8d CO |
6050 | /* KVM_EXIT_S390_UCONTROL */ |
6051 | struct { | |
6052 | __u64 trans_exc_code; | |
6053 | __u32 pgm_code; | |
6054 | } s390_ucontrol; | |
6055 | ||
6056 | s390 specific. A page fault has occurred for a user controlled virtual | |
6057 | machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be | |
6058 | resolved by the kernel. | |
6059 | The program code and the translation exception code that were placed | |
6060 | in the cpu's lowcore are presented here as defined by the z Architecture | |
6061 | Principles of Operation Book in the Chapter for Dynamic Address Translation | |
6062 | (DAT). | |
6063 | ||
106ee47d MCC |
6064 | :: |
6065 | ||
9c1b96e3 AK |
6066 | /* KVM_EXIT_DCR */ |
6067 | struct { | |
6068 | __u32 dcrn; | |
6069 | __u32 data; | |
6070 | __u8 is_write; | |
6071 | } dcr; | |
6072 | ||
ce91ddc4 | 6073 | Deprecated - was used for 440 KVM. |
9c1b96e3 | 6074 | |
106ee47d MCC |
6075 | :: |
6076 | ||
ad0a048b AG |
6077 | /* KVM_EXIT_OSI */ |
6078 | struct { | |
6079 | __u64 gprs[32]; | |
6080 | } osi; | |
6081 | ||
6082 | MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch | |
6083 | hypercalls and exit with this exit struct that contains all the guest gprs. | |
6084 | ||
6085 | If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall. | |
6086 | Userspace can now handle the hypercall and when it's done modify the gprs as | |
6087 | necessary. Upon guest entry all guest GPRs will then be replaced by the values | |
6088 | in this struct. | |
6089 | ||
106ee47d MCC |
6090 | :: |
6091 | ||
de56a948 PM |
6092 | /* KVM_EXIT_PAPR_HCALL */ |
6093 | struct { | |
6094 | __u64 nr; | |
6095 | __u64 ret; | |
6096 | __u64 args[9]; | |
6097 | } papr_hcall; | |
6098 | ||
6099 | This is used on 64-bit PowerPC when emulating a pSeries partition, | |
6100 | e.g. with the 'pseries' machine type in qemu. It occurs when the | |
6101 | guest does a hypercall using the 'sc 1' instruction. The 'nr' field | |
6102 | contains the hypercall number (from the guest R3), and 'args' contains | |
6103 | the arguments (from the guest R4 - R12). Userspace should put the | |
6104 | return code in 'ret' and any extra returned values in args[]. | |
6105 | The possible hypercalls are defined in the Power Architecture Platform | |
6106 | Requirements (PAPR) document available from www.power.org (free | |
6107 | developer registration required to access it). | |
6108 | ||
106ee47d MCC |
6109 | :: |
6110 | ||
fa6b7fe9 CH |
6111 | /* KVM_EXIT_S390_TSCH */ |
6112 | struct { | |
6113 | __u16 subchannel_id; | |
6114 | __u16 subchannel_nr; | |
6115 | __u32 io_int_parm; | |
6116 | __u32 io_int_word; | |
6117 | __u32 ipb; | |
6118 | __u8 dequeued; | |
6119 | } s390_tsch; | |
6120 | ||
6121 | s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled | |
6122 | and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O | |
6123 | interrupt for the target subchannel has been dequeued and subchannel_id, | |
6124 | subchannel_nr, io_int_parm and io_int_word contain the parameters for that | |
6125 | interrupt. ipb is needed for instruction parameter decoding. | |
6126 | ||
106ee47d MCC |
6127 | :: |
6128 | ||
1c810636 AG |
6129 | /* KVM_EXIT_EPR */ |
6130 | struct { | |
6131 | __u32 epr; | |
6132 | } epr; | |
6133 | ||
6134 | On FSL BookE PowerPC chips, the interrupt controller has a fast | |
6135 | interrupt acknowledge path to the core. When the core successfully | |
6136 | delivers an interrupt, it automatically populates the EPR register with | |
6137 | the interrupt vector number and acknowledges the interrupt inside | |
6138 | the interrupt controller. | |
6139 | ||
6140 | In case the interrupt controller lives in user space, we need to do | |
6141 | the interrupt acknowledge cycle through it to fetch the next to be | |
6142 | delivered interrupt vector using this exit. | |
6143 | ||
6144 | It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an | |
6145 | external interrupt has just been delivered into the guest. User space | |
6146 | should put the acknowledged interrupt vector into the 'epr' field. | |
6147 | ||
106ee47d MCC |
6148 | :: |
6149 | ||
8ad6b634 AP |
6150 | /* KVM_EXIT_SYSTEM_EVENT */ |
6151 | struct { | |
106ee47d MCC |
6152 | #define KVM_SYSTEM_EVENT_SHUTDOWN 1 |
6153 | #define KVM_SYSTEM_EVENT_RESET 2 | |
6154 | #define KVM_SYSTEM_EVENT_CRASH 3 | |
7b33a09d | 6155 | #define KVM_SYSTEM_EVENT_WAKEUP 4 |
bfbab445 | 6156 | #define KVM_SYSTEM_EVENT_SUSPEND 5 |
47e8eec8 | 6157 | #define KVM_SYSTEM_EVENT_SEV_TERM 6 |
8ad6b634 | 6158 | __u32 type; |
d495f942 PB |
6159 | __u32 ndata; |
6160 | __u64 data[16]; | |
8ad6b634 AP |
6161 | } system_event; |
6162 | ||
6163 | If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered | |
6164 | a system-level event using some architecture specific mechanism (hypercall | |
3fbf4207 | 6165 | or some special instruction). On arm64, this is triggered by an |
d495f942 | 6166 | HVC-instruction-based PSCI call from the vcpu. |
8ad6b634 | 6167 | |
d495f942 | 6168 | The 'type' field describes the system-level event type. |
cf5d3188 | 6169 | Valid values for 'type' are: |
106ee47d MCC |
6170 | |
6171 | - KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the | |
cf5d3188 CD |
6172 | VM. Userspace is not obliged to honour this, and if it does honour |
6173 | this does not need to destroy the VM synchronously (i.e. it may call | |
6174 | KVM_RUN again before shutdown finally occurs). | |
106ee47d | 6175 | - KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM. |
cf5d3188 CD |
6176 | As with SHUTDOWN, userspace can choose to ignore the request, or |
6177 | to schedule the reset to occur in the future and may call KVM_RUN again. | |
106ee47d | 6178 | - KVM_SYSTEM_EVENT_CRASH -- a guest crash occurred and the guest |
2ce79189 AS |
6179 | has requested crash condition maintenance. Userspace can choose | |
6180 | to ignore the request, or to gather a VM memory core dump and/or | |
6181 | reset or shut down the VM. | |
c24a950e PG |
6182 | - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination. |
6183 | The guest physical address of the guest's GHCB is stored in `data[0]`. | |
7b33a09d OU |
6184 | - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and |
6185 | KVM has recognized a wakeup event. Userspace may honor this event by | |
6186 | marking the exiting vCPU as runnable, or deny it and call KVM_RUN again. | |
bfbab445 OU |
6187 | - KVM_SYSTEM_EVENT_SUSPEND -- the guest has requested a suspension of |
6188 | the VM. | |
cf5d3188 | 6189 | |
d495f942 PB |
6190 | If KVM_CAP_SYSTEM_EVENT_DATA is present, the 'data' field can contain |
6191 | architecture specific information for the system-level event. Only | |
6192 | the first `ndata` items (possibly zero) of the data array are valid. | |
34739fd9 | 6193 | |
d495f942 PB |
6194 | - for arm64, data[0] is set to KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 if |
6195 | the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI | |
6196 | specification. | |
6197 | ||
6198 | - for RISC-V, data[0] is set to the value of the second argument of the | |
6199 | ``sbi_system_reset`` call. | |
6200 | ||
6201 | Previous versions of Linux defined a `flags` member in this struct. The | |
6202 | field is now aliased to `data[0]`. Userspace can assume that it is only | |
6203 | written if ndata is greater than 0. | |
34739fd9 | 6204 | |
186af6bb PB |
6205 | For arm/arm64: |
6206 | -------------- | |
6207 | ||
6208 | KVM_SYSTEM_EVENT_SUSPEND exits are enabled with the | |
6209 | KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest invokes the PSCI | |
6210 | SYSTEM_SUSPEND function, KVM will exit to userspace with this event | |
6211 | type. | |
6212 | ||
6213 | It is the sole responsibility of userspace to implement the PSCI | |
6214 | SYSTEM_SUSPEND call according to ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND". | |
6215 | KVM does not change the vCPU's state before exiting to userspace, so | |
6216 | the call parameters are left in-place in the vCPU registers. | |
6217 | ||
6218 | Userspace is _required_ to take action for such an exit. It must | |
6219 | either: | |
6220 | ||
6221 | - Honor the guest request to suspend the VM. Userspace can request | |
6222 | in-kernel emulation of suspension by setting the calling vCPU's | |
6223 | state to KVM_MP_STATE_SUSPENDED. Userspace must configure the vCPU's | |
6224 | state according to the parameters passed to the PSCI function when | |
6225 | the calling vCPU is resumed. See ARM DEN0022D.b 5.19.1 "Intended use" | |
6226 | for details on the function parameters. | |
6227 | ||
6228 | - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 | |
6229 | "Caller responsibilities" for possible return values. | |
6230 | ||
106ee47d MCC |
6231 | :: |
6232 | ||
7543a635 SR |
6233 | /* KVM_EXIT_IOAPIC_EOI */ |
6234 | struct { | |
6235 | __u8 vector; | |
6236 | } eoi; | |
6237 | ||
6238 | Indicates that the VCPU's in-kernel local APIC received an EOI for a | |
6239 | level-triggered IOAPIC interrupt. This exit only triggers when the | |
6240 | IOAPIC is implemented in userspace (i.e. KVM_CAP_SPLIT_IRQCHIP is enabled); | |
6241 | the userspace IOAPIC should process the EOI and retrigger the interrupt if | |
6242 | it is still asserted. Vector is the LAPIC interrupt vector for which the | |
6243 | EOI was received. | |
6244 | ||
106ee47d MCC |
6245 | :: |
6246 | ||
db397571 | 6247 | struct kvm_hyperv_exit { |
106ee47d MCC |
6248 | #define KVM_EXIT_HYPERV_SYNIC 1 |
6249 | #define KVM_EXIT_HYPERV_HCALL 2 | |
f97f5a56 | 6250 | #define KVM_EXIT_HYPERV_SYNDBG 3 |
db397571 | 6251 | __u32 type; |
f7d31e65 | 6252 | __u32 pad1; |
db397571 AS |
6253 | union { |
6254 | struct { | |
6255 | __u32 msr; | |
f7d31e65 | 6256 | __u32 pad2; |
db397571 AS |
6257 | __u64 control; |
6258 | __u64 evt_page; | |
6259 | __u64 msg_page; | |
6260 | } synic; | |
83326e43 AS |
6261 | struct { |
6262 | __u64 input; | |
6263 | __u64 result; | |
6264 | __u64 params[2]; | |
6265 | } hcall; | |
f97f5a56 JD |
6266 | struct { |
6267 | __u32 msr; | |
6268 | __u32 pad2; | |
6269 | __u64 control; | |
6270 | __u64 status; | |
6271 | __u64 send_page; | |
6272 | __u64 recv_page; | |
6273 | __u64 pending_page; | |
6274 | } syndbg; | |
db397571 AS |
6275 | } u; |
6276 | }; | |
6277 | /* KVM_EXIT_HYPERV */ | |
6278 | struct kvm_hyperv_exit hyperv; | |
106ee47d | 6279 | |
db397571 AS |
6280 | Indicates that the VCPU exits into userspace to process some tasks |
6281 | related to Hyper-V emulation. | |
106ee47d | 6282 | |
db397571 | 6283 | Valid values for 'type' are: |
106ee47d MCC |
6284 | |
6285 | - KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about | |
6286 | ||
db397571 AS |
6287 | Hyper-V SynIC state change. Notification is used to remap SynIC |
6288 | event/message pages and to enable/disable SynIC messages/events processing | |
6289 | in userspace. | |
6290 | ||
f97f5a56 JD |
6291 | - KVM_EXIT_HYPERV_SYNDBG -- synchronously notify user-space about |
6292 | ||
6293 | Hyper-V Synthetic debugger state change. Notification is used to either update | |
6294 | the pending_page location or to send a control command (send the buffer located | |
6295 | in send_page or recv a buffer to recv_page). | |
6296 | ||
106ee47d MCC |
6297 | :: |
6298 | ||
c726200d CD |
6299 | /* KVM_EXIT_ARM_NISV */ |
6300 | struct { | |
6301 | __u64 esr_iss; | |
6302 | __u64 fault_ipa; | |
6303 | } arm_nisv; | |
6304 | ||
3fbf4207 | 6305 | Used on arm64 systems. If a guest accesses memory not in a memslot, |
c726200d CD |
6306 | KVM will typically return to userspace and ask it to do MMIO emulation on its |
6307 | behalf. However, for certain classes of instructions, no instruction decode | |
6308 | (direction, length of memory access) is provided, and fetching and decoding | |
6309 | the instruction from the VM is overly complicated to live in the kernel. | |
6310 | ||
6311 | Historically, when this situation occurred, KVM would print a warning and kill | |
6312 | the VM. KVM assumed that if the guest accessed non-memslot memory, it was | |
6313 | trying to do I/O, which just couldn't be emulated, and the warning message was | |
6314 | phrased accordingly. However, what happened more often was that a guest bug | |
6315 | caused access outside the guest memory areas which should lead to a more | |
6316 | meaningful warning message and an external abort in the guest, if the access | |
6317 | did not fall within an I/O window. | |
6318 | ||
6319 | Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable | |
6320 | this capability at VM creation. Once this is done, these types of errors will | |
6321 | instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from | |
3fbf4207 OU |
6322 | the ESR_EL2 in the esr_iss field, and the faulting IPA in the fault_ipa field. |
6323 | Userspace can either fix up the access if it's actually an I/O access by | |
6324 | decoding the instruction from guest memory (if it's very brave) and continue | |
6325 | executing the guest, or it can decide to suspend, dump, or restart the guest. | |
c726200d CD |
6326 | |
6327 | Note that KVM does not skip the faulting instruction as it does for | |
6328 | KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state | |
6329 | if it decides to decode and emulate the instruction. | |
6330 | ||
1ae09954 AG |
6331 | :: |
6332 | ||
6333 | /* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */ | |
6334 | struct { | |
6335 | __u8 error; /* user -> kernel */ | |
6336 | __u8 pad[7]; | |
6337 | __u32 reason; /* kernel -> user */ | |
6338 | __u32 index; /* kernel -> user */ | |
6339 | __u64 data; /* kernel <-> user */ | |
6340 | } msr; | |
6341 | ||
6342 | Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is | |
6343 | enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code | |
6344 | will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR | |
6345 | exit for writes. | |
6346 | ||
6347 | The "reason" field specifies why the MSR trap occurred. User space will only | |
6348 | receive MSR exit traps when a particular reason was requested through | |
6349 | KVM_ENABLE_CAP. Currently valid exit reasons are: | |
6350 | ||
6351 | KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM | |
6352 | KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits | |
1a155254 | 6353 | KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER |
1ae09954 AG |
6354 | |
6355 | For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest | |
6356 | wants to read. To respond to this request with a successful read, user space | |
6357 | writes the respective data into the "data" field and must continue guest | |
6358 | execution to ensure the read data is transferred into guest register state. | |
6359 | ||
6360 | If the RDMSR request was unsuccessful, user space indicates that with a "1" in | |
6361 | the "error" field. This will inject a #GP into the guest when the VCPU is | |
6362 | executed again. | |
6363 | ||
6364 | For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest | |
6365 | wants to write. Once finished processing the event, user space must continue | |
6366 | vCPU execution. If the MSR write was unsuccessful, user space also sets the | |
6367 | "error" field to "1". | |
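The read and write protocol described above can be sketched as a pair of userspace handlers. This is only an illustration under stated assumptions: `struct demo_msr_exit` is a local copy of the `msr` member shown earlier, and `DEMO_MSR_SCRATCH`, `demo_scratch_value` and both handler names are made up for this example. After either handler returns, userspace would call KVM_RUN again so that KVM can complete the access or inject the #GP.

```c
#include <stdint.h>

/* Local copy of the msr exit layout, for illustration only. */
struct demo_msr_exit {
	uint8_t  error;   /* user -> kernel */
	uint8_t  pad[7];
	uint32_t reason;  /* kernel -> user */
	uint32_t index;   /* kernel -> user */
	uint64_t data;    /* kernel <-> user */
};

/* Made-up MSR index that this hypothetical VMM emulates. */
#define DEMO_MSR_SCRATCH 0x4b564dffu

static uint64_t demo_scratch_value;

/* RDMSR: supply the value in "data", or set "error" to 1 to request a #GP. */
void demo_handle_rdmsr(struct demo_msr_exit *msr)
{
	if (msr->index == DEMO_MSR_SCRATCH) {
		msr->data = demo_scratch_value;
		msr->error = 0;
	} else {
		msr->error = 1;
	}
}

/* WRMSR: consume "data", or set "error" to 1 if the write is rejected. */
void demo_handle_wrmsr(struct demo_msr_exit *msr)
{
	if (msr->index == DEMO_MSR_SCRATCH) {
		demo_scratch_value = msr->data;
		msr->error = 0;
	} else {
		msr->error = 1;
	}
}
```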
6368 | ||
e1f68169 DW |
6369 | :: |
6370 | ||
6371 | ||
6372 | struct kvm_xen_exit { | |
6373 | #define KVM_EXIT_XEN_HCALL 1 | |
6374 | __u32 type; | |
6375 | union { | |
6376 | struct { | |
6377 | __u32 longmode; | |
6378 | __u32 cpl; | |
6379 | __u64 input; | |
6380 | __u64 result; | |
6381 | __u64 params[6]; | |
6382 | } hcall; | |
6383 | } u; | |
6384 | }; | |
6385 | /* KVM_EXIT_XEN */ | |
6386 | struct kvm_xen_exit xen; | |
6387 | ||
6388 | Indicates that the VCPU exits into userspace to process some tasks | |
6389 | related to Xen emulation. | |
6390 | ||
6391 | Valid values for 'type' are: | |
6392 | ||
6393 | - KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall. | |
6394 | Userspace is expected to place the hypercall result into the appropriate | |
6395 | field before invoking KVM_RUN again. | |
6396 | ||
da40d858 AP |
6397 | :: |
6398 | ||
6399 | /* KVM_EXIT_RISCV_SBI */ | |
6400 | struct { | |
6401 | unsigned long extension_id; | |
6402 | unsigned long function_id; | |
6403 | unsigned long args[6]; | |
6404 | unsigned long ret[2]; | |
6405 | } riscv_sbi; | |
c1be1ef1 | 6406 | |
da40d858 AP |
6407 | If exit reason is KVM_EXIT_RISCV_SBI then it indicates that the VCPU has |
6408 | done a SBI call which is not handled by KVM RISC-V kernel module. The details | |
6409 | of the SBI call are available in 'riscv_sbi' member of kvm_run structure. The | |
6410 | 'extension_id' field of 'riscv_sbi' represents SBI extension ID whereas the | |
6411 | 'function_id' field represents function ID of given SBI extension. The 'args' | |
6412 | array field of 'riscv_sbi' represents parameters for the SBI call and 'ret' | |
6413 | array field represents return values. The userspace should update the return | |
6414 | values of the SBI call before resuming the VCPU. For more details, refer to the | |
6415 | RISC-V SBI specification at https://github.com/riscv/riscv-sbi-doc. | |
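As a rough sketch of "update the return values before resuming the VCPU", the helper below rejects an SBI call that userspace does not implement. Several things here are assumptions rather than statements from this document: `struct demo_riscv_sbi` is a local copy of the `riscv_sbi` member, the convention that `ret[0]` carries the SBI error code and `ret[1]` the value follows the SBI specification's return pair, and `DEMO_SBI_ERR_NOT_SUPPORTED` mirrors the specification's "not supported" code.

```c
/* Local copy of the riscv_sbi exit layout, for illustration only. */
struct demo_riscv_sbi {
	unsigned long extension_id;
	unsigned long function_id;
	unsigned long args[6];
	unsigned long ret[2];
};

/* Assumed to match the SBI specification's "not supported" error code. */
#define DEMO_SBI_ERR_NOT_SUPPORTED (-2L)

/* Fill in the return pair for a call that userspace does not handle. */
void demo_sbi_reject(struct demo_riscv_sbi *sbi)
{
	sbi->ret[0] = (unsigned long)DEMO_SBI_ERR_NOT_SUPPORTED;
	sbi->ret[1] = 0;
}
```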
6416 | ||
106ee47d MCC |
6417 | :: |
6418 | ||
9c1b96e3 AK |
6419 | /* Fix the size of the union. */ |
6420 | char padding[256]; | |
6421 | }; | |
b9e5dc8d CB |
6422 | |
6423 | /* | |
6424 | * shared registers between kvm and userspace. | |
6425 | * kvm_valid_regs specifies the register classes set by the host | |
6426 | * kvm_dirty_regs specifies the register classes dirtied by userspace | |
6427 | * struct kvm_sync_regs is architecture specific, as well as the | |
6428 | * bits for kvm_valid_regs and kvm_dirty_regs | |
6429 | */ | |
6430 | __u64 kvm_valid_regs; | |
6431 | __u64 kvm_dirty_regs; | |
6432 | union { | |
6433 | struct kvm_sync_regs regs; | |
7b7e3952 | 6434 | char padding[SYNC_REGS_SIZE_BYTES]; |
b9e5dc8d CB |
6435 | } s; |
6436 | ||
6437 | If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access | |
6438 | certain guest registers without having to call SET/GET_*REGS. Thus we can | |
6439 | avoid some system call overhead if userspace has to handle the exit. | |
6440 | Userspace can query the validity of the structure by checking | |
6441 | kvm_valid_regs for specific bits. These bits are architecture specific | |
6442 | and usually define the validity of a group of registers (e.g. one bit | |
106ee47d | 6443 | for general purpose registers). |
b9e5dc8d | 6444 | |
d8482c0d DH |
6445 | Please note that the kernel is allowed to use the kvm_run structure as the |
6446 | primary storage for certain register types. Therefore, the kernel may use the | |
6447 | values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. | |
6448 | ||
106ee47d MCC |
6449 | :: |
6450 | ||
6451 | }; | |
821246a5 | 6452 | |
414fa985 | 6453 | |
9c15bb1d | 6454 | |
699a0ea0 | 6455 | 6. Capabilities that can be enabled on vCPUs |
106ee47d | 6456 | ============================================ |
821246a5 | 6457 | |
0907c855 CH |
6458 | There are certain capabilities that change the behavior of the virtual CPU or |
6459 | the virtual machine when enabled. To enable them, please see section 4.37. | |
6460 | Below you can find a list of capabilities and what their effect on the vCPU or | |
6461 | the virtual machine is when enabling them. | |
821246a5 AG |
6462 | |
6463 | The following information is provided along with the description: | |
6464 | ||
106ee47d MCC |
6465 | Architectures: |
6466 | which instruction set architectures provide this capability. | |
821246a5 AG |
6467 | x86 includes both i386 and x86_64. |
6468 | ||
106ee47d MCC |
6469 | Target: |
6470 | whether this is a per-vcpu or per-vm capability. | |
0907c855 | 6471 | |
106ee47d MCC |
6472 | Parameters: |
6473 | what parameters are accepted by the capability. | |
821246a5 | 6474 | |
106ee47d MCC |
6475 | Returns: |
6476 | the return value. General error numbers (EBADF, ENOMEM, EINVAL) | |
821246a5 AG |
6477 | are not detailed, but errors with specific meanings are. |
6478 | ||
414fa985 | 6479 | |
821246a5 | 6480 | 6.1 KVM_CAP_PPC_OSI |
106ee47d | 6481 | ------------------- |
821246a5 | 6482 | |
106ee47d MCC |
6483 | :Architectures: ppc |
6484 | :Target: vcpu | |
6485 | :Parameters: none | |
6486 | :Returns: 0 on success; -1 on error | |
821246a5 AG |
6487 | |
6488 | This capability enables interception of OSI hypercalls that otherwise would | |
6489 | be treated as normal system calls to be injected into the guest. OSI hypercalls | |
6490 | were invented by Mac-on-Linux to have a standardized communication mechanism | |
6491 | between the guest and the host. | |
6492 | ||
6493 | When this capability is enabled, KVM_EXIT_OSI can occur. | |
6494 | ||
414fa985 | 6495 | |
821246a5 | 6496 | 6.2 KVM_CAP_PPC_PAPR |
106ee47d | 6497 | -------------------- |
821246a5 | 6498 | |
106ee47d MCC |
6499 | :Architectures: ppc |
6500 | :Target: vcpu | |
6501 | :Parameters: none | |
6502 | :Returns: 0 on success; -1 on error | |
821246a5 AG |
6503 | |
6504 | This capability enables interception of PAPR hypercalls. PAPR hypercalls are | |
6505 | done using the hypercall instruction "sc 1". | |
6506 | ||
6507 | It also sets the guest privilege level to "supervisor" mode. Usually the guest | |
6508 | runs in "hypervisor" privilege mode with a few missing features. | |
6509 | ||
6510 | In addition to the above, it changes the semantics of SDR1. In this mode, the | |
6511 | HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the | |
6512 | HTAB invisible to the guest. | |
6513 | ||
6514 | When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur. | |
dc83b8bc | 6515 | |
414fa985 | 6516 | |
dc83b8bc | 6517 | 6.3 KVM_CAP_SW_TLB |
106ee47d MCC |
6518 | ------------------ |
6519 | ||
6520 | :Architectures: ppc | |
6521 | :Target: vcpu | |
6522 | :Parameters: args[0] is the address of a struct kvm_config_tlb | |
6523 | :Returns: 0 on success; -1 on error | |
dc83b8bc | 6524 | |
106ee47d | 6525 | :: |
dc83b8bc | 6526 | |
106ee47d | 6527 | struct kvm_config_tlb { |
dc83b8bc SW |
6528 | __u64 params; |
6529 | __u64 array; | |
6530 | __u32 mmu_type; | |
6531 | __u32 array_len; | |
106ee47d | 6532 | }; |
dc83b8bc SW |
6533 | |
6534 | Configures the virtual CPU's TLB array, establishing a shared memory area | |
6535 | between userspace and KVM. The "params" and "array" fields are userspace | |
6536 | addresses of mmu-type-specific data structures. The "array_len" field is a | |
6537 | safety mechanism, and should be set to the size in bytes of the memory that | |
6538 | userspace has reserved for the array. It must be at least the size dictated | |
6539 | by "mmu_type" and "params". | |
6540 | ||
6541 | While KVM_RUN is active, the shared region is under control of KVM. Its | |
6542 | contents are undefined, and any modification by userspace results in | |
6543 | boundedly undefined behavior. | |
6544 | ||
6545 | On return from KVM_RUN, the shared region will reflect the current state of | |
6546 | the guest's TLB. If userspace makes any changes, it must call KVM_DIRTY_TLB | |
6547 | to tell KVM which entries have been changed, prior to calling KVM_RUN again | |
6548 | on this vcpu. | |
6549 | ||
6550 | For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: | |
106ee47d | 6551 | |
dc83b8bc SW |
6552 | - The "params" field is of type "struct kvm_book3e_206_tlb_params". |
6553 | - The "array" field points to an array of type "struct | |
6554 | kvm_book3e_206_tlb_entry". | |
6555 | - The array consists of all entries in the first TLB, followed by all | |
6556 | entries in the second TLB. | |
6557 | - Within a TLB, entries are ordered first by increasing set number. Within a | |
6558 | set, entries are ordered by way (increasing ESEL). | |
6559 | - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1) | |
6560 | where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value. | |
6561 | - The tsize field of mas1 shall be set to 4K on TLB0, even though the | |
6562 | hardware ignores this value for TLB0. | |
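The ordering rules above imply a simple index calculation for TLB0 entries in the shared array. The sketch below is an illustration only: the function names are hypothetical, `tlb_size` and `tlb_ways` stand for the TLB0 values of `tlb_sizes[]` and `tlb_ways[]`, and the set count is assumed to be a power of two so that the mask in the documented hash is meaningful.

```c
#include <stdint.h>

/* Set number for TLB0: (MAS2 >> 12) & (num_sets - 1). */
uint32_t demo_tlb0_set(uint64_t mas2, uint32_t tlb_size, uint32_t tlb_ways)
{
	uint32_t num_sets = tlb_size / tlb_ways;

	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/*
 * Position of a TLB0 entry in the shared array: entries are ordered by
 * set first, then by way (ESEL) within the set. TLB0 is the first TLB,
 * so no base offset is added.
 */
uint32_t demo_tlb0_index(uint64_t mas2, uint32_t esel,
			 uint32_t tlb_size, uint32_t tlb_ways)
{
	return demo_tlb0_set(mas2, tlb_size, tlb_ways) * tlb_ways + esel;
}
```

For entries in the second TLB, the same set-then-way ordering would apply after an offset equal to the number of TLB0 entries.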
fa6b7fe9 CH |
6563 | |
6564 | 6.4 KVM_CAP_S390_CSS_SUPPORT | |
106ee47d | 6565 | ---------------------------- |
fa6b7fe9 | 6566 | |
106ee47d MCC |
6567 | :Architectures: s390 |
6568 | :Target: vcpu | |
6569 | :Parameters: none | |
6570 | :Returns: 0 on success; -1 on error | |
fa6b7fe9 CH |
6571 | |
6572 | This capability enables support for handling of channel I/O instructions. | |
6573 | ||
6574 | TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are | |
6575 | handled in-kernel, while the other I/O instructions are passed to userspace. | |
6576 | ||
6577 | When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST | |
6578 | SUBCHANNEL intercepts. | |
1c810636 | 6579 | |
0907c855 CH |
6580 | Note that even though this capability is enabled per-vcpu, the complete |
6581 | virtual machine is affected. | |
6582 | ||
1c810636 | 6583 | 6.5 KVM_CAP_PPC_EPR |
106ee47d | 6584 | ------------------- |
1c810636 | 6585 | |
106ee47d MCC |
6586 | :Architectures: ppc |
6587 | :Target: vcpu | |
6588 | :Parameters: args[0] defines whether the proxy facility is active | |
6589 | :Returns: 0 on success; -1 on error | |
1c810636 AG |
6590 | |
6591 | This capability enables or disables the delivery of interrupts through the | |
6592 | external proxy facility. | |
6593 | ||
6594 | When enabled (args[0] != 0), every time the guest gets an external interrupt | |
6595 | delivered, it automatically exits into user space with a KVM_EXIT_EPR exit | |
6596 | to receive the topmost interrupt vector. | |
6597 | ||
6598 | When disabled (args[0] == 0), behavior is as if this facility is unsupported. | |
6599 | ||
6600 | When this capability is enabled, KVM_EXIT_EPR can occur. | |
eb1e4f43 SW |
6601 | |
6602 | 6.6 KVM_CAP_IRQ_MPIC | |
106ee47d | 6603 | -------------------- |
eb1e4f43 | 6604 | |
106ee47d MCC |
6605 | :Architectures: ppc |
6606 | :Parameters: args[0] is the MPIC device fd; | |
6607 | args[1] is the MPIC CPU number for this vcpu | |
eb1e4f43 SW |
6608 | |
6609 | This capability connects the vcpu to an in-kernel MPIC device. | |
5975a2e0 PM |
6610 | |
6611 | 6.7 KVM_CAP_IRQ_XICS | |
106ee47d | 6612 | -------------------- |
5975a2e0 | 6613 | |
106ee47d MCC |
6614 | :Architectures: ppc |
6615 | :Target: vcpu | |
6616 | :Parameters: args[0] is the XICS device fd; | |
6617 | args[1] is the XICS CPU number (server ID) for this vcpu | |
5975a2e0 PM |
6618 | |
6619 | This capability connects the vcpu to an in-kernel XICS device. | |
8a366a4b CH |
6620 | |
6621 | 6.8 KVM_CAP_S390_IRQCHIP | |
106ee47d | 6622 | ------------------------ |
8a366a4b | 6623 | |
106ee47d MCC |
6624 | :Architectures: s390 |
6625 | :Target: vm | |
6626 | :Parameters: none | |
8a366a4b CH |
6627 | |
6628 | This capability enables the in-kernel irqchip for s390. Please refer to | |
6629 | "4.24 KVM_CREATE_IRQCHIP" for details. | |
699a0ea0 | 6630 | |
5fafd874 | 6631 | 6.9 KVM_CAP_MIPS_FPU |
106ee47d | 6632 | -------------------- |
5fafd874 | 6633 | |
106ee47d MCC |
6634 | :Architectures: mips |
6635 | :Target: vcpu | |
6636 | :Parameters: args[0] is reserved for future use (should be 0). | |
5fafd874 JH |
6637 | |
6638 | This capability allows the use of the host Floating Point Unit by the guest. It | |
6639 | allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is | |
106ee47d MCC |
6640 | done the ``KVM_REG_MIPS_FPR_*`` and ``KVM_REG_MIPS_FCR_*`` registers can be |
6641 | accessed (depending on the current guest FPU register mode), and the Status.FR, | |
5fafd874 JH |
6642 | Config5.FRE bits are accessible via the KVM API and also from the guest, |
6643 | depending on them being supported by the FPU. | |
6644 | ||
d952bd07 | 6645 | 6.10 KVM_CAP_MIPS_MSA |
106ee47d | 6646 | --------------------- |
d952bd07 | 6647 | |
106ee47d MCC |
6648 | :Architectures: mips |
6649 | :Target: vcpu | |
6650 | :Parameters: args[0] is reserved for future use (should be 0). | |
d952bd07 JH |
6651 | |
6652 | This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest. | |
6653 | It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest. | |
106ee47d MCC |
6654 | Once this is done the ``KVM_REG_MIPS_VEC_*`` and ``KVM_REG_MIPS_MSA_*`` |
6655 | registers can be accessed, and the Config5.MSAEn bit is accessible via the | |
6656 | KVM API and also from the guest. | |
d952bd07 | 6657 | |
01643c51 | 6658 | 6.74 KVM_CAP_SYNC_REGS |
106ee47d MCC |
6659 | ---------------------- |
6660 | ||
6661 | :Architectures: s390, x86 | |
6662 | :Target: s390: always enabled, x86: vcpu | |
6663 | :Parameters: none | |
6664 | :Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register | |
6665 | sets are supported | |
6666 | (bitfields defined in arch/x86/include/uapi/asm/kvm.h). | |
01643c51 KH |
6667 | |
6668 | As described above in the kvm_sync_regs struct info in section 5 (kvm_run): | |
6669 | KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers | |
6670 | without having to call SET/GET_*REGS". This reduces overhead by eliminating | |
6671 | repeated ioctl calls for setting and/or getting register values. This is | |
6672 | particularly important when userspace is making synchronous guest state | |
6673 | modifications, e.g. when emulating and/or intercepting instructions in | |
6674 | userspace. | |
6675 | ||
6676 | For s390 specifics, please refer to the source code. | |
6677 | ||
6678 | For x86: | |
106ee47d | 6679 | |
01643c51 KH |
6680 | - the register sets to be copied out to kvm_run are selectable |
6681 | by userspace (rather than all sets being copied out for every exit). | |
6682 | - vcpu_events are available in addition to regs and sregs. | |
6683 | ||
6684 | For x86, the 'kvm_valid_regs' field of struct kvm_run is overloaded to | |
6685 | function as an input bit-array field set by userspace to indicate the | |
6686 | specific register sets to be copied out on the next exit. | |
6687 | ||
6688 | To indicate when userspace has modified values that should be copied into | |
6689 | the vCPU, the bit-array field 'kvm_dirty_regs', which exists on all architectures, must be set. | |
6690 | This is done using the same bitflags as for the 'kvm_valid_regs' field. | |
6691 | If the dirty bit is not set, then the register set values will not be copied | |
6692 | into the vCPU even if they've been modified. | |
6693 | ||
6694 | Unused bitfields in the bitarrays must be set to zero. | |
6695 | ||
106ee47d MCC |
6696 | :: |
6697 | ||
6698 | struct kvm_sync_regs { | |
01643c51 KH |
6699 | struct kvm_regs regs; |
6700 | struct kvm_sregs sregs; | |
6701 | struct kvm_vcpu_events events; | |
106ee47d | 6702 | }; |
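The x86 handshake described above (select sets through 'kvm_valid_regs', flag modified sets through 'kvm_dirty_regs', keep unused bits zero) can be sketched as follows. This is a minimal illustration with assumed names: `struct demo_run` holds only the two bit-array fields, and the `DEMO_SYNC_*` bit values are placeholders for the real bitfields defined in arch/x86/include/uapi/asm/kvm.h.

```c
#include <stdint.h>

/* Placeholder bit assignments, one per register set in kvm_sync_regs. */
#define DEMO_SYNC_REGS   (1ULL << 0)
#define DEMO_SYNC_SREGS  (1ULL << 1)
#define DEMO_SYNC_EVENTS (1ULL << 2)
#define DEMO_SYNC_ALL    (DEMO_SYNC_REGS | DEMO_SYNC_SREGS | DEMO_SYNC_EVENTS)

/* Only the two bit-array fields of struct kvm_run, for illustration. */
struct demo_run {
	uint64_t kvm_valid_regs;
	uint64_t kvm_dirty_regs;
};

/* Select the sets to be copied out on the next exit; unused bits stay zero. */
int demo_request_sets(struct demo_run *run, uint64_t sets)
{
	if (sets & ~DEMO_SYNC_ALL)
		return -1;
	run->kvm_valid_regs = sets;
	return 0;
}

/* Flag sets that userspace modified so they are copied into the vCPU. */
int demo_mark_dirty(struct demo_run *run, uint64_t sets)
{
	if (sets & ~DEMO_SYNC_ALL)
		return -1;
	run->kvm_dirty_regs |= sets;
	return 0;
}
```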
01643c51 | 6703 | |
eacc56bb | 6704 | 6.75 KVM_CAP_PPC_IRQ_XIVE |
106ee47d | 6705 | ------------------------- |
eacc56bb | 6706 | |
106ee47d MCC |
6707 | :Architectures: ppc |
6708 | :Target: vcpu | |
6709 | :Parameters: args[0] is the XIVE device fd; | |
6710 | args[1] is the XIVE CPU number (server ID) for this vcpu | |
eacc56bb CLG |
6711 | |
6712 | This capability connects the vcpu to an in-kernel XIVE device. | |
6713 | ||
699a0ea0 | 6714 | 7. Capabilities that can be enabled on VMs |
106ee47d | 6715 | ========================================== |
699a0ea0 PM |
6716 | |
6717 | There are certain capabilities that change the behavior of the virtual | |
6718 | machine when enabled. To enable them, please see section 4.37. Below | |
6719 | you can find a list of capabilities and what their effect on the VM | |
6720 | is when enabling them. | |
6721 | ||
6722 | The following information is provided along with the description: | |
6723 | ||
106ee47d MCC |
6724 | Architectures: |
6725 | which instruction set architectures provide this capability. | |
699a0ea0 PM |
6726 | x86 includes both i386 and x86_64. |
6727 | ||
106ee47d MCC |
6728 | Parameters: |
6729 | what parameters are accepted by the capability. | |
699a0ea0 | 6730 | |
106ee47d MCC |
6731 | Returns: |
6732 | the return value. General error numbers (EBADF, ENOMEM, EINVAL) | |
699a0ea0 PM |
6733 | are not detailed, but errors with specific meanings are. |
6734 | ||
6735 | ||
6736 | 7.1 KVM_CAP_PPC_ENABLE_HCALL | |
106ee47d | 6737 | ---------------------------- |
699a0ea0 | 6738 | |
106ee47d MCC |
6739 | :Architectures: ppc |
6740 | :Parameters: args[0] is the sPAPR hcall number; | |
6741 | args[1] is 0 to disable, 1 to enable in-kernel handling | |
699a0ea0 PM |
6742 | |
6743 | This capability controls whether individual sPAPR hypercalls (hcalls) | |
6744 | get handled by the kernel or not. Enabling or disabling in-kernel | |
6745 | handling of an hcall is effective across the VM. On creation, an | |
6746 | initial set of hcalls are enabled for in-kernel handling, which | |
6747 | consists of those hcalls for which in-kernel handlers were implemented | |
6748 | before this capability was implemented. If disabled, the kernel will | |
6749 | not attempt to handle the hcall, but will always exit to userspace | |
6750 | to handle it. Note that it may not make sense to enable some and | |
6751 | disable others of a group of related hcalls, but KVM does not prevent | |
6752 | userspace from doing that. | |
ae2113a4 PM |
6753 | |
6754 | If the hcall number specified is not one that has an in-kernel | |
6755 | implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL | |
6756 | error. | |
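The parameter convention above (args[0] is the hcall number, args[1] is 0 or 1) can be sketched as a helper that prepares the request for KVM_ENABLE_CAP. The names here are assumptions: `struct demo_enable_cap` is a local stand-in for the capability request structure described in section 4.37, and the capability number is passed in by the caller rather than hard-coded, since real code would take KVM_CAP_PPC_ENABLE_HCALL from `<linux/kvm.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-in for the KVM_ENABLE_CAP argument, for illustration only. */
struct demo_enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/*
 * Prepare a request that enables (enable != 0) or disables (enable == 0)
 * in-kernel handling of one sPAPR hcall. All unused fields are zeroed.
 */
void demo_fill_enable_hcall(struct demo_enable_cap *req, uint32_t cap_nr,
			    uint64_t hcall, int enable)
{
	memset(req, 0, sizeof(*req));
	req->cap = cap_nr;
	req->args[0] = hcall;
	req->args[1] = enable ? 1 : 0;
}
```

The filled structure would then be passed to the KVM_ENABLE_CAP ioctl on the VM file descriptor; an EINVAL failure indicates that the hcall has no in-kernel implementation.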
2444b352 DH |
6757 | |
6758 | 7.2 KVM_CAP_S390_USER_SIGP | |
106ee47d | 6759 | -------------------------- |
2444b352 | 6760 | |
106ee47d MCC |
6761 | :Architectures: s390 |
6762 | :Parameters: none | |
2444b352 DH |
6763 | |
6764 | This capability controls which SIGP orders will be handled completely in user | |
6765 | space. With this capability enabled, all fast orders will be handled completely | |
6766 | in the kernel: | |
106ee47d | 6767 | |
2444b352 DH |
6768 | - SENSE |
6769 | - SENSE RUNNING | |
6770 | - EXTERNAL CALL | |
6771 | - EMERGENCY SIGNAL | |
6772 | - CONDITIONAL EMERGENCY SIGNAL | |
6773 | ||
6774 | All other orders will be handled completely in user space. | |
6775 | ||
6776 | Only privileged operation exceptions will be checked for in the kernel (or even | |
6777 | in the hardware prior to interception). If this capability is not enabled, the | |
6778 | old way of handling SIGP orders is used (partially in kernel and user space). | |
68c55750 EF |
6779 | |
6780 | 7.3 KVM_CAP_S390_VECTOR_REGISTERS | |
106ee47d | 6781 | --------------------------------- |
68c55750 | 6782 | |
106ee47d MCC |
6783 | :Architectures: s390 |
6784 | :Parameters: none | |
6785 | :Returns: 0 on success, negative value on error | |
68c55750 EF |
6786 | |
6787 | Allows use of the vector registers introduced with the z13 processor, and |
6788 | provides for the synchronization between host and user space. Will | |
6789 | return -EINVAL if the machine does not support vectors. | |
e44fc8c9 ET |
6790 | |
6791 | 7.4 KVM_CAP_S390_USER_STSI | |
106ee47d | 6792 | -------------------------- |
e44fc8c9 | 6793 | |
106ee47d MCC |
6794 | :Architectures: s390 |
6795 | :Parameters: none | |
e44fc8c9 ET |
6796 | |
6797 | This capability allows post-handlers for the STSI instruction. After | |
6798 | initial handling in the kernel, KVM exits to user space with | |
6799 | KVM_EXIT_S390_STSI to allow user space to insert further data. | |
6800 | ||
6801 | Before exiting to userspace, kvm handlers should fill in s390_stsi field of | |
106ee47d MCC |
6802 | vcpu->run:: |
6803 | ||
6804 | struct { | |
e44fc8c9 ET |
6805 | __u64 addr; |
6806 | __u8 ar; | |
6807 | __u8 reserved; | |
6808 | __u8 fc; | |
6809 | __u8 sel1; | |
6810 | __u16 sel2; | |
106ee47d | 6811 | } s390_stsi; |
e44fc8c9 | 6812 | |
106ee47d MCC |
6813 | @addr - guest address of STSI SYSIB |
6814 | @fc - function code | |
6815 | @sel1 - selector 1 | |
6816 | @sel2 - selector 2 | |
6817 | @ar - access register number | |
e44fc8c9 ET |
6818 | |
6819 | KVM handlers should exit to userspace with rc = -EREMOTE. | |
e928e9cb | 6820 | |
49df6397 | 6821 | 7.5 KVM_CAP_SPLIT_IRQCHIP |
106ee47d | 6822 | ------------------------- |
49df6397 | 6823 | |
106ee47d MCC |
6824 | :Architectures: x86 |
6825 | :Parameters: args[0] - number of routes reserved for userspace IOAPICs | |
6826 | :Returns: 0 on success, -1 on error | |
49df6397 SR |
6827 | |
6828 | Create a local apic for each processor in the kernel. This can be used | |
6829 | instead of KVM_CREATE_IRQCHIP if the userspace VMM wishes to emulate the | |
6830 | IOAPIC and PIC (and also the PIT, even though this has to be enabled | |
6831 | separately). | |
6832 | ||
b053b2ae SR |
6833 | This capability also enables in-kernel routing of interrupt requests; |
6834 | when KVM_CAP_SPLIT_IRQCHIP is enabled, only routes of KVM_IRQ_ROUTING_MSI type are | |
6835 | used in the IRQ routing table. The first args[0] MSI routes are reserved | |
6836 | for the IOAPIC pins. Whenever the LAPIC receives an EOI for these routes, | |
6837 | a KVM_EXIT_IOAPIC_EOI vmexit will be reported to userspace. | |
49df6397 SR |
6838 | |
6839 | Fails if VCPU has already been created, or if the irqchip is already in the | |
6840 | kernel (i.e. KVM_CREATE_IRQCHIP has already been called). | |
6841 | ||
051c87f7 | 6842 | 7.6 KVM_CAP_S390_RI |
106ee47d | 6843 | ------------------- |
051c87f7 | 6844 | |
106ee47d MCC |
6845 | :Architectures: s390 |
6846 | :Parameters: none | |
051c87f7 DH |
6847 | |
6848 | Allows use of runtime-instrumentation introduced with the zEC12 processor. |
6849 | Will return -EINVAL if the machine does not support runtime-instrumentation. | |
6850 | Will return -EBUSY if a VCPU has already been created. | |
e928e9cb | 6851 | |
37131313 | 6852 | 7.7 KVM_CAP_X2APIC_API |
106ee47d | 6853 | ---------------------- |
37131313 | 6854 | |
106ee47d MCC |
6855 | :Architectures: x86 |
6856 | :Parameters: args[0] - features that should be enabled | |
6857 | :Returns: 0 on success, -EINVAL when args[0] contains invalid features | |
37131313 | 6858 | |
106ee47d | 6859 | Valid feature flags in args[0] are:: |
37131313 | 6860 | |
106ee47d MCC |
6861 | #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) |
6862 | #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) | |
37131313 RK |
6863 | |
6864 | Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of | |
6865 | KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC, | |
6866 | allowing the use of 32-bit APIC IDs. See KVM_CAP_X2APIC_API in their | |
6867 | respective sections. | |
6868 | ||
c519265f RK |
6869 | KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK must be enabled for x2APIC to work |
6870 | in logical mode or with more than 255 VCPUs. Otherwise, KVM treats 0xff | |
6871 | as a broadcast even in x2APIC mode in order to support physical x2APIC | |
6872 | without interrupt remapping. This is undesirable in logical mode, | |
6873 | where 0xff represents CPUs 0-7 in cluster 0. | |
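Since enabling the capability fails with -EINVAL when args[0] contains invalid features, userspace may want to validate its flag word before issuing the ioctl. The check below is a sketch under that assumption: the two flag definitions are repeated from the listing above so the block is self-contained, while `DEMO_X2APIC_VALID_FLAGS` and `demo_check_x2apic_flags()` are hypothetical names, and the set of valid flags is limited to the two documented here.

```c
#include <errno.h>
#include <stdint.h>

#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)

/* The flags documented in this section; anything else is rejected. */
#define DEMO_X2APIC_VALID_FLAGS \
	(KVM_X2APIC_API_USE_32BIT_IDS | KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)

/* Return 0 if args0 contains only documented flags, -EINVAL otherwise. */
int demo_check_x2apic_flags(uint64_t args0)
{
	if (args0 & ~DEMO_X2APIC_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```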
37131313 | 6874 | |
6502a34c | 6875 | 7.8 KVM_CAP_S390_USER_INSTR0 |
106ee47d | 6876 | ---------------------------- |
6502a34c | 6877 | |
106ee47d MCC |
6878 | :Architectures: s390 |
6879 | :Parameters: none | |
6502a34c DH |
6880 | |
6881 | With this capability enabled, all illegal instructions 0x0000 (2 bytes) will | |
6882 | be intercepted and forwarded to user space. User space can use this | |
6883 | mechanism e.g. to realize 2-byte software breakpoints. The kernel will | |
6884 | not inject an operation exception for these instructions; user space has | |
6885 | to take care of that. | |
6886 | ||
6887 | This capability can be enabled dynamically even if VCPUs were already | |
6888 | created and are running. | |
37131313 | 6889 | |
4e0b1ab7 | 6890 | 7.9 KVM_CAP_S390_GS |
106ee47d | 6891 | ------------------- |
4e0b1ab7 | 6892 | |
106ee47d MCC |
6893 | :Architectures: s390 |
6894 | :Parameters: none | |
6895 | :Returns: 0 on success; -EINVAL if the machine does not support | |
6896 | guarded storage; -EBUSY if a VCPU has already been created. | |
4e0b1ab7 FZ |
6897 | |
6898 | Allows use of guarded storage for the KVM guest. | |
6899 | ||
47a4693e | 6900 | 7.10 KVM_CAP_S390_AIS |
106ee47d | 6901 | --------------------- |
47a4693e | 6902 | |
106ee47d MCC |
6903 | :Architectures: s390 |
6904 | :Parameters: none | |
47a4693e YMZ |
6905 | |
6906 | Allow use of adapter-interruption suppression. | |
106ee47d | 6907 | :Returns: 0 on success; -EBUSY if a VCPU has already been created. |
47a4693e | 6908 | |
3c313524 | 6909 | 7.11 KVM_CAP_PPC_SMT |
106ee47d | 6910 | -------------------- |
3c313524 | 6911 | |
106ee47d MCC |
6912 | :Architectures: ppc |
6913 | :Parameters: vsmt_mode, flags | |
3c313524 PM |
6914 | |
6915 | Enabling this capability on a VM provides userspace with a way to set | |
6916 | the desired virtual SMT mode (i.e. the number of virtual CPUs per | |
6917 | virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2 | |
6918 | between 1 and 8. On POWER8, vsmt_mode must also be no greater than | |
6919 | the number of threads per subcore for the host. Currently flags must | |
6920 | be 0. A successful call to enable this capability will result in | |
6921 | vsmt_mode being returned when the KVM_CAP_PPC_SMT capability is | |
6922 | subsequently queried for the VM. This capability is only supported by | |
6923 | HV KVM, and can only be set before any VCPUs have been created. | |
2ed4f9dd PM |
6924 | The KVM_CAP_PPC_SMT_POSSIBLE capability indicates which virtual SMT |
6925 | modes are available. | |
3c313524 | 6926 | |
134764ed | 6927 | 7.12 KVM_CAP_PPC_FWNMI |
106ee47d | 6928 | ---------------------- |
134764ed | 6929 | |
106ee47d MCC |
6930 | :Architectures: ppc |
6931 | :Parameters: none | |
134764ed AP |
6932 | |
6933 | With this capability a machine check exception in the guest address | |
6934 | space will cause KVM to exit the guest with the NMI exit reason. This | |
6935 | enables QEMU to build an error log and branch to the guest kernel's | |
6936 | registered machine check handling routine. Without this capability KVM | |
6937 | will branch to the guest's 0x200 interrupt vector. | |
6938 | ||
4d5422ce | 6939 | 7.13 KVM_CAP_X86_DISABLE_EXITS |
106ee47d | 6940 | ------------------------------ |
4d5422ce | 6941 | |
106ee47d MCC |
6942 | :Architectures: x86 |
6943 | :Parameters: args[0] defines which exits are disabled | |
6944 | :Returns: 0 on success, -EINVAL when args[0] contains invalid exits | |
4d5422ce | 6945 | |
106ee47d | 6946 | Valid bits in args[0] are:: |
4d5422ce | 6947 | |
106ee47d MCC |
6948 | #define KVM_X86_DISABLE_EXITS_MWAIT (1 << 0) |
6949 | #define KVM_X86_DISABLE_EXITS_HLT (1 << 1) | |
6950 | #define KVM_X86_DISABLE_EXITS_PAUSE (1 << 2) | |
6951 | #define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3) | |
4d5422ce WL |
6952 | |
6953 | Enabling this capability on a VM provides userspace with a way to no | |
6954 | longer intercept some instructions for improved latency in some | |
6955 | workloads, and is suggested when vCPUs are associated to dedicated | |
6956 | physical CPUs. More bits can be added in the future; userspace can | |
6957 | just pass the KVM_CHECK_EXTENSION result to KVM_ENABLE_CAP to disable | |
6958 | all such vmexits. | |
6959 | ||
caa057a2 | 6960 | Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits. |
4d5422ce | 6961 | |
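The two rules above (pass the KVM_CHECK_EXTENSION result straight through, but keep HLT exits when KVM_FEATURE_PV_UNHALT is exposed) can be sketched as a small mask computation. This is a minimal sketch only: ``disable_exits_args0`` and its ``guest_has_pv_unhalt`` parameter are hypothetical names, and the ioctl calls themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the list of valid bits above. */
#define KVM_X86_DISABLE_EXITS_MWAIT  (1 << 0)
#define KVM_X86_DISABLE_EXITS_HLT    (1 << 1)
#define KVM_X86_DISABLE_EXITS_PAUSE  (1 << 2)
#define KVM_X86_DISABLE_EXITS_CSTATE (1 << 3)

/*
 * Hypothetical helper, not part of the KVM API.
 *
 * 'supported' is the value returned by KVM_CHECK_EXTENSION for
 * KVM_CAP_X86_DISABLE_EXITS. The return value is what userspace would
 * place in args[0] of KVM_ENABLE_CAP to disable every offered exit,
 * except that HLT exits are kept when the guest is given
 * KVM_FEATURE_PV_UNHALT.
 */
uint64_t disable_exits_args0(uint64_t supported, bool guest_has_pv_unhalt)
{
	uint64_t mask = supported;

	if (guest_has_pv_unhalt)
		mask &= ~(uint64_t)KVM_X86_DISABLE_EXITS_HLT;

	/* Never request a bit that KVM did not advertise. */
	assert((mask & ~supported) == 0);
	return mask;
}
```

Starting from the advertised mask rather than from a hard-coded list means bits added in the future are picked up without changing the helper.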
a4499382 | 6962 | 7.14 KVM_CAP_S390_HPAGE_1M |
106ee47d | 6963 | -------------------------- |
a4499382 | 6964 | |
106ee47d MCC |
6965 | :Architectures: s390 |
6966 | :Parameters: none | |
6967 | :Returns: 0 on success, -EINVAL if hpage module parameter was not set | |
6968 | or cmma is enabled, or the VM has the KVM_VM_S390_UCONTROL | |
6969 | flag set | |
a4499382 JF |
6970 | |
6971 | With this capability the KVM support for memory backing with 1m pages | |
6972 | through hugetlbfs can be enabled for a VM. After the capability is | |
6973 | enabled, cmma can't be enabled anymore and pfmfi and the storage key | |
6974 | interpretation are disabled. If cmma has already been enabled or the | |
6975 | hpage module parameter is not set to 1, -EINVAL is returned. | |
6976 | ||
6977 | While it is generally possible to create a huge page backed VM without | |
6978 | this capability, the VM will not be able to run. | |
6979 | ||
c4f55198 | 6980 | 7.15 KVM_CAP_MSR_PLATFORM_INFO |
106ee47d | 6981 | ------------------------------ |
6fbbde9a | 6982 | |
106ee47d MCC |
6983 | :Architectures: x86 |
6984 | :Parameters: args[0] whether feature should be enabled or not | |
6fbbde9a DS |
6985 | |
6986 | With this capability, a guest may read the MSR_PLATFORM_INFO MSR. Otherwise, | |
6987 | a #GP would be raised when the guest tries to access it. Currently, this | |
6988 | capability does not enable write permissions of this MSR for the guest. | |
6989 | ||
aa069a99 | 6990 | 7.16 KVM_CAP_PPC_NESTED_HV |
106ee47d | 6991 | -------------------------- |
aa069a99 | 6992 | |
106ee47d MCC |
6993 | :Architectures: ppc |
6994 | :Parameters: none | |
6995 | :Returns: 0 on success, -EINVAL when the implementation doesn't support | |
6996 | nested-HV virtualization. | |
aa069a99 PM |
6997 | |
6998 | HV-KVM on POWER9 and later systems allows for "nested-HV" | |
6999 | virtualization, which provides a way for a guest VM to run guests that | |
7000 | can run using the CPU's supervisor mode (privileged non-hypervisor | |
7001 | state). Enabling this capability on a VM depends on the CPU having | |
7002 | the necessary functionality and on the facility being enabled with a | |
7003 | kvm-hv module parameter. | |
7004 | ||
c4f55198 | 7005 | 7.17 KVM_CAP_EXCEPTION_PAYLOAD |
106ee47d | 7006 | ------------------------------ |
c4f55198 | 7007 | |
106ee47d MCC |
7008 | :Architectures: x86 |
7009 | :Parameters: args[0] whether feature should be enabled or not | |
c4f55198 JM |
7010 | |
7011 | With this capability enabled, CR2 will not be modified prior to the | |
7012 | emulated VM-exit when L1 intercepts a #PF exception that occurs in | |
7013 | L2. Similarly, for kvm-intel only, DR6 will not be modified prior to | |
7014 | the emulated VM-exit when L1 intercepts a #DB exception that occurs in | |
7015 | L2. As a result, when KVM_GET_VCPU_EVENTS reports a pending #PF (or | |
7016 | #DB) exception for L2, exception.has_payload will be set and the | |
7017 | faulting address (or the new DR6 bits*) will be reported in the | |
7018 | exception_payload field. Similarly, when userspace injects a #PF (or | |
7019 | #DB) into L2 using KVM_SET_VCPU_EVENTS, it is expected to set | |
106ee47d MCC |
7020 | exception.has_payload and to put the faulting address - or the new DR6 |
7021 | bits\ [#]_ - in the exception_payload field. | |
c4f55198 JM |
7022 | |
7023 | This capability also enables exception.pending in struct | |
7024 | kvm_vcpu_events, which allows userspace to distinguish between pending | |
7025 | and injected exceptions. | |
7026 | ||
7027 | ||
106ee47d MCC |
7028 | .. [#] For the new DR6 bits, note that bit 16 is set iff the #DB exception |
7029 | will clear DR6.RTM. | |
c4f55198 | 7030 | |
d7547c55 | 7031 | 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 |
| | -------------------------------------- |
2a31b9db | 7032 | |
3fbf4207 | 7033 | :Architectures: x86, arm64, mips |
106ee47d | 7034 | :Parameters: args[0] whether feature should be enabled or not |
2a31b9db | 7035 | |
3c9bd400 JZ |
7036 | Valid flags are:: |
7037 | ||
7038 | #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) | |
7039 | #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1) | |
7040 | ||
7041 | When KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE is set, KVM_GET_DIRTY_LOG will not | |
7042 | automatically clear and write-protect all pages that are returned as dirty. | |
2a31b9db PB |
7043 | Rather, userspace will have to do this operation separately using |
7044 | KVM_CLEAR_DIRTY_LOG. | |
7045 | ||
7046 | At the cost of a slightly more complicated operation, this provides better | |
7047 | scalability and responsiveness for two reasons. First, | |
7048 | the KVM_CLEAR_DIRTY_LOG ioctl can operate at a 64-page granularity rather | |
7049 | than requiring a full memslot to be synced; this ensures that KVM does not | |
7050 | take spinlocks for an extended period of time. Second, in some cases a | |
7051 | large amount of time can pass between a call to KVM_GET_DIRTY_LOG and | |
7052 | userspace actually using the data in the page. Pages can be modified | |
3c9bd400 | 7053 | during this time, which is inefficient for both the guest and userspace: |
2a31b9db PB |
7054 | the guest will incur a higher penalty due to write protection faults, |
7055 | while userspace can see false reports of dirty pages. Manual reprotection | |
7056 | helps reduce this time, improving guest performance and reducing the | |
7057 | number of dirty log false positives. | |
7058 | ||
3c9bd400 JZ |
7059 | With KVM_DIRTY_LOG_INITIALLY_SET set, all the bits of the dirty bitmap |
7060 | will be initialized to 1 when created. This also improves performance because | |
7061 | dirty logging can be enabled gradually in small chunks on the first call | |
7062 | to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on | |
7063 | KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on | |
c862626e | 7064 | x86 and arm64 for now). |
3c9bd400 | 7065 | |
d7547c55 PX |
7066 | KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 was previously available under the name |
7067 | KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, but the implementation had bugs that made | |
7068 | it hard or impossible to use it correctly. The availability of | |
7069 | KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 signals that those bugs are fixed. | |
7070 | Userspace should not try to use KVM_CAP_MANUAL_DIRTY_LOG_PROTECT. | |
2a31b9db | 7071 | |
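The dependency between the two flags can be checked in userspace before calling KVM_ENABLE_CAP. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the caller already has a mask of flags the host supports (how that mask is obtained is not described in this section).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the list of valid flags above. */
#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0)
#define KVM_DIRTY_LOG_INITIALLY_SET         (1 << 1)

/*
 * Hypothetical helper: true if 'flags' is a sensible args[0] value.
 * 'supported' is assumed to be the set of flags the host accepts.
 */
bool dirty_log_flags_valid(uint64_t flags, uint64_t supported)
{
	/* Reject anything the host does not offer. */
	if (flags & ~supported)
		return false;

	/* INITIALLY_SET depends on MANUAL_PROTECT_ENABLE. */
	if ((flags & KVM_DIRTY_LOG_INITIALLY_SET) &&
	    !(flags & KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE))
		return false;

	return true;
}

/* Hypothetical helper: pick the richest valid flag set on offer. */
uint64_t dirty_log_pick_flags(uint64_t supported)
{
	uint64_t flags = 0;

	if (supported & KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE) {
		flags |= KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE;
		if (supported & KVM_DIRTY_LOG_INITIALLY_SET)
			flags |= KVM_DIRTY_LOG_INITIALLY_SET;
	}

	assert(dirty_log_flags_valid(flags, supported));
	return flags;
}
```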
9a5788c6 PM |
7072 | 7.19 KVM_CAP_PPC_SECURE_GUEST |
7073 | ------------------------------ | |
7074 | ||
7075 | :Architectures: ppc | |
7076 | ||
7077 | This capability indicates that KVM is running on a host that has | |
7078 | ultravisor firmware and thus can support a secure guest. On such a | |
7079 | system, a guest can ask the ultravisor to make it a secure guest, | |
7080 | one whose memory is inaccessible to the host except for pages which | |
7081 | are explicitly requested to be shared with the host. The ultravisor | |
7082 | notifies KVM when a guest requests to become a secure guest, and KVM | |
7083 | has the opportunity to veto the transition. | |
7084 | ||
7085 | If present, this capability can be enabled for a VM, meaning that KVM | |
7086 | will allow the transition to secure guest mode. Otherwise KVM will | |
7087 | veto the transition. | |
7088 | ||
acd05785 DM |
7089 | 7.20 KVM_CAP_HALT_POLL |
7090 | ---------------------- | |
7091 | ||
7092 | :Architectures: all | |
7093 | :Target: VM | |
7094 | :Parameters: args[0] is the maximum poll time in nanoseconds | |
7095 | :Returns: 0 on success; -1 on error | |
7096 | ||
7097 | This capability overrides the kvm module parameter halt_poll_ns for the | |
7098 | target VM. | |
7099 | ||
7100 | VCPU polling allows a VCPU to poll for wakeup events instead of immediately | |
7101 | scheduling during guest halts. The maximum time a VCPU can spend polling is | |
7102 | controlled by the kvm module parameter halt_poll_ns. This capability allows | |
7103 | the maximum poll time to be specified on a per-VM basis, effectively overriding | |
7104 | the module parameter for the target VM. | |
7105 | ||
1ae09954 AG |
7106 | 7.21 KVM_CAP_X86_USER_SPACE_MSR |
7107 | ------------------------------- | |
7108 | ||
7109 | :Architectures: x86 | |
7110 | :Target: VM | |
7111 | :Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report | |
7112 | :Returns: 0 on success; -1 on error | |
7113 | ||
7114 | This capability enables trapping of #GP-invoking RDMSR and WRMSR instructions | |
7115 | into user space. | |
7116 | ||
7117 | When a guest requests to read or write an MSR, KVM may not implement all MSRs | |
7118 | that are relevant to a respective system. It also does not differentiate by | |
7119 | CPU type. | |
7120 | ||
7121 | To allow more fine-grained control over MSR handling, user space may enable | |
7122 | this capability. With it enabled, MSR accesses that match the mask specified in | |
7123 | args[0] and would otherwise trigger a #GP inside the guest will instead trigger | |
7124 | KVM_EXIT_X86_RDMSR or KVM_EXIT_X86_WRMSR exit notifications, which user space | |
7125 | can then handle to implement model-specific MSR handling and/or user notifications | |
7126 | to inform a user that an MSR was not handled. | |
7127 | ||
c32b1b89 CQ |
7128 | 7.22 KVM_CAP_X86_BUS_LOCK_EXIT |
7129 | ------------------------------- | |
7130 | ||
7131 | :Architectures: x86 | |
7132 | :Target: VM | |
7133 | :Parameters: args[0] defines the policy used when bus locks are detected in the guest | |
7134 | :Returns: 0 on success, -EINVAL when args[0] contains invalid bits | |
7135 | ||
7136 | Valid bits in args[0] are:: | |
7137 | ||
7138 | #define KVM_BUS_LOCK_DETECTION_OFF (1 << 0) | |
7139 | #define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1) | |
7140 | ||
7141 | Enabling this capability on a VM provides userspace with a way to select | |
7142 | a policy to handle bus locks detected in the guest. Userspace can obtain | |
7143 | the supported modes from the result of KVM_CHECK_EXTENSION and select one | |
7144 | through KVM_ENABLE_CAP. | |
7145 | ||
7146 | KVM_BUS_LOCK_DETECTION_OFF and KVM_BUS_LOCK_DETECTION_EXIT are supported | |
7147 | currently and mutually exclusive with each other. More bits can be added in | |
7148 | the future. | |
7149 | ||
7150 | With KVM_BUS_LOCK_DETECTION_OFF set, bus locks in the guest will not cause VM exits, | |
7151 | so no additional actions are needed. This is the default mode. | |
7152 | | |
7153 | With KVM_BUS_LOCK_DETECTION_EXIT set, VM exits happen when a bus lock is detected | |
7154 | in the VM. KVM just exits to userspace when handling them. Userspace can enforce | |
7155 | its own throttling or other policy-based mitigations. | |
7156 | ||
7157 | This capability is aimed at addressing the threat that a VM can exploit bus locks to | |
7158 | degrade the performance of the whole system. Once userspace enables this | |
7159 | capability and selects the KVM_BUS_LOCK_DETECTION_EXIT mode, KVM will set the | |
7160 | KVM_RUN_BUS_LOCK flag in the vcpu->run->flags field and exit to userspace. Because | |
7161 | the bus lock VM exit can be preempted by a higher priority VM exit, the exit | |
7162 | notification to userspace can be KVM_EXIT_BUS_LOCK or another reason. The | |
7163 | KVM_RUN_BUS_LOCK flag is used to distinguish between them. | |
7164 | ||
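Because the two modes are mutually exclusive, userspace may want to validate its choice against the KVM_CHECK_EXTENSION result before calling KVM_ENABLE_CAP. The following is a minimal sketch under that assumption; ``bus_lock_policy_valid`` and ``bus_lock_pick_policy`` are hypothetical helper names, and whether the kernel accepts an args[0] of zero is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the list of valid bits above. */
#define KVM_BUS_LOCK_DETECTION_OFF  (1 << 0)
#define KVM_BUS_LOCK_DETECTION_EXIT (1 << 1)

/*
 * Hypothetical helper: reject unsupported bits and reject the
 * combination OFF | EXIT, which the text describes as mutually
 * exclusive. 'supported' is the KVM_CHECK_EXTENSION result.
 */
bool bus_lock_policy_valid(uint64_t args0, uint64_t supported)
{
	const uint64_t both = KVM_BUS_LOCK_DETECTION_OFF |
			      KVM_BUS_LOCK_DETECTION_EXIT;

	if (args0 & ~supported)
		return false;
	if ((args0 & both) == both)
		return false;
	return true;
}

/*
 * Hypothetical helper: choose EXIT when it is wanted and offered,
 * otherwise fall back to OFF if offered, otherwise return 0.
 */
uint64_t bus_lock_pick_policy(uint64_t supported, bool want_exit)
{
	uint64_t policy = 0;

	if (want_exit && (supported & KVM_BUS_LOCK_DETECTION_EXIT))
		policy = KVM_BUS_LOCK_DETECTION_EXIT;
	else if (supported & KVM_BUS_LOCK_DETECTION_OFF)
		policy = KVM_BUS_LOCK_DETECTION_OFF;

	assert(bus_lock_policy_valid(policy, supported));
	return policy;
}
```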
7d2cdad0 | 7165 | 7.23 KVM_CAP_PPC_DAWR1 |
d9a47eda RB |
7166 | ---------------------- |
7167 | ||
7168 | :Architectures: ppc | |
7169 | :Parameters: none | |
7170 | :Returns: 0 on success, -EINVAL when CPU doesn't support 2nd DAWR | |
7171 | ||
7172 | This capability can be used to check for / enable the 2nd DAWR feature provided | |
7173 | by the POWER10 processor. | |
7174 | ||
19238e75 | 7175 | |
54526d1f NT |
7176 | 7.24 KVM_CAP_VM_COPY_ENC_CONTEXT_FROM |
7177 | ------------------------------------- | |
7178 | ||
7179 | :Architectures: x86 SEV enabled | |
7180 | :Type: vm | |
7181 | :Parameters: args[0] is the fd of the source vm | |
7182 | :Returns: 0 on success; ENOTTY on error | |
7183 | ||
7184 | This capability enables userspace to copy encryption context from the vm | |
7185 | indicated by the fd to the vm this is called on. | |
7186 | ||
7187 | This is intended to support in-guest workloads scheduled by the host. This | |
7188 | allows the in-guest workload to maintain its own NPTs and keeps the two vms | |
7189 | from accidentally clobbering each other with interrupts and the like (separate | |
7190 | APIC/MSRs/etc). | |
7191 | ||
fe7e9488 | 7192 | 7.25 KVM_CAP_SGX_ATTRIBUTE |
f82762fb | 7193 | -------------------------- |
fe7e9488 SC |
7194 | |
7195 | :Architectures: x86 | |
7196 | :Target: VM | |
7197 | :Parameters: args[0] is a file handle of a SGX attribute file in securityfs | |
7198 | :Returns: 0 on success, -EINVAL if the file handle is invalid or if a requested | |
7199 | attribute is not supported by KVM. | |
7200 | ||
7201 | KVM_CAP_SGX_ATTRIBUTE enables a userspace VMM to grant a VM access to one or | |
7202 | more privileged enclave attributes. args[0] must hold a file handle to a valid | |
7203 | SGX attribute file corresponding to an attribute that is supported/restricted | |
7204 | by KVM (currently only PROVISIONKEY). | |
7205 | ||
7206 | The SGX subsystem restricts access to a subset of enclave attributes to provide | |
7207 | additional security for an uncompromised kernel, e.g. use of the PROVISIONKEY | |
7208 | is restricted to deter malware from using the PROVISIONKEY to obtain a stable | |
7209 | system fingerprint. To prevent userspace from circumventing such restrictions | |
7210 | by running an enclave in a VM, KVM prevents access to privileged attributes by | |
7211 | default. | |
7212 | ||
0a5fab9f | 7213 | See Documentation/x86/sgx.rst for more details. |
fe7e9488 | 7214 | |
b87cc116 BR |
7215 | 7.26 KVM_CAP_PPC_RPT_INVALIDATE |
7216 | ------------------------------- | |
7217 | ||
7218 | :Capability: KVM_CAP_PPC_RPT_INVALIDATE | |
7219 | :Architectures: ppc | |
7220 | :Type: vm | |
7221 | ||
7222 | This capability indicates that the kernel is capable of handling the | |
7223 | H_RPT_INVALIDATE hcall. | |
7224 | ||
7225 | In order to enable the use of H_RPT_INVALIDATE in the guest, | |
7226 | user space might have to advertise it for the guest. For example, an | |
7227 | IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is | |
7228 | present in the "ibm,hypertas-functions" device-tree property. | |
7229 | ||
7230 | This capability is enabled for hypervisors on platforms like POWER9 | |
7231 | that support radix MMU. | |
7232 | ||
19238e75 AL |
7233 | 7.27 KVM_CAP_EXIT_ON_EMULATION_FAILURE |
7234 | -------------------------------------- | |
7235 | ||
7236 | :Architectures: x86 | |
7237 | :Parameters: args[0] whether the feature should be enabled or not | |
7238 | ||
7239 | When this capability is enabled, an emulation failure will result in an exit | |
7240 | to userspace with KVM_INTERNAL_ERROR (except when the emulator was invoked | |
7241 | to handle a VMware backdoor instruction). Furthermore, KVM will now provide up | |
7242 | to 15 instruction bytes for any exit to userspace resulting from an emulation | |
7243 | failure. When these exits to userspace occur, use the emulation_failure struct | |
7244 | instead of the internal struct. They both have the same layout, but the | |
7245 | emulation_failure struct matches the content better. It also explicitly | |
7246 | defines the 'flags' field which is used to describe the fields in the struct | |
7247 | that are valid (i.e. if KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES is | |
7248 | set in the 'flags' field then both 'insn_size' and 'insn_bytes' have valid data | |
7249 | in them.) | |
7250 | ||
b8917b4a | 7251 | 7.28 KVM_CAP_ARM_MTE |
04c02c20 SP |
7252 | -------------------- |
7253 | ||
7254 | :Architectures: arm64 | |
7255 | :Parameters: none | |
7256 | ||
7257 | This capability indicates that KVM (and the hardware) supports exposing the | |
7258 | Memory Tagging Extensions (MTE) to the guest. It must also be enabled by the | |
7259 | VMM before creating any VCPUs to allow the guest access. Note that MTE is only | |
7260 | available to a guest running in AArch64 mode and enabling this capability will | |
7261 | cause attempts to create AArch32 VCPUs to fail. | |
7262 | ||
7263 | When enabled the guest is able to access tags associated with any memory given | |
7264 | to the guest. KVM will ensure that the tags are maintained during swap or | |
7265 | hibernation of the host; however the VMM needs to manually save/restore the | |
7266 | tags as appropriate if the VM is migrated. | |
7267 | ||
7268 | When this capability is enabled all memory in memslots must be mapped as | |
7269 | not-shareable (no MAP_SHARED); attempts to create a memslot with a | |
7270 | MAP_SHARED mmap will result in an -EINVAL return. | |
7271 | ||
7272 | When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to | |
7273 | perform a bulk copy of tags to/from the guest. | |
19238e75 | 7274 | |
b5663931 PG |
7275 | 7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM |
7276 | ------------------------------------- | |
7277 | ||
7278 | :Architectures: x86 SEV enabled | |
7279 | :Type: vm | |
7280 | :Parameters: args[0] is the fd of the source vm | |
7281 | :Returns: 0 on success | |
7282 | ||
7283 | This capability enables userspace to migrate the encryption context from the VM | |
7284 | indicated by the fd to the VM this is called on. | |
7285 | ||
7286 | This is intended to support intra-host migration of VMs between userspace VMMs, | |
7287 | upgrading the VMM process without interrupting the guest. | |
7288 | ||
93b71801 NP |
7289 | 7.30 KVM_CAP_PPC_AIL_MODE_3 |
7290 | ------------------------------- | |
7291 | ||
7292 | :Capability: KVM_CAP_PPC_AIL_MODE_3 | |
7293 | :Architectures: ppc | |
7294 | :Type: vm | |
7295 | ||
7296 | This capability indicates that the kernel supports the mode 3 setting for the | |
7297 | "Address Translation Mode on Interrupt" aka "Alternate Interrupt Location" | |
7298 | resource that is controlled with the H_SET_MODE hypercall. | |
7299 | ||
7300 | This capability allows a guest kernel to use a better-performance mode for | |
7301 | handling interrupts and system calls. | |
7302 | ||
6d849191 OU |
7303 | 7.31 KVM_CAP_DISABLE_QUIRKS2 |
7304 | ---------------------------- | |
7305 | ||
7306 | :Capability: KVM_CAP_DISABLE_QUIRKS2 | |
7307 | :Parameters: args[0] - set of KVM quirks to disable | |
7308 | :Architectures: x86 | |
7309 | :Type: vm | |
7310 | ||
7311 | This capability, if enabled, will cause KVM to disable some behavior | |
7312 | quirks. | |
7313 | ||
7314 | Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of | |
7315 | quirks that can be disabled in KVM. | |
7316 | ||
7317 | The argument to KVM_ENABLE_CAP for this capability is a bitmask of | |
7318 | quirks to disable, and must be a subset of the bitmask returned by | |
7319 | KVM_CHECK_EXTENSION. | |
7320 | ||
7321 | The valid bits in cap.args[0] are: | |
7322 | ||
7323 | =================================== ============================================ | |
7324 | KVM_X86_QUIRK_LINT0_REENABLED By default, the reset value for the LVT | |
7325 | LINT0 register is 0x700 (APIC_MODE_EXTINT). | |
7326 | When this quirk is disabled, the reset value | |
7327 | is 0x10000 (APIC_LVT_MASKED). | |
7328 | ||
7329 | KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW. | |
7330 | When this quirk is disabled, KVM does not | |
7331 | change the value of CR0.CD and CR0.NW. | |
7332 | ||
7333 | KVM_X86_QUIRK_LAPIC_MMIO_HOLE By default, the MMIO LAPIC interface is | |
7334 | available even when configured for x2APIC | |
7335 | mode. When this quirk is disabled, KVM | |
7336 | disables the MMIO LAPIC interface if the | |
7337 | LAPIC is in x2APIC mode. | |
7338 | ||
7339 | KVM_X86_QUIRK_OUT_7E_INC_RIP By default, KVM pre-increments %rip before | |
7340 | exiting to userspace for an OUT instruction | |
7341 | to port 0x7e. When this quirk is disabled, | |
7342 | KVM does not pre-increment %rip before | |
7343 | exiting to userspace. | |
7344 | ||
7345 | KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT When this quirk is disabled, KVM sets | |
7346 | CPUID.01H:ECX[bit 3] (MONITOR/MWAIT) if | |
7347 | IA32_MISC_ENABLE[bit 18] (MWAIT) is set. | |
7348 | Additionally, when this quirk is disabled, | |
7349 | KVM clears CPUID.01H:ECX[bit 3] if | |
7350 | IA32_MISC_ENABLE[bit 18] is cleared. | |
f1a9761f OU |
7351 | |
7352 | KVM_X86_QUIRK_FIX_HYPERCALL_INSN By default, KVM rewrites guest | |
7353 | VMMCALL/VMCALL instructions to match the | |
7354 | vendor's hypercall instruction for the | |
7355 | system. When this quirk is disabled, KVM | |
7356 | will no longer rewrite invalid guest | |
7357 | hypercall instructions. Executing the | |
7358 | incorrect hypercall instruction will | |
7359 | generate a #UD within the guest. | |
6d849191 OU |
7360 | =================================== ============================================ |
7361 | ||
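The subset requirement on cap.args[0] reduces to a single mask test. The sketch below uses hypothetical helper names and deliberately does not hard-code any quirk bit values, since this section does not list them; both arguments are plain bitmasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if every quirk in 'requested' is also in
 * 'supported', where 'supported' is the bitmask returned by
 * KVM_CHECK_EXTENSION for KVM_CAP_DISABLE_QUIRKS2.
 */
bool quirks_request_valid(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) == 0;
}

/*
 * Hypothetical helper: drop any wanted quirk bits the host cannot
 * disable, yielding a value that is safe to pass as args[0].
 */
uint64_t quirks_clamp(uint64_t wanted, uint64_t supported)
{
	uint64_t requested = wanted & supported;

	assert(quirks_request_valid(requested, supported));
	return requested;
}
```

Clamping silently hides the fact that a wanted quirk could not be disabled, so a VMM that depends on a specific quirk being off would likely prefer to compare the clamped value with the wanted one and fail early.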
e928e9cb | 7362 | 8. Other capabilities. |
106ee47d | 7363 | ====================== |
e928e9cb ME |
7364 | |
7365 | This section lists capabilities that give information about other | |
7366 | features of the KVM implementation. | |
7367 | ||
7368 | 8.1 KVM_CAP_PPC_HWRNG | |
106ee47d | 7369 | --------------------- |
e928e9cb | 7370 | |
106ee47d | 7371 | :Architectures: ppc |
e928e9cb ME |
7372 | |
7373 | This capability, if KVM_CHECK_EXTENSION indicates that it is | |
3747c5d3 | 7374 | available, means that the kernel has an implementation of the |
e928e9cb ME |
7375 | H_RANDOM hypercall backed by a hardware random-number generator. |
7376 | If present, the kernel H_RANDOM handler can be enabled for guest use | |
7377 | with the KVM_CAP_PPC_ENABLE_HCALL capability. | |
5c919412 AS |
7378 | |
7379 | 8.2 KVM_CAP_HYPERV_SYNIC | |
106ee47d MCC |
7380 | ------------------------ |
7381 | ||
7382 | :Architectures: x86 | |
5c919412 | 7383 | |
5c919412 | 7384 | This capability, if KVM_CHECK_EXTENSION indicates that it is |
3747c5d3 | 7385 | available, means that the kernel has an implementation of the |
5c919412 AS |
7386 | Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is |
7387 | used to support Windows Hyper-V based guest paravirt drivers (VMBus). | |
7388 | ||
7389 | In order to use SynIC, it has to be activated by setting this | |
7390 | capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this | |
7391 | will disable the use of APIC hardware virtualization even if supported | |
7392 | by the CPU, as it's incompatible with SynIC auto-EOI behavior. | |
c9270132 PM |
7393 | |
7394 | 8.3 KVM_CAP_PPC_RADIX_MMU | |
106ee47d | 7395 | ------------------------- |
c9270132 | 7396 | |
106ee47d | 7397 | :Architectures: ppc |
c9270132 PM |
7398 | |
7399 | This capability, if KVM_CHECK_EXTENSION indicates that it is | |
3747c5d3 | 7400 | available, means that the kernel can support guests using the |
c9270132 PM |
7401 | radix MMU defined in Power ISA V3.00 (as implemented in the POWER9 |
7402 | processor). | |
7403 | ||
7404 | 8.4 KVM_CAP_PPC_HASH_MMU_V3 | |
106ee47d | 7405 | --------------------------- |
c9270132 | 7406 | |
106ee47d | 7407 | :Architectures: ppc |
c9270132 PM |
7408 | |
7409 | This capability, if KVM_CHECK_EXTENSION indicates that it is | |
3747c5d3 | 7410 | available, means that the kernel can support guests using the |
c9270132 PM |
7411 | hashed page table MMU defined in Power ISA V3.00 (as implemented in |
7412 | the POWER9 processor), including in-memory segment tables. | |
a8a3c426 JH |
7413 | |
7414 | 8.5 KVM_CAP_MIPS_VZ | |
106ee47d | 7415 | ------------------- |
a8a3c426 | 7416 | |
106ee47d | 7417 | :Architectures: mips |
a8a3c426 JH |
7418 | |
7419 | This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that | |
7420 | it is available, means that full hardware assisted virtualization capabilities | |
7421 | of the hardware are available for use through KVM. An appropriate | |
7422 | KVM_VM_MIPS_* type must be passed to KVM_CREATE_VM to create a VM which | |
7423 | utilises it. | |
7424 | ||
7425 | If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is | |
7426 | available, it means that the VM is using full hardware assisted virtualization | |
7427 | capabilities of the hardware. This is useful to check after creating a VM with | |
7428 | KVM_VM_MIPS_DEFAULT. | |
7429 | ||
7430 | The value returned by KVM_CHECK_EXTENSION should be compared against known | |
7431 | values (see below). All other values are reserved. This is to allow for the | |
7432 | possibility of other hardware assisted virtualization implementations which | |
7433 | may be incompatible with the MIPS VZ ASE. | |
7434 | ||
106ee47d MCC |
7435 | == ========================================================================== |
7436 | 0 The trap & emulate implementation is in use to run guest code in user | |
a8a3c426 JH |
7437 | mode. Guest virtual memory segments are rearranged to fit the guest in the |
7438 | user mode address space. | |
7439 | ||
106ee47d | 7440 | 1 The MIPS VZ ASE is in use, providing full hardware assisted |
a8a3c426 | 7441 | virtualization, including standard guest virtual memory segments. |
106ee47d | 7442 | == ========================================================================== |
a8a3c426 JH |
7443 | |
7444 | 8.6 KVM_CAP_MIPS_TE | |
106ee47d | 7445 | ------------------- |
a8a3c426 | 7446 | |
106ee47d | 7447 | :Architectures: mips |
a8a3c426 JH |
7448 | |
7449 | This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that | |
7450 | it is available, means that the trap & emulate implementation is available to | |
7451 | run guest code in user mode, even if KVM_CAP_MIPS_VZ indicates that hardware | |
7452 | assisted virtualisation is also available. KVM_VM_MIPS_TE (0) must be passed | |
7453 | to KVM_CREATE_VM to create a VM which utilises it. | |
7454 | ||
7455 | If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is | |
7456 | available, it means that the VM is using trap & emulate. | |
578fd61d JH |
7457 | |
7458 | 8.7 KVM_CAP_MIPS_64BIT | |
106ee47d | 7459 | ---------------------- |
578fd61d | 7460 | |
106ee47d | 7461 | :Architectures: mips |
578fd61d JH |
7462 | |
7463 | This capability indicates the supported architecture type of the guest, i.e. the | |
7464 | supported register and address width. | |
7465 | ||
7466 | The values returned when this capability is checked by KVM_CHECK_EXTENSION on a | |
7467 | kvm VM handle correspond roughly to the CP0_Config.AT register field, and should | |
7468 | be checked specifically against known values (see below). All other values are | |
7469 | reserved. | |
7470 | ||
106ee47d MCC |
7471 | == ======================================================================== |
7472 | 0 MIPS32 or microMIPS32. | |
578fd61d JH |
7473 | Both registers and addresses are 32-bits wide. |
7474 | It will only be possible to run 32-bit guest code. | |
7475 | ||
106ee47d | 7476 | 1 MIPS64 or microMIPS64 with access only to 32-bit compatibility segments. |
578fd61d JH |
7477 | Registers are 64-bits wide, but addresses are 32-bits wide. |
7478 | 64-bit guest code may run but cannot access MIPS64 memory segments. | |
7479 | It will also be possible to run 32-bit guest code. | |
7480 | ||
106ee47d | 7481 | 2 MIPS64 or microMIPS64 with access to all address segments. |
578fd61d JH |
7482 | Both registers and addresses are 64-bits wide. |
7483 | It will be possible to run 64-bit or 32-bit guest code. | |
106ee47d | 7484 | == ======================================================================== |
668fffa3 | 7485 | |
c24a7be2 | 7486 | 8.9 KVM_CAP_ARM_USER_IRQ |
106ee47d MCC |
7487 | ------------------------ |
7488 | ||
3fbf4207 | 7489 | :Architectures: arm64 |
3fe17e68 | 7490 | |
3fe17e68 AG |
7491 | This capability, if KVM_CHECK_EXTENSION indicates that it is available, means |
7492 | that if userspace creates a VM without an in-kernel interrupt controller, it | |
7493 | will be notified of changes to the output level of in-kernel emulated devices, | |
7494 | which can generate virtual interrupts, presented to the VM. | |
7495 | For such VMs, on every return to userspace, the kernel | |
7496 | updates the vcpu's run->s.regs.device_irq_level field to represent the actual | |
7497 | output level of the device. | |
7498 | ||
7499 | Whenever kvm detects a change in the device output level, kvm guarantees at | |
7500 | least one return to userspace before running the VM. This exit could either | |
7501 | be a KVM_EXIT_INTR or any other exit event, like KVM_EXIT_MMIO. This way, | |
7502 | userspace can always sample the device output level and re-compute the state of | |
7503 | the userspace interrupt controller. Userspace should always check the state | |
7504 | of run->s.regs.device_irq_level on every kvm exit. | |
7505 | The value in run->s.regs.device_irq_level can represent both level and edge | |
7506 | triggered interrupt signals, depending on the device. Edge triggered interrupt | |
7507 | signals will exit to userspace with the bit in run->s.regs.device_irq_level | |
7508 | set exactly once per edge signal. | |
7509 | ||
7510 | The field run->s.regs.device_irq_level is available independent of | |
7511 | run->kvm_valid_regs or run->kvm_dirty_regs bits. | |
7512 | ||
7513 | If KVM_CAP_ARM_USER_IRQ is supported, the KVM_CHECK_EXTENSION ioctl returns a | |
7514 | number larger than 0 indicating which version of this capability is implemented | |
3747c5d3 | 7515 | and thereby which bits in run->s.regs.device_irq_level can signal values. |
3fe17e68 | 7516 | |
106ee47d | 7517 | Currently the following bits are defined for the device_irq_level bitmap:: |
3fe17e68 AG |
7518 | |
7519 | KVM_CAP_ARM_USER_IRQ >= 1: | |
7520 | ||
7521 | KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer | |
7522 | KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer | |
7523 | KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal | |
7524 | ||
7525 | Future versions of kvm may implement additional events. These will get | |
7526 | indicated by returning a higher number from KVM_CHECK_EXTENSION and will be | |
7527 | listed above. | |
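
As a rough illustration of how userspace might interpret the bitmap after each exit, the sketch below decodes a sampled ``device_irq_level`` value according to the capability version. The numeric bit values and the ``user_irq_lines`` helper type are assumptions made for this example; real code should take the definitions from the arm64 uapi headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the uapi bit definitions. The numeric values are an
 * assumption of this sketch; real code should take them from <asm/kvm.h>. */
#define KVM_ARM_DEV_EL1_VTIMER (1u << 0)
#define KVM_ARM_DEV_EL1_PTIMER (1u << 1)
#define KVM_ARM_DEV_PMU        (1u << 2)

/* Hypothetical helper type: one flag per line defined for version >= 1. */
struct user_irq_lines {
    bool vtimer;
    bool ptimer;
    bool pmu;
};

/* cap_version is the KVM_CHECK_EXTENSION(KVM_CAP_ARM_USER_IRQ) result and
 * level is the value sampled from run->s.regs.device_irq_level. */
static struct user_irq_lines decode_device_irq_level(int cap_version,
                                                     uint64_t level)
{
    struct user_irq_lines lines = { false, false, false };

    /* Only trust bits that the reported version defines. */
    if (cap_version >= 1) {
        lines.vtimer = (level & KVM_ARM_DEV_EL1_VTIMER) != 0;
        lines.ptimer = (level & KVM_ARM_DEV_EL1_PTIMER) != 0;
        lines.pmu    = (level & KVM_ARM_DEV_PMU) != 0;
    }
    return lines;
}

/* Usage sketch: virtual timer and PMU lines high, physical timer low. */
static void example_decode(void)
{
    struct user_irq_lines l =
        decode_device_irq_level(1, KVM_ARM_DEV_EL1_VTIMER | KVM_ARM_DEV_PMU);

    assert(l.vtimer && !l.ptimer && l.pmu);
}
```

Gating on the version keeps a VMM from acting on bits that an older kernel never defined.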
2ed4f9dd PM |
7528 | |
7529 | 8.10 KVM_CAP_PPC_SMT_POSSIBLE | |
106ee47d | 7530 | ----------------------------- |
2ed4f9dd | 7531 | |
106ee47d | 7532 | :Architectures: ppc |
2ed4f9dd PM |
7533 | |
7534 | Querying this capability returns a bitmap indicating the possible | |
7535 | virtual SMT modes that can be set using KVM_CAP_PPC_SMT. If bit N | |
7536 | (counting from the right) is set, then a virtual SMT mode of 2^N is | |
7537 | available. | |
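
A minimal sketch of the bitmap rule above: because bit N stands for mode 2^N, a power-of-two mode value doubles as its own bit mask. The function name is hypothetical and only illustrates the encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* possible is the bitmap returned for KVM_CAP_PPC_SMT_POSSIBLE.
 * Bit N stands for mode 2^N, so a power-of-two mode is its own mask. */
static bool smt_mode_possible(unsigned int possible, unsigned int mode)
{
    bool is_pow2 = mode != 0 && (mode & (mode - 1)) == 0;

    return is_pow2 && (possible & mode) != 0;
}

/* Usage sketch with a made-up bitmap. */
static void example_smt(void)
{
    unsigned int possible = 0x0b; /* bits 0, 1 and 3: modes 1, 2 and 8 */

    assert(smt_mode_possible(possible, 1));
    assert(smt_mode_possible(possible, 2));
    assert(!smt_mode_possible(possible, 4));
    assert(smt_mode_possible(possible, 8));
    assert(!smt_mode_possible(possible, 3)); /* not a power of two */
}
```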
efc479e6 RK |
7538 | |
7539 | 8.11 KVM_CAP_HYPERV_SYNIC2 | |
106ee47d | 7540 | -------------------------- |
efc479e6 | 7541 | |
106ee47d | 7542 | :Architectures: x86 |
efc479e6 RK |
7543 | |
7544 | This capability enables a newer version of Hyper-V Synthetic interrupt | |
7545 | controller (SynIC). The only difference with KVM_CAP_HYPERV_SYNIC is that KVM | |
7546 | doesn't clear SynIC message and event flags pages when they are enabled by | |
7547 | writing to the respective MSRs. | |
d3457c87 RK |
7548 | |
7549 | 8.12 KVM_CAP_HYPERV_VP_INDEX | |
106ee47d | 7550 | ---------------------------- |
d3457c87 | 7551 | |
106ee47d | 7552 | :Architectures: x86 |
d3457c87 RK |
7553 | |
7554 | This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr. Its | |
7555 | value is used to denote the target vcpu for a SynIC interrupt. For | |
7556 | compatibility, KVM initializes this msr to KVM's internal vcpu index. When this | |
7557 | capability is absent, userspace can still query this msr's value. | |
da9a1446 CB |
7558 | |
7559 | 8.13 KVM_CAP_S390_AIS_MIGRATION | |
106ee47d | 7560 | ------------------------------- |
da9a1446 | 7561 | |
106ee47d MCC |
7562 | :Architectures: s390 |
7563 | :Parameters: none | |
da9a1446 CB |
7564 | |
7565 | This capability indicates if the flic device will be able to get/set the | |
7566 | AIS states for migration via the KVM_DEV_FLIC_AISM_ALL attribute, and allows | |
7567 | userspace to discover this without having to create a flic device. | |
5c2b4d5b CB |
7568 | |
7569 | 8.14 KVM_CAP_S390_PSW | |
106ee47d | 7570 | --------------------- |
5c2b4d5b | 7571 | |
106ee47d | 7572 | :Architectures: s390 |
5c2b4d5b CB |
7573 | |
7574 | This capability indicates that the PSW is exposed via the kvm_run structure. | |
7575 | ||
7576 | 8.15 KVM_CAP_S390_GMAP | |
106ee47d | 7577 | ---------------------- |
5c2b4d5b | 7578 | |
106ee47d | 7579 | :Architectures: s390 |
5c2b4d5b CB |
7580 | |
7581 | This capability indicates that the user space memory used as guest mapping can | |
7582 | be anywhere in the user memory address space, as long as the memory slots are | |
7583 | aligned and sized to a segment (1MB) boundary. | |
7584 | ||
7585 | 8.16 KVM_CAP_S390_COW | |
106ee47d | 7586 | --------------------- |
5c2b4d5b | 7587 | |
106ee47d | 7588 | :Architectures: s390 |
5c2b4d5b CB |
7589 | |
7590 | This capability indicates that the user space memory used as guest mapping can | |
7591 | use copy-on-write semantics as well as dirty pages tracking via read-only page | |
7592 | tables. | |
7593 | ||
7594 | 8.17 KVM_CAP_S390_BPB | |
106ee47d | 7595 | --------------------- |
5c2b4d5b | 7596 | |
106ee47d | 7597 | :Architectures: s390 |
5c2b4d5b CB |
7598 | |
7599 | This capability indicates that kvm will implement the interfaces to handle | |
7600 | reset, migration and nested KVM for branch prediction blocking. The stfle | |
7601 | facility 82 should not be provided to the guest without this capability. | |
c1aea919 | 7602 | |
2ddc6498 | 7603 | 8.18 KVM_CAP_HYPERV_TLBFLUSH |
106ee47d | 7604 | ---------------------------- |
c1aea919 | 7605 | |
106ee47d | 7606 | :Architectures: x86 |
c1aea919 VK |
7607 | |
7608 | This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush | |
7609 | hypercalls: | |
7610 | HvFlushVirtualAddressSpace, HvFlushVirtualAddressSpaceEx, | |
7611 | HvFlushVirtualAddressList, HvFlushVirtualAddressListEx. | |
be26b3a7 | 7612 | |
688e0581 | 7613 | 8.19 KVM_CAP_ARM_INJECT_SERROR_ESR |
106ee47d | 7614 | ---------------------------------- |
be26b3a7 | 7615 | |
3fbf4207 | 7616 | :Architectures: arm64 |
be26b3a7 DG |
7617 | |
7618 | This capability indicates that userspace can specify (via the | |
7619 | KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it | |
7620 | takes a virtual SError interrupt exception. | |
7621 | If KVM advertises this capability, userspace can only specify the ISS field for | |
7622 | the ESR syndrome. Other parts of the ESR, such as the EC, are generated by the | |
7623 | CPU when the exception is taken. If this virtual SError is taken to EL1 using | |
7624 | AArch64, this value will be reported in the ISS field of ESR_ELx. | |
7625 | ||
7626 | See KVM_CAP_VCPU_EVENTS for more details. | |
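
Since only the ISS field can be supplied, a VMM may want to check a syndrome value before passing it on. The sketch below assumes the architectural layout in which ISS occupies bits [24:0] of ESR_ELx; the macro and function names are local to this example and are not part of the KVM API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ISS occupies bits [24:0] of ESR_ELx; IL and EC sit above it.
 * ESR_ISS_MASK is a name local to this sketch. */
#define ESR_ISS_MASK UINT64_C(0x01ffffff)

/* Returns true when the value carries nothing outside the ISS field. */
static bool serror_syndrome_is_iss_only(uint64_t esr)
{
    return (esr & ~ESR_ISS_MASK) == 0;
}

/* Usage sketch: an ISS-only value passes, a value with EC bits does not. */
static void example_serror(void)
{
    assert(serror_syndrome_is_iss_only(UINT64_C(0x1)));
    assert(!serror_syndrome_is_iss_only(UINT64_C(0x2f) << 26));
}
```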
106ee47d | 7627 | |
214ff83d | 7628 | 8.20 KVM_CAP_HYPERV_SEND_IPI |
106ee47d | 7629 | ---------------------------- |
214ff83d | 7630 | |
106ee47d | 7631 | :Architectures: x86 |
214ff83d VK |
7632 | |
7633 | This capability indicates that KVM supports paravirtualized Hyper-V IPI send | |
7634 | hypercalls: | |
7635 | HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx. | |
106ee47d | 7636 | |
344c6c80 | 7637 | 8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH |
106ee47d | 7638 | ----------------------------------- |
344c6c80 | 7639 | |
739c7af7 | 7640 | :Architectures: x86 |
344c6c80 TL |
7641 | |
7642 | This capability indicates that KVM running on top of Hyper-V hypervisor | |
7643 | enables Direct TLB flush for its guests meaning that TLB flush | |
7644 | hypercalls are handled by Level 0 hypervisor (Hyper-V) bypassing KVM. | |
7645 | Due to the different ABI for hypercall parameters between Hyper-V and | |
7646 | KVM, enabling this capability effectively disables all hypercall | |
7647 | handling by KVM (as some KVM hypercalls may be mistakenly treated as TLB | |
7648 | flush hypercalls by Hyper-V), so userspace should disable KVM identification | |
7649 | in CPUID and only expose Hyper-V identification. In this case, the guest | |
7650 | thinks it's running on Hyper-V and only uses Hyper-V hypercalls. | |
7de3f142 JF |
7651 | |
7652 | 8.22 KVM_CAP_S390_VCPU_RESETS | |
739c7af7 | 7653 | ----------------------------- |
7de3f142 | 7654 | |
739c7af7 | 7655 | :Architectures: s390 |
7de3f142 JF |
7656 | |
7657 | This capability indicates that the KVM_S390_NORMAL_RESET and | |
7658 | KVM_S390_CLEAR_RESET ioctls are available. | |
04ed89dc JF |
7659 | |
7660 | 8.23 KVM_CAP_S390_PROTECTED | |
739c7af7 | 7661 | --------------------------- |
04ed89dc | 7662 | |
739c7af7 | 7663 | :Architectures: s390 |
04ed89dc JF |
7664 | |
7665 | This capability indicates that the Ultravisor has been initialized and | |
7666 | KVM can therefore start protected VMs. | |
7667 | This capability governs the KVM_S390_PV_COMMAND ioctl and the | |
7668 | KVM_MP_STATE_LOAD MP_STATE. KVM_SET_MP_STATE can fail for protected | |
7669 | guests when the state change is invalid. | |
004a0124 AJ |
7670 | |
7671 | 8.24 KVM_CAP_STEAL_TIME | |
7672 | ----------------------- | |
7673 | ||
7674 | :Architectures: arm64, x86 | |
7675 | ||
7676 | This capability indicates that KVM supports steal time accounting. | |
7677 | When steal time accounting is supported it may be enabled with | |
7678 | architecture-specific interfaces. This capability and the architecture- | |
7679 | specific interfaces must be consistent, i.e. if one says the feature | |
7680 | is supported, then the other should as well and vice versa. For arm64 | |
7681 | see Documentation/virt/kvm/devices/vcpu.rst "KVM_ARM_VCPU_PVTIME_CTRL". | |
7682 | For x86 see Documentation/virt/kvm/msr.rst "MSR_KVM_STEAL_TIME". | |
f20d4e92 CW |
7683 | |
7684 | 8.25 KVM_CAP_S390_DIAG318 | |
7685 | ------------------------- | |
7686 | ||
7687 | :Architectures: s390 | |
7688 | ||
7689 | This capability enables a guest to set information about its control program | |
7690 | (i.e. guest kernel type and version). The information is helpful during | |
7691 | system/firmware service events, providing additional data about the guest | |
7692 | environments running on the machine. | |
7693 | ||
7694 | The information is associated with the DIAGNOSE 0x318 instruction, which sets | |
7695 | an 8-byte value consisting of a one-byte Control Program Name Code (CPNC) and | |
7696 | a 7-byte Control Program Version Code (CPVC). The CPNC determines what | |
7697 | environment the control program is running in (e.g. Linux, z/VM...), and the | |
7698 | CPVC is used for information specific to the OS (e.g. Linux version, Linux | |
7699 | distribution...). | |
7700 | ||
7701 | If this capability is available, then the CPNC and CPVC can be synchronized | |
7702 | between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318). | |
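
The 8-byte value can be pictured as one byte of CPNC followed by seven bytes of CPVC. The sketch below packs and unpacks such a value, assuming the CPNC sits in the most significant byte; that placement and all helper names are assumptions of this example and should be checked against the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Low 56 bits hold the CPVC in this sketch (assumed layout). */
#define DIAG318_CPVC_MASK UINT64_C(0x00ffffffffffffff)

/* Combine a one-byte CPNC and a 7-byte CPVC into the 8-byte value. */
static uint64_t diag318_pack(uint8_t cpnc, uint64_t cpvc)
{
    assert((cpvc & ~DIAG318_CPVC_MASK) == 0); /* CPVC must fit in 7 bytes */
    return ((uint64_t)cpnc << 56) | cpvc;
}

/* Extract the Control Program Name Code. */
static uint8_t diag318_cpnc(uint64_t val)
{
    return (uint8_t)(val >> 56);
}

/* Extract the Control Program Version Code. */
static uint64_t diag318_cpvc(uint64_t val)
{
    return val & DIAG318_CPVC_MASK;
}
```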
1ae09954 AG |
7703 | |
7704 | 8.26 KVM_CAP_X86_USER_SPACE_MSR | |
7705 | ------------------------------- | |
7706 | ||
7707 | :Architectures: x86 | |
7708 | ||
7709 | This capability indicates that KVM supports deflection of MSR reads and | |
7710 | writes to user space. It can be enabled on a VM level. If enabled, MSR | |
7711 | accesses that would usually trigger a #GP by KVM into the guest will | |
7712 | instead get bounced to user space through the KVM_EXIT_X86_RDMSR and | |
7713 | KVM_EXIT_X86_WRMSR exit notifications. | |
1a155254 | 7714 | |
46a63924 | 7715 | 8.27 KVM_CAP_X86_MSR_FILTER |
1a155254 AG |
7716 | --------------------------- |
7717 | ||
7718 | :Architectures: x86 | |
7719 | ||
7720 | This capability indicates that KVM supports rejecting accesses to user-defined | |
7721 | MSRs. With this capability exposed, KVM exports the new VM ioctl | |
7722 | KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR | |
7723 | ranges that KVM should reject access to. | |
7724 | ||
7725 | In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to | |
7726 | trap and emulate MSRs that are outside of the scope of KVM as well as | |
7727 | limit the attack surface on KVM's MSR emulation code. | |
66570e96 | 7728 | |
0e691ee7 | 7729 | 8.28 KVM_CAP_ENFORCE_PV_FEATURE_CPUID |
e2e83a73 | 7730 | ------------------------------------- |
66570e96 OU |
7731 | |
7732 | :Architectures: x86 | |
7733 | ||
7734 | When enabled, KVM will disable paravirtual features provided to the | |
7735 | guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf | |
7736 | (0x40000001). Otherwise, a guest may use the paravirtual features | |
7737 | regardless of what has actually been exposed through the CPUID leaf. | |
fb04a1ed | 7738 | |
fb04a1ed PX |
7739 | 8.29 KVM_CAP_DIRTY_LOG_RING |
7740 | --------------------------- | |
7741 | ||
7742 | :Architectures: x86 | |
7743 | :Parameters: args[0] - size of the dirty log ring | |
7744 | ||
7745 | KVM is capable of tracking dirty memory using ring buffers that are | |
7746 | mmaped into userspace; there is one dirty ring per vcpu. | |
7747 | ||
7748 | The dirty ring is available to userspace as an array of | |
7749 | ``struct kvm_dirty_gfn``. Each dirty entry is defined as:: | |
7750 | ||
7751 | struct kvm_dirty_gfn { | |
7752 | __u32 flags; | |
7753 | __u32 slot; /* as_id | slot_id */ | |
7754 | __u64 offset; | |
7755 | }; | |
7756 | ||
7757 | The following values are defined for the flags field to define the | |
7758 | current state of the entry:: | |
7759 | ||
7760 | #define KVM_DIRTY_GFN_F_DIRTY BIT(0) | |
7761 | #define KVM_DIRTY_GFN_F_RESET BIT(1) | |
7762 | #define KVM_DIRTY_GFN_F_MASK 0x3 | |
7763 | ||
7764 | Userspace should call KVM_ENABLE_CAP ioctl right after KVM_CREATE_VM | |
7765 | ioctl to enable this capability for the new guest and set the size of | |
7766 | the rings. Enabling the capability is only allowed before creating any | |
7767 | vCPU, and the size of the ring must be a power of two. The larger the | |
7768 | ring buffer, the less likely the ring is full and the VM is forced to | |
7769 | exit to userspace. The optimal size depends on the workload, but it is | |
7770 | recommended that it be at least 64 KiB (4096 entries). | |
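
The relationship between ring size and entry count can be checked with a few lines of C. This sketch assumes that the size passed in args[0] is given in bytes, as the 64 KiB figure suggests, and mirrors the entry layout shown above with fixed-width types; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the entry layout shown above, using <stdint.h> types. */
struct kvm_dirty_gfn {
    uint32_t flags;
    uint32_t slot;   /* as_id | slot_id */
    uint64_t offset;
};

/* A ring size is acceptable here if it is a power of two and holds at
 * least one entry. The kernel may apply further limits. */
static bool dirty_ring_size_ok(uint64_t bytes)
{
    return bytes >= sizeof(struct kvm_dirty_gfn) &&
           (bytes & (bytes - 1)) == 0;
}

/* Number of entries that fit in a ring of the given size. */
static uint64_t dirty_ring_entries(uint64_t bytes)
{
    assert(dirty_ring_size_ok(bytes));
    return bytes / sizeof(struct kvm_dirty_gfn);
}
```

With 16-byte entries, a 64 KiB ring holds 4096 entries, which matches the recommendation above.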
7771 | ||
7772 | Just like for dirty page bitmaps, the buffer tracks writes to | |
7773 | all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was | |
7774 | set in KVM_SET_USER_MEMORY_REGION. Once a memory region is registered | |
7775 | with the flag set, userspace can start harvesting dirty pages from the | |
7776 | ring buffer. | |
7777 | ||
7778 | An entry in the ring buffer can be unused (flag bits ``00``), | |
7779 | dirty (flag bits ``01``) or harvested (flag bits ``1X``). The | |
7780 | state machine for the entry is as follows:: | |
7781 | ||
7782 | dirtied harvested reset | |
7783 | 00 -----------> 01 -------------> 1X -------+ | |
7784 | ^ | | |
7785 | | | | |
7786 | +------------------------------------------+ | |
7787 | ||
7788 | To harvest the dirty pages, userspace accesses the mmaped ring buffer | |
7789 | to read the dirty GFNs. If the flags field has the DIRTY bit set (at this stage | |
7790 | the RESET bit must be cleared), then it means this GFN is a dirty GFN. | |
7791 | The userspace should harvest this GFN and mark the flags from state | |
7792 | ``01b`` to ``1Xb`` (bit 0 will be ignored by KVM, but bit 1 must be set | |
7793 | to show that this GFN is harvested and waiting for a reset), and move | |
7794 | on to the next GFN. The userspace should continue to do this until the | |
7795 | flags of a GFN have the DIRTY bit cleared, meaning that it has harvested | |
7796 | all the dirty GFNs that were available. | |
7797 | ||
7798 | It's not necessary for userspace to harvest all the dirty GFNs at once. | |
7799 | However it must collect the dirty GFNs in sequence, i.e., the userspace | |
7800 | program cannot skip one dirty GFN to collect the one next to it. | |
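
The harvesting rules above can be sketched as a loop over the ring. This example works on a plain in-memory array rather than the mmaped ring, so it leaves out the memory ordering (acquire on reading flags, release on writing them) that real code sharing the ring with the kernel would need. The function name and the ``next`` cursor are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KVM_DIRTY_GFN_F_DIRTY (1u << 0)
#define KVM_DIRTY_GFN_F_RESET (1u << 1)

/* Mirror of the entry layout shown above, using <stdint.h> types. */
struct kvm_dirty_gfn {
    uint32_t flags;
    uint32_t slot;   /* as_id | slot_id */
    uint64_t offset;
};

/* Collect dirty offsets in ring order, starting at *next, until an entry
 * without the DIRTY bit is found or out[] is full. Returns the count. */
static uint32_t harvest_dirty_ring(struct kvm_dirty_gfn *ring,
                                   uint32_t nr_entries, uint32_t *next,
                                   uint64_t *out, uint32_t max)
{
    uint32_t n = 0;

    /* The ring size must be a power of two, so masking wraps the index. */
    assert(nr_entries != 0 && (nr_entries & (nr_entries - 1)) == 0);

    while (n < max) {
        struct kvm_dirty_gfn *e = &ring[*next & (nr_entries - 1)];

        if (!(e->flags & KVM_DIRTY_GFN_F_DIRTY))
            break;                        /* nothing more to harvest */
        out[n++] = e->offset;
        e->flags = KVM_DIRTY_GFN_F_RESET; /* 01b -> 10b: awaiting reset */
        (*next)++;                        /* strictly in sequence */
    }
    return n;
}
```

After a call returns a non-zero count, userspace would issue KVM_RESET_DIRTY_RINGS so that the kernel can recycle the harvested entries.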
7801 | ||
7802 | After processing one or more entries in the ring buffer, userspace | |
7803 | calls the VM ioctl KVM_RESET_DIRTY_RINGS to notify the kernel about | |
7804 | it, so that the kernel will reprotect those collected GFNs. | |
7805 | Therefore, the ioctl must be called *before* reading the content of | |
7806 | the dirty pages. | |
7807 | ||
7808 | The dirty ring can get full. When it happens, the KVM_RUN of the | |
7809 | vcpu will return with exit reason KVM_EXIT_DIRTY_LOG_FULL. | |
7810 | ||
7811 | The dirty ring interface has a major difference compared to the | |
7812 | KVM_GET_DIRTY_LOG interface in that, when reading the dirty ring from | |
7813 | userspace, it's still possible that the kernel has not yet flushed the | |
7814 | processor's dirty page buffers into the kernel buffer (with dirty bitmaps, the | |
7815 | flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one | |
7816 | needs to kick the vcpu out of KVM_RUN using a signal. The resulting | |
7817 | vmexit ensures that all dirty GFNs are flushed to the dirty rings. | |
b2cc64c4 PX |
7818 | |
7819 | NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding | |
7820 | ioctl KVM_RESET_DIRTY_RINGS are mutually exclusive with the existing ioctls | |
7821 | KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling | |
7822 | KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual | |
7823 | machine will switch to ring-buffer dirty page tracking and further | |
7824 | KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail. | |
e1f68169 DW |
7825 | |
7826 | 8.30 KVM_CAP_XEN_HVM | |
7827 | -------------------- | |
7828 | ||
7829 | :Architectures: x86 | |
7830 | ||
7831 | This capability indicates the features that KVM supports for hosting Xen | |
7832 | PVHVM guests. Valid flags are:: | |
7833 | ||
7834 | #define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR (1 << 0) | |
7835 | #define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1 << 1) | |
7836 | #define KVM_XEN_HVM_CONFIG_SHARED_INFO (1 << 2) | |
661a20fa DW |
7837 | #define KVM_XEN_HVM_CONFIG_RUNSTATE (1 << 3) |
7838 | #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4) | |
7839 | #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5) | |
e1f68169 DW |
7840 | |
7841 | The KVM_XEN_HVM_CONFIG_HYPERCALL_MSR flag indicates that the KVM_XEN_HVM_CONFIG | |
7842 | ioctl is available, for the guest to set its hypercall page. | |
7843 | ||
7844 | If KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL is also set, the same flag may also be | |
7845 | provided in the flags to KVM_XEN_HVM_CONFIG, without providing hypercall page | |
7846 | contents, to request that KVM generate hypercall page content automatically | |
7847 | and also enable interception of guest hypercalls with KVM_EXIT_XEN. | |
7848 | ||
7849 | The KVM_XEN_HVM_CONFIG_SHARED_INFO flag indicates the availability of the | |
7850 | KVM_XEN_HVM_SET_ATTR, KVM_XEN_HVM_GET_ATTR, KVM_XEN_VCPU_SET_ATTR and | |
7851 | KVM_XEN_VCPU_GET_ATTR ioctls, as well as the delivery of exception vectors | |
7852 | for event channel upcalls when the evtchn_upcall_pending field of a vcpu's | |
7853 | vcpu_info is set. | |
30b5c851 DW |
7854 | |
7855 | The KVM_XEN_HVM_CONFIG_RUNSTATE flag indicates that the runstate-related | |
7856 | features KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR/_CURRENT/_DATA/_ADJUST are | |
7857 | supported by the KVM_XEN_VCPU_SET_ATTR/KVM_XEN_VCPU_GET_ATTR ioctls. | |
24e7475f | 7858 | |
14243b38 DW |
7859 | The KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL flag indicates that IRQ routing entries |
7860 | of the type KVM_IRQ_ROUTING_XEN_EVTCHN are supported, with the priority | |
7861 | field set to indicate 2 level event channel delivery. | |
7862 | ||
661a20fa DW |
7863 | The KVM_XEN_HVM_CONFIG_EVTCHN_SEND flag indicates that KVM supports |
7864 | injecting event channel events directly into the guest with the | |
7865 | KVM_XEN_HVM_EVTCHN_SEND ioctl. It also indicates support for the | |
7866 | KVM_XEN_ATTR_TYPE_EVTCHN/XEN_VERSION HVM attributes and the | |
7867 | KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID/TIMER/UPCALL_VECTOR vCPU attributes | |
7868 | related to event channel delivery, timers, and the XENVER_version | |
7869 | interception. | |
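
One way a VMM might act on the returned flags is sketched below. The flag values are copied from the list above; the predicate names are hypothetical and only restate the dependencies described in the text.

```c
#include <assert.h>
#include <stdbool.h>

#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR   (1u << 0)
#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL (1u << 1)
#define KVM_XEN_HVM_CONFIG_SHARED_INFO     (1u << 2)
#define KVM_XEN_HVM_CONFIG_RUNSTATE        (1u << 3)
#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL   (1u << 4)
#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND     (1u << 5)

/* caps is the KVM_CHECK_EXTENSION(KVM_CAP_XEN_HVM) result. Automatic
 * hypercall page generation needs both the ioctl and the intercept flag. */
static bool xen_can_autogen_hypercall_page(unsigned int caps)
{
    unsigned int need = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
                        KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL;

    return (caps & need) == need;
}

/* The SET_ATTR/GET_ATTR ioctls are signalled by SHARED_INFO. */
static bool xen_has_attr_ioctls(unsigned int caps)
{
    return (caps & KVM_XEN_HVM_CONFIG_SHARED_INFO) != 0;
}

/* Usage sketch with a made-up capability value. */
static void example_xen(void)
{
    unsigned int caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
                        KVM_XEN_HVM_CONFIG_SHARED_INFO;

    assert(!xen_can_autogen_hypercall_page(caps));
    assert(xen_has_attr_ioctls(caps));
}
```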
7870 | ||
24e7475f EGE |
7871 | 8.31 KVM_CAP_PPC_MULTITCE |
7872 | ------------------------- | |
7873 | ||
7874 | :Capability: KVM_CAP_PPC_MULTITCE | |
7875 | :Architectures: ppc | |
7876 | :Type: vm | |
7877 | ||
7878 | This capability means the kernel is capable of handling hypercalls | |
7879 | H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user | |
7880 | space. This significantly accelerates DMA operations for PPC KVM guests. | |
7881 | User space should expect that its handlers for these hypercalls | |
7882 | are not going to be called if user space previously registered LIOBN | |
7883 | in KVM (via KVM_CREATE_SPAPR_TCE or similar calls). | |
7884 | ||
7885 | In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest, | |
7886 | user space might have to advertise it for the guest. For example, | |
7887 | IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is | |
7888 | present in the "ibm,hypertas-functions" device-tree property. | |
7889 | ||
7890 | The hypercalls mentioned above may or may not be processed successfully | |
7891 | in the kernel-based fast path. If they cannot be handled by the kernel, | |
7892 | they will get passed on to user space. So user space still has to have | |
7893 | an implementation for these despite the in-kernel acceleration. | |
7894 | ||
54526d1f | 7895 | This capability is always enabled. |
c4f71901 PB |
7896 | |
7897 | 8.32 KVM_CAP_PTP_KVM | |
3bf72569 JW |
7898 | -------------------- |
7899 | ||
7900 | :Architectures: arm64 | |
7901 | ||
7902 | This capability indicates that the KVM virtual PTP service is | |
7903 | supported in the host. A VMM can check whether the service is | |
7904 | available to the guest on migration. | |
644f7067 VK |
7905 | |
7906 | 8.33 KVM_CAP_HYPERV_ENFORCE_CPUID | |
8b967164 | 7907 | --------------------------------- |
644f7067 VK |
7908 | |
7909 | :Architectures: x86 | |
7910 | ||
7911 | When enabled, KVM will disable emulated Hyper-V features provided to the | |
7912 | guest according to the bits in the Hyper-V CPUID feature leaves. Otherwise, all | |
7913 | currently implemented Hyper-V features are provided unconditionally when | |
7914 | Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001) | |
7915 | leaf. | |
0dbb1123 AK |
7916 | |
7917 | 8.34 KVM_CAP_EXIT_HYPERCALL | |
7918 | --------------------------- | |
7919 | ||
7920 | :Capability: KVM_CAP_EXIT_HYPERCALL | |
7921 | :Architectures: x86 | |
7922 | :Type: vm | |
7923 | ||
7924 | This capability, if enabled, will cause KVM to exit to userspace | |
7925 | with KVM_EXIT_HYPERCALL exit reason to process some hypercalls. | |
7926 | ||
7927 | Calling KVM_CHECK_EXTENSION for this capability will return a bitmask | |
7928 | of hypercalls that can be configured to exit to userspace. | |
7929 | Right now, the only such hypercall is KVM_HC_MAP_GPA_RANGE. | |
7930 | ||
7931 | The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset | |
7932 | of the result of KVM_CHECK_EXTENSION. KVM will forward to userspace | |
7933 | the hypercalls whose corresponding bit is in the argument, and return | |
7934 | ENOSYS for the others. | |
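
The subset rule can be expressed as a single mask test. In the sketch below, the hypercall number used for KVM_HC_MAP_GPA_RANGE and the convention that bit N of the mask corresponds to hypercall number N are assumptions of this example; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed hypercall number; take the real value from <linux/kvm_para.h>. */
#define KVM_HC_MAP_GPA_RANGE 12

/* requested is valid only if it sets no bit outside the supported mask
 * returned by KVM_CHECK_EXTENSION. */
static bool exit_hypercall_mask_ok(uint64_t supported, uint64_t requested)
{
    return (requested & ~supported) == 0;
}

/* Whether hypercall nr would be forwarded to userspace under the enabled
 * mask (assuming bit nr of the mask stands for hypercall nr). */
static bool hypercall_exits_to_userspace(uint64_t enabled, unsigned int nr)
{
    assert(nr < 64);
    return ((enabled >> nr) & 1) != 0;
}
```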
ba7bb663 DD |
7935 | |
7936 | 8.35 KVM_CAP_PMU_CAPABILITY | |
7937 | --------------------------- | |
7938 | ||
7939 | :Capability: KVM_CAP_PMU_CAPABILITY | |
7940 | :Architectures: x86 | |
7941 | :Type: vm | |
7942 | :Parameters: args[0] is bitmask of PMU virtualization capabilities. | |
7943 | :Returns: 0 on success, -EINVAL when args[0] contains invalid bits | |
7944 | ||
7945 | This capability alters PMU virtualization in KVM. | |
7946 | ||
7947 | Calling KVM_CHECK_EXTENSION for this capability returns a bitmask of | |
7948 | PMU virtualization capabilities that can be adjusted on a VM. | |
7949 | ||
7950 | The argument to KVM_ENABLE_CAP is also a bitmask and selects specific | |
7951 | PMU virtualization capabilities to be applied to the VM. This can | |
7952 | only be invoked on a VM prior to the creation of VCPUs. | |
7953 | ||
7954 | At this time, KVM_PMU_CAP_DISABLE is the only capability. Setting | |
7955 | this capability will disable PMU virtualization for that VM. Usermode | |
7956 | should adjust CPUID leaf 0xA to reflect that the PMU is disabled. | |
cde363ab | 7957 | |
bfbab445 OU |
7958 | 8.36 KVM_CAP_ARM_SYSTEM_SUSPEND |
7959 | ------------------------------- | |
7960 | ||
7961 | :Capability: KVM_CAP_ARM_SYSTEM_SUSPEND | |
7962 | :Architectures: arm64 | |
7963 | :Type: vm | |
7964 | ||
7965 | When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of | |
7966 | type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request. | |
7967 | ||
cde363ab PB |
7968 | 9. Known KVM API problems |
7969 | ========================= | |
7970 | ||
7971 | In some cases, KVM's API has some inconsistencies or common pitfalls | |
7972 | that userspace needs to be aware of. This section details some of | |
7973 | these issues. | |
7974 | ||
7975 | Most of them are architecture specific, so the section is split by | |
7976 | architecture. | |
7977 | ||
7978 | 9.1. x86 | |
7979 | -------- | |
7980 | ||
7981 | ``KVM_GET_SUPPORTED_CPUID`` issues | |
7982 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
7983 | ||
7984 | In general, ``KVM_GET_SUPPORTED_CPUID`` is designed so that it is possible | |
7985 | to take its result and pass it directly to ``KVM_SET_CPUID2``. This section | |
7986 | documents some cases in which that requires some care. | |
7987 | ||
7988 | Local APIC features | |
7989 | ~~~~~~~~~~~~~~~~~~~ | |
7990 | ||
7991 | CPUID[EAX=1]:ECX[21] (X2APIC) is reported by ``KVM_GET_SUPPORTED_CPUID``, | |
7992 | but it can only be enabled if ``KVM_CREATE_IRQCHIP`` or | |
7993 | ``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)`` is used to enable in-kernel emulation of | |
7994 | the local APIC. | |
7995 | ||
7996 | The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature. | |
7997 | ||
7998 | CPUID[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by ``KVM_GET_SUPPORTED_CPUID``. | |
7999 | It can be enabled if ``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel | |
8000 | has enabled in-kernel emulation of the local APIC. | |
8001 | ||
8002 | Obsolete ioctls and capabilities | |
8003 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
8004 | ||
8005 | KVM_CAP_DISABLE_QUIRKS does not let userspace know which quirks are actually | |
8006 | available. Use ``KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2)`` instead if | |
8007 | available. | |
8008 | ||
8009 | Ordering of KVM_GET_*/KVM_SET_* ioctls | |
8010 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
8011 | ||
8012 | TBD |