.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_ which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
  though it only implements a subset of the PAPR specification called
  LoPAPR [2]_.

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is
called a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and
must issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on behalf of the
guest. The guest issues a hcall with the necessary input operands. The
hypervisor, after performing the privileged operation, returns a status code
and output operands back to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the instruction **HVCS**, which expects the opcode for the hcall to
be set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in Big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
'HVCS' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in Big-endian byte order.

Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch specific header [4]_, to issue hcalls from the Linux kernel
running as a pseries guest.

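The Big-endian rule for memory buffers described in this section can be
illustrated with a minimal, host-independent sketch. The helper names
``ex_put_be64`` and ``ex_get_be64`` are hypothetical and exist only for this
illustration. In-kernel code would normally use the existing byte-order
helpers such as ``cpu_to_be64()`` and ``be64_to_cpu()`` rather than open-coding
the conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit value into buf in Big-endian byte
 * order, regardless of the endianness of the CPU running this code.
 */
void ex_put_be64(uint8_t *buf, uint64_t val)
{
	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i)); /* most significant byte first */
}

/*
 * Hypothetical helper: read back a 64-bit Big-endian value that was
 * placed in a buffer (for example, an out value written by the hypervisor).
 */
uint64_t ex_get_be64(const uint8_t *buf)
{
	uint64_t val = 0;

	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		val = (val << 8) | buf[i];
	return val;
}
```

Shifting byte by byte keeps the sketch correct on both little-endian and
big-endian guests, which matters because a pseries guest kernel may run in
either mode while the buffer layout stays Big-endian.
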
Register Conventions
====================

Any hcall should follow the same register convention as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|  r4-r10  |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
| r14-r31  |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|   LR     |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs
etc. that are available for use by LPARs as Dynamic Resources (DR). When a DR
is allocated to an LPAR, PHYP creates a data-structure called Dynamic Resource
Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an opaque
32-bit number called DRC-Index. The DRC-index value is provided to the LPAR via
the device tree, where it is present as an attribute in the device tree node
associated with the DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return-value in *r3*
indicating success or failure of the hcall. In case of a failure, an error
code indicates the cause of the error. These codes are defined and documented
in the arch specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet
finished servicing the hcall.

To make such hcalls, the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value.

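The *continue-token* protocol described above can be sketched as a simple
retry loop. This is only an illustration under stated assumptions: the
constants ``EX_H_SUCCESS`` and ``EX_H_CONTINUE``, the callback type
``ex_hcall_fn`` and the mock ``ex_mock_hcall`` are all hypothetical stand-ins
so that the loop can run outside a guest. The real return codes are defined in
the arch specific header [4]_, and a real token is opaque and must not be
interpreted by the guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins only; the real codes live in hvcall.h. */
enum { EX_H_SUCCESS = 0, EX_H_CONTINUE = 1 };

/* Hypothetical shape of a long-running hcall that takes and returns a token. */
typedef long (*ex_hcall_fn)(uint64_t token_in, uint64_t *token_out, void *ctx);

/* Re-issue the hcall until the return value is no longer EX_H_CONTINUE. */
long ex_hcall_until_done(ex_hcall_fn hcall, void *ctx)
{
	uint64_t token = 0;	/* the initial call passes continue-token == 0 */
	long rc;

	assert(hcall != NULL);
	do {
		uint64_t next = 0;

		rc = hcall(token, &next, ctx);
		token = next;	/* feed the returned token to the next call */
	} while (rc == EX_H_CONTINUE);

	return rc;		/* success or an error code */
}

/*
 * Mock "hypervisor" that needs three calls to finish. It uses the token as
 * a progress counter purely for demonstration purposes.
 */
long ex_mock_hcall(uint64_t token_in, uint64_t *token_out, void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	if (token_in < 2) {
		*token_out = token_in + 1;
		return EX_H_CONTINUE;
	}
	*token_out = 0;
	return EX_H_SUCCESS;
}
```

Passing the mock as the callback, for instance
``ex_hcall_until_done(ex_mock_hcall, &calls)`` with ``calls`` initialised to
zero, drives the loop through two continuation rounds before the final call
completes.
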
HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values please look into the arch specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided
buffer. The metadata area stores configuration information such as label
information, bad-blocks etc. The metadata area is located out-of-band of the
NVDIMM storage area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes from the provided buffer to the
metadata area associated with it, at the specified offset.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
|        *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
|               *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
|               *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
|               *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on predictive failure and overall health
of the PMEM device. The asserted bits in the health-bitmap indicate one or more
states (described in the table below) of the PMEM device, and
health-bit-valid-bitmap indicates which bits in health-bitmap are valid. The
bits are reported in reverse bit ordering; for example, a value of
0xC400000000000000 indicates that bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
| Bit  | Definition                                                            |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

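The reverse bit ordering used by H_SCM_HEALTH, where bit 0 is the most
significant bit of the 64-bit value, can be made concrete with a short sketch.
The helper names ``ex_msb0_bit`` and ``ex_health_bit_set`` are hypothetical and
only illustrate the arithmetic. Kernel code on powerpc would more likely use
the existing ``PPC_BIT()`` macro for the same purpose.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask for bit n of a 64-bit value when bits are
 * numbered from the most significant end (bit 0 == 0x8000000000000000).
 */
uint64_t ex_msb0_bit(unsigned int n)
{
	assert(n < 64);
	return UINT64_C(1) << (63 - n);
}

/*
 * Hypothetical helper: report a health state only when the bit is both
 * flagged as valid in health-bit-valid-bitmap and asserted in health-bitmap.
 */
int ex_health_bit_set(uint64_t health, uint64_t valid, unsigned int n)
{
	uint64_t mask = ex_msb0_bit(n);

	return (valid & mask) != 0 && (health & mask) != 0;
}
```

With this numbering, bits 0, 1 and 5 correspond to the masks
0x8000000000000000, 0x4000000000000000 and 0x0400000000000000, whose
combination is the value 0xC400000000000000 used in the example above.
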
**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

**H_SCM_FLUSH**

| Input: *drcIndex, continue-token*
| Out: *continue-token*
| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a long time and the hcall needs
to be issued multiple times in order to be completely serviced. The
*continue-token* from the output must be passed in the argument list of
subsequent hcalls to the hypervisor until the hcall is completely serviced,
at which point H_SUCCESS or an error is returned by the hypervisor.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture