Commit | Line | Data |
---|---|---|
97162a1e MCC |
1 | ====================== |
2 | Userspace verbs access | |
3 | ====================== | |
6f50142e RD |
4 | |
5 | The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS, | |
6 | enables direct userspace access to IB hardware via "verbs," as | |
7 | described in chapter 11 of the InfiniBand Architecture Specification. | |
8 | ||
9 | To use the verbs, the libibverbs library, available from | |
46adb179 | 10 | https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a |
6f50142e RD |
11 | device-independent API for using the ib_uverbs interface. |
12 | libibverbs also requires appropriate device-dependent kernel and | |
13 | userspace drivers for your InfiniBand hardware. For example, to use | |
14 | a Mellanox HCA, you will need the ib_mthca kernel module and the | |
15 | libmthca userspace driver to be installed. | |
16 | ||
17 | User-kernel communication | |
97162a1e | 18 | ========================= |
6f50142e RD |
19 | |
20 | Userspace communicates with the kernel for slow path, resource | |
21 | management operations via the /dev/infiniband/uverbsN character | |
22 | devices. Fast path operations are typically performed by writing | |
23 | directly to hardware registers mmap()ed into userspace, with no | |
24 | system call or context switch into the kernel. | |
25 | ||
26 | Commands are sent to the kernel via write()s on these device files. | |
27 | The ABI is defined in include/uapi/rdma/ib_user_verbs.h. | |
28 | The structs for commands that require a response from the kernel | |
29 | contain a 64-bit field used to pass a pointer to an output buffer. | |
30 | Status is returned to userspace as the return value of the write() | |
31 | system call. | |
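As a rough illustration of the command format described above, the following Python sketch packs a command header followed by a 64-bit pointer to a response buffer. The header layout (a 32-bit command code, then 16-bit input and output lengths counted in 4-byte words) is an assumption based on one reading of the ABI header and should be checked against ib_user_verbs.h. The helper name ``build_command`` and the command code used in the usage note are hypothetical. The sketch only builds the bytes and never touches a device.

```python
import ctypes
import struct

# Assumed header layout: u32 command, u16 in_words, u16 out_words.
# Verify against ib_user_verbs.h before relying on it.
CMD_HDR = struct.Struct("=IHH")


def build_command(command, resp_buf, payload=b""):
    """Pack header + 64-bit response pointer + optional payload (sketch)."""
    # The 64-bit field carries the userspace address of the output buffer.
    body = struct.pack("=Q", ctypes.addressof(resp_buf)) + payload
    total = CMD_HDR.size + len(body)
    resp_len = ctypes.sizeof(resp_buf)
    if total % 4 or resp_len % 4:
        raise ValueError("sizes must be multiples of 4 bytes")
    # Lengths are assumed to be expressed in 4-byte words.
    return CMD_HDR.pack(command, total // 4, resp_len // 4) + body
```

A caller would create the response buffer with ``ctypes.create_string_buffer(16)``, build the message with a command code such as ``0`` (hypothetical here), pass it to ``os.write()`` on an open /dev/infiniband/uverbsN descriptor, and treat the result of the write() call as the status.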
32 | ||
33 | Resource management | |
97162a1e | 34 | =================== |
6f50142e RD |
35 | |
36 | Since creation and destruction of all IB resources is done by | |
37 | commands passed through a file descriptor, the kernel can keep track | |
38 | of which resources are attached to a given userspace context. The | |
39 | ib_uverbs module maintains idr tables that are used to translate | |
40 | between kernel pointers and opaque userspace handles, so that kernel | |
41 | pointers are never exposed to userspace and userspace cannot trick | |
42 | the kernel into following a bogus pointer. | |
43 | ||
44 | This also allows the kernel to clean up when a process exits and | |
45 | prevent one process from touching another process's resources. | |
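The handle translation idea can be modelled in a few lines. This is a toy sketch, not the kernel's idr implementation: the class ``HandleTable`` and its method names are hypothetical, and a plain dictionary stands in for the idr table. It shows the three properties mentioned above: userspace only sees small integers, a handle owned by one context cannot be used by another, and everything belonging to a context can be released when it goes away.

```python
import itertools


class HandleTable:
    """Toy model of an opaque-handle table (hypothetical, not kernel code)."""

    def __init__(self):
        self._next = itertools.count(1)
        self._objs = {}  # handle -> (owner context, kernel-side object)

    def alloc(self, owner, obj):
        """Register obj for owner and return an opaque integer handle."""
        handle = next(self._next)
        self._objs[handle] = (owner, obj)
        return handle

    def lookup(self, owner, handle):
        """Translate a handle back to its object, rejecting foreign handles."""
        entry = self._objs.get(handle)
        if entry is None or entry[0] != owner:
            raise KeyError(handle)
        return entry[1]

    def release_all(self, owner):
        """Drop every object owned by owner, as on process exit."""
        dead = [h for h, (o, _) in self._objs.items() if o == owner]
        for h in dead:
            del self._objs[h]
        return len(dead)
```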
46 | ||
47 | Memory pinning | |
97162a1e | 48 | ============== |
6f50142e RD |
49 | |
50 | Direct userspace I/O requires that memory regions that are potential | |
51 | I/O targets be kept resident at the same physical address. The | |
52 | ib_uverbs module manages pinning and unpinning memory regions via | |
53 | get_user_pages() and put_page() calls. It also accounts for the | |
1a7a05e8 | 54 | amount of memory pinned in the process's pinned_vm, and checks that |
6f50142e RD |
55 | unprivileged processes do not exceed their RLIMIT_MEMLOCK limit. |
56 | ||
57 | Pages that are pinned multiple times are counted each time they are | |
1a7a05e8 | 58 | pinned, so the value of pinned_vm may be an overestimate of the |
6f50142e RD |
59 | number of pages pinned by a process. |
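To make the accounting concrete, the sketch below estimates from userspace how many pages a set of registrations would be charged for and compares that with the soft RLIMIT_MEMLOCK limit. It is an approximation under stated assumptions: the function names are hypothetical, the kernel's actual accounting is not reproduced, and overlapping registrations are deliberately counted each time to mirror the overestimate described above.

```python
import os
import resource

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def pages_spanned(addr, length):
    """Number of pages touched by the byte range [addr, addr + length)."""
    if length <= 0:
        return 0
    first = addr // PAGE_SIZE
    last = (addr + length - 1) // PAGE_SIZE
    return last - first + 1


def memlock_limit_pages():
    """Soft RLIMIT_MEMLOCK in pages, or None if the limit is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // PAGE_SIZE


def would_fit(registrations, limit_pages):
    """Return (fits, total_pages) for an iterable of (addr, length) pairs.

    Pages covered by more than one registration are counted once per
    registration, so total_pages can exceed the number of distinct pages.
    """
    total = sum(pages_spanned(a, n) for a, n in registrations)
    fits = limit_pages is None or total <= limit_pages
    return fits, total
```

For example, ``would_fit([(0, PAGE_SIZE), (0, PAGE_SIZE)], memlock_limit_pages())`` charges two pages even though only one distinct page is involved.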
60 | ||
61 | /dev files | |
97162a1e | 62 | ========== |
6f50142e RD |
63 | |
64 | To create the appropriate character device files automatically with | |
97162a1e | 65 | udev, a rule like:: |
6f50142e | 66 | |
aa07a994 | 67 | KERNEL=="uverbs*", NAME="infiniband/%k" |
6f50142e | 68 | |
97162a1e | 69 | can be used. This will create device nodes named:: |
6f50142e RD |
70 | |
71 | /dev/infiniband/uverbs0 | |
72 | ||
73 | and so on. Since the InfiniBand userspace verbs should be safe for | |
74 | use by non-privileged processes, it may be useful to add an | |
75 | appropriate MODE or GROUP to the udev rule. |
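One way to check the result of such a rule is to list the device nodes together with their permission bits and owning group. The sketch below assumes the nodes live under /dev/infiniband as in the rule above. The function name ``list_uverbs`` is hypothetical, and the directory is a parameter so the helper can be pointed elsewhere.

```python
import glob
import os
import stat


def list_uverbs(root="/dev/infiniband"):
    """Return (path, permission bits, gid) for each uverbs node under root."""
    nodes = []
    for path in sorted(glob.glob(os.path.join(root, "uverbs*"))):
        st = os.stat(path)
        nodes.append((path, stat.S_IMODE(st.st_mode), st.st_gid))
    return nodes


if __name__ == "__main__":
    for path, mode, gid in list_uverbs():
        print(f"{path} mode={mode:o} gid={gid}")
```

On a system without InfiniBand hardware the list is simply empty.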