Commit | Line | Data |
---|---|---|
6a2d98b1 JK |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | ============================================== | |
4 | Management Component Transport Protocol (MCTP) | |
5 | ============================================== | |
6 | ||
7 | net/mctp/ contains protocol support for MCTP, as defined by DMTF standard | |
8 | DSP0236. Physical interface drivers ("bindings" in the specification) are | |
9 | provided in drivers/net/mctp/. | |
10 | ||
11 | The core code provides a socket-based interface to send and receive MCTP | |
12 | messages, through an AF_MCTP, SOCK_DGRAM socket. | |
13 | ||
14 | Structure: interfaces & networks | |
15 | ================================ | |
16 | ||
17 | The kernel models the local MCTP topology through two items: interfaces and | |
18 | networks. | |
19 | ||
20 | An interface (or "link") is an instance of an MCTP physical transport binding | |
21 | (as defined by DSP0236, section 3.2.47), likely connected to a specific hardware | |
22 | device. This is represented as a ``struct net_device``. | |
23 | ||
24 | A network defines a unique address space for MCTP endpoints by endpoint-ID | |
25 | (described by DSP0236, section 3.2.31). A network has a user-visible identifier | |
26 | to allow references from userspace. Route definitions are specific to one | |
27 | network. | |
28 | ||
29 | Interfaces are associated with one network. A network may be associated with one | |
30 | or more interfaces. | |
31 | ||
32 | If multiple networks are present, each may contain endpoint IDs (EIDs) that are | |
33 | also present on other networks. | |
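Because EIDs may repeat across networks, an endpoint is only identified unambiguously by the pair of network identifier and EID. A minimal sketch of that rule follows; ``struct endpoint_ref`` and ``endpoint_ref_equal()`` are hypothetical names used for illustration only and are not part of the kernel or uapi.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration type: not a kernel or uapi structure. */
struct endpoint_ref {
	unsigned int net;	/* network identifier */
	uint8_t eid;		/* endpoint ID, unique only within 'net' */
};

/* Two references name the same endpoint only if both fields match. */
static bool endpoint_ref_equal(struct endpoint_ref a, struct endpoint_ref b)
{
	return a.net == b.net && a.eid == b.eid;
}
```

With this comparison, EID 8 on network 1 and EID 8 on network 2 are treated as different endpoints.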
34 | ||
35 | Sockets API | |
36 | =========== | |
37 | ||
38 | Protocol definitions | |
39 | -------------------- | |
40 | ||
41 | MCTP uses ``AF_MCTP`` / ``PF_MCTP`` for the address- and protocol- families. | |
42 | Since MCTP is message-based, only ``SOCK_DGRAM`` sockets are supported. | |
43 | ||
44 | .. code-block:: C | |
45 | ||
46 | int sd = socket(AF_MCTP, SOCK_DGRAM, 0); | |
47 | ||
48 | The only (current) value for the ``protocol`` argument is 0. | |
49 | ||
50 | As with all socket address families, source and destination addresses are | |
51 | specified with a ``sockaddr`` type, with a single-byte endpoint address: | |
52 | ||
53 | .. code-block:: C | |
54 | ||
55 | typedef __u8 mctp_eid_t; | |
56 | ||
57 | struct mctp_addr { | |
58 | mctp_eid_t s_addr; | |
59 | }; | |
60 | ||
61 | struct sockaddr_mctp { | |
b416beb2 JK |
62 | __kernel_sa_family_t smctp_family; |
63 | unsigned int smctp_network; | |
64 | struct mctp_addr smctp_addr; | |
65 | __u8 smctp_type; | |
66 | __u8 smctp_tag; | |
6a2d98b1 JK |
67 | }; |
68 | ||
69 | #define MCTP_NET_ANY 0x0 | |
70 | #define MCTP_ADDR_ANY 0xff | |
71 | ||
72 | ||
73 | Syscall behaviour | |
74 | ----------------- | |
75 | ||
76 | The following sections describe the MCTP-specific behaviours of the standard | |
77 | socket system calls. These behaviours have been chosen to map closely to the | |
78 | existing sockets APIs. | |
79 | ||
80 | ``bind()`` : set local socket address | |
81 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
82 | ||
83 | Sockets that receive incoming request packets will bind to a local address, | |
84 | using the ``bind()`` syscall. | |
85 | ||
86 | .. code-block:: C | |
87 | ||
88 | struct sockaddr_mctp addr = { 0 }; | |
89 | ||
90 | addr.smctp_family = AF_MCTP; | |
91 | addr.smctp_network = MCTP_NET_ANY; | |
92 | addr.smctp_addr.s_addr = MCTP_ADDR_ANY; | |
93 | addr.smctp_type = MCTP_TYPE_PLDM; | |
94 | addr.smctp_tag = MCTP_TAG_OWNER; | |
95 | ||
96 | int rc = bind(sd, (struct sockaddr *)&addr, sizeof(addr)); | |
97 | ||
98 | This establishes the local address of the socket. Incoming MCTP messages that | |
99 | match the network, address, and message type will be received by this socket. | |
100 | The reference to 'incoming' is important here; a bound socket will only receive | |
101 | messages with the TO bit set, to indicate an incoming request message, rather | |
102 | than a response. | |
103 | ||
104 | The ``smctp_tag`` value will configure the tags accepted from the remote side of | |
105 | this socket. Given the above, the only valid value is ``MCTP_TAG_OWNER``, which | |
106 | will result in remotely "owned" tags being routed to this socket. Since | |
107 | ``MCTP_TAG_OWNER`` is set, the 3 least-significant bits of ``smctp_tag`` are not | |
108 | used; callers must set them to zero. | |
109 | ||
110 | A ``smctp_network`` value of ``MCTP_NET_ANY`` will configure the socket to | |
111 | receive incoming packets from any locally-connected network. A specific network | |
112 | value will cause the socket to only receive incoming messages from that network. | |
113 | ||
114 | The ``smctp_addr`` field specifies a local address to bind to. A value of | |
115 | ``MCTP_ADDR_ANY`` configures the socket to receive messages addressed to any | |
116 | local destination EID. | |
117 | ||
118 | The ``smctp_type`` field specifies which message types to receive. Only the | |
119 | lower 7 bits of the type are matched on incoming messages (i.e., the | |
120 | most-significant IC bit is not part of the match). This results in the socket | |
121 | receiving packets with and without a message integrity check footer. | |
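The type-matching rule above can be sketched as a comparison of the low seven bits only. The macro and function names below are hypothetical and exist only for this illustration; the kernel performs the equivalent check internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants (assumed names, not from the uapi header). */
#define EXAMPLE_MCTP_TYPE_MASK	0x7f	/* low 7 bits: message type */
#define EXAMPLE_MCTP_IC_BIT	0x80	/* top bit: integrity check present */

/* Returns true if a message's first byte matches the bound type,
 * ignoring the IC bit on both sides.
 */
static bool example_type_matches(uint8_t bound_type, uint8_t msg_type_byte)
{
	return (bound_type & EXAMPLE_MCTP_TYPE_MASK) ==
	       (msg_type_byte & EXAMPLE_MCTP_TYPE_MASK);
}
```

A socket bound to type 0x01 would therefore match messages whose first byte is either 0x01 or 0x81.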
122 | ||
123 | ``sendto()``, ``sendmsg()``, ``send()`` : transmit an MCTP message | |
124 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
125 | ||
126 | An MCTP message is transmitted using one of the ``sendto()``, ``sendmsg()`` or | |
127 | ``send()`` syscalls. Using ``sendto()`` as the primary example: | |
128 | ||
129 | .. code-block:: C | |
130 | ||
131 | struct sockaddr_mctp addr = { 0 }; | |
132 | char buf[14]; | |
133 | ssize_t len; | |
134 | ||
135 | /* set message destination */ | |
136 | addr.smctp_family = AF_MCTP; | |
137 | addr.smctp_network = 0; | |
138 | addr.smctp_addr.s_addr = 8; | |
139 | addr.smctp_tag = MCTP_TAG_OWNER; | |
140 | addr.smctp_type = MCTP_TYPE_ECHO; | |
141 | ||
142 | /* arbitrary message to send, with message-type header */ | |
143 | buf[0] = MCTP_TYPE_ECHO; | |
144 | memcpy(buf + 1, "hello, world!", sizeof(buf) - 1); | |
145 | ||
146 | len = sendto(sd, buf, sizeof(buf), 0, | |
147 | (struct sockaddr *)&addr, sizeof(addr)); | |
148 | ||
149 | The network and address fields of ``addr`` define the remote address to send to. | |
150 | If ``smctp_tag`` has ``MCTP_TAG_OWNER`` set, the kernel will ignore any bits | |
151 | set in ``MCTP_TAG_MASK``, and generate a tag value suitable for the destination | |
152 | EID. If ``MCTP_TAG_OWNER`` is not set, the message will be sent with the tag | |
153 | value as specified. If a tag value cannot be allocated, the system call will | |
154 | report an errno of ``EAGAIN``. | |
155 | ||
156 | The application must provide the message type byte as the first byte of the | |
157 | message buffer passed to ``sendto()``. If a message integrity check is to be | |
158 | included in the transmitted message, it must also be provided in the message | |
159 | buffer, and the most-significant bit of the message type byte must be 1. | |
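As a sketch of the buffer layout described above, the helper below places the type byte first and sets the most-significant bit when the caller has appended an integrity check to the body. ``example_build_msg()`` is a hypothetical helper, not a kernel or library function; computing the integrity check itself is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lays out [type byte][body...] in buf.
 * If has_ic is non-zero, the caller is assumed to have already appended
 * the integrity check to 'body', and the top bit of the type byte is set.
 * Returns the total message length, or 0 if it does not fit in 'cap'.
 */
static size_t example_build_msg(uint8_t *buf, size_t cap, uint8_t type,
				int has_ic, const void *body, size_t body_len)
{
	if (cap == 0 || body_len > cap - 1)
		return 0;

	buf[0] = (uint8_t)((type & 0x7f) | (has_ic ? 0x80 : 0x00));
	if (body_len)
		memcpy(buf + 1, body, body_len);

	return body_len + 1;
}
```

The returned length is what would be passed as the length argument to ``sendto()``.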
160 | ||
161 | The ``sendmsg()`` system call allows a more compact argument interface, and the | |
162 | message buffer to be specified as a scatter-gather list. At present no ancillary | |
163 | message types (used for the ``msg_control`` data passed to ``sendmsg()``) are | |
164 | defined. | |
165 | ||
166 | Transmitting a message on an unconnected socket with ``MCTP_TAG_OWNER`` | |
167 | specified will cause an allocation of a tag, if no valid tag is already | |
168 | allocated for that destination. The (destination-eid,tag) tuple acts as an | |
169 | implicit local socket address, to allow the socket to receive responses to this | |
170 | outgoing message. If any previous allocation has been performed (for a | |
171 | different remote EID), that allocation is lost. | |
172 | ||
173 | Sockets will only receive responses to requests they have sent (with TO=1) and | |
174 | may only respond (with TO=0) to requests they have received. | |
175 | ||
176 | ``recvfrom()``, ``recvmsg()``, ``recv()`` : receive an MCTP message | |
177 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
178 | ||
179 | An MCTP message can be received by an application using one of the | |
180 | ``recvfrom()``, ``recvmsg()``, or ``recv()`` system calls. Using ``recvfrom()`` | |
181 | as the primary example: | |
182 | ||
183 | .. code-block:: C | |
184 | ||
185 | struct sockaddr_mctp addr; | |
186 | socklen_t addrlen; | |
187 | char buf[14]; | |
188 | ssize_t len; | |
189 | ||
190 | addrlen = sizeof(addr); | |
191 | ||
192 | len = recvfrom(sd, buf, sizeof(buf), 0, | |
193 | (struct sockaddr *)&addr, &addrlen); | |
194 | ||
195 | /* We can expect addr to describe an MCTP address */ | |
196 | assert(addrlen >= sizeof(addr)); | |
197 | assert(addr.smctp_family == AF_MCTP); | |
198 | ||
199 | printf("received %zd bytes from remote EID %d\n", len, addr.smctp_addr.s_addr); | |
200 | ||
201 | The address argument to ``recvfrom`` and ``recvmsg`` is populated with the | |
202 | remote address of the incoming message, including tag value (this will be needed | |
203 | in order to reply to the message). | |
204 | ||
205 | The first byte of the message buffer will contain the message type byte. If an | |
206 | integrity check follows the message, it will be included in the received buffer. | |
207 | ||
208 | The ``recv()`` system call behaves in a similar way, but does not provide a | |
209 | remote address to the application. Therefore, it is only useful if the | |
210 | remote address is already known, or the message does not require a reply. | |
211 | ||
212 | Like the send calls, sockets will only receive responses to requests they have | |
213 | sent (TO=1) and may only respond (TO=0) to requests they have received. | |
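When replying, the address returned by ``recvfrom()`` can be reused as the destination, with the tag-owner bit cleared so that the reply goes out with TO=0 and the same tag value. The sketch below shows only the tag adjustment; ``EXAMPLE_MCTP_TAG_OWNER`` is a local stand-in assumed to mirror ``MCTP_TAG_OWNER`` (0x08) from ``<linux/mctp.h>``, which real code should include instead.

```c
#include <stdint.h>

/* Assumed to mirror MCTP_TAG_OWNER from <linux/mctp.h>. */
#define EXAMPLE_MCTP_TAG_OWNER	0x08

/* Derive the smctp_tag for a reply from the tag of a received request:
 * keep the tag value bits, clear the owner bit.
 */
static uint8_t example_reply_tag(uint8_t rx_tag)
{
	return (uint8_t)(rx_tag & ~EXAMPLE_MCTP_TAG_OWNER);
}
```

In an application, this would typically be written in place as ``addr.smctp_tag &= ~MCTP_TAG_OWNER;`` before calling ``sendto()`` with the received ``addr``.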
f4d41c59 | 214 | |
63ed1aab MJ |
215 | ``ioctl(SIOCMCTPALLOCTAG)`` and ``ioctl(SIOCMCTPDROPTAG)`` |
216 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
217 | ||
218 | These ioctls give applications more control over MCTP message tags, by allocating | |
219 | (and dropping) tag values explicitly, rather than the kernel automatically | |
220 | allocating a per-message tag at ``sendmsg()`` time. | |
221 | ||
222 | In general, you will only need to use these ioctls if your MCTP protocol does | |
223 | not fit the usual request/response model. For example, if you need to persist | |
224 | tags across multiple requests, or a request may generate more than one response. | |
225 | In these cases, the ioctls allow you to decouple the tag allocation (and | |
226 | release) from individual message send and receive operations. | |
227 | ||
228 | Both ioctls are passed a pointer to a ``struct mctp_ioc_tag_ctl``: | |
229 | ||
230 | .. code-block:: C | |
231 | ||
232 | struct mctp_ioc_tag_ctl { | |
233 | mctp_eid_t peer_addr; | |
234 | __u8 tag; | |
235 | __u16 flags; | |
236 | }; | |
237 | ||
238 | ``SIOCMCTPALLOCTAG`` allocates a tag for a specific peer, which an application | |
239 | can use in future ``sendmsg()`` calls. The application populates the | |
240 | ``peer_addr`` member with the remote EID. Other fields must be zero. | |
241 | ||
242 | On return, the ``tag`` member will be populated with the allocated tag value. | |
243 | The allocated tag will have the following tag bits set: | |
244 | ||
245 | - ``MCTP_TAG_OWNER``: it only makes sense to allocate tags if you're the tag | |
246 | owner | |
247 | ||
248 | - ``MCTP_TAG_PREALLOC``: to indicate to ``sendmsg()`` that this is a | |
249 | preallocated tag. | |
250 | ||
251 | - ... and the actual tag value, within the least-significant three bits | |
252 | (``MCTP_TAG_MASK``). Note that zero is a valid tag value. | |
253 | ||
254 | The tag value should be used as-is for the ``smctp_tag`` member of ``struct | |
255 | sockaddr_mctp``. | |
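To illustrate the layout of the returned ``tag`` member, the sketch below splits it into its three parts. The ``EXAMPLE_`` constants are local stand-ins assumed to mirror ``MCTP_TAG_MASK`` (0x07), ``MCTP_TAG_OWNER`` (0x08) and ``MCTP_TAG_PREALLOC`` (0x10) from ``<linux/mctp.h>``; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the definitions in <linux/mctp.h>. */
#define EXAMPLE_MCTP_TAG_MASK		0x07
#define EXAMPLE_MCTP_TAG_OWNER		0x08
#define EXAMPLE_MCTP_TAG_PREALLOC	0x10

/* Hypothetical decoded view of a tag byte. */
struct example_tag_info {
	bool owner;	/* tag-owner bit set */
	bool prealloc;	/* allocated through SIOCMCTPALLOCTAG */
	uint8_t value;	/* 3-bit tag value; zero is valid */
};

static struct example_tag_info example_decode_tag(uint8_t tag)
{
	struct example_tag_info info;

	info.owner = (tag & EXAMPLE_MCTP_TAG_OWNER) != 0;
	info.prealloc = (tag & EXAMPLE_MCTP_TAG_PREALLOC) != 0;
	info.value = (uint8_t)(tag & EXAMPLE_MCTP_TAG_MASK);
	return info;
}
```

Decoding is only useful for logging or debugging; when sending, the tag should still be passed through unmodified.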
256 | ||
257 | ``SIOCMCTPDROPTAG`` releases a tag that has been previously allocated by a | |
258 | ``SIOCMCTPALLOCTAG`` ioctl. The ``peer_addr`` must be the same as used for the | |
259 | allocation, and the ``tag`` value must match exactly the tag returned from the | |
260 | allocation (including the ``MCTP_TAG_OWNER`` and ``MCTP_TAG_PREALLOC`` bits). | |
261 | The ``flags`` field must be zero. | |
262 | ||
f4d41c59 JK |
263 | Kernel internals |
264 | ================ | |
265 | ||
266 | There are a few possible packet flows in the MCTP stack: | |
267 | ||
268 | 1. local TX to remote endpoint, message <= MTU:: | |
269 | ||
270 | sendmsg() | |
271 | -> mctp_local_output() | |
272 | : route lookup | |
273 | -> rt->output() (== mctp_route_output) | |
274 | -> dev_queue_xmit() | |
275 | ||
276 | 2. local TX to remote endpoint, message > MTU:: | |
277 | ||
278 | sendmsg() | |
279 | -> mctp_local_output() | |
280 | -> mctp_do_fragment_route() | |
281 | : creates packet-sized skbs. For each new skb: | |
282 | -> rt->output() (== mctp_route_output) | |
283 | -> dev_queue_xmit() | |
284 | ||
285 | 3. remote TX to local endpoint, single-packet message:: | |
286 | ||
287 | mctp_pkttype_receive() | |
288 | : route lookup | |
289 | -> rt->output() (== mctp_route_input) | |
290 | : sk_key lookup | |
291 | -> sock_queue_rcv_skb() | |
292 | ||
293 | 4. remote TX to local endpoint, multiple-packet message:: | |
294 | ||
295 | mctp_pkttype_receive() | |
296 | : route lookup | |
297 | -> rt->output() (== mctp_route_input) | |
298 | : sk_key lookup | |
299 | : stores skb in struct sk_key->reasm_head | |
300 | ||
301 | mctp_pkttype_receive() | |
302 | : route lookup | |
303 | -> rt->output() (== mctp_route_input) | |
304 | : sk_key lookup | |
305 | : finds existing reassembly in sk_key->reasm_head | |
306 | : appends new fragment | |
307 | -> sock_queue_rcv_skb() | |
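For flow 2, the number of packets produced by fragmentation can be estimated as below. This is a rough sketch, not the kernel's implementation: it assumes the interface MTU includes the 4-byte MCTP transport header, so each packet carries at most ``mtu - 4`` bytes of message data. The function and macro names are hypothetical.

```c
#include <stddef.h>

/* MCTP transport header length in bytes (per DSP0236). */
#define EXAMPLE_MCTP_HDR_LEN	4

/* Estimate how many packets a message of msg_len bytes (including the
 * message type byte) needs on a link with the given MTU.
 * Assumes the MTU counts the MCTP header. Returns 0 if the MTU leaves
 * no room for payload.
 */
static size_t example_fragment_count(size_t msg_len, size_t mtu)
{
	size_t payload;

	if (mtu <= EXAMPLE_MCTP_HDR_LEN)
		return 0;

	payload = mtu - EXAMPLE_MCTP_HDR_LEN;

	/* round up: a partial final fragment still needs its own packet */
	return (msg_len + payload - 1) / payload;
}
```

Under these assumptions, a result of 1 corresponds to flow 1 (no fragmentation), and any larger result corresponds to flow 2.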
308 | ||
309 | Key refcounts | |
310 | ------------- | |
311 | ||
312 | * keys are referenced by: | |
313 | ||
314 | - a skb: during route output, stored in ``skb->cb``. | |
315 | ||
316 | - netns and sock lists. | |
317 | ||
318 | * keys can be associated with a device, in which case they hold a | |
319 | reference to the dev (set through ``key->dev``, counted through | |
320 | ``dev->key_count``). Multiple keys can reference the device. |