1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | ======================================================== | |
4 | TCP Authentication Option Linux implementation (RFC5925) | |
5 | ======================================================== | |
6 | ||
7 | TCP Authentication Option (TCP-AO) provides a TCP extension aimed at verifying | |
8 | segments between trusted peers. It adds a new TCP header option with | |
9 | a Message Authentication Code (MAC). MACs are produced from the content | |
10 | of a TCP segment using a hashing function with a password known to both peers. | |
11 | The intent of TCP-AO is to deprecate TCP-MD5 by providing better security, | |
12 | key rotation and support for a variety of hashing algorithms. | |
13 | ||
14 | 1. Introduction | |
15 | =============== | |
16 | ||
17 | .. table:: Short and Limited Comparison of TCP-AO and TCP-MD5 | |
18 | ||
19 | +----------------------+------------------------+-----------------------+ | |
20 | | | TCP-MD5 | TCP-AO | | |
21 | +======================+========================+=======================+ | |
22 | |Supported hashing |MD5 |Must support HMAC-SHA1 | | |
23 | |algorithms |(cryptographically weak)|(chosen-prefix attacks)| | |
24 | | | |and CMAC-AES-128 (only | | |
25 | | | |side-channel attacks). | | |
26 | | | |May support any hashing| | |
27 | | | |algorithm. | | |
28 | +----------------------+------------------------+-----------------------+ | |
29 | |Length of MACs (bytes)|16 |Typically 12-16. | | |
30 | | | |Other variants that fit| | |
31 | | | |TCP header permitted. | | |
32 | +----------------------+------------------------+-----------------------+ | |
33 | |Number of keys per |1 |Many | | |
34 | |TCP connection | | | | |
35 | +----------------------+------------------------+-----------------------+ | |
36 | |Possibility to change |Non-practical (both |Supported by protocol | | |
37 | |an active key |peers have to change | | | |
38 | | |them during MSL) | | | |
39 | +----------------------+------------------------+-----------------------+ | |
40 | |Protection against |No |Yes: ignoring them | | |
41 | |ICMP 'hard errors' | |by default on | | |
42 | | | |established connections| | |
43 | +----------------------+------------------------+-----------------------+ | |
44 | |Protection against |No |Yes: pseudo-header | | |
45 | |traffic-crossing | |includes TCP ports. | | |
46 | |attack | | | | |
47 | +----------------------+------------------------+-----------------------+ | |
48 | |Protection against |No |Sequence Number | | |
49 | |replayed TCP segments | |Extension (SNE) and | | |
50 | | | |Initial Sequence | | |
51 | | | |Numbers (ISNs) | | |
52 | +----------------------+------------------------+-----------------------+ | |
53 | |Supports |Yes |No. ISNs+SNE are needed| | |
54 | |Connectionless Resets | |to correctly sign RST. | | |
55 | +----------------------+------------------------+-----------------------+ | |
56 | |Standards |RFC 2385 |RFC 5925, RFC 5926 | | |
57 | +----------------------+------------------------+-----------------------+ | |
58 | ||
59 | ||
60 | 1.1 Frequently Asked Questions (FAQ) with references to RFC 5925 | |
61 | ---------------------------------------------------------------- | |
62 | ||
63 | Q: Can either SendID or RecvID be non-unique for the same 4-tuple | |
64 | (srcaddr, srcport, dstaddr, dstport)? | |
65 | ||
66 | A: No [3.1]:: | |
67 | ||
68 | >> The IDs of MKTs MUST NOT overlap where their TCP connection | |
69 | identifiers overlap. | |
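
The non-overlap rule can be illustrated with a small check. This is only a sketch under stated assumptions: ``struct mkt_id`` and ``mkt_ids_overlap()`` are hypothetical names, and a single IPv4 peer address stands in for the full TCP connection identifier. It is not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified MKT identity used only for this sketch. */
struct mkt_id {
	uint32_t peer;	/* peer IPv4 address, stands in for the connection identifier */
	uint8_t sndid;	/* SendID */
	uint8_t rcvid;	/* RecvID */
};

/*
 * Return 1 if the candidate MKT would reuse a SendID or a RecvID of an
 * existing MKT for the same peer, 0 otherwise.
 */
int mkt_ids_overlap(const struct mkt_id *keys, size_t nkeys,
		    const struct mkt_id *cand)
{
	for (size_t i = 0; i < nkeys; i++) {
		if (keys[i].peer != cand->peer)
			continue;	/* connection identifiers don't overlap */
		if (keys[i].sndid == cand->sndid || keys[i].rcvid == cand->rcvid)
			return 1;
	}
	return 0;
}
```

With such a check, the same KeyID pair may be reused for different peers, while a second MKT for the same peer has to use a fresh SendID and a fresh RecvID.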
70 | ||
71 | Q: Can Master Key Tuple (MKT) for an active connection be removed? | |
72 | ||
73 | A: No, unless it's copied to Transport Control Block (TCB) [3.1]:: | |
74 | ||
75 | It is presumed that an MKT affecting a particular connection cannot | |
76 | be destroyed during an active connection -- or, equivalently, that | |
77 | its parameters are copied to an area local to the connection (i.e., | |
78 | instantiated) and so changes would affect only new connections. | |
79 | ||
80 | Q: If an old MKT needs to be deleted, how should it be done in order | |
81 | to not remove it for an active connection? (As it can be still in use | |
82 | at any moment later) | |
83 | ||
84 | A: RFC 5925 does not specify this. It appears to be a key management task | |
85 | to ensure that no one uses such an MKT before trying to remove it. | |
86 | ||
87 | Q: Can an old MKT exist forever and be used by another peer? | |
88 | ||
89 | A: It can, it's a key management task to decide when to remove an old key [6.1]:: | |
90 | ||
91 | Deciding when to start using a key is a performance issue. Deciding | |
92 | when to remove an MKT is a security issue. Invalid MKTs are expected | |
93 | to be removed. TCP-AO provides no mechanism to coordinate their removal, | |
94 | as we consider this a key management operation. | |
95 | ||
96 | also [6.1]:: | |
97 | ||
98 | The only way to avoid reuse of previously used MKTs is to remove the MKT | |
99 | when it is no longer considered permitted. | |
100 | ||
101 | Linux TCP-AO will try its best to prevent you from removing a key that's | |
102 | being used, considering it a key management failure. But since keeping | |
103 | an outdated key may become a security issue, and as a peer may | |
104 | unintentionally prevent the removal of an old key by always setting | |
105 | it as RNextKeyID, a forced key removal mechanism is provided: | |
106 | userspace has to supply the KeyID to use instead of the one being removed, | |
107 | and the kernel will atomically delete the old key, even if the peer is | |
108 | still requesting it. There are no guarantees for force-delete: the peer | |
109 | may not have the new key yet, in which case the TCP connection may just break. | |
110 | Alternatively, one may choose to shut down the socket. | |
111 | ||
112 | Q: What happens when a packet is received on a new connection with no known | |
113 | MKT's RecvID? | |
114 | ||
115 | A: RFC 5925 specifies that by default it is accepted with a warning logged, but | |
116 | the behaviour can be configured by the user [7.5.1.a]:: | |
117 | ||
118 | If the segment is a SYN, then this is the first segment of a new | |
119 | connection. Find the matching MKT for this segment, using the segment's | |
120 | socket pair and its TCP-AO KeyID, matched against the MKT's TCP connection | |
121 | identifier and the MKT's RecvID. | |
122 | ||
123 | i. If there is no matching MKT, remove TCP-AO from the segment. | |
124 | Proceed with further TCP handling of the segment. | |
125 | NOTE: this presumes that connections that do not match any MKT | |
126 | should be silently accepted, as noted in Section 7.3. | |
127 | ||
128 | [7.3]:: | |
129 | ||
130 | >> A TCP-AO implementation MUST allow for configuration of the behavior | |
131 | of segments with TCP-AO but that do not match an MKT. The initial default | |
132 | of this configuration SHOULD be to silently accept such connections. | |
133 | If this is not the desired case, an MKT can be included to match such | |
134 | connections, or the connection can indicate that TCP-AO is required. | |
135 | Alternately, the configuration can be changed to discard segments with | |
136 | the AO option not matching an MKT. | |
137 | ||
138 | [10.2.b]:: | |
139 | ||
140 | Connections not matching any MKT do not require TCP-AO. Further, incoming | |
141 | segments with TCP-AO are not discarded solely because they include | |
142 | the option, provided they do not match any MKT. | |
143 | ||
144 | Note that the Linux TCP-AO implementation differs in this aspect. Currently, | |
145 | TCP-AO segments with unknown key signatures are discarded with warnings logged. | |
146 | ||
147 | Q: Does the RFC imply centralized kernel key management in any way? | |
148 | (i.e. that a key on all connections MUST be rotated at the same time?) | |
149 | ||
150 | A: Not specified. MKTs can be managed in userspace, the only relevant part to | |
151 | key changes is [7.3]:: | |
152 | ||
153 | >> All TCP segments MUST be checked against the set of MKTs for matching | |
154 | TCP connection identifiers. | |
155 | ||
156 | Q: What happens when RNextKeyID requested by a peer is unknown? Should | |
157 | the connection be reset? | |
158 | ||
159 | A: It should not, no action needs to be performed [7.5.2.e]:: | |
160 | ||
161 | ii. If they differ, determine whether the RNextKeyID MKT is ready. | |
162 | ||
163 | 1. If the MKT corresponding to the segment’s socket pair and RNextKeyID | |
164 | is not available, no action is required (RNextKeyID of a received | |
165 | segment needs to match the MKT’s SendID). | |
166 | ||
167 | Q: How is current_key set and when does it change? Is it a user-triggered | |
168 | change, or is it by a request from the remote peer? Is it set by the user | |
169 | explicitly, or by a matching rule? | |
170 | ||
171 | A: current_key is set by RNextKeyID [6.1]:: | |
172 | ||
173 | Rnext_key is changed only by manual user intervention or MKT management | |
174 | protocol operation. It is not manipulated by TCP-AO. Current_key is updated | |
175 | by TCP-AO when processing received TCP segments as discussed in the segment | |
176 | processing description in Section 7.5. Note that the algorithm allows | |
177 | the current_key to change to a new MKT, then change back to a previously | |
178 | used MKT (known as "backing up"). This can occur during an MKT change when | |
179 | segments are received out of order, and is considered a feature of TCP-AO, | |
180 | because reordering does not result in drops. | |
181 | ||
182 | [7.5.2.e.ii]:: | |
183 | ||
184 | 2. If the matching MKT corresponding to the segment’s socket pair and | |
185 | RNextKeyID is available: | |
186 | ||
187 | a. Set current_key to the RNextKeyID MKT. | |
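
The RNextKeyID handling quoted above (both the "unknown RNextKeyID" and the "set current_key" cases) can be sketched as follows. The structures and function names are hypothetical and heavily simplified. They only illustrate the decision in [7.5.2.e] and do not mirror the kernel's data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified MKT: only the IDs matter for this sketch. */
struct mkt {
	uint8_t sndid;	/* SendID: KeyID placed in outgoing segments */
	uint8_t rcvid;	/* RecvID: KeyID expected in incoming segments */
};

struct ao_conn {
	struct mkt *keys;		/* MKTs matching this connection */
	size_t nkeys;
	struct mkt *current_key;	/* MKT used to sign outgoing segments */
};

/* Find the MKT whose SendID equals the RNextKeyID the peer asked for. */
static struct mkt *mkt_by_sndid(struct ao_conn *c, uint8_t id)
{
	for (size_t i = 0; i < c->nkeys; i++)
		if (c->keys[i].sndid == id)
			return &c->keys[i];
	return NULL;
}

/*
 * Meant to run after a received segment passed verification.
 * Returns 1 if current_key was switched, 0 if nothing changed.
 */
int ao_update_current_key(struct ao_conn *c, uint8_t rnext_keyid)
{
	struct mkt *requested;

	if (c->current_key && c->current_key->sndid == rnext_keyid)
		return 0;	/* the peer already asks for the key in use */

	requested = mkt_by_sndid(c, rnext_keyid);
	if (!requested)
		return 0;	/* unknown RNextKeyID: no action, no reset */

	c->current_key = requested;
	return 1;
}
```

Note that nothing in this sketch prevents switching back to a previously used MKT, which matches the "backing up" behaviour described in [6.1].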
188 | ||
189 | Q: If both peers have multiple MKTs matching the connection's socket pair | |
190 | (with different KeyIDs), how should the sender/receiver pick KeyID to use? | |
191 | ||
192 | A: Some mechanism should pick the "desired" MKT [3.3]:: | |
193 | ||
194 | Multiple MKTs may match a single outgoing segment, e.g., when MKTs | |
195 | are being changed. Those MKTs cannot have conflicting IDs (as noted | |
196 | elsewhere), and some mechanism must determine which MKT to use for each | |
197 | given outgoing segment. | |
198 | ||
199 | >> An outgoing TCP segment MUST match at most one desired MKT, indicated | |
200 | by the segment’s socket pair. The segment MAY match multiple MKTs, provided | |
201 | that exactly one MKT is indicated as desired. Other information in | |
202 | the segment MAY be used to determine the desired MKT when multiple MKTs | |
203 | match; such information MUST NOT include values in any TCP option fields. | |
204 | ||
205 | Q: Can a TCP-MD5 connection migrate to TCP-AO (and vice versa)? | |
206 | ||
207 | A: No [1]:: | |
208 | ||
209 | TCP MD5-protected connections cannot be migrated to TCP-AO because TCP MD5 | |
210 | does not support any changes to a connection’s security algorithm | |
211 | once established. | |
212 | ||
213 | Q: If all MKTs are removed on a connection, can it become a non-TCP-AO signed | |
214 | connection? | |
215 | ||
216 | A: [7.5.2] doesn't offer the same choice as SYN packet handling in [7.5.1.i] | |
217 | that would allow accepting segments without a signature (which would be insecure). | |
218 | While switching to a non-TCP-AO connection is not prohibited directly, forbidding | |
219 | it seems to be what the RFC means. Also, there's a requirement for TCP-AO | |
220 | connections to always have one current_key [3.3]:: | |
221 | ||
222 | TCP-AO requires that every protected TCP segment match exactly one MKT. | |
223 | ||
224 | [3.3]:: | |
225 | ||
226 | >> An incoming TCP segment including TCP-AO MUST match exactly one MKT, | |
227 | indicated solely by the segment’s socket pair and its TCP-AO KeyID. | |
228 | ||
229 | [4.4]:: | |
230 | ||
231 | One or more MKTs. These are the MKTs that match this connection’s | |
232 | socket pair. | |
233 | ||
234 | Q: Can a non-TCP-AO connection become a TCP-AO-enabled one? | |
235 | ||
236 | A: No: for an already established non-TCP-AO connection it would be impossible | |
237 | to switch to using TCP-AO, as the traffic key generation requires the initial | |
238 | sequence numbers. In other words, starting to use TCP-AO would require | |
239 | re-establishing the TCP connection. | |
240 | ||
241 | 2. In-kernel MKTs database vs database in userspace | |
242 | =================================================== | |
243 | ||
244 | Linux TCP-AO support is implemented using ``setsockopt()s``, in a similar way | |
245 | to TCP-MD5. It means that a userspace application that wants to use TCP-AO | |
246 | should perform ``setsockopt()`` on a TCP socket when it wants to add, | |
247 | remove or rotate MKTs. This approach moves the key management responsibility | |
248 | to userspace, as well as decisions on corner cases, e.g. what to do if | |
249 | the peer doesn't respect RNextKeyID. More code, especially the code | |
250 | responsible for policy decisions, lives in userspace. Besides, it's flexible and scales well | |
251 | (with less locking needed than in the case of an in-kernel database). One also | |
252 | should keep in mind that the main intended users are BGP processes, not | |
253 | arbitrary applications, which means that, compared to IPsec tunnels, | |
254 | no transparency is really needed, and modern BGP daemons already have | |
255 | ``setsockopt()s`` for TCP-MD5 support. | |
256 | ||
257 | .. table:: Considered pros and cons of the approaches | |
258 | ||
259 | +----------------------+------------------------+-----------------------+ | |
260 | | | ``setsockopt()`` | in-kernel DB | | |
261 | +======================+========================+=======================+ | |
262 | | Extendability | ``setsockopt()`` | Netlink messages are | | |
263 | | | commands should be | simple and extendable | | |
264 | | | extendable syscalls | | | |
265 | +----------------------+------------------------+-----------------------+ | |
266 | | Required userspace | BGP or any application | could be transparent | | |
267 | | changes | that wants TCP-AO needs| as tunnels, providing | | |
268 | | | to perform | something like | | |
269 | | | ``setsockopt()s`` | ``ip tcpao add key`` | | |
270 | | | and do key management | (delete/show/rotate) | | |
271 | +----------------------+------------------------+-----------------------+ | |
272 | |MKTs removal or adding| harder for userspace | harder for kernel | | |
273 | +----------------------+------------------------+-----------------------+ | |
274 | | Dump-ability | ``getsockopt()`` | Netlink .dump() | | |
275 | | | | callback | | |
276 | +----------------------+------------------------+-----------------------+ | |
277 | | Limits on kernel | equal | | |
278 | | resources/memory | | | |
279 | +----------------------+------------------------+-----------------------+ | |
280 | | Scalability | contention on | contention on | | |
281 | | | ``TCP_LISTEN`` sockets | the whole database | | |
282 | +----------------------+------------------------+-----------------------+ | |
283 | | Monitoring & warnings| ``TCP_DIAG`` | same Netlink socket | | |
284 | +----------------------+------------------------+-----------------------+ | |
285 | | Matching of MKTs | half-problem: only | hard | | |
286 | | | listen sockets | | | |
287 | +----------------------+------------------------+-----------------------+ | |
288 | ||
289 | ||
290 | 3. uAPI | |
291 | ======= | |
292 | ||
293 | Linux provides a set of ``setsockopt()s`` and ``getsockopt()s`` that let | |
294 | userspace manage TCP-AO on a per-socket basis. In order to add/delete MKTs, | |
295 | the ``TCP_AO_ADD_KEY`` and ``TCP_AO_DEL_KEY`` TCP socket options must be used. | |
296 | It is not allowed to add a key on an established non-TCP-AO connection | |
297 | or to remove the last key from a TCP-AO connection. | |
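
A minimal sketch of adding one MKT for an IPv4 peer is shown here. The helper name ``ao_add_key_v4()`` is hypothetical. The field names of ``struct tcp_ao_add`` and algorithm strings such as ``cmac(aes128)`` are written from memory of ``include/uapi/linux/tcp.h`` and should be verified against the installed headers. ``maclen``, ``keyflags`` and ``ifindex`` are deliberately left zeroed. On headers without TCP-AO support the helper just fails.

```c
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/tcp.h>

/*
 * Add one MKT for a single IPv4 peer on socket sk.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int ao_add_key_v4(int sk, const char *peer, const char *alg,
		  const void *key, size_t keylen,
		  unsigned char sndid, unsigned char rcvid)
{
#ifdef TCP_AO_ADD_KEY
	struct tcp_ao_add ao;
	struct sockaddr_in sin;

	memset(&ao, 0, sizeof(ao));	/* reserved fields must stay zero */
	memset(&sin, 0, sizeof(sin));

	if (keylen > sizeof(ao.key) || strlen(alg) >= sizeof(ao.alg_name)) {
		errno = EINVAL;
		return -1;
	}
	sin.sin_family = AF_INET;
	if (inet_pton(AF_INET, peer, &sin.sin_addr) != 1) {
		errno = EINVAL;
		return -1;
	}

	memcpy(&ao.addr, &sin, sizeof(sin));	/* peer the key applies to */
	strcpy(ao.alg_name, alg);		/* length checked above */
	ao.prefix = 32;				/* match exactly one address */
	ao.sndid = sndid;
	ao.rcvid = rcvid;
	ao.keylen = (unsigned char)keylen;
	memcpy(ao.key, key, keylen);

	return setsockopt(sk, IPPROTO_TCP, TCP_AO_ADD_KEY, &ao, sizeof(ao));
#else
	/* Headers predate TCP-AO: nothing to call. */
	(void)sk; (void)peer; (void)alg; (void)key;
	(void)keylen; (void)sndid; (void)rcvid;
	errno = ENOPROTOOPT;
	return -1;
#endif
}
```

A caller would typically invoke something like ``ao_add_key_v4(sk, "192.0.2.1", "cmac(aes128)", secret, secret_len, 100, 100)`` before ``connect()`` or ``listen()``, since keys cannot be added to an established non-TCP-AO connection.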
298 | ||
299 | ``setsockopt(TCP_AO_DEL_KEY)`` command may specify ``tcp_ao_del::current_key`` | |
300 | + ``tcp_ao_del::set_current`` and/or ``tcp_ao_del::rnext`` | |
301 | + ``tcp_ao_del::set_rnext`` which makes such delete "forced": it | |
302 | provides userspace a way to delete a key that's being used and atomically set | |
303 | another one instead. This is not intended for normal use and should be used | |
304 | only when the peer ignores RNextKeyID and keeps requesting/using an old key. | |
305 | It provides a way to force-delete a key that's not trusted but may break | |
306 | the TCP-AO connection. | |
307 | ||
308 | The usual/normal key-rotation can be performed with ``setsockopt(TCP_AO_INFO)``. | |
309 | It also provides a uAPI to change per-socket TCP-AO settings, such as | |
310 | ignoring ICMPs, as well as clear per-socket TCP-AO packet counters. | |
311 | The corresponding ``getsockopt(TCP_AO_INFO)`` can be used to get those | |
312 | per-socket TCP-AO settings. | |
313 | ||
314 | Another useful command is ``getsockopt(TCP_AO_GET_KEYS)``. One can use it | |
315 | to list all MKTs on a TCP socket or use a filter to get keys for a specific | |
316 | peer and/or sndid/rcvid, VRF L3 interface or get current_key/rnext_key. | |
317 | ||
318 | To repair TCP-AO connections ``setsockopt(TCP_AO_REPAIR)`` is available, | |
319 | provided that the user previously has checkpointed/dumped the socket with | |
320 | ``getsockopt(TCP_AO_REPAIR)``. | |
321 | ||
322 | A tip for scaled TCP_LISTEN sockets that may have thousands of TCP-AO | |
323 | keys: use filters in ``getsockopt(TCP_AO_GET_KEYS)`` and asynchronous | |
324 | delete with ``setsockopt(TCP_AO_DEL_KEY)``. | |
325 | ||
326 | Linux TCP-AO also provides a bunch of segment counters that can be helpful | |
327 | with troubleshooting/debugging issues. Every MKT has good/bad counters | |
328 | that reflect how many packets passed/failed verification. | |
329 | Each TCP-AO socket has the following counters: | |
330 | - for good segments (properly signed) | |
331 | - for bad segments (failed TCP-AO verification) | |
332 | - for segments with unknown keys | |
333 | - for segments where an AO signature was expected, but wasn't found | |
334 | - for the number of ignored ICMPs | |
335 | ||
336 | TCP-AO per-socket counters are also duplicated with per-netns counters, | |
337 | exposed with SNMP. Those are ``TCPAOGood``, ``TCPAOBad``, ``TCPAOKeyNotFound``, | |
338 | ``TCPAORequired`` and ``TCPAODroppedIcmps``. | |
339 | ||
340 | RFC 5925 very permissively specifies how TCP port matching can be done for | |
341 | MKTs:: | |
342 | ||
343 | TCP connection identifier. A TCP socket pair, i.e., a local IP | |
344 | address, a remote IP address, a TCP local port, and a TCP remote port. | |
345 | Values can be partially specified using ranges (e.g., 2-30), masks | |
346 | (e.g., 0xF0), wildcards (e.g., "*"), or any other suitable indication. | |
347 | ||
348 | Currently, the Linux TCP-AO implementation doesn't provide any TCP port matching. | |
349 | Port ranges are probably the most flexible option for the uAPI, but they are | |
350 | not implemented so far. | |
351 | ||
352 | 4. ``setsockopt()`` vs ``accept()`` race | |
353 | ======================================== | |
354 | ||
355 | In contrast with an established TCP-MD5 connection, which has just one key, | |
356 | TCP-AO connections may have many keys, which means that accepted connections | |
357 | on a listen socket may have any number of keys as well. Copying all those | |
358 | keys on a first properly signed SYN would make the request socket bigger, which | |
359 | would be undesirable. Currently, the implementation doesn't copy keys | |
360 | to request sockets, but rather looks them up on the "parent" listener socket. | |
361 | ||
362 | The result is that when userspace removes TCP-AO keys, that may break | |
363 | not-yet-established connections on request sockets, while the keys are not removed | |
364 | from sockets that were already established but not yet ``accept()``'ed, | |
365 | hanging in the accept queue. | |
366 | ||
367 | The reverse is valid as well: if userspace adds a new key for a peer on | |
368 | a listener socket, the established sockets in accept queue won't | |
369 | have the new keys. | |
370 | ||
371 | At this moment, the resolution for the two races: | |
372 | ``setsockopt(TCP_AO_ADD_KEY)`` vs ``accept()`` | |
373 | and ``setsockopt(TCP_AO_DEL_KEY)`` vs ``accept()`` is delegated to userspace. | |
374 | This means that it's expected that userspace would check the MKTs on the socket | |
375 | that was returned by ``accept()`` to verify that any key rotation that | |
376 | happened on listen socket is reflected on the newly established connection. | |
377 | ||
378 | This is a similar "do-nothing" approach to TCP-MD5 from the kernel side and | |
379 | may be changed later by introducing new flags to ``tcp_ao_add`` | |
380 | and ``tcp_ao_del``. | |
381 | ||
382 | Note that this race is rare, as it requires TCP-AO key rotation to happen | |
383 | during the 3-way handshake for the new TCP connection. | |
384 | ||
385 | 5. Interaction with TCP-MD5 | |
386 | =========================== | |
387 | ||
388 | A TCP connection cannot migrate between TCP-AO and TCP-MD5 options. | |
389 | Established sockets that have either AO or MD5 keys are not allowed to | |
390 | add keys of the other option. | |
391 | ||
392 | For listening sockets the picture is different: a BGP server may want to accept | |
393 | both TCP-AO and (deprecated) TCP-MD5 clients. As a result, both types of keys | |
394 | may be added to TCP_CLOSED or TCP_LISTEN sockets. It's not allowed to add | |
395 | different types of keys for the same peer. | |
396 | ||
397 | 6. SNE Linux implementation | |
398 | =========================== | |
399 | ||
400 | RFC 5925 [6.2] describes the algorithm of how to extend TCP sequence numbers | |
401 | with SNE. In short: TCP has to track the previous sequence numbers and set | |
402 | sne_flag when the current SEQ number rolls over. The flag is cleared when | |
403 | both current and previous SEQ numbers cross 0x7fff, which is 32Kb. | |
404 | ||
405 | While sne_flag is set, the algorithm compares SEQ for each packet with | |
406 | 0x7fff and, if it's higher than 32Kb, assumes that the packet should be | |
407 | verified with the SNE value from before the increment. As a result, there's | |
408 | a [0; 32Kb] window in which packets with (SNE - 1) can be accepted. | |
409 | ||
410 | The Linux implementation simplifies this a bit: as the network stack already tracks | |
411 | the first SEQ byte that ACK is wanted for (snd_una) and the next SEQ byte that | |
412 | is wanted (rcv_nxt), that's enough information for a rough estimation | |
413 | of where in the 4GB SEQ number space both sender and receiver are. | |
414 | When they roll over to zero, the corresponding SNE gets incremented. | |
415 | ||
416 | tcp_ao_compute_sne() is called for each TCP-AO segment. It compares SEQ numbers | |
417 | from the segment with snd_una or rcv_nxt and fits the result into a 2GB window around them, | |
418 | detecting SEQ numbers rolling over. That simplifies the code a lot and only | |
419 | requires SNE numbers to be stored on every TCP-AO socket. | |
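
The comparison described above can be sketched in a few lines. This is a userspace illustration under stated assumptions, not a copy of the kernel function: ``seq_before()`` and ``compute_sne()`` are names chosen for this example, and ``next_seq`` stands for whichever tracked position (snd_una or rcv_nxt) applies to the segment.

```c
#include <stdint.h>

/* Wrap-safe "a is before b" in the 32-bit sequence space. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Pick the SNE for a segment with sequence number seq, given the SNE
 * (next_sne) that belongs to the tracked position next_seq.
 */
uint32_t compute_sne(uint32_t next_sne, uint32_t next_seq, uint32_t seq)
{
	uint32_t sne = next_sne;

	if (seq_before(seq, next_seq)) {
		/* seq is behind next_seq but numerically larger:
		 * next_seq has rolled over, seq has not yet. */
		if (seq > next_seq)
			sne--;
	} else {
		/* seq is ahead of next_seq but numerically smaller:
		 * seq has already rolled over. */
		if (seq < next_seq)
			sne++;
	}
	return sne;
}
```

For example, with ``next_seq = 0xfffffff0`` and a segment at ``seq = 0x10``, the segment lies just past the rollover and gets ``next_sne + 1``. A late segment at ``seq = 0xfffffff0`` arriving when ``next_seq = 0x10`` gets ``next_sne - 1``.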
420 | ||
421 | The 2GB window at first glance seems much more permissive compared to | |
422 | RFC 5925. But it is only used to pick the correct SNE before/after | |
423 | a rollover. It allows more TCP segment replays, but all the regular | |
424 | TCP checks in tcp_sequence() are still applied to the verified segment. | |
425 | So, it trades a slightly more permissive acceptance of replayed/retransmitted | |
426 | segments for the simplicity of the algorithm and what seems to be better behaviour | |
427 | for large TCP windows. | |
428 | ||
429 | 7. Links | |
430 | ======== | |
431 | ||
432 | RFC 5925 The TCP Authentication Option | |
433 | https://www.rfc-editor.org/rfc/pdfrfc/rfc5925.txt.pdf | |
434 | ||
435 | RFC 5926 Cryptographic Algorithms for the TCP Authentication Option (TCP-AO) | |
436 | https://www.rfc-editor.org/rfc/pdfrfc/rfc5926.txt.pdf | |
437 | ||
438 | Draft "SHA-2 Algorithm for the TCP Authentication Option (TCP-AO)" | |
439 | https://datatracker.ietf.org/doc/html/draft-nayak-tcp-sha2-03 | |
440 | ||
441 | RFC 2385 Protection of BGP Sessions via the TCP MD5 Signature Option | |
442 | https://www.rfc-editor.org/rfc/pdfrfc/rfc2385.txt.pdf | |
443 | ||
444 | :Author: Dmitry Safonov <dima@arista.com> |