.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and set the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

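The calls above omit error handling. As a minimal sketch, the ULP setup can
be wrapped in a helper that reports failure as a negative errno value. The
name ``ktls_attach()`` is hypothetical and not part of any kernel or libc
API, ``IPPROTO_TCP`` is used in place of ``SOL_TCP`` (both have the same
value on Linux), and the ``TCP_ULP`` fallback definition is an assumption
for older libc headers.

.. code-block:: c

  #include <errno.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  #ifndef TCP_ULP
  #define TCP_ULP 31      /* assumed fallback for older libc headers */
  #endif

  /*
   * Hypothetical helper: attach the "tls" ULP to a TCP socket.
   * Returns 0 on success or a negative errno value on failure.
   */
  static int ktls_attach(int sock)
  {
          if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
                  return -errno;
          return 0;
  }

A caller would typically invoke this helper on a connected socket and fall
back to a userspace record layer if it returns an error.
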
Setting the TLS ULP allows us to set/get TLS socket options. Currently
only the symmetric encryption is handled in the kernel. After the TLS
handshake is complete, we have all the parameters required to move the
data-path to the kernel. There are separate socket options for moving
the transmit path and the receive path into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };


  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.

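Because the setup for both directions differs only in the option name, it
can be factored into a helper. The sketch below assumes TLS 1.2 with
AES-GCM-128, as in the example above. ``struct ktls_keys``,
``ktls_fill_info()`` and ``ktls_set_keys()`` are hypothetical names used for
illustration, and the ``SOL_TLS`` fallback definition is an assumption for
older libc headers.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282     /* assumed fallback for older libc headers */
  #endif

  /* Hypothetical container for the secrets of one direction. */
  struct ktls_keys {
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };

  /* Fill a TLS 1.2 AES-GCM-128 crypto_info from one direction's secrets. */
  static void ktls_fill_info(struct tls12_crypto_info_aes_gcm_128 *ci,
                             const struct ktls_keys *k)
  {
          memset(ci, 0, sizeof(*ci));
          ci->info.version = TLS_1_2_VERSION;
          ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
          memcpy(ci->key, k->key, sizeof(ci->key));
          memcpy(ci->salt, k->salt, sizeof(ci->salt));
          memcpy(ci->iv, k->iv, sizeof(ci->iv));
          memcpy(ci->rec_seq, k->rec_seq, sizeof(ci->rec_seq));
  }

  /* Install keys for one direction; optname is TLS_TX or TLS_RX. */
  static int ktls_set_keys(int sock, int optname, const struct ktls_keys *k)
  {
          struct tls12_crypto_info_aes_gcm_128 ci;

          ktls_fill_info(&ci, k);
          return setsockopt(sock, SOL_TLS, optname, &ci, sizeof(ci));
  }

After the handshake, a caller would invoke ``ktls_set_keys()`` once with
TLS_TX and the write secrets, and once with TLS_RX and the read secrets.
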
Sending TLS application data
----------------------------

After setting the TLS_TX socket option, all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

If possible, send() data is encrypted directly from the userspace buffer
provided into the kernel send buffer.

The sendfile system call will send the file's data in TLS records of maximum
length (2^14 bytes).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

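Since each record carries at most 2^14 bytes of data, the number of records
needed for a file follows from a ceiling division. The sketch below
illustrates the arithmetic, assuming every record except possibly the last
is filled to the maximum. ``KTLS_MAX_RECORD_PAYLOAD`` and
``tls_records_needed()`` are hypothetical names, not kernel definitions.

.. code-block:: c

  #include <stddef.h>

  /* 2^14 bytes; hypothetical name for the maximum record payload. */
  #define KTLS_MAX_RECORD_PAYLOAD 16384u

  /* Number of maximum-length records needed to carry file_size bytes. */
  static size_t tls_records_needed(size_t file_size)
  {
          return file_size / KTLS_MAX_RECORD_PAYLOAD +
                 (file_size % KTLS_MAX_RECORD_PAYLOAD != 0);
  }

For example, a 100000 byte file needs six full records plus one partial
record of 1696 bytes.
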
TLS records are created and sent after each send() call, unless
MSG_MORE is passed. MSG_MORE will delay creation of a record until
MSG_MORE is not passed, or the maximum record size is reached.

The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed. If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.

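As an illustration of MSG_MORE, the following sketch queues a header with
MSG_MORE and then sends the body without it, so that both parts can share
one record as long as their combined length fits within the maximum record
size. ``send_coalesced()`` is a hypothetical helper. For brevity it does not
retry short writes, which a real caller would have to handle.

.. code-block:: c

  #include <sys/socket.h>
  #include <sys/types.h>

  /*
   * Hypothetical helper: send hdr with MSG_MORE, then body without it.
   * Returns the total number of bytes accepted, or -1 on error, in which
   * case errno is the value set by send().
   */
  static ssize_t send_coalesced(int sock, const void *hdr, size_t hdr_len,
                                const void *body, size_t body_len)
  {
          ssize_t n1, n2;

          n1 = send(sock, hdr, hdr_len, MSG_MORE);
          if (n1 < 0)
                  return -1;

          n2 = send(sock, body, body_len, 0);
          if (n2 < 0)
                  return -1;

          return n1 + n2;
  }
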
Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, data returned by all recv family
socket calls is decrypted using the TLS parameters provided. A full TLS
record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, 16384, 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur. If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.

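A caller may want to translate these errno values into diagnostics. The
sketch below maps the three codes listed above to short descriptions.
``ktls_recv_error()`` is a hypothetical helper, and the strings are
illustrative only. Note that ``EINVAL`` in particular can also be caused by
invalid arguments unrelated to TLS.

.. code-block:: c

  #include <errno.h>

  /* Hypothetical helper: describe a recv() failure on a TLS_RX socket. */
  static const char *ktls_recv_error(int err)
  {
          switch (err) {
          case EINVAL:
                  return "TLS version mismatch";
          case EMSGSIZE:
                  return "record too big";
          case EBADMSG:
                  return "decryption failed";
          default:
                  return "other error";
          }
  }
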
Send TLS control messages
-------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
          struct msghdr msg = {0};
          int cmsg_len = sizeof(record_type);
          struct cmsghdr *cmsg;
          char buf[CMSG_SPACE(cmsg_len)];
          struct iovec msg_iov;   /* Vector of data to send/receive into. */

          msg.msg_control = buf;
          msg.msg_controllen = sizeof(buf);
          cmsg = CMSG_FIRSTHDR(&msg);
          cmsg->cmsg_level = SOL_TLS;
          cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
          cmsg->cmsg_len = CMSG_LEN(cmsg_len);
          *CMSG_DATA(cmsg) = record_type;
          msg.msg_controllen = cmsg->cmsg_len;

          msg_iov.iov_base = data;
          msg_iov.iov_len = length;
          msg.msg_iov = &msg_iov;
          msg.msg_iovlen = 1;

          return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.

Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg. If no cmsg buffer is provided, an error is
returned if a control message is received. Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = 16384;

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
          int record_type = *((unsigned char *)CMSG_DATA(cmsg));
          // Do something with record_type, and control message data in
          // buffer.
          //
          // Note that record_type may be == to application data (23).
  } else {
          // Buffer contains application data.
  }

A single recv call will never return data from mixed types of TLS records.

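The record type lookup can also be written as a loop over all control
messages, which stays correct if other cmsgs are attached to the same
message. The sketch below is one way to do this. ``ktls_record_type()`` is a
hypothetical helper, and the ``SOL_TLS`` fallback definition is an
assumption for older libc headers.

.. code-block:: c

  #include <stddef.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282     /* assumed fallback for older libc headers */
  #endif

  /*
   * Hypothetical helper: return the TLS record type attached to a message
   * filled in by recvmsg(), or -1 if no TLS_GET_RECORD_TYPE cmsg is present.
   */
  static int ktls_record_type(struct msghdr *msg)
  {
          struct cmsghdr *cmsg;

          for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
                  if (cmsg->cmsg_level == SOL_TLS &&
                      cmsg->cmsg_type == TLS_GET_RECORD_TYPE)
                          return *(unsigned char *)CMSG_DATA(cmsg);
          }
          return -1;
  }

A return value of 21 would then indicate an alert and 22 a handshake
message, matching the record types mentioned earlier.
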
Integrating into userspace TLS library
--------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls is also available.
Since it doesn't implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. Those optimizations are either not universally beneficial
or may impact correctness, hence they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).

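As a rough sketch of the opt-in, the helpers below set and query one of the
options described in the following subsections. ``ktls_set_opt()`` and
``ktls_get_opt()`` are hypothetical names. The sketch assumes that the
option value is a 4-byte integer flag, and the fallback definitions are
assumptions for older headers that should be checked against
``linux/tls.h`` on the target system.

.. code-block:: c

  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282             /* assumed fallback for older headers */
  #endif
  #ifndef TLS_TX_ZEROCOPY_RO
  #define TLS_TX_ZEROCOPY_RO 3    /* assumed fallback for older headers */
  #endif
  #ifndef TLS_RX_EXPECT_NO_PAD
  #define TLS_RX_EXPECT_NO_PAD 4  /* assumed fallback for older headers */
  #endif

  /* Hypothetical helper: enable or disable one optimization on a socket. */
  static int ktls_set_opt(int sock, int optname, int enable)
  {
          unsigned int val = enable ? 1 : 0;

          return setsockopt(sock, SOL_TLS, optname, &val, sizeof(val));
  }

  /* Hypothetical helper: returns 1 if set, 0 if clear, -1 on error. */
  static int ktls_get_opt(int sock, int optname)
  {
          unsigned int val = 0;
          socklen_t len = sizeof(val);

          if (getsockopt(sock, SOL_TLS, optname, &val, &len) < 0)
                  return -1;
          return val != 0;
  }
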
TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy. This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words, this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look like TLS records have been tampered with and will result
in record authentication failures.

TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender to not pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted;
otherwise an attacker can exploit it to double the TLS processing cost.

If the decrypted record turns out to have been padded or is not a data
record, it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.

Statistics
==========

The TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.

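
To read one of the counters listed above from a program, a simple parser is
enough. The sketch below assumes that each line of the file consists of a
counter name followed by whitespace and an unsigned decimal value. That
layout is an assumption to verify against the running kernel, and
``tls_stat_lookup()`` is a hypothetical helper.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /*
   * Hypothetical helper: look up one counter in a stream that holds one
   * "<name> <value>" pair per line. Returns 0 and stores the value on
   * success, or -1 if the counter is not found.
   */
  static int tls_stat_lookup(FILE *f, const char *name, unsigned long *value)
  {
          char key[64];
          unsigned long val;

          while (fscanf(f, "%63s %lu", key, &val) == 2) {
                  if (strcmp(key, name) == 0) {
                          *value = val;
                          return 0;
                  }
          }
          return -1;
  }

A caller would open ``/proc/net/tls_stat`` with fopen() and pass the stream
together with a counter name such as ``TlsDecryptRetry``.
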