Commit | Line | Data |
---|---|---|
c4a0eb93 | 1 | .. SPDX-License-Identifier: GPL-2.0 |
6c8f7c43 | 2 | .. _xfrm_device: |
5c0bb261 SN |
3 | |
4 | =============================================== | |
5 | XFRM device - offloading the IPsec computations | |
6 | =============================================== | |
c4a0eb93 | 7 | |
5c0bb261 | 8 | Shannon Nelson <shannon.nelson@oracle.com> |
2b7c72e0 | 9 | Leon Romanovsky <leonro@nvidia.com> |
5c0bb261 SN |
10 | |
11 | ||
12 | Overview | |
13 | ======== | |
14 | ||
15 | IPsec is a useful feature for securing network traffic, but the | |
16 | computational cost is high: a 10Gbps link can easily be brought down | |
17 | to under 1Gbps, depending on the traffic and link configuration. | |
18 | Luckily, there are NICs that offer a hardware based IPsec offload which | |
19 | can radically increase throughput and decrease CPU utilization. The XFRM | |
20 | Device interface allows NIC drivers to offer to the stack access to the | |
21 | hardware offload. | |
22 | ||
2b7c72e0 LR |
23 | Right now, there are two types of hardware offload that the kernel supports. |
24 | * IPsec crypto offload: | |
25 | * NIC performs encrypt/decrypt | |
26 | * Kernel does everything else | |
27 | * IPsec packet offload: | |
28 | * NIC performs encrypt/decrypt | |
29 | * NIC does encapsulation | |
30 | * Kernel and NIC keep SA and policy in sync | |
31 | * NIC handles the SA and policy states | |
32 | * Kernel talks to the key manager | |
33 | ||
5c0bb261 SN |
34 | Userland access to the offload is typically through a system such as |
35 | libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can | |
36 | be handy when experimenting. An example command might look something | |
2b7c72e0 | 37 | like this for crypto offload: |
5c0bb261 SN |
38 | |
39 | ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \ | |
40 | reqid 0x07 replay-window 32 \ | |
41 | aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \ | |
42 | sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \ | |
43 | offload dev eth4 dir in | |
44 | ||
2b7c72e0 LR |
45 | and for packet offload: |
46 | ||
47 | ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \ | |
48 | reqid 0x07 replay-window 32 \ | |
49 | aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \ | |
50 | sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \ | |
51 | offload packet dev eth4 dir in | |
52 | ||
53 | ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \ | |
54 | tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport | |
55 | ||
5c0bb261 SN |
56 | Yes, that's ugly, but that's what shell scripts and/or libreswan are for. |
57 | ||
58 | ||
59 | ||
60 | Callbacks to implement | |
61 | ====================== | |
62 | ||
c4a0eb93 MCC |
63 | :: |
64 | ||
65 | /* from include/linux/netdevice.h */ | |
66 | struct xfrmdev_ops { | |
2b7c72e0 | 67 | /* Crypto and Packet offload callbacks */ |
7681a4f5 | 68 | int (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack); |
5c0bb261 SN |
69 | void (*xdo_dev_state_delete) (struct xfrm_state *x); |
70 | void (*xdo_dev_state_free) (struct xfrm_state *x); | |
71 | bool (*xdo_dev_offload_ok) (struct sk_buff *skb, | |
72 | struct xfrm_state *x); | |
50bd870a | 73 | void (*xdo_dev_state_advance_esn) (struct xfrm_state *x); |
2b7c72e0 LR |
74 | |
75 | /* Solely packet offload callbacks */ | |
76 | void (*xdo_dev_state_update_curlft) (struct xfrm_state *x); | |
3089386d | 77 | int (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack); |
2b7c72e0 LR |
78 | void (*xdo_dev_policy_delete) (struct xfrm_policy *x); |
79 | void (*xdo_dev_policy_free) (struct xfrm_policy *x); | |
c4a0eb93 | 80 | }; |
5c0bb261 | 81 | |
2b7c72e0 LR |
82 | The NIC driver offering IPsec offload will need to implement the callbacks |
83 | relevant to the supported offload type to make the offload available to the network | |
84 | stack's XFRM subsystem. Additionally, the feature bits NETIF_F_HW_ESP and | |
5c0bb261 SN |
85 | NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload. |
86 | ||
87 | ||
88 | ||
89 | Flow | |
90 | ==== | |
91 | ||
92 | At probe time and before the call to register_netdev(), the driver should | |
93 | set up local data structures and XFRM callbacks, and set the feature bits. | |
94 | The XFRM code's listener will finish the setup on NETDEV_REGISTER. | |
95 | ||
c4a0eb93 MCC |
96 | :: |
97 | ||
5c0bb261 SN |
98 | adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops; |
99 | adapter->netdev->features |= NETIF_F_HW_ESP; | |
100 | adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP; | |
101 | ||
102 | When new SAs are set up with a request for the "offload" feature, the | |
103 | driver's xdo_dev_state_add() will be given the new SA to be offloaded | |
104 | and an indication of whether it is for Rx or Tx. The driver should | |
c4a0eb93 | 105 | |
5c0bb261 SN |
106 | - verify the algorithm is supported for offloads |
107 | - store the SA information (key, salt, target-ip, protocol, etc) | |
108 | - enable the HW offload of the SA | |
4a132095 | 109 | - return status value: |
c4a0eb93 MCC |
110 | |
111 | =========== =================================== | |
4a132095 | 112 | 0 success |
2b7c72e0 LR |
113 | -EOPNOTSUPP offload not supported, try SW IPsec, |
114 | not applicable for packet offload mode | |
4a132095 | 115 | other fail the request |
c4a0eb93 | 116 | =========== =================================== |
5c0bb261 SN |
117 | |
118 | The driver can also set an offload_handle in the SA, an opaque void pointer | |
c4a0eb93 | 119 | that can be used to convey context into the fast-path offload requests:: |
5c0bb261 SN |
120 | |
121 | xs->xso.offload_handle = context; | |
122 | ||
123 | ||
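The steps above can be sketched in plain C. This is a minimal userspace model, not driver code: `demo_xfrm_state`, `demo_sa_entry`, `demo_sa_table` and `demo_state_add()` are hypothetical stand-ins for `struct xfrm_state`, a driver's private SA table and its xdo_dev_state_add() callback, and the supported algorithm and key size are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types so the sketch builds in userspace. */
enum demo_dir { DEMO_DIR_IN, DEMO_DIR_OUT };

struct demo_xfrm_state {
	const char *aead_name;   /* algorithm name requested for the SA */
	unsigned int key_bits;   /* cipher key length, salt excluded */
	enum demo_dir dir;       /* Rx or Tx, as passed with the request */
	void *offload_handle;    /* mirrors xs->xso.offload_handle */
};

/* Hypothetical per-SA context a driver might keep for its hardware. */
struct demo_sa_entry {
	int in_use;
	enum demo_dir dir;
	unsigned int key_bits;
};

#define DEMO_SA_SLOTS 2
static struct demo_sa_entry demo_sa_table[DEMO_SA_SLOTS];

/*
 * Modelled on xdo_dev_state_add(): returns 0 on success, -EOPNOTSUPP when
 * the SA cannot be offloaded, and another negative errno to fail the request.
 */
static int demo_state_add(struct demo_xfrm_state *xs)
{
	size_t i;

	/* Verify the algorithm is one this imaginary NIC can offload. */
	if (!xs->aead_name || strcmp(xs->aead_name, "rfc4106(gcm(aes))") != 0)
		return -EOPNOTSUPP;
	if (xs->key_bits != 128)
		return -EOPNOTSUPP;

	/* Find a free slot to store the SA information. */
	for (i = 0; i < DEMO_SA_SLOTS; i++)
		if (!demo_sa_table[i].in_use)
			break;
	if (i == DEMO_SA_SLOTS)
		return -ENOSPC;

	demo_sa_table[i].in_use = 1;
	demo_sa_table[i].dir = xs->dir;
	demo_sa_table[i].key_bits = xs->key_bits;

	/* A real driver would program the key and enable the HW SA here. */

	/* Keep the context so the fast path can find it again. */
	xs->offload_handle = &demo_sa_table[i];
	return 0;
}
```

Returning a distinct error such as -ENOSPC for a full table, rather than -EOPNOTSUPP, is a choice made in this sketch; it fails the request instead of falling back to SW IPsec.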
124 | When the network stack is preparing an IPsec packet for an SA that has | |
125 | been set up for offload, it first calls into xdo_dev_offload_ok() with | |
126 | the skb and the intended offload state to ask the driver if the offload | |
127 | will be serviceable. This can check the packet information to be sure the | |
128 | offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc.) and | |
129 | return true or false to signify its support. | |
130 | ||
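As an illustration of the kind of check xdo_dev_offload_ok() might perform, the sketch below inspects a raw L3 header for the IP version and for IPv4 options. It is a simplified userspace model: `demo_offload_ok()` is a hypothetical name, it takes a byte buffer instead of an skb and xfrm_state, and a real driver would likely also look at IPv6 extension headers and other hardware limits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check modelled on xdo_dev_offload_ok(). Returns true when
 * the imaginary hardware could handle the packet, false otherwise.
 */
static bool demo_offload_ok(const uint8_t *l3_hdr, size_t len)
{
	uint8_t version;

	if (!l3_hdr || len < 1)
		return false;

	/* The IP version lives in the top nibble of the first byte. */
	version = l3_hdr[0] >> 4;

	if (version == 4) {
		if (len < 20)
			return false;
		/* IHL counts 32-bit words; 5 means no IPv4 options. */
		return (l3_hdr[0] & 0x0f) == 5;
	}

	if (version == 6)
		return len >= 40;   /* fixed IPv6 header is present */

	return false;
}
```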
2b7c72e0 | 131 | Crypto offload mode: |
5c0bb261 SN |
132 | When ready to send, the driver needs to inspect the Tx packet for the |
133 | offload information, including the opaque context, and set up the packet | |
c4a0eb93 | 134 | send accordingly:: |
5c0bb261 SN |
135 | |
136 | xs = xfrm_input_state(skb); | |
137 | context = xs->xso.offload_handle; | |
138 | set up HW for send | |
139 | ||
140 | The stack has already inserted the appropriate IPsec headers in the | |
141 | packet data; the offload just needs to do the encryption and fix up the | |
142 | header values. | |
143 | ||
144 | ||
145 | When a packet is received and the HW has indicated that it offloaded a | |
146 | decryption, the driver needs to add a reference to the decoded SA into | |
147 | the packet's skb. At this point the data should be decrypted but the | |
148 | IPsec headers are still in the packet data; they are removed later up | |
149 | the stack in xfrm_input(). | |
150 | ||
c4a0eb93 MCC |
151 | find and hold the SA that was used for the Rx skb:: |
152 | ||
5c0bb261 SN |
153 | get spi, protocol, and destination IP from packet headers |
154 | xs = find xs from (spi, protocol, dest_IP) | |
155 | xfrm_state_hold(xs); | |
156 | ||
c4a0eb93 MCC |
157 | store the state information into the skb:: |
158 | ||
4165079b FW |
159 | sp = secpath_set(skb); |
160 | if (!sp) return; | |
161 | sp->xvec[sp->len++] = xs; | |
162 | sp->olen++; | |
5c0bb261 | 163 | |
c4a0eb93 MCC |
164 | indicate the success and/or error status of the offload:: |
165 | ||
5c0bb261 SN |
166 | xo = xfrm_offload(skb); |
167 | xo->flags = CRYPTO_DONE; | |
168 | xo->status = crypto_status; | |
169 | ||
170 | hand the packet to napi_gro_receive() as usual | |
171 | ||
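The Rx steps above can be strung together as in the following sketch. It is a userspace model under stated assumptions: `demo_state`, `demo_sec_path`, `demo_offload`, `demo_skb`, `DEMO_CRYPTO_DONE` and both functions are hypothetical stand-ins for the kernel's xfrm_state, sec_path, xfrm_offload, sk_buff and CRYPTO_DONE, and the linear table lookup only stands in for whatever lookup structure a driver actually uses.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SEC_PATH_MAX 4     /* hypothetical secpath depth */
#define DEMO_CRYPTO_DONE  1u    /* stand-in for the CRYPTO_DONE flag */

struct demo_state { uint32_t spi; uint8_t proto; uint32_t daddr; int refcnt; };
struct demo_sec_path { struct demo_state *xvec[DEMO_SEC_PATH_MAX]; int len; int olen; };
struct demo_offload { uint32_t flags; uint32_t status; };
struct demo_skb { struct demo_sec_path sp; struct demo_offload xo; };

/* Find the SA matching (spi, protocol, destination IP), or NULL. */
static struct demo_state *demo_find_state(struct demo_state *tbl, size_t n,
					  uint32_t spi, uint8_t proto,
					  uint32_t daddr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].spi == spi && tbl[i].proto == proto &&
		    tbl[i].daddr == daddr)
			return &tbl[i];
	return NULL;
}

/* Returns 0 once the SA is attached to the skb, -1 if that is not possible. */
static int demo_rx_attach(struct demo_skb *skb, struct demo_state *tbl,
			  size_t n, uint32_t spi, uint8_t proto,
			  uint32_t daddr, uint32_t crypto_status)
{
	struct demo_state *xs = demo_find_state(tbl, n, spi, proto, daddr);

	if (!xs || skb->sp.len >= DEMO_SEC_PATH_MAX)
		return -1;

	xs->refcnt++;                        /* models xfrm_state_hold(xs) */

	skb->sp.xvec[skb->sp.len++] = xs;    /* store the state in the skb */
	skb->sp.olen++;

	skb->xo.flags = DEMO_CRYPTO_DONE;    /* report the offload result */
	skb->xo.status = crypto_status;
	return 0;
}
```

After a successful attach, a real driver would hand the packet to napi_gro_receive() as usual.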
50bd870a YE |
172 | In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn(). |
173 | The driver will check the packet sequence number and update the HW ESN state machine if needed. | |
5c0bb261 | 174 | |
2b7c72e0 LR |
175 | Packet offload mode: |
176 | The HW adds and removes the XFRM headers, so in the RX path the XFRM stack is bypassed if the HW | |
177 | reported success. In the TX path, the packet leaves the kernel unencrypted and without the extra | |
178 | headers; the HW is responsible for adding both. | |
179 | ||
5c0bb261 | 180 | When the SA is removed by the user, the driver's xdo_dev_state_delete() |
2b7c72e0 LR |
181 | and xdo_dev_policy_delete() are asked to disable the offload. Later, |
182 | xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage | |
183 | collection routine after all reference counts to the state and policy | |
5c0bb261 SN |
184 | have been removed and any remaining resources can be cleared for the |
185 | offload state. How these are used by the driver will depend on specific | |
186 | hardware needs. | |
187 | ||
188 | When a netdev is set to DOWN, the XFRM stack's netdev listener will call | |
2b7c72e0 LR |
189 | xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and |
190 | xdo_dev_policy_free() on any remaining offloaded states. | |
191 | ||
192 | Because the HW handles the packets, the XFRM core can't count hard and soft limits. | |
193 | The HW/driver is responsible for tracking them and providing accurate data when | |
194 | xdo_dev_state_update_curlft() is called. If one of these limits is | |
195 | reached, the driver needs to call xfrm_state_check_expire() to make sure | |
196 | that XFRM performs the rekeying sequence. |
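The limit accounting described above can be modelled roughly as follows. This is a userspace sketch with hypothetical names: `demo_sa`, `demo_update_curlft()` and `demo_check_expire()` only imitate the roles of the offloaded state, xdo_dev_state_update_curlft() and xfrm_state_check_expire(), and only packet counts are tracked, whereas real hardware may need to count bytes as well.

```c
#include <stdint.h>

/* Hypothetical lifetime configuration and counters for one offloaded SA. */
struct demo_lifetime { uint64_t soft_packet_limit; uint64_t hard_packet_limit; };
struct demo_curlft { uint64_t packets; };

struct demo_sa {
	struct demo_lifetime lft;   /* limits configured for the SA */
	struct demo_curlft curlft;  /* what the stack sees */
	uint64_t hw_packets;        /* counter maintained by the imaginary HW */
};

enum demo_expire { DEMO_LFT_OK, DEMO_LFT_SOFT, DEMO_LFT_HARD };

/* Models the update callback: copy the HW counter into the stack's view. */
static void demo_update_curlft(struct demo_sa *sa)
{
	sa->curlft.packets = sa->hw_packets;
}

/* Models the expiry check: the hard limit takes priority over the soft one. */
static enum demo_expire demo_check_expire(const struct demo_sa *sa)
{
	if (sa->curlft.packets >= sa->lft.hard_packet_limit)
		return DEMO_LFT_HARD;
	if (sa->curlft.packets >= sa->lft.soft_packet_limit)
		return DEMO_LFT_SOFT;
	return DEMO_LFT_OK;
}
```

In this model a soft expiry corresponds to the point where rekeying should start, and a hard expiry to the point where the SA can no longer be used.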