Commit | Line | Data |
---|---|---|
1b23f5e9 OS |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
b83eb68c OS |
3 | ===================== |
4 | Segmentation Offloads | |
5 | ===================== | |
1b23f5e9 | 6 | |
f7a6272b AD |
7 | |
8 | Introduction | |
9 | ============ | |
10 | ||
11 | This document describes a set of techniques in the Linux networking stack | |
12 | to take advantage of segmentation offload capabilities of various NICs. | |
13 | ||
14 | The following technologies are described: | |
15 | * TCP Segmentation Offload - TSO | |
16 | * UDP Fragmentation Offload - UFO | |
17 | * IPIP, SIT, GRE, and UDP Tunnel Offloads | |
18 | * Generic Segmentation Offload - GSO | |
19 | * Generic Receive Offload - GRO | |
20 | * Partial Generic Segmentation Offload - GSO_PARTIAL | |
ba3c4385 | 21 | * SCTP acceleration with GSO - GSO_BY_FRAGS |
f7a6272b | 22 | |
1b23f5e9 | 23 | |
f7a6272b AD |
24 | TCP Segmentation Offload |
25 | ======================== | |
26 | ||
27 | TCP segmentation allows a device to segment a single frame into multiple | |
28 | frames with a data payload size specified in skb_shinfo()->gso_size. | |
3d07e074 DA |
29 | When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or |
30 | SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and | |
f7a6272b AD |
31 | skb_shinfo()->gso_size should be set to a non-zero value. |
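
The two conditions above, and the number of frames a device would emit, can be sketched in plain userspace C. This is only an illustrative model: `struct demo_shinfo`, the `DEMO_GSO_*` bits and both helper functions are hypothetical names invented for this sketch, and the bit values are placeholders rather than the kernel's SKB_GSO_* definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bits standing in for SKB_GSO_TCPV4 / SKB_GSO_TCPV6.
 * The values are arbitrary for this sketch, not the kernel's. */
enum demo_gso_type {
	DEMO_GSO_TCPV4 = 1u << 0,
	DEMO_GSO_TCPV6 = 1u << 1,
};

/* Hypothetical miniature of the two fields reached via skb_shinfo(). */
struct demo_shinfo {
	unsigned int gso_type;
	unsigned short gso_size;
};

/* A TSO request is well formed when a TCP type bit is set and
 * gso_size is non-zero. */
static bool demo_tso_request_ok(const struct demo_shinfo *si)
{
	bool is_tcp = (si->gso_type & (DEMO_GSO_TCPV4 | DEMO_GSO_TCPV6)) != 0;

	return is_tcp && si->gso_size != 0;
}

/* Frames produced for payload_len bytes of TCP payload:
 * ceil(payload_len / gso_size), or 0 for a malformed request. */
static size_t demo_tso_segs(const struct demo_shinfo *si, size_t payload_len)
{
	if (!demo_tso_request_ok(si))
		return 0;
	return (payload_len + si->gso_size - 1) / si->gso_size;
}
```

With gso_size set to 1448, a 4000 byte payload yields three frames under this model, the last one shorter than the others.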
32 | ||
33 | TCP segmentation is dependent on support for the use of partial checksum | |
34 | offload. For this reason TSO is normally disabled if the Tx checksum | |
35 | offload for a given device is disabled. | |
36 | ||
37 | In order to support TCP segmentation offload it is necessary to populate | |
38 | the network and transport header offsets of the skbuff so that the device | |
39 | drivers will be able to determine the offsets of the IP or IPv6 header and the | |
40 | TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start should | |
41 | also point to the TCP header of the packet. | |
42 | ||
43 | For IPv4 segmentation we support one of two types in terms of the IP ID. | |
44 | The default behavior is to increment the IP ID with every segment. If the | |
45 | GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP | |
46 | ID and all segments will use the same IP ID. If a device has | |
47 | NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO | |
48 | and we will either increment the IP ID for all frames, or leave it at a | |
49 | static value based on driver preference. | |
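
The two IP ID behaviors can be modelled as follows. This is a minimal sketch, not kernel code: `demo_assign_ip_ids` is a hypothetical helper, and its `fixed_id` argument stands in for SKB_GSO_TCP_FIXEDID being set in gso_type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fill ids[0..nsegs) with the IPv4 ID each segment would carry.
 * fixed_id == false: the ID increments per segment, wrapping at 16 bits.
 * fixed_id == true:  every segment reuses the base ID. */
static void demo_assign_ip_ids(uint16_t base, bool fixed_id,
			       uint16_t *ids, size_t nsegs)
{
	for (size_t i = 0; i < nsegs; i++)
		ids[i] = fixed_id ? base : (uint16_t)(base + i);
}
```

A device with NETIF_F_TSO_MANGLEID set may produce either pattern, so code that depends on one specific pattern cannot rely on such a device to preserve it.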
50 | ||
1b23f5e9 | 51 | |
f7a6272b AD |
52 | UDP Fragmentation Offload |
53 | ========================= | |
54 | ||
55 | UDP fragmentation offload allows a device to fragment an oversized UDP | |
56 | datagram into multiple IPv4 fragments. Many of the requirements for UDP | |
57 | fragmentation offload are the same as TSO. However, the IPv4 ID for | |
58 | fragments should not increment as a single IPv4 datagram is fragmented. | |
59 | ||
a65820e6 DA |
60 | UFO is deprecated: modern kernels will no longer generate UFO skbs, but can |
61 | still receive them from tuntap and similar devices. Offload of UDP-based | |
62 | tunnel protocols is still supported. | |
63 | ||
1b23f5e9 | 64 | |
f7a6272b AD |
65 | IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads |
66 | ======================================================== | |
67 | ||
68 | In addition to the offloads described above it is possible for a frame to | |
69 | contain additional headers such as an outer tunnel. In order to account | |
70 | for such instances an additional set of segmentation offload types was | |
11bafd54 | 71 | introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and |
f7a6272b AD |
72 | SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify |
73 | cases where there is more than one set of headers. For example, in the | |
74 | case of IPIP and SIT we should have the network and transport headers moved | |
75 | from the standard list of headers to "inner" header offsets. | |
76 | ||
77 | Currently only two levels of headers are supported. The convention is to | |
78 | refer to the tunnel headers as the outer headers, while the encapsulated | |
79 | data is normally referred to as the inner headers. Below is the list of | |
80 | calls to access the given headers: | |
81 | ||
1b23f5e9 OS |
82 | IPIP/SIT Tunnel:: |
83 | ||
84 |            Outer                 Inner | |
85 | MAC        skb_mac_header | |
86 | Network    skb_network_header    skb_inner_network_header | |
87 | Transport  skb_transport_header | |
f7a6272b | 88 | |
1b23f5e9 OS |
89 | UDP/GRE Tunnel:: |
90 | ||
91 |            Outer                 Inner | |
92 | MAC        skb_mac_header        skb_inner_mac_header | |
93 | Network    skb_network_header    skb_inner_network_header | |
94 | Transport  skb_transport_header  skb_inner_transport_header | |
f7a6272b AD |
95 | |
96 | In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and | |
97 | SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the | |
98 | fact that a non-zero checksum is also requested for the outer | |
99 | header. | |
100 | ||
bc3c2431 DA |
101 | Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel |
102 | header has requested a remote checksum offload. In this case the inner | |
103 | headers will be left with a partial checksum and only the outer header | |
104 | checksum will be computed. | |
f7a6272b | 105 | |
1b23f5e9 | 106 | |
f7a6272b AD |
107 | Generic Segmentation Offload |
108 | ============================ | |
109 | ||
110 | Generic segmentation offload is a pure software offload that is meant to | |
111 | deal with cases where device drivers cannot perform the offloads described | |
112 | above. What occurs in GSO is that a given skbuff will have its data broken | |
113 | out over multiple skbuffs that have been resized to match the MSS provided | |
114 | via skb_shinfo()->gso_size. | |
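
The splitting step can be illustrated with a small userspace model. This is a sketch under simplifying assumptions: it tracks only payload offsets and lengths, ignores headers and checksums, and `struct demo_seg` and `demo_gso_split` are hypothetical names rather than kernel interfaces.

```c
#include <stddef.h>

/* One output segment: where its payload starts and how long it is. */
struct demo_seg {
	size_t offset;
	size_t len;
};

/* Split payload_len bytes into chunks of at most mss bytes.
 * Returns the number of segments written to out, or 0 if mss is zero
 * or out has room for fewer than the required number of segments. */
static size_t demo_gso_split(size_t payload_len, size_t mss,
			     struct demo_seg *out, size_t max_segs)
{
	size_t n = 0;
	size_t off = 0;

	if (mss == 0)
		return 0;

	while (off < payload_len) {
		size_t len = payload_len - off;

		if (len > mss)
			len = mss;	/* every segment but the last is full */
		if (n == max_segs)
			return 0;	/* caller's array is too small */

		out[n].offset = off;
		out[n].len = len;
		n++;
		off += len;
	}
	return n;
}
```

In the real stack each resulting skbuff also receives its own copy of the headers with lengths, sequence numbers and checksums adjusted, which this model leaves out.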
115 | ||
116 | Before enabling any hardware segmentation offload a corresponding software | |
117 | offload is required in GSO. Otherwise it becomes possible for a frame to | |
118 | be re-routed between devices and end up being unable to be transmitted. | |
119 | ||
1b23f5e9 | 120 | |
f7a6272b AD |
121 | Generic Receive Offload |
122 | ======================= | |
123 | ||
124 | Generic receive offload is the complement to GSO. Ideally any frame | |
125 | assembled by GRO should be segmented to create an identical sequence of | |
126 | frames using GSO, and any sequence of frames segmented by GSO should be | |
127 | able to be reassembled back to the original by GRO. The only exception to | |
128 | this is IPv4 ID in the case that the DF bit is set for a given IP header. | |
129 | If the value of the IPv4 ID is not sequentially incrementing it will be | |
130 | altered so that it is when a frame assembled via GRO is segmented via GSO. | |
131 | ||
1b23f5e9 | 132 | |
f7a6272b AD |
133 | Partial Generic Segmentation Offload |
134 | ==================================== | |
135 | ||
136 | Partial generic segmentation offload is a hybrid between TSO and GSO. What | |
137 | it effectively does is take advantage of certain traits of TCP and tunnels | |
138 | so that instead of having to rewrite the packet headers for each segment | |
139 | only the inner-most transport header and possibly the outer-most network | |
140 | header need to be updated. This allows devices that do not support tunnel | |
141 | offloads or tunnel offloads with checksum to still make use of segmentation. | |
142 | ||
143 | With the partial offload what occurs is that all headers excluding the | |
144 | inner transport header are updated such that they will contain the correct | |
145 | values for a header that is simply duplicated. The one exception to this | |
146 | is the outer IPv4 ID field. It is up to the device drivers to guarantee | |
147 | that the IPv4 ID field is incremented in the case that a given header does | |
148 | not have the DF bit set. | |
a6770889 | 149 | |
1b23f5e9 | 150 | |
ba3c4385 | 151 | SCTP acceleration with GSO |
a6770889 DA |
152 | =========================== |
153 | ||
154 | SCTP - despite the lack of hardware support - can still take advantage of | |
155 | GSO to pass one large packet through the network stack, rather than | |
156 | multiple small packets. | |
157 | ||
158 | This requires a different approach from other offloads, as SCTP packets | |
159 | cannot be just segmented to (P)MTU. Rather, the chunks must be contained in | |
160 | IP segments, padding respected. So unlike regular GSO, SCTP can't just | |
161 | generate a big skb, set gso_size to the fragmentation point and deliver it | |
162 | to IP layer. | |
163 | ||
164 | Instead, the SCTP protocol layer builds an skb with the segments correctly | |
165 | padded and stored as chained skbs, and skb_segment() splits based on those. | |
166 | To signal this, gso_size is set to the special value GSO_BY_FRAGS. | |
167 | ||
168 | Therefore, any code in the core networking stack must be aware of the | |
169 | possibility that gso_size will be GSO_BY_FRAGS and handle that case | |
d02f51cb DA |
170 | appropriately. |
171 | ||
1dd27cde DA |
172 | There are some helpers to make this easier: |
173 | ||
1b23f5e9 OS |
174 | - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if |
175 | an skb is an SCTP GSO skb. | |
d02f51cb | 176 | |
1b23f5e9 OS |
177 | - For size checks, the skb_gso_validate_*_len family of helpers correctly |
178 | considers GSO_BY_FRAGS. | |
d02f51cb | 179 | |
1b23f5e9 OS |
180 | - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size |
181 | will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs. | |
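
The reason size checks need special handling can be shown with a simplified model. This is a sketch only: `struct demo_gso_skb` and `demo_gso_fits` are hypothetical, the chained skbs are reduced to an array of lengths, and `DEMO_GSO_BY_FRAGS` uses 0xFFFF on the assumption that it mirrors the kernel's sentinel value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel's GSO_BY_FRAGS sentinel. */
#define DEMO_GSO_BY_FRAGS 0xFFFF

/* Hypothetical stand-in for a GSO skb. */
struct demo_gso_skb {
	unsigned int hdr_len;		/* headers prepended to each segment */
	unsigned short gso_size;	/* payload size or DEMO_GSO_BY_FRAGS */
	const unsigned int *frag_lens;	/* payload length of each chained skb */
	size_t nfrags;
};

/* Would every resulting segment fit within max_len bytes? */
static bool demo_gso_fits(const struct demo_gso_skb *skb, unsigned int max_len)
{
	/* Regular GSO: all segments are bounded by hdr_len + gso_size. */
	if (skb->gso_size != DEMO_GSO_BY_FRAGS)
		return skb->hdr_len + skb->gso_size <= max_len;

	/* SCTP GSO: gso_size is a marker, not a length, so each
	 * pre-built segment has to be checked individually. */
	for (size_t i = 0; i < skb->nfrags; i++) {
		if (skb->hdr_len + skb->frag_lens[i] > max_len)
			return false;
	}
	return true;
}
```

Treating the sentinel as a literal size would make every SCTP GSO skb look oversized, which is why the validation helpers listed above are preferable to open-coded comparisons against gso_size.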
a6770889 DA |
182 | |
183 | This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits | |
184 | set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE. |