Segmentation Offloads in the Linux Networking Stack

Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS

TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

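As a rough illustration of the relationship between the payload length and
gso_size, the following userspace sketch computes how many frames a device
would emit and how much payload the last one carries. The function names
are hypothetical and are not kernel APIs; only the arithmetic is meant to
carry over.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): number of frames produced when
 * payload_len bytes of TCP payload are split into frames that each carry
 * at most gso_size bytes.
 */
static size_t tso_segment_count(size_t payload_len, size_t gso_size)
{
	assert(gso_size != 0);	/* gso_size must be non-zero for TSO */
	return (payload_len + gso_size - 1) / gso_size;
}

/*
 * Payload carried by the final frame. Every earlier frame carries exactly
 * gso_size bytes.
 */
static size_t tso_last_segment_len(size_t payload_len, size_t gso_size)
{
	size_t rem;

	assert(gso_size != 0);
	rem = payload_len % gso_size;
	return (rem != 0 || payload_len == 0) ? rem : gso_size;
}
```

For example, 4000 bytes of payload with a gso_size of 1448 yields three
frames, the last of which carries 1104 bytes.
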
TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.

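The two IP ID behaviors can be sketched in userspace C as follows. The
function name and the fixed_id parameter are hypothetical; fixed_id merely
models the case where SKB_GSO_TCP_FIXEDID is set in gso_type.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): IPv4 ID for the index-th segment
 * (0-based) of a TSO frame whose first segment uses first_id.
 */
static uint16_t tso_segment_ip_id(uint16_t first_id, uint32_t index,
				  bool fixed_id)
{
	if (fixed_id)
		return first_id;		/* same ID in every segment */
	return (uint16_t)(first_id + index);	/* wraps modulo 2^16 */
}
```

Because the IPv4 ID is a 16-bit field, the incrementing case wraps from
65535 back to 0.
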
UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However the IPv4 ID for
fragments should not increment as a single IPv4 datagram is fragmented.

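A minimal userspace sketch of the fragmentation described above, assuming
plain IPv4 rules (fragment offsets are expressed in 8-byte units, so every
fragment except the last carries a multiple of 8 bytes). The struct and
function names are hypothetical and do not correspond to kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IPv4 fragment. */
struct ipv4_frag {
	uint16_t id;		/* IPv4 ID, identical in every fragment */
	uint16_t offset_units;	/* fragment offset in 8-byte units */
	uint16_t len;		/* bytes of the UDP datagram carried */
	int more_fragments;	/* MF flag: set on all but the last */
};

/*
 * Split dgram_len bytes (UDP header plus data) into fragments that fit in
 * mtu given an IPv4 header of ip_hdr_len bytes. Returns the number of
 * fragments written to out, or 0 on invalid input or if out is too small.
 */
static size_t ufo_fragment(uint16_t id, size_t dgram_len, size_t mtu,
			   size_t ip_hdr_len, struct ipv4_frag *out,
			   size_t max)
{
	size_t per_frag, off = 0, n = 0;

	if (mtu <= ip_hdr_len)
		return 0;
	per_frag = (mtu - ip_hdr_len) & ~(size_t)7;	/* multiple of 8 */
	if (per_frag == 0)
		return 0;

	while (off < dgram_len) {
		size_t len = dgram_len - off;

		if (len > per_frag)
			len = per_frag;
		if (n == max)
			return 0;
		out[n].id = id;			/* ID does not increment */
		out[n].offset_units = (uint16_t)(off / 8);
		out[n].len = (uint16_t)len;
		out[n].more_fragments = (off + len < dgram_len);
		off += len;
		n++;
	}
	return n;
}
```

With a 1500 byte MTU and a 20 byte IPv4 header, a 3008 byte datagram is
split into fragments of 1480, 1480 and 48 bytes that all share one ID.
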
UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.

IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel:
             Outer                  Inner
MAC          skb_mac_header
Network      skb_network_header     skb_inner_network_header
Transport    skb_transport_header

UDP/GRE Tunnel:
             Outer                  Inner
MAC          skb_mac_header         skb_inner_mac_header
Network      skb_network_header     skb_inner_network_header
Transport    skb_transport_header   skb_inner_transport_header

In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
fact that the outer header also requests to have a non-zero checksum
included in the outer header.

Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel
header has requested a remote checksum offload. In this case the inner
headers will be left with a partial checksum and only the outer header
checksum will be computed.

Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

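The splitting step can be pictured with the following userspace sketch,
which copies one large frame into several smaller ones, each made of a
copy of the headers plus at most gso_size bytes of payload. This is a
simplified model with hypothetical names: it works on flat byte buffers
rather than skbuffs, and it omits the per-segment header fixups (lengths,
sequence numbers, checksums) that real segmentation has to perform.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of software segmentation. frame holds hdr_len bytes
 * of headers followed by payload. Segment i is written at
 * out + i * (hdr_len + gso_size) and its length is stored in seg_lens[i].
 * The caller must provide at least max_segs * (hdr_len + gso_size) bytes
 * in out. Returns the number of segments, or 0 on invalid input or if
 * max_segs is too small.
 */
static size_t gso_segment(const unsigned char *frame, size_t frame_len,
			  size_t hdr_len, size_t gso_size,
			  unsigned char *out, size_t *seg_lens,
			  size_t max_segs)
{
	size_t stride = hdr_len + gso_size;
	size_t off = hdr_len, n = 0;

	if (gso_size == 0 || frame_len <= hdr_len)
		return 0;

	while (off < frame_len) {
		size_t chunk = frame_len - off;

		if (chunk > gso_size)
			chunk = gso_size;
		if (n == max_segs)
			return 0;
		/* Duplicate the headers, then append this payload slice. */
		memcpy(out + n * stride, frame, hdr_len);
		memcpy(out + n * stride + hdr_len, frame + off, chunk);
		seg_lens[n++] = hdr_len + chunk;
		off += chunk;
	}
	return n;
}
```

All segments except possibly the last carry exactly gso_size bytes of
payload.
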
Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up being unable to be transmitted.

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is the IPv4 ID in the case that the DF bit is set for a given IP
header. If the value of the IPv4 ID is not sequentially incrementing it
will be altered so that it is when a frame assembled via GRO is segmented
via GSO.

Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated. This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload what occurs is that all headers excluding the
inner transport header are updated such that they will contain the values
that would be correct if the header were simply duplicated. The one
exception to this is the outer IPv4 ID field. It is up to the device
drivers to guarantee that the IPv4 ID field is incremented in the case that
a given header does not have the DF bit set.

SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach to other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

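The practical consequence is that code which inspects gso_size has to
branch on the special value. The following userspace sketch models a size
check that handles both cases. Everything here is a simplified assumption:
struct model_skb and model_gso_segments_fit() are hypothetical stand-ins
and not kernel definitions, and the numeric value given for GSO_BY_FRAGS
should be confirmed against the kernel headers in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value of the kernel's GSO_BY_FRAGS marker; verify before use. */
#define GSO_BY_FRAGS 0xFFFF

/* Hypothetical, heavily simplified stand-in for a GSO skb. */
struct model_skb {
	unsigned int hdr_len;		/* header bytes added to each segment */
	unsigned int gso_size;		/* payload per segment, or GSO_BY_FRAGS */
	const unsigned int *frag_lens;	/* payload length of each chained skb */
	size_t nr_frags;		/* number of entries in frag_lens */
};

/*
 * Returns true if every segment this skb would be split into, including
 * its headers, fits within max_len bytes.
 */
static bool model_gso_segments_fit(const struct model_skb *skb,
				   unsigned int max_len)
{
	size_t i;

	/* Regular GSO: every segment carries at most gso_size of payload. */
	if (skb->gso_size != GSO_BY_FRAGS)
		return skb->hdr_len + skb->gso_size <= max_len;

	/* GSO_BY_FRAGS: segment sizes come from the chained skbs instead. */
	for (i = 0; i < skb->nr_frags; i++)
		if (skb->hdr_len + skb->frag_lens[i] > max_len)
			return false;
	return true;
}
```

Treating GSO_BY_FRAGS as an ordinary size would make the regular branch
compare a meaningless value against max_len, which is why the marker has
to be tested first.
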
Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

 - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
   an skb is an SCTP GSO skb.

 - For size checks, the skb_gso_validate_*_len family of helpers correctly
   considers GSO_BY_FRAGS.

 - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.