Commit | Line | Data |
---|---|---|
f7a6272b AD |
1 | Segmentation Offloads in the Linux Networking Stack |
2 | ||
3 | Introduction | |
4 | ============ | |
5 | ||
6 | This document describes a set of techniques in the Linux networking stack | |
7 | to take advantage of segmentation offload capabilities of various NICs. | |
8 | ||
9 | The following technologies are described: | |
10 | * TCP Segmentation Offload - TSO | |
11 | * UDP Fragmentation Offload - UFO | |
12 | * IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads | |
13 | * Generic Segmentation Offload - GSO | |
14 | * Generic Receive Offload - GRO | |
15 | * Partial Generic Segmentation Offload - GSO_PARTIAL | |
16 | ||
17 | TCP Segmentation Offload | |
18 | ======================== | |
19 | ||
20 | TCP segmentation allows a device to segment a single frame into multiple | |
21 | frames with a data payload size specified in skb_shinfo()->gso_size. | |
22 | When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or | |
23 | SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and | |
24 | skb_shinfo()->gso_size should be set to a non-zero value. | |
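As a rough userspace illustration of how gso_size determines the frames a device produces, the sketch below computes the number of segments and the payload carried by each one. The struct, the flag value, and the function names are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the skb_shinfo() fields discussed above. */
struct sketch_shinfo {
    unsigned short gso_size;   /* payload bytes per segment */
    unsigned int   gso_type;   /* bitmask of requested GSO types */
};

/* Illustrative flag value only; the kernel defines its own bit. */
#define SKETCH_GSO_TCP_IPV4 (1u << 0)

/* Number of frames produced for payload_len bytes of TCP payload. */
size_t sketch_segment_count(const struct sketch_shinfo *si, size_t payload_len)
{
    assert(si->gso_size != 0);   /* a non-zero gso_size is required */
    return (payload_len + si->gso_size - 1) / si->gso_size;
}

/* Payload carried by segment idx (0-based); the last one may be short. */
size_t sketch_segment_len(const struct sketch_shinfo *si, size_t payload_len,
                          size_t idx)
{
    size_t off = idx * si->gso_size;
    size_t left;

    assert(off < payload_len);
    left = payload_len - off;
    return left < si->gso_size ? left : si->gso_size;
}
```

With a gso_size of 1448 and 4000 bytes of payload, this model yields two full segments and one short trailing segment.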
25 | ||
26 | TCP segmentation is dependent on support for the use of partial checksum | |
27 | offload. For this reason TSO is normally disabled if the Tx checksum | |
28 | offload for a given device is disabled. | |
29 | ||
30 | In order to support TCP segmentation offload it is necessary to populate | |
31 | the network and transport header offsets of the skbuff so that the device | |
32 | drivers will be able to determine the offsets of the IP or IPv6 header and the | |
33 | TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start should | |
34 | also point to the TCP header of the packet. | |
35 | ||
36 | For IPv4 segmentation we support one of two types in terms of the IP ID. | |
37 | The default behavior is to increment the IP ID with every segment. If the | |
38 | GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP | |
39 | ID and all segments will use the same IP ID. If a device has | |
40 | NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO | |
41 | and we will either increment the IP ID for all frames, or leave it at a | |
42 | static value based on driver preference. | |
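The two IP ID behaviors can be modeled with a small helper. This is only a sketch: the enum and function names are hypothetical, and a device with NETIF_F_TSO_MANGLEID would be free to pick either mode.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_ipid_mode {
    SKETCH_IPID_INCREMENT,  /* default: the ID advances with every segment */
    SKETCH_IPID_FIXED       /* models SKB_GSO_TCP_FIXEDID: the ID is repeated */
};

/* IPv4 ID for segment idx (0-based), given the ID of the first segment. */
uint16_t sketch_segment_ip_id(enum sketch_ipid_mode mode, uint16_t first_id,
                              unsigned int idx)
{
    if (mode == SKETCH_IPID_FIXED)
        return first_id;
    return (uint16_t)(first_id + idx);  /* wraps modulo 2^16 */
}
```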
43 | ||
44 | UDP Fragmentation Offload | |
45 | ========================= | |
46 | ||
47 | UDP fragmentation offload allows a device to fragment an oversized UDP | |
48 | datagram into multiple IPv4 fragments. Many of the requirements for UDP | |
49 | fragmentation offload are the same as TSO. However, the IPv4 ID for | |
50 | fragments should not increment, as a single IPv4 datagram is fragmented. | |
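To make the difference from TSO concrete, the following sketch models IPv4 fragmentation of one oversized datagram. It assumes a 20-byte IPv4 header without options and uses hypothetical names throughout. Every fragment keeps the same ID, offsets are expressed in 8-byte units, and the more-fragments flag is set on all but the last fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_frag {
    uint16_t id;             /* IPv4 ID, identical for every fragment */
    uint16_t offset_units;   /* fragment offset in 8-byte units */
    uint16_t len;            /* IP payload bytes in this fragment */
    int      more_fragments; /* 1 on all fragments except the last */
};

/* Fragment ip_payload_len bytes (UDP header plus data) for the given MTU.
 * Returns the number of fragments written to out. */
size_t sketch_ufo_fragment(uint16_t id, size_t ip_payload_len, size_t mtu,
                           struct sketch_frag *out, size_t max_out)
{
    const size_t ip_hdr_len = 20;  /* assumption: no IPv4 options */
    size_t per_frag, off = 0, n = 0;

    assert(mtu >= ip_hdr_len + 8);
    per_frag = ((mtu - ip_hdr_len) / 8) * 8;  /* must be a multiple of 8 */

    while (off < ip_payload_len && n < max_out) {
        size_t left = ip_payload_len - off;
        size_t len = left < per_frag ? left : per_frag;

        out[n].id = id;
        out[n].offset_units = (uint16_t)(off / 8);
        out[n].len = (uint16_t)len;
        out[n].more_fragments = (off + len < ip_payload_len);
        off += len;
        n++;
    }
    return n;
}
```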
51 | ||
52 | IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads | |
53 | ======================================================== | |
54 | ||
55 | In addition to the offloads described above, it is possible for a frame to | |
56 | contain additional headers such as an outer tunnel. To account for such | |
57 | instances an additional set of segmentation offload types was introduced, | |
58 | including SKB_GSO_IPIP, SKB_GSO_SIT, SKB_GSO_GRE, and | |
59 | SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify | |
60 | cases where there is more than one set of headers. For example, in the | |
61 | case of IPIP and SIT the network and transport headers are moved | |
62 | from the standard list of headers to "inner" header offsets. | |
63 | ||
64 | Currently only two levels of headers are supported. The convention is to | |
65 | refer to the tunnel headers as the outer headers, while the encapsulated | |
66 | data is normally referred to as the inner headers. Below is the list of | |
67 | calls to access the given headers: | |
68 | ||
69 | IPIP/SIT Tunnel: | |
70 |             Outer                  Inner | |
71 | MAC         skb_mac_header | |
72 | Network     skb_network_header     skb_inner_network_header | |
73 | Transport   skb_transport_header | |
74 | ||
75 | UDP/GRE Tunnel: | |
76 |             Outer                  Inner | |
77 | MAC         skb_mac_header         skb_inner_mac_header | |
78 | Network     skb_network_header     skb_inner_network_header | |
79 | Transport   skb_transport_header   skb_inner_transport_header | |
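As a worked example of the UDP tunnel case, the sketch below computes where each of the six headers would start in a frame, given the length of every header in front of it. The structs and the function are hypothetical and only model the offsets that the accessors listed above would resolve to. The header lengths in the usage note (including an 8-byte VXLAN header) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets from the start of the frame (hypothetical model). */
struct sketch_hdr_offsets {
    size_t mac, network, transport;                    /* outer headers */
    size_t inner_mac, inner_network, inner_transport;  /* inner headers */
};

/* Lengths of the headers that precede the inner transport header. */
struct sketch_hdr_lens {
    size_t outer_mac, outer_ip, outer_udp, tunnel, inner_mac, inner_ip;
};

/* Lay out the headers of a UDP-encapsulated frame back to back. */
struct sketch_hdr_offsets sketch_udp_tunnel_layout(const struct sketch_hdr_lens *l)
{
    struct sketch_hdr_offsets o;

    o.mac = 0;
    o.network = o.mac + l->outer_mac;
    o.transport = o.network + l->outer_ip;
    /* The inner frame starts after the outer UDP and tunnel headers. */
    o.inner_mac = o.transport + l->outer_udp + l->tunnel;
    o.inner_network = o.inner_mac + l->inner_mac;
    o.inner_transport = o.inner_network + l->inner_ip;
    return o;
}
```

For Ethernet (14 bytes), IPv4 without options (20 bytes), UDP (8 bytes), and a VXLAN header (8 bytes) around an inner Ethernet/IPv4 frame, the inner transport header would start at byte 84.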
80 | ||
81 | In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and | |
82 | SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types indicate that | |
83 | the tunnel also requests a non-zero checksum in the outer | |
84 | header. | |
85 | ||
86 | Finally there is SKB_GSO_REMCSUM which indicates that a given tunnel header | |
87 | has requested a remote checksum offload. In this case the inner headers | |
88 | will be left with a partial checksum and only the outer header checksum | |
89 | will be computed. | |
90 | ||
91 | Generic Segmentation Offload | |
92 | ============================ | |
93 | ||
94 | Generic segmentation offload is a pure software offload that is meant to | |
95 | deal with cases where device drivers cannot perform the offloads described | |
96 | above. In GSO the data of a given skbuff is split across multiple | |
97 | skbuffs, each resized to match the MSS provided via | |
98 | skb_shinfo()->gso_size. | |
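The splitting step can be sketched in userspace as follows. This is a simplified model rather than the kernel's implementation: segments are represented by descriptors instead of skbuffs, the names are hypothetical, and the only header field tracked is the TCP sequence number, which advances by the payload offset of each segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_seg {
    uint32_t seq;     /* TCP sequence number of the first payload byte */
    size_t   offset;  /* where this segment's payload starts in the original */
    size_t   len;     /* payload bytes carried by this segment */
};

/* Split payload_len bytes starting at sequence number seq into segments of
 * at most mss bytes. Returns the number of descriptors written to out. */
size_t sketch_gso_segment(uint32_t seq, size_t payload_len, size_t mss,
                          struct sketch_seg *out, size_t max_out)
{
    size_t off = 0, n = 0;

    assert(mss != 0);
    while (off < payload_len && n < max_out) {
        size_t left = payload_len - off;

        out[n].seq = seq + (uint32_t)off;
        out[n].offset = off;
        out[n].len = left < mss ? left : mss;
        off += out[n].len;
        n++;
    }
    return n;
}
```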
99 | ||
100 | Before enabling any hardware segmentation offload a corresponding software | |
101 | offload is required in GSO. Otherwise a frame could be re-routed to a | |
102 | device that lacks the offload and end up being impossible to transmit. | |
103 | ||
104 | Generic Receive Offload | |
105 | ======================= | |
106 | ||
107 | Generic receive offload is the complement to GSO. Ideally any frame | |
108 | assembled by GRO should be segmented to create an identical sequence of | |
109 | frames using GSO, and any sequence of frames segmented by GSO should be | |
110 | able to be reassembled back to the original by GRO. The only exception to | |
111 | this is the IPv4 ID in the case that the DF bit is set for a given IP header. | |
112 | If the IPv4 ID values are not sequentially incrementing, they will be | |
113 | rewritten to be sequential when a frame assembled via GRO is segmented via GSO. | |
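The receive-side counterpart can be modeled in the same spirit. The sketch below merges in-order packets whose TCP sequence numbers are contiguous, which is the condition under which a run of segments can be put back together. It is a toy model with hypothetical names. It ignores flow matching, header comparison, and any size limits that a real implementation would apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_pkt {
    uint32_t seq;  /* TCP sequence number of the first payload byte */
    size_t   len;  /* payload bytes */
};

/* Coalesce adjacent packets in place when the next packet starts exactly
 * where the current one ends. Returns the number of packets remaining. */
size_t sketch_gro_coalesce(struct sketch_pkt *pkts, size_t n)
{
    size_t out = 0, i;

    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (pkts[out].seq + (uint32_t)pkts[out].len == pkts[i].seq)
            pkts[out].len += pkts[i].len;  /* contiguous: merge */
        else
            pkts[++out] = pkts[i];         /* gap: start a new packet */
    }
    return out + 1;
}
```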
114 | ||
115 | Partial Generic Segmentation Offload | |
116 | ==================================== | |
117 | ||
118 | Partial generic segmentation offload is a hybrid between TSO and GSO. What | |
119 | it effectively does is take advantage of certain traits of TCP and tunnels | |
120 | so that instead of having to rewrite the packet headers for each segment | |
121 | only the inner-most transport header and possibly the outer-most network | |
122 | header need to be updated. This allows devices that do not support tunnel | |
123 | offloads or tunnel offloads with checksum to still make use of segmentation. | |
124 | ||
125 | With the partial offload, all headers excluding the inner transport header | |
126 | are updated so that they contain the correct values for the case where | |
127 | the header is simply duplicated. The one exception to this | |
128 | is the outer IPv4 ID field. It is up to the device drivers to guarantee | |
129 | that the IPv4 ID field is incremented in the case that a given header does | |
130 | not have the DF bit set. |