.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset. The device should compute the 16-bit ones-complement checksum
(i.e. the 'IP-style' checksum) from csum_start to the end of the packet, and
fill in the result at (csum_start + csum_offset).

Because csum_offset cannot be negative, the checksum field lies within the
summed region, so its previous value is included in the checksum computation.
The stack can therefore use it to supply any needed corrections to the checksum
(such as the sum of the pseudo-header for UDP or TCP).
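
To make the contract concrete, here is a small userspace C model of it. It is
a sketch under stated assumptions, not kernel code: all helper names
(csum_add_bytes(), csum_fold16(), put16(), hw_fill_csum(), pseudo_hdr_sum(),
udp4_offload_csum()) are hypothetical, the packet is a flat byte buffer with
offsets measured from its first byte (in the kernel, csum_start is relative to
skb->head), and only UDP over IPv4 is handled. The "stack" seeds the UDP
checksum field with the folded, uncomplemented pseudo-header sum; the modelled
"device" sums from csum_start to the end, complements, and stores the result
at csum_start + csum_offset.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Add big-endian 16-bit words of buf to sum; an odd tail byte is
     * padded with a zero byte on the right. */
    static uint64_t csum_add_bytes(uint64_t sum, const uint8_t *buf, size_t len)
    {
        for (size_t i = 0; i + 1 < len; i += 2)
            sum += (uint64_t)buf[i] << 8 | buf[i + 1];
        if (len & 1)
            sum += (uint64_t)buf[len - 1] << 8;
        return sum;
    }

    /* Fold carries back into the low 16 bits (end-around carry). */
    static uint16_t csum_fold16(uint64_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    static void put16(uint8_t *p, uint16_t v)
    {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)v;
    }

    /* Device side: sum [csum_start, len), including whatever the stack left
     * in the checksum field, then store the complement at
     * csum_start + csum_offset. */
    static void hw_fill_csum(uint8_t *pkt, size_t len,
                             size_t csum_start, size_t csum_offset)
    {
        uint16_t sum = csum_fold16(csum_add_bytes(0, pkt + csum_start,
                                                  len - csum_start));
        put16(pkt + csum_start + csum_offset, (uint16_t)~sum);
    }

    /* Folded, uncomplemented IPv4 pseudo-header sum for UDP/TCP. */
    static uint16_t pseudo_hdr_sum(const uint8_t saddr[4], const uint8_t daddr[4],
                                   uint8_t proto, uint16_t l4_len)
    {
        uint64_t sum = csum_add_bytes(csum_add_bytes(0, saddr, 4), daddr, 4);

        /* proto is a zero byte followed by the protocol number */
        return csum_fold16(sum + proto + l4_len);
    }

    /* Stack side: seed the UDP checksum field (offset 6 of the UDP header)
     * with the pseudo-header sum, then hand off to the "device". UDP's rule
     * of sending a computed zero as 0xffff is ignored in this sketch. */
    static void udp4_offload_csum(uint8_t *pkt, size_t len, size_t l4_off,
                                  const uint8_t saddr[4], const uint8_t daddr[4])
    {
        put16(pkt + l4_off + 6,
              pseudo_hdr_sum(saddr, daddr, 17, (uint16_t)(len - l4_off)));
        hw_fill_csum(pkt, len, l4_off, 6);
    }

For a 12-byte UDP datagram from 192.168.0.1 to 192.168.0.199 (ports 0x1234
and 53, payload "abcd") placed at offset 20 of the buffer, the model writes
0xa68d into the checksum field, the same value a full software computation
over pseudo-header and datagram produces. Bytes before csum_start are never
read.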

This interface only allows a single checksum to be offloaded. Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
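
A sketch of the csum_not_inet case, as a flat-buffer userspace model with
hypothetical function names: the start/offset pair is used in the same way,
but the device computes CRC32c (the Castagnoli CRC, as used by SCTP) over
[csum_start, end of packet) and stores a 32-bit result. The sketch assumes
the protocol zeroed the 4-byte field beforehand, as SCTP requires, so the old
field contents contribute nothing; storing the CRC least-significant byte
first is an assumption about SCTP's on-wire layout.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Bitwise CRC32c: reflected polynomial 0x82f63b78, initial value and
     * final XOR 0xffffffff. Slow, but short enough to read. */
    static uint32_t crc32c(const uint8_t *p, size_t n)
    {
        uint32_t crc = 0xffffffffu;

        while (n--) {
            crc ^= *p++;
            for (int k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
        }
        return ~crc;
    }

    /* Hypothetical device model for csum_not_inet: CRC32c over
     * [csum_start, len), stored little-endian (an assumption) in the 4-byte
     * field at csum_start + csum_offset, which must be zero on entry. */
    static void hw_fill_crc32c(uint8_t *pkt, size_t len,
                               size_t csum_start, size_t csum_offset)
    {
        uint32_t crc = crc32c(pkt + csum_start, len - csum_start);
        uint8_t *field = pkt + csum_start + csum_offset;

        for (int i = 0; i < 4; i++)
            field[i] = (uint8_t)(crc >> (8 * i));
    }

As a sanity check, crc32c() over the ASCII string "123456789" returns
0xe3069283, the standard CRC-32C check value.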

No offloading of the IP header checksum is performed; it is always done in
software. This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive. It's also rather short.
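
For reference, the software IPv4 header checksum is the same ones-complement
sum over the header's 16-bit words, with the checksum field itself treated as
zero. The sketch below is a plain userspace illustration with a hypothetical
name; the kernel uses its own optimised helpers (such as ip_fast_csum())
rather than this loop.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Software IPv4 header checksum: ones-complement sum of the header's
     * 16-bit words (IHL * 4 bytes), skipping the checksum field at bytes
     * 10-11, then complemented. The result is in host order. */
    static uint16_t ipv4_hdr_csum(const uint8_t *hdr)
    {
        size_t hdr_len = (size_t)(hdr[0] & 0x0f) * 4;
        uint32_t sum = 0;

        for (size_t i = 0; i + 1 < hdr_len; i += 2) {
            if (i == 10)
                continue;   /* the checksum field counts as zero */
            sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
        }
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
    }

For the 20-byte header 45 00 00 73 00 00 40 00 40 11 xx xx c0 a8 00 01 c0 a8
00 c7 (a UDP packet from 192.168.0.1 to 192.168.0.199) it returns 0xb861.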

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment. See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more. Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do), the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).
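
The driver-side check described above might look roughly like the following.
This is a hypothetical, userspace-only sketch: struct pkt_meta, hw_deduce()
and choose_tx_csum() are invented names, offsets are measured from the start
of the frame for simplicity, and the modelled NIC is assumed to locate the L4
header itself and to know only the TCP (offset 16) and UDP (offset 6)
checksum positions.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, flattened view of what the driver inspects. Offsets are
     * from the start of the frame here; skb->csum_start is really relative
     * to skb->head. */
    struct pkt_meta {
        uint16_t csum_start;   /* as requested by the stack */
        uint16_t csum_offset;  /* as requested by the stack */
        uint16_t l4_offset;    /* where the NIC's own parser finds L4 */
        uint8_t l4_proto;      /* 6 = TCP, 17 = UDP */
    };

    enum tx_csum_action { TX_CSUM_HW, TX_CSUM_SW_FALLBACK };

    /* What a NIC that parses headers itself would deduce. */
    static bool hw_deduce(const struct pkt_meta *m, uint16_t *start, uint16_t *off)
    {
        switch (m->l4_proto) {
        case 6:
            *off = 16;     /* TCP checksum field offset */
            break;
        case 17:
            *off = 6;      /* UDP checksum field offset */
            break;
        default:
            return false;  /* protocol the NIC cannot checksum */
        }
        *start = m->l4_offset;
        return true;
    }

    /* Use the hardware only if it would checksum exactly what the stack
     * asked for. */
    static enum tx_csum_action choose_tx_csum(const struct pkt_meta *m)
    {
        uint16_t start, off;

        if (hw_deduce(m, &start, &off) &&
            start == m->csum_start && off == m->csum_offset)
            return TX_CSUM_HW;
        /* A real driver would now call skb_csum_hwoffload_help() or
         * skb_checksum_help() and send the packet without offload. */
        return TX_CSUM_SW_FALLBACK;
    }

For example, with a 14-byte Ethernet header and a 20-byte IPv4 header, a
plain TCP packet (csum_start 34, csum_offset 16) goes to the hardware, while
a tunnelled packet whose csum_start points at an inner UDP header, past the
outer headers the NIC's parser stops at, takes the software fallback.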

The stack should, for the most part, assume that checksum offload is supported
by the underlying device. The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly. That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software. In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).
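
A much simplified, hypothetical model of that decision for TX Checksum
Offload alone. The FEAT_* bits stand in for the kernel's NETIF_F_HW_CSUM,
NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM feature flags but do not use their real
values, and struct tx_req and needs_sw_csum() are invented for illustration;
the real code path, including how features are narrowed per packet, is more
involved.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-ins for NETIF_F_HW_CSUM, NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM;
     * the values are made up for this sketch. */
    #define FEAT_HW_CSUM   (1u << 0)  /* any checksum given by start/offset */
    #define FEAT_IP_CSUM   (1u << 1)  /* TCP/UDP over IPv4 only */
    #define FEAT_IPV6_CSUM (1u << 2)  /* TCP/UDP over IPv6 only */

    /* Hypothetical summary of what a packet asks for. */
    struct tx_req {
        bool csum_partial;  /* checksum still to be filled in (CHECKSUM_PARTIAL) */
        bool ipv6;
        bool tcp_or_udp;
    };

    /* True if the stack must finish the checksum in software before the
     * packet is handed to the driver. */
    static bool needs_sw_csum(const struct tx_req *r, uint32_t dev_features)
    {
        if (!r->csum_partial)
            return false;   /* no offload requested */
        if (dev_features & FEAT_HW_CSUM)
            return false;
        if (r->tcp_or_udp && !r->ipv6 && (dev_features & FEAT_IP_CSUM))
            return false;
        if (r->tcp_or_udp && r->ipv6 && (dev_features & FEAT_IPV6_CSUM))
            return false;
        return true;
    }

The point the model illustrates is that only this one place consults the
device's features; everything earlier in the stack simply marks the packet as
needing a checksum and carries on.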


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field. This is because the sum was
complemented before being written to the checksum field.
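
A quick numeric check of that claim, as a userspace sketch with hypothetical
helper names: fill in a UDP-over-IPv4 checksum entirely in software
(pseudo-header included), then sum only the UDP datagram. The result is the
complement of the pseudo-header sum. Ones-complement arithmetic has two zeros
(0x0000 and 0xffff), so in general the equality holds modulo 0xffff; ordinary
values avoid that corner.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Add big-endian 16-bit words of buf to sum (odd tail zero-padded). */
    static uint64_t csum_add_bytes(uint64_t sum, const uint8_t *buf, size_t len)
    {
        for (size_t i = 0; i + 1 < len; i += 2)
            sum += (uint64_t)buf[i] << 8 | buf[i + 1];
        if (len & 1)
            sum += (uint64_t)buf[len - 1] << 8;
        return sum;
    }

    /* Fold carries back into the low 16 bits. */
    static uint16_t csum_fold16(uint64_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    /* Folded, uncomplemented IPv4 pseudo-header sum. */
    static uint16_t pseudo_hdr_sum(const uint8_t saddr[4], const uint8_t daddr[4],
                                   uint8_t proto, uint16_t l4_len)
    {
        uint64_t sum = csum_add_bytes(csum_add_bytes(0, saddr, 4), daddr, 4);

        return csum_fold16(sum + proto + l4_len);
    }

    /* Fill the UDP checksum (offset 6) fully in software from the
     * pseudo-header sum p; the field must be zero on entry. */
    static void udp_sw_csum(uint8_t *udp, size_t len, uint16_t p)
    {
        uint16_t c = (uint16_t)~csum_fold16(csum_add_bytes(p, udp, len));

        udp[6] = (uint8_t)(c >> 8);
        udp[7] = (uint8_t)c;
    }

    /* Ones-complement sum of the datagram alone, pseudo-header excluded. */
    static uint16_t segment_sum(const uint8_t *udp, size_t len)
    {
        return csum_fold16(csum_add_bytes(0, udp, len));
    }

For a 12-byte datagram from 192.168.0.1 to 192.168.0.199 with ports 0x1234
and 53 and payload "abcd", the pseudo-header sum is 0x8236 and, once the
checksum is filled in, the datagram alone sums to 0x7dc9, its complement.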

More generally, this holds in any case where the 'IP-style' ones complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand. This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).
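
The LCO computation as a userspace sketch. lco_sum() follows the idea behind
the kernel's lco_csum() but is not that function; lco_sum(),
outer_udp_csum_lco() and the other helpers are hypothetical. Assumptions:
flat byte buffer, IPv4, an outer UDP header, the inner L4 segment directly
after it (no inner IP header, to keep the example short), and an even
distance from the outer L4 header to csum_start.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    static uint64_t csum_add_bytes(uint64_t sum, const uint8_t *buf, size_t len)
    {
        for (size_t i = 0; i + 1 < len; i += 2)
            sum += (uint64_t)buf[i] << 8 | buf[i + 1];
        if (len & 1)
            sum += (uint64_t)buf[len - 1] << 8;
        return sum;
    }

    static uint16_t csum_fold16(uint64_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    static void put16(uint8_t *p, uint16_t v)
    {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)v;
    }

    /* Folded, uncomplemented IPv4 pseudo-header sum. */
    static uint16_t pseudo_hdr_sum(const uint8_t saddr[4], const uint8_t daddr[4],
                                   uint8_t proto, uint16_t l4_len)
    {
        uint64_t sum = csum_add_bytes(csum_add_bytes(0, saddr, 4), daddr, 4);

        return csum_fold16(sum + proto + l4_len);
    }

    /* Device model: complement of the sum over [csum_start, len). */
    static void hw_fill_csum(uint8_t *pkt, size_t len,
                             size_t csum_start, size_t csum_offset)
    {
        uint16_t sum = csum_fold16(csum_add_bytes(0, pkt + csum_start,
                                                  len - csum_start));
        put16(pkt + csum_start + csum_offset, (uint16_t)~sum);
    }

    /* LCO: sum the outer L4 bytes up to csum_start and add the complement
     * of the word currently in the inner checksum field, instead of
     * reading the inner payload. */
    static uint16_t lco_sum(const uint8_t *pkt, size_t outer_l4,
                            size_t csum_start, size_t csum_offset)
    {
        const uint8_t *f = pkt + csum_start + csum_offset;
        uint16_t inner_field = (uint16_t)(f[0] << 8 | f[1]);
        uint64_t sum = (uint16_t)~inner_field;

        sum = csum_add_bytes(sum, pkt + outer_l4, csum_start - outer_l4);
        return csum_fold16(sum);
    }

    /* Hypothetical outer UDP checksum via LCO: zero the outer field, take
     * the LCO sum, add the outer pseudo-header, complement. */
    static void outer_udp_csum_lco(uint8_t *pkt, size_t len, size_t outer_l4,
                                   size_t csum_start, size_t csum_offset,
                                   const uint8_t saddr[4], const uint8_t daddr[4])
    {
        uint64_t sum;

        put16(pkt + outer_l4 + 6, 0);
        sum = lco_sum(pkt, outer_l4, csum_start, csum_offset);
        sum += pseudo_hdr_sum(saddr, daddr, 17, (uint16_t)(len - outer_l4));
        put16(pkt + outer_l4 + 6, (uint16_t)~csum_fold16(sum));
    }

Typical use: seed the inner checksum field with its pseudo-header sum, call
outer_udp_csum_lco(), and only then let the device (here hw_fill_csum()) fill
in the inner checksum. Both checksums then verify, and the inner payload was
never read while computing the outer one.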

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum(). The IPv6 equivalent
is udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded. It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use. For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.