Commit | Line | Data |
---|---|---|
8fe2f761 WB |
1 | |
2 | 1. Control Interfaces | |
3 | ||
4 | The interfaces for receiving network packet timestamps are: | |
cb9eff09 PO |
5 | |
6 | * SO_TIMESTAMP | |
8fe2f761 WB |
7 | Generates a timestamp for each incoming packet in (not necessarily |
8 | monotonic) system time. Reports the timestamp via recvmsg() in a | |
9dd49211 DD |
9 | control message in usec resolution. |
10 | SO_TIMESTAMP is defined as SO_TIMESTAMP_NEW or SO_TIMESTAMP_OLD | |
11 | based on the architecture type and time_t representation of libc. | |
12 | Control message format is in struct __kernel_old_timeval for | |
13 | SO_TIMESTAMP_OLD and in struct __kernel_sock_timeval for | |
14 | SO_TIMESTAMP_NEW options respectively. | |
cb9eff09 PO |
15 | |
16 | * SO_TIMESTAMPNS | |
8fe2f761 | 17 | Same timestamping mechanism as SO_TIMESTAMP, but reports the |
9dd49211 DD |
18 | timestamp as struct timespec in nsec resolution. |
19 | SO_TIMESTAMPNS is defined as SO_TIMESTAMPNS_NEW or SO_TIMESTAMPNS_OLD | |
20 | based on the architecture type and time_t representation of libc. | |
21 | Control message format is in struct timespec for SO_TIMESTAMPNS_OLD | |
22 | and in struct __kernel_timespec for SO_TIMESTAMPNS_NEW options | |
23 | respectively. | |
cb9eff09 PO |
24 | |
25 | * IP_MULTICAST_LOOP + SO_TIMESTAMP[NS] | |
8fe2f761 WB |
26 | Only for multicast: approximate transmit timestamp obtained by |
27 | reading the looped packet receive timestamp. | |
cb9eff09 | 28 | |
8fe2f761 WB |
29 | * SO_TIMESTAMPING |
30 | Generates timestamps on reception, transmission or both. Supports | |
31 | multiple timestamp sources, including hardware. Supports generating | |
32 | timestamps for stream sockets. | |
cb9eff09 | 33 | |
cb9eff09 | 34 | |
9dd49211 | 35 | 1.1 SO_TIMESTAMP (also SO_TIMESTAMP_OLD and SO_TIMESTAMP_NEW): |
adca4767 | 36 | |
8fe2f761 WB |
37 | This socket option enables timestamping of datagrams on the reception |
38 | path. Because the destination socket, if any, is not known early in | |
39 | the network stack, the feature has to be enabled for all packets. The | |
40 | same is true for all early receive timestamp options. | |
adca4767 | 41 | |
8fe2f761 WB |
42 | For interface details, see `man 7 socket`. |
43 | ||
9dd49211 DD |
44 | Always use SO_TIMESTAMP_NEW to receive the timestamp in |
45 | struct __kernel_sock_timeval format. | |
8fe2f761 | 46 | |
9dd49211 DD |
47 | SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038 |
48 | on 32 bit machines. | |
49 | ||
50 | 1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW): | |
8fe2f761 WB |
51 | |
52 | This option is identical to SO_TIMESTAMP except for the returned data type. | |
53 | Its struct timespec allows for higher resolution (ns) timestamps than the | |
54 | timeval of SO_TIMESTAMP (us). | |
55 | ||
9dd49211 DD |
56 | Always use SO_TIMESTAMPNS_NEW to receive the timestamp in |
57 | struct __kernel_timespec format. | |
58 | ||
59 | SO_TIMESTAMPNS_OLD returns incorrect timestamps after the year 2038 | |
60 | on 32 bit machines. | |
8fe2f761 | 61 | |
9dd49211 | 62 | 1.3 SO_TIMESTAMPING (also SO_TIMESTAMPING_OLD and SO_TIMESTAMPING_NEW): |
8fe2f761 WB |
63 | |
64 | Supports multiple types of timestamp requests. As a result, this | |
65 | socket option takes a bitmap of flags, not a boolean. In | |
66 | ||
5e34fa23 | 67 | err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); |
8fe2f761 WB |
68 | |
69 | val is an integer with any of the following bits set. Setting other | |
70 | bits returns EINVAL and does not change the current state. | |
adca4767 | 71 | |
fd91e12f SHY |
72 | The socket option configures timestamp generation for individual |
73 | sk_buffs (1.3.1), timestamp reporting to the socket's error | |
74 | queue (1.3.2) and options (1.3.3). Timestamp generation can also | |
75 | be enabled for individual sendmsg calls using cmsg (1.3.4). | |
76 | ||
adca4767 | 77 | |
8fe2f761 | 78 | 1.3.1 Timestamp Generation |
adca4767 | 79 | |
8fe2f761 WB |
80 | Some bits are requests to the stack to try to generate timestamps. Any |
81 | combination of them is valid. Changes to these bits apply to newly | |
82 | created packets, not to packets already in the stack. As a result, it | |
83 | is possible to selectively request timestamps for a subset of packets | |
84 | (e.g., for sampling) by embedding a send() call within two setsockopt | |
85 | calls, one to enable timestamp generation and one to disable it. | |
86 | Timestamps may also be generated for reasons other than being | |
87 | requested by a particular socket, such as when receive timestamping is | |
88 | enabled system wide, as explained earlier. | |
adca4767 | 89 | |
8fe2f761 WB |
90 | SOF_TIMESTAMPING_RX_HARDWARE: |
91 | Request rx timestamps generated by the network adapter. | |
92 | ||
93 | SOF_TIMESTAMPING_RX_SOFTWARE: | |
94 | Request rx timestamps when data enters the kernel. These timestamps | |
95 | are generated just after a device driver hands a packet to the | |
96 | kernel receive stack. | |
97 | ||
98 | SOF_TIMESTAMPING_TX_HARDWARE: | |
fd91e12f SHY |
99 | Request tx timestamps generated by the network adapter. This flag |
100 | can be enabled via both socket options and control messages. | |
8fe2f761 WB |
101 | |
102 | SOF_TIMESTAMPING_TX_SOFTWARE: | |
103 | Request tx timestamps when data leaves the kernel. These timestamps | |
104 | are generated in the device driver as close as possible, but always | |
105 | prior to, passing the packet to the network interface. Hence, they | |
106 | require driver support and may not be available for all devices. | |
fd91e12f SHY |
107 | This flag can be enabled via both socket options and control messages. |
108 | ||
8fe2f761 WB |
109 | |
110 | SOF_TIMESTAMPING_TX_SCHED: | |
111 | Request tx timestamps prior to entering the packet scheduler. Kernel | |
112 | transmit latency is, if long, often dominated by queuing delay. The | |
113 | difference between this timestamp and one taken at | |
114 | SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent | |
115 | of protocol processing. The latency incurred in protocol | |
116 | processing, if any, can be computed by subtracting a userspace | |
117 | timestamp taken immediately before send() from this timestamp. On | |
118 | machines with virtual devices where a transmitted packet travels | |
119 | through multiple devices and, hence, multiple packet schedulers, | |
120 | a timestamp is generated at each layer. This allows for fine | |
fd91e12f SHY |
121 | grained measurement of queuing delay. This flag can be enabled |
122 | via both socket options and control messages. | |
8fe2f761 WB |
123 | |
124 | SOF_TIMESTAMPING_TX_ACK: | |
125 | Request tx timestamps when all data in the send buffer has been | |
126 | acknowledged. This only makes sense for reliable protocols. It is | |
127 | currently only implemented for TCP. For that protocol, it may | |
128 | over-report measurement, because the timestamp is generated when all | |
129 | data up to and including the buffer at send() was acknowledged: the | |
130 | cumulative acknowledgment. The mechanism ignores SACK and FACK. | |
fd91e12f | 131 | This flag can be enabled via both socket options and control messages. |
8fe2f761 WB |
132 | |
133 | ||
134 | 1.3.2 Timestamp Reporting | |
135 | ||
136 | The other three bits control which timestamps will be reported in a | |
137 | generated control message. Changes to the bits take immediate | |
138 | effect at the timestamp reporting locations in the stack. Timestamps | |
139 | are only reported for packets that also have the relevant timestamp | |
140 | generation request set. | |
141 | ||
142 | SOF_TIMESTAMPING_SOFTWARE: | |
143 | Report any software timestamps when available. | |
144 | ||
145 | SOF_TIMESTAMPING_SYS_HARDWARE: | |
146 | This option is deprecated and ignored. | |
147 | ||
148 | SOF_TIMESTAMPING_RAW_HARDWARE: | |
149 | Report hardware timestamps as generated by | |
150 | SOF_TIMESTAMPING_TX_HARDWARE when available. | |
151 | ||
152 | ||
153 | 1.3.3 Timestamp Options | |
154 | ||
829ae9d6 | 155 | The interface supports the options |
8fe2f761 WB |
156 | |
157 | SOF_TIMESTAMPING_OPT_ID: | |
158 | ||
159 | Generate a unique identifier along with each packet. A process can | |
160 | have multiple concurrent timestamping requests outstanding. Packets | |
161 | can be reordered in the transmit path, for instance in the packet | |
162 | scheduler. In that case timestamps will be queued onto the error | |
cbd3aad5 WB |
163 | queue out of order from the original send() calls. It is not always |
164 | possible to uniquely match timestamps to the original send() calls | |
165 | based on timestamp order or payload inspection alone, then. | |
166 | ||
167 | This option associates each packet at send() with a unique | |
168 | identifier and returns that along with the timestamp. The identifier | |
169 | is derived from a per-socket u32 counter (that wraps). For datagram | |
170 | sockets, the counter increments with each sent packet. For stream | |
171 | sockets, it increments with every byte. | |
172 | ||
173 | The counter starts at zero. It is initialized the first time that | |
174 | the socket option is enabled. It is reset each time the option is | |
175 | enabled after having been disabled. Resetting the counter does not | |
176 | change the identifiers of existing packets in the system. | |
8fe2f761 WB |
177 | |
178 | This option is implemented only for transmit timestamps. There, the | |
179 | timestamp is always looped along with a struct sock_extended_err. | |
138a7f49 | 180 | The option modifies field ee_data to pass an id that is unique |
8fe2f761 | 181 | among all possibly concurrently outstanding timestamp requests for |
cbd3aad5 | 182 | that socket. |
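The counter rules above can be mirrored in userspace to predict the id
that will come back in ee_data. The following is only a sketch: the
struct and function names are hypothetical, and it assumes that for
stream sockets the returned id is the offset of the last byte passed to
send(), which matches the byte offset comparison suggested in
Section 1.4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the per-socket OPT_ID counter. */
struct tskey_tracker {
	uint32_t next;	/* counter value, starts at zero, wraps as u32 */
	int is_stream;	/* non-zero for stream (e.g. TCP) sockets */
};

/*
 * Record one send() of len bytes and return the id expected in
 * ee_data for its timestamp.
 */
uint32_t tskey_expect(struct tskey_tracker *t, size_t len)
{
	uint32_t key;

	assert(t != NULL);
	if (t->is_stream) {
		/* counter advances per byte; assumed id: last byte sent */
		t->next += (uint32_t)len;
		key = t->next - 1;
	} else {
		/* counter advances per packet */
		key = t->next++;
	}
	return key;
}
```

A process would call tskey_expect() right after each timestamped send()
and compare the result with ee_data when the timestamp arrives. The
tracker has to be zeroed again whenever the option is re-enabled after
having been disabled, because the kernel counter is reset at that point.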
8fe2f761 WB |
183 | |
184 | ||
829ae9d6 WB |
185 | SOF_TIMESTAMPING_OPT_CMSG: |
186 | ||
187 | Support recv() cmsg for all timestamped packets. Control messages | |
188 | are already supported unconditionally on all packets with receive | |
189 | timestamps and on IPv6 packets with transmit timestamp. This option | |
190 | extends them to IPv4 packets with transmit timestamp. One use case | |
191 | is to correlate packets with their egress device, by enabling socket | |
192 | option IP_PKTINFO simultaneously. | |
193 | ||
194 | ||
49ca0d8b WB |
195 | SOF_TIMESTAMPING_OPT_TSONLY: |
196 | ||
197 | Applies to transmit timestamps only. Makes the kernel return the | |
198 | timestamp as a cmsg alongside an empty packet, as opposed to | |
199 | alongside the original packet. This reduces the amount of memory | |
200 | charged to the socket's receive budget (SO_RCVBUF) and delivers | |
201 | the timestamp even if sysctl net.core.tstamp_allow_data is 0. | |
202 | This option disables SOF_TIMESTAMPING_OPT_CMSG. | |
203 | ||
1c885808 FY |
204 | SOF_TIMESTAMPING_OPT_STATS: |
205 | ||
206 | Optional stats that are obtained along with the transmit timestamps. | |
207 | It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the | |
208 | transmit timestamp is available, the stats are available in a | |
209 | separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a | |
210 | list of TLVs (struct nlattr) of types. These stats allow the | |
211 | application to associate various transport layer stats with | |
212 | the transmit timestamps, such as how long a certain block of | |
213 | data was limited by peer's receiver window. | |
49ca0d8b | 214 | |
aad9c8c4 ML |
215 | SOF_TIMESTAMPING_OPT_PKTINFO: |
216 | ||
217 | Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming | |
218 | packets with hardware timestamps. The message contains struct | |
219 | scm_ts_pktinfo, which supplies the index of the real interface which | |
220 | received the packet and its length at layer 2. A valid (non-zero) | |
221 | interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is | |
222 | enabled and the driver is using NAPI. The struct contains also two | |
223 | other fields, but they are reserved and undefined. | |
224 | ||
b50a5c70 ML |
225 | SOF_TIMESTAMPING_OPT_TX_SWHW: |
226 | ||
227 | Request both hardware and software timestamps for outgoing packets | |
228 | when SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE | |
229 | are enabled at the same time. If both timestamps are generated, | |
230 | two separate messages will be looped to the socket's error queue, | |
231 | each containing just one timestamp. | |
232 | ||
49ca0d8b WB |
233 | New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to |
234 | disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate | |
235 | regardless of the setting of sysctl net.core.tstamp_allow_data. | |
236 | ||
237 | An exception is when a process needs additional cmsg data, for | |
238 | instance SOL_IP/IP_PKTINFO to detect the egress network interface. | |
239 | Then pass option SOF_TIMESTAMPING_OPT_CMSG. This option depends on | |
240 | having access to the contents of the original packet, so cannot be | |
241 | combined with SOF_TIMESTAMPING_OPT_TSONLY. | |
242 | ||
243 | ||
fd91e12f SHY |
244 | 1.3.4. Enabling timestamps via control messages |
245 | ||
246 | In addition to socket options, timestamp generation can be requested | |
247 | per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1). | |
248 | Using this feature, applications can sample timestamps per sendmsg() | |
249 | without paying the overhead of enabling and disabling timestamps via | |
250 | setsockopt: | |
251 | ||
252 | struct msghdr *msg; | |
253 | ... | |
254 | cmsg = CMSG_FIRSTHDR(msg); | |
255 | cmsg->cmsg_level = SOL_SOCKET; | |
256 | cmsg->cmsg_type = SO_TIMESTAMPING; | |
257 | cmsg->cmsg_len = CMSG_LEN(sizeof(__u32)); | |
258 | *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED | | |
259 | SOF_TIMESTAMPING_TX_SOFTWARE | | |
260 | SOF_TIMESTAMPING_TX_ACK; | |
261 | err = sendmsg(fd, msg, 0); | |
262 | ||
263 | The SOF_TIMESTAMPING_TX_* flags set via cmsg will override | |
264 | the SOF_TIMESTAMPING_TX_* flags set via setsockopt. | |
265 | ||
266 | Moreover, applications must still enable timestamp reporting via | |
267 | setsockopt to receive timestamps: | |
268 | ||
269 | __u32 val = SOF_TIMESTAMPING_SOFTWARE | | |
270 | SOF_TIMESTAMPING_OPT_ID /* or any other flag */; | |
5e34fa23 | 271 | err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val)); |
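The cmsg fragment earlier in this section leaves out the control buffer
setup. A more complete sketch could look as follows. The function name
and the caller-provided buffer are assumptions made for illustration;
only the cmsg level, type and payload come from the text above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Attach a SO_TIMESTAMPING request to msg, using ctrl as storage for
 * the control message. Returns 0 on success, -1 if ctrl is too small.
 */
int attach_tx_tstamp_request(struct msghdr *msg, void *ctrl,
			     size_t ctrl_len, uint32_t tx_flags)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && ctrl != NULL);
	if (ctrl_len < CMSG_SPACE(sizeof(tx_flags)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(tx_flags));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SO_TIMESTAMPING;
	cmsg->cmsg_len = CMSG_LEN(sizeof(tx_flags));
	/* copy instead of casting, to avoid alignment assumptions */
	memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));
	return 0;
}
```

The caller fills in msg_iov and msg_name as usual, passes a buffer of at
least CMSG_SPACE(sizeof(uint32_t)) bytes that is aligned for struct
cmsghdr, and then calls sendmsg(fd, msg, 0).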
fd91e12f SHY |
272 | |
273 | ||
8fe2f761 WB |
274 | 1.4 Bytestream Timestamps |
275 | ||
276 | The SO_TIMESTAMPING interface supports timestamping of bytes in a | |
277 | bytestream. Each request is interpreted as a request for when the | |
278 | entire contents of the buffer has passed a timestamping point. That | |
279 | is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record | |
280 | when all bytes have reached the device driver, regardless of how | |
281 | many packets the data has been converted into. | |
282 | ||
283 | In general, bytestreams have no natural delimiters and therefore | |
284 | correlating a timestamp with data is non-trivial. A range of bytes | |
285 | may be split across segments, any segments may be merged (possibly | |
286 | coalescing sections of previously segmented buffers associated with | |
287 | independent send() calls). Segments can be reordered and the same | |
288 | byte range can coexist in multiple segments for protocols that | |
289 | implement retransmissions. | |
290 | ||
291 | It is essential that all timestamps implement the same semantics, | |
292 | regardless of these possible transformations, as otherwise they are | |
293 | incomparable. Handling "rare" corner cases differently from the | |
294 | simple case (a 1:1 mapping from buffer to skb) is insufficient | |
295 | because performance debugging often needs to focus on such outliers. | |
296 | ||
297 | In practice, timestamps can be correlated with segments of a | |
298 | bytestream consistently, if both semantics of the timestamp and the | |
299 | timing of measurement are chosen correctly. This challenge is no | |
300 | different from deciding on a strategy for IP fragmentation. There, the | |
301 | definition is that only the first fragment is timestamped. For | |
302 | bytestreams, we chose that a timestamp is generated only when all | |
303 | bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to | |
304 | implement and reason about. An implementation that has to take into | |
305 | account SACK would be more complex due to possible transmission holes | |
306 | and out of order arrival. | |
307 | ||
308 | On the host, TCP can also break the simple 1:1 mapping from buffer to | |
309 | skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The | |
310 | implementation ensures correctness in all cases by tracking the | |
311 | individual last byte passed to send(), even if it is no longer the | |
312 | last byte after an skbuff extend or merge operation. It stores the | |
313 | relevant sequence number in skb_shinfo(skb)->tskey. Because an skbuff | |
314 | has only one such field, only one timestamp can be generated. | |
315 | ||
316 | In rare cases, a timestamp request can be missed if two requests are | |
317 | collapsed onto the same skb. A process can detect this situation by | |
318 | enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at | |
319 | send time with the value returned for each timestamp. It can prevent | |
320 | the situation by always flushing the TCP stack in between requests, | |
321 | for instance by enabling TCP_NODELAY and disabling TCP_CORK and | |
322 | autocork. | |
323 | ||
324 | These precautions ensure that the timestamp is generated only when all | |
325 | bytes have passed a timestamp point, assuming that the network stack | |
326 | itself does not reorder the segments. The stack indeed tries to avoid | |
327 | reordering. The one exception is under administrator control: it is | |
328 | possible to construct a packet scheduler configuration that delays | |
329 | segments from the same stream differently. Such a setup would be | |
330 | unusual. | |
331 | ||
332 | ||
333 | 2 Data Interfaces | |
334 | ||
335 | Timestamps are read using the ancillary data feature of recvmsg(). | |
336 | See `man 3 cmsg` for details of this interface. The socket manual | |
337 | page (`man 7 socket`) describes how timestamps generated with | |
338 | SO_TIMESTAMP and SO_TIMESTAMPNS records can be retrieved. | |
339 | ||
340 | ||
341 | 2.1 SCM_TIMESTAMPING records | |
342 | ||
343 | These timestamps are returned in a control message with cmsg_level | |
344 | SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type | |
69298698 | 345 | |
9dd49211 DD |
346 | For SO_TIMESTAMPING_OLD: |
347 | ||
69298698 | 348 | struct scm_timestamping { |
8fe2f761 | 349 | struct timespec ts[3]; |
69298698 | 350 | }; |
cb9eff09 | 351 | |
9dd49211 DD |
352 | For SO_TIMESTAMPING_NEW: |
353 | ||
354 | struct scm_timestamping64 { | |
355 | struct __kernel_timespec ts[3]; | |
356 | }; | |
357 | Always use SO_TIMESTAMPING_NEW to receive the timestamp in | |
358 | struct scm_timestamping64 format. | |
359 | ||
360 | SO_TIMESTAMPING_OLD returns incorrect timestamps after the year 2038 | |
361 | on 32 bit machines. | |
362 | ||
8fe2f761 | 363 | The structure can return up to three timestamps. This is a legacy |
67953d47 | 364 | feature. At least one field is non-zero at any time. Most timestamps |
8fe2f761 WB |
365 | are passed in ts[0]. Hardware timestamps are passed in ts[2]. |
366 | ||
367 | ts[1] used to hold hardware timestamps converted to system time. | |
368 | Instead, expose the hardware clock device on the NIC directly as | |
369 | a HW PTP clock source, to allow time conversion in userspace and | |
370 | optionally synchronize system time with a userspace PTP stack such | |
329f0041 | 371 | as linuxptp. For the PTP clock API, see Documentation/driver-api/ptp.rst. |
8fe2f761 | 372 | |
67953d47 ML |
373 | Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled |
374 | together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false | |
375 | software timestamp will be generated in the recvmsg() call and passed | |
376 | in ts[0] when a real software timestamp is missing. This happens also | |
377 | on hardware transmit timestamps. | |
378 | ||
8fe2f761 WB |
379 | 2.1.1 Transmit timestamps with MSG_ERRQUEUE |
380 | ||
381 | For transmit timestamps the outgoing packet is looped back to the | |
382 | socket's error queue with the send timestamp(s) attached. A process | |
383 | receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE | |
384 | set and with a msg_control buffer sufficiently large to receive the | |
385 | relevant metadata structures. The recvmsg call returns the original | |
386 | outgoing data packet with two ancillary messages attached. | |
387 | ||
388 | A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR | |
389 | embeds a struct sock_extended_err. This defines the error type. For | |
390 | timestamps, the ee_errno field is ENOMSG. The other ancillary message | |
391 | will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This | |
392 | embeds the struct scm_timestamping. | |
393 | ||
394 | ||
395 | 2.1.1.2 Timestamp types | |
396 | ||
397 | The semantics of the three struct timespec are defined by field | |
398 | ee_info in the extended error structure. It contains a value of | |
399 | type SCM_TSTAMP_* to define the actual timestamp passed in | |
400 | scm_timestamping. | |
401 | ||
402 | The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_* | |
403 | control fields discussed previously, with one exception. For legacy | |
404 | reasons, SCM_TSTAMP_SND is equal to zero and can be set for both | |
405 | SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It | |
406 | is the first if ts[2] is non-zero, the second otherwise, in which | |
407 | case the timestamp is stored in ts[0]. | |
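The selection rule described above can be written down as a small
helper. This is a sketch only: the function names are hypothetical, and
it assumes the struct scm_timestamping layout (SO_TIMESTAMPING_OLD, or
any 64 bit machine).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <linux/errqueue.h>

static int ts_is_set(const struct timespec *ts)
{
	return ts->tv_sec != 0 || ts->tv_nsec != 0;
}

/*
 * Pick the timestamp carried by one transmit error queue message.
 * ee_info is the field from struct sock_extended_err. Sets *is_hw to 1
 * for a hardware timestamp, 0 otherwise. Returns NULL if no usable
 * timestamp is present.
 */
const struct timespec *
tx_timestamp_select(const struct scm_timestamping *tss, uint32_t ee_info,
		    int *is_hw)
{
	assert(tss != NULL && is_hw != NULL);

	/* only SCM_TSTAMP_SND is shared between hardware and software */
	if (ee_info == SCM_TSTAMP_SND && ts_is_set(&tss->ts[2])) {
		*is_hw = 1;
		return &tss->ts[2];
	}
	*is_hw = 0;
	return ts_is_set(&tss->ts[0]) ? &tss->ts[0] : NULL;
}
```

For SCM_TSTAMP_SCHED and SCM_TSTAMP_ACK the helper always reads ts[0],
since those types have no hardware counterpart.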
408 | ||
409 | ||
410 | 2.1.1.3 Fragmentation | |
411 | ||
412 | Fragmentation of outgoing datagrams is rare, but is possible, e.g., by | |
413 | explicitly disabling PMTU discovery. If an outgoing packet is fragmented, | |
414 | then only the first fragment is timestamped and returned to the sending | |
415 | socket. | |
416 | ||
417 | ||
418 | 2.1.1.4 Packet Payload | |
419 | ||
420 | The calling application is often not interested in receiving the whole | |
421 | packet payload that it passed to the stack originally: the socket | |
422 | error queue mechanism is just a method to piggyback the timestamp on. | |
423 | In this case, the application can choose to read datagrams with a | |
424 | smaller buffer, possibly even of length 0. The payload is truncated | |
425 | accordingly. Until the process calls recvmsg() on the error queue, | |
426 | however, the full packet is queued, taking up budget from SO_RCVBUF. | |
427 | ||
428 | ||
429 | 2.1.1.5 Blocking Read | |
430 | ||
431 | Reading from the error queue is always a non-blocking operation. To | |
432 | block waiting on a timestamp, use poll or select. poll() will return | |
433 | POLLERR in pollfd.revents if any data is ready on the error queue. | |
434 | There is no need to pass this flag in pollfd.events. This flag is | |
435 | ignored on request. See also `man 2 poll`. | |
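A minimal sketch of such a blocking wait, followed by the non-blocking
read, is shown here. The function names are hypothetical. Note that
pollfd.events is left at zero, as described above.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Wait until the error queue of fd has data or timeout_ms expires.
 * Returns 1 if POLLERR was reported, 0 on timeout, -1 on poll() error.
 */
int wait_for_errqueue(int fd, int timeout_ms)
{
	/* POLLERR is always reported, so no bit is requested in events */
	struct pollfd pfd = { .fd = fd, .events = 0 };
	int ret;

	assert(fd >= 0);
	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLERR)) ? 1 : 0;
}

/* Read one queued message; msg must provide a msg_control buffer. */
ssize_t read_errqueue(int fd, struct msghdr *msg)
{
	return recvmsg(fd, msg, MSG_ERRQUEUE);
}
```

A caller would loop on wait_for_errqueue() and, whenever it returns 1,
call read_errqueue() until it fails with EAGAIN.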
436 | ||
437 | ||
438 | 2.1.2 Receive timestamps | |
439 | ||
440 | On reception, there is no reason to read from the socket error queue. | |
441 | The SCM_TIMESTAMPING ancillary data is sent along with the packet data | |
442 | on a normal recvmsg(). Since this is not a socket error, it is not | |
443 | accompanied by a message SOL_IP(V6)/IP(V6)_RECVERR. In this case, | |
444 | the meaning of the three fields in struct scm_timestamping is | |
445 | implicitly defined. ts[0] holds a software timestamp if set, ts[1] | |
446 | is again deprecated and ts[2] holds a hardware timestamp if set. | |
447 | ||
448 | ||
449 | 3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP | |
cb9eff09 PO |
450 | |
451 | Hardware time stamping must also be initialized for each device driver | |
69298698 | 452 | that is expected to do hardware time stamping. The parameter is defined in |
f655f8b8 | 453 | include/uapi/linux/net_tstamp.h as: |
cb9eff09 PO |
454 | |
455 | struct hwtstamp_config { | |
69298698 PL |
456 | int flags; /* no flags defined right now, must be zero */ |
457 | int tx_type; /* HWTSTAMP_TX_* */ | |
458 | int rx_filter; /* HWTSTAMP_FILTER_* */ | |
cb9eff09 PO |
459 | }; |
460 | ||
461 | Desired behavior is passed into the kernel and to a specific device by | |
462 | calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose | |
463 | ifr_data points to a struct hwtstamp_config. The tx_type and | |
464 | rx_filter are hints to the driver what it is expected to do. If | |
465 | the requested fine-grained filtering for incoming packets is not | |
466 | supported, the driver may time stamp more than just the requested types | |
467 | of packets. | |
468 | ||
eff3cddc JK |
469 | Drivers are free to use a more permissive configuration than the requested |
470 | configuration. It is expected that drivers should only implement directly the | |
471 | most generic mode that can be supported. For example if the hardware can | |
472 | support HWTSTAMP_FILTER_V2_EVENT, then it should generally always upscale | |
473 | HWTSTAMP_FILTER_V2_L2_SYNC_MESSAGE, and so forth, as HWTSTAMP_FILTER_V2_EVENT | |
474 | is more generic (and more useful to applications). | |
475 | ||
cb9eff09 PO |
476 | A driver which supports hardware time stamping shall update the struct |
477 | with the actual, possibly more permissive configuration. If the | |
478 | requested packets cannot be time stamped, then nothing should be | |
479 | changed and ERANGE shall be returned (in contrast to EINVAL, which | |
480 | indicates that SIOCSHWTSTAMP is not supported at all). | |
481 | ||
482 | Only a process with admin rights may change the configuration. User | |
483 | space is responsible to ensure that multiple processes don't interfere | |
484 | with each other and that the settings are reset. | |
485 | ||
fd468c74 BH |
486 | Any process can read the actual configuration by passing this |
487 | structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has | |
488 | not been implemented in all drivers. | |
489 | ||
cb9eff09 PO |
490 | /* possible values for hwtstamp_config->tx_type */ |
491 | enum { | |
492 | /* | |
493 | * no outgoing packet will need hardware time stamping; | |
494 | * should a packet arrive which asks for it, no hardware | |
495 | * time stamping will be done | |
496 | */ | |
497 | HWTSTAMP_TX_OFF, | |
498 | ||
499 | /* | |
500 | * enables hardware time stamping for outgoing packets; | |
501 | * the sender of the packet decides which are to be | |
502 | * time stamped by setting SOF_TIMESTAMPING_TX_HARDWARE | |
503 | * before sending the packet | |
504 | */ | |
505 | HWTSTAMP_TX_ON, | |
506 | }; | |
507 | ||
508 | /* possible values for hwtstamp_config->rx_filter */ | |
509 | enum { | |
510 | /* time stamp no incoming packet at all */ | |
511 | HWTSTAMP_FILTER_NONE, | |
512 | ||
513 | /* time stamp any incoming packet */ | |
514 | HWTSTAMP_FILTER_ALL, | |
515 | ||
69298698 PL |
516 | /* return value: time stamp all packets requested plus some others */ |
517 | HWTSTAMP_FILTER_SOME, | |
cb9eff09 PO |
518 | |
519 | /* PTP v1, UDP, any kind of event packet */ | |
520 | HWTSTAMP_FILTER_PTP_V1_L4_EVENT, | |
521 | ||
69298698 | 522 | /* for the complete list of values, please check |
f655f8b8 | 523 | * the include file include/uapi/linux/net_tstamp.h |
69298698 | 524 | */ |
cb9eff09 PO |
525 | }; |
526 | ||
8fe2f761 | 527 | 3.1 Hardware Timestamping Implementation: Device Drivers |
cb9eff09 PO |
528 | |
529 | A driver which supports hardware time stamping must support the | |
69298698 | 530 | SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with |
fd468c74 BH |
531 | the actual values as described in the section on SIOCSHWTSTAMP. It |
532 | should also support SIOCGHWTSTAMP. | |
69298698 PL |
533 | |
534 | Time stamps for received packets must be stored in the skb. To get a pointer | |
535 | to the shared time stamp structure of the skb call skb_hwtstamps(). Then | |
536 | set the time stamps in the structure: | |
537 | ||
538 | struct skb_shared_hwtstamps { | |
539 | /* hardware time stamp transformed into duration | |
540 | * since arbitrary point in time | |
541 | */ | |
542 | ktime_t hwtstamp; | |
69298698 | 543 | }; |
cb9eff09 PO |
544 | |
545 | Time stamps for outgoing packets are to be generated as follows: | |
2244d07b OH |
546 | - In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) |
547 | is non-zero. If yes, then the driver is expected to do hardware time | |
548 | stamping. | |
cb9eff09 | 549 | - If this is possible for the skb and requested, then declare |
2244d07b OH |
550 | that the driver is doing the time stamping by setting the flag |
551 | SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags , e.g. with | |
552 | ||
553 | skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; | |
554 | ||
555 | You might want to keep a pointer to the associated skb for the next step | |
556 | and not free the skb. A driver not supporting hardware time stamping doesn't | |
557 | do that. A driver must never touch sk_buff::tstamp! It is used to store | |
558 | software generated time stamps by the network subsystem. | |
59cb89e6 JK |
559 | - Driver should call skb_tx_timestamp() as close to passing sk_buff to hardware |
560 | as possible. skb_tx_timestamp() provides a software time stamp if requested | |
561 | and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set). | |
cb9eff09 PO |
562 | - As soon as the driver has sent the packet and/or obtained a |
563 | hardware time stamp for it, it passes the time stamp back by | |
564 | calling skb_tstamp_tx() with the original skb and the raw | |
69298698 PL |
565 | hardware time stamp. skb_tstamp_tx() clones the original skb and |
566 | adds the timestamps, therefore the original skb has to be freed now. | |
567 | If obtaining the hardware time stamp somehow fails, then the driver | |
568 | should not fall back to software time stamping. The rationale is that | |
569 | this would occur at a later time in the processing pipeline than other | |
570 | software time stamping and therefore could lead to unexpected deltas | |
571 | between time stamps. |