Commit | Line | Data |
---|---|---|
1cec2cac MCC |
1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | ||
3 | ========= | |
4 | IP Sysctl | |
5 | ========= | |
6 | ||
7 | /proc/sys/net/ipv4/* Variables | |
8 | ============================== | |
1da177e4 LT |
9 | |
10 | ip_forward - BOOLEAN | |
1cec2cac MCC |
11 | - 0 - disabled (default) |
12 | - not 0 - enabled | |
1da177e4 LT |
13 | |
14 | Forward Packets between interfaces. | |
15 | ||
16 | This variable is special: changing it resets all configuration | |
17 | parameters to their default state (RFC1122 for hosts, RFC1812 | |
18 | for routers). | |
19 | ||
20 | ip_default_ttl - INTEGER | |
cc6f02dd ED |
21 | Default value of TTL field (Time To Live) for outgoing (but not |
22 | forwarded) IP packets. Should be between 1 and 255 inclusive. | |
23 | Default: 64 (as recommended by RFC1700) | |
1da177e4 | 24 | |
cd174e67 HFS |
25 | ip_no_pmtu_disc - INTEGER |
26 | Disable Path MTU Discovery. If enabled in mode 1 and a | |
188b04d5 | 27 | fragmentation-required ICMP is received, the PMTU to this |
be1c5b53 | 28 | destination will be set to the smallest of the old MTU to |
29 | this destination and min_pmtu (see below). You will need | |
188b04d5 HFS |
30 | to raise min_pmtu to the smallest interface MTU on your system |
31 | manually if you want to avoid locally generated fragments. | |
cd174e67 HFS |
32 | |
33 | In mode 2 incoming Path MTU Discovery messages will be | |
34 | discarded. Outgoing frames are handled the same as in mode 1, | |
35 | implicitly setting IP_PMTUDISC_DONT on every created socket. | |
36 | ||
bb38ccce | 37 | Mode 3 is a hardened pmtu discover mode. The kernel will only |
8ed1dc44 HFS |
38 | accept fragmentation-needed errors if the underlying protocol |
39 | can verify them besides a plain socket lookup. Current | |
40 | protocols for which pmtu events will be honored are TCP, SCTP | |
41 | and DCCP as they verify e.g. the sequence number or the | |
42 | association. This mode should not be enabled globally but is | |
43 | only intended to secure e.g. name servers in namespaces where | |
44 | TCP path mtu must still work but path MTU information of other | |
45 | protocols should be discarded. If enabled globally this mode | |
46 | could break other protocols. | |
47 | ||
48 | Possible values: 0-3 | |
1cec2cac | 49 | |
188b04d5 | 50 | Default: 0 (FALSE) |
1da177e4 LT |
51 | |
52 | min_pmtu - INTEGER | |
a266ef69 | 53 | default 552 - minimum Path MTU. Unless this is changed manually, |
be1c5b53 | 54 | each cached pmtu will never be lower than this setting. |
1da177e4 | 55 | |
f87c10a8 HFS |
56 | ip_forward_use_pmtu - BOOLEAN |
57 | By default we don't trust protocol path MTUs while forwarding | |
58 | because they could be easily forged and can lead to unwanted | |
59 | fragmentation by the router. | |
60 | You only need to enable this if you have user-space software | |
61 | which tries to discover path mtus by itself and depends on the | |
62 | kernel honoring this information. This is normally not the | |
63 | case. | |
1cec2cac | 64 | |
f87c10a8 | 65 | Default: 0 (disabled) |
1cec2cac | 66 | |
f87c10a8 | 67 | Possible values: |
1cec2cac MCC |
68 | |
69 | - 0 - disabled | |
70 | - 1 - enabled | |
f87c10a8 | 71 | |
219b5f29 LV |
72 | fwmark_reflect - BOOLEAN |
73 | Controls the fwmark of kernel-generated IPv4 reply packets that are not | |
74 | associated with a socket (for example, TCP RSTs or ICMP echo replies). | |
75 | If unset, these packets have a fwmark of zero. If set, they have the | |
76 | fwmark of the packet they are replying to. | |
1cec2cac | 77 | |
219b5f29 LV |
78 | Default: 0 |
79 | ||
a6db4494 DA |
80 | fib_multipath_use_neigh - BOOLEAN |
81 | Use status of existing neighbor entry when determining nexthop for | |
82 | multipath routes. If disabled, neighbor information is not used and | |
83 | packets could be directed to a failed nexthop. Only valid for kernels | |
84 | built with CONFIG_IP_ROUTE_MULTIPATH enabled. | |
1cec2cac | 85 | |
a6db4494 | 86 | Default: 0 (disabled) |
1cec2cac | 87 | |
a6db4494 | 88 | Possible values: |
1cec2cac MCC |
89 | |
90 | - 0 - disabled | |
91 | - 1 - enabled | |
a6db4494 | 92 | |
bf4e0a3d NA |
93 | fib_multipath_hash_policy - INTEGER |
94 | Controls which hash policy to use for multipath routes. Only valid | |
95 | for kernels built with CONFIG_IP_ROUTE_MULTIPATH enabled. | |
1cec2cac | 96 | |
bf4e0a3d | 97 | Default: 0 (Layer 3) |
1cec2cac | 98 | |
bf4e0a3d | 99 | Possible values: |
1cec2cac MCC |
100 | |
101 | - 0 - Layer 3 | |
102 | - 1 - Layer 4 | |
103 | - 2 - Layer 3 or inner Layer 3 if present | |
4253b498 IS |
104 | - 3 - Custom multipath hash. Fields used for multipath hash calculation |
105 | are determined by fib_multipath_hash_fields sysctl | |
bf4e0a3d | 106 | |
ce5c9c20 IS |
107 | fib_multipath_hash_fields - UNSIGNED INTEGER |
108 | When fib_multipath_hash_policy is set to 3 (custom multipath hash), the | |
109 | fields used for multipath hash calculation are determined by this | |
110 | sysctl. | |
111 | ||
112 | This value is a bitmask which enables various fields for multipath hash | |
113 | calculation. | |
114 | ||
115 | Possible fields are: | |
116 | ||
117 | ====== ============================ | |
118 | 0x0001 Source IP address | |
119 | 0x0002 Destination IP address | |
120 | 0x0004 IP protocol | |
121 | 0x0008 Unused (Flow Label) | |
122 | 0x0010 Source port | |
123 | 0x0020 Destination port | |
124 | 0x0040 Inner source IP address | |
125 | 0x0080 Inner destination IP address | |
126 | 0x0100 Inner IP protocol | |
127 | 0x0200 Inner Flow Label | |
128 | 0x0400 Inner source port | |
129 | 0x0800 Inner destination port | |
130 | ====== ============================ | |
131 | ||
132 | Default: 0x0007 (source IP, destination IP and IP protocol) | |
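As an illustration of how the bitmask is composed, the sketch below ORs together the bit values from the table above. The dictionary keys and the helper name `hash_fields_mask` are hypothetical labels chosen for this example; only the numeric bit values come from the table.

```python
# Bit values copied from the fib_multipath_hash_fields table.
# The key names are illustrative labels, not kernel identifiers.
HASH_FIELDS = {
    "src_ip": 0x0001,
    "dst_ip": 0x0002,
    "ip_proto": 0x0004,
    "src_port": 0x0010,
    "dst_port": 0x0020,
    "inner_src_ip": 0x0040,
    "inner_dst_ip": 0x0080,
    "inner_ip_proto": 0x0100,
    "inner_flow_label": 0x0200,
    "inner_src_port": 0x0400,
    "inner_dst_port": 0x0800,
}


def hash_fields_mask(*names):
    """Return the bitmask that enables the named fields."""
    mask = 0
    for name in names:
        mask |= HASH_FIELDS[name]
    return mask


# The documented default: source IP, destination IP and IP protocol.
default_mask = hash_fields_mask("src_ip", "dst_ip", "ip_proto")
print(f"0x{default_mask:04x}")
```

Adding the source and destination ports to the default selection would give a mask of 0x0037 under the same table.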
133 | ||
9ab948a9 DA |
134 | fib_sync_mem - UNSIGNED INTEGER |
135 | Amount of dirty memory from fib entries that can be backlogged before | |
136 | synchronize_rcu is forced. | |
1cec2cac MCC |
137 | |
138 | Default: 512kB Minimum: 64kB Maximum: 64MB | |
9ab948a9 | 139 | |
432e05d3 PM |
140 | ip_forward_update_priority - INTEGER |
141 | Whether to update SKB priority from "TOS" field in IPv4 header after it | |
142 | is forwarded. The new SKB priority is mapped from TOS field value | |
143 | according to an rt_tos2priority table (see e.g. man tc-prio). | |
1cec2cac | 144 | |
432e05d3 | 145 | Default: 1 (Update priority.) |
1cec2cac | 146 | |
432e05d3 | 147 | Possible values: |
1cec2cac MCC |
148 | |
149 | - 0 - Do not update priority. | |
150 | - 1 - Update priority. | |
432e05d3 | 151 | |
cbaf087a BG |
152 | route/max_size - INTEGER |
153 | Maximum number of routes allowed in the kernel. Increase | |
154 | this when using large numbers of interfaces and/or routes. | |
1cec2cac | 155 | |
25050c63 AS |
156 | From linux kernel 3.6 onwards, this is deprecated for ipv4 |
157 | as route cache is no longer used. | |
cbaf087a | 158 | |
695a376b JM |
159 | From linux kernel 6.3 onwards, this is deprecated for ipv6 |
160 | as garbage collection manages cached route entries. | |
161 | ||
2724680b YH |
162 | neigh/default/gc_thresh1 - INTEGER |
163 | Minimum number of entries to keep. Garbage collector will not | |
164 | purge entries if there are fewer than this number. | |
1cec2cac | 165 | |
b66c66dc | 166 | Default: 128 |
2724680b | 167 | |
a3d12146 | 168 | neigh/default/gc_thresh2 - INTEGER |
169 | Threshold when garbage collector becomes more aggressive about | |
170 | purging entries. Entries older than 5 seconds will be cleared | |
171 | when over this number. | |
1cec2cac | 172 | |
a3d12146 | 173 | Default: 512 |
174 | ||
cbaf087a | 175 | neigh/default/gc_thresh3 - INTEGER |
58956317 DA |
176 | Maximum number of non-PERMANENT neighbor entries allowed. Increase |
177 | this when using large numbers of interfaces and when communicating | |
cbaf087a | 178 | with large numbers of directly-connected peers. |
1cec2cac | 179 | |
cc868028 | 180 | Default: 1024 |
cbaf087a | 181 | |
8b5c171b ED |
182 | neigh/default/unres_qlen_bytes - INTEGER |
183 | The maximum number of bytes which may be used by packets | |
184 | queued for each unresolved address by other network layers. | |
185 | (added in linux 3.3) | |
1cec2cac | 186 | |
3b09adcb | 187 | Setting a negative value is meaningless and will return an error. |
1cec2cac | 188 | |
eaa72dc4 | 189 | Default: SK_WMEM_MAX, (same as net.core.wmem_default). |
1cec2cac | 190 | |
eaa72dc4 ED |
191 | Exact value depends on architecture and kernel options, |
192 | but should be enough to allow queuing 256 packets | |
193 | of medium size. | |
8b5c171b ED |
194 | |
195 | neigh/default/unres_qlen - INTEGER | |
196 | The maximum number of packets which may be queued for each | |
197 | unresolved address by other network layers. | |
1cec2cac | 198 | |
8b5c171b | 199 | (deprecated in linux 3.3) : use unres_qlen_bytes instead. |
1cec2cac | 200 | |
cc868028 | 201 | Prior to linux 3.3, the default value is 3 which may cause |
5d248c49 | 202 | unexpected packet loss. The current default value is calculated |
cc868028 SW |
203 | according to default value of unres_qlen_bytes and true size of |
204 | packet. | |
1cec2cac | 205 | |
eaa72dc4 | 206 | Default: 101 |
8b5c171b | 207 | |
211da42e YW |
208 | neigh/default/interval_probe_time_ms - INTEGER |
209 | The probe interval for neighbor entries with NTF_MANAGED flag, | |
210 | the min value is 1. | |
211 | ||
212 | Default: 5000 | |
213 | ||
1da177e4 LT |
214 | mtu_expires - INTEGER |
215 | Time, in seconds, that cached PMTU information is kept. | |
216 | ||
217 | min_adv_mss - INTEGER | |
218 | The advertised MSS depends on the first hop route MTU, but will | |
219 | never be lower than this setting. | |
220 | ||
680aea08 AC |
221 | fib_notify_on_flag_change - INTEGER |
222 | Whether to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/ | |
648106c3 | 223 | RTM_F_TRAP/RTM_F_OFFLOAD_FAILED flags are changed. |
680aea08 AC |
224 | |
225 | After installing a route to the kernel, user space receives an | |
226 | acknowledgment, which means the route was installed in the kernel, | |
227 | but not necessarily in hardware. | |
228 | It is also possible for a route already installed in hardware to change | |
229 | its action and therefore its flags. For example, a host route that is | |
230 | trapping packets can be "promoted" to perform decapsulation following | |
231 | the installation of an IPinIP/VXLAN tunnel. | |
232 | The notifications will indicate to user-space the state of the route. | |
233 | ||
234 | Default: 0 (Do not emit notifications.) | |
235 | ||
236 | Possible values: | |
237 | ||
238 | - 0 - Do not emit notifications. | |
239 | - 1 - Emit notifications. | |
648106c3 | 240 | - 2 - Emit notifications only for RTM_F_OFFLOAD_FAILED flag change. |
680aea08 | 241 | |
1da177e4 LT |
242 | IP Fragmentation: |
243 | ||
3e67f106 | 244 | ipfrag_high_thresh - LONG INTEGER |
648700f7 | 245 | Maximum memory used to reassemble IP fragments. |
e18f5feb | 246 | |
3e67f106 | 247 | ipfrag_low_thresh - LONG INTEGER |
648700f7 | 248 | (Obsolete since linux-4.17) |
b13d3cbf FW |
249 | Maximum memory used to reassemble IP fragments before the kernel |
250 | begins to remove incomplete fragment queues to free up resources. | |
251 | The kernel still accepts new fragments for defragmentation. | |
1da177e4 LT |
252 | |
253 | ipfrag_time - INTEGER | |
e18f5feb | 254 | Time in seconds to keep an IP fragment in memory. |
1da177e4 | 255 | |
89cee8b1 | 256 | ipfrag_max_dist - INTEGER |
e18f5feb JDB |
257 | ipfrag_max_dist is a non-negative integer value which defines the |
258 | maximum "disorder" which is allowed among fragments which share a | |
259 | common IP source address. Note that reordering of packets is | |
260 | not unusual, but if a large number of fragments arrive from a source | |
261 | IP address while a particular fragment queue remains incomplete, it | |
262 | probably indicates that one or more fragments belonging to that queue | |
263 | have been lost. When ipfrag_max_dist is positive, an additional check | |
264 | is done on fragments before they are added to a reassembly queue - if | |
265 | ipfrag_max_dist (or more) fragments have arrived from a particular IP | |
266 | address between additions to any IP fragment queue using that source | |
267 | address, it's presumed that one or more fragments in the queue are | |
268 | lost. The existing fragment queue will be dropped, and a new one | |
89cee8b1 HX |
269 | started. An ipfrag_max_dist value of zero disables this check. |
270 | ||
271 | Using a very small value, e.g. 1 or 2, for ipfrag_max_dist can | |
272 | result in unnecessarily dropping fragment queues when normal | |
e18f5feb JDB |
273 | reordering of packets occurs, which could lead to poor application |
274 | performance. Using a very large value, e.g. 50000, increases the | |
275 | likelihood of incorrectly reassembling IP fragments that originate | |
89cee8b1 HX |
276 | from different IP datagrams, which could result in data corruption. |
277 | Default: 64 | |
278 | ||
c6a4254c ND |
279 | bc_forwarding - INTEGER |
280 | bc_forwarding enables the feature described in rfc1812#section-5.3.5.2 | |
281 | and rfc2644. It allows the router to forward directed broadcast. | |
282 | To enable this feature, the 'all' entry and the input interface entry | |
283 | should be set to 1. | |
284 | Default: 0 | |
285 | ||
1cec2cac MCC |
286 | INET peer storage |
287 | ================= | |
1da177e4 LT |
288 | |
289 | inet_peer_threshold - INTEGER | |
e18f5feb | 290 | The approximate size of the storage. Starting from this threshold |
1da177e4 LT |
291 | entries will be discarded aggressively. This threshold also determines | |
292 | entries' time-to-live and time intervals between garbage collection | |
293 | passes: the more entries, the shorter the time-to-live and GC interval. | |
294 | ||
295 | inet_peer_minttl - INTEGER | |
296 | Minimum time-to-live of entries. Should be enough to cover fragment | |
297 | time-to-live on the reassembling side. This minimum time-to-live is | |
298 | guaranteed if the pool size is less than inet_peer_threshold. | |
77a538d5 | 299 | Measured in seconds. |
1da177e4 LT |
300 | |
301 | inet_peer_maxttl - INTEGER | |
302 | Maximum time-to-live of entries. Unused entries will expire after | |
303 | this period of time if there is no memory pressure on the pool (i.e. | |
304 | when the number of entries in the pool is very small). | |
77a538d5 | 305 | Measured in seconds. |
1da177e4 | 306 | |
1cec2cac MCC |
307 | TCP variables |
308 | ============= | |
1da177e4 | 309 | |
ef56e622 SH |
310 | somaxconn - INTEGER |
311 | Limit of socket listen() backlog, known in userspace as SOMAXCONN. | |
19f92a03 ED |
312 | Defaults to 4096. (Was 128 before linux-5.4) |
313 | See also tcp_max_syn_backlog for additional tuning for TCP sockets. | |
ef56e622 | 314 | |
ef56e622 SH |
315 | tcp_abort_on_overflow - BOOLEAN |
316 | If a listening service is too slow to accept new connections, | |
317 | reset them. Default state is FALSE. It means that if overflow | |
318 | occurred due to a burst, the connection will recover. Enable this | |
319 | option _only_ if you are really sure that the listening daemon | |
320 | cannot be tuned to accept connections faster. Enabling this | |
321 | option can harm clients of your server. | |
1da177e4 | 322 | |
ef56e622 SH |
323 | tcp_adv_win_scale - INTEGER |
324 | Count buffering overhead as bytes/2^tcp_adv_win_scale | |
325 | (if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale), | |
326 | if it is <= 0. | |
1cec2cac | 327 | |
0147fc05 | 328 | Possible values are [-31, 31], inclusive. |
1cec2cac | 329 | |
b49960a0 | 330 | Default: 1 |
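The overhead formula for tcp_adv_win_scale can be sketched as plain arithmetic. This is a minimal illustration of the two branches as the text states them, using integer division; the function name `buffering_overhead` is hypothetical and the kernel's own rounding may differ.

```python
def buffering_overhead(nbytes, tcp_adv_win_scale):
    """Bytes counted as buffering overhead, following the documented formula."""
    if not -31 <= tcp_adv_win_scale <= 31:
        raise ValueError("tcp_adv_win_scale must be in [-31, 31]")
    if tcp_adv_win_scale > 0:
        # bytes / 2^tcp_adv_win_scale
        return nbytes // (2 ** tcp_adv_win_scale)
    # bytes - bytes / 2^(-tcp_adv_win_scale)
    return nbytes - nbytes // (2 ** (-tcp_adv_win_scale))


# With the documented default of 1, half of the buffer counts as overhead.
print(buffering_overhead(65536, 1))
print(buffering_overhead(65536, -2))
```

Under this reading, positive values shrink the overhead share as they grow, while negative values grow it.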
1da177e4 | 331 | |
ef56e622 SH |
332 | tcp_allowed_congestion_control - STRING |
333 | Show/set the congestion control choices available to non-privileged | |
334 | processes. The list is a subset of those listed in | |
335 | tcp_available_congestion_control. | |
1cec2cac | 336 | |
ef56e622 | 337 | Default is "reno" and the default setting (tcp_congestion_control). |
1da177e4 | 338 | |
ef56e622 SH |
339 | tcp_app_win - INTEGER |
340 | Reserve max(window/2^tcp_app_win, mss) of window for application | |
341 | buffer. Value 0 is special: it means that nothing is reserved. | |
1cec2cac | 342 | |
dc5110c2 Y |
343 | Possible values are [0, 31], inclusive. |
344 | ||
ef56e622 | 345 | Default: 31 |
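The tcp_app_win reservation can be worked through numerically. The sketch below follows the formula exactly as written above, with 0 treated as the special "nothing reserved" case; the function name `app_window_reserve` and the sample window and MSS values are assumptions made for the example.

```python
def app_window_reserve(window, mss, tcp_app_win):
    """Bytes of window reserved for the application buffer (documented formula)."""
    if not 0 <= tcp_app_win <= 31:
        raise ValueError("tcp_app_win must be in [0, 31]")
    if tcp_app_win == 0:
        # Value 0 is special: nothing is reserved.
        return 0
    return max(window // (2 ** tcp_app_win), mss)


# With the documented default of 31, window / 2^31 is 0 for any realistic
# window, so the reservation falls back to one MSS.
print(app_window_reserve(65536, 1460, 31))
print(app_window_reserve(65536, 1460, 2))
```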
1da177e4 | 346 | |
f54b3111 ED |
347 | tcp_autocorking - BOOLEAN |
348 | Enable TCP auto corking : | |
349 | When applications do consecutive small write()/sendmsg() system calls, | |
350 | we try to coalesce these small writes as much as possible, to lower | |
351 | total amount of sent packets. This is done if at least one prior | |
352 | packet for the flow is waiting in Qdisc queues or device transmit | |
353 | queue. Applications can still use TCP_CORK for optimal behavior | |
354 | when they know how/when to uncork their sockets. | |
1cec2cac | 355 | |
f54b3111 ED |
356 | Default : 1 |
357 | ||
ef56e622 SH |
358 | tcp_available_congestion_control - STRING |
359 | Shows the available congestion control choices that are registered. | |
360 | More congestion control algorithms may be available as modules, | |
361 | but not loaded. | |
1da177e4 | 362 | |
71599cd1 | 363 | tcp_base_mss - INTEGER |
4edc2f34 SH |
364 | The initial value of search_low to be used by the packetization layer |
365 | Path MTU discovery (MTU probing). If MTU probing is enabled, | |
366 | this is the initial MSS used by the connection. | |
71599cd1 | 367 | |
c04b79b6 JH |
368 | tcp_mtu_probe_floor - INTEGER |
369 | If MTU probing is enabled this caps the minimum MSS used for search_low | |
370 | for the connection. | |
371 | ||
372 | Default : 48 | |
373 | ||
5f3e2bf0 ED |
374 | tcp_min_snd_mss - INTEGER |
375 | TCP SYN and SYNACK messages usually advertise an ADVMSS option, | |
376 | as described in RFC 1122 and RFC 6691. | |
1cec2cac | 377 | |
5f3e2bf0 ED |
378 | If this ADVMSS option is smaller than tcp_min_snd_mss, |
379 | it is silently capped to tcp_min_snd_mss. | |
380 | ||
381 | Default : 48 (at least 8 bytes of payload per segment) | |
382 | ||
ef56e622 SH |
383 | tcp_congestion_control - STRING |
384 | Set the congestion control algorithm to be used for new | |
385 | connections. The algorithm "reno" is always available, but | |
386 | additional choices may be available based on kernel configuration. | |
387 | Default is set as part of kernel configuration. | |
d8a6e65f ED |
388 | For passive connections, the listener congestion control choice |
389 | is inherited. | |
1cec2cac | 390 | |
d8a6e65f | 391 | [see setsockopt(listenfd, SOL_TCP, TCP_CONGESTION, "name" ...) ] |
1da177e4 | 392 | |
ef56e622 SH |
393 | tcp_dsack - BOOLEAN |
394 | Allows TCP to send "duplicate" SACKs. | |
1da177e4 | 395 | |
eed530b6 | 396 | tcp_early_retrans - INTEGER |
bec41a11 YC |
397 | Tail loss probe (TLP) converts RTOs occurring due to tail |
398 | losses into fast recovery (draft-ietf-tcpm-rack). Note that | |
399 | TLP requires RACK to function properly (see tcp_recovery below) | |
1cec2cac | 400 | |
eed530b6 | 401 | Possible values: |
1cec2cac MCC |
402 | |
403 | - 0 disables TLP | |
404 | - 3 or 4 enables TLP | |
405 | ||
6ba8a3b1 | 406 | Default: 3 |
eed530b6 | 407 | |
34a6ef38 | 408 | tcp_ecn - INTEGER |
7e3a2dc5 RJ |
409 | Control use of Explicit Congestion Notification (ECN) by TCP. |
410 | ECN is used only when both ends of the TCP connection indicate | |
411 | support for it. This feature is useful in avoiding losses due | |
412 | to congestion by allowing supporting routers to signal | |
413 | congestion before having to drop packets. | |
1cec2cac | 414 | |
255cac91 | 415 | Possible values are: |
1cec2cac MCC |
416 | |
417 | = ===================================================== | |
418 | 0 Disable ECN. Neither initiate nor accept ECN. | |
419 | 1 Enable ECN when requested by incoming connections and | |
420 | also request ECN on outgoing connection attempts. | |
421 | 2 Enable ECN when requested by incoming connections | |
422 | but do not request ECN on outgoing connections. | |
423 | = ===================================================== | |
424 | ||
255cac91 | 425 | Default: 2 |
ef56e622 | 426 | |
49213555 DB |
427 | tcp_ecn_fallback - BOOLEAN |
428 | If the kernel detects that an ECN connection misbehaves, enable | |
429 | fallback to non-ECN. Currently, this knob implements the fallback | |
430 | from RFC3168, section 6.1.1.1, but additional detection mechanisms | |
431 | could be implemented under this knob in the future. The value is | |
432 | not used if tcp_ecn or per-route (or congestion control) ECN | |
433 | settings are disabled. | |
1cec2cac | 434 | |
49213555 DB |
435 | Default: 1 (fallback enabled) |
436 | ||
ef56e622 | 437 | tcp_fack - BOOLEAN |
713bafea | 438 | This is a legacy option, it has no effect anymore. |
1da177e4 LT |
439 | |
440 | tcp_fin_timeout - INTEGER | |
d825da2e RJ |
441 | The length of time an orphaned (no longer referenced by any |
442 | application) connection will remain in the FIN_WAIT_2 state | |
443 | before it is aborted at the local end. While a perfectly | |
444 | valid "receive only" state for an un-orphaned connection, an | |
445 | orphaned connection in FIN_WAIT_2 state could otherwise wait | |
446 | forever for the remote to close its end of the connection. | |
1cec2cac | 447 | |
d825da2e | 448 | Cf. tcp_max_orphans |
1cec2cac | 449 | |
d825da2e | 450 | Default: 60 seconds |
1da177e4 | 451 | |
89808060 | 452 | tcp_frto - INTEGER |
e33099f9 | 453 | Enables Forward RTO-Recovery (F-RTO) defined in RFC5682. |
cd99889c | 454 | F-RTO is an enhanced recovery algorithm for TCP retransmission |
e33099f9 YC |
455 | timeouts. It is particularly beneficial in networks where the |
456 | RTT fluctuates (e.g., wireless). F-RTO is sender-side only | |
457 | modification. It does not require any support from the peer. | |
458 | ||
459 | By default it's enabled with a non-zero value. 0 disables F-RTO. | |
1da177e4 | 460 | |
e2d00e62 LC |
461 | tcp_fwmark_accept - BOOLEAN |
462 | If set, incoming connections to listening sockets that do not have a | |
463 | socket mark will set the mark of the accepting socket to the fwmark of | |
464 | the incoming SYN packet. This will cause all packets on that connection | |
465 | (starting from the first SYNACK) to be sent with that fwmark. The | |
466 | listening socket's mark is unchanged. Listening sockets that already | |
467 | have a fwmark set via setsockopt(SOL_SOCKET, SO_MARK, ...) are | |
468 | unaffected. | |
469 | ||
470 | Default: 0 | |
471 | ||
032ee423 NC |
472 | tcp_invalid_ratelimit - INTEGER |
473 | Limit the maximal rate for sending duplicate acknowledgments | |
474 | in response to incoming TCP packets that are for an existing | |
475 | connection but that are invalid due to any of these reasons: | |
476 | ||
477 | (a) out-of-window sequence number, | |
478 | (b) out-of-window acknowledgment number, or | |
479 | (c) PAWS (Protection Against Wrapped Sequence numbers) check failure | |
480 | ||
481 | This can help mitigate simple "ack loop" DoS attacks, wherein | |
482 | a buggy or malicious middlebox or man-in-the-middle can | |
483 | rewrite TCP header fields in a manner that causes each endpoint | |
484 | to think that the other is sending invalid TCP segments, thus | |
485 | causing each side to send an unending stream of duplicate | |
486 | acknowledgments for invalid segments. | |
487 | ||
488 | Using 0 disables rate-limiting of dupacks in response to | |
489 | invalid segments; otherwise this value specifies the minimal | |
490 | space between sending such dupacks, in milliseconds. | |
491 | ||
492 | Default: 500 (milliseconds). | |
493 | ||
ef56e622 SH |
494 | tcp_keepalive_time - INTEGER |
495 | How often TCP sends out keepalive messages when keepalive is enabled. | |
496 | Default: 2 hours. | |
1da177e4 | 497 | |
ef56e622 SH |
498 | tcp_keepalive_probes - INTEGER |
499 | How many keepalive probes TCP sends out, until it decides that the | |
500 | connection is broken. Default value: 9. | |
501 | ||
502 | tcp_keepalive_intvl - INTEGER | |
503 | How frequently the probes are sent out. Multiplied by | |
504 | tcp_keepalive_probes, it gives the time after which an unresponsive | |
505 | connection is killed once probing has started. Default value: 75 | |
506 | seconds, i.e. the connection will be aborted after ~11 minutes of retries. | |
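The "~11 minutes" figure follows from multiplying the two defaults. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and the idle-to-abort total is only a rough estimate that assumes probing begins after tcp_keepalive_time of idleness.

```python
def keepalive_probe_phase_seconds(probes=9, intvl=75):
    """Time spent probing before the connection is declared broken."""
    return probes * intvl


def keepalive_idle_to_abort_seconds(idle=2 * 60 * 60, probes=9, intvl=75):
    """Rough total: idle time before probing starts plus the probe phase."""
    return idle + keepalive_probe_phase_seconds(probes, intvl)


phase = keepalive_probe_phase_seconds()
print(phase, "seconds =", phase / 60, "minutes")
```

With the documented defaults the probe phase is 675 seconds, which is 11.25 minutes.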
507 | ||
6dd9a14e DA |
508 | tcp_l3mdev_accept - BOOLEAN |
509 | Enables child sockets to inherit the L3 master device index. | |
510 | Enabling this option allows a "global" listen socket to work | |
511 | across L3 master domains (e.g., VRFs) with connected sockets | |
512 | derived from the listen socket to be bound to the L3 domain in | |
513 | which the packets originated. Only valid when the kernel was | |
514 | compiled with CONFIG_NET_L3_MASTER_DEV. | |
1cec2cac MCC |
515 | |
516 | Default: 0 (disabled) | |
6dd9a14e | 517 | |
ef56e622 | 518 | tcp_low_latency - BOOLEAN |
b6690b14 | 519 | This is a legacy option, it has no effect anymore. |
1da177e4 LT |
520 | |
521 | tcp_max_orphans - INTEGER | |
522 | Maximal number of TCP sockets not attached to any user file handle, | |
523 | held by system. If this number is exceeded orphaned connections are | |
524 | reset immediately and warning is printed. This limit exists | |
525 | only to prevent simple DoS attacks, you _must_ not rely on this | |
526 | or lower the limit artificially, but rather increase it | |
527 | (probably, after increasing installed memory), | |
528 | if network conditions require more than default value, | |
529 | and tune network services to linger and kill such states | |
530 | more aggressively. Remember: each orphan consumes | |
531 | up to ~64K of unswappable memory. | |
532 | ||
1da177e4 | 533 | tcp_max_syn_backlog - INTEGER |
623d0c2d ED |
534 | Maximal number of remembered connection requests (SYN_RECV), |
535 | which have not received an acknowledgment from connecting client. | |
1cec2cac | 536 | |
623d0c2d | 537 | This is a per-listener limit. |
1cec2cac | 538 | |
99b53bdd PP |
539 | The minimal value is 128 for low memory machines, and it will |
540 | increase in proportion to the memory of machine. | |
1cec2cac | 541 | |
99b53bdd | 542 | If server suffers from overload, try increasing this number. |
1cec2cac | 543 | |
623d0c2d ED |
544 | Remember to also check /proc/sys/net/core/somaxconn |
545 | A SYN_RECV request socket consumes about 304 bytes of memory. | |
1da177e4 | 546 | |
ef56e622 SH |
547 | tcp_max_tw_buckets - INTEGER |
548 | Maximal number of timewait sockets held by system simultaneously. | |
549 | If this number is exceeded time-wait socket is immediately destroyed | |
550 | and warning is printed. This limit exists only to prevent | |
551 | simple DoS attacks, you _must_ not lower the limit artificially, | |
552 | but rather increase it (probably, after increasing installed memory), | |
553 | if network conditions require more than default value. | |
1da177e4 | 554 | |
ef56e622 SH |
555 | tcp_mem - vector of 3 INTEGERs: min, pressure, max |
556 | min: below this number of pages TCP is not bothered about its | |
557 | memory appetite. | |
1da177e4 | 558 | |
ef56e622 SH |
559 | pressure: when amount of memory allocated by TCP exceeds this number |
560 | of pages, TCP moderates its memory consumption and enters memory | |
561 | pressure mode, which is exited when memory consumption falls | |
562 | under "min". | |
1da177e4 | 563 | |
ef56e622 | 564 | max: number of pages allowed for queueing by all TCP sockets. |
1da177e4 | 565 | |
ef56e622 SH |
566 | Defaults are calculated at boot time from amount of available |
567 | memory. | |
1da177e4 | 568 | |
f6722583 YC |
569 | tcp_min_rtt_wlen - INTEGER |
570 | The window length of the windowed min filter to track the minimum RTT. | |
571 | A shorter window lets a flow more quickly pick up new (higher) | |
572 | minimum RTT when it is moved to a longer path (e.g., due to traffic | |
573 | engineering). A longer window makes the filter more resistant to RTT | |
574 | inflations such as transient congestion. The unit is seconds. | |
1cec2cac | 575 | |
19fad20d | 576 | Possible values: 0 - 86400 (1 day) |
1cec2cac | 577 | |
f6722583 YC |
578 | Default: 300 |
579 | ||
71599cd1 | 580 | tcp_moderate_rcvbuf - BOOLEAN |
4edc2f34 | 581 | If set, TCP performs receive buffer auto-tuning, attempting to |
71599cd1 JH |
582 | automatically size the buffer (no greater than tcp_rmem[2]) to |
583 | match the size required by the path for full throughput. Enabled by | |
584 | default. | |
585 | ||
586 | tcp_mtu_probing - INTEGER | |
587 | Controls TCP Packetization-Layer Path MTU Discovery. Takes three | |
588 | values: | |
1cec2cac MCC |
589 | |
590 | - 0 - Disabled | |
591 | - 1 - Disabled by default, enabled when an ICMP black hole detected | |
592 | - 2 - Always enabled, use initial MSS of tcp_base_mss. | |
71599cd1 | 593 | |
d4ce5808 | 594 | tcp_probe_interval - UNSIGNED INTEGER |
fab42760 FD |
595 | Controls how often to start TCP Packetization-Layer Path MTU |
596 | Discovery reprobe. The default is reprobing every 10 minutes as | |
597 | per RFC4821. | |
598 | ||
599 | tcp_probe_threshold - INTEGER | |
600 | Controls when TCP Packetization-Layer Path MTU Discovery probing | |
601 | will stop in respect to the width of search range in bytes. Default | |
602 | is 8 bytes. | |
603 | ||
71599cd1 JH |
604 | tcp_no_metrics_save - BOOLEAN |
605 | By default, TCP saves various connection metrics in the route cache | |
606 | when the connection closes, so that connections established in the | |
607 | near future can use these to set initial conditions. Usually, this | |
608 | increases overall performance, but may sometimes cause performance | |
0f035b8e | 609 | degradation. If set, TCP will not cache metrics on closing |
71599cd1 JH |
610 | connections. |
611 | ||
65e6d901 KYY |
612 | tcp_no_ssthresh_metrics_save - BOOLEAN |
613 | Controls whether TCP saves ssthresh metrics in the route cache. | |
1cec2cac | 614 | |
65e6d901 KYY |
615 | Default is 1, which disables ssthresh metrics. |
616 | ||
ef56e622 | 617 | tcp_orphan_retries - INTEGER |
5d789229 DL |
618 | This value influences the timeout of a locally closed TCP connection, |
619 | when RTO retransmissions remain unacknowledged. | |
620 | See tcp_retries2 for more details. | |
621 | ||
06b8fc5d | 622 | The default value is 8. |
1cec2cac | 623 | |
5d789229 | 624 | If your machine is a loaded WEB server, |
ef56e622 SH |
625 | you should think about lowering this value, because such sockets | |
626 | may consume significant resources. Cf. tcp_max_orphans. | |
1da177e4 | 627 | |
4f41b1c5 YC |
628 | tcp_recovery - INTEGER |
629 | This value is a bitmap to enable various experimental loss recovery | |
630 | features. | |
631 | ||
1cec2cac MCC |
632 | ========= ============================================================= |
633 | RACK: 0x1 enables the RACK loss detection for fast detection of lost | |
634 | retransmissions and tail drops. It also subsumes and disables | |
635 | RFC6675 recovery for SACK connections. | |
636 | ||
637 | RACK: 0x2 makes RACK's reordering window static (min_rtt/4). | |
638 | ||
639 | RACK: 0x4 disables RACK's DUPACK threshold heuristic | |
640 | ========= ============================================================= | |
4f41b1c5 YC |
641 | |
642 | Default: 0x1 | |
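Because tcp_recovery is a bitmap, a given value can be decoded by testing each bit from the table above. The following sketch does that; the function name `decode_tcp_recovery` and the short descriptions are paraphrases written for this example.

```python
# Bit meanings paraphrased from the tcp_recovery table.
RACK_FLAGS = {
    0x1: "RACK loss detection enabled",
    0x2: "static RACK reordering window (min_rtt/4)",
    0x4: "RACK DUPACK threshold heuristic disabled",
}


def decode_tcp_recovery(value):
    """Return the descriptions of all bits set in a tcp_recovery value."""
    return [desc for bit, desc in sorted(RACK_FLAGS.items()) if value & bit]


# The documented default, 0x1, enables only RACK loss detection.
for line in decode_tcp_recovery(0x1):
    print(line)
```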
643 | ||
1c7249e4 GN |
644 | tcp_reflect_tos - BOOLEAN |
645 | For listening sockets, reuse the DSCP value of the initial SYN message | |
646 | for outgoing packets. This allows both directions of a TCP | |
647 | stream to use the same DSCP value, assuming DSCP remains unchanged for | |
648 | the lifetime of the connection. | |
649 | ||
650 | This option affects both IPv4 and IPv6. | |
651 | ||
652 | Default: 0 (disabled) | |
653 | ||
1da177e4 | 654 | tcp_reordering - INTEGER |
dca145ff ED |
655 | Initial reordering level of packets in a TCP stream. |
656 | TCP stack can then dynamically adjust flow reordering level | |
657 | between this initial value and tcp_max_reordering | |
1cec2cac | 658 | |
e18f5feb | 659 | Default: 3 |
1da177e4 | 660 | |
dca145ff ED |
661 | tcp_max_reordering - INTEGER |
662 | Maximal reordering level of packets in a TCP stream. | |
663 | 300 is a fairly conservative value, but you might increase it | |
664 | if paths are using per packet load balancing (like bonding rr mode) | |
1cec2cac | 665 | |
dca145ff ED |
666 | Default: 300 |
667 | ||
1da177e4 LT |
668 | tcp_retrans_collapse - BOOLEAN |
669 | Bug-to-bug compatibility with some broken printers. | |
670 | On retransmit try to send bigger packets to work around bugs in | |
671 | certain TCP stacks. | |
672 | ||
ef56e622 | 673 | tcp_retries1 - INTEGER |
5d789229 DL |
674 | This value influences the time after which TCP decides that |
675 | something is wrong due to unacknowledged RTO retransmissions | |
676 | and reports this suspicion to the network layer. | |
677 | See tcp_retries2 for more details. | |
678 | ||
679 | RFC 1122 recommends at least 3 retransmissions, which is the | |
680 | default. | |
1da177e4 | 681 | |
ef56e622 | 682 | tcp_retries2 - INTEGER |
5d789229 DL |
683 | This value influences the timeout of an alive TCP connection, |
684 | when RTO retransmissions remain unacknowledged. | |
685 | Given a value of N, a hypothetical TCP connection following | |
686 | exponential backoff with an initial RTO of TCP_RTO_MIN would | |
687 | retransmit N times before killing the connection at the (N+1)th RTO. | |
688 | ||
689 | The default value of 15 yields a hypothetical timeout of 924.6 | |
690 | seconds and is a lower bound for the effective timeout. | |
691 | TCP will effectively time out at the first RTO which exceeds the | |
692 | hypothetical timeout. | |
693 | ||
694 | RFC 1122 recommends at least 100 seconds for the timeout, | |
695 | which corresponds to a value of at least 8. | |
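The 924.6 second figure above can be reproduced with a short sketch. It assumes TCP_RTO_MIN is 200 ms and that the RTO is capped at 120 seconds; both constants and the function name are assumptions made for illustration and are not stated in the text above.

```python
def hypothetical_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Sum of the first (retries + 1) RTOs under capped exponential backoff.

    rto_min and rto_max are assumed values (200 ms and 120 s).
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        # Double the RTO after each expiry, but never beyond the cap.
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    for n in (7, 8, 15):
        print(n, round(hypothetical_timeout(n), 1))
```

Under these assumptions a value of 7 stays below the 100 second recommendation of RFC 1122, while 8 is the first value that exceeds it.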
1da177e4 | 696 | |
ef56e622 SH |
697 | tcp_rfc1337 - BOOLEAN |
698 | If set, the TCP stack behaves conforming to RFC1337. If unset, | |
699 | we are not conforming to RFC, but prevent TCP TIME_WAIT | |
700 | assassination. | |
1cec2cac | 701 | |
ef56e622 | 702 | Default: 0 |
1da177e4 LT |
703 | |
704 | tcp_rmem - vector of 3 INTEGERs: min, default, max | |
705 | min: Minimal size of receive buffer used by TCP sockets. | |
706 | It is guaranteed to each TCP socket, even under moderate memory | |
707 | pressure. | |
1cec2cac | 708 | |
a61a86f8 | 709 | Default: 4K |
1da177e4 | 710 | |
53025f5e | 711 | default: initial size of receive buffer used by TCP sockets. |
1da177e4 | 712 | This value overrides net.core.rmem_default used by other protocols. |
1d1be912 ED |
713 | Default: 131072 bytes. |
714 | This value results in initial window of 65535. | |
1da177e4 LT |
715 | |
716 | max: maximal size of receive buffer allowed for automatically | |
717 | selected receiver buffers for TCP socket. This value does not override | |
53025f5e BF |
718 | net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables |
719 | automatic tuning of that socket's receive buffer size, in which | |
720 | case this value is ignored. | |
1d1be912 | 721 | Default: between 131072 and 6MB, depending on RAM size. |
1da177e4 | 722 | |
ef56e622 SH |
723 | tcp_sack - BOOLEAN |
724 | Enable selective acknowledgments (SACKs). | |
1da177e4 | 725 | |
6d82aa24 ED |
726 | tcp_comp_sack_delay_ns - LONG INTEGER |
727 | TCP tries to reduce the number of SACKs sent, using a timer | |
728 | based on 5% of SRTT, capped by this sysctl, in nanoseconds. | |
729 | The default is 1ms, based on TSO autosizing period. | |
730 | ||
731 | Default : 1,000,000 ns (1 ms) | |
732 | ||
a70437cc ED |
733 | tcp_comp_sack_slack_ns - LONG INTEGER |
734 | This sysctl controls the slack used when arming the | |
735 | timer used by SACK compression. This gives extra time | |
736 | for small RTT flows, and reduces system overhead by allowing | |
737 | opportunistic reduction of timer interrupts. | |
738 | ||
739 | Default : 100,000 ns (100 us) | |
740 | ||
9c21d2fc | 741 | tcp_comp_sack_nr - INTEGER |
2bcd9d84 | 742 | Maximum number of SACKs that can be compressed. |
9c21d2fc ED |
743 | Using 0 disables SACK compression. |
744 | ||
2bcd9d84 | 745 | Default : 44 |
9c21d2fc | 746 | |
ef56e622 SH |
747 | tcp_slow_start_after_idle - BOOLEAN |
748 | If set, provide RFC2861 behavior and time out the congestion | |
749 | window after an idle period. An idle period is defined as | |
750 | the current RTO. If unset, the congestion window will not | |
751 | be timed out after an idle period. | |
1cec2cac | 752 | |
ef56e622 | 753 | Default: 1 |
1da177e4 | 754 | |
ef56e622 | 755 | tcp_stdurg - BOOLEAN |
4edc2f34 | 756 | Use the Host requirements interpretation of the TCP urgent pointer field. |
ef56e622 SH |
757 | Most hosts use the older BSD interpretation, so if you turn this on |
758 | Linux might not communicate correctly with them. | |
1cec2cac | 759 | |
ef56e622 | 760 | Default: FALSE |
1da177e4 | 761 | |
ef56e622 SH |
762 | tcp_synack_retries - INTEGER |
763 | Number of times SYNACKs for a passive TCP connection attempt will | |
764 | be retransmitted. Should not be higher than 255. Default value | |
6c9ff979 AB |
765 | is 5, which corresponds to 31 seconds till the last retransmission |
766 | with the current initial RTO of 1 second. With this the final timeout | |
767 | for a passive TCP connection will happen after 63 seconds. | |
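The arithmetic behind these numbers can be sketched as follows, assuming the RTO simply doubles after every retransmission and ignoring any upper bound on the RTO. The function name is hypothetical.

```python
def handshake_timing(retries, initial_rto=1):
    """Return (time of last retransmission, final timeout) in seconds.

    Assumes pure exponential backoff: the i-th wait lasts initial_rto * 2**i.
    """
    # The last retransmission is sent after the first `retries` waits.
    last_retransmission = sum(initial_rto * 2**i for i in range(retries))
    # One more (doubled) wait passes before the attempt is abandoned.
    final_timeout = last_retransmission + initial_rto * 2**retries
    return last_retransmission, final_timeout


if __name__ == "__main__":
    print(handshake_timing(5))  # tcp_synack_retries default
    print(handshake_timing(6))  # tcp_syn_retries default
```

The same calculation with a value of 6 gives the 63 and 127 second figures quoted for tcp_syn_retries.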
1da177e4 | 768 | |
d8513df2 | 769 | tcp_syncookies - INTEGER |
a3c910d2 | 770 | Only valid when the kernel was compiled with CONFIG_SYN_COOKIES. |
ef56e622 | 771 | Send out syncookies when the SYN backlog queue of a socket |
4edc2f34 | 772 | overflows. This is to protect against the common 'SYN flood attack'. |
a3c910d2 | 773 | Default: 1 |
1da177e4 | 774 | |
ef56e622 SH |
775 | Note that syncookies are a fallback facility. |
776 | They MUST NOT be used to help highly loaded servers cope | |
4edc2f34 | 777 | with a legitimate connection rate. If you see SYN flood warnings |
ef56e622 SH |
778 | in your logs, but investigation shows that they occur |
779 | because of overload with legitimate connections, you should tune | |
780 | other parameters until this warning disappears. | |
781 | See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow. | |
1da177e4 | 782 | |
ef56e622 SH |
783 | Syncookies seriously violate the TCP protocol, do not allow |
784 | the use of TCP extensions, and can result in serious degradation | |
785 | of some services (e.g. SMTP relaying), visible not to you, | |
786 | but to the clients and relays contacting you. If you see | |
4edc2f34 | 787 | SYN flood warnings in logs while not really being flooded, your server |
ef56e622 | 788 | is seriously misconfigured. |
1da177e4 | 789 | |
5ad37d5d HFS |
790 | If you want to test what effect syncookies have on your |
791 | network connections, you can set this knob to 2 to enable | |
792 | unconditional generation of syncookies. | |
793 | ||
f9ac779f KI |
794 | tcp_migrate_req - BOOLEAN |
795 | The incoming connection is tied to a specific listening socket when | |
796 | the initial SYN packet is received during the three-way handshake. | |
797 | When a listener is closed, in-flight request sockets during the | |
798 | handshake and established sockets in the accept queue are aborted. | |
799 | ||
800 | If the listener has SO_REUSEPORT enabled, other listeners on the | |
801 | same port should have been able to accept such connections. This | |
802 | option makes it possible to migrate such child sockets to another | |
803 | listener after close() or shutdown(). | |
804 | ||
805 | The BPF_SK_REUSEPORT_SELECT_OR_MIGRATE type of eBPF program should | |
806 | usually be used to define the policy to pick an alive listener. | |
807 | Otherwise, the kernel will randomly pick an alive listener only if | |
808 | this option is enabled. | |
809 | ||
810 | Note that migration between listeners with different settings may | |
811 | crash applications. Let's say migration happens from listener A to | |
812 | B, and only B has TCP_SAVE_SYN enabled. B cannot read SYN data from | |
813 | the requests migrated from A. To avoid such a situation, cancel | |
814 | migration by returning SK_DROP in the type of eBPF program, or | |
815 | disable this option. | |
816 | ||
817 | Default: 0 | |
818 | ||
cf60af03 | 819 | tcp_fastopen - INTEGER |
cebc5cba YC |
820 | Enable TCP Fast Open (RFC7413) to send and accept data in the opening |
821 | SYN packet. | |
10467163 | 822 | |
cebc5cba YC |
823 | The client support is enabled by flag 0x1 (on by default). The client |
824 | then must use sendmsg() or sendto() with the MSG_FASTOPEN flag, | |
825 | rather than connect() to send data in SYN. | |
cf60af03 | 826 | |
cebc5cba YC |
827 | The server support is enabled by flag 0x2 (off by default). Then |
828 | either enable for all listeners with another flag (0x400) or | |
829 | enable individual listeners via TCP_FASTOPEN socket option with | |
830 | the option value being the length of the syn-data backlog. | |
cf60af03 | 831 | |
cebc5cba | 832 | The values (bitmap) are: |
1cec2cac MCC |
833 | |
834 | ===== ======== ====================================================== | |
835 | 0x1 (client) enables sending data in the opening SYN on the client. | |
836 | 0x2 (server) enables the server support, i.e., allowing data in | |
cebc5cba YC |
837 | a SYN packet to be accepted and passed to the |
838 | application before 3-way handshake finishes. | |
1cec2cac | 839 | 0x4 (client) send data in the opening SYN regardless of cookie |
cebc5cba | 840 | availability and without a cookie option. |
1cec2cac MCC |
841 | 0x200 (server) accept data-in-SYN w/o any cookie option present. |
842 | 0x400 (server) enable all listeners to support Fast Open by | |
cebc5cba | 843 | default without explicit TCP_FASTOPEN socket option. |
1cec2cac | 844 | ===== ======== ====================================================== |
cebc5cba YC |
845 | |
846 | Default: 0x1 | |
10467163 | 847 | |
a7db3c76 | 848 | Note that additional client or server features are only |
cebc5cba | 849 | effective if the basic support (0x1 and 0x2, respectively) is enabled. |
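As an illustration of how the bitmap combines, the following sketch decodes a tcp_fastopen value into the flags listed in the table above. The dictionary and function names are hypothetical; only the bit values and their meanings come from the table.

```python
# Bit values and descriptions taken from the tcp_fastopen table.
TFO_FLAGS = {
    0x1: "client: send data in the opening SYN",
    0x2: "server: accept data in SYN before the 3-way handshake finishes",
    0x4: "client: send data in SYN regardless of cookie availability",
    0x200: "server: accept data-in-SYN without any cookie option",
    0x400: "server: enable Fast Open on all listeners by default",
}


def decode_tcp_fastopen(value):
    """Return the descriptions of all flags set in `value`."""
    return [text for bit, text in TFO_FLAGS.items() if value & bit]


if __name__ == "__main__":
    # 0x403 = client support + server support + all listeners.
    for line in decode_tcp_fastopen(0x403):
        print(line)
```

For example, a server that should accept Fast Open on every listener needs both 0x2 and 0x400 set, since 0x400 alone has no effect without the basic server support.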
10467163 | 850 | |
cf1ef3f0 WW |
851 | tcp_fastopen_blackhole_timeout_sec - INTEGER |
852 | Initial time period in seconds to disable Fastopen on active TCP sockets | |
853 | when a TFO firewall blackhole issue happens. | |
854 | This time period will grow exponentially when more blackhole issues | |
855 | get detected right after Fastopen is re-enabled and will reset to | |
856 | initial value when the blackhole issue goes away. | |
7268586b | 857 | 0 to disable the blackhole detection. |
1cec2cac | 858 | |
213ad73d | 859 | By default, it is set to 0 (feature is disabled). |
cf1ef3f0 | 860 | |
2dc7e48d JB |
861 | tcp_fastopen_key - list of comma separated 32-digit hexadecimal INTEGERs |
862 | The list consists of a primary key and an optional backup key. The | |
863 | primary key is used for both creating and validating cookies, while the | |
864 | optional backup key is only used for validating cookies. The purpose of | |
865 | the backup key is to maximize TFO validation when keys are rotated. | |
866 | ||
867 | A randomly chosen primary key may be configured by the kernel if | |
868 | the tcp_fastopen sysctl is set to 0x400 (see above), or if the | |
869 | TCP_FASTOPEN setsockopt() optname is set and a key has not been | |
870 | previously configured via sysctl. If keys are configured via | |
871 | setsockopt() by using the TCP_FASTOPEN_KEY optname, then those | |
872 | per-socket keys will be used instead of any keys that are specified via | |
873 | sysctl. | |
874 | ||
875 | A key is specified as 4 8-digit hexadecimal integers which are separated | |
876 | by a '-' as: xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx. Leading zeros may be | |
877 | omitted. A primary and a backup key may be specified by separating them | |
878 | by a comma. If only one key is specified, it becomes the primary key and | |
879 | any previously configured backup keys are removed. | |
880 | ||
ef56e622 SH |
881 | tcp_syn_retries - INTEGER |
882 | Number of times initial SYNs for an active TCP connection attempt | |
bffae697 | 883 | will be retransmitted. Should not be higher than 127. Default value |
3b09adcb | 884 | is 6, which corresponds to 63 seconds till the last retransmission |
6c9ff979 AB |
885 | with the current initial RTO of 1 second. With this the final timeout |
886 | for an active TCP connection attempt will happen after 127 seconds. | |
ef56e622 | 887 | |
25429d7b | 888 | tcp_timestamps - INTEGER |
1cec2cac MCC |
889 | Enable timestamps as defined in RFC1323. |
890 | ||
891 | - 0: Disabled. | |
892 | - 1: Enable timestamps as defined in RFC1323 and use random offset for | |
893 | each connection rather than only using the current time. | |
894 | - 2: Like 1, but without random offsets. | |
895 | ||
25429d7b | 896 | Default: 1 |
1da177e4 | 897 | |
95bd09eb ED |
898 | tcp_min_tso_segs - INTEGER |
899 | Minimal number of segments per TSO frame. | |
1cec2cac | 900 | |
95bd09eb ED |
901 | Since linux-3.12, TCP does an automatic sizing of TSO frames, |
902 | depending on flow rate, instead of filling 64 Kbyte packets. | |
903 | For specific usages, it's possible to force TCP to build big | |
904 | TSO frames. Note that the TCP stack might split TSO packets that are | |
905 | too big if the available window is too small. | |
1cec2cac | 906 | |
95bd09eb ED |
907 | Default: 2 |
908 | ||
65466904 ED |
909 | tcp_tso_rtt_log - INTEGER |
910 | Adjustment of TSO packet sizes based on min_rtt | |
911 | ||
912 | Starting from linux-5.18, TCP autosizing can be tweaked | |
913 | for flows having small RTT. | |
914 | ||
915 | Old autosizing was splitting the pacing budget to send 1024 TSO | |
916 | per second. | |
917 | ||
918 | tso_packet_size = sk->sk_pacing_rate / 1024; | |
919 | ||
920 | With the new mechanism, we increase this TSO sizing using: | |
921 | ||
922 | distance = min_rtt_usec / (2^tcp_tso_rtt_log) | |
923 | tso_packet_size += gso_max_size >> distance; | |
924 | ||
925 | This means that flows between very close hosts can use bigger | |
926 | TSO packets, reducing their cpu costs. | |
927 | ||
928 | If you want to use the old autosizing, set this sysctl to 0. | |
929 | ||
930 | Default: 9 (2^9 = 512 usec) | |
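The two formulas above can be combined into a small sketch to see how the RTT term shrinks as min_rtt grows. The gso_max_size default of 65536 and the function name are assumptions for illustration, and the sketch omits any clamping the kernel may apply to the result.

```python
def tso_packet_size(pacing_rate, min_rtt_usec, tcp_tso_rtt_log=9,
                    gso_max_size=65536):
    """Apply the autosizing formulas quoted above (sizes in bytes).

    gso_max_size=65536 is an assumed value, not taken from the text.
    """
    # Old autosizing: split the pacing budget into 1024 TSO packets per second.
    size = pacing_rate // 1024
    if tcp_tso_rtt_log:
        # New mechanism: add a bonus that halves for each 2^log usec of RTT.
        distance = min_rtt_usec >> tcp_tso_rtt_log
        size += gso_max_size >> distance
    return size


if __name__ == "__main__":
    for rtt in (100, 2000, 20000):
        print(rtt, tso_packet_size(1_000_000, rtt))
```

With the default of 9, a flow whose min_rtt is below 512 usec receives the full bonus, while at 20 ms the bonus has decayed to zero and only the old sizing remains.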
931 | ||
43e122b0 ED |
932 | tcp_pacing_ss_ratio - INTEGER |
933 | sk->sk_pacing_rate is set by TCP stack using a ratio applied | |
934 | to current rate. (current_rate = cwnd * mss / srtt) | |
935 | If TCP is in slow start, tcp_pacing_ss_ratio is applied | |
936 | to let TCP probe for bigger speeds, assuming cwnd can be | |
937 | doubled every other RTT. | |
1cec2cac | 938 | |
43e122b0 ED |
939 | Default: 200 |
940 | ||
941 | tcp_pacing_ca_ratio - INTEGER | |
942 | sk->sk_pacing_rate is set by TCP stack using a ratio applied | |
943 | to current rate. (current_rate = cwnd * mss / srtt) | |
944 | If TCP is in congestion avoidance phase, tcp_pacing_ca_ratio | |
945 | is applied to conservatively probe for bigger throughput. | |
1cec2cac | 946 | |
43e122b0 ED |
947 | Default: 120 |
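A rough sketch of how the two ratios scale the pacing rate is shown below. It assumes the ratios are percentages (so 200 means twice the current rate), which the text above does not state explicitly; the function name is hypothetical.

```python
def pacing_rate(cwnd, mss, srtt_usec, ratio_percent):
    """Pacing rate in bytes per second for a given ratio.

    Assumes ratio_percent is a percentage applied to
    current_rate = cwnd * mss / srtt.
    """
    current_rate = cwnd * mss * 1_000_000 // srtt_usec
    return current_rate * ratio_percent // 100


if __name__ == "__main__":
    # 10 segments of 1448 bytes with a 10 ms smoothed RTT.
    print(pacing_rate(10, 1448, 10_000, 200))  # slow start default
    print(pacing_rate(10, 1448, 10_000, 120))  # congestion avoidance default
```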
948 | ||
1da177e4 | 949 | tcp_tso_win_divisor - INTEGER |
ef56e622 SH |
950 | This allows control over what percentage of the congestion window |
951 | can be consumed by a single TSO frame. | |
952 | The setting of this parameter is a choice between burstiness and | |
953 | building larger TSO frames. | |
1cec2cac | 954 | |
ef56e622 | 955 | Default: 3 |
1da177e4 | 956 | |
79e9fed4 MŻ |
957 | tcp_tw_reuse - INTEGER |
958 | Enable reuse of TIME-WAIT sockets for new connections when it is | |
959 | safe from protocol viewpoint. | |
1cec2cac MCC |
960 | |
961 | - 0 - disable | |
962 | - 1 - global enable | |
963 | - 2 - enable for loopback traffic only | |
964 | ||
ef56e622 SH |
965 | It should not be changed without advice/request of technical |
966 | experts. | |
1cec2cac | 967 | |
79e9fed4 | 968 | Default: 2 |
ce7bc3bf | 969 | |
ef56e622 SH |
970 | tcp_window_scaling - BOOLEAN |
971 | Enable window scaling as defined in RFC1323. | |
3ff825b2 | 972 | |
ef56e622 | 973 | tcp_wmem - vector of 3 INTEGERs: min, default, max |
53025f5e | 974 | min: Amount of memory reserved for send buffers for TCP sockets. |
ef56e622 | 975 | Each TCP socket is entitled to use it from the moment it is created. |
1cec2cac | 976 | |
a61a86f8 | 977 | Default: 4K |
9d7bcfc6 | 978 | |
53025f5e BF |
979 | default: initial size of send buffer used by TCP sockets. This |
980 | value overrides net.core.wmem_default used by other protocols. | |
1cec2cac | 981 | |
53025f5e | 982 | It is usually lower than net.core.wmem_default. |
1cec2cac | 983 | |
ef56e622 SH |
984 | Default: 16K |
985 | ||
53025f5e BF |
986 | max: Maximal amount of memory allowed for automatically tuned |
987 | send buffers for TCP sockets. This value does not override | |
988 | net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables | |
989 | automatic tuning of that socket's send buffer size, in which case | |
990 | this value is ignored. | |
1cec2cac | 991 | |
53025f5e | 992 | Default: between 64K and 4MB, depending on RAM size. |
1da177e4 | 993 | |
c9bee3b7 ED |
994 | tcp_notsent_lowat - UNSIGNED INTEGER |
995 | A TCP socket can control the amount of unsent bytes in its write queue, | |
996 | thanks to TCP_NOTSENT_LOWAT socket option. poll()/select()/epoll() | |
997 | reports POLLOUT events if the amount of unsent bytes is below a per | |
998 | socket value, and if the write queue is not full. sendmsg() will | |
999 | also not add new buffers if the limit is hit. | |
1000 | ||
1001 | This global variable controls the amount of unsent data for | |
1002 | sockets not using TCP_NOTSENT_LOWAT. For these sockets, a change | |
1003 | to the global variable has immediate effect. | |
1004 | ||
1005 | Default: UINT_MAX (0xFFFFFFFF) | |
1006 | ||
15d99e02 RJ |
1007 | tcp_workaround_signed_windows - BOOLEAN |
1008 | If set, assume no receipt of a window scaling option means the | |
1009 | remote TCP is broken and treats the window as a signed quantity. | |
1010 | If unset, assume the remote TCP is not broken even if we do | |
1011 | not receive a window scaling option from them. | |
1cec2cac | 1012 | |
15d99e02 RJ |
1013 | Default: 0 |
1014 | ||
36e31b0a AP |
1015 | tcp_thin_linear_timeouts - BOOLEAN |
1016 | Enable dynamic triggering of linear timeouts for thin streams. | |
1017 | If set, a check is performed upon retransmission by timeout to | |
1018 | determine if the stream is thin (less than 4 packets in flight). | |
1019 | As long as the stream is found to be thin, up to 6 linear | |
1020 | timeouts may be performed before exponential backoff mode is | |
1021 | initiated. This improves retransmission latency for | |
1022 | non-aggressive thin streams, often found to be time-dependent. | |
1023 | For more information on thin streams, see | |
ff159f4f | 1024 | Documentation/networking/tcp-thin.rst |
1cec2cac | 1025 | |
36e31b0a AP |
1026 | Default: 0 |
1027 | ||
46d3ceab ED |
1028 | tcp_limit_output_bytes - INTEGER |
1029 | Controls TCP Small Queue limit per tcp socket. | |
1030 | A TCP bulk sender tends to increase packets in flight until it | |
1031 | gets loss notifications. With SNDBUF autotuning, this can | |
9c4c3252 FL |
1032 | result in a large amount of packets queued on the local machine |
1033 | (e.g.: qdiscs, CPU backlog, or device) hurting latency of other | |
1034 | flows, for typical pfifo_fast qdiscs. tcp_limit_output_bytes | |
1035 | limits the number of bytes on qdisc or device to reduce artificial | |
1036 | RTT/cwnd and reduce bufferbloat. | |
1cec2cac | 1037 | |
c73e5807 | 1038 | Default: 1048576 (16 * 65536) |
46d3ceab | 1039 | |
282f23c6 ED |
1040 | tcp_challenge_ack_limit - INTEGER |
1041 | Limits the number of Challenge ACKs sent per second, as recommended | |
1042 | in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks) | |
79e3602c ED |
1043 | Note that this per netns rate limit can allow some side channel |
1044 | attacks and probably should not be enabled. | |
1045 | TCP stack implements per TCP socket limits anyway. | |
1046 | Default: INT_MAX (unlimited) | |
282f23c6 | 1047 | |
d1e5e640 KI |
1048 | tcp_ehash_entries - INTEGER |
1049 | Show the number of hash buckets for TCP sockets in the current | |
1050 | networking namespace. | |
1051 | ||
1052 | A negative value means the networking namespace does not own its | |
1053 | hash buckets and shares the initial networking namespace's one. | |
1054 | ||
1055 | tcp_child_ehash_entries - INTEGER | |
1056 | Control the number of hash buckets for TCP sockets in the child | |
1057 | networking namespace, which must be set before clone() or unshare(). | |
1058 | ||
1059 | If the value is not 0, the kernel uses a value rounded up to 2^n | |
1060 | as the actual hash bucket size. 0 is a special value, meaning | |
1061 | the child networking namespace will share the initial networking | |
1062 | namespace's hash buckets. | |
1063 | ||
1064 | Note that the child will use the global one in case the kernel | |
1065 | fails to allocate enough memory. In addition, the global hash | |
1066 | buckets are spread over available NUMA nodes, but the allocation | |
1067 | of the child hash table depends on the current process's NUMA | |
1068 | policy, which could result in performance differences. | |
1069 | ||
1070 | Note also that the default values of tcp_max_tw_buckets and | |
1071 | tcp_max_syn_backlog depend on the hash bucket size. | |
1072 | ||
1073 | Possible values: 0, 2^n (n: 0 - 24 (16Mi)) | |
1074 | ||
1075 | Default: 0 | |
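The rounding rule described above can be illustrated with a short sketch. The function name is hypothetical and the sketch does not enforce the upper limit of 2^24.

```python
def effective_hash_buckets(requested):
    """Round a non-zero request up to the next power of two.

    0 is returned unchanged; per the text it means the child namespace
    shares the initial namespace's hash buckets.
    """
    if requested == 0:
        return 0
    return 1 << (requested - 1).bit_length()


if __name__ == "__main__":
    for value in (0, 1, 1000, 1024, 1025):
        print(value, effective_hash_buckets(value))
```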
1076 | ||
bd456f28 MAQ |
1077 | tcp_plb_enabled - BOOLEAN |
1078 | If set and the underlying congestion control (e.g. DCTCP) supports | |
1079 | and enables the PLB feature, TCP PLB (Protective Load Balancing) is | |
1080 | enabled. PLB is described in the following paper: | |
1081 | https://doi.org/10.1145/3544216.3544226. Based on PLB parameters, | |
1082 | upon sensing sustained congestion, TCP triggers a change in | |
1083 | flow label field for outgoing IPv6 packets. A change in flow label | |
1084 | field potentially changes the path of outgoing packets for switches | |
1085 | that use ECMP/WCMP for routing. | |
1086 | ||
1087 | PLB changes socket txhash which results in a change in IPv6 Flow Label | |
1088 | field, and is currently a no-op for IPv4 headers. It is possible | |
1089 | to apply PLB for IPv4 with other network header fields (e.g. TCP | |
1090 | or IPv4 options) or using encapsulation where outer header is used | |
1091 | by switches to determine next hop. In either case, further host | |
1092 | and switch side changes will be needed. | |
1093 | ||
1094 | When set, PLB assumes that a congestion signal (e.g. ECN) is made | |
1095 | available and used by the congestion control module to estimate a | |
1096 | congestion measure (e.g. ce_ratio). PLB needs a congestion measure to | |
1097 | make repathing decisions. | |
1098 | ||
1099 | Default: FALSE | |
1100 | ||
1101 | tcp_plb_idle_rehash_rounds - INTEGER | |
1102 | Number of consecutive congested rounds (RTT) seen after which | |
1103 | a rehash can be performed, given there are no packets in flight. | |
1104 | This is referred to as M in the PLB paper: | |
1105 | https://doi.org/10.1145/3544216.3544226. | |
1106 | ||
1107 | Possible Values: 0 - 31 | |
1108 | ||
1109 | Default: 3 | |
1110 | ||
1111 | tcp_plb_rehash_rounds - INTEGER | |
1112 | Number of consecutive congested rounds (RTT) seen after which | |
1113 | a forced rehash can be performed. Be careful when setting this | |
1114 | parameter, as a small value increases the risk of retransmissions. | |
1115 | This is referred to as N in the PLB paper: | |
1116 | https://doi.org/10.1145/3544216.3544226. | |
1117 | ||
1118 | Possible Values: 0 - 31 | |
1119 | ||
1120 | Default: 12 | |
1121 | ||
1122 | tcp_plb_suspend_rto_sec - INTEGER | |
1123 | Time, in seconds, to suspend PLB in the event of an RTO. In order to avoid | |
1124 | having PLB repath onto a connectivity "black hole", after an RTO a TCP | |
1125 | connection suspends PLB repathing for a random duration between 1x and | |
1126 | 2x of this parameter. Randomness is added to avoid concurrent rehashing | |
1127 | of multiple TCP connections. This should be set corresponding to the | |
1128 | amount of time it takes to repair a failed link. | |
1129 | ||
1130 | Possible Values: 0 - 255 | |
1131 | ||
1132 | Default: 60 | |
1133 | ||
1134 | tcp_plb_cong_thresh - INTEGER | |
1135 | Fraction of packets marked with congestion over a round (RTT) to | |
1136 | tag that round as congested. This is referred to as K in the PLB paper: | |
1137 | https://doi.org/10.1145/3544216.3544226. | |
1138 | ||
1139 | The 0-1 fraction range is mapped to 0-256 range to avoid floating | |
1140 | point operations. For example, 128 means that if at least 50% of | |
1141 | the packets in a round were marked as congested then the round | |
1142 | will be tagged as congested. | |
1143 | ||
1144 | Setting threshold to 0 means that PLB repaths every RTT regardless | |
1145 | of congestion. This is not intended behavior for PLB and should be | |
1146 | used only for experimentation purposes. | |
1147 | ||
1148 | Possible Values: 0 - 256 | |
1149 | ||
1150 | Default: 128 | |
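The mapping of the 0-1 fraction onto the 0-256 range can be sketched as below. This is an illustration of the integer comparison only, under the assumption that the congestion measure is the scaled share of marked packets in a round; the function name is hypothetical.

```python
def round_is_congested(marked_packets, total_packets, thresh=128):
    """Tag a round as congested using integer arithmetic only.

    The marked fraction is scaled to 0-256 and compared with thresh.
    """
    if total_packets:
        ratio = (marked_packets * 256) // total_packets
    else:
        ratio = 0
    return ratio >= thresh


if __name__ == "__main__":
    print(round_is_congested(5, 10))             # exactly 50% marked
    print(round_is_congested(4, 10))             # 40% marked
    print(round_is_congested(0, 10, thresh=0))   # threshold 0
```

With a threshold of 0 the comparison is always true, which matches the note above that PLB would then repath every RTT regardless of congestion.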
1151 | ||
1cec2cac MCC |
1152 | UDP variables |
1153 | ============= | |
95766fff | 1154 | |
63a6fff3 RS |
1155 | udp_l3mdev_accept - BOOLEAN |
1156 | Enabling this option allows a "global" bound socket to work | |
1157 | across L3 master domains (e.g., VRFs) with packets capable of | |
1158 | being received regardless of the L3 domain in which they | |
1159 | originated. Only valid when the kernel was compiled with | |
1160 | CONFIG_NET_L3_MASTER_DEV. | |
1cec2cac MCC |
1161 | |
1162 | Default: 0 (disabled) | |
63a6fff3 | 1163 | |
95766fff HA |
1164 | udp_mem - vector of 3 INTEGERs: min, pressure, max |
1165 | Number of pages allowed for queueing by all UDP sockets. | |
1166 | ||
69dfccbc | 1167 | min: Number of pages allowed for queueing by all UDP sockets. |
95766fff HA |
1168 | |
1169 | pressure: This value was introduced to follow format of tcp_mem. | |
1170 | ||
69dfccbc | 1171 | max: This value was introduced to follow format of tcp_mem. |
95766fff HA |
1172 | |
1173 | Default is calculated at boot time from amount of available memory. | |
1174 | ||
1175 | udp_rmem_min - INTEGER | |
1176 | Minimal size of receive buffer used by UDP sockets in moderation. | |
1177 | Each UDP socket is able to use the size for receiving data, even if | |
1178 | total pages of UDP sockets exceed udp_mem pressure. The unit is bytes. | |
1cec2cac | 1179 | |
320bd6de | 1180 | Default: 4K |
95766fff HA |
1181 | |
1182 | udp_wmem_min - INTEGER | |
c6b10de5 | 1183 | UDP does not have tx memory accounting and this tunable has no effect. |
95766fff | 1184 | |
9804985b KI |
1185 | udp_hash_entries - INTEGER |
1186 | Show the number of hash buckets for UDP sockets in the current | |
1187 | networking namespace. | |
1188 | ||
1189 | A negative value means the networking namespace does not own its | |
1190 | hash buckets and shares the initial networking namespace's one. | |
1191 | ||
1192 | udp_child_ehash_entries - INTEGER | |
1193 | Control the number of hash buckets for UDP sockets in the child | |
1194 | networking namespace, which must be set before clone() or unshare(). | |
1195 | ||
1196 | If the value is not 0, the kernel uses a value rounded up to 2^n | |
1197 | as the actual hash bucket size. 0 is a special value, meaning | |
1198 | the child networking namespace will share the initial networking | |
1199 | namespace's hash buckets. | |
1200 | ||
1201 | Note that the child will use the global one in case the kernel | |
1202 | fails to allocate enough memory. In addition, the global hash | |
1203 | buckets are spread over available NUMA nodes, but the allocation | |
1204 | of the child hash table depends on the current process's NUMA | |
1205 | policy, which could result in performance differences. | |
1206 | ||
1207 | Possible values: 0, 2^n (n: 7 (128) - 16 (64K)) | |
1208 | ||
1209 | Default: 0 | |
1210 | ||
1211 | ||
1cec2cac MCC |
1212 | RAW variables |
1213 | ============= | |
6897445f MM |
1214 | |
1215 | raw_l3mdev_accept - BOOLEAN | |
1216 | Enabling this option allows a "global" bound socket to work | |
1217 | across L3 master domains (e.g., VRFs) with packets capable of | |
1218 | being received regardless of the L3 domain in which they | |
1219 | originated. Only valid when the kernel was compiled with | |
1220 | CONFIG_NET_L3_MASTER_DEV. | |
1cec2cac | 1221 | |
6897445f MM |
1222 | Default: 1 (enabled) |
1223 | ||
1cec2cac MCC |
1224 | CIPSOv4 Variables |
1225 | ================= | |
8802f616 PM |
1226 | |
1227 | cipso_cache_enable - BOOLEAN | |
1228 | If set, enable additions to and lookups from the CIPSO label mapping | |
1229 | cache. If unset, additions are ignored and lookups always result in a | |
1230 | miss. However, regardless of the setting the cache is still | |
1231 | invalidated when required, which means you can safely toggle this on and | |
1232 | off and the cache will always be "safe". | |
1cec2cac | 1233 | |
8802f616 PM |
1234 | Default: 1 |
1235 | ||
1236 | cipso_cache_bucket_size - INTEGER | |
1237 | The CIPSO label cache consists of a fixed size hash table with each | |
1238 | hash bucket containing a number of cache entries. This variable limits | |
dd44f04b | 1239 | the number of entries in each hash bucket; the larger the value is, the |
8802f616 PM |
1240 | more CIPSO label mappings that can be cached. When the number of |
1241 | entries in a given hash bucket reaches this limit adding new entries | |
1242 | causes the oldest entry in the bucket to be removed to make room. | |
1cec2cac | 1243 | |
8802f616 PM |
1244 | Default: 10 |
1245 | ||
1246 | cipso_rbm_optfmt - BOOLEAN | |
1247 | Enable the "Optimized Tag 1 Format" as defined in section 3.4.2.6 of | |
1248 | the CIPSO draft specification (see Documentation/netlabel for details). | |
1249 | This means that when set the CIPSO tag will be padded with empty | |
1250 | categories in order to make the packet data 32-bit aligned. | |
1cec2cac | 1251 | |
8802f616 PM |
1252 | Default: 0 |
1253 | ||
1254 | cipso_rbm_structvalid - BOOLEAN | |
1255 | If set, do a very strict check of the CIPSO option when | |
1256 | ip_options_compile() is called. If unset, relax the checks done during | |
1257 | ip_options_compile(). Either way is "safe" as errors are caught | |
1258 | elsewhere in the CIPSO processing code but setting this to 0 (False) should | |
1259 | result in less work (i.e. it should be faster) but could cause problems | |
1260 | with other implementations that require strict checking. | |
1cec2cac | 1261 | |
8802f616 PM |
1262 | Default: 0 |
1263 | ||
1cec2cac MCC |
1264 | IP Variables |
1265 | ============ | |
1da177e4 LT |
1266 | |
1267 | ip_local_port_range - 2 INTEGERS | |
1268 | Defines the local port range that is used by TCP and UDP to | |
e18f5feb | 1269 | choose the local port. The first number is the first, the |
07f4c900 | 1270 | second the last local port number. |
ac71676c MŻ |
1271 | If possible, it is better if these numbers have different parity | |
1272 | (one even and one odd value). | |
1273 | Must be greater than or equal to ip_unprivileged_port_start. | |
07f4c900 | 1274 | The default values are 32768 and 60999 respectively. |
1da177e4 | 1275 | |
e3826f1e AW |
1276 | ip_local_reserved_ports - list of comma separated ranges |
1277 | Specify the ports which are reserved for known third-party | |
1278 | applications. These ports will not be used by automatic port | |
1279 | assignments (e.g. when calling connect() or bind() with port | |
1280 | number 0). Explicit port allocation behavior is unchanged. | |
1281 | ||
1282 | The format used for both input and output is a comma separated | |
1283 | list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and | |
1284 | 10). Writing to the file will clear all previously reserved | |
1285 | ports and update the current list with the one given in the | |
1286 | input. | |
1287 | ||
1288 | Note that ip_local_port_range and ip_local_reserved_ports | |
1289 | settings are independent and both are considered by the kernel | |
1290 | when determining which ports are available for automatic port | |
1291 | assignments. | |
1292 | ||
1293 | You can reserve ports which are not in the current | |
1cec2cac | 1294 | ip_local_port_range, e.g.:: |
e3826f1e | 1295 | |
1cec2cac MCC |
1296 | $ cat /proc/sys/net/ipv4/ip_local_port_range |
1297 | 32000 60999 | |
1298 | $ cat /proc/sys/net/ipv4/ip_local_reserved_ports | |
1299 | 8080,9148 | |
e3826f1e AW |
1300 | |
1301 | although this is redundant. However such a setting is useful | |
1302 | if later the port range is changed to a value that will | |
a7a80b17 OH |
1303 | include the reserved ports. Also keep in mind that overlap between | |
1304 | these ranges may affect the probability of selecting ephemeral | |
1305 | ports that come right after a block of reserved ports. | |
e3826f1e AW |
1306 | |
1307 | Default: Empty | |
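The list format and its interaction with ip_local_port_range can be sketched as follows. This is only a model of the documented behavior, assuming well-formed input; ``parse_reserved_ports`` and ``automatic_ports`` are hypothetical names introduced for this example.

```python
def parse_reserved_ports(text):
    """Turn a comma separated list of ranges such as '1,2-4,10-10'
    into a set of individual port numbers."""
    ports = set()
    for item in text.strip().split(","):
        if not item:
            continue  # an empty list means no reserved ports
        first, _, last = item.partition("-")
        start = int(first)
        end = int(last) if last else start
        ports.update(range(start, end + 1))
    return ports


def automatic_ports(port_range, reserved):
    """Ports eligible for automatic assignment: inside the local port
    range and not reserved. Both settings are considered independently."""
    low, high = port_range
    return [port for port in range(low, high + 1) if port not in reserved]


reserved = parse_reserved_ports("8080,9148")
candidates = automatic_ports((32000, 60999), reserved)
```

Here the reserved ports lie outside the range, so ``candidates`` still contains every port from 32000 to 60999; they would only be excluded if the range were later widened to include them.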
1308 | ||
4548b683 KJ |
1309 | ip_unprivileged_port_start - INTEGER |
1310 | This is a per-namespace sysctl. It defines the first | |
1311 | unprivileged port in the network namespace. Privileged ports | |
1312 | require root or CAP_NET_BIND_SERVICE in order to bind to them. | |
ac71676c MŻ |
1313 | To disable all privileged ports, set this to 0. The privileged ports must not | |
1314 | overlap with the ip_local_port_range. | |
4548b683 KJ |
1315 | |
1316 | Default: 1024 | |
1317 | ||
1da177e4 LT |
1318 | ip_nonlocal_bind - BOOLEAN |
1319 | If set, allows processes to bind() to non-local IP addresses, | |
1320 | which can be quite useful - but may break some applications. | |
1cec2cac | 1321 | |
1da177e4 LT |
1322 | Default: 0 |
1323 | ||
4b01a967 KI |
1324 | ip_autobind_reuse - BOOLEAN |
1325 | By default, bind() does not select the ports automatically even if | |
1326 | the new socket and all sockets bound to the port have SO_REUSEADDR. | |
1327 | ip_autobind_reuse allows bind() to reuse the port and this is useful | |
1328 | when you use bind()+connect(), but may break some applications. | |
1329 | The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this | |
1330 | option should only be set by experts. | |
1331 | Default: 0 | |
1332 | ||
e49e4aff | 1333 | ip_dynaddr - INTEGER |
1da177e4 LT |
1334 | If set non-zero, enables support for dynamic addresses. |
1335 | If set to a non-zero value larger than 1, a kernel log | |
1336 | message will be printed when dynamic address rewriting | |
1337 | occurs. | |
1cec2cac | 1338 | |
1da177e4 LT |
1339 | Default: 0 |
1340 | ||
e3d73bce CW |
1341 | ip_early_demux - BOOLEAN |
1342 | Optimize input packet processing down to one demux for | |
1343 | certain kinds of local sockets. Currently we only do this | |
dddb64bc | 1344 | for established TCP and connected UDP sockets. |
e3d73bce CW |
1345 | |
1346 | It may add an additional cost for pure routing workloads that | |
1347 | reduces overall throughput; in that case you should disable it. | |
1cec2cac | 1348 | |
e3d73bce CW |
1349 | Default: 1 |
1350 | ||
5cc4adbc SH |
1351 | ping_group_range - 2 INTEGERS |
1352 | Restrict ICMP_PROTO datagram sockets to users in the group range. | |
1353 | The default is "1 0", meaning that nobody (not even root) may | |
1354 | create ping sockets. Setting it to "100 100" would grant permissions | |
1355 | to the single group. "0 4294967295" would enable it for the world, "100 | |
1356 | 4294967295" would enable it for the users, but not daemons. | |
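The range semantics above can be modelled with a small check. This sketch assumes the range is inclusive and that any of a process's group IDs may match; ``ping_socket_allowed`` is a hypothetical helper, not a kernel function.

```python
def ping_socket_allowed(group_range, gids):
    """Return True if any of the given group IDs falls inside the
    inclusive 'low high' range written to ping_group_range."""
    low, high = (int(field) for field in group_range.split())
    return any(low <= gid <= high for gid in gids)


# "1 0" is an empty range (low > high), so no group can match.
nobody = ping_socket_allowed("1 0", [0, 1, 1000])
```

The default "1 0" works as a "deny all" value precisely because no group ID can be both at least 1 and at most 0.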
1357 | ||
dddb64bc | 1358 | tcp_early_demux - BOOLEAN |
1359 | Enable early demux for established TCP sockets. | |
1cec2cac | 1360 | |
dddb64bc | 1361 | Default: 1 |
1362 | ||
1363 | udp_early_demux - BOOLEAN | |
1364 | Enable early demux for connected UDP sockets. Disable this if | |
1365 | your system could experience more unconnected load. | |
1cec2cac | 1366 | |
dddb64bc | 1367 | Default: 1 |
1368 | ||
1da177e4 | 1369 | icmp_echo_ignore_all - BOOLEAN |
7ce31246 DM |
1370 | If set non-zero, then the kernel will ignore all ICMP ECHO |
1371 | requests sent to it. | |
1cec2cac | 1372 | |
7ce31246 DM |
1373 | Default: 0 |
1374 | ||
f1b8fa9f AR |
1375 | icmp_echo_enable_probe - BOOLEAN |
1376 | If set to one, then the kernel will respond to RFC 8335 PROBE | |
1377 | requests sent to it. | |
1378 | ||
1379 | Default: 0 | |
1380 | ||
1da177e4 | 1381 | icmp_echo_ignore_broadcasts - BOOLEAN |
7ce31246 DM |
1382 | If set non-zero, then the kernel will ignore all ICMP ECHO and |
1383 | TIMESTAMP requests sent to it via broadcast/multicast. | |
1cec2cac | 1384 | |
7ce31246 | 1385 | Default: 1 |
1da177e4 LT |
1386 | |
1387 | icmp_ratelimit - INTEGER | |
1388 | Limit the maximal rates for sending ICMP packets whose type matches | |
1389 | icmp_ratemask (see below) to specific targets. | |
6dbf4bca SH |
1390 | 0 to disable any limiting, |
1391 | otherwise the minimal space between responses in milliseconds. | |
4cdf507d ED |
1392 | Note that another sysctl, icmp_msgs_per_sec limits the number |
1393 | of ICMP packets sent on all targets. | |
1cec2cac | 1394 | |
6dbf4bca | 1395 | Default: 1000 |
1da177e4 | 1396 | |
4cdf507d ED |
1397 | icmp_msgs_per_sec - INTEGER |
1398 | Limit maximal number of ICMP packets sent per second from this host. | |
1399 | Only messages whose type matches icmp_ratemask (see below) are | |
b38e7819 ED |
1400 | controlled by this limit. For security reasons, the precise count |
1401 | of messages per second is randomized. | |
1cec2cac | 1402 | |
6dbf4bca | 1403 | Default: 1000 |
1da177e4 | 1404 | |
4cdf507d ED |
1405 | icmp_msgs_burst - INTEGER |
1406 | icmp_msgs_per_sec controls number of ICMP packets sent per second, | |
1407 | while icmp_msgs_burst controls the burst size of these packets. | |
b38e7819 | 1408 | For security reasons, the precise burst size is randomized. |
1cec2cac | 1409 | |
4cdf507d ED |
1410 | Default: 50 |
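The relationship between a per-second rate and a burst size is commonly modelled as a token bucket. The sketch below is such a model under stated assumptions: it is deterministic, takes the current time as an argument, and omits the randomization mentioned above. The class ``TokenBucket`` is illustrative and is not the kernel's implementation.

```python
class TokenBucket:
    """Simplified rate limiter: 'rate_per_sec' tokens are added per
    second, and at most 'burst' tokens can be stored."""

    def __init__(self, rate_per_sec=1000, burst=50):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, now):
        """Return True if one message may be sent at time 'now' (seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill, but never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket()
sent_at_once = sum(bucket.allow(0.0) for _ in range(100))
```

In this model, 100 back-to-back attempts at the same instant let only the burst size (50) through, even though the per-second rate is 1000.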
1411 | ||
1da177e4 LT |
1412 | icmp_ratemask - INTEGER |
1413 | Mask made of ICMP types for which rates are being limited. | |
1cec2cac | 1414 | |
1da177e4 | 1415 | Significant bits: IHGFEDCBA9876543210 |
1cec2cac | 1416 | |
1da177e4 LT |
1417 | Default mask: 0000001100000011000 (6168) |
1418 | ||
1419 | Bit definitions (see include/linux/icmp.h): | |
1cec2cac MCC |
1420 | |
1421 | = ========================= | |
1da177e4 | 1422 | 0 Echo Reply |
1cec2cac MCC |
1423 | 3 Destination Unreachable [1]_ |
1424 | 4 Source Quench [1]_ | |
1da177e4 LT |
1425 | 5 Redirect |
1426 | 8 Echo Request | |
1cec2cac MCC |
1427 | B Time Exceeded [1]_ |
1428 | C Parameter Problem [1]_ | |
1da177e4 LT |
1429 | D Timestamp Request |
1430 | E Timestamp Reply | |
1431 | F Info Request | |
1432 | G Info Reply | |
1433 | H Address Mask Request | |
1434 | I Address Mask Reply | |
1cec2cac | 1435 | = ========================= |
1da177e4 | 1436 | |
1cec2cac | 1437 | .. [1] These are rate limited by default (see default mask above) |
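The mask is a plain bitfield indexed by ICMP type number, so the default value can be reproduced from the four types marked with [1]. The helper names ``ratemask`` and ``is_rate_limited`` are hypothetical and exist only for this sketch.

```python
# ICMP type numbers of the messages rate limited by default.
DEFAULT_LIMITED_TYPES = {
    "Destination Unreachable": 3,
    "Source Quench": 4,
    "Time Exceeded": 11,      # shown as 'B' in the table
    "Parameter Problem": 12,  # shown as 'C' in the table
}


def ratemask(icmp_types):
    """Build a mask with one bit set per ICMP type number."""
    mask = 0
    for icmp_type in icmp_types:
        mask |= 1 << icmp_type
    return mask


def is_rate_limited(mask, icmp_type):
    """Check whether the bit for the given ICMP type is set."""
    return bool((mask >> icmp_type) & 1)


default_mask = ratemask(DEFAULT_LIMITED_TYPES.values())
```

To also rate limit Echo Reply (type 0) in this model, the value would be ``default_mask | 1``.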
1da177e4 LT |
1438 | |
1439 | icmp_ignore_bogus_error_responses - BOOLEAN | |
1440 | Some routers violate RFC1122 by sending bogus responses to broadcast | |
1441 | frames. Such violations are normally logged via a kernel warning. | |
1442 | If this is set to TRUE, the kernel will not give such warnings, which | |
1443 | will avoid log file clutter. | |
1cec2cac | 1444 | |
e8b265e8 | 1445 | Default: 1 |
1da177e4 | 1446 | |
95f7daf1 H |
1447 | icmp_errors_use_inbound_ifaddr - BOOLEAN |
1448 | ||
02a6d613 PA |
1449 | If zero, icmp error messages are sent with the primary address of |
1450 | the exiting interface. | |
e18f5feb | 1451 | |
95f7daf1 H |
1452 | If non-zero, the message will be sent with the primary address of |
1453 | the interface that received the packet that caused the icmp error. | |
31628201 | 1454 | This is the behaviour many network administrators will expect from |
95f7daf1 | 1455 | a router. And it can make debugging complicated network layouts |
e18f5feb | 1456 | much easier. |
95f7daf1 H |
1457 | |
1458 | Note that if no primary address exists for the interface selected, | |
1459 | then the primary address of the first non-loopback interface that | |
d6bc8ac9 | 1460 | has one will be used regardless of this setting. |
95f7daf1 H |
1461 | |
1462 | Default: 0 | |
1463 | ||
1da177e4 LT |
1464 | igmp_max_memberships - INTEGER |
1465 | Change the maximum number of multicast groups we can subscribe to. | |
1466 | Default: 20 | |
1467 | ||
d67ef35f JE |
1468 | Theoretical maximum value is bounded by having to send a membership |
1469 | report in a single datagram (i.e. the report can't span multiple | |
1470 | datagrams, or risk confusing the switch and leaving groups you don't | |
1471 | intend to). | |
1da177e4 | 1472 | |
d67ef35f JE |
1473 | The number of supported groups 'M' is bounded by the number of group |
1474 | report entries you can fit into a single datagram of 65535 bytes. | |
1475 | ||
1476 | M = (65536 - sizeof(ip header)) / sizeof(Group record) | |
1477 | ||
1478 | Group records are variable length, with a minimum of 12 bytes. | |
1479 | So net.ipv4.igmp_max_memberships should not be set higher than: | |
1480 | ||
1481 | (65536-24) / 12 = 5459 | |
1482 | ||
1483 | The value 5459 assumes no IP header options, so in practice | |
1484 | this number may be lower. | |
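The bound above is simple integer arithmetic and can be rechecked for other header sizes. The function ``max_memberships`` and its parameter names are hypothetical; the default values are the ones used in the text.

```python
def max_memberships(datagram_size=65536, ip_header=24, group_record=12):
    """Number of minimum-size group records that fit in one datagram
    after subtracting the IP header."""
    return (datagram_size - ip_header) // group_record


limit = max_memberships()
```

A larger IP header (for example, one carrying more options) or larger group records can only lower the result.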
1485 | ||
537377d3 BP |
1486 | igmp_max_msf - INTEGER |
1487 | Maximum number of addresses allowed in the source filter list for a | |
1488 | multicast group. | |
1cec2cac | 1489 | |
537377d3 BP |
1490 | Default: 10 |
1491 | ||
a9fe8e29 | 1492 | igmp_qrv - INTEGER |
537377d3 | 1493 | Controls the IGMP query robustness variable (see RFC2236 8.1). |
1cec2cac | 1494 | |
537377d3 | 1495 | Default: 2 (as specified by RFC2236 8.1) |
1cec2cac | 1496 | |
537377d3 | 1497 | Minimum: 1 (as specified by RFC6636 4.5) |
a9fe8e29 | 1498 | |
1af92836 | 1499 | force_igmp_version - INTEGER |
1cec2cac MCC |
1500 | - 0 - (default) No enforcement of an IGMP version; IGMPv1/v2 fallback is | |
1501 | allowed. Will switch back to IGMPv3 mode once all IGMPv1/v2 Querier | |
1502 | Present timers have expired. | |
1503 | - 1 - Enforce use of IGMP version 1. Will also reply with an IGMPv1 report | |
1504 | when an IGMPv2/v3 query is received. | |
1505 | - 2 - Enforce use of IGMP version 2. Will fall back to IGMPv1 if an IGMPv1 | |
1506 | query message is received. Will reply with a report if an IGMPv3 query is received. | |
1507 | - 3 - Enforce use of IGMP version 3. Behaves the same as the default 0. | |
1508 | ||
1509 | .. note:: | |
1af92836 | 1510 | |
1cec2cac MCC |
1511 | This is not the same as force_mld_version, because the IGMPv3 RFC3376 | |
1512 | Security Considerations do not clearly state that messages of other | |
1513 | versions can be ignored completely, as MLDv2 RFC3810 does. Keeping | |
1514 | this value at the default of 0 is therefore recommended. | |
1af92836 | 1515 | |
1cec2cac MCC |
1516 | ``conf/interface/*`` |
1517 | changes special settings per interface (where | |
1518 | "interface" is the name of your network interface) | |
6b226e2f | 1519 | |
1cec2cac MCC |
1520 | ``conf/all/*`` |
1521 | is special, changes the settings for all interfaces | |
6b226e2f | 1522 | |
1da177e4 LT |
1523 | log_martians - BOOLEAN |
1524 | Log packets with impossible addresses to kernel log. | |
1525 | log_martians for the interface will be enabled if at least one of | |
1526 | conf/{all,interface}/log_martians is set to TRUE, | |
1527 | it will be disabled otherwise | |
1528 | ||
1529 | accept_redirects - BOOLEAN | |
1530 | Accept ICMP redirect messages. | |
1531 | accept_redirects for the interface will be enabled if: | |
1cec2cac | 1532 | |
e18f5feb JDB |
1533 | - both conf/{all,interface}/accept_redirects are TRUE in the case |
1534 | forwarding for the interface is enabled | |
1cec2cac | 1535 | |
1da177e4 | 1536 | or |
1cec2cac | 1537 | |
e18f5feb JDB |
1538 | - at least one of conf/{all,interface}/accept_redirects is TRUE in the |
1539 | case forwarding for the interface is disabled | |
1cec2cac | 1540 | |
1da177e4 | 1541 | accept_redirects for the interface will be disabled otherwise |
1cec2cac MCC |
1542 | |
1543 | default: | |
1544 | ||
1545 | - TRUE (host) | |
1546 | - FALSE (router) | |
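The rule for combining the "all" and per-interface values of accept_redirects depends on whether forwarding is enabled. A minimal sketch of that rule, using a hypothetical function name:

```python
def accept_redirects_enabled(conf_all, conf_iface, forwarding):
    """Combine conf/all and conf/<interface> accept_redirects values.

    With forwarding enabled both must be TRUE (logical AND);
    with forwarding disabled one is enough (logical OR)."""
    if forwarding:
        return bool(conf_all and conf_iface)
    return bool(conf_all or conf_iface)


# A host (forwarding off) with only the per-interface value set.
host_case = accept_redirects_enabled(conf_all=0, conf_iface=1, forwarding=False)
```

The same pair of values gives a different result on a router: with ``forwarding=True`` the call above would return False.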
1da177e4 LT |
1547 | |
1548 | forwarding - BOOLEAN | |
88a7cddc NJ |
1549 | Enable IP forwarding on this interface. This controls whether packets |
1550 | received _on_ this interface can be forwarded. | |
1da177e4 LT |
1551 | |
1552 | mc_forwarding - BOOLEAN | |
1553 | Do multicast routing. The kernel needs to be compiled with CONFIG_MROUTE | |
1554 | and a multicast routing daemon is required. | |
e18f5feb JDB |
1555 | conf/all/mc_forwarding must also be set to TRUE to enable multicast |
1556 | routing for the interface | |
1da177e4 LT |
1557 | |
1558 | medium_id - INTEGER | |
1559 | Integer value used to differentiate the devices by the medium they | |
1560 | are attached to. Two devices can have different id values when | |
1561 | the broadcast packets are received only on one of them. | |
1562 | The default value 0 means that the device is the only interface | |
1563 | to its medium, value of -1 means that medium is not known. | |
e18f5feb | 1564 | |
1da177e4 LT |
1565 | Currently, it is used to change the proxy_arp behavior: |
1566 | the proxy_arp feature is enabled for packets forwarded between | |
1567 | two devices attached to different media. | |
1568 | ||
1569 | proxy_arp - BOOLEAN | |
1570 | Do proxy arp. | |
1cec2cac | 1571 | |
1da177e4 LT |
1572 | proxy_arp for the interface will be enabled if at least one of |
1573 | conf/{all,interface}/proxy_arp is set to TRUE, | |
1574 | it will be disabled otherwise | |
1575 | ||
65324144 JDB |
1576 | proxy_arp_pvlan - BOOLEAN |
1577 | Private VLAN proxy arp. | |
1cec2cac | 1578 | |
65324144 JDB |
1579 | Basically allow proxy arp replies back to the same interface |
1580 | (from which the ARP request/solicitation was received). | |
1581 | ||
1582 | This is done to support (ethernet) switch features, like RFC | |
1583 | 3069, where the individual ports are NOT allowed to | |
1584 | communicate with each other, but they are allowed to talk to | |
1585 | the upstream router. As described in RFC 3069, it is possible | |
1586 | to allow these hosts to communicate through the upstream | |
1587 | router by proxy_arp'ing. Does not need to be used together with | |
1588 | proxy_arp. | |
1589 | ||
1590 | This technology is known by different names: | |
1cec2cac | 1591 | |
65324144 JDB |
1592 | In RFC 3069 it is called VLAN Aggregation. |
1593 | Cisco and Allied Telesyn call it Private VLAN. | |
1594 | Hewlett-Packard call it Source-Port filtering or port-isolation. | |
1595 | Ericsson call it MAC-Forced Forwarding (RFC Draft). | |
1596 | ||
62e395f8 BH |
1597 | proxy_delay - INTEGER |
1598 | Delay proxy response. | |
1599 | ||
1600 | Delay response to a neighbor solicitation when proxy_arp | |
1601 | or proxy_ndp is enabled. A random value between [0, proxy_delay) | |
1602 | will be chosen, setting to zero means reply with no delay. | |
1603 | Value in jiffies. Defaults to 80. | |
1604 | ||
1da177e4 LT |
1605 | shared_media - BOOLEAN |
1606 | Send(router) or accept(host) RFC1620 shared media redirects. | |
176b346b | 1607 | Overrides secure_redirects. |
1cec2cac | 1608 | |
1da177e4 LT |
1609 | shared_media for the interface will be enabled if at least one of |
1610 | conf/{all,interface}/shared_media is set to TRUE, | |
1611 | it will be disabled otherwise | |
1cec2cac | 1612 | |
1da177e4 LT |
1613 | default TRUE |
1614 | ||
1615 | secure_redirects - BOOLEAN | |
176b346b EG |
1616 | Accept ICMP redirect messages only to gateways listed in the |
1617 | interface's current gateway list. Even if disabled, RFC1122 redirect | |
1618 | rules still apply. | |
1cec2cac | 1619 | |
176b346b | 1620 | Overridden by shared_media. |
1cec2cac | 1621 | |
1da177e4 LT |
1622 | secure_redirects for the interface will be enabled if at least one of |
1623 | conf/{all,interface}/secure_redirects is set to TRUE, | |
1624 | it will be disabled otherwise | |
1cec2cac | 1625 | |
1da177e4 LT |
1626 | default TRUE |
1627 | ||
1628 | send_redirects - BOOLEAN | |
1629 | Send redirects, if router. | |
1cec2cac | 1630 | |
1da177e4 LT |
1631 | send_redirects for the interface will be enabled if at least one of |
1632 | conf/{all,interface}/send_redirects is set to TRUE, | |
1633 | it will be disabled otherwise | |
1cec2cac | 1634 | |
1da177e4 LT |
1635 | Default: TRUE |
1636 | ||
1637 | bootp_relay - BOOLEAN | |
1638 | Accept packets with source address 0.b.c.d destined | |
1639 | not to this host as local ones. It is assumed that a | |
1640 | BOOTP relay daemon will catch and forward such packets. | |
1641 | conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay | |
1642 | for the interface | |
1cec2cac | 1643 | |
1da177e4 | 1644 | default FALSE |
1cec2cac | 1645 | |
1da177e4 LT |
1646 | Not Implemented Yet. |
1647 | ||
1648 | accept_source_route - BOOLEAN | |
1649 | Accept packets with SRR option. | |
1650 | conf/all/accept_source_route must also be set to TRUE to accept packets | |
1651 | with SRR option on the interface | |
1cec2cac MCC |
1652 | |
1653 | default | |
1654 | ||
1655 | - TRUE (router) | |
1656 | - FALSE (host) | |
1da177e4 | 1657 | |
8153a10c | 1658 | accept_local - BOOLEAN |
72b126a4 SB |
1659 | Accept packets with local source addresses. In combination with |
1660 | suitable routing, this can be used to direct packets between two | |
1661 | local interfaces over the wire and have them accepted properly. | |
8153a10c PM |
1662 | default FALSE |
1663 | ||
d0daebc3 TG |
1664 | route_localnet - BOOLEAN |
1665 | Do not consider loopback addresses as martian source or destination | |
1666 | while routing. This enables the use of 127/8 for local routing purposes. | |
1cec2cac | 1667 | |
d0daebc3 TG |
1668 | default FALSE |
1669 | ||
c1cf8422 | 1670 | rp_filter - INTEGER |
1cec2cac MCC |
1671 | - 0 - No source validation. |
1672 | - 1 - Strict mode as defined in RFC3704 Strict Reverse Path | |
1673 | Each incoming packet is tested against the FIB and if the interface | |
1674 | is not the best reverse path the packet check will fail. | |
1675 | By default failed packets are discarded. | |
1676 | - 2 - Loose mode as defined in RFC3704 Loose Reverse Path | |
1677 | Each incoming packet's source address is also tested against the FIB | |
1678 | and if the source address is not reachable via any interface | |
1679 | the packet check will fail. | |
c1cf8422 | 1680 | |
e18f5feb | 1681 | Current recommended practice in RFC3704 is to enable strict mode |
bf869c30 | 1682 | to prevent IP spoofing from DDoS attacks. If using asymmetric routing | |
e18f5feb | 1683 | or other complicated routing, then loose mode is recommended. |
c1cf8422 | 1684 | |
1f5865e7 SW |
1685 | The max value from conf/{all,interface}/rp_filter is used |
1686 | when doing source validation on the {interface}. | |
1da177e4 LT |
1687 | |
1688 | Default value is 0. Note that some distributions enable it | |
1689 | in startup scripts. | |
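Because the maximum of the "all" and per-interface values is used, a higher value in either place wins. The sketch below models only that selection step; ``effective_rp_filter`` is a hypothetical name.

```python
RP_FILTER_MODES = {
    0: "no source validation",
    1: "strict",
    2: "loose",
}


def effective_rp_filter(conf_all, conf_iface):
    """Mode applied on an interface: the max of conf/all/rp_filter
    and conf/<interface>/rp_filter."""
    return max(conf_all, conf_iface)


mode = effective_rp_filter(conf_all=2, conf_iface=1)
description = RP_FILTER_MODES[mode]
```

One consequence worth noting: setting the interface to strict (1) while "all" is loose (2) results in loose mode, since 2 is the larger value.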
1690 | ||
8cf5d8cc JV |
1691 | src_valid_mark - BOOLEAN |
1692 | - 0 - The fwmark of the packet is not included in reverse path | |
1693 | route lookup. This allows for asymmetric routing configurations | |
1694 | utilizing the fwmark in only one direction, e.g., transparent | |
1695 | proxying. | |
1696 | ||
1697 | - 1 - The fwmark of the packet is included in reverse path route | |
1698 | lookup. This permits rp_filter to function when the fwmark is | |
1699 | used for routing traffic in both directions. | |
1700 | ||
1701 | This setting also affects the utilization of fwmark when | |
1702 | performing source address selection for ICMP replies, or | |
1703 | determining addresses stored for the IPOPT_TS_TSANDADDR and | |
1704 | IPOPT_RR IP options. | |
1705 | ||
1706 | The max value from conf/{all,interface}/src_valid_mark is used. | |
1707 | ||
1708 | Default value is 0. | |
1709 | ||
1da177e4 | 1710 | arp_filter - BOOLEAN |
1cec2cac MCC |
1711 | - 1 - Allows you to have multiple network interfaces on the same |
1712 | subnet, and have the ARPs for each interface be answered | |
1713 | based on whether or not the kernel would route a packet from | |
1714 | the ARP'd IP out that interface (therefore you must use source | |
1715 | based routing for this to work). In other words it allows control | |
1716 | of which cards (usually 1) will respond to an arp request. | |
1717 | ||
1718 | - 0 - (default) The kernel can respond to arp requests with addresses | |
1719 | from other interfaces. This may seem wrong but it usually makes | |
1720 | sense, because it increases the chance of successful communication. | |
1721 | IP addresses are owned by the complete host on Linux, not by | |
1722 | particular interfaces. Only for more complex setups like load- | |
1723 | balancing, does this behaviour cause problems. | |
1da177e4 LT |
1724 | |
1725 | arp_filter for the interface will be enabled if at least one of | |
1726 | conf/{all,interface}/arp_filter is set to TRUE, | |
1727 | it will be disabled otherwise | |
1728 | ||
1729 | arp_announce - INTEGER | |
1730 | Define different restriction levels for announcing the local | |
1731 | source IP address from IP packets in ARP requests sent on | |
1732 | interface: | |
1cec2cac MCC |
1733 | |
1734 | - 0 - (default) Use any local address, configured on any interface | |
1735 | - 1 - Try to avoid local addresses that are not in the target's | |
1736 | subnet for this interface. This mode is useful when target | |
1737 | hosts reachable via this interface require the source IP | |
1738 | address in ARP requests to be part of their logical network | |
1739 | configured on the receiving interface. When we generate the | |
1740 | request we will check all our subnets that include the | |
1741 | target IP and will preserve the source address if it is from | |
1742 | such subnet. If there is no such subnet we select source | |
1743 | address according to the rules for level 2. | |
1744 | - 2 - Always use the best local address for this target. | |
1745 | In this mode we ignore the source address in the IP packet | |
1746 | and try to select local address that we prefer for talks with | |
1747 | the target host. Such local address is selected by looking | |
1748 | for primary IP addresses on all our subnets on the outgoing | |
1749 | interface that include the target IP address. If no suitable | |
1750 | local address is found we select the first local address | |
1751 | we have on the outgoing interface or on all other interfaces, | |
1752 | in the hope that we will receive a reply to our request, | |
1753 | sometimes regardless of the source IP address we announce. | |
1da177e4 LT |
1754 | |
1755 | The max value from conf/{all,interface}/arp_announce is used. | |
1756 | ||
1757 | Increasing the restriction level gives more chance for | |
1758 | receiving answer from the resolved target while decreasing | |
1759 | the level announces more valid sender's information. | |
1760 | ||
1761 | arp_ignore - INTEGER | |
1762 | Define different modes for sending replies in response to | |
1763 | received ARP requests that resolve local target IP addresses: | |
1cec2cac MCC |
1764 | |
1765 | - 0 - (default): reply for any local target IP address, configured | |
1766 | on any interface | |
1767 | - 1 - reply only if the target IP address is a local address | |
1768 | configured on the incoming interface | |
1769 | - 2 - reply only if the target IP address is a local address | |
1770 | configured on the incoming interface and both it and the | |
1771 | sender's IP address are part of the same subnet on this interface | |
1772 | - 3 - do not reply for local addresses configured with scope host; | |
1773 | only resolutions for global and link addresses are replied to | |
1774 | - 4-7 - reserved | |
1775 | - 8 - do not reply for all local addresses | |
1da177e4 LT |
1776 | |
1777 | The max value from conf/{all,interface}/arp_ignore is used | |
1778 | when ARP request is received on the {interface} | |
1779 | ||
eefef1cf SH |
1780 | arp_notify - BOOLEAN |
1781 | Define mode for notification of address and device changes. | |
1cec2cac MCC |
1782 | |
1783 | == ========================================================== | |
1784 | 0 (default): do nothing | |
1785 | 1 Generate gratuitous arp requests when device is brought up | |
1786 | or hardware address changes. | |
1787 | == ========================================================== | |
eefef1cf | 1788 | |
e68c5dcf JP |
1789 | arp_accept - INTEGER |
1790 | Define behavior for accepting gratuitous ARP (garp) frames from devices | |
1791 | that are not already present in the ARP table: | |
1cec2cac MCC |
1792 | |
1793 | - 0 - don't create new entries in the ARP table | |
1794 | - 1 - create new entries in the ARP table | |
e68c5dcf JP |
1795 | - 2 - create new entries only if the source IP address is in the same |
1796 | subnet as an address configured on the interface that received the | |
1797 | garp message. | |
6d955180 OP |
1798 | |
1799 | Both reply-type and request-type gratuitous ARP frames will trigger the | |
1800 | ARP table to be updated if this setting is on. | |
1801 | ||
1802 | If the ARP table already contains the IP address of the | |
1803 | gratuitous arp frame, the arp table will be updated regardless | |
1804 | of whether this setting is on or off. | |
1805 | ||
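The three arp_accept modes for devices not yet in the ARP table can be sketched as a decision function. This is a model of the description only, using Python's standard ``ipaddress`` module for the subnet test; ``garp_creates_entry`` and its parameters are hypothetical.

```python
import ipaddress


def garp_creates_entry(mode, source_ip, iface_networks):
    """Decide whether a gratuitous ARP from an unknown device creates
    a new ARP table entry.

    iface_networks: addresses configured on the receiving interface,
    in 'address/prefix' form (an assumption of this sketch)."""
    if mode == 0:
        return False
    if mode == 1:
        return True
    if mode == 2:
        source = ipaddress.ip_address(source_ip)
        return any(
            source in ipaddress.ip_network(net, strict=False)
            for net in iface_networks
        )
    raise ValueError("arp_accept must be 0, 1 or 2")


same_subnet = garp_creates_entry(2, "192.0.2.77", ["192.0.2.1/24"])
```

Entries that already exist are outside this function's scope: as stated above, they are updated regardless of the setting.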
fcdb44d0 JP |
1806 | arp_evict_nocarrier - BOOLEAN |
1807 | Clears the ARP cache on NOCARRIER events. This option is important for | |
1808 | wireless devices where the ARP cache should not be cleared when roaming | |
1809 | between access points on the same network. In most cases this should | |
1810 | remain as the default (1). | |
1811 | ||
1812 | - 1 - (default): Clear the ARP cache on NOCARRIER events | |
1813 | - 0 - Do not clear ARP cache on NOCARRIER events | |
1814 | ||
89c69d3c YH |
1815 | mcast_solicit - INTEGER |
1816 | The maximum number of multicast probes in INCOMPLETE state, | |
1817 | when the associated hardware address is unknown. Defaults | |
1818 | to 3. | |
1819 | ||
1820 | ucast_solicit - INTEGER | |
1821 | The maximum number of unicast probes in PROBE state, when | |
1822 | the hardware address is being reconfirmed. Defaults to 3. | |
c1b1bce8 | 1823 | |
1da177e4 LT |
1824 | app_solicit - INTEGER |
1825 | The maximum number of probes to send to the user space ARP daemon | |
1826 | via netlink before dropping back to multicast probes (see | |
89c69d3c YH |
1827 | mcast_resolicit). Defaults to 0. |
1828 | ||
1829 | mcast_resolicit - INTEGER | |
1830 | The maximum number of multicast probes after unicast and | |
1831 | app probes in PROBE state. Defaults to 0. | |
1da177e4 LT |
1832 | |
1833 | disable_policy - BOOLEAN | |
1834 | Disable IPSEC policy (SPD) for this interface | |
1835 | ||
1836 | disable_xfrm - BOOLEAN | |
1837 | Disable IPSEC encryption on this interface, whatever the policy | |
1838 | ||
fc4eba58 HFS |
1839 | igmpv2_unsolicited_report_interval - INTEGER |
1840 | The interval in milliseconds in which the next unsolicited | |
1841 | IGMPv1 or IGMPv2 report retransmit will take place. | |
1cec2cac | 1842 | |
fc4eba58 | 1843 | Default: 10000 (10 seconds) |
1da177e4 | 1844 | |
fc4eba58 HFS |
1845 | igmpv3_unsolicited_report_interval - INTEGER |
1846 | The interval in milliseconds in which the next unsolicited | |
1847 | IGMPv3 report retransmit will take place. | |
1cec2cac | 1848 | |
fc4eba58 | 1849 | Default: 1000 (1 second) | |
1da177e4 | 1850 | |
c0c5a60f VB |
1851 | ignore_routes_with_linkdown - BOOLEAN |
1852 | Ignore routes whose link is down when performing a FIB lookup. | |
1853 | ||
d922e1cb MS |
1854 | promote_secondaries - BOOLEAN |
1855 | When a primary IP address is removed from this interface | |
1856 | promote a corresponding secondary IP address instead of | |
1857 | removing all the corresponding secondary IP addresses. | |
1858 | ||
12b74dfa JB |
1859 | drop_unicast_in_l2_multicast - BOOLEAN |
1860 | Drop any unicast IP packets that are received in link-layer | |
1861 | multicast (or broadcast) frames. | |
1cec2cac | 1862 | |
12b74dfa JB |
1863 | This behavior (for multicast) is actually a SHOULD in RFC |
1864 | 1122, but is disabled by default for compatibility reasons. | |
1cec2cac | 1865 | |
12b74dfa JB |
1866 | Default: off (0) |
1867 | ||
97daf331 JB |
1868 | drop_gratuitous_arp - BOOLEAN |
1869 | Drop all gratuitous ARP frames, for example if there's a known | |
1870 | good ARP proxy on the network and such frames need not be used | |
1871 | (or in the case of 802.11, must not be used to prevent attacks.) | |
1cec2cac | 1872 | |
97daf331 JB |
1873 | Default: off (0) |
1874 | ||
d922e1cb | 1875 | |
1da177e4 LT |
1876 | tag - INTEGER |
1877 | Allows you to write a number, which can be used as required. | |
1cec2cac | 1878 | |
1da177e4 LT |
1879 | Default value is 0. |
1880 | ||
e69948a0 | 1881 | xfrm4_gc_thresh - INTEGER |
837f7411 | 1882 | (Obsolete since linux-4.14) |
e69948a0 AD |
1883 | The threshold at which we will start garbage collecting for IPv4 |
1884 | destination cache entries. At twice this value the system will | |
3c2a89dd | 1885 | refuse new allocations. |
e69948a0 | 1886 | |
87583ebb PD |
1887 | igmp_link_local_mcast_reports - BOOLEAN |
1888 | Enable IGMP reports for link local multicast groups in the | |
1889 | 224.0.0.X range. | |
1cec2cac | 1890 | |
87583ebb PD |
1891 | Default TRUE |
1892 | ||
1da177e4 LT |
1893 | Alexey Kuznetsov. |
1894 | kuznet@ms2.inr.ac.ru | |
1895 | ||
1896 | Updated by: | |
1da177e4 | 1897 | |
1cec2cac MCC |
1898 | - Andi Kleen |
1899 | ak@muc.de | |
1900 | - Nicolas Delon | |
1901 | delon.nicolas@wanadoo.fr | |
1da177e4 LT |
1902 | |
1903 | ||
1904 | ||
1cec2cac MCC |
1905 | |
1906 | /proc/sys/net/ipv6/* Variables | |
1907 | ============================== | |
1da177e4 LT |
1908 | |
1909 | IPv6 has no global variables such as tcp_*. tcp_* settings under ipv4/ also | |
1910 | apply to IPv6 [XXX?]. | |
1911 | ||
1912 | bindv6only - BOOLEAN | |
1913 | Default value for IPV6_V6ONLY socket option, | |
e18f5feb | 1914 | which restricts use of the IPv6 socket to IPv6 communication |
1da177e4 | 1915 | only. |
1cec2cac MCC |
1916 | |
1917 | - TRUE: disable IPv4-mapped address feature | |
1918 | - FALSE: enable IPv4-mapped address feature | |
1da177e4 | 1919 | |
d5c073ca | 1920 | Default: FALSE (as specified in RFC3493) |
1da177e4 | 1921 | |
6444f72b FF |
1922 | flowlabel_consistency - BOOLEAN |
1923 | Protect the consistency (and unicity) of flow label. | |
1924 | You have to disable it to use IPV6_FL_F_REFLECT flag on the | |
1925 | flow label manager. | |
1cec2cac MCC |
1926 | |
1927 | - TRUE: enabled | |
1928 | - FALSE: disabled | |
1929 | ||
6444f72b FF |
1930 | Default: TRUE |
1931 | ||
42240901 TH |
1932 | auto_flowlabels - INTEGER |
1933 | Automatically generate flow labels based on a flow hash of the | |
1934 | packet. This allows intermediate devices, such as routers, to | |
1935 | identify packet flows for mechanisms like Equal Cost Multipath | |
cb1ce2ef | 1936 | Routing (see RFC 6438). |
1cec2cac MCC |
1937 | |
1938 | = =========================================================== | |
1939 | 0 automatic flow labels are completely disabled | |
1940 | 1 automatic flow labels are enabled by default, they can be | |
42240901 TH |
1941 | disabled on a per socket basis using the IPV6_AUTOFLOWLABEL |
1942 | socket option | |
1cec2cac | 1943 | 2 automatic flow labels are allowed, they may be enabled on a |
42240901 | 1944 | per socket basis using the IPV6_AUTOFLOWLABEL socket option |
1cec2cac | 1945 | 3 automatic flow labels are enabled and enforced, they cannot |
42240901 | 1946 | be disabled by the socket option |
1cec2cac MCC |
1947 | = =========================================================== |
1948 | ||
b5677416 | 1949 | Default: 1 |
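The interaction between the four auto_flowlabels values and the per-socket IPV6_AUTOFLOWLABEL option can be summarized as a small decision function. This is a sketch of the table above; the function name and the use of ``None`` for "option not set on the socket" are assumptions of this example.

```python
from typing import Optional


def auto_flowlabel_in_effect(sysctl_mode: int,
                             socket_option: Optional[bool] = None) -> bool:
    """Return whether automatic flow labels apply to a socket.

    socket_option: the socket's IPV6_AUTOFLOWLABEL setting, or None
    if the application never set it."""
    if sysctl_mode == 0:
        return False  # completely disabled
    if sysctl_mode == 3:
        return True   # enforced, the socket option cannot disable it
    if sysctl_mode == 1:
        # Enabled by default, the socket may opt out.
        return True if socket_option is None else socket_option
    if sysctl_mode == 2:
        # Disabled by default, the socket may opt in.
        return False if socket_option is None else socket_option
    raise ValueError("auto_flowlabels must be between 0 and 3")


default_behavior = auto_flowlabel_in_effect(1)
```

Modes 1 and 2 differ only in what happens when the application leaves the socket option untouched.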
cb1ce2ef | 1950 | |
82a584b7 TH |
1951 | flowlabel_state_ranges - BOOLEAN |
1952 | Split the flow label number space into two ranges. 0-0x7FFFF is | |
1953 | reserved for the IPv6 flow manager facility, 0x80000-0xFFFFF | |
1954 | is reserved for stateless flow labels as described in RFC6437. | |
1cec2cac MCC |
1955 | |
1956 | - TRUE: enabled | |
1957 | - FALSE: disabled | |
1958 | ||
82a584b7 TH |
1959 | Default: true |
1960 | ||
323a53c4 ED |
1961 | flowlabel_reflect - INTEGER |
1962 | Control flow label reflection. Needed for Path MTU | |
22b6722b JS |
1963 | Discovery to work with Equal Cost Multipath Routing in anycast |
1964 | environments. See RFC 7690 and: | |
1965 | https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01 | |
323a53c4 | 1966 | |
a346abe0 | 1967 | This is a bitmask. |
323a53c4 | 1968 | |
1cec2cac MCC |
1969 | - 1: enabled for established flows |
1970 | ||
1971 | Note that this prevents automatic flowlabel changes, as done | |
1972 | in "tcp: change IPv6 flow-label upon receiving spurious retransmission" | |
1973 | and "tcp: Change txhash on every SYN and RTO retransmit" | |
323a53c4 | 1974 | |
1cec2cac MCC |
1975 | - 2: enabled for TCP RESET packets (no active listener) |
1976 | If set, a RST packet sent in response to a SYN packet on a closed | |
1977 | port will reflect the incoming flow label. | |
323a53c4 | 1978 | |
1cec2cac | 1979 | - 4: enabled for ICMPv6 echo reply messages. |
a346abe0 | 1980 | |
323a53c4 | 1981 | Default: 0 |
22b6722b | 1982 | |
b4bac172 DA |
1983 | fib_multipath_hash_policy - INTEGER |
1984 | Controls which hash policy to use for multipath routes. | |
1cec2cac | 1985 | |
b4bac172 | 1986 | Default: 0 (Layer 3) |
1cec2cac | 1987 | |
b4bac172 | 1988 | Possible values: |
1cec2cac MCC |
1989 | |
1990 | - 0 - Layer 3 (source and destination addresses plus flow label) | |
1991 | - 1 - Layer 4 (standard 5-tuple) | |
1992 | - 2 - Layer 3 or inner Layer 3 if present | |
73c2c5cb IS |
1993 | - 3 - Custom multipath hash. Fields used for multipath hash calculation |
1994 | are determined by fib_multipath_hash_fields sysctl | |
b4bac172 | 1995 | |
ed13923f IS |
1996 | fib_multipath_hash_fields - UNSIGNED INTEGER |
1997 | When fib_multipath_hash_policy is set to 3 (custom multipath hash), the | |
1998 | fields used for multipath hash calculation are determined by this | |
1999 | sysctl. | |
2000 | ||
2001 | This value is a bitmask which enables various fields for multipath hash | |
2002 | calculation. | |
2003 | ||
2004 | Possible fields are: | |
2005 | ||
2006 | ====== ============================ | |
2007 | 0x0001 Source IP address | |
2008 | 0x0002 Destination IP address | |
2009 | 0x0004 IP protocol | |
2010 | 0x0008 Flow Label | |
2011 | 0x0010 Source port | |
2012 | 0x0020 Destination port | |
2013 | 0x0040 Inner source IP address | |
2014 | 0x0080 Inner destination IP address | |
2015 | 0x0100 Inner IP protocol | |
2016 | 0x0200 Inner Flow Label | |
2017 | 0x0400 Inner source port | |
2018 | 0x0800 Inner destination port | |
2019 | ====== ============================ | |
2020 | ||
2021 | Default: 0x0007 (source IP, destination IP and IP protocol) | |
2022 | ||
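As an illustration of the fib_multipath_hash_fields bitmask described above, the following minimal Python sketch decodes a mask into the field names from the table. The helper name `decode_hash_fields` is hypothetical and not part of any kernel or library API; only the bit values come from the table.

```python
# Bit values copied from the fib_multipath_hash_fields table.
HASH_FIELDS = {
    0x0001: "Source IP address",
    0x0002: "Destination IP address",
    0x0004: "IP protocol",
    0x0008: "Flow Label",
    0x0010: "Source port",
    0x0020: "Destination port",
    0x0040: "Inner source IP address",
    0x0080: "Inner destination IP address",
    0x0100: "Inner IP protocol",
    0x0200: "Inner Flow Label",
    0x0400: "Inner source port",
    0x0800: "Inner destination port",
}


def decode_hash_fields(mask):
    """Return the names of the fields enabled in the given bitmask.

    Hypothetical helper for illustration; bits outside the table are ignored.
    """
    return [name for bit, name in HASH_FIELDS.items() if mask & bit]


if __name__ == "__main__":
    # 0x0007 is the documented default.
    print(decode_hash_fields(0x0007))
```

A mask can be built the other way around by OR-ing the wanted bit values together before writing it to the sysctl.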
509aba3b FLB |
2023 | anycast_src_echo_reply - BOOLEAN |
2024 | Controls the use of anycast addresses as source addresses for ICMPv6 | |
2025 | echo reply | |
1cec2cac MCC |
2026 | |
2027 | - TRUE: enabled | |
2028 | - FALSE: disabled | |
2029 | ||
509aba3b FLB |
2030 | Default: FALSE |
2031 | ||
9f0761c1 HFS |
2032 | idgen_delay - INTEGER |
2033 | Controls the delay in seconds before retrying | |
2034 | stable privacy address generation if a DAD conflict is | |
2035 | detected. | |
1cec2cac | 2036 | |
9f0761c1 HFS |
2037 | Default: 1 (as specified in RFC7217) |
2038 | ||
2039 | idgen_retries - INTEGER | |
2040 | Controls the number of retries to generate a stable privacy | |
2041 | address if a DAD conflict is detected. | |
1cec2cac | 2042 | |
9f0761c1 HFS |
2043 | Default: 3 (as specified in RFC7217) |
2044 | ||
2f711939 HFS |
2045 | mld_qrv - INTEGER |
2046 | Controls the MLD query robustness variable (see RFC3810 9.1). | |
1cec2cac | 2047 | |
2f711939 | 2048 | Default: 2 (as specified by RFC3810 9.1) |
1cec2cac | 2049 | |
2f711939 HFS |
2050 | Minimum: 1 (as specified by RFC6636 4.5) |
2051 | ||
ab913455 | 2052 | max_dst_opts_number - INTEGER |
47d3d7ac TH |
2053 | Maximum number of non-padding TLVs allowed in a Destination |
2054 | options extension header. If this value is less than zero | |
2055 | then unknown options are disallowed and the number of known | |
2056 | TLVs allowed is the absolute value of this number. | |
1cec2cac | 2057 | |
47d3d7ac TH |
2058 | Default: 8 |
2059 | ||
ab913455 | 2060 | max_hbh_opts_number - INTEGER |
47d3d7ac TH |
2061 | Maximum number of non-padding TLVs allowed in a Hop-by-Hop |
2062 | options extension header. If this value is less than zero | |
2063 | then unknown options are disallowed and the number of known | |
2064 | TLVs allowed is the absolute value of this number. | |
1cec2cac | 2065 | |
47d3d7ac TH |
2066 | Default: 8 |
2067 | ||
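The sign convention shared by max_dst_opts_number and max_hbh_opts_number can be sketched as follows. This is only an interpretation of the text above, not kernel code, and the function name `interpret_max_opts` is made up for the example.

```python
def interpret_max_opts(value):
    """Split a max_*_opts_number setting into (tlv_limit, unknown_allowed).

    Hypothetical helper: a negative value disallows unknown options, and
    its absolute value is the number of known TLVs allowed.
    """
    if value < 0:
        return abs(value), False
    return value, True


if __name__ == "__main__":
    for setting in (8, -4):
        limit, unknown_allowed = interpret_max_opts(setting)
        print(setting, "->", limit, "TLVs, unknown options allowed:", unknown_allowed)
```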
ab913455 | 2068 | max_dst_opts_length - INTEGER |
47d3d7ac TH |
2069 | Maximum length allowed for a Destination options extension |
2070 | header. | |
1cec2cac | 2071 | |
47d3d7ac TH |
2072 | Default: INT_MAX (unlimited) |
2073 | ||
ab913455 | 2074 | max_hbh_length - INTEGER |
47d3d7ac TH |
2075 | Maximum length allowed for a Hop-by-Hop options extension |
2076 | header. | |
1cec2cac | 2077 | |
47d3d7ac TH |
2078 | Default: INT_MAX (unlimited) |
2079 | ||
7c6bb7d2 DA |
2080 | skip_notify_on_dev_down - BOOLEAN |
2081 | Controls whether an RTM_DELROUTE message is generated for routes | |
2082 | removed when a device is taken down or deleted. IPv4 does not | |
2083 | generate this message; IPv6 does by default. Setting this sysctl | |
2084 | to true skips the message, making IPv4 and IPv6 on par in relying | |
2085 | on userspace caches to track link events and evict routes. | |
1cec2cac | 2086 | |
7c6bb7d2 DA |
2087 | Default: false (generate message) |
2088 | ||
4f80116d RP |
2089 | nexthop_compat_mode - BOOLEAN |
2090 | New nexthop API provides a means for managing nexthops independent of | |
a266ef69 | 2091 | prefixes. Backwards compatibility with old route format is enabled by |
4f80116d RP |
2092 | default which means route dumps and notifications contain the new |
2093 | nexthop attribute but also the full, expanded nexthop definition. | |
2094 | Further, updates or deletes of a nexthop configuration generate route | |
2095 | notifications for each fib entry using the nexthop. Once a system | |
2096 | understands the new API, this sysctl can be disabled to achieve full | |
2097 | performance benefits of the new API by disabling the nexthop expansion | |
2098 | and extraneous notifications. | |
2099 | Default: true (backward compat mode) | |
2100 | ||
907eea48 AC |
2101 | fib_notify_on_flag_change - INTEGER |
2102 | Whether to emit RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/ | |
6fad361a | 2103 | RTM_F_TRAP/RTM_F_OFFLOAD_FAILED flags are changed. |
907eea48 AC |
2104 | |
2105 | After installing a route to the kernel, user space receives an | |
2106 | acknowledgment, which means the route was installed in the kernel, | |
2107 | but not necessarily in hardware. | |
2108 | It is also possible for a route already installed in hardware to change | |
2109 | its action and therefore its flags. For example, a host route that is | |
2110 | trapping packets can be "promoted" to perform decapsulation following | |
2111 | the installation of an IPinIP/VXLAN tunnel. | |
2112 | The notifications will indicate to user-space the state of the route. | |
2113 | ||
2114 | Default: 0 (Do not emit notifications.) | |
2115 | ||
2116 | Possible values: | |
2117 | ||
2118 | - 0 - Do not emit notifications. | |
2119 | - 1 - Emit notifications. | |
6fad361a | 2120 | - 2 - Emit notifications only for RTM_F_OFFLOAD_FAILED flag change. |
907eea48 | 2121 | |
de8e80a5 JI |
2122 | ioam6_id - INTEGER |
2123 | Define the IOAM id of this node. Uses only 24 bits out of 32 in total. | |
2124 | ||
2125 | Min: 0 | |
2126 | Max: 0xFFFFFF | |
2127 | ||
2128 | Default: 0xFFFFFF | |
2129 | ||
2130 | ioam6_id_wide - LONG INTEGER | |
2131 | Define the wide IOAM id of this node. Uses only 56 bits out of 64 in | |
2132 | total. Can be different from ioam6_id. | |
2133 | ||
2134 | Min: 0 | |
2135 | Max: 0xFFFFFFFFFFFFFF | |
2136 | ||
2137 | Default: 0xFFFFFFFFFFFFFF | |
2138 | ||
1da177e4 LT |
2139 | IPv6 Fragmentation: |
2140 | ||
2141 | ip6frag_high_thresh - INTEGER | |
e18f5feb | 2142 | Maximum memory used to reassemble IPv6 fragments. When |
1da177e4 LT |
2143 | ip6frag_high_thresh bytes of memory is allocated for this purpose, |
2144 | the fragment handler will toss packets until ip6frag_low_thresh | |
2145 | is reached. | |
e18f5feb | 2146 | |
1da177e4 | 2147 | ip6frag_low_thresh - INTEGER |
e18f5feb | 2148 | See ip6frag_high_thresh |
1da177e4 LT |
2149 | |
2150 | ip6frag_time - INTEGER | |
2151 | Time in seconds to keep an IPv6 fragment in memory. | |
2152 | ||
1cec2cac | 2153 | ``conf/default/*``: |
1da177e4 LT |
2154 | Change the interface-specific default settings. |
2155 | ||
fc024c5c PR |
2156 | These settings are used when creating new interfaces. |
2157 | ||
1da177e4 | 2158 | |
1cec2cac | 2159 | ``conf/all/*``: |
e18f5feb | 2160 | Change all the interface-specific settings. |
1da177e4 LT |
2161 | |
2162 | [XXX: Other special features than forwarding?] | |
2163 | ||
fc024c5c PR |
2164 | conf/all/disable_ipv6 - BOOLEAN |
2165 | Changing this value is the same as changing ``conf/default/disable_ipv6`` | |
2166 | setting and also all per-interface ``disable_ipv6`` settings to the same | |
2167 | value. | |
2168 | ||
2169 | Reading this value does not have any particular meaning. It does not say | |
2170 | whether IPv6 support is enabled or disabled. Returned value can be 1 | |
2171 | also in the case when some interface has ``disable_ipv6`` set to 0 and | |
2172 | has configured IPv6 addresses. | |
2173 | ||
1da177e4 | 2174 | conf/all/forwarding - BOOLEAN |
e18f5feb | 2175 | Enable global IPv6 forwarding between all interfaces. |
1da177e4 | 2176 | |
e18f5feb | 2177 | IPv4 and IPv6 work differently here; e.g. netfilter must be used |
1da177e4 LT |
2178 | to control which interfaces may forward packets and which not. |
2179 | ||
e18f5feb | 2180 | This also sets all interfaces' Host/Router setting |
1da177e4 LT |
2181 | 'forwarding' to the specified value. See below for details. |
2182 | ||
2183 | This is referred to as global forwarding. | |
2184 | ||
fbea49e1 YH |
2185 | proxy_ndp - BOOLEAN |
2186 | Do proxy ndp. | |
2187 | ||
219b5f29 LV |
2188 | fwmark_reflect - BOOLEAN |
2189 | Controls the fwmark of kernel-generated IPv6 reply packets that are not | |
2190 | associated with a socket (for example, TCP RSTs or ICMPv6 echo replies). | |
2191 | If unset, these packets have a fwmark of zero. If set, they have the | |
2192 | fwmark of the packet they are replying to. | |
1cec2cac | 2193 | |
219b5f29 LV |
2194 | Default: 0 |
2195 | ||
1cec2cac | 2196 | ``conf/interface/*``: |
1da177e4 LT |
2197 | Change special settings per interface. |
2198 | ||
e18f5feb | 2199 | The functional behaviour for certain settings is different |
1da177e4 LT |
2200 | depending on whether local forwarding is enabled or not. |
2201 | ||
605b91c8 | 2202 | accept_ra - INTEGER |
1da177e4 | 2203 | Accept Router Advertisements; autoconfigure using them. |
e18f5feb | 2204 | |
026359bc TA |
2205 | It also determines whether or not to transmit Router |
2206 | Solicitations. If and only if the functional setting is to | |
2207 | accept Router Advertisements, Router Solicitations will be | |
2208 | transmitted. | |
2209 | ||
ae8abfa0 | 2210 | Possible values are: |
ae8abfa0 | 2211 | |
1cec2cac MCC |
2212 | == =========================================================== |
2213 | 0 Do not accept Router Advertisements. | |
2214 | 1 Accept Router Advertisements if forwarding is disabled. | |
2215 | 2 Overrule forwarding behaviour. Accept Router Advertisements | |
2216 | even if forwarding is enabled. | |
2217 | == =========================================================== | |
2218 | ||
2219 | Functional default: | |
2220 | ||
2221 | - enabled if local forwarding is disabled. | |
2222 | - disabled if local forwarding is enabled. | |
1da177e4 | 2223 | |
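The interaction between accept_ra and the interface's forwarding setting can be summarised with a small Python sketch. It assumes only the three documented accept_ra values; the function name `ra_accepted` is hypothetical and does not mirror the kernel implementation.

```python
def ra_accepted(accept_ra, forwarding):
    """Return True if Router Advertisements are accepted on the interface.

    Hypothetical helper based on the accept_ra table:
      0 - never accept
      1 - accept only while forwarding is disabled
      2 - accept even if forwarding is enabled
    Per the text, Router Solicitations are sent exactly when this is True.
    """
    if accept_ra == 2:
        return True
    if accept_ra == 1:
        return not forwarding
    return False


if __name__ == "__main__":
    for accept_ra in (0, 1, 2):
        for forwarding in (False, True):
            print(accept_ra, forwarding, ra_accepted(accept_ra, forwarding))
```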
65f5c7c1 YH |
2224 | accept_ra_defrtr - BOOLEAN |
2225 | Learn default router in Router Advertisement. | |
2226 | ||
1cec2cac MCC |
2227 | Functional default: |
2228 | ||
2229 | - enabled if accept_ra is enabled. | |
2230 | - disabled if accept_ra is disabled. | |
65f5c7c1 | 2231 | |
6b2e04bc PC |
2232 | ra_defrtr_metric - UNSIGNED INTEGER |
2233 | Route metric for default route learned in Router Advertisement. This value | |
2234 | will be assigned as metric for the default route learned via IPv6 Router | |
2235 | Advertisement. Takes effect only if accept_ra_defrtr is enabled. | |
2236 | ||
2237 | Possible values: | |
2238 | 1 to 0xFFFFFFFF | |
2239 | ||
2240 | Default: IP6_RT_PRIO_USER i.e. 1024. | |
2241 | ||
d9333196 BG |
2242 | accept_ra_from_local - BOOLEAN |
2243 | Accept RA with source-address that is found on local machine | |
1cec2cac MCC |
2244 | if the RA is otherwise proper and able to be accepted. |
2245 | ||
2246 | Default is to NOT accept these as it may be an un-intended | |
2247 | network loop. | |
d9333196 BG |
2248 | |
2249 | Functional default: | |
1cec2cac MCC |
2250 | |
2251 | - enabled if accept_ra_from_local is enabled | |
2252 | on a specific interface. | |
2253 | - disabled if accept_ra_from_local is disabled | |
2254 | on a specific interface. | |
d9333196 | 2255 | |
8013d1d7 HL |
2256 | accept_ra_min_hop_limit - INTEGER |
2257 | Minimum hop limit Information in Router Advertisement. | |
2258 | ||
2259 | Hop limit Information in Router Advertisement less than this | |
2260 | variable shall be ignored. | |
2261 | ||
2262 | Default: 1 | |
2263 | ||
c4fd30eb | 2264 | accept_ra_pinfo - BOOLEAN |
2fe0ae78 | 2265 | Learn Prefix Information in Router Advertisement. |
c4fd30eb | 2266 | |
1cec2cac MCC |
2267 | Functional default: |
2268 | ||
2269 | - enabled if accept_ra is enabled. | |
2270 | - disabled if accept_ra is disabled. | |
c4fd30eb | 2271 | |
bbea124b JS |
2272 | accept_ra_rt_info_min_plen - INTEGER |
2273 | Minimum prefix length of Route Information in RA. | |
2274 | ||
2275 | Route Information w/ prefix smaller than this variable shall | |
2276 | be ignored. | |
2277 | ||
1cec2cac MCC |
2278 | Functional default: |
2279 | ||
2280 | * 0 if accept_ra_rtr_pref is enabled. | |
2281 | * -1 if accept_ra_rtr_pref is disabled. | |
bbea124b | 2282 | |
09c884d4 YH |
2283 | accept_ra_rt_info_max_plen - INTEGER |
2284 | Maximum prefix length of Route Information in RA. | |
2285 | ||
bbea124b JS |
2286 | Route Information w/ prefix larger than this variable shall |
2287 | be ignored. | |
09c884d4 | 2288 | |
1cec2cac MCC |
2289 | Functional default: |
2290 | ||
2291 | * 0 if accept_ra_rtr_pref is enabled. | |
2292 | * -1 if accept_ra_rtr_pref is disabled. | |
09c884d4 | 2293 | |
930d6ff2 YH |
2294 | accept_ra_rtr_pref - BOOLEAN |
2295 | Accept Router Preference in RA. | |
2296 | ||
1cec2cac MCC |
2297 | Functional default: |
2298 | ||
2299 | - enabled if accept_ra is enabled. | |
2300 | - disabled if accept_ra is disabled. | |
930d6ff2 | 2301 | |
c2943f14 HH |
2302 | accept_ra_mtu - BOOLEAN |
2303 | Apply the MTU value specified in RA option 5 (RFC4861). If | |
2304 | disabled, the MTU specified in the RA will be ignored. | |
2305 | ||
1cec2cac MCC |
2306 | Functional default: |
2307 | ||
2308 | - enabled if accept_ra is enabled. | |
2309 | - disabled if accept_ra is disabled. | |
c2943f14 | 2310 | |
1da177e4 LT |
2311 | accept_redirects - BOOLEAN |
2312 | Accept Redirects. | |
2313 | ||
1cec2cac MCC |
2314 | Functional default: |
2315 | ||
2316 | - enabled if local forwarding is disabled. | |
2317 | - disabled if local forwarding is enabled. | |
1da177e4 | 2318 | |
0bcbc926 YH |
2319 | accept_source_route - INTEGER |
2320 | Accept source routing (routing extension header). | |
2321 | ||
1cec2cac MCC |
2322 | - >= 0: Accept only routing header type 2. |
2323 | - < 0: Do not accept routing header. | |
0bcbc926 YH |
2324 | |
2325 | Default: 0 | |
2326 | ||
1da177e4 | 2327 | autoconf - BOOLEAN |
e18f5feb | 2328 | Autoconfigure addresses using Prefix Information in Router |
1da177e4 LT |
2329 | Advertisements. |
2330 | ||
1cec2cac MCC |
2331 | Functional default: |
2332 | ||
2333 | - enabled if accept_ra_pinfo is enabled. | |
2334 | - disabled if accept_ra_pinfo is disabled. | |
1da177e4 LT |
2335 | |
2336 | dad_transmits - INTEGER | |
2337 | The number of Duplicate Address Detection probes to send. | |
1cec2cac | 2338 | |
1da177e4 | 2339 | Default: 1 |
e18f5feb | 2340 | |
605b91c8 | 2341 | forwarding - INTEGER |
e18f5feb | 2342 | Configure interface-specific Host/Router behaviour. |
1da177e4 | 2343 | |
1cec2cac MCC |
2344 | .. note:: |
2345 | ||
2346 | It is recommended to have the same setting on all | |
2347 | interfaces; mixed router/host scenarios are rather uncommon. | |
1da177e4 | 2348 | |
ae8abfa0 | 2349 | Possible values are: |
ae8abfa0 | 2350 | |
1cec2cac MCC |
2351 | - 0 Forwarding disabled |
2352 | - 1 Forwarding enabled | |
2353 | ||
2354 | **FALSE (0)**: | |
1da177e4 LT |
2355 | |
2356 | By default, Host behaviour is assumed. This means: | |
2357 | ||
2358 | 1. IsRouter flag is not set in Neighbour Advertisements. | |
026359bc TA |
2359 | 2. If accept_ra is TRUE (default), transmit Router |
2360 | Solicitations. | |
e18f5feb | 2361 | 3. If accept_ra is TRUE (default), accept Router |
1da177e4 LT |
2362 | Advertisements (and do autoconfiguration). |
2363 | 4. If accept_redirects is TRUE (default), accept Redirects. | |
2364 | ||
1cec2cac | 2365 | **TRUE (1)**: |
1da177e4 | 2366 | |
e18f5feb | 2367 | If local forwarding is enabled, Router behaviour is assumed. |
1da177e4 LT |
2368 | This means exactly the reverse from the above: |
2369 | ||
2370 | 1. IsRouter flag is set in Neighbour Advertisements. | |
026359bc | 2371 | 2. Router Solicitations are not sent unless accept_ra is 2. |
ae8abfa0 | 2372 | 3. Router Advertisements are ignored unless accept_ra is 2. |
1da177e4 LT |
2373 | 4. Redirects are ignored. |
2374 | ||
ae8abfa0 | 2375 | Default: 0 (disabled) if global forwarding is disabled (default), |
1cec2cac | 2376 | otherwise 1 (enabled). |
1da177e4 LT |
2377 | |
2378 | hop_limit - INTEGER | |
2379 | Default Hop Limit to set. | |
1cec2cac | 2380 | |
1da177e4 LT |
2381 | Default: 64 |
2382 | ||
2383 | mtu - INTEGER | |
2384 | Default Maximum Transmission Unit | |
1cec2cac | 2385 | |
1da177e4 LT |
2386 | Default: 1280 (IPv6 required minimum) |
2387 | ||
35a256fe TH |
2388 | ip_nonlocal_bind - BOOLEAN |
2389 | If set, allows processes to bind() to non-local IPv6 addresses, | |
2390 | which can be quite useful - but may break some applications. | |
1cec2cac | 2391 | |
35a256fe TH |
2392 | Default: 0 |
2393 | ||
52e16356 YH |
2394 | router_probe_interval - INTEGER |
2395 | Minimum interval (in seconds) between Router Probing described | |
2396 | in RFC4191. | |
2397 | ||
2398 | Default: 60 | |
2399 | ||
1da177e4 LT |
2400 | router_solicitation_delay - INTEGER |
2401 | Number of seconds to wait after interface is brought up | |
2402 | before sending Router Solicitations. | |
1cec2cac | 2403 | |
1da177e4 LT |
2404 | Default: 1 |
2405 | ||
2406 | router_solicitation_interval - INTEGER | |
2407 | Number of seconds to wait between Router Solicitations. | |
1cec2cac | 2408 | |
1da177e4 LT |
2409 | Default: 4 |
2410 | ||
2411 | router_solicitations - INTEGER | |
e18f5feb | 2412 | Number of Router Solicitations to send until assuming no |
1da177e4 | 2413 | routers are present. |
1cec2cac | 2414 | |
1da177e4 LT |
2415 | Default: 3 |
2416 | ||
3985e8a3 EK |
2417 | use_oif_addrs_only - BOOLEAN |
2418 | When enabled, the candidate source addresses for destinations | |
2419 | routed via this interface are restricted to the set of addresses | |
2420 | configured on this interface (see RFC 6724, section 4). | |
2421 | ||
2422 | Default: false | |
2423 | ||
1da177e4 LT |
2424 | use_tempaddr - INTEGER |
2425 | Preference for Privacy Extensions (RFC3041). | |
1cec2cac MCC |
2426 | |
2427 | * <= 0 : disable Privacy Extensions | |
2428 | * == 1 : enable Privacy Extensions, but prefer public | |
2429 | addresses over temporary addresses. | |
2430 | * > 1 : enable Privacy Extensions and prefer temporary | |
2431 | addresses over public addresses. | |
2432 | ||
2433 | Default: | |
2434 | ||
2435 | * 0 (for most devices) | |
2436 | * -1 (for point-to-point devices and loopback devices) | |
1da177e4 LT |
2437 | |
2438 | temp_valid_lft - INTEGER | |
2439 | Valid lifetime (in seconds) for temporary addresses. | |
1cec2cac | 2440 | |
969c5464 | 2441 | Default: 172800 (2 days) |
1da177e4 LT |
2442 | |
2443 | temp_prefered_lft - INTEGER | |
2444 | Preferred lifetime (in seconds) for temporary addresses. | |
1cec2cac | 2445 | |
1da177e4 LT |
2446 | Default: 86400 (1 day) |
2447 | ||
f1705ec1 DA |
2448 | keep_addr_on_down - INTEGER |
2449 | Keep all IPv6 addresses on an interface down event. If set, static | |
2450 | global addresses with no expiration time are not flushed. | |
1cec2cac MCC |
2451 | |
2452 | * >0 : enabled | |
2453 | * 0 : system default | |
2454 | * <0 : disabled | |
f1705ec1 DA |
2455 | |
2456 | Default: 0 (addresses are removed) | |
2457 | ||
1da177e4 LT |
2458 | max_desync_factor - INTEGER |
2459 | Maximum value for DESYNC_FACTOR, which is a random value | |
e18f5feb | 2460 | that ensures that clients don't synchronize with each |
1da177e4 LT |
2461 | other and generate new addresses at exactly the same time. |
2462 | The value is in seconds. | |
1cec2cac | 2463 | |
1da177e4 | 2464 | Default: 600 |
e18f5feb | 2465 | |
1da177e4 LT |
2466 | regen_max_retry - INTEGER |
2467 | Number of attempts before giving up on generating | |
2468 | valid temporary addresses. | |
1cec2cac | 2469 | |
1da177e4 LT |
2470 | Default: 5 |
2471 | ||
2472 | max_addresses - INTEGER | |
e79dc484 BH |
2473 | Maximum number of autoconfigured addresses per interface. Setting |
2474 | to zero disables the limitation. It is not recommended to set this | |
2475 | value too large (or to zero) because it would be an easy way to | |
2476 | crash the kernel by allowing too many addresses to be created. | |
1cec2cac | 2477 | |
1da177e4 LT |
2478 | Default: 16 |
2479 | ||
778d80be | 2480 | disable_ipv6 - BOOLEAN |
9bdd8d40 BH |
2481 | Disable IPv6 operation. If accept_dad is set to 2, this value |
2482 | will be dynamically set to TRUE if DAD fails for the link-local | |
2483 | address. | |
1cec2cac | 2484 | |
778d80be YH |
2485 | Default: FALSE (enable IPv6 operation) |
2486 | ||
56d417b1 BH |
2487 | When this value is changed from 1 to 0 (IPv6 is being enabled), |
2488 | it will dynamically create a link-local address on the given | |
2489 | interface and start Duplicate Address Detection, if necessary. | |
2490 | ||
2491 | When this value is changed from 0 to 1 (IPv6 is being disabled), | |
2f0aaf7f LB |
2492 | it will dynamically delete all addresses and routes on the given |
2493 | interface. From now on it will not be possible to add addresses/routes | |
2494 | to the selected interface. | |
56d417b1 | 2495 | |
1b34be74 YH |
2496 | accept_dad - INTEGER |
2497 | Whether to accept DAD (Duplicate Address Detection). | |
1cec2cac MCC |
2498 | |
2499 | == ============================================================== | |
2500 | 0 Disable DAD | |
2501 | 1 Enable DAD (default) | |
2502 | 2 Enable DAD, and disable IPv6 operation if MAC-based duplicate | |
2503 | link-local address has been found. | |
2504 | == ============================================================== | |
1b34be74 | 2505 | |
35e015e1 MC |
2506 | DAD operation and mode on a given interface will be selected according |
2507 | to the maximum value of conf/{all,interface}/accept_dad. | |
2508 | ||
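The rule that the effective DAD mode is the maximum of conf/all/accept_dad and the per-interface value can be illustrated with a short Python sketch. The names `effective_accept_dad` and `DAD_MODES` are hypothetical; the mode descriptions paraphrase the table above.

```python
# Mode descriptions paraphrased from the accept_dad table.
DAD_MODES = {
    0: "DAD disabled",
    1: "DAD enabled",
    2: "DAD enabled; IPv6 disabled on MAC-based link-local duplicate",
}


def effective_accept_dad(conf_all, conf_interface):
    """Return the accept_dad mode in effect for an interface.

    Hypothetical helper: the documented rule is simply the maximum of
    the conf/all and conf/<interface> values.
    """
    return max(conf_all, conf_interface)


if __name__ == "__main__":
    mode = effective_accept_dad(conf_all=0, conf_interface=2)
    print(mode, DAD_MODES[mode])
```

One consequence of taking the maximum is that setting the per-interface value to 0 does not disable DAD while conf/all/accept_dad is still 1 or 2.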
f7734fdf OP |
2509 | force_tllao - BOOLEAN |
2510 | Enable sending the target link-layer address option even when | |
2511 | responding to a unicast neighbor solicitation. | |
1cec2cac | 2512 | |
f7734fdf OP |
2513 | Default: FALSE |
2514 | ||
2515 | Quoting from RFC 2461, section 4.4, Target link-layer address: | |
2516 | ||
2517 | "The option MUST be included for multicast solicitations in order to | |
2518 | avoid infinite Neighbor Solicitation "recursion" when the peer node | |
2519 | does not have a cache entry to return a Neighbor Advertisements | |
2520 | message. When responding to unicast solicitations, the option can be | |
2521 | omitted since the sender of the solicitation has the correct link- | |
2522 | layer address; otherwise it would not have be able to send the unicast | |
2523 | solicitation in the first place. However, including the link-layer | |
2524 | address in this case adds little overhead and eliminates a potential | |
2525 | race condition where the sender deletes the cached link-layer address | |
2526 | prior to receiving a response to a previous solicitation." | |
2527 | ||
db2b620a HFS |
2528 | ndisc_notify - BOOLEAN |
2529 | Define mode for notification of address and device changes. | |
1cec2cac MCC |
2530 | |
2531 | * 0 - (default): do nothing | |
2532 | * 1 - Generate unsolicited neighbour advertisements when device is brought | |
2533 | up or hardware address changes. | |
db2b620a | 2534 | |
2210d6b2 MŻ |
2535 | ndisc_tclass - INTEGER |
2536 | The IPv6 Traffic Class to use by default when sending IPv6 Neighbor | |
2537 | Discovery (Router Solicitation, Router Advertisement, Neighbor | |
2538 | Solicitation, Neighbor Advertisement, Redirect) messages. | |
2539 | These 8 bits can be interpreted as 6 high order bits holding the DSCP | |
2540 | value and 2 low order bits representing ECN (which you probably want | |
2541 | to leave cleared). | |
1cec2cac MCC |
2542 | |
2543 | * 0 - (default) | |
2210d6b2 | 2544 | |
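Since ndisc_tclass packs a 6-bit DSCP value and a 2-bit ECN field into one byte, the value to write can be computed as in this minimal Python sketch. The function name `traffic_class` is hypothetical, and DSCP 48 is used only as an example input.

```python
def traffic_class(dscp, ecn=0):
    """Combine a DSCP value (6 high-order bits) and ECN (2 low-order bits).

    Hypothetical helper for computing an ndisc_tclass value; ECN defaults
    to 0, matching the advice to leave those bits cleared.
    """
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 0x3:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    # DSCP 48 with ECN cleared.
    print(traffic_class(48))
```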
18ac597a JP |
2545 | ndisc_evict_nocarrier - BOOLEAN |
2546 | Clears the neighbor discovery table on NOCARRIER events. This option is | |
2547 | important for wireless devices where the neighbor discovery cache should | |
2548 | not be cleared when roaming between access points on the same network. | |
2549 | In most cases this should remain as the default (1). | |
2550 | ||
2551 | - 1 - (default): Clear neighbor discovery cache on NOCARRIER events. | |
2552 | - 0 - Do not clear neighbor discovery cache on NOCARRIER events. | |
2553 | ||
fc4eba58 HFS |
2554 | mldv1_unsolicited_report_interval - INTEGER |
2555 | The interval in milliseconds in which the next unsolicited | |
2556 | MLDv1 report retransmit will take place. | |
1cec2cac | 2557 | |
fc4eba58 HFS |
2558 | Default: 10000 (10 seconds) |
2559 | ||
2560 | mldv2_unsolicited_report_interval - INTEGER | |
2561 | The interval in milliseconds in which the next unsolicited | |
2562 | MLDv2 report retransmit will take place. | |
1cec2cac | 2563 | |
fc4eba58 HFS |
2564 | Default: 1000 (1 second) |
2565 | ||
f2127810 | 2566 | force_mld_version - INTEGER |
1cec2cac MCC |
2567 | * 0 - (default) No enforcement of an MLD version, MLDv1 fallback allowed |
2568 | * 1 - Enforce the use of MLD version 1 | |
2569 | * 2 - Enforce the use of MLD version 2 | |
f2127810 | 2570 | |
b800c3b9 HFS |
2571 | suppress_frag_ndisc - INTEGER |
2572 | Control RFC 6980 (Security Implications of IPv6 Fragmentation | |
2573 | with IPv6 Neighbor Discovery) behavior: | |
1cec2cac MCC |
2574 | |
2575 | * 1 - (default) discard fragmented neighbor discovery packets | |
2576 | * 0 - allow fragmented neighbor discovery packets | |
b800c3b9 | 2577 | |
7fd2561e EK |
2578 | optimistic_dad - BOOLEAN |
2579 | Whether to perform Optimistic Duplicate Address Detection (RFC 4429). | |
1cec2cac MCC |
2580 | |
2581 | * 0: disabled (default) | |
2582 | * 1: enabled | |
35e015e1 MC |
2583 | |
2584 | Optimistic Duplicate Address Detection for the interface will be enabled | |
2585 | if at least one of conf/{all,interface}/optimistic_dad is set to 1, | |
2586 | it will be disabled otherwise. | |
7fd2561e EK |
2587 | |
2588 | use_optimistic - BOOLEAN | |
2589 | If enabled, do not classify optimistic addresses as deprecated during | |
2590 | source address selection. Preferred addresses will still be chosen | |
2591 | before optimistic addresses, subject to other ranking in the source | |
2592 | address selection algorithm. | |
1cec2cac MCC |
2593 | |
2594 | * 0: disabled (default) | |
2595 | * 1: enabled | |
35e015e1 MC |
2596 | |
2597 | This will be enabled if at least one of | |
2598 | conf/{all,interface}/use_optimistic is set to 1, disabled otherwise. | |
7fd2561e | 2599 | |
9f0761c1 HFS |
2600 | stable_secret - IPv6 address |
2601 | This IPv6 address will be used as a secret to generate IPv6 | |
2602 | addresses for link-local addresses and autoconfigured | |
2603 | ones. All addresses generated after setting this secret will | |
2604 | be stable privacy ones by default. This can be changed via the | |
2605 | addrgenmode ip-link. conf/default/stable_secret is used as the | |
2606 | secret for the namespace, the interface specific ones can | |
2607 | overwrite that. Writes to conf/all/stable_secret are refused. | |
2608 | ||
2609 | It is recommended to generate this secret during installation | |
2610 | of a system and keep it stable after that. | |
2611 | ||
2612 | By default the stable secret is unset. | |
2613 | ||
f168db5e SD |
2614 | addr_gen_mode - INTEGER |
2615 | Defines how link-local and autoconf addresses are generated. | |
2616 | ||
1cec2cac MCC |
2617 | = ================================================================= |
2618 | 0 generate address based on EUI64 (default) | |
2619 | 1 do not generate a link-local address, use EUI64 for addresses | |
2620 | generated from autoconf | |
2621 | 2 generate stable privacy addresses, using the secret from | |
f168db5e | 2622 | stable_secret (RFC7217) |
1cec2cac MCC |
2623 | 3 generate stable privacy addresses, using a random secret if unset |
2624 | = ================================================================= | |
f168db5e | 2625 | |
abbc3043 JB |
2626 | drop_unicast_in_l2_multicast - BOOLEAN |
2627 | Drop any unicast IPv6 packets that are received in link-layer | |
2628 | multicast (or broadcast) frames. | |
2629 | ||
2630 | By default this is turned off. | |
2631 | ||
7a02bf89 JB |
2632 | drop_unsolicited_na - BOOLEAN |
2633 | Drop all unsolicited neighbor advertisements, for example if there's | |
2634 | a known good NA proxy on the network and such frames need not be used | |
2635 | (or in the case of 802.11, must not be used to prevent attacks.) | |
2636 | ||
2637 | By default this is turned off. | |
2638 | ||
aaa5f515 JP |
2639 | accept_untracked_na - INTEGER |
2640 | Define behavior for accepting neighbor advertisements from devices that | |
2641 | are absent in the neighbor cache: | |
2642 | ||
2643 | - 0 - (default) Do not accept unsolicited and untracked neighbor | |
2644 | advertisements. | |
2645 | ||
2646 | - 1 - Add a new neighbor cache entry in STALE state for routers on | |
2647 | receiving a neighbor advertisement (either solicited or unsolicited) | |
2648 | with target link-layer address option specified if no neighbor entry | |
2649 | is already present for the advertised IPv6 address. Without this knob, | |
2650 | NAs received for untracked addresses (absent in neighbor cache) are | |
2651 | silently ignored. | |
2652 | ||
2653 | This is as per router-side behavior documented in RFC9131. | |
2654 | ||
2655 | This has lower precedence than drop_unsolicited_na. | |
2656 | ||
2657 | This will optimize the return path for the initial off-link | |
2658 | communication that is initiated by a directly connected host, by | |
2659 | ensuring that the first-hop router which turns on this setting doesn't | |
2660 | have to buffer the initial return packets to do neighbor-solicitation. | |
2661 | The prerequisite is that the host is configured to send unsolicited | |
2662 | neighbor advertisements on interface bringup. This setting should be | |
2663 | used in conjunction with the ndisc_notify setting on the host to | |
2664 | satisfy this prerequisite. | |
2665 | ||
2666 | - 2 - Extend option (1) to add a new neighbor cache entry only if the | |
2667 | source IP address is in the same subnet as an address configured on | |
2668 | the interface that received the neighbor advertisement. | |
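The three modes above can be read as a small decision function. The following is a minimal sketch of that reading, not the kernel's implementation: the function name and every parameter are hypothetical, "for routers" is assumed to mean that forwarding is enabled on the receiving node, and the precedence of drop_unsolicited_na is modelled as an early return.

```python
import ipaddress


def should_add_stale_entry(mode, forwarding, has_tlla_option, entry_exists,
                           src_addr, iface_prefixes,
                           drop_unsolicited_na=False, unsolicited=False):
    """Return True if an NA for an untracked address would create a STALE entry.

    All names are illustrative; this only mirrors the prose description.
    """
    # drop_unsolicited_na has higher precedence than accept_untracked_na.
    if drop_unsolicited_na and unsolicited:
        return False
    # Mode 0, non-routers (assumption: forwarding off), NAs without the
    # target link-layer address option, and already tracked addresses
    # never create a new entry.
    if mode == 0 or not forwarding or not has_tlla_option or entry_exists:
        return False
    if mode == 1:
        return True
    if mode == 2:
        # Only accept when the source is on-link for a configured prefix.
        src = ipaddress.ip_address(src_addr)
        return any(src in ipaddress.ip_network(prefix, strict=False)
                   for prefix in iface_prefixes)
    return False
```

For example, with the interface configured as 2001:db8::1/64, mode 2 accepts an NA sourced from 2001:db8::5 but ignores one sourced from 2001:db8:1::5.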
f9a2fb73 | 2669 | |
adc176c5 EN |
2670 | enhanced_dad - BOOLEAN |
2671 | Include a nonce option in the IPv6 neighbor solicitation messages used for | |
2672 | duplicate address detection per RFC7527. A received DAD NS will only signal | |
2673 | a duplicate address if the nonce is different. This avoids any false | |
2674 | detection of duplicates due to loopback of the NS messages that we send. | |
2675 | The nonce option will be sent on an interface unless both of | |
2676 | conf/{all,interface}/enhanced_dad are set to FALSE. | |
1cec2cac | 2677 | |
adc176c5 EN |
2678 | Default: TRUE |
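The two rules in this entry (when the nonce is sent, and when a received DAD NS counts as a duplicate) can be sketched as follows. The function names are hypothetical, and treating an NS without a nonce as "different" is an assumption of this sketch rather than something stated above.

```python
def nonce_sent(all_enhanced_dad, iface_enhanced_dad):
    """The nonce is sent unless both conf/all and conf/<interface> are FALSE."""
    return bool(all_enhanced_dad) or bool(iface_enhanced_dad)


def signals_duplicate(sent_nonce, received_nonce):
    """A received DAD NS signals a duplicate only if its nonce differs.

    A matching nonce means we are seeing our own NS looped back.
    Assumption: a missing nonce (None) is treated as different.
    """
    return received_nonce != sent_nonce
```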
2679 | ||
1cec2cac MCC |
2680 | ``icmp/*``: |
2681 | =========== | |
2682 | ||
1da177e4 | 2683 | ratelimit - INTEGER |
0bc19985 | 2684 | Limit the maximum rate for sending ICMPv6 messages. |
1cec2cac | 2685 | |
6dbf4bca SH |
2686 | 0 to disable any limiting, |
2687 | otherwise the minimum interval between responses in milliseconds. | |
1cec2cac | 2688 | |
6dbf4bca | 2689 | Default: 1000 |
1da177e4 | 2690 | |
0bc19985 SS |
2691 | ratemask - list of comma separated ranges |
2692 | For ICMPv6 message types matching the ranges in the ratemask, limit | |
2693 | the sending of the message according to ratelimit parameter. | |
2694 | ||
2695 | The format used for both input and output is a comma separated | |
2696 | list of ranges (e.g. "0-127,129" for ICMPv6 message type 0 to 127 and | |
2697 | 129). Writing to the file will clear all previous ranges of ICMPv6 | |
2698 | message types and update the current list with the input. | |
2699 | ||
2700 | Refer to: https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml | |
2701 | for numerical values of ICMPv6 message types, e.g. echo request is 128 | |
2702 | and echo reply is 129. | |
2703 | ||
2704 | Default: 0-1,3-127 (rate limit ICMPv6 errors except Packet Too Big) | |
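The range-list format can be illustrated with a small parser. This is a sketch for reasoning about a ratemask value in userspace, not the kernel's parser; the function names and the 0-255 bound check are assumptions of the sketch.

```python
def parse_ratemask(text):
    """Expand a ratemask such as "0-1,3-127" into a set of ICMPv6 types."""
    types = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        low, sep, high = part.partition("-")
        start = int(low)
        end = int(high) if sep else start
        # ICMPv6 type is an 8-bit field; reject anything outside 0-255.
        if not 0 <= start <= end <= 255:
            raise ValueError("invalid range: %r" % part)
        types.update(range(start, end + 1))
    return types


def is_rate_limited(icmpv6_type, ratemask="0-1,3-127"):
    """True if messages of this type are subject to the ratelimit setting."""
    return icmpv6_type in parse_ratemask(ratemask)
```

With the default mask, type 2 (Packet Too Big) and the informational types from 128 up (echo request, echo reply) are not rate limited, while the other error types are.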
2705 | ||
e6f86b0f VJ |
2706 | echo_ignore_all - BOOLEAN |
2707 | If set non-zero, then the kernel will ignore all ICMP ECHO | |
2708 | requests sent to it over the IPv6 protocol. | |
1cec2cac | 2709 | |
e6f86b0f VJ |
2710 | Default: 0 |
2711 | ||
03f1eccc SS |
2712 | echo_ignore_multicast - BOOLEAN |
2713 | If set non-zero, then the kernel will ignore all ICMP ECHO | |
2714 | requests sent to it over the IPv6 protocol via multicast. | |
1cec2cac | 2715 | |
03f1eccc SS |
2716 | Default: 0 |
2717 | ||
0b03a5ca SS |
2718 | echo_ignore_anycast - BOOLEAN |
2719 | If set non-zero, then the kernel will ignore all ICMP ECHO | |
2720 | requests sent to it over the IPv6 protocol destined to an anycast address. | |
1cec2cac | 2721 | |
0b03a5ca SS |
2722 | Default: 0 |
2723 | ||
7ab75456 MB |
2724 | error_anycast_as_unicast - BOOLEAN |
2725 | If set to 1, then the kernel will respond with ICMP Errors | |
2726 | resulting from requests sent to it over the IPv6 protocol destined | |
2727 | to an anycast address, essentially treating anycast as unicast. | |
2728 | ||
2729 | Default: 0 | |
2730 | ||
e69948a0 | 2731 | xfrm6_gc_thresh - INTEGER |
837f7411 | 2732 | (Obsolete since linux-4.14) |
e69948a0 AD |
2733 | The threshold at which we will start garbage collecting for IPv6 |
2734 | destination cache entries. At twice this value the system will | |
3c2a89dd | 2735 | refuse new allocations. |
e69948a0 | 2736 | |
1da177e4 LT |
2737 | |
2738 | IPv6 Update by: | |
2739 | Pekka Savola <pekkas@netcore.fi> | |
2740 | YOSHIFUJI Hideaki / USAGI Project <yoshfuji@linux-ipv6.org> | |
2741 | ||
2742 | ||
2743 | /proc/sys/net/bridge/* Variables: | |
1cec2cac | 2744 | ================================= |
1da177e4 LT |
2745 | |
2746 | bridge-nf-call-arptables - BOOLEAN | |
1cec2cac MCC |
2747 | - 1 : pass bridged ARP traffic to arptables' FORWARD chain. |
2748 | - 0 : disable this. | |
2749 | ||
1da177e4 LT |
2750 | Default: 1 |
2751 | ||
2752 | bridge-nf-call-iptables - BOOLEAN | |
1cec2cac MCC |
2753 | - 1 : pass bridged IPv4 traffic to iptables' chains. |
2754 | - 0 : disable this. | |
2755 | ||
1da177e4 LT |
2756 | Default: 1 |
2757 | ||
2758 | bridge-nf-call-ip6tables - BOOLEAN | |
1cec2cac MCC |
2759 | - 1 : pass bridged IPv6 traffic to ip6tables' chains. |
2760 | - 0 : disable this. | |
2761 | ||
1da177e4 LT |
2762 | Default: 1 |
2763 | ||
2764 | bridge-nf-filter-vlan-tagged - BOOLEAN | |
1cec2cac MCC |
2765 | - 1 : pass bridged vlan-tagged ARP/IP/IPv6 traffic to {arp,ip,ip6}tables. |
2766 | - 0 : disable this. | |
2767 | ||
4981682c | 2768 | Default: 0 |
516299d2 MM |
2769 | |
2770 | bridge-nf-filter-pppoe-tagged - BOOLEAN | |
1cec2cac MCC |
2771 | - 1 : pass bridged pppoe-tagged IP/IPv6 traffic to {ip,ip6}tables. |
2772 | - 0 : disable this. | |
2773 | ||
4981682c | 2774 | Default: 0 |
1da177e4 | 2775 | |
4981682c | 2776 | bridge-nf-pass-vlan-input-dev - BOOLEAN |
1cec2cac MCC |
2777 | - 1: if bridge-nf-filter-vlan-tagged is enabled, try to find a vlan |
2778 | interface on the bridge and set the netfilter input device to the | |
2779 | vlan. This allows use of e.g. "iptables -i br0.1" and makes the | |
2780 | REDIRECT target work with vlan-on-top-of-bridge interfaces. When no | |
2781 | matching vlan interface is found, or this switch is off, the input | |
2782 | device is set to the bridge interface. | |
2783 | ||
2784 | - 0: disable bridge netfilter vlan interface lookup. | |
2785 | ||
4981682c | 2786 | Default: 0 |
1da177e4 | 2787 | |
1cec2cac MCC |
2788 | ``/proc/sys/net/sctp/*`` Variables: |
2789 | =================================== | |
32e8d494 VY |
2790 | |
2791 | addip_enable - BOOLEAN | |
2792 | Enable or disable extension of Dynamic Address Reconfiguration | |
2793 | (ADD-IP) functionality specified in RFC5061. This extension provides | |
2794 | the ability to dynamically add and remove new addresses for the SCTP | |
2795 | associations. | |
2796 | ||
2797 | 1: Enable extension. | |
2798 | ||
2799 | 0: Disable extension. | |
2800 | ||
2801 | Default: 0 | |
2802 | ||
566178f8 ZY |
2803 | pf_enable - INTEGER |
2804 | Enable or disable pf (pf is short for potentially failed) state. A value | |
2805 | of pf_retrans > path_max_retrans also disables pf state. That is, either | |
2806 | pf_enable = 0 or pf_retrans > path_max_retrans disables pf state. | |
2807 | Because pf_retrans and path_max_retrans can be changed by a userspace | |
2808 | application, a user who relies on pf_retrans > path_max_retrans to | |
2809 | disable pf state may find it enabled again after an application | |
2810 | changes pf_retrans or path_max_retrans. pf_enable therefore provides | |
2811 | a separate switch to enable and disable pf state dynamically. | |
2812 | See: | |
2813 | https://datatracker.ietf.org/doc/draft-ietf-tsvwg-sctp-failover for | |
2814 | details. | |
2815 | ||
2816 | 1: Enable pf. | |
2817 | ||
2818 | 0: Disable pf. | |
2819 | ||
2820 | Default: 1 | |
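The interaction between pf_enable, pf_retrans and path_max_retrans reduces to one predicate. A minimal sketch, with a hypothetical function name:

```python
def pf_state_active(pf_enable, pf_retrans, path_max_retrans):
    """pf state is used only if pf_enable is set AND pf_retrans does not
    exceed path_max_retrans; either condition alone disables it."""
    return bool(pf_enable) and pf_retrans <= path_max_retrans
```

With the documented defaults (pf_enable = 1, pf_retrans = 0, path_max_retrans = 5) the predicate is true; setting pf_enable to 0 turns it off regardless of what an application later does to the two retransmission limits.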
2821 | ||
aef587be XL |
2822 | pf_expose - INTEGER |
2823 | Unset or enable/disable pf (pf is short for potentially failed) state | |
2824 | exposure. Applications can control the exposure of the PF path state | |
2825 | in the SCTP_PEER_ADDR_CHANGE event and the SCTP_GET_PEER_ADDR_INFO | |
2826 | sockopt. When it's unset, no SCTP_PEER_ADDR_CHANGE event with | |
2827 | SCTP_ADDR_PF state will be sent, but SCTP_PF-state transport info | |
2828 | can be obtained via SCTP_GET_PEER_ADDR_INFO sockopt; When it's enabled, | |
2829 | a SCTP_PEER_ADDR_CHANGE event will be sent for a transport becoming | |
2830 | SCTP_PF state and SCTP_PF-state transport info can be obtained via | |
a266ef69 | 2831 | SCTP_GET_PEER_ADDR_INFO sockopt; When it's disabled, no |
aef587be XL |
2832 | SCTP_PEER_ADDR_CHANGE event will be sent and it returns -EACCES when |
2833 | trying to get a SCTP_PF-state transport info via SCTP_GET_PEER_ADDR_INFO | |
2834 | sockopt. | |
2835 | ||
2836 | 0: Unset pf state exposure (compatible with old applications). | |
2837 | ||
2838 | 1: Disable pf state exposure. | |
2839 | ||
2840 | 2: Enable pf state exposure. | |
2841 | ||
2842 | Default: 0 | |
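The three pf_expose values differ in two observable ways: whether a SCTP_PEER_ADDR_CHANGE event is sent for the PF state, and what SCTP_GET_PEER_ADDR_INFO returns for a PF-state transport. The lookup table below is only a summary of the description above; the type and field names are hypothetical.

```python
import errno
from typing import NamedTuple


class PfExposure(NamedTuple):
    event_sent: bool        # SCTP_PEER_ADDR_CHANGE sent for the PF state?
    getsockopt_errno: int   # 0 means the PF-state info is returned


PF_EXPOSE = {
    0: PfExposure(event_sent=False, getsockopt_errno=0),             # unset
    1: PfExposure(event_sent=False, getsockopt_errno=errno.EACCES),  # disabled
    2: PfExposure(event_sent=True, getsockopt_errno=0),              # enabled
}
```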
2843 | ||
32e8d494 VY |
2844 | addip_noauth_enable - BOOLEAN |
2845 | Dynamic Address Reconfiguration (ADD-IP) requires the use of | |
2846 | authentication to protect the operations of adding or removing new | |
2847 | addresses. This requirement is mandated so that unauthorized hosts | |
2848 | would not be able to hijack associations. However, older | |
2849 | implementations may not have implemented this requirement while | |
2850 | allowing the ADD-IP extension. For reasons of interoperability, | |
2851 | we provide this variable to control the enforcement of the | |
2852 | authentication requirement. | |
2853 | ||
1cec2cac MCC |
2854 | == =============================================================== |
2855 | 1 Allow ADD-IP extension to be used without authentication. This | |
32e8d494 VY |
2856 | should only be set in a closed environment for interoperability |
2857 | with older implementations. | |
2858 | ||
1cec2cac MCC |
2859 | 0 Enforce the authentication requirement |
2860 | == =============================================================== | |
32e8d494 VY |
2861 | |
2862 | Default: 0 | |
2863 | ||
2864 | auth_enable - BOOLEAN | |
2865 | Enable or disable Authenticated Chunks extension. This extension | |
2866 | provides the ability to send and receive authenticated chunks and is | |
2867 | required for secure operation of Dynamic Address Reconfiguration | |
2868 | (ADD-IP) extension. | |
2869 | ||
1cec2cac MCC |
2870 | - 1: Enable this extension. |
2871 | - 0: Disable this extension. | |
32e8d494 VY |
2872 | |
2873 | Default: 0 | |
2874 | ||
2875 | prsctp_enable - BOOLEAN | |
2876 | Enable or disable the Partial Reliability extension (RFC3758) which | |
2877 | is used to notify peers that a given DATA should no longer be expected. | |
2878 | ||
1cec2cac MCC |
2879 | - 1: Enable extension |
2880 | - 0: Disable | |
32e8d494 VY |
2881 | |
2882 | Default: 1 | |
2883 | ||
2884 | max_burst - INTEGER | |
2885 | The limit of the number of new packets that can be initially sent. It | |
2886 | controls how bursty the generated traffic can be. | |
2887 | ||
2888 | Default: 4 | |
2889 | ||
2890 | association_max_retrans - INTEGER | |
2891 | Set the maximum number of retransmissions that an association can | |
2892 | attempt before deciding that the remote end is unreachable. If this | |
2893 | value is exceeded, the association is terminated. | |
2894 | ||
2895 | Default: 10 | |
2896 | ||
2897 | max_init_retransmits - INTEGER | |
2898 | The maximum number of retransmissions of INIT and COOKIE-ECHO chunks | |
2899 | that an association will attempt before declaring the destination | |
2900 | unreachable and terminating. | |
2901 | ||
2902 | Default: 8 | |
2903 | ||
2904 | path_max_retrans - INTEGER | |
2905 | The maximum number of retransmissions that will be attempted on a given | |
2906 | path. Once this threshold is exceeded, the path is considered | |
2907 | unreachable, and new traffic will use a different path when the | |
2908 | association is multihomed. | |
2909 | ||
2910 | Default: 5 | |
2911 | ||
5aa93bcf NH |
2912 | pf_retrans - INTEGER |
2913 | The number of retransmissions that will be attempted on a given path | |
2914 | before traffic is redirected to an alternate transport (should one | |
2915 | exist). Note this is distinct from path_max_retrans, as a path that | |
2916 | passes the pf_retrans threshold can still be used. It is only | |
2917 | deprioritized when a transmission path is selected by the stack. This | |
2918 | setting is primarily used to enable fast failover mechanisms without | |
2919 | having to reduce path_max_retrans to a very low value. See: | |
2920 | http://www.ietf.org/id/draft-nishida-tsvwg-sctp-failover-05.txt | |
2921 | for details. Note also that a value of pf_retrans > path_max_retrans | |
566178f8 ZY |
2922 | disables this feature. Since both pf_retrans and path_max_retrans can |
2923 | be changed by userspace application, a variable pf_enable is used to | |
2924 | disable pf state. | |
5aa93bcf NH |
2925 | |
2926 | Default: 0 | |
2927 | ||
34515e94 XL |
2928 | ps_retrans - INTEGER |
2929 | Primary.Switchover.Max.Retrans (PSMR) is a tunable parameter from | |
2930 | Section 5 "Primary Path Switchover" of RFC7829. The primary path | |
2931 | will be changed to another active path when the path error counter on | |
2932 | the old primary path exceeds PSMR, so that "the SCTP sender is allowed | |
2933 | to continue data transmission on a new working path even when the old | |
2934 | primary destination address becomes active again". Note this feature | |
2935 | is disabled by initializing 'ps_retrans' per netns as 0xffff by default, | |
2936 | and its value can't be less than 'pf_retrans' when changing by sysctl. | |
2937 | ||
2938 | Default: 0xffff | |
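The two constraints stated for ps_retrans (it may not be set below pf_retrans, and 0xffff means the feature is off) can be sketched as a check. The function and constant names are hypothetical, and no upper bound is enforced because none is given above.

```python
PS_RETRANS_DISABLED = 0xFFFF  # documented default; switchover is disabled


def primary_switchover_enabled(ps_retrans, pf_retrans):
    """Validate a new ps_retrans value and report whether the feature is on."""
    if ps_retrans < pf_retrans:
        raise ValueError("ps_retrans must not be less than pf_retrans")
    return ps_retrans != PS_RETRANS_DISABLED
```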
2939 | ||
32e8d494 VY |
2940 | rto_initial - INTEGER |
2941 | The initial round trip timeout value in milliseconds that will be used | |
2942 | in calculating round trip times. This is the initial time interval | |
2943 | for retransmissions. | |
2944 | ||
2945 | Default: 3000 | |
1da177e4 | 2946 | |
32e8d494 VY |
2947 | rto_max - INTEGER |
2948 | The maximum value (in milliseconds) of the round trip timeout. This | |
2949 | is the largest time interval that can elapse between retransmissions. | |
2950 | ||
2951 | Default: 60000 | |
2952 | ||
2953 | rto_min - INTEGER | |
2954 | The minimum value (in milliseconds) of the round trip timeout. This | |
2955 | is the smallest time interval that can elapse between retransmissions. | |
2956 | ||
2957 | Default: 1000 | |
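How rto_initial, rto_min and rto_max bound the retransmission timer can be sketched as below. The doubling on each timeout and the clamping of a computed RTO come from the SCTP specification (RFC 4960, section 6.3), not from this document, and the function names are hypothetical.

```python
def clamp_rto(computed_ms, rto_min=1000, rto_max=60000):
    """Bound a freshly computed RTO to the [rto_min, rto_max] window."""
    return min(max(computed_ms, rto_min), rto_max)


def backoff_schedule(rto_initial=3000, rto_max=60000, timeouts=6):
    """Timer values used for successive timeouts before any RTT is measured.

    Assumption: the timer doubles on every expiry and is capped at rto_max.
    """
    rto = min(rto_initial, rto_max)
    schedule = []
    for _ in range(timeouts):
        schedule.append(rto)
        rto = min(rto * 2, rto_max)
    return schedule
```

With the defaults, six consecutive timeouts use 3000, 6000, 12000, 24000, 48000 and 60000 milliseconds.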
2958 | ||
2959 | hb_interval - INTEGER | |
2960 | The interval (in milliseconds) between HEARTBEAT chunks. These chunks | |
2961 | are sent at the specified interval on idle paths to probe the state of | |
2962 | a given path between 2 associations. | |
2963 | ||
2964 | Default: 30000 | |
2965 | ||
2966 | sack_timeout - INTEGER | |
2967 | The amount of time (in milliseconds) that the implementation will wait | |
2968 | to send a SACK. | |
2969 | ||
2970 | Default: 200 | |
2971 | ||
2972 | valid_cookie_life - INTEGER | |
2973 | The default lifetime of the SCTP cookie (in milliseconds). The cookie | |
2974 | is used during association establishment. | |
2975 | ||
2976 | Default: 60000 | |
2977 | ||
2978 | cookie_preserve_enable - BOOLEAN | |
2979 | Enable or disable the ability to extend the lifetime of the SCTP cookie | |
2980 | that is used during the establishment phase of an SCTP association. | |
2981 | ||
1cec2cac MCC |
2982 | - 1: Enable cookie lifetime extension. |
2983 | - 0: Disable | |
32e8d494 VY |
2984 | |
2985 | Default: 1 | |
2986 | ||
3c68198e NH |
2987 | cookie_hmac_alg - STRING |
2988 | Select the hmac algorithm used when generating the cookie value sent by | |
2989 | a listening sctp socket to a connecting client in the INIT-ACK chunk. | |
2990 | Valid values are: | |
1cec2cac | 2991 | |
3c68198e NH |
2992 | * md5 |
2993 | * sha1 | |
2994 | * none | |
1cec2cac | 2995 | |
3c68198e | 2996 | Ability to assign md5 or sha1 as the selected alg is predicated on the |
3b09adcb | 2997 | configuration of those algorithms at build time (CONFIG_CRYPTO_MD5 and |
3c68198e NH |
2998 | CONFIG_CRYPTO_SHA1). |
2999 | ||
3000 | Default: Dependent on configuration. MD5 if available, else SHA1 if | |
3001 | available, else none. | |
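The default selection order can be written out directly. A minimal sketch with a hypothetical function name; the two flags stand for whether CONFIG_CRYPTO_MD5 and CONFIG_CRYPTO_SHA1 were enabled at build time.

```python
def default_cookie_hmac_alg(md5_available, sha1_available):
    """Pick the default in the documented order: md5, then sha1, then none."""
    if md5_available:
        return "md5"
    if sha1_available:
        return "sha1"
    return "none"
```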
3002 | ||
32e8d494 VY |
3003 | rcvbuf_policy - INTEGER |
3004 | Determines if the receive buffer is attributed to the socket or to | |
3005 | the association. SCTP supports the capability to create multiple | |
3006 | associations on a single socket. When using this capability, it is | |
3007 | possible that a single stalled association that's buffering a lot | |
3008 | of data may block other associations from delivering their data by | |
3009 | consuming all of the receive buffer space. To work around this, | |
3010 | the rcvbuf_policy could be set to attribute the receive buffer space | |
3011 | to each association instead of the socket. This prevents the described | |
3012 | blocking. | |
3013 | ||
1cec2cac MCC |
3014 | - 1: rcvbuf space is per association |
3015 | - 0: rcvbuf space is per socket | |
32e8d494 VY |
3016 | |
3017 | Default: 0 | |
3018 | ||
3019 | sndbuf_policy - INTEGER | |
3020 | Similar to rcvbuf_policy above, this applies to send buffer space. | |
3021 | ||
1cec2cac MCC |
3022 | - 1: Send buffer is tracked per association |
3023 | - 0: Send buffer is tracked per socket. | |
32e8d494 VY |
3024 | |
3025 | Default: 0 | |
3026 | ||
3027 | sctp_mem - vector of 3 INTEGERs: min, pressure, max | |
3028 | Number of pages allowed for queueing by all SCTP sockets. | |
3029 | ||
3030 | min: Below this number of pages SCTP is not bothered about its | |
3031 | memory appetite. When the amount of memory allocated by SCTP exceeds | |
3032 | this number, SCTP starts to moderate memory usage. | |
3033 | ||
3034 | pressure: This value was introduced to follow format of tcp_mem. | |
3035 | ||
3036 | max: Number of pages allowed for queueing by all SCTP sockets. | |
3037 | ||
3038 | Default is calculated at boot time from amount of available memory. | |
3039 | ||
3040 | sctp_rmem - vector of 3 INTEGERs: min, default, max | |
a6e1204b MM |
3041 | Only the first value ("min") is used, "default" and "max" are |
3042 | ignored. | |
3043 | ||
3044 | min: Minimal size of receive buffer used by SCTP socket. | |
3045 | It is guaranteed to each SCTP socket (but not association) even | |
3046 | under moderate memory pressure. | |
3047 | ||
320bd6de | 3048 | Default: 4K |
32e8d494 VY |
3049 | |
3050 | sctp_wmem - vector of 3 INTEGERs: min, default, max | |
aa709da0 XL |
3051 | Only the first value ("min") is used, "default" and "max" are |
3052 | ignored. | |
3053 | ||
3054 | min: Minimum size of send buffer that can be used by SCTP sockets. | |
3055 | It is guaranteed to each SCTP socket (but not association) even | |
3056 | under moderate memory pressure. | |
3057 | ||
3058 | Default: 4K | |
32e8d494 | 3059 | |
72388433 BD |
3060 | addr_scope_policy - INTEGER |
3061 | Control IPv4 address scoping - draft-stewart-tsvwg-sctp-ipv4-00 | |
3062 | ||
1cec2cac MCC |
3063 | - 0 - Disable IPv4 address scoping |
3064 | - 1 - Enable IPv4 address scoping | |
3065 | - 2 - Follow draft but allow IPv4 private addresses | |
3066 | - 3 - Follow draft but allow IPv4 link local addresses | |
72388433 BD |
3067 | |
3068 | Default: 1 | |
3069 | ||
046c052b XL |
3070 | udp_port - INTEGER |
3071 | The listening port for the local UDP tunneling sock. Normally it's | |
3072 | using the IANA-assigned UDP port number 9899 (sctp-tunneling). | |
3073 | ||
3074 | This UDP sock is used for processing the incoming UDP-encapsulated | |
3075 | SCTP packets (from RFC6951), and shared by all applications in the | |
3076 | same net namespace. This UDP sock will be closed when the value is | |
3077 | set to 0. | |
3078 | ||
3079 | The value will also be used to set the src port of the UDP header | |
3080 | for the outgoing UDP-encapsulated SCTP packets. For the dest port, | |
3081 | please refer to 'encap_port' below. | |
3082 | ||
3083 | Default: 0 | |
3084 | ||
e8a3001c XL |
3085 | encap_port - INTEGER |
3086 | The default remote UDP encapsulation port. | |
3087 | ||
3088 | This value is used to set the dest port of the UDP header for the | |
3089 | outgoing UDP-encapsulated SCTP packets by default. Users can also | |
3090 | change the value for each sock/asoc/transport by using setsockopt. | |
3091 | For further information, please refer to RFC6951. | |
3092 | ||
3093 | Note that when connecting to a remote server, the client should set | |
3094 | this to the port that the UDP tunneling sock on the peer server is | |
3095 | listening on, and the local UDP tunneling sock on the client must | |
3096 | also be started. On the server, it would get the encap_port from | |
3097 | the incoming packet's source port. | |
3098 | ||
3099 | Default: 0 | |
3100 | ||
d1e462a7 | 3101 | plpmtud_probe_interval - INTEGER |
fea1d5b1 XL |
3102 | The time interval (in milliseconds) for the PLPMTUD probe timer, |
3103 | which is configured to expire after this period to receive an | |
3104 | acknowledgment to a probe packet. This is also the time interval | |
3105 | between the probes for the current pmtu when the probe search | |
3106 | is done. | |
3107 | ||
3108 | PLPMTUD will be disabled when 0 is set, and other values for it | |
3109 | must be >= 5000. | |
d1e462a7 XL |
3110 | |
3111 | Default: 0 | |
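The accepted values for plpmtud_probe_interval (0, or at least 5000) can be checked before writing the sysctl. A minimal sketch; the function name and the returned strings are hypothetical.

```python
def probe_interval_state(interval_ms):
    """Classify a candidate plpmtud_probe_interval value.

    0 disables PLPMTUD; any other value must be at least 5000 ms.
    """
    if interval_ms == 0:
        return "disabled"
    if interval_ms >= 5000:
        return "enabled"
    raise ValueError("plpmtud_probe_interval must be 0 or >= 5000")
```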
3112 | ||
c349ae5f XL |
3113 | reconf_enable - BOOLEAN |
3114 | Enable or disable extension of Stream Reconfiguration functionality | |
3115 | specified in RFC6525. This extension provides the ability to "reset" | |
3116 | a stream, and it includes the Parameters of "Outgoing/Incoming SSN | |
3117 | Reset", "SSN/TSN Reset" and "Add Outgoing/Incoming Streams". | |
3118 | ||
3119 | - 1: Enable extension. | |
3120 | - 0: Disable extension. | |
3121 | ||
3122 | Default: 0 | |
3123 | ||
e65775fd XL |
3124 | intl_enable - BOOLEAN |
3125 | Enable or disable extension of User Message Interleaving functionality | |
3126 | specified in RFC8260. This extension allows the interleaving of user | |
3127 | messages sent on different streams. With this feature enabled, I-DATA | |
3128 | chunk will replace DATA chunk to carry user messages if also supported | |
3129 | by the peer. Note that to use this feature, one needs to set this option | |
3130 | to 1 and also needs to set socket options SCTP_FRAGMENT_INTERLEAVE to 2 | |
3131 | and SCTP_INTERLEAVING_SUPPORTED to 1. | |
3132 | ||
3133 | - 1: Enable extension. | |
3134 | - 0: Disable extension. | |
3135 | ||
3136 | Default: 0 | |
3137 | ||
249eddaf XL |
3138 | ecn_enable - BOOLEAN |
3139 | Control use of Explicit Congestion Notification (ECN) by SCTP. | |
3140 | Like in TCP, ECN is used only when both ends of the SCTP connection | |
3141 | indicate support for it. This feature is useful in avoiding losses | |
3142 | due to congestion by allowing supporting routers to signal congestion | |
3143 | before having to drop packets. | |
3144 | ||
3145 | - 1: Enable ecn. | |
3146 | - 0: Disable ecn. | |
3147 | ||
3148 | Default: 1 | |
3149 | ||
b712d032 XL |
3150 | l3mdev_accept - BOOLEAN |
3151 | Enabling this option allows a "global" bound socket to work | |
3152 | across L3 master domains (e.g., VRFs) with packets capable of | |
3153 | being received regardless of the L3 domain in which they | |
3154 | originated. Only valid when the kernel was compiled with | |
3155 | CONFIG_NET_L3_MASTER_DEV. | |
3156 | ||
3157 | Default: 1 (enabled) | |
3158 | ||
1da177e4 | 3159 | |
1cec2cac MCC |
3160 | ``/proc/sys/net/core/*`` |
3161 | ======================== | |
3162 | ||
57043247 | 3163 | Please see: Documentation/admin-guide/sysctl/net.rst for descriptions of these entries. |
705efc3b | 3164 | |
4edc2f34 | 3165 | |
1cec2cac MCC |
3166 | ``/proc/sys/net/unix/*`` |
3167 | ======================== | |
3168 | ||
705efc3b WT |
3169 | max_dgram_qlen - INTEGER |
3170 | The maximum length of the dgram socket receive queue. | |
3171 | ||
3172 | Default: 10 | |
3173 |