Commit | Line | Data |
---|---|---|
562d897d DA |
1 | Virtual Routing and Forwarding (VRF) |
2 | ==================================== | |
3 | The VRF device combined with ip rules provides the ability to create virtual | |
4 | routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the | |
5 | Linux network stack. One use case is the multi-tenancy problem where each | |
6 | tenant has its own unique routing tables and, at the very least, needs a | |
7 | different default gateway. | |
8 | ||
9 | Processes can be "VRF aware" by binding a socket to the VRF device. Packets | |
10 | through the socket then use the routing table associated with the VRF | |
11 | device. An important feature of the VRF device implementation is that it | |
12 | impacts only Layer 3 and above so L2 tools (e.g., LLDP) are not affected | |
13 | (i.e., they do not need to be run in each VRF). The design also allows | |
14 | the use of higher priority ip rules (Policy Based Routing, PBR) to take | |
15 | precedence over the VRF device rules directing specific traffic as desired. | |
16 | ||
17 | In addition, VRF devices allow VRFs to be nested within namespaces. For | |
6e076537 DA |
18 | example, network namespaces provide separation of network interfaces at the |
19 | device layer, VLANs on the interfaces within a namespace provide L2 separation | |
20 | and then VRF devices provide L3 separation. | |
562d897d DA |
21 | |
22 | Design | |
23 | ------ | |
24 | A VRF device is created with an associated route table. Network interfaces | |
25 | are then enslaved to a VRF device: | |
26 | ||
27 |     +-----------------------------+ | |
28 |     |           vrf-blue          |  ===> route table 10 | |
29 |     +-----------------------------+ | |
30 |         |        |            | | |
31 |      +------+ +------+     +-------------+ | |
32 |      | eth1 | | eth2 | ... |    bond1    | | |
33 |      +------+ +------+     +-------------+ | |
34 |                               |       | | |
35 |                           +------+ +------+ | |
36 |                           | eth8 | | eth9 | | |
37 |                           +------+ +------+ | |
38 | ||
39 | Packets received on an enslaved device are switched to the VRF device | |
6e076537 DA |
40 | in the IPv4 and IPv6 processing stacks giving the impression that packets |
41 | flow through the VRF device. Similarly, on egress, routing rules are used to | |
42 | send packets to the VRF device driver before getting sent out the actual | |
43 | interface. This allows tcpdump on a VRF device to capture all packets into | |
44 | and out of the VRF as a whole.[1] Similarly, netfilter[2] and tc rules can be | |
45 | applied using the VRF device to specify rules that apply to the VRF domain | |
46 | as a whole. | |
562d897d DA |
47 | |
48 | [1] Packets in the forwarded state do not flow through the device, so those | |
49 | packets are not seen by tcpdump. This limitation is to be revisited in a | |
50 | future release. | |
51 | ||
6e076537 DA |
52 | [2] Iptables on ingress supports PREROUTING with skb->dev set to the real |
53 | ingress device and both INPUT and PREROUTING rules with skb->dev set to | |
54 | the VRF device. For egress, POSTROUTING and OUTPUT rules can be written | |
55 | using either the VRF device or real egress device. | |
562d897d DA |
56 | |
57 | Setup | |
58 | ----- | |
59 | 1. VRF device is created with an association to a FIB table. | |
60 | e.g., ip link add vrf-blue type vrf table 10 | |
61 | ip link set dev vrf-blue up | |
62 | ||
6e076537 DA |
63 | 2. An l3mdev FIB rule directs lookups to the table associated with the device. |
64 | A single l3mdev rule is sufficient for all VRFs. The VRF device adds the | |
65 | l3mdev rule for IPv4 and IPv6 when the first device is created with a | |
66 | default preference of 1000. Users may delete the rule if desired and add | |
67 | with a different priority or install per-VRF rules. | |
68 | ||
69 | Prior to the v4.8 kernel, iif and oif rules are needed for each VRF device: | |
562d897d DA |
70 | ip ru add oif vrf-blue table 10 |
71 | ip ru add iif vrf-blue table 10 | |
72 | ||
6e076537 | 73 | 3. Set the default route for the table (and hence default route for the VRF). |
17c91884 DS |
74 | ip route add table 10 unreachable default metric 4278198272 |
75 | ||
76 | This high metric value ensures that the default unreachable route can | |
77 | be overridden by a routing protocol suite. FRRouting interprets | |
78 | kernel metrics as a combined admin distance (upper byte) and priority | |
79 | (lower 3 bytes). Thus the above metric translates to [255/8192]. | |
562d897d | 80 | |
6e076537 DA |
81 | 4. Enslave L3 interfaces to a VRF device. |
82 | ip link set dev eth1 master vrf-blue | |
562d897d DA |
83 | |
84 | Local and connected routes for enslaved devices are automatically moved to | |
85 | the table associated with VRF device. Any additional routes depending on | |
6e076537 DA |
86 | the enslaved device are dropped and will need to be reinserted to the VRF |
87 | FIB table following the enslavement. | |
88 | ||
89 | The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global | |
90 | addresses as VRF enslavement changes. | |
91 | sysctl -w net.ipv6.conf.all.keep_addr_on_down=1 | |
562d897d | 92 | |
6e076537 DA |
93 | 5. Additional VRF routes are added to the associated table. |
94 | ip route add table 10 ... | |
562d897d DA |
95 | |
96 | ||
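The metric arithmetic in step 3 can be checked with a few lines of shell. This is only a sketch of the split described above (upper byte as admin distance, lower 3 bytes as priority); the helper names decode_metric and encode_metric are hypothetical and are not part of iproute2 or FRRouting.

```shell
#!/bin/sh
# Split a kernel route metric into the [distance/priority] pair.
# decode_metric and encode_metric are hypothetical helper names.
decode_metric() {
    metric=$1
    distance=$(( (metric >> 24) & 0xFF ))   # upper byte
    priority=$(( metric & 0xFFFFFF ))       # lower 3 bytes
    echo "${distance}/${priority}"
}

# Inverse operation: build a metric from a distance and a priority.
encode_metric() {
    echo $(( ($1 << 24) | ($2 & 0xFFFFFF) ))
}

decode_metric 4278198272    # prints 255/8192
encode_metric 255 8192      # prints 4278198272
```

The same helpers can be used to pick a different metric for the unreachable default route, as long as it stays higher than anything the routing suite is expected to install.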
97 | Applications | |
98 | ------------ | |
99 | Applications that are to work within a VRF need to bind their socket to the | |
100 | VRF device: | |
101 | ||
102 | setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1); | |
103 | ||
104 | or to specify the output device using cmsg and IP_PKTINFO. | |
105 | ||
63a6fff3 RS |
106 | TCP & UDP services running in the default VRF context (i.e., not bound |
107 | to any VRF device) can work across all VRF domains by enabling the | |
108 | tcp_l3mdev_accept and udp_l3mdev_accept sysctl options: | |
6e076537 | 109 | sysctl -w net.ipv4.tcp_l3mdev_accept=1 |
63a6fff3 | 110 | sysctl -w net.ipv4.udp_l3mdev_accept=1 |
562d897d | 111 | |
6e076537 DA |
112 | netfilter rules on the VRF device can be used to limit access to services |
113 | running in the default VRF context as well. | |
114 | ||
115 | The default VRF does not have limited scope with respect to port bindings. | |
116 | That is, if a process does a wildcard bind to a port in the default VRF it | |
117 | owns the port across all VRF domains within the network namespace. | |
4b418bff DA |
118 | |
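Before relying on a service in the default VRF being reachable across VRFs, it can help to confirm how the two sysctl options above are currently set. The following is a minimal sketch that reads the corresponding files under /proc/sys/net/ipv4. The function name l3mdev_accept_status and its optional directory argument are hypothetical; the argument exists only so the function can be pointed at a different directory for testing.

```shell
#!/bin/sh
# Report the tcp/udp l3mdev_accept settings.
# l3mdev_accept_status is a hypothetical helper, not an iproute2 command.
l3mdev_accept_status() {
    dir=${1:-/proc/sys/net/ipv4}
    for proto in tcp udp; do
        file="${dir}/${proto}_l3mdev_accept"
        if [ -r "$file" ]; then
            value=$(cat "$file")
        else
            # File missing or unreadable, e.g. kernel built without l3mdev support
            value="unavailable"
        fi
        echo "${proto}_l3mdev_accept=${value}"
    done
}

l3mdev_accept_status
```

A value of 1 corresponds to the option having been enabled with the sysctl commands shown above.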
119 | ################################################################################ | |
120 | ||
121 | Using iproute2 for VRFs | |
122 | ======================= | |
6e076537 DA |
123 | iproute2 supports the vrf keyword as of v4.7. For backwards compatibility this |
124 | section lists both commands where appropriate -- with the vrf keyword and the | |
125 | older form without it. | |
4b418bff DA |
126 | |
127 | 1. Create a VRF | |
128 | ||
129 | To instantiate a VRF device and associate it with a table: | |
130 | $ ip link add dev NAME type vrf table ID | |
131 | ||
6e076537 DA |
132 | As of v4.8 the kernel supports the l3mdev FIB rule where a single rule |
133 | covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 on first | |
134 | device create. | |
4b418bff DA |
135 | |
136 | 2. List VRFs | |
137 | ||
138 | To list VRFs that have been created: | |
139 | $ ip [-d] link show type vrf | |
140 | NOTE: The -d option is needed to show the table id | |
141 | ||
142 | For example: | |
143 | $ ip -d link show type vrf | |
6e076537 | 144 | 11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 |
4b418bff DA |
145 | link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0 |
146 | vrf table 1 addrgenmode eui64 | |
6e076537 | 147 | 12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 |
4b418bff DA |
148 | link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0 |
149 | vrf table 10 addrgenmode eui64 | |
6e076537 | 150 | 13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 |
4b418bff DA |
151 | link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0 |
152 | vrf table 66 addrgenmode eui64 | |
6e076537 | 153 | 14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 |
4b418bff DA |
154 | link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0 |
155 | vrf table 81 addrgenmode eui64 | |
156 | ||
157 | ||
158 | Or in brief output: | |
159 | ||
160 | $ ip -br link show type vrf | |
6e076537 DA |
161 | mgmt UP 72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP> |
162 | red UP b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP> | |
163 | blue UP 36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP> | |
164 | green UP e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP> | |
4b418bff DA |
165 | |
166 | ||
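Since the brief output omits the table id, a script that needs the VRF-to-table mapping has to parse the detailed output. The sketch below assumes the two-part layout shown in the example above (an "N: name: <flags>" line followed later by a "vrf table ID" detail line); the function name vrf_tables is hypothetical.

```shell
#!/bin/sh
# Print "NAME TABLE" pairs from `ip -d link show type vrf` output on stdin.
# vrf_tables is a hypothetical helper; it relies on the output layout
# shown in the example above and may need adjusting for other versions.
vrf_tables() {
    awk '
        # Header line: remember the device name without the trailing colon.
        /^[0-9]+: / { name = $2; sub(/:$/, "", name) }
        # Detail line: look for the "vrf table ID" token sequence.
        {
            for (i = 1; i < NF; i++)
                if ($i == "vrf" && $(i + 1) == "table")
                    print name, $(i + 2)
        }
    '
}

# Sample input copied from the example output above.
vrf_tables <<'EOF'
11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 1 addrgenmode eui64
12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 10 addrgenmode eui64
EOF
```

On a live system the function would be fed from the command itself, e.g. `ip -d link show type vrf | vrf_tables`.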
167 | 3. Assign a Network Interface to a VRF | |
168 | ||
169 | Network interfaces are assigned to a VRF by enslaving the netdevice to a | |
170 | VRF device: | |
6e076537 | 171 | $ ip link set dev NAME master NAME |
4b418bff DA |
172 | |
173 | On enslavement, connected and local routes are automatically moved to the | |
174 | table associated with the VRF device. | |
175 | ||
176 | For example: | |
6e076537 | 177 | $ ip link set dev eth0 master mgmt |
4b418bff DA |
178 | |
179 | ||
180 | 4. Show Devices Assigned to a VRF | |
181 | ||
182 | To show devices that have been assigned to a specific VRF add the vrf (or | |
183 | master) option to the ip command: | |
6e076537 DA |
184 | $ ip link show vrf NAME |
185 | $ ip link show master NAME | |
4b418bff DA |
186 | |
187 | For example: | |
6e076537 DA |
188 | $ ip link show vrf red |
189 | 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000 | |
4b418bff | 190 | link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff |
6e076537 | 191 | 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000 |
4b418bff | 192 | link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff |
6e076537 | 193 | 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000 |
4b418bff DA |
194 | link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff |
195 | ||
196 | ||
197 | Or using the brief output: | |
484f674b | 198 | $ ip -br link show vrf red |
4b418bff DA |
199 | eth1 UP 02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP> |
200 | eth2 UP 02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP> | |
201 | eth5 DOWN 02:00:00:00:02:06 <BROADCAST,MULTICAST> | |
202 | ||
203 | ||
204 | 5. Show Neighbor Entries for a VRF | |
205 | ||
206 | To list neighbor entries associated with devices enslaved to a VRF device | |
207 | add the vrf (or master) option to the ip command: | |
6e076537 DA |
208 | $ ip [-6] neigh show vrf NAME |
209 | $ ip [-6] neigh show master NAME | |
4b418bff DA |
210 | |
211 | For example: | |
6e076537 | 212 | $ ip neigh show vrf red |
4b418bff DA |
213 | 10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE |
214 | 10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE | |
215 | ||
484f674b DA |
216 | $ ip -6 neigh show vrf red |
217 | 2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE | |
4b418bff DA |
218 | |
219 | ||
220 | 6. Show Addresses for a VRF | |
221 | ||
222 | To show addresses for interfaces associated with a VRF add the vrf (or | |
223 | master) option to the ip command: | |
6e076537 DA |
224 | $ ip addr show vrf NAME |
225 | $ ip addr show master NAME | |
4b418bff DA |
226 | |
227 | For example: | |
6e076537 DA |
228 | $ ip addr show vrf red |
229 | 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000 | |
4b418bff DA |
230 | link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff |
231 | inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1 | |
232 | valid_lft forever preferred_lft forever | |
233 | inet6 2002:1::2/120 scope global | |
234 | valid_lft forever preferred_lft forever | |
235 | inet6 fe80::ff:fe00:202/64 scope link | |
236 | valid_lft forever preferred_lft forever | |
6e076537 | 237 | 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000 |
4b418bff DA |
238 | link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff |
239 | inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2 | |
240 | valid_lft forever preferred_lft forever | |
241 | inet6 2002:2::2/120 scope global | |
242 | valid_lft forever preferred_lft forever | |
243 | inet6 fe80::ff:fe00:203/64 scope link | |
244 | valid_lft forever preferred_lft forever | |
6e076537 | 245 | 7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000 |
4b418bff DA |
246 | link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff |
247 | ||
248 | Or in brief format: | |
6e076537 | 249 | $ ip -br addr show vrf red |
4b418bff DA |
250 | eth1 UP 10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64 |
251 | eth2 UP 10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64 | |
252 | eth5 DOWN | |
253 | ||
254 | ||
255 | 7. Show Routes for a VRF | |
256 | ||
257 | To show routes for a VRF use the ip command to display the table associated | |
258 | with the VRF device: | |
6e076537 | 259 | $ ip [-6] route show vrf NAME |
4b418bff DA |
260 | $ ip [-6] route show table ID |
261 | ||
262 | For example: | |
6e076537 | 263 | $ ip route show vrf red |
17c91884 | 264 | unreachable default metric 4278198272 |
4b418bff DA |
265 | broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 |
266 | 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 | |
267 | local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 | |
268 | broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 | |
269 | broadcast 10.2.2.0 dev eth2 proto kernel scope link src 10.2.2.2 | |
270 | 10.2.2.0/24 dev eth2 proto kernel scope link src 10.2.2.2 | |
271 | local 10.2.2.2 dev eth2 proto kernel scope host src 10.2.2.2 | |
272 | broadcast 10.2.2.255 dev eth2 proto kernel scope link src 10.2.2.2 | |
273 | ||
6e076537 | 274 | $ ip -6 route show vrf red |
4b418bff DA |
275 | local 2002:1:: dev lo proto none metric 0 pref medium |
276 | local 2002:1::2 dev lo proto none metric 0 pref medium | |
277 | 2002:1::/120 dev eth1 proto kernel metric 256 pref medium | |
278 | local 2002:2:: dev lo proto none metric 0 pref medium | |
279 | local 2002:2::2 dev lo proto none metric 0 pref medium | |
280 | 2002:2::/120 dev eth2 proto kernel metric 256 pref medium | |
281 | local fe80:: dev lo proto none metric 0 pref medium | |
282 | local fe80:: dev lo proto none metric 0 pref medium | |
283 | local fe80::ff:fe00:202 dev lo proto none metric 0 pref medium | |
284 | local fe80::ff:fe00:203 dev lo proto none metric 0 pref medium | |
285 | fe80::/64 dev eth1 proto kernel metric 256 pref medium | |
286 | fe80::/64 dev eth2 proto kernel metric 256 pref medium | |
6e076537 | 287 | ff00::/8 dev red metric 256 pref medium |
4b418bff DA |
288 | ff00::/8 dev eth1 metric 256 pref medium |
289 | ff00::/8 dev eth2 metric 256 pref medium | |
17c91884 | 290 | unreachable default dev lo metric 4278198272 error -101 pref medium |
4b418bff DA |
291 | |
292 | 8. Route Lookup for a VRF | |
293 | ||
6e076537 DA |
294 | A test route lookup can be done for a VRF: |
295 | $ ip [-6] route get vrf NAME ADDRESS | |
296 | $ ip [-6] route get oif NAME ADDRESS | |
4b418bff DA |
297 | |
298 | For example: | |
6e076537 DA |
299 | $ ip route get 10.2.1.40 vrf red |
300 | 10.2.1.40 dev eth1 table red src 10.2.1.2 | |
4b418bff DA |
301 | cache |
302 | ||
6e076537 DA |
303 | $ ip -6 route get 2002:1::32 vrf red |
304 | 2002:1::32 from :: dev eth1 table red proto kernel src 2002:1::2 metric 256 pref medium | |
4b418bff DA |
305 | |
306 | ||
307 | 9. Removing Network Interface from a VRF | |
308 | ||
309 | Network interfaces are removed from a VRF by breaking the enslavement to | |
310 | the VRF device: | |
311 | $ ip link set dev NAME nomaster | |
312 | ||
313 | Connected routes are moved back to the default table and local entries are | |
314 | moved to the local table. | |
315 | ||
316 | For example: | |
317 | $ ip link set dev eth0 nomaster | |
318 | ||
319 | -------------------------------------------------------------------------------- | |
320 | ||
321 | Commands used in this example: | |
322 | ||
6e076537 DA |
323 | cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF |
324 | 1 mgmt | |
325 | 10 red | |
326 | 66 blue | |
327 | 81 green | |
4b418bff DA |
328 | EOF |
329 | ||
330 | function vrf_create | |
331 | { | |
332 | VRF=$1 | |
333 | TBID=$2 | |
4b418bff | 334 | |
6e076537 DA |
335 | # create VRF device |
336 | ip link add ${VRF} type vrf table ${TBID} | |
4b418bff DA |
337 | |
338 | if [ "${VRF}" != "mgmt" ]; then | |
17c91884 | 339 | ip route add table ${TBID} unreachable default metric 4278198272 |
4b418bff | 340 | fi |
6e076537 | 341 | ip link set dev ${VRF} up |
4b418bff DA |
342 | } |
343 | ||
344 | vrf_create mgmt 1 | |
6e076537 | 345 | ip link set dev eth0 master mgmt |
4b418bff DA |
346 | |
347 | vrf_create red 10 | |
6e076537 DA |
348 | ip link set dev eth1 master red |
349 | ip link set dev eth2 master red | |
350 | ip link set dev eth5 master red | |
4b418bff DA |
351 | |
352 | vrf_create blue 66 | |
6e076537 | 353 | ip link set dev eth3 master blue |
4b418bff DA |
354 | |
355 | vrf_create green 81 | |
6e076537 | 356 | ip link set dev eth4 master green |
4b418bff DA |
357 | |
358 | ||
359 | Interface addresses from /etc/network/interfaces: | |
360 | auto eth0 | |
361 | iface eth0 inet static | |
362 | address 10.0.0.2 | |
363 | netmask 255.255.255.0 | |
364 | gateway 10.0.0.254 | |
365 | ||
366 | iface eth0 inet6 static | |
367 | address 2000:1::2 | |
368 | netmask 120 | |
369 | ||
370 | auto eth1 | |
371 | iface eth1 inet static | |
372 | address 10.2.1.2 | |
373 | netmask 255.255.255.0 | |
374 | ||
375 | iface eth1 inet6 static | |
376 | address 2002:1::2 | |
377 | netmask 120 | |
378 | ||
379 | auto eth2 | |
380 | iface eth2 inet static | |
381 | address 10.2.2.2 | |
382 | netmask 255.255.255.0 | |
383 | ||
384 | iface eth2 inet6 static | |
385 | address 2002:2::2 | |
386 | netmask 120 | |
387 | ||
388 | auto eth3 | |
389 | iface eth3 inet static | |
390 | address 10.2.3.2 | |
391 | netmask 255.255.255.0 | |
392 | ||
393 | iface eth3 inet6 static | |
394 | address 2002:3::2 | |
395 | netmask 120 | |
396 | ||
397 | auto eth4 | |
398 | iface eth4 inet static | |
399 | address 10.2.4.2 | |
400 | netmask 255.255.255.0 | |
401 | ||
402 | iface eth4 inet6 static | |
403 | address 2002:4::2 | |
404 | netmask 120 |