.. SPDX-License-Identifier: GPL-2.0

============
NET_FAILOVER
============

Overview
========

The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev, and manages the primary and
standby slave netdevs that get registered via the generic failover
infrastructure.

The failover netdev acts as a master device and controls 2 slave devices. The
original paravirtual interface is registered as the 'standby' slave netdev and
a passthru/vf device with the same MAC gets registered as the 'primary' slave
netdev. Both 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via the 'failover' netdev.
The 'failover' netdev chooses the 'primary' netdev as default for transmits when
it is available with link up and running.

This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
direct attached VF by failing over to the paravirtual datapath when the VF
is unplugged.

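The transmit selection policy described above can be illustrated from user
space. The following is a minimal sketch, not the driver's code: it assumes
that a readable sysfs 'carrier' attribute with value 1 is a good enough proxy
for "link up and running", and that the 'standby' netdev is used whenever the
'primary' is not ready. The SYSFS_NET variable and both function names are
hypothetical helpers introduced for this example.

```shell
# Illustration only: mimics the selection policy using sysfs attributes.
# SYSFS_NET is a hypothetical override so the functions can be pointed at a
# test tree; on a real system it defaults to /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# link_ready IFACE: succeed when IFACE reports carrier.
# Reading 'carrier' fails for a missing or administratively down interface,
# which is treated as "not ready".
link_ready() {
    [ "$(cat "$SYSFS_NET/$1/carrier" 2>/dev/null)" = "1" ]
}

# pick_xmit_dev PRIMARY STANDBY: print the slave that would carry transmits.
# The primary is preferred; the standby is the fallback.
pick_xmit_dev() {
    if link_ready "$1"; then
        echo "$1"
    elif link_ready "$2"; then
        echo "$2"
    else
        return 1
    fi
}
```

For instance, 'pick_xmit_dev ens11 ens10nsby' would print the VF name while the
VF has carrier, and the virtio-net name once the VF has been unplugged.
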
virtio-net accelerated datapath: STANDBY mode
=============================================

net_failover enables hypervisor controlled accelerated datapath to virtio-net
enabled VMs in a transparent manner with no/minimal guest userspace changes.

To support this, the hypervisor needs to enable the VIRTIO_NET_F_STANDBY
feature on the virtio-net interface and assign the same MAC address to both
virtio-net and VF interfaces.

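Whether the STANDBY feature is present can be checked from inside the guest.
The sketch below assumes that the sysfs 'features' attribute of a virtio
device is a string of '0'/'1' characters with bit 0 first, and relies on
VIRTIO_NET_F_STANDBY being feature bit 62. The function name and the device
name 'virtio0' are hypothetical.

```shell
# VIRTIO_NET_F_STANDBY is feature bit 62 of the virtio-net feature set.
VIRTIO_NET_F_STANDBY=62

# has_feature_bit FEATURES BIT
# FEATURES is the '0'/'1' string read from a virtio device's sysfs
# 'features' attribute (assumed layout: bit 0 is the first character).
has_feature_bit() {
    # cut counts characters from 1, so bit N is character N+1.
    [ "$(printf '%s' "$1" | cut -c "$(($2 + 1))")" = "1" ]
}

# Example guest-side use (virtio0 is a hypothetical device name):
#   has_feature_bit "$(cat /sys/bus/virtio/devices/virtio0/features)" \
#       "$VIRTIO_NET_F_STANDBY" && echo "STANDBY feature is set"
```
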
Here is an example libvirt XML snippet that shows such a configuration:
::

  <interface type='network'>
    <mac address='52:54:00:00:12:53'/>
    <source network='enp66s0f0_br'/>
    <target dev='tap01'/>
    <model type='virtio'/>
    <driver name='vhost' queues='4'/>
    <link state='down'/>
    <teaming type='persistent'/>
    <alias name='ua-backup0'/>
  </interface>
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

In this configuration, the first device definition is for the virtio-net
interface, which acts as the 'persistent' device, indicating that this
interface will always be plugged in. This is specified by the 'teaming' tag
with the required attribute 'type' set to 'persistent'. The link state for the
virtio-net device is set to 'down' to ensure that the 'failover' netdev prefers
the VF passthrough device for normal communication. The virtio-net device will
be brought UP during live migration to allow uninterrupted communication.

The second device definition is for the VF passthrough interface. Here the
'teaming' tag is provided with type 'transient', indicating that this device
may periodically be unplugged. A second attribute, 'persistent', is provided
and points to the alias name declared for the virtio-net device.

Booting a VM with the above configuration will result in the following 3
interfaces being created in the VM:
::

  4: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
      inet 192.168.12.53/24 brd 192.168.12.255 scope global dynamic ens10
         valid_lft 42482sec preferred_lft 42482sec
      inet6 fe80::97d8:db2:8c10:b6d6/64 scope link
         valid_lft forever preferred_lft forever
  5: ens10nsby: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master ens10 state DOWN group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff
  7: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ens10 state UP group default qlen 1000
      link/ether 52:54:00:00:12:53 brd ff:ff:ff:ff:ff:ff

Here, ens10 is the 'failover' master interface, ens10nsby is the slave 'standby'
virtio-net interface, and ens11 is the slave 'primary' VF passthrough interface.

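The master/slave relationship and the shared MAC address seen in this output
can also be read from sysfs. The following sketch assumes the 'lower_<name>'
links that the kernel exposes under a master netdev and the per-interface
'address' attribute. SYSFS_NET and the function names are hypothetical and
only exist for this example.

```shell
# SYSFS_NET is a hypothetical override for testing; defaults to the real tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# list_slaves MASTER: print the netdevs enslaved to MASTER, one per line,
# based on the lower_<name> links found under the master's sysfs directory.
list_slaves() {
    for link in "$SYSFS_NET/$1"/lower_*; do
        [ -e "$link" ] || continue
        name=${link##*/}
        echo "${name#lower_}"
    done
}

# same_mac IFACE...: succeed when all listed interfaces report the same MAC.
same_mac() {
    first=$(cat "$SYSFS_NET/$1/address") || return 1
    for ifname in "$@"; do
        [ "$(cat "$SYSFS_NET/$ifname/address")" = "$first" ] || return 1
    done
}
```

With the interfaces shown above, 'list_slaves ens10' would be expected to print
ens10nsby and ens11, and 'same_mac ens10 ens10nsby ens11' would succeed.
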
One point to note here is that some user space network configuration daemons
like systemd-networkd, ifupdown, etc., do not understand the 'net_failover'
device; and on the first boot, the VM might end up with both the 'failover'
device and the VF acquiring IP addresses (either same or different) from the
DHCP server. This will result in lack of connectivity to the VM. So some tweaks
might be needed to these network configuration daemons to make sure that an IP
is received only on the 'failover' device.

Below is the patch snippet used with 'cloud-ifupdown-helper' script found on
Debian cloud images:

::

  @@ -27,6 +27,8 @@ do_setup() {
          local working="$cfgdir/.$INTERFACE"
          local final="$cfgdir/$INTERFACE"

  +       if [ -d "/sys/class/net/${INTERFACE}/master" ]; then exit 0; fi
  +
          if ifup --no-act "$INTERFACE" > /dev/null 2>&1; then
              # interface is already known to ifupdown, no need to generate cfg
              log "Skipping configuration generation for $INTERFACE"

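The same check can be applied outside of 'cloud-ifupdown-helper'. The sketch
below generalizes the test used in the patch: an interface that has a 'master'
entry in sysfs is a slave and should not receive its own address. SYSFS_NET
and the function names are hypothetical, and skipping 'lo' is an assumption
added for the example.

```shell
# SYSFS_NET is a hypothetical override for testing; defaults to the real tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# is_enslaved IFACE: succeed when IFACE has a master device.
is_enslaved() {
    [ -e "$SYSFS_NET/$1/master" ]
}

# dhcp_candidates: print the interfaces that may be given an address,
# i.e. every interface that is neither loopback nor enslaved.
dhcp_candidates() {
    for dir in "$SYSFS_NET"/*; do
        [ -d "$dir" ] || continue
        ifname=${dir##*/}
        [ "$ifname" = "lo" ] && continue
        is_enslaved "$ifname" || echo "$ifname"
    done
}
```

In the example VM above, only the 'failover' device ens10 would be expected in
the output, since ens10nsby and ens11 both have ens10 as their master.
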
Live Migration of a VM with SR-IOV VF & virtio-net in STANDBY mode
==================================================================

net_failover also enables hypervisor controlled live migration to be supported
with VMs that have direct attached SR-IOV VF devices by automatic failover to
the paravirtual datapath when the VF is unplugged.

Here is a sample script that shows the steps to initiate live migration from
the source hypervisor. Note: It is assumed that the VM is connected to a
software bridge 'br0' which has a single VF attached to it along with the vnet
device to the VM. This is not the VF that was passed through to the VM (seen in
the vf.xml file).
::

  # cat vf.xml
  <interface type='hostdev' managed='yes'>
    <mac address='52:54:00:00:12:53'/>
    <source>
      <address type='pci' domain='0x0000' bus='0x42' slot='0x02' function='0x5'/>
    </source>
    <teaming type='transient' persistent='ua-backup0'/>
  </interface>

  # Source Hypervisor migrate.sh
  #!/bin/bash

  DOMAIN=vm-01
  PF=ens6np0
  VF=ens6v1             # VF attached to the bridge.
  VF_NUM=1
  TAP_IF=vmtap01        # virtio-net interface in the VM.
  VF_XML=vf.xml

  MAC=52:54:00:00:12:53
  ZERO_MAC=00:00:00:00:00:00

  # Destination hypervisor; must be provided in the environment.
  : "${REMOTE_HOST:?REMOTE_HOST is not set}"

  # Set the virtio-net interface up.
  virsh domif-setlink $DOMAIN $TAP_IF up

  # Remove the VF that was passed through to the VM.
  virsh detach-device --live --config $DOMAIN $VF_XML

  ip link set $PF vf $VF_NUM mac $ZERO_MAC

  # Add FDB entry for traffic to continue going to the VM via
  # the VF -> br0 -> vnet interface path.
  bridge fdb add $MAC dev $VF
  bridge fdb add $MAC dev $TAP_IF master

  # Migrate the VM
  virsh migrate --live --persistent $DOMAIN qemu+ssh://$REMOTE_HOST/system

  # Clean up FDB entries after migration completes.
  bridge fdb del $MAC dev $VF
  bridge fdb del $MAC dev $TAP_IF master

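Since these steps change live networking state, it can help to preview them
first. The following sketch wraps the first three commands of the script above
in a dry-run helper. The 'run' and 'failover_to_standby' functions and the
DRY_RUN variable are hypothetical additions; the wrapped commands are the ones
already used in the script.

```shell
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# failover_to_standby DOMAIN TAP_IF VF_XML PF VF_NUM
# Bring the virtio-net link up, unplug the passed-through VF and clear its
# MAC, so that the guest fails over to the paravirtual datapath.
failover_to_standby() {
    run virsh domif-setlink "$1" "$2" up
    run virsh detach-device --live --config "$1" "$3"
    run ip link set "$4" vf "$5" mac 00:00:00:00:00:00
}
```

Running 'DRY_RUN=1 failover_to_standby vm-01 vmtap01 vf.xml ens6np0 1' only
prints the three commands, which allows the arguments to be reviewed before
the VF is actually detached.
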
On the destination hypervisor, a shared bridge 'br0' is created before migration
starts, and a VF from the destination PF is added to the bridge. Similarly, an
appropriate FDB entry is added.

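The exact preparation commands are not given here. As a rough sketch, assuming
the bridge does not exist yet and that only the FDB entry for the VF can be
added before the VM's tap device is present, the preparation could be
generated as follows. The function name is hypothetical, and the arguments in
the usage note reuse the names that appear in the destination-side script.

```shell
# destination_prep_commands BRIDGE VF MAC
# Print (do not execute) the commands that create the bridge, attach the VF
# to it and add an FDB entry for the VM's MAC on the VF.
destination_prep_commands() {
    bridge=$1 vf=$2 mac=$3
    cat <<EOF
ip link add name $bridge type bridge
ip link set $bridge up
ip link set $vf master $bridge
bridge fdb add $mac dev $vf
EOF
}
```

The output of 'destination_prep_commands br0 ens36v0 52:54:00:00:12:53' can be
reviewed and then piped to a root shell on the destination hypervisor.
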
The following script is executed on the destination hypervisor once migration
completes, and it reattaches the VF to the VM and brings down the virtio-net
interface.

::

  # reattach-vf.sh
  #!/bin/bash

  bridge fdb del 52:54:00:00:12:53 dev ens36v0
  bridge fdb del 52:54:00:00:12:53 dev vmtap01 master
  virsh attach-device --config --live vm-01 vf.xml
  virsh domif-setlink vm-01 vmtap01 down