Hyper-V: Bug: Connection loss during LiveMigrations (Windows Firewall)
We have a problem with Live Migration of VMs that have Windows Firewall (InGuest) enabled.
After a Live Migration of a VM with Windows Firewall enabled, the VM is unreachable for approximately 10 seconds and does not answer ping (ICMP) during that time.
When we deactivate the Windows Firewall in the VM, the problem disappears (at most 1 lost ping).
Hypervisor = Hyper-V 2016 (Core & GUI)
VMs = Windows Server 2012 R2 & Windows Server 2016
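To quantify the outage described above, a simple ping loop can be run from another machine while the VM is being live-migrated. This is only a minimal sketch: the target name is hypothetical, and because a failed probe takes longer than a successful one, the counts are approximate.

```powershell
# Hypothetical target; replace with the name or IP of the VM being migrated.
$target = 'vm01.example.local'
$lost = 0
$lossTimes = @()

# Probe about once per second for two minutes; start the Live Migration meanwhile.
1..120 | ForEach-Object {
    $ok = Test-Connection -ComputerName $target -Count 1 -Quiet
    if (-not $ok) {
        $lost++
        $lossTimes += Get-Date   # remember when each reply was missed
    }
    Start-Sleep -Seconds 1
}

"Lost replies: $lost"
if ($lost -gt 0) {
    "First loss: $($lossTimes[0])  Last loss: $($lossTimes[-1])"
}
```

Running it once with the firewall enabled and once with it disabled should make the difference between the two cases visible.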
In the Live Migration scenario, the handling of the interface “reconnection” was changed compared with a classic NIC disconnect/reconnect.
The goal is to speed up the reconnection without modifying anything in the TCP/IP stack.
Windows receives an NDIS_STATUS_NETWORK_CHANGE from the NDIS driver, which ends up in the firewall code with only minimal information to be modified.
- The firewall waits for a special event that signals that all changes/checks have been completed. While it is waiting, it enters “interface quarantine” mode, in which all new incoming connections are refused.
- This event never reaches the waiting thread, so the firewall waits the full 7 seconds (a hardcoded value) before leaving the interface quarantine state.
We know that incoming requests for NEW traffic are blocked for 7 seconds. A NEW TCP session request will therefore fail during those 7 seconds, but TCP retransmission is resilient to such a short outage: further TCP SYN retransmits are sent AFTER the 7-second window, which allows the TCP session to be established. So no TCP session (or the application behind it) should be disturbed by this “short period” of quarantine. The dropped packets in the firewall log should confirm this.
UDP-based applications, however, will suffer from this 7-second blocking period.
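The TCP argument above can be checked with a little arithmetic. The sketch below assumes an initial retransmission timeout of 3 seconds, two SYN retransmissions, and a timeout that doubles after each attempt. These numbers are assumptions for illustration only; the actual values on a given system can be read with Get-NetTCPSetting (properties InitialRtoMs and MaxSynRetransmissions).

```powershell
# Assumed values for illustration; check the real ones with:
#   Get-NetTCPSetting | Select-Object SettingName, InitialRtoMs, MaxSynRetransmissions
$initialRtoMs   = 3000   # assumption: initial retransmission timeout
$synRetransmits = 2      # assumption: number of SYN retransmissions
$quarantineMs   = 7000   # hardcoded quarantine timeout mentioned in the text

$sendTimeMs = 0          # the first SYN is sent at t = 0
$rtoMs = $initialRtoMs
for ($i = 1; $i -le $synRetransmits; $i++) {
    $sendTimeMs += $rtoMs    # the next SYN goes out when the current timeout expires
    $rtoMs *= 2              # the timeout doubles after each attempt
    $state = if ($sendTimeMs -gt $quarantineMs) { 'after quarantine' } else { 'during quarantine' }
    "SYN retransmit $i at $sendTimeMs ms ($state)"
}
```

Under these assumptions the first retransmit (3 s) is still dropped, while the second one (9 s) arrives after the quarantine has ended. A client with a shorter timeout or fewer retransmissions may behave differently.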
Workaround 1 (recommended) - Persistent (InGuest reboot required)
- Create the following registry value to disable the Firewall Interface Quarantine on all VMs (InGuest)
Registry Path: HKLM\SYSTEM\CurrentControlSet\services\SharedAccess\Parameters\FirewallPolicy
Value Name: IntfQuarantineEnabled
Value Data: 0 (DWORD 32-bit value)
- After creating the registry value, reboot all VMs on which it has been set.
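The registry change for Workaround 1 can also be scripted. This is a minimal sketch, assuming an elevated PowerShell session inside the guest; the reboot is left commented out so that it can be scheduled separately.

```powershell
# Run elevated inside the guest VM.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\SharedAccess\Parameters\FirewallPolicy'

# Create (or overwrite) the DWORD value that disables the interface quarantine.
New-ItemProperty -Path $path -Name 'IntfQuarantineEnabled' `
    -PropertyType DWord -Value 0 -Force | Out-Null

# Show the value to confirm that it was written.
Get-ItemProperty -Path $path -Name 'IntfQuarantineEnabled'

# The setting only takes effect after a reboot of the guest.
# Restart-Computer
```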
Workaround 2 - OneTime (no reboot required)
- Stop the Service "Network List Service"
Get-Service netprofm | Stop-Service
- Live Migrate the VM
- Start the Service "Network List Service"
Get-Service netprofm | Start-Service
- Install Patch KB4338822 or newer
- Create the following registry value to "disable VM Notify (EventID 14)" on all Hyper-V Nodes
Registry Path: HKLM\System\CurrentControlSet\Services\VmsMp\Parameters
Value Name: SendLmNetworkChangeIndication
Value Data: 0 (DWORD 32-bit value)
- After creating the registry value, reboot all Hyper-V Nodes on which it has been set.
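The host-side registry change can be scripted in the same way. This is a minimal sketch, assuming an elevated PowerShell session on each Hyper-V node with the patch already installed. The Test-Path guard is a precaution in case the Parameters key does not exist yet, which is an assumption and not something the steps above state.

```powershell
# Run elevated on each Hyper-V node, after installing the patch.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\VmsMp\Parameters'

# Precaution (assumption): create the key if it is not present.
if (-not (Test-Path -Path $path)) {
    New-Item -Path $path -Force | Out-Null
}

# Create (or overwrite) the DWORD value named in the steps above.
New-ItemProperty -Path $path -Name 'SendLmNetworkChangeIndication' `
    -PropertyType DWord -Value 0 -Force | Out-Null

# Show the value to confirm that it was written.
Get-ItemProperty -Path $path -Name 'SendLmNetworkChangeIndication'

# The node must be rebooted afterwards; drain its VMs first.
# Restart-Computer
```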