Hyper-V: Bug: Connection loss during LiveMigrations (Windows Firewall)

From Wiki-WebPerfect
Revision as of 19 September 2018, 08:56, by Admin

Error

Live Migration causes a connection loss on VMs that have Windows Firewall enabled inside the guest OS (InGuest).
After a Live Migration, such a VM is unreachable for approximately 10 seconds. It also does not answer ping (ICMP) during that time.
When Windows Firewall is disabled inside the VM, the problem does not occur (at most one ping is lost).
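
To reproduce the symptom, the outage can be measured with a continuous ping while the Live Migration runs. The following is a minimal sketch, not a definitive test tool: the target name `vm01.example.local` and the 2-second reporting threshold are assumptions chosen for illustration.

```powershell
# Minimal sketch: report ping outages of a VM while it is being Live Migrated.
# Assumption: 'vm01.example.local' is a placeholder for the VM under test.
$target = 'vm01.example.local'
$lastOk = Get-Date

while ($true) {
    # -Quiet returns $true if the single echo request was answered.
    $ok  = Test-Connection -ComputerName $target -Count 1 -Quiet
    $now = Get-Date

    if ($ok) {
        # Time since the previous successful reply.
        $gap = ($now - $lastOk).TotalSeconds
        if ($gap -gt 2) {
            Write-Host ("{0:HH:mm:ss} reachable again after {1:N1} s without reply" -f $now, $gap)
        }
        $lastOk = $now
    }
    Start-Sleep -Milliseconds 500
}
```

Stop the loop with Ctrl+C. Because each failed ping also waits for its own timeout, the reported gap is only an approximation of the real outage.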

Additional information:
Hypervisor = Hyper-V 2016 (Core & GUI)
VMs = Windows Server 2012 R2 & Windows Server 2016



Cause

In the Live Migration scenario, the handling of the interface “reconnection” has been changed compared to a classic NIC disconnect/reconnect. The goal is to speed up the reconnection without modifying anything in the TCP/IP stack. Windows receives an NDIS_STATUS_NETWORK_CHANGE indication from the NDIS driver, which ends up in the firewall code with only a minimal amount of information to update.
However:

  • The firewall waits for a special event that signals that all changes/checks have completed. While it waits, the interface is in “interface quarantine” mode, in which all new incoming connections are refused.
  • This event never reaches the waiting thread, so the firewall waits for the full 7 seconds (hardcoded value) before leaving the interface quarantine state.

New incoming requests for NEW traffic are therefore blocked for 7 seconds. A new TCP session request fails during these 7 seconds, but TCP retransmission is resilient to such a short outage: further TCP SYN retransmits are sent after the 7-second window has passed, which allows the TCP session to be established. TCP sessions (and the corresponding applications) should therefore not be disturbed by this short quarantine period. The dropped packets recorded in the firewall log should confirm this.

UDP-based applications, however, do suffer from this 7-second blocking period.
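
The dropped packets mentioned above can be checked in the Windows Firewall log inside the guest. The sketch below assumes that logging of dropped packets is not yet enabled and that the Domain profile is the one in use; it reads the log path from the profile instead of hardcoding it. Run it in an elevated PowerShell session inside the VM before the Live Migration, then re-run the last command afterwards.

```powershell
# Minimal sketch: enable logging of dropped packets and show recent drops.
# Assumption: the Domain profile applies; adjust -Name if another profile is active.

# Enable logging of blocked (dropped) packets on all firewall profiles.
Set-NetFirewallProfile -All -LogBlocked True

# Resolve the configured log file (the setting may contain %systemroot%).
$logSetting = (Get-NetFirewallProfile -Name Domain).LogFileName
$logPath    = [Environment]::ExpandEnvironmentVariables($logSetting)

# Show the last 20 DROP entries; run this again after the Live Migration.
Select-String -Path $logPath -Pattern ' DROP ' |
    Select-Object -Last 20 |
    ForEach-Object { $_.Line }
```

Entries that fall within the seconds right after the migration would be consistent with the quarantine behaviour described here.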



Workaround

Workaround 1 (recommended) - Persistent (InGuest reboot required)

  • Create the following registry value inside every affected VM (InGuest) to disable the Firewall Interface Quarantine:
Registry Path: HKLM\SYSTEM\CurrentControlSet\services\SharedAccess\Parameters\FirewallPolicy
Registry Value: IntfQuarantineEnabled
Value Data: 0 (DWORD, 32-bit)
  • After creating the registry value, reboot every VM on which it was set.
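
The registry change of Workaround 1 can also be applied with PowerShell. This is a minimal sketch to be run in an elevated session inside the guest; the reboot is left commented out so that it can be scheduled deliberately.

```powershell
# Minimal sketch: disable the Firewall Interface Quarantine inside the guest.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\SharedAccess\Parameters\FirewallPolicy'

# Create the DWORD value, or overwrite it if it already exists (-Force).
New-ItemProperty -Path $path -Name 'IntfQuarantineEnabled' `
    -PropertyType DWord -Value 0 -Force | Out-Null

# Verify the result.
Get-ItemProperty -Path $path -Name 'IntfQuarantineEnabled' |
    Select-Object IntfQuarantineEnabled

# The change only takes effect after a reboot of the VM.
# Restart-Computer
```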


Workaround 2 - OneTime (no reboot required)

  • Stop the service "Network List Service" inside the VM
 Get-Service netprofm | Stop-Service
  • Live Migrate the VM
  • Start the service "Network List Service" again
 Get-Service netprofm | Start-Service
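
The three steps of Workaround 2 can be combined into one script that is run on the source Hyper-V host. This is only a sketch under several assumptions: the names `vm01`, `vm01.example.local` and `hv-node02` are placeholders, the guest is reachable via PowerShell remoting (WinRM), and the VM is not clustered and its storage is reachable from both hosts. For a clustered VM, `Move-ClusterVirtualMachineRole` with `-MigrationType Live` would take the place of `Move-VM`.

```powershell
# Minimal sketch: stop netprofm in the guest, Live Migrate, start netprofm again.
# All three names below are hypothetical placeholders.
$vmName   = 'vm01'                 # VM name as shown in Hyper-V
$guest    = 'vm01.example.local'   # guest address reachable via WinRM
$destHost = 'hv-node02'            # destination Hyper-V host

# Step 1: stop the Network List Service inside the guest.
Invoke-Command -ComputerName $guest -ScriptBlock {
    Get-Service netprofm | Stop-Service
}

try {
    # Step 2: Live Migrate the (non-clustered) VM to the destination host.
    Move-VM -Name $vmName -DestinationHost $destHost
}
finally {
    # Step 3: start the service again, even if the migration failed.
    Invoke-Command -ComputerName $guest -ScriptBlock {
        Get-Service netprofm | Start-Service
    }
}
```

The `try`/`finally` block is a design choice to make sure the service is not left stopped when the migration throws an error.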


Solution

  • Install patch KB4338822 or newer.
  • Create the following registry value on all Hyper-V nodes to disable the VM notification (EventID 14):
Registry Path: HKLM\System\CurrentControlSet\Services\VmsMp\Parameters
Registry Value: SendLmNetworkChangeIndication
Value Data: 0 (DWORD, 32-bit)
  • After creating the registry value, reboot every Hyper-V node on which it was set.
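
The registry change of the solution can be applied on each Hyper-V node with PowerShell. This is a minimal sketch for an elevated session on the node; it assumes the patch has already been installed, and it creates the `Parameters` key only if it is missing.

```powershell
# Minimal sketch: disable the Live Migration network change indication on a Hyper-V node.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\VmsMp\Parameters'

# Create the key if it does not exist yet.
if (-not (Test-Path -Path $path)) {
    New-Item -Path $path -Force | Out-Null
}

# Create the DWORD value, or overwrite it if it already exists (-Force).
New-ItemProperty -Path $path -Name 'SendLmNetworkChangeIndication' `
    -PropertyType DWord -Value 0 -Force | Out-Null

# Verify the result.
Get-ItemProperty -Path $path -Name 'SendLmNetworkChangeIndication' |
    Select-Object SendLmNetworkChangeIndication

# The change only takes effect after a reboot of the node.
# Restart-Computer
```

On a cluster, it is advisable to drain each node before rebooting it so that the running VMs are moved away first.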