Flanneld: error registering network: cannot find network with management IP

  • Question

  • This is a bit of a cross-post of my Stack Overflow question, since I'm not getting any traction there.

    I'm setting up a mixed-mode Kubernetes cluster (CentOS 7 master, WS2019 worker) using Flannel in overlay (VXLAN) mode. I've gotten through the Microsoft Kubernetes for Windows instructions, but when I kick off start.ps1 I'm stuck in the "Waiting for the Network to be created" loop referenced here. Launching flanneld directly per their instructions, I get the following error:

    E0306 16:43:21.218797 2576 main.go:289] Error registering network: Cannot find network with Management IP [IPAddrofWorkerNIC].

    The IP referenced is the main IP of the worker on the "Ethernet" NIC, i.e. the NIC passed via the --iface argument to flanneld.

    The master and worker are both Hyper-V VMs on a Win10 1809 box with MAC spoofing enabled. I confirmed that 6433/tcp, 10250/tcp, 4096/udp, and 4789/udp are open in firewalld on the master. I also tried again with firewalld disabled, with no change, so I don't think the issue is on the master side.

    I tried digging through the flanneld Go code at the referenced line 289 for clues, but I'm not familiar with Go and had to concede defeat.

    Any ideas why I would be getting this error?


    • Edited by Ian.Fuller Thursday, March 7, 2019 3:28 PM title error
    Thursday, March 7, 2019 2:59 PM

Answers

  • Thank you for the reply. That info appears to be slightly outdated. The page I'm following (https://docs.microsoft.com/en-us/virtualization/windowscontainers/kubernetes/getting-started-kubernetes-windows) indicates:

    > requires either Windows Server 2019 with KB4482887 installed or Windows Server vNext Insider Preview Build 18317+

    On the other hand I just noticed it also says:

    > requires Kubernetes v1.14 (or above) with WinOverlay feature gate enabled

    and Kubernetes v1.14 won't be released until March 25: https://github.com/kubernetes/sig-release/tree/master/releases/release-1.14



    Follow-up question: Can you confirm Flannel in host-gw mode works currently without insider preview build? This looks to indicate it'll work with regular Server 2019.
    • Edited by Ian.Fuller Friday, March 8, 2019 4:42 AM follow up question
    • Marked as answer by Ian.Fuller Friday, March 8, 2019 4:13 PM
    Friday, March 8, 2019 4:36 AM

All replies

  • Greetings,

    Currently, Flannel in VXLAN (overlay) mode requires a Windows Server Insider build.

    https://github.com/Microsoft/SDN/tree/master/Kubernetes/flannel/overlay


    Sic Parvis Magna

    Friday, March 8, 2019 1:01 AM
  • Yes, I can confirm host-gw mode is working.


    Sic Parvis Magna

    Friday, March 8, 2019 7:05 AM
  • Hello,

    I've switched my master over to host-gw mode and tried to re-initialize the WS2019 worker, but I'm still getting an error.

    Command used:

    .\start.ps1 -ManagementIP [IPAddrofWorkerNIC] -NetworkMode l2bridge

    which fails with:

    Failed to find a suitable network adapter, check your network settings.
    At C:\k\helper.psm1:287 char:7

    It of course then goes on to repeat the previous error:

    E0306 16:43:21.218797 2576 main.go:289] Error registering network: Cannot find network with Management IP [IPAddrofWorkerNIC].

    I also notice that net-conf.json is getting changed to a backend name of "vxlan0" and type of "vxlan", even though the file I provided specified "cbr0"/"host-gw".
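    For reference, a host-gw net-conf.json per the Microsoft instructions looks like the following (the 10.244.0.0/16 cluster CIDR is the default used in those docs; adjust to match your cluster):

```json
{
  "Network": "10.244.0.0/16",
  "Backend": {
    "Name": "cbr0",
    "Type": "host-gw"
  }
}
```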

    Regarding the PowerShell error: I notice the script deleted (and did not recreate) the vEthernet NIC it made during the VXLAN attempt, so the helper module's Get-MgmtSubnet function fails.


    • Edited by Ian.Fuller Friday, March 8, 2019 10:13 PM formatting
    Friday, March 8, 2019 10:11 PM
  • Okay, I think I solved the problem. Stepping line by line through the PowerShell, I saw that there is a CleanupOldNetwork function, but it is called with the $NetworkName of the current run. So the overlay network from the earlier VXLAN attempt was still around, presumably consuming an endpoint needed for the new l2bridge network. I was able to confirm this with Get-HnsNetwork | ft Name, ID, Type. I used Remove-HnsNetwork to get rid of the network with the overlay type, re-ran the install, and it succeeded. Then I ran start.ps1 again and it ran normally, and kubectl get nodes confirmed the Windows node was now in the Ready state.
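    In case it helps anyone, the cleanup amounted to roughly the following (PowerShell on the worker; Get-HnsNetwork and Remove-HnsNetwork come from the HNS.psm1 helper module in the Microsoft/SDN repo, which the install scripts place in c:\k):

```powershell
# List all HNS networks with their type to spot leftovers
Get-HnsNetwork | Format-Table Name, ID, Type

# Remove the stale overlay network left behind by the earlier VXLAN attempt.
# Filtering on Type avoids deleting the network the current run needs.
Get-HnsNetwork | Where-Object { $_.Type -eq "overlay" } | Remove-HnsNetwork
```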

    • Edited by Ian.Fuller Monday, March 11, 2019 7:04 PM formatting
    Monday, March 11, 2019 7:03 PM
  • In case anyone else stumbles across this thread: I had a new problem where flanneld would complain about being unable to access vxlan0 as soon as I deployed a pod. I tried completely tearing down and rebuilding the cluster, but that didn't work. The solution was (in addition to tearing down the cluster) to remove all files associated with flannel on both the master:

    rm -rf /var/lib/cni/
    rm -rf /var/lib/kubelet/*
    rm -rf /run/flannel
    rm -rf /etc/cni/

    and the worker (where it was easier to reset the whole Kubernetes install):

    rmdir -Recurse -Force c:\var
    rmdir -Recurse -Force c:\etc
    rmdir -Recurse -Force c:\run
    Get-HNSNetwork | Remove-HNSNetwork

    etc.

    For Greg: I would recommend adjusting the CleanupOldNetwork function to clean up both network types (overlay and host-gw), not just the type requested for the current run. I would also strongly recommend an uninstall.ps1 or cleanup.ps1.
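    A minimal cleanup.ps1 along those lines might look like this (untested sketch; it assumes the standard c:\k layout and HNS.psm1 module from the Microsoft Kubernetes-for-Windows instructions):

```powershell
# cleanup.ps1 -- untested sketch; assumes the c:\k layout from the
# Microsoft Kubernetes-for-Windows instructions and the HNS.psm1 module.
Import-Module c:\k\hns.psm1

# Stop any node processes left over from start.ps1
Get-Process kubelet, kube-proxy, flanneld -ErrorAction SilentlyContinue |
    Stop-Process -Force

# Remove every HNS network, regardless of type (overlay, l2bridge, ...)
Get-HnsNetwork | Remove-HnsNetwork

# Delete the state directories flannel and the kubelet leave behind
Remove-Item -Recurse -Force c:\var, c:\etc, c:\run -ErrorAction SilentlyContinue
```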


    • Edited by Ian.Fuller Wednesday, March 13, 2019 2:53 PM clarification
    Wednesday, March 13, 2019 2:51 PM
  • Hello Ian,

    I am using VirtualBox snapshots to revert when things go bad. ;)

    I have shared my script in another of your posts.

    If you want to reset your master, you should use kubeadm reset.

    https://v1-13.docs.kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/

    Sic Parvis Magna

    Thursday, March 14, 2019 1:11 AM
  • Appreciate the reply. I had performed a kubeadm reset on the master, but that was not sufficient. I'm not certain whether the files on the master or on the worker were at fault, but my guess would be the worker.
    Thursday, March 14, 2019 2:08 AM