Windows Server 2019 SDN - NetworkController upgrade failure

    Question

  • Hello, 

    We are using Windows Server 2019 SDN, and the NetworkController application keeps failing its upgrade. This is what we see in the Microsoft-ServiceFabric/Operational log:

    Log Name:      Microsoft-ServiceFabric/Operational
    Source:        Microsoft-ServiceFabric
    Date:          5/10/2019 4:20:18 PM
    Event ID:      29621
    Task Category: CM
    Level:         Information
    Keywords:      Default
    User:          NETWORK SERVICE
    Computer:      NCE-NCVM01.dswe.local
    Description:
    Application upgrade started: Application = fabric:/NetworkController, Application Type = NetworkController, Target Application Type Version = 12.0.6.0, Upgrade Type = Rolling, Rolling Upgrade Mode = Monitored, Failure Action = Rollback
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
        <EventID>29621</EventID>
        <Version>0</Version>
        <Level>4</Level>
        <Task>115</Task>
        <Opcode>0</Opcode>
        <Keywords>0x4000000000000001</Keywords>
        <TimeCreated SystemTime="2019-05-10T13:20:18.896735300Z" />
        <EventRecordID>5192</EventRecordID>
        <Correlation />
        <Execution ProcessID="768" ThreadID="5396" />
        <Channel>Microsoft-ServiceFabric/Operational</Channel>
        <Computer>NCE-NCVM01.dswe.local</Computer>
        <Security UserID="S-1-5-20" />
      </System>
      <EventData>
        <Data Name="applicationName">fabric:/NetworkController</Data>
        <Data Name="applicationTypeName">NetworkController</Data>
        <Data Name="applicationTypeVersion">12.0.6.0</Data>
        <Data Name="upgradeType">1</Data>
        <Data Name="rollingUpgradeMode">3</Data>
        <Data Name="failureAction">1</Data>
      </EventData>
    </Event>
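The Event Xml payloads carry the same facts as the Description lines, only as raw numbers. To compare many exported events, a minimal Python sketch (standard library only) can pull out EventID, TimeCreated and the named <Data> fields. The decoding table is an assumption built solely from pairs visible in this thread (for example upgradeType 1 appears as "Upgrade Type = Rolling" in the same event). It is not taken from Service Fabric documentation, and unlisted values are returned raw.

```python
import xml.etree.ElementTree as ET

# Namespace of the Windows event schema used in the Event Xml above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# ASSUMPTION: decodings inferred only from this thread, by pairing each raw
# <Data> value with the text in the same event's Description line.
DECODE = {
    "upgradeType": {"1": "Rolling"},
    "rollingUpgradeMode": {"3": "Monitored"},
    "failureAction": {"1": "Rollback"},
    "failureReason": {"3": "UpgradeDomainTimeout"},
    "upgradeState": {"4": "RollingBack"},
}

def event_fields(xml_text: str) -> dict:
    """Return EventID, TimeCreated and every named <Data> field of one event."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    fields = {
        "EventID": int(system.find("e:EventID", NS).text),
        "TimeCreated": system.find("e:TimeCreated", NS).get("SystemTime"),
    }
    for data in root.find("e:EventData", NS).findall("e:Data", NS):
        name = data.get("Name")
        raw = (data.text or "").strip()  # empty <Data> elements become ""
        fields[name] = DECODE.get(name, {}).get(raw, raw)
    return fields

# Trimmed copy of the 29621 event shown above.
sample = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>29621</EventID>
    <TimeCreated SystemTime="2019-05-10T13:20:18.896735300Z" /></System>
  <EventData>
    <Data Name="applicationName">fabric:/NetworkController</Data>
    <Data Name="applicationTypeVersion">12.0.6.0</Data>
    <Data Name="upgradeType">1</Data>
    <Data Name="rollingUpgradeMode">3</Data>
    <Data Name="failureAction">1</Data>
  </EventData>
</Event>"""

print(event_fields(sample))
```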

    Log Name:      Microsoft-ServiceFabric/Operational
    Source:        Microsoft-ServiceFabric
    Date:          5/10/2019 4:30:23 PM
    Event ID:      29623
    Task Category: CM
    Level:         Information
    Keywords:      Default
    User:          NETWORK SERVICE
    Computer:      NCE-NCVM01.dswe.local
    Description:
    Application rollback start: Application = fabric:/NetworkController, Application Type = NetworkController, Target Application Type Version = 12.0.2.1, Failure Reason = UpgradeDomainTimeout, Overall Elapsed Time = 600085ms
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
        <EventID>29623</EventID>
        <Version>0</Version>
        <Level>4</Level>
        <Task>115</Task>
        <Opcode>0</Opcode>
        <Keywords>0x4000000000000001</Keywords>
        <TimeCreated SystemTime="2019-05-10T13:30:23.229147400Z" />
        <EventRecordID>5193</EventRecordID>
        <Correlation />
        <Execution ProcessID="768" ThreadID="9544" />
        <Channel>Microsoft-ServiceFabric/Operational</Channel>
        <Computer>NCE-NCVM01.dswe.local</Computer>
        <Security UserID="S-1-5-20" />
      </System>
      <EventData>
        <Data Name="applicationName">fabric:/NetworkController</Data>
        <Data Name="applicationTypeName">NetworkController</Data>
        <Data Name="applicationTypeVersion">12.0.2.1</Data>
        <Data Name="failureReason">3</Data>
        <Data Name="overallUpgradeElapsedTime.timespan">600085</Data>
      </EventData>
    </Event>

    Log Name:      Microsoft-ServiceFabric/Operational
    Source:        Microsoft-ServiceFabric
    Date:          5/10/2019 4:51:23 PM
    Event ID:      29626
    Task Category: CM
    Level:         Information
    Keywords:      Default
    User:          NETWORK SERVICE
    Computer:      NCE-NCVM01.dswe.local
    Description:
    Application upgrade domain completed: Application = fabric:/NetworkController, Application Type = NetworkController, Target Application Type Version = 12.0.2.1, Upgrade State = RollingBack, Upgrade Domains = (NCE-NCVM01.dswe.local), Upgrade Domain Elapsed Time = 1204436ms
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
        <EventID>29626</EventID>
        <Version>0</Version>
        <Level>4</Level>
        <Task>115</Task>
        <Opcode>0</Opcode>
        <Keywords>0x4000000000000001</Keywords>
        <TimeCreated SystemTime="2019-05-10T13:51:23.426171400Z" />
        <EventRecordID>5194</EventRecordID>
        <Correlation />
        <Execution ProcessID="768" ThreadID="9488" />
        <Channel>Microsoft-ServiceFabric/Operational</Channel>
        <Computer>NCE-NCVM01.dswe.local</Computer>
        <Security UserID="S-1-5-20" />
      </System>
      <EventData>
        <Data Name="applicationName">fabric:/NetworkController</Data>
        <Data Name="applicationTypeName">NetworkController</Data>
        <Data Name="applicationTypeVersion">12.0.2.1</Data>
        <Data Name="upgradeState">4</Data>
        <Data Name="upgradeDomains">(NCE-NCVM01.dswe.local)</Data>
        <Data Name="upgradeDomainElapsedTime.timespan">1204436</Data>
      </EventData>
    </Event>
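Lining up the TimeCreated stamps (UTC) of these three events makes the sequence easier to follow. The rollback starts about 604 s after the upgrade starts, which is consistent with the 10-minute UD timeout in the Admin log warning below. Here is a minimal Python sketch. The only subtlety it handles is that SystemTime carries more than six fractional digits, so the fraction is trimmed to microseconds before parsing.

```python
from datetime import datetime, timezone

# TimeCreated values of the three Operational events above.
EVENTS = [
    (29621, "upgrade started", "2019-05-10T13:20:18.896735300Z"),
    (29623, "rollback start", "2019-05-10T13:30:23.229147400Z"),
    (29626, "UD completed (RollingBack)", "2019-05-10T13:51:23.426171400Z"),
]

def parse_system_time(value: str) -> datetime:
    """Parse an event SystemTime, trimming the fraction to microseconds."""
    stamp, _, frac = value.rstrip("Z").partition(".")
    base = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    micros = int((frac + "000000")[:6])
    return base.replace(microsecond=micros, tzinfo=timezone.utc)

def timeline(events):
    """Yield (event id, label, seconds elapsed since the previous event)."""
    previous = None
    for event_id, label, stamp in events:
        t = parse_system_time(stamp)
        delta = (t - previous).total_seconds() if previous is not None else 0.0
        yield event_id, label, delta
        previous = t

for event_id, label, delta in timeline(EVENTS):
    print(f"{event_id} {label:<28} +{delta:8.1f}s")
```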

    In the Microsoft-ServiceFabric/Admin log we see many warnings like these:

    Log Name:      Microsoft-ServiceFabric/Admin
    Source:        Microsoft-ServiceFabric
    Date:          5/10/2019 4:20:14 PM
    Event ID:      55809
    Task Category: FileStoreService
    Level:         Warning
    Keywords:      Default
    User:          NETWORK SERVICE
    Computer:      NCE-NCVM01.dswe.local
    Description:
    The request failed due to FABRIC_E_FILE_NOT_FOUND. StoreRelativePath:Store\NetworkController\SDNSLBM.Code.12.0.6
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
        <EventID>55809</EventID>
        <Version>0</Version>
        <Level>3</Level>
        <Task>218</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8000000000000001</Keywords>
        <TimeCreated SystemTime="2019-05-10T13:20:14.112331300Z" />
        <EventRecordID>264055</EventRecordID>
        <Correlation />
        <Execution ProcessID="4232" ThreadID="1680" />
        <Channel>Microsoft-ServiceFabric/Admin</Channel>
        <Computer>NCE-NCVM01.dswe.local</Computer>
        <Security UserID="S-1-5-20" />
      </System>
      <EventData>
        <Data Name="id">[(00000000-0000-0000-0000-000000003000:131955729520045166)+d2d515f0-89da-4c80-8d4b-3d9aed5957bb:0]</Data>
        <Data Name="type">ProcessRequestAsyncOperation</Data>
        <Data Name="text">The request failed due to FABRIC_E_FILE_NOT_FOUND. StoreRelativePath:Store\NetworkController\SDNSLBM.Code.12.0.6</Data>
      </EventData>
    </Event>
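The StoreRelativePath in this warning appears to name the item missing from the cluster's image store. Assume the layout is Store\<ApplicationType>\<ServiceManifest>.<Package>.<Version>; this is an assumption, not something confirmed in this thread. On that reading, the path refers to the Code package of the SDNSLBM service manifest at version 12.0.6, which would tie it to the 12.0.6.0 upgrade target that starts a few seconds later. A minimal Python sketch of that split follows. It also assumes the manifest name contains no dots.

```python
from pathlib import PureWindowsPath

def split_store_path(store_relative_path: str) -> dict:
    """Split Store\\<AppType>\\<Manifest>.<Package>.<Version> (ASSUMED layout)."""
    parts = PureWindowsPath(store_relative_path).parts
    if len(parts) != 3 or parts[0] != "Store":
        raise ValueError(f"unexpected layout: {store_relative_path}")
    # ASSUMPTION: the manifest name has no dots; the version may contain dots.
    manifest, package, version = parts[2].split(".", 2)
    return {
        "applicationType": parts[1],
        "serviceManifest": manifest,
        "package": package,
        "version": version,
    }

print(split_store_path(r"Store\NetworkController\SDNSLBM.Code.12.0.6"))
```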

    Log Name:      Microsoft-ServiceFabric/Admin
    Source:        Microsoft-ServiceFabric
    Date:          5/10/2019 4:30:18 PM
    Event ID:      29441
    Task Category: CM
    Level:         Warning
    Keywords:      Default
    User:          NETWORK SERVICE
    Computer:      NCE-NCVM01.dswe.local
    Description:
    [(00000000-0000-0000-0000-000000002000:131955729520045166)+826bd87a-0708-4aa1-8780-3bda3401aea1:0] monitored upgrade timed out (Rollback): persisted[overall=10:00.081 UD=10:00.081 health=00.000] stopwatch[upgrade=00.003 health=00.000] timeouts[overall=1:00:00.000 ud=10:00.000]
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-ServiceFabric" Guid="{cbd93bc2-71e5-4566-b3a7-595d8eeca6e8}" />
        <EventID>29441</EventID>
        <Version>0</Version>
        <Level>3</Level>
        <Task>115</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8000000000000001</Keywords>
        <TimeCreated SystemTime="2019-05-10T13:30:18.979737900Z" />
        <EventRecordID>264061</EventRecordID>
        <Correlation />
        <Execution ProcessID="768" ThreadID="9544" />
        <Channel>Microsoft-ServiceFabric/Admin</Channel>
        <Computer>NCE-NCVM01.dswe.local</Computer>
        <Security UserID="S-1-5-20" />
      </System>
      <EventData>
        <Data Name="id">
        </Data>
        <Data Name="type">ApplicationUpgradeContext</Data>
        <Data Name="text">[(00000000-0000-0000-0000-000000002000:131955729520045166)+826bd87a-0708-4aa1-8780-3bda3401aea1:0] monitored upgrade timed out (Rollback): persisted[overall=10:00.081 UD=10:00.081 health=00.000] stopwatch[upgrade=00.003 health=00.000] timeouts[overall=1:00:00.000 ud=10:00.000]</Data>
      </EventData>
    </Event>
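This warning packs the deciding numbers into bracketed groups. The persisted upgrade-domain time (UD=10:00.081) is just past the UD timeout (ud=10:00.000), while the overall timeout (1:00:00.000) is nowhere near reached. This is consistent with "Failure Reason = UpgradeDomainTimeout" in the rollback event. Here is a minimal Python sketch that parses the text. The reading of the durations as [h:]mm:ss.fff is an assumption based on this one line.

```python
import re
from datetime import timedelta

def parse_duration(text: str) -> timedelta:
    """Parse '[h:]mm:ss.fff' or 'ss.fff' (ASSUMED format) into a timedelta."""
    seconds = 0.0
    for value in (float(x) for x in text.split(":")):
        seconds = seconds * 60 + value
    return timedelta(seconds=seconds)

def parse_timeout_warning(text: str) -> dict:
    """Extract the persisted[...], stopwatch[...] and timeouts[...] groups."""
    groups = {}
    for section in ("persisted", "stopwatch", "timeouts"):
        body = re.search(rf"{section}\[([^\]]*)\]", text).group(1)
        for key, value in re.findall(r"(\w+)=([\d:.]+)", body):
            groups[f"{section}.{key}"] = parse_duration(value)
    return groups

# Text of the 29441 warning above.
warning = (
    "[(00000000-0000-0000-0000-000000002000:131955729520045166)"
    "+826bd87a-0708-4aa1-8780-3bda3401aea1:0] monitored upgrade timed out "
    "(Rollback): persisted[overall=10:00.081 UD=10:00.081 health=00.000] "
    "stopwatch[upgrade=00.003 health=00.000] "
    "timeouts[overall=1:00:00.000 ud=10:00.000]"
)
g = parse_timeout_warning(warning)
ud_exceeded = g["persisted.UD"] > g["timeouts.ud"]
overall_exceeded = g["persisted.overall"] > g["timeouts.overall"]
print(ud_exceeded, overall_exceeded)  # True False
```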


    The upgrade/rollback sequence then starts again, and the same entries repeat in the logs. The NetworkController is otherwise working, i.e. we have had no trouble with the SDN so far, but because it keeps trying to upgrade and failing, we cannot perform some operations, such as backing up the NC database. Can someone please help?
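One way to confirm this loop from an exported log is to follow the Operational event IDs in order: 29621 (upgrade started), 29623 (rollback start), 29626 (upgrade domain completed). Here is a minimal Python sketch. It assumes the events have already been reduced to a chronological list of IDs, and the sample sequence is hypothetical, not taken from the poster's log.

```python
UPGRADE_STARTED = 29621   # "Application upgrade started"
ROLLBACK_STARTED = 29623  # "Application rollback start"

def upgrade_attempts(event_ids):
    """Group a chronological list of event IDs into attempts, each opened by 29621."""
    attempts = []
    for event_id in event_ids:
        if event_id == UPGRADE_STARTED:
            attempts.append([event_id])
        elif attempts:  # ignore events seen before the first upgrade start
            attempts[-1].append(event_id)
    return attempts

def failed_attempts(event_ids):
    """Count attempts that contain a rollback start."""
    return sum(ROLLBACK_STARTED in attempt for attempt in upgrade_attempts(event_ids))

# Hypothetical sequence: the same three events repeating twice.
sequence = [29621, 29623, 29626, 29621, 29623, 29626]
print(failed_attempts(sequence))  # 2
```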


    Tuesday, May 14, 2019 2:33 PM

All replies

  • Can you elaborate on SDN? I am not familiar. 

    Also, are you hosting your cluster in Azure or on prem? 

    Wednesday, May 15, 2019 3:58 PM
    Moderator
  • Hello, 

    We are using Microsoft's SDN on premises: Hyper-V on Windows Server 2019 + VMM 2019, with the Network Controller role installed on Windows Server 2019 Core. Everything was installed as per hxxts://docs.microsoft.com/en-us/windows-server/networking/sdn/deploy/deploy-a-software-defined-network-infrastructure-using-scripts and hxxts://github.com/MicrosoftDocs/windowsserverdocs/tree/master/WindowsServerDocs/networking/sdn

    Friday, May 17, 2019 9:41 AM
  • If you are using Windows Server Core, it does not appear to be a supported OS for on-premises Service Fabric clusters, which might be the issue. 

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-standalone-clusters-overview

    Friday, May 17, 2019 6:33 PM
    Moderator
  • Well, that's a good point, but we don't think the SDN people considered it. The Network Controller role for on-premises SDN deployments is supported on Server Core. To reiterate, this is a brand-new deployment that's working quite well, apart from the fact that the Network Controller app can't upgrade itself. We can't tell whether the problem is with the application itself or with the underlying SF cluster. Could someone at Microsoft please identify who this problem should be addressed to and advise on what to do next?
    Monday, May 20, 2019 10:38 AM
    I would lean more towards an issue with SDN, but it's hard to say without really digging into the cluster. You would likely get the best results by opening a support ticket, so we can collaborate with all the right teams to get it sorted out. 

    If you don't have the ability to open a technical support ticket, you can email me at AzCommunity@microsoft.com and provide me with your Azure SubscriptionID and link to this thread. I can then enable you for a free support request. 

    Monday, May 20, 2019 6:44 PM
    Moderator
  • Any update on this issue? 
    Friday, May 31, 2019 7:11 PM
    Moderator
  • Windows Server Core is the installation option without the desktop experience; I guess SF supports running on servers without a UI.

    On the other hand, the doc you mentioned does not list Windows Server 2019 as a supported OS, but support for it has been announced here:

    Azure Service Fabric 6.4 Refresh Release

    It states:

    "We now support Windows Server 2019 and Windows Server 1809 for Azure and Standalone Service Fabric Clusters starting from 6.4.654.9590. "

    Wednesday, June 5, 2019 6:51 AM
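Checking a cluster against that statement comes down to comparing dotted version numbers component by component, not as strings. As strings, "6.4.1000.0" would sort before "6.4.654.9590". Here is a minimal Python sketch. How you read the installed Service Fabric runtime version on the Network Controller nodes is outside this sketch, and the versions other than 6.4.654.9590 are made-up examples.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '6.4.654.9590' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Minimum version quoted in the "Azure Service Fabric 6.4 Refresh Release" note.
MIN_FOR_SERVER_2019 = parse_version("6.4.654.9590")

def supports_server_2019(runtime_version: str) -> bool:
    """True if the given runtime version is at or above the quoted minimum."""
    return parse_version(runtime_version) >= MIN_FOR_SERVER_2019

print(supports_server_2019("6.4.654.9590"))  # True
print(supports_server_2019("6.4.637.9590"))  # False (hypothetical older build)
```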