Cannot create cluster with X509 certificates (CommonNames, not thumbprints)

  • Question

  • The last time I created an X509-secured cluster (on premises), I used certificate thumbprints in the configuration (that was SF 6.1).

    This time, with v6.3, I wanted to try common names instead, but cluster creation fails.

    Test-Configuration runs fine, but running CreateServiceFabricCluster.ps1 fails with an obscure error:

    System.InvalidOperationException: There is an error in XML document (136, 8). ---> System.Xml.Schema.XmlSchemaValidationException: The required attribute 'Value' is missing.
       at System.Xml.Schema.XmlSchemaValidator.SendValidationEvent(XmlSchemaValidationException e, XmlSeverityType severity)
       at System.Xml.Schema.XmlSchemaValidator.SendValidationEvent(String code, String arg)
       at System.Xml.Schema.XmlSchemaValidator.CheckRequiredAttributes(SchemaElementDecl currentElementDecl)
       at System.Xml.Schema.XmlSchemaValidator.ValidateEndOfAttributes(XmlSchemaInfo schemaInfo)
       at System.Xml.XsdValidatingReader.ProcessElementEvent()
       at System.Xml.XsdValidatingReader.Read()
       at System.Xml.XmlReader.MoveToContent()
       at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read35_SettingsOverridesTypeSection(Boolean isNullable, Boolean checkType)
       at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read37_ClusterManifestType(Boolean isNullable, Boolean checkType)
       at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read293_ClusterManifest()

    The cluster configuration:

    {
        "name": "SampleCluster",
        "clusterConfigurationVersion": "1.0.0",
        "apiVersion": "10-2017",
        "nodes": [
            {
                "nodeName": "v-sfc06-n0",
                "iPAddress": "v-sfc06-n0.c.k.pl",
                "nodeTypeRef": "NodeType0",
                "faultDomain": "fd:/dc1/r0",
                "upgradeDomain": "UD0"
            },
            {
                "nodeName": "v-sfc06-n1",
                "iPAddress": "v-sfc06-n1.c.k.pl",
                "nodeTypeRef": "NodeType0",
                "faultDomain": "fd:/dc1/r1",
                "upgradeDomain": "UD1"
            },
            {
                "nodeName": "v-sfc06-n2",
                "iPAddress": "v-sfc06-n2.c.k.pl",
                "nodeTypeRef": "NodeType0",
                "faultDomain": "fd:/dc1/r2",
                "upgradeDomain": "UD2"
            }
        ],
        "properties": {
           "EnableTelemetry": false,
           "FabricClusterAutoupgradeEnabled": false,
           "diagnosticsStore":
            {
                "dataDeletionAgeInDays": "7",
                "storeType": "FileShare",
                "connectionstring": "e:\\SF\\DiagnosticsStore"
            },
            "security": {
                "ClusterCredentialType": "X509",
                "ServerCredentialType": "X509",
                "CertificateInformation": {
                    "ClusterCertificateCommonNames": {
                      "CommonNames": [
                        {
                            "CertificateCommonName": "V-SFC06"
                        }
                      ],
                      "X509StoreName": "My"
                    },
                    "ClusterCertificateIssuerStores": [
                        {
                             "IssuerCommonName": "A B Cert K",
                             "X509StoreNames" : "Root"
                        }
                    ],
                    "ServerCertificateCommonNames": {
                      "CommonNames": [
                        {
                            "CertificateCommonName": "V-SFC06-Server"
                        }
                      ],
                      "X509StoreName": "My"
                    },
                    "ServerCertificateIssuerStores": [
                        {
                            "IssuerCommonName": "A B Cert K",
                            "X509StoreNames" : "Root"
                        }
                    ],
                    "ReverseProxyCertificateCommonNames": {
                      "CommonNames": [
                          {
                            "CertificateCommonName": "V-SFC06-RevProxy"
                          }
                        ],
                        "X509StoreName": "My"
                    },
                    "ClientCertificateCommonNames": [
                        {
                            "CertificateCommonName": "V-SFC06-Admin",
                            "IsAdmin": true
                        },
                        {
                            "CertificateCommonName": "V-SFC06-Client",
                            "IsAdmin": false
                        }
                    ],
                    "ClientCertificateIssuerStores": [
                        {
                            "IssuerCommonName": "A B Cert K",
                            "X509StoreNames": "Root"
                        }
                    ]
                }
            },
            "nodeTypes": [
                {
                    "Name": "NodeType0",
                    "ClientConnectionEndpointPort": "19000",
                    "HttpGatewayEndpointPort": "19080",
                    "LeaseDriverEndpointPort": "19002",
                    "ClusterConnectionEndpointPort": "19001",
                    "ServiceConnectionEndpointPort": "19003",
                    "ApplicationPorts": {
                      "StartPort": "50000",
                      "EndPort": "51000"
                    },
                    "IsPrimary": true
                }
            ],
            "fabricSettings": [
                {
                    "name": "Setup",
                    "parameters": [
                        {
                            "name": "FabricDataRoot",
                            "value": "E:\\SF"
                        },
                        {
                            "name": "FabricLogRoot",
                            "value": "E:\\SF\\Log"
                        }
                    ]
                }
            ]
        }
    }

    The configuration is based on the provided sample. What's wrong with it?

    Thursday, August 2, 2018 1:27 PM

Answers

  • I reached out to some of the product group members.

    Here is the doc for standalone on-premises clusters: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-windows-cluster-x509-security

    As a mitigation, specify the CertificateIssuerThumbprint parameter in each ClientCertificateCommonNames entry:

    "ClientCertificateCommonNames": [
        {
            "CertificateCommonName": "V-SFC06-Admin",
            "CertificateIssuerThumbprint": "[Thumbprint]",
            "IsAdmin": true
        },
        {
            "CertificateCommonName": "V-SFC06-Client",
            "CertificateIssuerThumbprint": "[Thumbprint]",
            "IsAdmin": false
        }
    ]

    What you were seeing could have been related to a bug, but it should have been fixed in 6.3. I am confirming this offline as well.
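When a cluster config has many client certificate entries, the mitigation above can be applied mechanically. The sketch below adds a CertificateIssuerThumbprint key to every ClientCertificateCommonNames entry that lacks one. It defaults to an empty string, following the empty-thumbprint workaround mentioned later in this thread; pass a real issuer thumbprint instead if you have one. The function names are hypothetical helpers, not part of the Service Fabric standalone package, and the case-insensitive lookup is an assumption because the samples in this thread use both "security" and "Security".

```python
import json


def _get_ci(d: dict, key: str, default):
    # Case-insensitive lookup (assumption: key casing varies between samples,
    # e.g. "security" vs "Security" in this thread).
    for k, v in d.items():
        if k.lower() == key.lower():
            return v
    return default


def add_issuer_thumbprints(config: dict, thumbprint: str = "") -> int:
    """Add CertificateIssuerThumbprint to each ClientCertificateCommonNames
    entry that lacks it; return how many entries were changed."""
    props = _get_ci(config, "properties", {})
    security = _get_ci(props, "security", {})
    cert_info = _get_ci(security, "CertificateInformation", {})
    changed = 0
    for entry in _get_ci(cert_info, "ClientCertificateCommonNames", []):
        if "CertificateIssuerThumbprint" not in entry:
            entry["CertificateIssuerThumbprint"] = thumbprint
            changed += 1
    return changed


def patch_config_file(path: str, thumbprint: str = "") -> int:
    # utf-8-sig also accepts files that were saved with a byte-order mark.
    with open(path, encoding="utf-8-sig") as f:
        config = json.load(f)
    changed = add_issuer_thumbprints(config, thumbprint)
    # Rewrites the file with 4-space indentation: key order is kept,
    # the original whitespace is not.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return changed
```

For example, `patch_config_file(r"C:\Temp\ClusterConfig.json")` (a hypothetical path) would patch the file in place before rerunning CreateServiceFabricCluster.ps1.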

    Tuesday, August 14, 2018 8:36 PM

All replies

  • I have successfully changed the server certificates of a local three-node cluster by running:

    Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath C:\Temp\sfc-05\sf.newCerts.p1.json

    I needed to do it in two steps: one upgrade for the primary certificate and one for the secondary.

    However, I cannot change the cluster certificates this way. Whenever I start a configuration upgrade with either the primary or the secondary cluster certificate changed, I get a validation error saying that I cannot upgrade two certificates at once...

    Am I missing something, or should this work?

    The error is:

    VERBOSE: System.Fabric.FabricException: System.Runtime.InteropServices.COMException (-2147017627)
    ValidationException: Upgrading from 2 different certificates to 2 different certificates is not allowed. ---> System.Runtime.InteropServices.COMException: Exception from HRESULT: 0x80071C65
       at System.Fabric.Interop.NativeClient.IFabricClusterManagementClient10.EndUpgradeConfiguration(IFabricAsyncOperationContext context)
       at System.Fabric.Interop.Utility.<>c__DisplayClass22_0.<WrapNativeAsyncInvoke>b__0(IFabricAsyncOperationContext context)
       at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
       --- End of inner exception stack trace ---
    Start-ServiceFabricClusterConfigurationUpgrade : System.Runtime.InteropServices.COMException (-2147017627)
    ValidationException: Upgrading from 2 different certificates to 2 different certificates is not allowed.
    At line:1 char:1
    + Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath C:\ ...
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFa...gurationUpgrade], FabricException
        + FullyQualifiedErrorId : StartClusterConfigurationUpgradeErrorId,Microsoft.ServiceFabric.Powershell.StartClusterConfigurationUpgrade
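One literal reading of "Upgrading from 2 different certificates to 2 different certificates is not allowed" is that a single configuration upgrade may not move from one pair of distinct cluster-certificate thumbprints to a different pair. That reading would explain why changing only the primary or only the secondary is still rejected. The sketch below encodes that reading so a planned sequence of ClusterCertificate blocks can be checked before calling Start-ServiceFabricClusterConfigurationUpgrade. This rule is an assumption inferred from the wording of the error, not documented Service Fabric validation logic; the function names and the "NEW-THUMBPRINT" placeholder are hypothetical.

```python
def distinct_thumbprints(cert: dict) -> set:
    """Distinct, non-empty thumbprints in a ClusterCertificate block,
    upper-cased with surrounding whitespace removed."""
    values = (cert.get("Thumbprint"), cert.get("ThumbprintSecondary"))
    return {v.strip().upper() for v in values if v and v.strip()}


def rejected_by_literal_reading(old: dict, new: dict) -> bool:
    """Hypothetical rule read off the error text: going from two distinct
    certificates to a different set of two distinct certificates fails."""
    before, after = distinct_thumbprints(old), distinct_thumbprints(new)
    return len(before) == 2 and len(after) == 2 and before != after


def check_plan(steps: list) -> list:
    """Given successive ClusterCertificate blocks, return the indices of the
    steps whose transition from the previous step this rule would reject."""
    return [i for i in range(1, len(steps))
            if rejected_by_literal_reading(steps[i - 1], steps[i])]


current = {"Thumbprint": "550C73BAD64E6A29FE5E9DABE99120D2FB51249E",
           "ThumbprintSecondary": "E222114CD4548394A58B6743CAC6AD4D54461C9E"}
only_primary = {"Thumbprint": "550C73BAD64E6A29FE5E9DABE99120D2FB51249E"}
new_secondary = {"Thumbprint": "550C73BAD64E6A29FE5E9DABE99120D2FB51249E",
                 "ThumbprintSecondary": "NEW-THUMBPRINT"}  # hypothetical
```

Under this reading, replacing one certificate directly fails (`check_plan([current, new_secondary])` flags step 1), while a 2 → 1 → 2 sequence (first drop the certificate being retired, then add the new one as secondary) passes every step. Whether the real validator behaves this way would need to be confirmed.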

    Thursday, August 2, 2018 12:52 PM
  • Have you tried running the Service Fabric cleanup script to remove any old instances of SF before deploying again?
    Thursday, August 2, 2018 7:49 PM
  • Well, the VMs were new; there was no previous SF installation.

    But I ran CleanFabric.ps1 anyway, and it reported:

    Registry key is already removed.
    FabricHostSvc is already removed.
    FabricInstallerSvc is already removed.
    FabricRoot autodetermined at 'C:\Program Files\Microsoft Service Fabric'.
    FabricCodePath not present.
    Fabric is already uninstalled.

    I also found the XML document being validated (in the temp directory). It is called SampleCluster.1.0.0.ClusterManifest.xml, and starting at line 134 it looks like this:

        <Section Name="Security/AdminClientX509Names">
          <Parameter Name="V-SFC06" Value="" />
          <Parameter Name="V-SFC06-Admin" />
        </Section>
        <Section Name="Security/ClientCertificateIssuerStores">
          <Parameter Name="Glowny Urzad Certyfikacji KG" Value="Root" />
        </Section>
        <Section Name="Security/ClientX509Names">
          <Parameter Name="V-SFC06" Value="" />
          <Parameter Name="V-SFC06-Admin" />
          <Parameter Name="V-SFC06-Client" />
        </Section>
        <Section Name="Security/ClusterCertificateIssuerStores">
          <Parameter Name="Glowny Urzad Certyfikacji KG" Value="Root" />
        </Section>
        <Section Name="Security/ClusterX509Names">
          <Parameter Name="V-SFC06" Value="" />
        </Section>
        <Section Name="Security/ServerCertificateIssuerStores">
          <Parameter Name="Glowny Urzad Certyfikacji KG" Value="Root" />
        </Section>
        <Section Name="Security/ServerX509Names">
          <Parameter Name="V-SFC06-Server" Value="" />
        </Section>

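    The Value attribute is missing from the Parameter elements for the Admin and Client certificates. The first of them is on line 136, which matches the position (136, 8) reported in the exception.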
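To check a generated manifest for this problem without reading it by hand, the sketch below lists every Parameter element that has no Value attribute, together with the Name of its enclosing Section. It compares local tag names only, so it works whether or not the manifest declares an XML namespace. The function names are hypothetical; only the Section/Parameter/Name/Value structure is taken from the excerpt above.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Strip any "{namespace}" prefix so the check ignores xmlns declarations.
    return tag.rsplit("}", 1)[-1]


def find_params_without_value(root: ET.Element) -> list:
    """Return (section name, parameter name) pairs for <Parameter>
    elements that lack a Value attribute."""
    missing = []
    for section in root.iter():
        if _local(section.tag) != "Section":
            continue
        for param in section:
            if _local(param.tag) == "Parameter" and "Value" not in param.attrib:
                missing.append((section.get("Name"), param.get("Name")))
    return missing
```

Applied to the manifest from the temp directory, e.g. `find_params_without_value(ET.parse(path).getroot())`, it should report the V-SFC06-Admin and V-SFC06-Client parameters shown above; `ET.parse` handles the file's declared encoding itself.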

    So I tried adding X509StoreName to the client certificate section like this:

                    "ClientCertificateCommonNames": [
                        {
                            "CertificateCommonName": "V-SFC06-Admin",
                            "IsAdmin": true
                        },
                        {
                            "CertificateCommonName": "V-SFC06-Client",
                            "IsAdmin": false
                        }
                    ],
                    "X509StoreName": "My",
                    "ClientCertificateIssuerStores": [
                        {
                            "IssuerCommonName": "Glowny Urzad Certyfikacji KG",
                            "X509StoreNames": "Root"
                        }
                    ]

    but it did not help: the generated XML manifest still has no Value attributes for the client certificates...

    Friday, August 3, 2018 7:46 AM
  • Is it possible to change the ClusterCredentialType of a running cluster?

    Can it be done through Start-ServiceFabricClusterConfigurationUpgrade?

    I have changed the cluster configuration from:

          "ClusterCredentialType": "X509",
          "ServerCredentialType": "X509",
          "CertificateInformation": {
            "ClusterCertificate": {
              "Thumbprint": "550C73BAD64E6A29FE5E9DABE99120D2FB51249E",
              "ThumbprintSecondary": "E222114CD4548394A58B6743CAC6AD4D54461C9E",
              "X509StoreName": "My"
            },
            "ServerCertificate": {
              "Thumbprint": "550C73BAD64E6A29FE5E9DABE99120D2FB51249E",
              "ThumbprintSecondary": "5BF4CE3DDF416C381D2EC560750466DF77DC9F77",
              "X509StoreName": "My"
            },

    to:

        "Security": {
          "ClusterCredentialType": "Windows",
          "WindowsIdentities": {
            "ClusterIdentity": "domain\\sfc05-nodes"
          },
          "ServerCredentialType": "X509",
          "CertificateInformation": {
            "ServerCertificate": {
              "Thumbprint": "550C73BAD64E6A29FE5E9DABE99120D2FB51249E",
              "ThumbprintSecondary": "5BF4CE3DDF416C381D2EC560750466DF77DC9F77",
              "X509StoreName": "My"
            },

    but when I start the upgrade, Start-ServiceFabricClusterConfigurationUpgrade finishes silently...

    Get-ServiceFabricClusterConfigurationUpgradeStatus returns the status of the previous upgrade...

    Friday, August 3, 2018 8:34 PM
  • Hi Sebastian, I saw you have multiple threads open about similar issues.

    What is the current status of the problem, and where should we start to help get this resolved for you?

    Tuesday, August 7, 2018 6:59 PM
  • Nothing new; I had no time for SF last week.

    The problem is not urgent: we are preparing to go to production with SF and are testing possible scenarios on a test cluster. The first question (changing the cluster certificate) seems most important; the second is setting up clusters with certificate common names.

    Monday, August 13, 2018 6:36 AM
  • Thanks for the update, Sebastian.

    Have you reviewed our documentation on managing certificates in Service Fabric?

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-security-update-certs-azure

    as well as our doc on changing certificates from thumbprints to common names?

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-change-cert-thumbprint-to-cn

    Monday, August 13, 2018 7:15 PM
  • Micah,

    that documentation describes managing Service Fabric clusters hosted on Azure.

    I may not have stated it clearly enough, but we work with SF on premises: standalone local clusters on VMs running Windows Server 2016:

    "Last time I created an X509 secured cluster (on premises) I used ..." or "I have successfully changed the server certificates of a local, 3 node cluster by ..."

    I can use neither the Azure portal nor the Resource Manager PowerShell commands...

    Tuesday, August 14, 2018 6:37 AM
  • Got it. I am working offline with the Service Fabric product team to get this sorted out. Will update you once I hear back with more. 
    Tuesday, August 14, 2018 7:37 PM
  • I was also informed that a workaround could be to pass an empty string for the thumbprint.
    Wednesday, August 15, 2018 10:44 PM
  • Thanks, adding empty certificate issuer's thumbprints helped - in a way. :-)

    I no longer get the validation error, and the cluster is "successfully" created, but it hardly works.

    I can connect to it through either PowerShell or the management portal (https://...:19800/), but then I only get errors:

    Error: Get cluster health failed. Code: FABRIC_E_SERVER_AUTHENTICATION_FAILED Message: FABRIC_E_SERVER_AUTHENTICATION_FAILED: CertificateNotMatched
    Error: Get applications failed. Code: FABRIC_E_SERVER_AUTHENTICATION_FAILED Message: FABRIC_E_SERVER_AUTHENTICATION_FAILED: CertificateNotMatched
    Error: Get application health failed. Code: FABRIC_E_SERVER_AUTHENTICATION_FAILED Message: FABRIC_E_SERVER_AUTHENTICATION_FAILED: CertificateNotMatched
    Error: Get nodes failed. Code: FABRIC_E_SERVER_AUTHENTICATION_FAILED Message: FABRIC_E_SERVER_AUTHENTICATION_FAILED: CertificateNotMatched

    I'm investigating the issues. Should I continue in this thread, or mark it as answered and start a new one?

    Any news on the problem with changing the cluster certificate through a configuration upgrade?

    Sebastian

    Wednesday, August 22, 2018 3:03 PM
  • Hi Sebastian. I was unable to get any additional information beyond the doc I shared on changing cluster certificates on premises.

    If needed, we can put you in touch with a support engineer who can work with you over a phone call and screen share to get this settled for good.

    Thursday, August 23, 2018 9:30 PM