
Service Fabric HA (standalone)

  • Question

  • Last week I tried to upgrade one of our test standalone SF clusters, consisting of 3 nodes.

    It was at 6.5.639 (the initial 6.5 release) and I tried to upgrade it to 6.5.664.

    The Copy-ServiceFabricClusterPackage call failed, and I decided to restart node 0.

    After the restart, node 0 never rejoined the cluster.

    As it is a test cluster, I'm not going to spend too much time investigating why it failed. I hoped I'd be able to add another node and then remove the faulty one, but adding a node to the faulty cluster (2 nodes available, one down) failed too.

    I was able to successfully add a node to a 3-node cluster earlier (but that cluster was healthy at the time).

    The question is: is adding a node to a cluster with fewer than 3 nodes supported?

    Is that the reason why a production cluster must have at least 5 nodes?

    Tuesday, September 17, 2019 8:22 PM

All replies

  • Yes, you can add a node to a cluster with fewer than 3 nodes, depending on the kind of workload. Please note that add/remove node functionality is not supported in local development clusters.

    Please refer to this similar Stack Overflow post:

    Here is the documentation for adding a new node to an existing standalone cluster:

    Please try the steps in the document above and let me know if that resolves the issue. If not, I can dig deeper.
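
    For anyone following along: the AddNode.ps1 script ships in the standalone deployment package, and a secure-cluster invocation looks roughly like the sketch below (node name, endpoint, domains, and thumbprint are placeholders, not values from this thread):

    ```powershell
    # Run from the extracted standalone package on the machine joining the cluster.
    # All parameter values are placeholders for this sketch.
    .\AddNode.ps1 -NodeName 'v-sfc01-n3' `
        -NodeType 'NodeType0' `
        -NodeIPAddressorFQDN 'v-sfc01-n3.contoso.local' `
        -ExistingClientConnectionEndpoint 'v-sfc01-n0:19000' `
        -UpgradeDomain 'UD3' `
        -FaultDomain 'fd:/dc1/r0' `
        -X509Credential -ServerCertThumbprint '<thumbprint>' `
        -AcceptEULA
    ```

    After the script completes, the standalone docs also call for running Start-ServiceFabricClusterConfigurationUpgrade with an updated config JSON so the new node is persisted in the cluster configuration.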

    Wednesday, September 18, 2019 11:53 PM
  • Any update on this issue?

    If the above helped please remember to "Up-vote" and "Mark as Answer" so others in the community can benefit. 

    Friday, September 20, 2019 11:35 PM
  • Sorry for the delay, but I need some time to confirm whether I will be able to add a node both to a single-node cluster and to a 3-node cluster with one node down (that's what I tried, and it failed with a timeout exception - but maybe there were other problems).

    As I already wrote, I read and used the documentation for adding nodes, and successfully scaled a cluster out from 3 to 5 nodes.


    Monday, September 23, 2019 6:28 PM
  • Thanks for the update.
    Monday, September 23, 2019 8:28 PM
  • So far, no luck. I successfully added a second node - i.e. AddNode.ps1 finished successfully and showed two nodes. Connecting to Service Fabric Explorer via the web also showed two healthy nodes.

    But trying to upgrade configuration resulted in an error:

    PS C:\tfs\DevOps\ServiceFabric\Configurations\Dev> Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath .\v-sfc01.json
    Start-ServiceFabricClusterConfigurationUpgrade : System.Runtime.InteropServices.COMException (-2147017627)
    ValidationException: Cluster size: 2 is not supported. Total node count needs to be either one node for testing/demo purpose or at least 3 and maximum 1000.
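
    (For context: the node count that this validation checks comes from the `nodes` array of the config JSON passed via -ClusterConfigPath. For a 3-node standalone cluster, that section looks roughly like the fragment below; hostnames and domain values are placeholders.)

    ```json
    {
      "nodes": [
        { "nodeName": "v-sfc01-n0", "iPAddress": "v-sfc01-n0.contoso.local", "nodeTypeRef": "NodeType0", "faultDomain": "fd:/dc1/r0", "upgradeDomain": "UD0" },
        { "nodeName": "v-sfc01-n1", "iPAddress": "v-sfc01-n1.contoso.local", "nodeTypeRef": "NodeType0", "faultDomain": "fd:/dc1/r1", "upgradeDomain": "UD1" },
        { "nodeName": "v-sfc01-n2", "iPAddress": "v-sfc01-n2.contoso.local", "nodeTypeRef": "NodeType0", "faultDomain": "fd:/dc1/r2", "upgradeDomain": "UD2" }
      ]
    }
    ```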

    Well, OK, I added a third node. Again, AddNode.ps1 finished successfully, showing three nodes.

    I connected to the cluster using

    PS C:\tfs\DevOps\ServiceFabric\Configurations\Dev> Connect-ServiceFabricCluster -ConnectionEndpoint v-sfc01-n1:19000,v-sfc01-n0:19000,v-sfc01-n2:19000 -X509Credential -FindType FindByThumbprint -FindValue '09B21E35F6402C9A304DC2E72D3A3EA12F15CED5' -StoreLocation CurrentUser -StoreName My
    WARNING: Cluster connection with the same name already existed, the old connection will be deleted
    ConnectionEndpoint   : {v-sfc01-n1:19000, v-sfc01-n0:19000, v-sfc01-n2:19000}
    FabricClientSettings : {
                           ClientFriendlyName                   : PowerShell-da260741-a3f6-414b-8c59-2e139fcad769
                           PartitionLocationCacheLimit          : 100000
                           PartitionLocationCacheBucketCount    : 1024
                           ServiceChangePollInterval            : 00:02:00
                           ConnectionInitializationTimeout      : 00:00:02
                           KeepAliveInterval                    : 00:00:20
                           ConnectionIdleTimeout                : 00:00:00
                           HealthOperationTimeout               : 00:02:00
                           HealthReportSendInterval             : 00:00:00
                           HealthReportRetrySendInterval        : 00:00:30
                           NotificationGatewayConnectionTimeout : 00:00:30
                           NotificationCacheUpdateTimeout       : 00:00:30
                           AuthTokenBufferSize                  : 4096
                           }
    GatewayInformation   : {
                           NodeAddress                          : v-sfc01-n1.***.***.**:19000
                           NodeId                               : 912212e8a223b3a42c6390586d79ef1
                           NodeInstanceId                       : 132139151424364112
                           NodeName                             : v-sfc01-n1
                           }

    But when I tried to upgrade cluster configuration again

    PS C:\tfs\DevOps\ServiceFabric\Configurations\Dev> Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath .\v-sfc01.json
    Start-ServiceFabricClusterConfigurationUpgrade : Operation timed out.

    The cluster died - it does not exist anymore.

    None of the nodes is reachable, and the event log is filled with:

    EventName: NodeAborted Category: StateTransition EventInstanceId {aa8af10f-a1d9-43a6-8bd7-971e127c9c01} NodeName v-sfc01-n0 Node has aborted with upgrade domain: UD0, fault domain: fd:/dc1/r0, address: v-sfc01-n0.***.***.**, hostname: V-SFC01-N0.***.***.**, isSeedNode: true, versionInstance: 6.5.664.9590:1, id: e5418986b2a628c9a12b37ae2ef0bb93, dca instance: 132139562972115602

    I'm not 100% sure I did everything correctly. I have also found before that the case of the hostname matters - when it differed between AddNode and the manifest, everything blew up. So I'll run another experiment. Hold on.
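
    (For context: when Start-ServiceFabricClusterConfigurationUpgrade returns a client-side timeout, the upgrade may still be progressing or failing server-side, so it can be worth inspecting it before concluding the cluster is gone. A hedged sketch, assuming an existing cluster connection:)

    ```powershell
    # Assumes a connection was already opened with Connect-ServiceFabricCluster.
    # Shows the state of the in-flight configuration upgrade, if any.
    Get-ServiceFabricClusterConfigurationUpgradeStatus

    # Returns the currently effective cluster configuration as JSON --
    # useful to diff against the ClusterConfig file being applied.
    Get-ServiceFabricClusterConfiguration
    ```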

    Thursday, September 26, 2019 11:00 AM
  • Thanks for sharing the update. The answer would also depend on the kind of workload you are running - for example, production vs. test, stateful vs. stateless. Please refer to the capacity guidance provided here:

    For example, on the number of VM instances needed to run test workloads in Azure, the doc says:

    You can specify a minimum primary node type size of 1 or 3. A one-node cluster runs with a special configuration, so scaling out such a cluster is not supported. A one-node cluster has no reliability, so in your Resource Manager template you have to remove/not specify that configuration (not setting the configuration value is not enough). If you set up the one-node cluster via the portal, the configuration is taken care of automatically. One- and three-node clusters are not supported for running production workloads.

    Please go through the doc link and see if you can find the section that applies to you.
    If this doesn't help, we can get you in touch with a support engineer. If you do not have the ability to create a support ticket, please share your subscription Id and I will enable a one-time free support ticket for you.

    Thursday, September 26, 2019 6:30 PM
  • Any further questions on this? 

    If the proposed answer was useful please remember to "Up-Vote" and "Mark as Answer" to benefit the community. 

    Thursday, October 3, 2019 8:23 PM
  • Hi, after spending a week on sick leave, I'm back :-)

    I did another test: a new single-node SF cluster. It works.

    Added a second node via AddNode.ps1 - it still works. The cluster shows 2 nodes.

    Added a third node via AddNode.ps1 - it still works. The cluster sees 3 nodes. Moreover, getting the cluster configuration via PowerShell returns a config showing all 3 nodes, regardless of which machine I connect to. As if upgrading the configuration were not needed :-)

    Starting the cluster configuration upgrade again (as stated in the documentation) killed the entire cluster again.

    There is one thing that I might have done differently - I might have added all the nodes to the same upgrade domain; maybe that would help.

    And I suspect the problem with the upgrade might be related to the fact that, after adding the two nodes, the cluster still had only one seed node. So it looks to me like this scenario is not supported.

    Simply following the documentation does not work.
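
    (For context: the single-seed-node suspicion can be checked per node. A hedged sketch; the endpoint and thumbprint below are placeholders:)

    ```powershell
    # Connect to any reachable node (placeholder endpoint and thumbprint).
    Connect-ServiceFabricCluster -ConnectionEndpoint 'v-sfc01-n0:19000' `
        -X509Credential -FindType FindByThumbprint `
        -FindValue '<thumbprint>' -StoreLocation CurrentUser -StoreName My

    # IsSeedNode shows which nodes participate in the cluster's quorum;
    # nodes joined via AddNode.ps1 may still report IsSeedNode = False.
    Get-ServiceFabricNode | Select-Object NodeName, NodeStatus, IsSeedNode
    ```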

    Saturday, October 5, 2019 8:41 PM