Service Fabric Load Balancer not forwarding traffic correctly

  • Question

  • I have an ASP.NET website that connects to a set of WCF services in a Service Fabric cluster behind an internal load balancer. The service connection strings in the website point to the address of the internal load balancer. There are three nodes in the cluster and three copies of the backend services.

    When I manually restart one of the nodes, the website fails to load correctly because the load balancer still seems to be forwarding requests to the service on the restarting node. Shouldn't the load balancer forward requests to the two other available services? Does anyone know what's going on here?

    Monday, October 7, 2019 11:25 PM

All replies

  • It depends on how you have it set up. Can you share your configuration and load balancing rules? Do you have your health probes in place? Are you balancing to the cluster or to a specific service?

    Also, if you just shut down a node, it can take a bit for the load balancer to notice it is down and start directing traffic differently. How long that takes depends on how you have your health probes set up.
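
    To make the delay concrete: a load balancer health probe is typically configured with a probe interval and a number of consecutive failures after which the instance is taken out of rotation, so the time needed to notice a dead node is roughly their product. A minimal sketch of that arithmetic follows. The function name and the sample values (5-second interval, 2 failures) are illustrative assumptions, not settings taken from this thread.

```python
def approx_detection_seconds(interval_seconds: int, unhealthy_threshold: int) -> int:
    """Rough time until a probe marks a backend as down.

    Assumes the backend stops answering right after a successful probe and
    that the probe timeout is not longer than the probe interval.
    """
    if interval_seconds <= 0 or unhealthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_seconds * unhealthy_threshold


if __name__ == "__main__":
    # Hypothetical settings: probe every 5 seconds, 2 consecutive failures.
    print(approx_detection_seconds(5, 2))
```

    During that window, new requests can still be sent to the node that is restarting.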

    Will need some additional info on your current setup and how you have it configured. 

    Here are some useful links for balancing services in Service Fabric as well

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-resource-manager-balancing

    https://docs.microsoft.com/en-us/azure/service-fabric/create-load-balancer-rule

    https://stackoverflow.com/questions/31598366/can-i-use-load-balancer-for-scaling-azure-service-fabric-apps

    Monday, October 7, 2019 11:58 PM
    Moderator
  • Hi Micah,

    Thanks for your reply. I will have a look at the links. I have health probes in place. I think I am balancing to the cluster, as I haven't set up any service-specific rules. I have only created rules in the load balancer; I have not defined any rules in the cluster manifest. Below is the config of one of the rules. In one instance, one of the services on a node had InBuild status for a few hours, and the load balancer still forwarded to that node only:

    

    Tuesday, October 8, 2019 12:31 AM
  • To set up the probe correctly, do I need to have an application listening on the probe port and returning a response? How do I tell the load balancer to forward the flow to a specific instance?
    Tuesday, October 8, 2019 10:19 AM
  • I wonder if the default TCP probe on a port works the same way as the command:

    telnet <endpoint ip> <port>

    to determine whether the endpoint can be reached.
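
    For comparison, a TCP health probe is generally considered successful when the TCP handshake on the probed port completes, which is essentially what a successful telnet connection demonstrates. A minimal Python sketch of that check follows. `tcp_probe` is a hypothetical helper, and the host and port in the usage lines are placeholders.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake, much like telnet does.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all count as a failed probe.
        return False


if __name__ == "__main__":
    # Placeholder endpoint; replace with a node IP and service port.
    print(tcp_probe("127.0.0.1", 8080))
```

    Note that a check like this only shows that something accepts connections on the port. It does not show that the service behind the port is able to handle requests.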

     

    Tuesday, October 8, 2019 11:21 PM
  • I am spinning up a cluster for some testing to confirm expected behavior. Will update shortly. 
    Wednesday, October 9, 2019 6:07 PM
    Moderator
  • I tested this out by building my own Service Fabric Cluster.

     

    I deployed a 5-node cluster with an application running on port 8080. The port doesn't matter; that is just what I picked.

     

    After deploying the cluster, all the nodes, networks and routes were created for me. I did not change any rules. Here are some details of my load balancing rules that were automatically created for me. 

    Load Balancer backend pools:

     

    Health probes: I opened ports 80, 443 and 8080 when deploying my cluster, so a health probe was created automatically for each.

    There are also probes for the client endpoint and the Service Fabric cluster manager endpoint.

     

    Next you have the load balancing rules:

    And lastly the inbound NAT rules:

     

    Navigating to my load balancer's public IP and the port of my application, I get the following:

    So we can see it is working as expected. Next, I manually deallocated 2 of my VMSS nodes.

    While deallocating, and after the deallocation completed, my application never lost connection, as the app is replicated across all nodes and the load balancer automatically directed the traffic to an available node.

     

    You can see I have errors in my Service Fabric Cluster manager, as I essentially destroyed two nodes.

    However, the cluster keeps functioning because I still have 3 nodes available.

     

    Once I start the 2 deallocated nodes, the cluster comes fully back online and healthy.

    Can you tell me more about how you went about deploying your cluster and your application? Did you use Visual Studio? PowerShell?

     

     

    Wednesday, October 9, 2019 7:33 PM
    Moderator
  • Hi Micah,

    Thank you very much! 

    I have about 30 to 40 WCF services wrapped inside Service Fabric services in a three-node test cluster. The instance count for every service is set to -1. The cluster is built using a DevOps pipeline (IaC). The pipeline creates the resource group, the certificates, and then the cluster with its load balancers using an ARM template. I have two load balancers, one public and one internal. Unfortunately, our services are not true microservices and there are a lot of interdependencies between them: services need to call other services. I am using the internal load balancer for this, which I think might be the wrong approach. Each service's app.config contains something like the following:

     <client>
          <endpoint name="Endpoint" address="net.tcp://10.5.0.250:7011" binding="netTcpBinding" bindingConfiguration="DefaultBinding" contract="my contract" behaviorConfiguration="DefaultBehavior" />
          <endpoint name="Endpoint" address="net.tcp://10.5.0.250:9011" binding="netTcpBinding" bindingConfiguration="DefaultBinding" contract="my contract" behaviorConfiguration="DefaultBehavior" />
          <endpoint name="Endpoint" address="net.tcp://10.5.0.250:8020" binding="netTcpBinding" bindingConfiguration="DefaultBinding" contract="my contract" behaviorConfiguration="DefaultBehavior" />
    </client>
    
          

    Where 10.5.0.250 is the internal load balancer address. I am hoping the internal load balancer is able to automatically forward the request to other working service instances in the cluster, but this doesn't seem to be working?

    I have created a TCP probe for each port. I also changed the service addresses to loopback addresses so that traffic doesn't go through the internal load balancer; that way, only services on the same node can be reached. The website still gets interrupted when I restart two nodes and leave only one running, and I need to refresh the browser, but this seems to work better than using the internal LB.

    If the LB has session persistence enabled and the health probe returns a failure, does it still route to the same backend node?

    Is the instance health state the same as the health state of the probe?

    Wednesday, October 9, 2019 10:28 PM
  • One other quick question: why are you using an internal LB for your services to communicate? As long as the nodes are all in the same virtual network, they can communicate with each other. They don't need to go through a load balancer to reach each other.
    Wednesday, October 9, 2019 10:52 PM
    Moderator
  • I think they still need to go through an internal load balancer; otherwise I'll need to hardcode the IP addresses of the nodes in the config file. The other option is to use the loopback address, but then all services need to be present on every node.

    I will try Service Fabric's DNS service.
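
    The idea behind a DNS-based approach is that callers look up a service by name at connection time instead of hardcoding node IPs. A minimal Python sketch of such a lookup follows. `resolve_service` is a hypothetical helper, and any DNS name passed to it (for example `backend.myapp`) is an assumption that depends on how the service was registered.

```python
import socket


def resolve_service(dns_name: str, port: int) -> list[str]:
    """Return the distinct IP addresses that dns_name currently resolves to."""
    infos = socket.getaddrinfo(dns_name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

    A caller could then build an address such as `net.tcp://<ip>:<port>` from one of the returned IPs. The port still has to be known in advance, since a plain name lookup only returns addresses.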

    Thursday, October 10, 2019 9:12 PM
  • I am pretty sure if you enable Reverse Proxy it takes care of the cross communication 

    https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-reverseproxy
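
    For reference, the reverse proxy is addressed over HTTP(S) with the service name embedded in the URL path. A minimal Python sketch of building such a URL follows. `reverse_proxy_url` is a hypothetical helper, port 19081 is assumed to be the configured reverse proxy port, and whether this approach suits the net.tcp WCF endpoints discussed in this thread is not established here.

```python
from urllib.parse import quote, urlencode


def reverse_proxy_url(service_name: str, suffix_path: str = "", port: int = 19081,
                      host: str = "localhost", **query: str) -> str:
    """Build a reverse proxy URL for a service named like 'fabric:/MyApp/MyService'."""
    prefix = "fabric:/"
    if not service_name.startswith(prefix):
        raise ValueError("expected a name of the form fabric:/App/Service")
    # The URL path uses the service name without the 'fabric:/' scheme.
    instance = service_name[len(prefix):].strip("/")
    url = f"http://{host}:{port}/{quote(instance)}"
    if suffix_path:
        url += "/" + quote(suffix_path.lstrip("/"))
    if query:
        # Optional parameters such as PartitionKey and PartitionKind.
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    # Hypothetical application and service names.
    print(reverse_proxy_url("fabric:/MyApp/MyService", "api/values"))
```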

    Thursday, October 10, 2019 9:18 PM
    Moderator
  • Have you had any luck with this? Using the Fabric DNS service? 
    Tuesday, October 15, 2019 5:15 PM
    Moderator
  • I couldn't get DNS service to work if I used 0.0.0.0:

    // Wildcard bind address (all local interfaces).
    var host = "0.0.0.0";
    // "ServiceEndpoint" must match the endpoint name declared in the service manifest.
    var endpointConfig = context.CodePackageActivationContext.GetEndpoint("ServiceEndpoint");
    int port = endpointConfig.Port;
    string uri = $"net.tcp://{host}:{port}";
    
    var listener = new WcfCommunicationListener<IMyInterface>(
                wcfServiceObject: _instance,
                serviceContext: context,
                listenerBinding: CreateTcpBinding(),
                address: new EndpointAddress(uri)
                );
    
    return listener;

    I have to use the node's IP address for it to work.
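
    One possible explanation, not confirmed in this thread: 0.0.0.0 is a wildcard bind address meaning "all local interfaces", not an address a remote caller can connect to. If that string is what gets published for the service, callers have nothing routable to use. A minimal Python sketch of separating the bind address from the advertised address follows. `advertised_uri` is a hypothetical helper, and 10.5.0.4 in the usage lines is a made-up node IP.

```python
import ipaddress


def advertised_uri(bind_host: str, node_address: str, port: int,
                   scheme: str = "net.tcp") -> str:
    """Build the URI to publish to clients, replacing a wildcard bind host."""
    try:
        is_wildcard = ipaddress.ip_address(bind_host).is_unspecified
    except ValueError:
        # Not an IP literal (e.g. a host name), so it is not a wildcard.
        is_wildcard = False
    host = node_address if is_wildcard else bind_host
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical node IP; the wildcard is swapped for it in the published URI.
    print(advertised_uri("0.0.0.0", "10.5.0.4", 7011))
```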

    Wednesday, October 16, 2019 4:52 AM
  • Do you happen to have an Azure Subscription? If so, you can email me at AzCommunity@microsoft.com and provide me with your SubscriptionID and link to this thread and I can enable you for a support ticket to work directly with the Service Fabric engineers to help get this all sorted out. 
    Wednesday, October 16, 2019 4:58 PM
    Moderator