Network Load Balancing vs. Hardware Load Balancing (CISCO Arrowpoint)

  • Question

    Hi, I have a query regarding the Windows 2003 Network Load Balancing (NLB) service. My ISP provides load balancing functionality via their CISCO (Arrowpoint) device. We were experiencing some problems with the Arrowpoint, so during testing I installed the NLB service and got all my websites working to a tee. I decided that this is perhaps the best way forward anyway, as I had more control over adding and removing servers in the cluster. Nonetheless, I decided to ask my ISP's advice as to which solution was best. Their response was...

    "Microsoft's implementation of NLB is not a particularly network friendly application of load balancing. It operates by spoofing the source MAC address in ARP replies. This causes all switches to not be able to learn the originating port of traffic destined to that particular MAC address. Hence in a switched environment all traffic to the specific IP address will get copied to every port in that particular VLAN/Broadcast domain. This enables all of the Microsoft devices to see all of the traffic and then decide at a low level which particular sessions that it will process. It is essentially a way of limiting processor utilisation at the expense of loading the network layer.

    We have seen particular issues with one of our customers that use NLB for SQL traffic which cause effective broadcast storms by flooding all of the traffic to the cluster address across all ports in the VLANs. This may only be a 20MB transfer between two devices but as it floods to all ports in every switch with that VLAN configured it can cause issues elsewhere on the network. Careful consideration should be made with what ports in what switches and spanning of VLANs across multiple switches.
     
    We therefore as networking people don't particularly like Microsoft NLB, especially for high use, split site applications. Arrowpoint load balancing will not cause any port flooding and will maintain state between client and an individual server."
     
    I am a web developer, so my experience with networking and load balancing is very limited. My problem is that I have a one-sided view of Network Load Balancing from someone who may be quite biased because they are a "network person". If anyone can give me some advice as to which is the best option (NLB or Arrowpoint, software or hardware), I would appreciate it. Below is a brief description of my environment.
     
    2 Web Servers (Windows 2003 Web Edition)
    1 DB Server (Windows 2003 Standard Edition, SQL Server 2000)
     
    5 websites, 4 with SSL
     
    ISSUES experienced with Arrowpoint:
    Issue: Initially there was no URL checking in place, so we couldn't tell if a website had been stopped in IIS and would then receive a bad request error.

    Solution: Enabled URL checking on the Arrowpoint, which solved the problem.

    Issue: HTTPS would only ever direct to one server.

    Solution: Thought it was a certificate issue and tested this, but no solution was found; I was told that everything had been configured correctly on the Arrowpoint. I then installed NLB, saw the same issue, and fixed it there, but never returned to the Arrowpoint to check.

    At this point I felt more in control of my environment because I had full access to the load balancing functionality. At least I didn't have to contact the ISP's support every time I needed to add or remove servers from the cluster or make a configuration change.
     
    Thanks in Advance
    Thursday, February 1, 2007 10:26 AM

All replies

  • "Issue: HTTPS would only direct to one server constantly."

    I don't know the solution as I'm no CISCO expert, but here are some questions that may help:

    • If you access the servers directly (i.e. without going through the ArrowPoint) do they both work correctly with SSL or non-SSL?  This should tell you if they are correctly configured.
    • Have you tried switching off SSL to see what happens?
    • When you say they only go to one server, is that from the web server log files?  If so, do you see the keep-alive page being called also?  Do you have a keep-alive set up for the SSL port?
    • Are you configured to use sticky sessions controlled by the Arrowpoint?  It's possible that this is a symptom of that, if the system does not think the other server is fully alive.

    A workaround may be available if the ISP is using a CISCO 6503: they usually have an SSL termination device installed, so you could terminate SSL prior to hitting the load balancer.  This would also reduce the load on the web servers, as processing SSL traffic is very processor intensive.

    Thursday, February 1, 2007 2:06 PM
  • Hi Ben

    Thanks for the reply

    • Q) If you access the servers directly (i.e. without going through the ArrowPoint) do they both work correctly with SSL or non-SSL?  This should tell you if they are correctly configured.
    • A) Yes, I have accessed the servers individually over the VIPs and the physical IPs, and SSL worked fine.

     

    • Q) Have you tried switching off SSL and see what happens?
    • A) If I remember correctly I did and it worked fine

     

    • Q) When you say they only go to one server, is that from the web server log files?  If so, do you see the keep-alive page being called also?  Do you have a keep-alive set up for the SSL port?
    • A) I tested my web app using Microsoft's ACT to produce simultaneous browser connections. The response from the server was logged, and it was constantly returning the response from the same server. My thought was that the session persisted the connection to the same server, so I took that server "down" and retested. I then received a 404 error from the server (the one that wasn't taken down), indicating that the server could not fulfill the request. The same test was done directly against the server, as explained previously, and it fulfilled the request.

     

    • Q) Are you configured to use sticky sessions controlled by the Arrowpoint?  It's possible that this is a symptom of that, if the system does not think the other server is fully alive.
    • A) I don't know, but I assume so; I would have to confirm with the ISP. In NLB I have set the affinity to Single for SSL requests.

    I guess all I really need is an informed response as to whether using NLB in Windows 2003 in a live multi-site environment would be a disastrous option, especially when I have the option to use a hardware load balancer. The only pros that I can think of for using NLB are as follows:

    • Full control over nodes in the cluster
    • In my testing, adding and removing nodes does not interrupt service at all (OK, the Arrowpoint does the same, but I need to contact support to take the specific server out of the configuration)

    The big question is: is Microsoft's Windows 2003 Network Load Balancing going to be able to handle the load in OUR live environment, approximately 5000 requests per hour?

    I haven't been able to find any documentation on the net that compares load balancers and includes Microsoft NLB; perhaps that is a bad sign.

      Thursday, February 1, 2007 3:00 PM
    • Fewer than 2 requests per second (5000 per hour) is not a problem for Windows NLB. The question is how much time your application adds.

      In any event, the best bet would be to create a staging environment (a replica of the production environment) and perform load tests there (using LoadRunner or similar tools). You can also use sniffers and other network monitors to see how the network behaves.
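The arithmetic behind that figure is easy to check. The sketch below converts the 5000 requests per hour from the question into an average rate, and uses Little's law (rate multiplied by time per request) to estimate how many requests are in flight at once. The 0.5 second response time is purely an illustrative assumption, not a measurement from this thread, and an hourly average says nothing about peaks.

```python
REQUESTS_PER_HOUR = 5000   # figure quoted in the question
SECONDS_PER_HOUR = 3600

# Average arrival rate across the whole hour (real traffic will be burstier).
average_rps = REQUESTS_PER_HOUR / SECONDS_PER_HOUR


def requests_in_flight(rps: float, seconds_per_request: float) -> float:
    """Little's law: average concurrency = arrival rate * time per request."""
    return rps * seconds_per_request


if __name__ == "__main__":
    print(f"average rate: {average_rps:.2f} requests per second")
    # 0.5 s per request is an assumed value for illustration only.
    print(f"in flight at 0.5 s each: {requests_in_flight(average_rps, 0.5):.2f}")
```

The slower the application answers, the more requests overlap, which is why the application's own response time matters more than the load balancer at this volume.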

       

      Arnon

      Thursday, February 1, 2007 5:29 PM
    • Whenever you implement load balancing, the first question you should ask is: can the load balancer stop sending traffic to a server if the application on it is down?

      For instance, you are load balancing the web servers. Just ask the question: if you use NLB, can it stop sending traffic to a web server whose app pool goes down, which has lost DB connectivity, or which is out of memory?

      Then decide which works for you. If both methods work (and I don't believe NLB will), then you should dig deep into the pros and cons.

      Friday, February 2, 2007 4:25 AM
    • For fail-over in NLB one would need to write a script or use a remote tool to manage cluster membership.  For instance, we have seen folks write scripts that request pages and based on bad responses, will pull a server out of the cluster.  This is risky where you don't have a watermark to stop all servers from being pulled, but most hardware load balancers also don't have this feature (not sure about the Arrowpoint).  If you don't have time to write a little tool to automatically pull the server from the cluster, I would recommend working on the issues around the Arrowpoint with your provider.  Might be the best route either way, but if you can't automate fail-over, the customer experience is going to be poor at some point so, best to avoid that.
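As a rough sketch of the "watermark" idea described above, the function below decides which unhealthy servers a monitoring script may pull while always leaving a minimum number in the cluster. The function name, the server names, and the `min_in_cluster` parameter are all hypothetical; how the health results are gathered and how a server is actually removed from the NLB cluster are deliberately left out.

```python
from typing import Dict, List


def servers_to_pull(health: Dict[str, bool],
                    in_cluster: List[str],
                    min_in_cluster: int) -> List[str]:
    """Return the unhealthy servers that may be removed from the cluster.

    `health` maps server name to the result of its last check (True = good).
    Servers missing from `health` are treated as unhealthy. Removal stops as
    soon as pulling one more server would leave fewer than `min_in_cluster`
    members, which is the watermark.
    """
    to_pull: List[str] = []
    remaining = len(in_cluster)
    for name in in_cluster:
        if health.get(name, False):
            continue  # healthy, leave it alone
        if remaining - 1 < min_in_cluster:
            break  # watermark reached: keep serving with what is left
        to_pull.append(name)
        remaining -= 1
    return to_pull


if __name__ == "__main__":
    # Both checks failed (perhaps the checker itself is broken):
    # only one server is pulled, the other stays in rotation.
    print(servers_to_pull({"web1": False, "web2": False}, ["web1", "web2"], 1))
```

With two web servers and a watermark of one, a fault in the checking script can take out at most one server rather than the whole site.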

      The other thing to consider if you do go with NLB is that micro-segmentation of the VLAN is a good idea.  We have seen what your provider is describing mostly when we have NLB and non-NLB servers on the same VLAN trying to do different work (not web frontend communicating with SQL backend).  If you segment the VLANs, the broadcast traffic won't cross the boundaries.
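To illustrate why segmenting the VLANs contains the flooding, here is a deliberately simplified toy model of a switch. It is not how any particular switch is implemented; the port numbers, VLAN IDs and MAC labels are made up. A frame for a MAC address the switch has learned goes to one port, while a frame for a MAC it has never learned (the situation the provider describes for the cluster address) is copied to every other port in the same VLAN, and only that VLAN.

```python
from typing import Dict, List, Tuple


def egress_ports(port_vlans: Dict[int, int],
                 mac_table: Dict[Tuple[int, str], int],
                 ingress_port: int,
                 dest_mac: str) -> List[int]:
    """Return the ports a frame is sent out of in this toy model.

    `port_vlans` maps port number to VLAN ID.
    `mac_table` maps (VLAN ID, MAC) to the port where that MAC was learned.
    """
    vlan = port_vlans[ingress_port]
    learned_port = mac_table.get((vlan, dest_mac))
    if learned_port is not None:
        return [learned_port]  # known destination: one port only
    # Unknown destination: flood within the ingress VLAN, except the source port.
    return sorted(port for port, port_vlan in port_vlans.items()
                  if port_vlan == vlan and port != ingress_port)


if __name__ == "__main__":
    ports = {1: 10, 2: 10, 3: 10, 4: 20, 5: 20}   # hypothetical layout
    table = {(10, "sql-server"): 3}
    print(egress_ports(ports, table, 1, "sql-server"))    # learned MAC
    print(egress_ports(ports, table, 1, "nlb-cluster"))   # never learned
```

In this model the flooded frame never reaches ports 4 and 5, because they sit in a different VLAN.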

      In terms of load, NLB will handle nearly as much as a good hardware load balancer for normal load balancing.  We don't have a public paper on it, but we have tested an 8 server cluster handling 2500+ RPS against mostly ASP.NET pages on each server.  Obviously app design and backend SQL/DB considerations come into play, but the point is NLB will scale for normal load balancing as long as you micro-segment the VLANs.

      Friday, February 2, 2007 8:25 AM
    • Hi Arnon

      In the first post I detailed how my environment is set out: 2 web servers and 1 DB server. This is my testing environment. I don't think I would have a job if I were trying to make these types of decisions while setting up my live environment. Once I have done enough testing, then some more testing, some research, and then some more testing, I may possibly be ready to implement my live environment.

      I am currently using Microsoft's ACT for load testing my .NET applications.

      Thanks for the heads up on the network sniffers; I will give that a try.

      R


      Friday, February 2, 2007 9:45 AM
    • Hi Paul

      Thanks for the most comprehensive reply of them all, and something that may help me. As far as switch flooding is concerned, the research that I have done suggests that you can limit flooding in a multicast-enabled cluster by enabling IGMP. Do you think this will provide me with the same sort of "protection" as the "micro-segment" option?

      Regarding the failover, in my limited experience with the Arrowpoint (bearing in mind I have never seen the device, as it's in my ISP's hosting center), pulling the server out of the cluster does not appear to be part of its failover design. I was under the impression that NLB in Windows 2003 has a "callback" feature that periodically checks the servers in the cluster; if one is found not to respond, it is removed from the cluster and convergence is executed (for want of a better word) on all the other servers. I will definitely check that in my test analysis.

      As far as load is concerned, all our sites are .NET 2.0 with a SQL backend and some process-intensive pages; however, I cannot foresee our load increasing to 2500+ RPS, at least within the next year or so.

      Thanks

       

      Friday, February 2, 2007 10:23 AM
    • Hi Pranshu

      I think NLB was designed with that type of functionality in mind. Well, at least it seemed like it when I load tested it.

      I ran a test against my web app, a .NET app using SQL session state management over SSL. I stopped one web app in IIS and there was no break in service; I restarted the app in IIS and stopped the other one, and there was no break in service. The Cisco router, on the other hand, has to have URL checking enabled and a page in each web directory that it needs to access to see if the web app is actually running. In my testing I received several "Bad Responses" when performing the same test against Arrowpoint load balancing.

      Friday, February 2, 2007 10:26 AM
    • IGMP has caused issues on our networks in the past so I'd have to suggest you avoid the setting unless you have time to do some solid testing and implement very good monitoring before going that route.  As well, the normal setup would be to have a small dedicated VLAN facing the Internet where one NIC on each web server is plugged in and that VLAN would not be shared with other web servers unless they are yours as well.  Then there is another small VLAN used for the web to SQL connections.  If that is the setup, you have microsegmentation already.  On the other hand, if you have something like one VLAN with all web and SQL connections happening on that VLAN, you'll have to be more concerned about NLB broadcasts stomping on the web to SQL connections under very high load.  On a 100Mbps Fast Ethernet network this could be as low as 20Mbps of traffic (that is still a lot of web traffic).

      Most site admins don't need to be concerned with this, but do keep in mind that NLB won't stand up well to SYN flooding.  Your ISP may have other mitigations in place such as Cisco Guards, but hardware load balancers provide some mitigation against this as well so something to consider.

      Paul

       

      Monday, February 5, 2007 3:11 AM
    • A reply to your question, from a network technician who is trying to find out more about NLB:

      You need to understand the differences. Cisco Arrowpoint load balancing is done in hardware (also available with SSL termination), whereas NLB load balancing is done in software. Therefore using the Cisco Arrowpoint will take load off the web servers.

      If your web pages are returning fast enough, then do you need hardware load balancing and SSL termination?

      If not, which is normally the reason a hardware load balancer is put in place, I would advise using SSL stickiness (which keeps the SSL session going to the same server). In your instance I would check with the ISP whether they are seeing sessions being shared across both servers, as it is a simple configuration change to the load balancing method to share servers.

      To take control of removing your servers from the load balancer, get your ISP to call a page and check for a specific word, e.g. call default.html and look for "server UP". You then create this page on your web server; if you want to take the server down, you just change the page wording. If you are an apps guy, you can get clever and have your app change the page if your DB server goes down, etc.
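The keyword check described above can be sketched in a few lines. This is only an illustration of the idea from the monitoring side, not the Arrowpoint's own mechanism; the function names, the example URL and the 5 second timeout are assumptions, while the page name default.html and the phrase "server UP" come from the suggestion itself.

```python
from http.client import HTTPException
from urllib.request import urlopen


def page_says_up(body: str, keyword: str = "server UP") -> bool:
    """True only if the agreed keyword appears in the page body."""
    return keyword in body


def check_server(url: str, keyword: str = "server UP", timeout: float = 5.0) -> bool:
    """Fetch the health page and look for the keyword.

    Any connection failure, timeout or HTTP error status counts as "down".
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, HTTPException):
        return False
    return page_says_up(body, keyword)


if __name__ == "__main__":
    # Hypothetical address; replace with a real server's health page.
    print(check_server("http://192.0.2.10/default.html"))
```

Changing the page text to anything that lacks the keyword makes the check fail, which is how the server owner can take a machine out of rotation without contacting the ISP.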

       

      Tuesday, March 13, 2007 9:44 PM