Performance Issue in a scalable environement while implementing Service Broker

Unanswered Performance Issue in a scalable environement while implementing Service Broker

  • Wednesday, December 12, 2012 5:20 PM
     
     

    We have been testing performance of our application using SQL Server 2008 R2. We have implemented Service Broker as part of the overall architecture of our solution. We are noticing several issues in our performed tests and hence I would like to get some guidance based on the detailing given by me below.

    Scenario: Our application server (independent machine) processes and sends data to the initiator server (SQL Server 2008 R2 independent machine). Initiator forwards them to the target server (SQL Server 2008 R2 independent machine) which in turn processes the requests and inserts the data into a physical table. We have also implemented of the suggestions in the article (http://msdn.microsoft.com/en-us/library/dd576261(v=sql.100).aspx) (including the 150 trick) since our application is a live stream processing application and these techniques made sense to us.

    The application server machine is not a part of the domain but in workgroup mode. The Initiator Server and Target Server are a part of the windows domain.

    Issues:  As an example with our testing our application server can collect and process about 6000 records a second without sending it to the initiator server. Our hope is to be able to be close to that number when inserting those records in the target server via the initiator server. But at this time the best we can do is about 2100 records per second inserted in the target database.

    What we do see is that the rate at which our application server processes data goes down significantly when it is sending data to the initiator server. Using profiler and other counters to monitor, we noticed that there is a huge drop in data transfer rate during the test period. The test when started from the application server shows a healthy data transfer rate (approx 175MB/sec) which eventually drops down to 5Mb/sec to 30MB/sec for upto 40 seconds, multiple times during the test which reduces the number of records that get inserted. The  SQL profiler shows the following issues mentioned below:

    Message: This message was dropped because it could not be dispatched on time. State:3 (during this error the data transfer rate drop is observed)

    Message: This message was dropped because it could not be dispatched on time. State:2 (during this error the data transfer rate drop is observed)

    Message: This message could not be delivered because it is a duplicate.

    We assumed the network connection between the initiator and target to be of some issue but the latency is always below 1ms between them.

    Can anyone provide insight into what may be causing this?

    We have followed a lot of the best practices like spreading tempdb, raising sql server priority, spreading log files etc. In our testing our disk sec/write and average queue lengths are in the manageable range suggesting that disk may not be the bottleneck and CPU resources are never fully utilized.


All Replies

  • Wednesday, December 12, 2012 6:18 PM
     
     

    I am in the process of reaserching something similar with Service Broker and am very concernedabout the performance. You say that disk IO is not and issue; but have you ruled out blocking caused by indexes and other requests waiting on others to complete? What about network bandwith between the SQL servers?

    Bill


    William F. Kinsley


    • Edited by WFKinsley Wednesday, December 12, 2012 6:20 PM
    •  
  • Wednesday, December 12, 2012 6:30 PM
     
     
    Bill -- the network latency between the sql servers is less than a millisecond. As far as the blocking, we are evaluating the wait stats collected from service broker to understand what resource is it waiting for primarily. Quite perplexing at the moment. We are also going to try this on SQL Server 2012 to see if there is any difference.
  • Wednesday, December 12, 2012 6:32 PM
     
     
    Surprisingly it does not matter if we user a server with double the CPU cores  as the initiator server. We end up seeing numbers in the same range.