Strange Results From a Smoke Test

  • Question

  • Hello Everyone,

    I have a little problem that is puzzling me, and I have turned to the superior minds here for help!

    I have only just started working with VSTS and am setting up a load test for a website being created by my company.

    One of the first things I wanted to do was to smoke test VSTS with a very short "mock-up" test against a very small "mock-up site". In particular I was interested to find out how many users I could simulate on a single machine before I maxed out either the processor or memory. 

    So... I set up a load test starting at 10 users and adding 10 users every 10 seconds, all the way up to 2000 users, thinking that I would exceed the resources available on my laptop long before that point, and that this would give me a rough idea of how many users I could relatively safely simulate before being required to buy Agent licenses (I wasn't hoping for more than a few dozen).

    To my surprise... my machine just kept on racking up those test users until it had reached 2000. At no point did the CPU usage exceed 50% or my system's free memory drop below 30%.

    Immediately, I suspected something was awry.

    Looking further at the stats... the average number of requests per second started at around 70 RPS... and continued at that pace, 70 RPS, no matter how many users were added. Again, I would have expected this figure to climb with the user count.

    Finally, I noticed the RPS was inversely correlated with the average response time. That is, as the average response time went up, the RPS went down, and vice versa.

    All of this, of course, suggests there is some kind of bottleneck that is only allowing approx. 70 RPS to be made (no matter the notional number of users)... but I can't find a single counter concerning my machine's performance that maxed out, and as I was testing over a LAN I had 100 Mbps access to the site, which should be more than enough for 70 RPS (unless each response were around 1.5 Mb, which cannot be correct; a rough version of this arithmetic is sketched after this post).

    The constant 70 RPS value (although it does fluctuate a little) suggests the bottleneck is in the requests being made... but discounting the memory, disk usage, CPU and connection leaves what?

    Does anyone have any idea where the bottleneck may be?

    Yours,

    TGP

    Monday, January 22, 2007 4:05 PM
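
A quick back-of-the-envelope check of the bandwidth reasoning above, sketched in Python using only the figures quoted in the post: at 70 RPS, a 100 Mbps link would only be saturated if each response carried roughly 1.4 megabits (about 175 KB), far more than a small mock-up page is likely to return.

```python
# Rough arithmetic only, using the figures quoted in the post above.
link_mbps = 100.0      # LAN bandwidth mentioned in the post
observed_rps = 70.0    # steady request rate seen in the run
bits_per_response = link_mbps * 1_000_000 / observed_rps   # bits each response would need to fill the link
print(f"~{bits_per_response / 1e6:.2f} Mb (~{bits_per_response / 8 / 1024:.0f} KB) per response to saturate the link")
```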

Answers

  • My guess is that you are running out of connections in the connection pool. By default there are 50 connections available in the connection pool. If only 50 requests can be in flight at a time, then the math almost works out (0.7 sec average response time, 50 concurrent requests, 77 requests/sec; the arithmetic is sketched after this post).

    I would look at the "Ave time waiting for connection" counter to see if it is climbing as the user load ramps up. Time waiting for a connection is not included in the response time for a request. There is probably a threshold violation in your load test attempting to alert you that the connection pool is a bottleneck in your environment (by default there is a threshold rule associated with the "Ave time waiting for connection" counter that will trigger if the amount of time spent waiting for a connection is greater than 20% of the response time for the request).

    Could you check to see if this is happening?
    Thanks,
    Rick

    Tuesday, January 23, 2007 4:20 PM
    Moderator
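
The arithmetic behind this answer, sketched in Python with the figures from the reply (the default pool of 50 and the roughly 0.7 sec average response time): with a fixed pool, throughput is capped at about pool size divided by average response time, no matter how many virtual users are configured.

```python
# Little's Law style check using the numbers from the reply above.
pool_size = 50          # default connection pool size mentioned in the answer
avg_response_s = 0.7    # observed average response time
max_rps = pool_size / avg_response_s   # at most 50 requests in flight, each taking ~0.7 s
print(f"throughput cap ~ {max_rps:.0f} requests/sec")   # ~71, close to the observed 77
```

With the run's 0.69 sec average response time and the default pool of 50, that cap lands almost exactly where the test flat-lined.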

All replies

  • I was wondering if you set up the counters for the web server itself?

    I know it sounds foolish and simple, but the way the counters are displayed, it sort of looks like you are monitoring the web server automatically when that is not the case.

    Your own box may never max out while the web server could be taking a pounding.

    Anyway, just a thought.

     

    Monday, January 22, 2007 4:32 PM
  • TGP -

    I'd agree with your assessment of a bottleneck somewhere. The inverse correlation of RPS to response time is expected, since it would appear your server is unable to handle more than your given load of 70 RPS. (In other words, the requests keep pouring in, but your server can only handle 70/sec, so those in the queue have to wait longer and longer the more load you add; a rough illustration of this follows this post.)

    Based on your wording above, it seems like you're only keeping track of the performance counters on the client machine (your laptop). However, my guess is the bottleneck is on the server itself. Have you tried adding CPU / IIS counters for the server machine? If not, I'd start there.

    --Mike

    Monday, January 22, 2007 4:33 PM
    Moderator
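
To make the queueing point above concrete, here is a rough illustration in Python; the offered-load values are hypothetical, not measurements from the run. If the virtual users offer more work than the roughly 70 RPS the server completes, the backlog, and with it the wait, keeps climbing as users are added.

```python
# Illustrative arithmetic only; the offered-load values are hypothetical.
service_rate = 70.0                      # requests/sec the server actually completes
for offered_rps in (50, 100, 200, 400):  # requests/sec the virtual users try to send
    backlog_per_s = max(0.0, offered_rps - service_rate)  # requests piling up each second
    wait_after_60s = backlog_per_s * 60 / service_rate    # a new request waits behind the backlog
    print(f"offered {offered_rps:3d} rps -> wait after one minute ~ {wait_after_60s:5.1f} s")
```
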
  • Thanks for your answers, guys.

    It is indeed true that I haven't yet set up any counters on the server. I suspect that will have to be my next step... if only to eliminate this possibility.

    However, this raises another question... I had assumed it could not be the server, for the simple reason that the "average response time" was still reasonable and didn't increase with the number of users.

    After all, as I noted... eventually I had 2000 users going, after adding them at 1 a second over 40 minutes... and I had think time set to 0.

    Surely, if the bottleneck were the server, then the average response time should have been up in the minutes by the end of the run, rather than (as it was) under a second, well in line with the response time at the start. The run gave (roughly) the same "average response time" with 2000 users as it did with 20... if thousands of requests were going out and only 70 RPS were coming back, then the response time by the end of the run should have been huge, on average (see the quick check sketched after this post). It wasn't... still just a few tenths of a second, as at the start.

    What's more, I would have expected my CPU time and memory usage to keep rising throughout the run as the cost of sending all those requests mounted (even if none came back), yet these too remained constant-ish, just fluctuating around a figure.

    This suggested to me that the bottleneck was at my end, in the number of requests sent; if it were not, there would have been a gradual progression in average response time as more users were added, AND my CPU and memory usage would have kept mounting.

    Nevertheless, I think I will have to set up the counters on the server, as this is the obvious next step in figuring out what is going on... I just suspect the problem is at my PC "going out" rather than anywhere else "coming back"... after all, a single laptop should NOT be able to simulate the output of 2000 users, whether the server returns those requests promptly or not.

    It's a shame we can't post images here or I would share my graph which illustrates this.

    Perhaps I should give you my min-max-avg figures so you can see how constant it all is.

    % Processor Time: Min 11.6%, Max 26.5%, Avg 16.9%

    Available Memory: Min 557 MB, Max 1234 MB, Avg 801 MB

    Avg. Response Time: Min 0.22 sec, Max 1.82 sec, Avg 0.69 sec

    Requests Per Sec: Min 29.8, Max 93.0, Avg 77.2

    User Load: Min 20, Max 2000, Avg 1200

    None of these graphs progress from their min to max as the users are added.

    They all start at a level and just fluctuate around that level whether there are 20 users or 2,000.

    It's very perplexing. I just can't work out what the limiting factor is... even though I am sure it is on my laptop rather than with the server.

    Still... I will check out that server and report back. Thanks for that suggestion.

    Yours,

    TGP

    Monday, January 22, 2007 5:34 PM
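
A quick check of the reasoning in the post above, using its own figures: with 2000 users, zero think time, and only about 77 requests completing per second, the measured response time should have been on the order of half a minute rather than a fraction of a second, so the waiting must be happening somewhere the response-time counter does not see.

```python
# Little's Law check using the figures reported in the post above.
users = 2000                 # peak user load
observed_rps = 77.2          # average requests/sec from the run
expected_response_s = users / observed_rps   # per-request time if all users were truly waiting on the server
print(f"expected ~ {expected_response_s:.0f} s per request, observed ~ 0.69 s")
# The gap means most of each user's time is spent waiting somewhere that is not
# counted as response time - which is exactly where a connection pool queue sits.
```
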
  • There is some information regarding the configuration of the connection pool in this article.

    http://blogs.msdn.com/billbar/articles/517081.aspx

    See the section "Choose the Appropriate Connection Pool Model" (a rough illustration of the pooling idea follows this post).

    Tuesday, January 23, 2007 4:24 PM
    Moderator
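
For illustration, here is a minimal, self-contained simulation of the connection-pool idea the linked article describes. It is not VSTS code, just a toy sketch showing why throughput tops out near pool_size / response_time however many virtual users run, and why enlarging the pool raises the ceiling.

```python
# Toy simulation of a shared connection pool; numbers and behaviour are illustrative only.
import threading
import time

RESPONSE_TIME_S = 0.1   # pretend every request takes this long at the server

def run_load(pool_size, users, duration_s=5.0):
    pool = threading.Semaphore(pool_size)   # stands in for the shared connection pool
    completed = 0
    count_lock = threading.Lock()
    stop_at = time.monotonic() + duration_s

    def virtual_user():
        nonlocal completed
        while time.monotonic() < stop_at:
            with pool:                        # wait for a free "connection"
                time.sleep(RESPONSE_TIME_S)   # hold it for the duration of the request
            with count_lock:
                completed += 1

    threads = [threading.Thread(target=virtual_user) for _ in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed / duration_s

if __name__ == "__main__":
    for size in (50, 200):
        rps = run_load(pool_size=size, users=300)
        print(f"pool={size:3d}  measured ~{rps:7.1f} req/sec  "
              f"(theoretical cap ~ {size / RESPONSE_TIME_S:.0f})")
```

Raising the pool from 50 to 200 lifts the client-side ceiling, which is why, in the next reply, the bottleneck moves to the server instead.
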
  • Aha!

    Thanks, Rick. Success of a sort.

    After raising the connection pool to 200, I am still not maxing out my CPU/memory as I had hoped, but it now appears the bottleneck is the server and not my machine.

    That is... requests per second rise (to slightly higher than before), then drop off.

    However, this time... at the RPS plateau, the "average response time" rockets up through the 0.5s and 0.7s seen before to 3/4/5/6/7 seconds or more, whereas before the two kept pace with each other.

    This is the kind of response I expected if the server (which is only a little virtual thing for this smoke test) is maxing out. It appears the bottleneck is now there, rather than on the client machine, which is an acceptable result.

    Before, the number of users rocketed... and everything just rumbled on at the same level... 0.8 sec response time / 60-odd RPS.

    Now, at about 70-80 users, the server "loses it" and just can't keep up with the pace, and the "average response time" rapidly diverges upwards from the user load.

    Thanks to everyone who answered this question. All your replies helped (if only in making me think twice), but Rick's proved to be the solution.

    It appears the bottleneck was the connection pool.

    Yours,

    TGP

    Tuesday, January 23, 2007 5:17 PM
  • Hi All,

    Almost forgot...

    I also have to report that the "Ave time waiting for connection" counter Rick suggested I monitor remained at 0, both through the original tests with the pool of 50 connections and through the 200-connection-pool test.

    This suggests that, although connections seemed to be the problem, the counter wasn't indicating it for some reason.

    I have no idea why this was the case... the problem was clearly the connection pool. The results now look much more like what I would expect (given a server bottleneck, rather than the laptop bottleneck I had originally hoped for). Any ideas, anyone?

    Yours,

    TGP

    Tuesday, January 23, 2007 5:24 PM