Intermittent slow performance on Web Role

    Question

  • I need some advice on where or what to look for with regard to a performance issue I am having in an Azure Cloud Web Role. The issue is not limited to a single endpoint but occurs across the application. In Application Insights it looks as if the request itself is slow to process. For the same type of activity, one run might log a total of 100 ms of SQL dependencies and take 2 seconds to complete, while another run might log 150 ms and take 30 seconds.

    The following shows a distribution of one of the queries:

    Each request is very similar (same steps taken) so the slowdown is most likely not directly related to the content.

    The following trace shows that after ~127ms of dependency call the endpoint is already around 5 seconds where normally it would still be less than 200ms.
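
    To make that gap concrete, the time a request spends outside its logged dependency calls can be computed directly. The following is a minimal sketch, not an Application Insights API call: the `RequestTrace` type and the way the figures are split across dependency calls are hypothetical, and the totals are the 100 ms / 2 s and 150 ms / 30 s examples from the question. It assumes the dependency calls run sequentially.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical record of one request and its logged dependency calls."""
    name: str
    duration_ms: float          # total request duration
    dependency_ms: List[float]  # duration of each logged dependency call


def unaccounted_ms(trace: RequestTrace) -> float:
    """Request time not covered by logged dependencies (assumes sequential calls)."""
    return trace.duration_ms - sum(trace.dependency_ms)


def flag_suspects(traces: List[RequestTrace], threshold_ms: float) -> List[RequestTrace]:
    """Return the traces whose unaccounted time exceeds the given threshold."""
    return [t for t in traces if unaccounted_ms(t) > threshold_ms]


# Totals taken from the question; the per-call split is made up for illustration.
traces = [
    RequestTrace("fast run", 2000.0, [60.0, 40.0]),
    RequestTrace("slow run", 30000.0, [100.0, 50.0]),
]

for t in traces:
    print(f"{t.name}: {unaccounted_ms(t):.0f} ms outside logged dependencies")
```

    A large unaccounted value points at time spent inside the process (or waiting to be scheduled) rather than in SQL.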

    I have looked at the number of requests, CPU, and memory on the running instance (in Application Insights) and do not see a bottleneck. I also checked the DB, and the DTU % does not indicate that the database is struggling.

    I also ruled out a lock in the app as the requests range from queries to updates.

    What else should I look for? Unfortunately this is a .NET Framework 4.5 app, so some tracing is not available, but I do have access to the web roles, so I could enable additional logging and/or review traces.

    I am just a bit unsure what could be causing this. It almost seems like a general application process such as GC, but I would have expected a CPU spike if that were the case... and I am not 100% sure how to identify whether it is GC.
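
    One way to test the GC hypothesis is to sample the `% Time in GC` performance counter (in the `.NET CLR Memory` category) on the role instance and check whether slow requests overlap with high readings. The sketch below assumes the counter samples have already been exported as (timestamp, percent) pairs; the function name, the 50% threshold, and all sample values are hypothetical.

```python
from datetime import datetime


def high_gc_samples(request_start, request_end, samples, threshold_pct=50.0):
    """Return counter samples inside the request window at or above the threshold."""
    return [
        (ts, pct)
        for ts, pct in samples
        if request_start <= ts <= request_end and pct >= threshold_pct
    ]


# Hypothetical "% Time in GC" samples taken every 5 seconds.
samples = [
    (datetime(2019, 4, 23, 4, 0, 0), 3.0),
    (datetime(2019, 4, 23, 4, 0, 5), 4.5),
    (datetime(2019, 4, 23, 4, 0, 10), 82.0),
    (datetime(2019, 4, 23, 4, 0, 15), 91.0),
    (datetime(2019, 4, 23, 4, 0, 20), 5.0),
]

# Hypothetical window of one slow request.
slow_start = datetime(2019, 4, 23, 4, 0, 8)
slow_end = datetime(2019, 4, 23, 4, 0, 18)

hits = high_gc_samples(slow_start, slow_end, samples)
print(f"{len(hits)} high GC samples overlap the slow request")
```

    If slow requests repeatedly line up with high readings, GC becomes a stronger suspect; if they never do, it can probably be ruled out.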

    Any help is appreciated.


    Cheers, Jeff

    Tuesday, April 23, 2019 4:45 AM

All replies

  • Is there any consistency around the slow downs? 

    Hard to say what exactly the issue could be. Personally, I think it would be best to open a technical support ticket so our engineers can take a look at the backend and see what might be causing the issue. Do you happen to have the ability to open a technical support ticket? If not, you can email me at AzCommunity@microsoft.com and provide me with your SubscriptionID and link to this issue. I can enable your subscription for that request. 

    Tuesday, April 23, 2019 10:07 PM
    Moderator
  • Thanks Micah. I will open a support ticket.

    I have been trying to deduce a pattern in regards to application start (due to scaling) or other activities but I have not been able to spot it.


    Cheers, Jeff

    Tuesday, April 23, 2019 10:10 PM
  • Yeah it is strange you are not able to spot anything. Generally there would be some indication that would be simple to see. If you open that ticket, please share the ticket number with me. I am happy to follow and help where I can. 
    Tuesday, April 23, 2019 10:12 PM
    Moderator
  • Hello Micah, support request id: 119042425000431.


    Cheers, Jeff

    Wednesday, April 24, 2019 5:28 AM
  • Just curious, how is the app tier set up overall? Is the app on PaaS with the DB also on PaaS, or is the app on PaaS and the DB on IaaS? From your comment about DTU %, it seems both the front-end and back-end tiers are on PaaS.

    Check where the app and DB tiers are located. Also, how do they connect, and can that be optimized by placing them closer together (location, datacenter, etc.)? You might need to check with MS on this.

    Also, since the DB is on PaaS, can you schedule a job to record the system processes, showing how executions and locks are occurring along with memory and CPU usage patterns, or better, capture a trace if that is supported? Do you see any specific trend?

    Does the app service need any optimization too, such as caching?

    Are these systems using a domain, and how long does resolution take?


    Santosh Singh

    Wednesday, April 24, 2019 7:43 AM
  • Thanks Santosh for the post. The solution is a cloud service with a SQL Azure DB. It does use some caching (Redis) but mostly in-memory caching. Using the metrics available in the portal, including Application Insights, I have not been able to determine a pattern.

    It occurs across a range of endpoints, so I don't think it is endpoint specific (i.e., tied to particular logic or a specific query). As the number of connections is still low compared to capacity, I don't think the bottleneck is the number of requests to the DB or the website.

    I do have a support ticket in so let's see what comes up after MS have had a look.


    Cheers, Jeff

    Wednesday, April 24, 2019 8:52 PM
  • Any update? 
    Friday, May 3, 2019 7:14 PM
    Moderator
  • MS has suggested using DebugDiag to capture a dump when the endpoint has a long elapsed time. 

    Unfortunately this is difficult on a cloud service with auto-scale, so we have temporarily fixed the number of instances and installed the tool. Now it is a waiting game to see what we come up with.

    Will post again when we have more.


    Cheers, Jeff

    Friday, May 3, 2019 10:17 PM
  • MS is reviewing the DebugDiag Dump at the moment.

    I reviewed the files generated but was not able to find anything in particular.

    At this stage, I do believe there is a connection between the auto-scale/deployments and the slowdown. It was hard to tell initially because the servers had inconsistent activity (e.g., test teams started at different times each day).
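
    A suspected link like this can be checked by grouping request durations according to how long the serving instance had been up. The following is a minimal sketch under stated assumptions: the function names, the 15-minute window, and all timestamps and durations are hypothetical, and a single instance is assumed for simplicity.

```python
from datetime import datetime
from statistics import median


def minutes_since_start(request_time, instance_starts):
    """Minutes since the most recent instance start, or None if none precedes the request."""
    earlier = [s for s in instance_starts if s <= request_time]
    if not earlier:
        return None
    return (request_time - max(earlier)).total_seconds() / 60.0


def split_by_warmup(requests, instance_starts, warmup_minutes=15.0):
    """Split request durations into those shortly after a start and all others."""
    recent, settled = [], []
    for request_time, duration_ms in requests:
        age = minutes_since_start(request_time, instance_starts)
        if age is not None and age <= warmup_minutes:
            recent.append(duration_ms)
        else:
            settled.append(duration_ms)
    return recent, settled


# Hypothetical instance start times (scale-out or deployment events).
instance_starts = [datetime(2019, 5, 20, 9, 0), datetime(2019, 5, 20, 13, 0)]

# Hypothetical (request time, duration in ms) pairs.
requests = [
    (datetime(2019, 5, 20, 9, 5), 30000.0),
    (datetime(2019, 5, 20, 9, 10), 5000.0),
    (datetime(2019, 5, 20, 11, 0), 180.0),
    (datetime(2019, 5, 20, 12, 30), 200.0),
    (datetime(2019, 5, 20, 13, 2), 12000.0),
]

recent, settled = split_by_warmup(requests, instance_starts)
print("median shortly after a start:", median(recent))
print("median otherwise:", median(settled))
```

    A clear difference between the two medians on real data would support the auto-scale/deployment theory; similar medians would suggest looking elsewhere.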

    So, I am looking at what information gets cached that would influence the handling of the different requests. The application does have a warmup script, which has completed, so the libraries should already have been loaded into memory. My hunch is that this is related to the different in-memory caches the web application uses (e.g., menus, security, etc.).

    Still investigating but wanted to give an update to the forum.


    Cheers, Jeff

    Friday, May 24, 2019 12:50 AM