Web Server cache vs Optimized DB queries

  • Question

  • Hi,

    Scenario: a web page needs a result set produced by a very slow stored procedure, one that takes anywhere from about 15 seconds to a minute to complete. The result set is about 50 to 300 rows of 10 columns, so the volume of data is small (short strings or integers). The SP is slow simply because the code and DB schema are very badly designed. As a result, the application team decided to cache the data to minimize the number of calls to this SP.

    This would be ideal, except that the content of the recordset depends on, say, 4 parameters, and caching each combination puts a certain amount of stress on the web server.

    Question: suppose there is a way to make the SP 100+ times faster, so that it now executes in less than 0.1 second. Is it still worth the effort to keep using the web caching mechanism? In other words, is a DB connection plus an SP execution so costly that it should be avoided as much as possible?

    Thanks very much in advance for any help.
    Monday, January 28, 2008 9:04 PM

Answers

  • I'd say that the problems are always going to be with the database and its inaccuracies, whatever you do.

     

    Is there no way to fix that problem, instead of working around it, and trying to find solutions to issues that should not be present in the first place?

     

    In terms of caching the data, I think it really depends on the load on the web server, the total size of the cached data, and how often the cached objects are invalidated.  Caching will use up a certain amount of memory, as you pointed out; how big an effect this has is largely tied to other factors of the application.

    Given that the database is poorly designed, it seems likely that you will run into other problems too, which suggests the application needs to be worked on rather than worked around.

     

    Then you need to decide how you are going to manage the cache.  You can push the data using notifications, or you can pull the data either as you need it or in a prefetched manner.  How you do this is largely dependent upon how the application is used.
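    The pull model described above can be sketched as a small read-through cache with time-based invalidation. A minimal illustration (all names here are hypothetical, not from the thread):

```python
import time

class PullThroughCache:
    """Pull model: fetch on demand, refresh when an entry is older than ttl."""

    def __init__(self, loader, ttl_seconds):
        self._loader = loader      # function(key) -> data, e.g. the SP call
        self._ttl = ttl_seconds
        self._store = {}           # key -> (timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and time.time() - entry[0] < self._ttl:
            return entry[1]        # fresh hit: no database round trip
        value = self._loader(key)  # miss or stale: pull from the source
        self._store[key] = (time.time(), value)
        return value

calls = []
def fake_sp(key):
    """Stand-in for the slow stored procedure."""
    calls.append(key)
    return f"rows-for-{key}"

cache = PullThroughCache(fake_sp, ttl_seconds=60)
cache.get("a")   # miss: invokes the loader once
cache.get("a")   # hit: served from memory, loader not called again
```

    A push model would instead update `_store` from a change notification and skip the staleness check on reads.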

     

    Since the queries take up to a minute, without changing them you have a totally unacceptable situation.  This is true even if you cache the data, as the cache is likely to go out of date quickly.  If you have no way to rewrite the application, then you will have to live with the overhead of caching, as a cache lookup is highly unlikely to take longer than the query itself.

     

    I hope this helps you find out a little more about the application and analyse the impact of the decisions you make.

     

    Good luck,

     

    Martin Platt.

    Tuesday, January 29, 2008 11:38 PM
  • As everyone is saying, I think the problem is multi-faceted.

     

    My first port of call would be to tune the stored procedure:

    • Use your query analyser to look at the execution plan and pinpoint the bottlenecks.
    • Does it use cursors? If so, rewrite it. I have not found an instance where a cursor-based sproc could not be rewritten to get rid of them. Don't forget that as well as being slow, cursors also create locks on your data, which seriously impedes performance and scalability.
    • Is it locking the data unnecessarily? Could you run the queries with NOLOCK?

    How often is the data changing? Could you pre-calculate data or denormalise data?

     

    Architecturally, how are you accessing the data? Do you have a separate data tier? If so, you could potentially cache data there. As mentioned previously, you have to think carefully about your caching strategy, balancing the work done caching the data and refreshing the cache, the work done at the RDBMS to generate the data, and the allowable latency of your data.
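    With four parameters driving the result set, one practical detail of a data-tier cache is building a stable key per parameter combination, so that the same inputs always map to the same entry. A small sketch (procedure name and parameters are illustrative):

```python
import hashlib
import json

def cache_key(proc_name, params):
    """Build a stable cache key from an SP name and its parameter values."""
    # Canonical JSON with sorted keys makes the key order-independent.
    canonical = json.dumps(params, sort_keys=True)
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return f"{proc_name}:{digest}"

k1 = cache_key("usp_GetReport", {"region": "EU", "year": 2008, "dept": 7, "flag": True})
k2 = cache_key("usp_GetReport", {"year": 2008, "dept": 7, "flag": True, "region": "EU"})
# The same parameters supplied in a different order produce the same key.
```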

     

    On the web front end, don't forget you can use fragment caching and full page caching, both of which can vary by parameter.

     

    Your overall strategy would be to combine all these techniques, but do it one step at a time... and of course, you are using test-driven development and continuous integration, aren't you?

     

    Thursday, February 7, 2008 7:21 AM

All replies

  • Can anyone help, please?
    Thanks.
    Tuesday, January 29, 2008 7:26 PM
  • Very interesting answer. I know exactly where our performance trouble lies in the DB area. The fixes are known, but they would require a complete redesign of the DB and a rewrite of roughly a thousand SPs. We don't own the DB schema, and there is a team of 10 developers. This is not something that will happen overnight.

    Over the years, the dev teams have come to rely extensively on the web cache mechanism, so much so that the application takes a very long time to start. This is very frustrating during the development/debug phase, because stop, build, start cycles are frequent.

    What puzzles me is how the cache mechanism could perform better than a database, which is specialized for this task. Seeking an item in the cache and casting it back to its original type seems slower than a DB trip (in the scenario where our DB issues are fixed). During debugging, I saw clearly that pulling an item out of the cache sometimes takes 2 seconds. My optimized SP will be much faster than that.

    If you don't mind, I'd like to get things clear about this web cache mechanism, since I don't know very well how to monitor or measure its performance.

    Question 2: suppose the SP gets the data in 0.1 second (though the dataset still needs to travel through the network to reach the client code), while pulling the item out of the cache and recasting it to a DataSet takes 1 second. Is the cache still the better design? Maybe because we avoid an out-of-process call, or avoid stressing the network?

    Question 3: in a stress situation, let's say 1,000 concurrent users during one minute: pulling data 1,000 times out of the cache versus making 1,000 DB connections + exec SP + getting back the recordset. Which approach is more scalable? (Provided the DB is perfectly tuned.)

    Question 4: currently the cache is pretty much our magical stuff all bag, which makes sense in our present situation because of the DB troubles. However, with a well-tuned DB, are there any adverse effects of over-using the web cache? A hidden performance hit, or something bad we should know of?

    By "over-using" the web cache, I mean adding RAM to the web server and rely more on the cache than a normal DB call. Pratically almost every call to the DB is shadowed by looking up the cache first and then if not found, call DB and store in cache for future use.

    Thanks very much in advance for any help.
    Wednesday, January 30, 2008 1:51 AM
  • I'm glad that you've considered the DB refactoring proposition!

     

    Now to your questions:

     

    Qn 2.  It largely depends on connection speed, network traffic, and the physical layout of the application.  Even if the stored procedure executes in 0.1 seconds, if the total time to set up and connect to the database, pass parameters to the stored procedure, and retrieve the result back to the calling code takes longer than a second, then yes, the cache wins.  Subsequent calls with the same parameters will then take 1 second, or whatever the time from cache is.

     

    Qn 3.  From a scalability perspective, when a lot of cache invalidations are being experienced, the overhead of checking the cache first is going to make the cached solution slower.  If, on the other hand, the cache is hit the vast majority of the time, then given the answer to Qn 2, the cached solution should be more scalable, assuming there are no problems with blocking in the cache.
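    One way to see this trade-off is a simple expected-latency model. The numbers below are illustrative only, roughly echoing the timings mentioned in this thread (a ~1 second cache hit versus a ~0.1 second SP plus network round trip):

```python
def expected_latency(hit_rate, cache_hit_s, cache_check_s, db_round_trip_s):
    """Expected per-request latency for a cache-aside lookup.

    A hit costs only the cache read; a miss pays the cache check
    plus the full database round trip.
    """
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_hit_s + miss_rate * (cache_check_s + db_round_trip_s)

# High hit rate: the cache check overhead is amortised across many hits.
mostly_hits = expected_latency(0.95, cache_hit_s=1.0, cache_check_s=0.01, db_round_trip_s=1.1)

# Frequent invalidation: most requests pay for both the check and the DB call.
mostly_miss = expected_latency(0.10, cache_hit_s=1.0, cache_check_s=0.01, db_round_trip_s=1.1)
```

    With a cache hit that is itself slow (the 1-second DataSet recast from the question), the gain over a well-tuned DB call is marginal even at a high hit rate, which is exactly the point being weighed here.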

     

    Qn 4.  I'm very interested in what a "magical stuff all bag" is!  If the database were nicely tuned: database servers often do their own caching, and then there's connection pooling and the like, so there are a lot of options to navigate.  As with Qns 2 and 3, it all depends on the situation.  If the database and the query are well organised and tuned, then to justify using the cache you'd have to show that accepting the overhead of accessing the cache, checking whether it is invalid, and handling the case where it does not contain the particular object beats simply calling the database every time.  Generally speaking, I'd not employ a cache unless you have some reason to do so: some reasonable expectation that it will provide performance gains, due to traffic, response times, or whatever.

     

    Obviously there's also the cost of adding more and more objects to the cache, to the point that you eventually hold everything in memory; the search space could become very large, and the amount of memory required would be immense.  As the cache gets more and more filled, I'd expect performance to degrade accordingly.
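    One common way to bound that memory growth is a fixed-size cache with least-recently-used eviction. A minimal sketch (hypothetical, not any specific product's API):

```python
from collections import OrderedDict

class BoundedLRUCache:
    """Evict the least-recently-used entry once max_entries is exceeded."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._store = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self._max:
            self._store.popitem(last=False)  # drop the least recently used entry

cache = BoundedLRUCache(max_entries=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)   # capacity exceeded: "a" is evicted
```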

     

    Are you using the Enterprise Library Caching Application Block? If so, that will have taken care of a lot of the issues you would encounter with a hand-rolled solution.  The pattern you describe, checking the cache and then retrieving from the database if the item is not found, is correct.

     

    I hope this helps,

     

    Martin Platt.

    Wednesday, February 6, 2008 4:44 AM
  • Hi All,

     

    This is an important discussion going on. Sorry for forking a new thread related to this thread here.

    I have few questions here:

    1. Can we develop the UI layer using test-driven development and continuous integration? If yes, please state how.

    2. As per my understanding, caching should be done when roughly 80% of the data access is read-only with no writes. So for a general web application like weather information or government policies, we should think about caching positively, but I am still wondering whether we should use caching on a portal where the data is very much user- and input-dependent. Your comments?

    3. I need to know about the general/best practice used in commercial installed web applications for building and installing them on a production server. Do we use VS.NET to compile the application every time we install it to the web server, or do we use NAnt?
    4. I need to know the best practice for implementing logging. Should we use log4net or the Enterprise Library Logging Application Block?

     

    Regards

    TicArch

     

    Wednesday, February 13, 2008 6:54 AM
  • Create a new thread instead of asking in this thread.  I will answer that question then.  This has nothing to do with the original question.

     

    I look forward to your question on another thread!

     

    Martin Platt.

    Friday, February 15, 2008 4:47 AM
  •  

    For those interested in the refactoring alternative, I can suggest a book from Fowler's collection and VS features on the topic.

     

    Thursday, April 3, 2008 8:42 PM