SQL Server 2005 CPU 100%

    Question

  • Hi,
    This is a thick client C# application. The database is 15 GB and sits on a server with 8 cores and 32 GB of RAM.
    The application has about 100 live users at any given time, and the issue we are facing is that CPU usage is at 100% all the time.
    Disk and memory usage are very low, but the CPU stays pegged at 100%.

    One thing I have observed in the code is that it makes far too many stored procedure calls.
    That is: if the application needs the details of 1,000 products for a given user, it loops through each product ID and executes the stored procedure to get the details for one product at a time, repeating this 1,000 times to get all the product details. This is pretty much the architecture throughout the application, which I feel is suboptimal.

    1. Each stored procedure returns data in a reasonable time, in the 0 to 20 second range, but why would CPU usage be at 100% while memory and disk stay low? Is it because of the sheer number of queries hitting the server, as explained above?
    2. If we rewrite the stored procedures to get all the product details in ONE query instead of 1,000 individual calls, what difference will it make?
    3. What is the difference between getting all the data in a single stored procedure call versus looping through each product ID and getting one product's data at a time? What overhead is involved in calling a stored procedure 1,000 times versus making a single call to get all the data?

    Any insight into these will be greatly appreciated.


    Sams
    Sunday, February 28, 2010 2:36 PM

Answers

  • Look at your top wait stats:

    SELECT TOP 10 *
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC

    You will probably find CXPACKET in the list, which would be a sign that you have queries running in parallel across multiple CPUs; under a highly concurrent workload like the one you have described, this can be problematic.  Generally for an 8-CPU OLTP system I'd set max degree of parallelism to 2, maybe 4, and then raise the cost threshold for parallelism option from its default of 5 to 25-30, increasing the cost required for a query to run in parallel.
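
    As a rough sketch, both settings can be changed with sp_configure (the values 2 and 25 here are just the starting points suggested above; test before changing them in production):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    -- Limit parallel plans to 2 CPUs as a starting point for an 8-core OLTP box
    EXEC sp_configure 'max degree of parallelism', 2;
    -- Raise the plan cost required before the optimizer considers a parallel plan
    EXEC sp_configure 'cost threshold for parallelism', 25;
    RECONFIGURE;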

    If the server is 64-bit, you HAVE to set max server memory or you can end up with paging problems.  For 32 GB of RAM I would start at 26-28 GB for SQL Server and then monitor the Memory\Available MBytes counter to determine whether SQL Server can use more memory without causing external memory pressure; this counter should always stay above 150 MB free.  Make sure you grant the SQL Server service account the Lock Pages in Memory user right under User Rights Assignment in the Local Security Policy.  If you are on Standard Edition you need to apply the latest Service Pack and CU and enable a trace flag so that Lock Pages in Memory can be used by SQL Server, preventing the process from being paged out by the OS.

    http://support.microsoft.com/kb/918483
    http://support.microsoft.com/kb/970070/
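
    For reference, a sketch of the memory setting described above (26 GB shown purely as the suggested starting point; the value is in MB, and you would adjust it after watching Available MBytes):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    -- 26624 MB = 26 GB for SQL Server, leaving roughly 6 GB for the OS and other processes
    EXEC sp_configure 'max server memory (MB)', 26624;
    RECONFIGURE;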

    The next thing to look at is missing indexes, which cause table scans and drive IO and CPU usage through the roof.  There are a few ways to find them, but my favorite method is to scan the plan cache so that you can see the SQL statement and execution plan associated with each missing index.

    http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/07/27/digging-into-the-sql-plan-cache-finding-missing-indexes.aspx
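
    If scanning the plan cache is too heavy to run on a busy box, a lighter-weight alternative is to query the missing index DMVs directly; a sketch (it lacks the statement and plan context the blog post above gives you):

    -- Missing indexes recorded by the optimizer since the last restart,
    -- ordered by a rough estimate of potential benefit
    SELECT TOP (25)
        mid.[statement] AS [Table],
        mid.equality_columns, mid.inequality_columns, mid.included_columns,
        migs.user_seeks, migs.avg_total_user_cost, migs.avg_user_impact,
        migs.user_seeks * migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) AS est_improvement
    FROM sys.dm_db_missing_index_group_stats AS migs
    JOIN sys.dm_db_missing_index_groups AS mig
        ON migs.group_handle = mig.index_group_handle
    JOIN sys.dm_db_missing_index_details AS mid
        ON mig.index_handle = mid.index_handle
    WHERE mid.database_id = DB_ID()
    ORDER BY est_improvement DESC;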

    Beyond all of that, post your wait stats and we can offer more information.


    Jonathan Kehayias
    http://sqlblog.com/blogs/jonathan_kehayias/
    http://www.twitter.com/SQLSarg
    http://www.sqlclr.net/
    Please click the Mark as Answer button if a post solves your problem!
    Sunday, February 28, 2010 11:57 PM
    Moderator
  • Few comments:

    The key is in the execution plan. Check whether you are getting index seeks or index scans.
    Regarding your #2, definitely yes. The biggest strength of SQL Server is processing data in bulk with fewer resources; iterating through each record will *always* be expensive when you are dealing with large sets of data.

    Regarding #3, given that your server memory is double the database size, CPU is the only bottleneck I can think of. Also check whether you are getting plan reuse (see the query after the snippet below).
    The code snippet below [from SQL Server 2005 Diagnostic Information Queries] will help you find the queries consuming the most CPU. From this list you can work on the specific offenders.
    -- Cached SP's By Worker Time (SQL 2005) Worker time relates to CPU cost
    SELECT TOP(25) qt.[text] AS [SP Name], qs.total_worker_time AS [TotalWorkerTime], 
    qs.total_worker_time/qs.execution_count AS [AvgWorkerTime],
    qs.execution_count AS [Execution Count], 
    ISNULL(qs.execution_count/DATEDIFF(Second, qs.creation_time, GETDATE()), 0) AS [Calls/Second],
    ISNULL(qs.total_elapsed_time/qs.execution_count, 0) AS [AvgElapsedTime], 
    qs.max_logical_reads, qs.max_logical_writes, 
    DATEDIFF(Minute, qs.creation_time, GETDATE()) AS [Age in Cache]
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
    WHERE qt.[dbid] = DB_ID() -- Filter by current database
    ORDER BY qs.total_worker_time DESC;
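
    As a quick plan reuse check (a sketch; a large share of single-use plans relative to the total suggests poor reuse):

    -- Single-use vs. reused plans currently in the plan cache, by object type
    SELECT cp.objtype,
           COUNT(*) AS plan_count,
           SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans,
           SUM(cp.size_in_bytes) / 1024 / 1024 AS cache_mb
    FROM sys.dm_exec_cached_plans AS cp
    GROUP BY cp.objtype
    ORDER BY plan_count DESC;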




    Sankar Reddy http://sankarreddy.com/
    Sunday, February 28, 2010 7:24 PM
    Moderator

All replies

  • Hi Sams,

    First of all, you need to check whether the 100% CPU is actually being consumed by the SQL Server process. Could you collect a Perfmon log with the Process >> % Processor Time counter for all process instances, then check which processes are consuming the most CPU?

    Have you taken a Profiler trace to see the CPU time consumed by each call?
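
    If it is easier than setting up Perfmon, a similar SQL-vs-other CPU split can be pulled from inside SQL Server itself; a sketch against the scheduler monitor ring buffer (available on recent SQL Server 2005 builds):

    -- Most recent scheduler monitor snapshots: ProcessUtilization is the CPU %
    -- used by SQL Server, SystemIdle is idle CPU, and the remainder is other processes
    SELECT TOP (10)
        record.value('(./Record/@id)[1]', 'int') AS record_id,
        record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
        record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS idle_cpu_pct
    FROM (SELECT CONVERT(xml, record) AS record, [timestamp]
          FROM sys.dm_os_ring_buffers
          WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
            AND record LIKE N'%<SystemHealth>%') AS rb
    ORDER BY rb.[timestamp] DESC;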


    Michel DEGREMONT , my blog http://www.micheldegremont.com
    Sunday, February 28, 2010 3:13 PM
  • Thank you all for the good suggestions. One more piece of information about the database: this environment uses the SNAPSHOT isolation level.

    This system is already fairly well optimized. I have profiled the SQL in a load test environment: when only one user is active, the longest stored procedure execution time is under 5 seconds, roughly 90% of the queries return in less than a second, and CPU usage is normal. But when the number of users is increased to 25, 50, and then 80, CPU usage spikes to 100% and stays there. The application still responds, although slowly, and the longest stored procedure execution is 27 seconds.

    This application was designed from the beginning to return one product's data at a time, executing the stored procedure once for each ProductID.
    Changing this logic to get all records in one call (e.g. with a comma-separated list of ProductIDs) will take a significant investment.
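
    For what it is worth, a minimal sketch of what the single-call shape could look like on SQL Server 2005 (which has no table-valued parameters, so a split function over the comma-separated list is a common workaround); the table, column, and procedure names here are made up for illustration:

    -- Hypothetical split function: turns '101,102,103' into a one-column table of ints
    CREATE FUNCTION dbo.SplitIntList (@list VARCHAR(MAX))
    RETURNS @ids TABLE (ProductID INT NOT NULL)
    AS
    BEGIN
        DECLARE @pos INT, @next INT;
        SET @pos = 1;
        WHILE @pos <= LEN(@list)
        BEGIN
            SET @next = CHARINDEX(',', @list, @pos);
            IF @next = 0 SET @next = LEN(@list) + 1;
            INSERT INTO @ids (ProductID)
            VALUES (CAST(SUBSTRING(@list, @pos, @next - @pos) AS INT));
            SET @pos = @next + 1;
        END;
        RETURN;
    END;
    GO

    -- Hypothetical procedure: one call, one result set for all requested products
    CREATE PROCEDURE dbo.GetProductDetailsList @ProductIDs VARCHAR(MAX)
    AS
        SELECT p.*
        FROM dbo.Product AS p
        JOIN dbo.SplitIntList(@ProductIDs) AS ids ON ids.ProductID = p.ProductID;
    GO

    -- One round trip instead of 1,000:
    -- EXEC dbo.GetProductDetailsList @ProductIDs = '101,102,103';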

    1. I know it is going to be beneficial... but to what extent (%)?
    2. How do I measure the difference between calling one stored procedure (with all ProductIDs) to get all records at once versus calling a stored procedure that returns one product's data, executed from the application code once per ProductID?

    Could the CPU usage be hitting 100% because so many procedure calls are occurring (around 401,254 procedure executions in about 715 seconds during a load test, which is about 561 executions per second) while memory and disk usage remain very low?

    Thank you!
    Sams
    Sunday, February 28, 2010 10:23 PM