Incorrect data in Execution Plan?

  • Question

  • I'm trying to tune a procedure that currently takes around 2 minutes to execute. I look at the Actual Execution Plan from my latest EXEC, and it shows me that query #4 has a cost of 52% relative to the batch.

    So I run query #4 by itself and it takes less than a second.  

    So I think, maybe in the context of the proc it's taking a minute, so I put PRINT GETDATE() before and after query #4, and execute the proc again. The proc still takes 2 minutes to execute. The execution plan still says that query #4 costs 52% of it.

    And my before and after times show that query #4 took less than a second.

    So... is it the apocalypse? Have I discovered that the execution plan is just throwing out random numbers for query cost? Or is there some non-Euclidean algorithm where 52% of 2 minutes can be less than a second?

    Any idea what's going on here?
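    For reference, here is roughly how I instrumented the timing (a sketch only; query #4 itself is elided, and SYSDATETIME() requires SQL 2008+ but is finer-grained than GETDATE()):

    ```sql
    -- Rough sketch of the instrumentation around query #4 (the query
    -- itself is elided). SYSDATETIME() (SQL 2008+) resolves finer than
    -- GETDATE(), which is only accurate to about 3 ms.
    DECLARE @t0 datetime2 = SYSDATETIME();

    -- ... query #4 here ...

    PRINT 'query #4 elapsed ms: '
        + CONVERT(varchar(20), DATEDIFF(MILLISECOND, @t0, SYSDATETIME()));
    ```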

    • Moved by Tom Phillips Monday, December 27, 2010 3:34 PM TSQL Question (From:SQL Server Database Engine)
    Thursday, December 23, 2010 6:30 PM


All replies

  • Hi Tab,

    The percentages shown in the execution plan are all based on the
    estimated cost, not the actual cost, nor the actual duration.

    The only difference between estimated and actual execution plan is
    that the actual execution plan adds actual row counts and actual
    execution counts to the estimated row counts and execution counts of
    each operator. Everything else is unchanged.

    And you should also consider that cost and execution time are not
    always in sync. The most extreme example for this is if a query has to
    wait some time for another process to release a lock - no CPU is used,
    no I/O is done, but time does still pass.


    Hugo Kornelis, SQL Server MVP
    My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis
    • Proposed as answer by Gert-Jan Strik Monday, December 27, 2010 6:28 PM
    • Marked as answer by KJian_ Monday, January 3, 2011 8:33 AM
    Thursday, December 23, 2010 10:52 PM
  • I'm not sure how cost is computed, now that you ask.

    It may be in logical reads, which may actually be very cheap if everything is cached, or in CPU which is cheap if you have a server with a lot of cores.

    I suppose we all ought to, like, Google this - excuse me, Bing it - or open one of Kalen's books, or maybe RTFM and figure it out. But I do know it's never a simple thing to balance all the numbers and resources, so I'm not surprised at this at all.

    I prefer to look at logical reads per statement in profiler, and stuff like that, to locate the pigs.  Only after I find a pig and can't see what's wrong with the SQL (indexes, etc), do I look at execution plan estimates.
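    Something like this against the plan cache does the same job as Profiler for spotting the pigs by logical reads (a sketch; SQL 2005+, and it only sees statements still in the cache):

    ```sql
    -- Top cached statements by logical reads (sketch; SQL Server 2005+).
    -- The offset arithmetic extracts the individual statement from the
    -- containing batch text.
    SELECT TOP (10)
        qs.total_logical_reads,
        qs.execution_count,
        SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
            (CASE qs.statement_end_offset
                 WHEN -1 THEN DATALENGTH(st.text)
                 ELSE qs.statement_end_offset
             END - qs.statement_start_offset) / 2 + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;
    ```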

    Josh

     

    Friday, December 24, 2010 3:38 AM
  • Ok, so now that you ask, let's see what drives the cost.

    As you know, the SQL Server Optimizer is a cost-based optimizer. But what drives the cost? It's the CPU cycles consumed that drive the cost of the query.

    When you consider a query execution, there are 4 parameters that are important - CPU, Memory, IO and Time. Of these, CPU is the costliest, followed by Memory and then Disk. Time is the one factor that SQL Server has no control over - because, let's say SQL Server has issued an IO request: can SQL Server guess how long it will take for the IO to complete? This is why cost is not a factor of time.

    In your case, rather than looking at the costliest query, look at the query with the longest duration, and check what you can do to speed it up.
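    For example, a sketch against the plan cache (SQL Server 2005+; filter batch_text down to your procedure's statements):

    ```sql
    -- Cached statements ranked by total elapsed time rather than by
    -- estimated cost (sketch; SQL Server 2005+). The time columns are
    -- reported in microseconds, hence the division.
    SELECT TOP (10)
        qs.total_elapsed_time / 1000 AS total_elapsed_ms,
        qs.total_worker_time / 1000 AS total_cpu_ms,
        qs.execution_count,
        st.text AS batch_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_elapsed_time DESC;
    ```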

     


    Suhas De
    --------------------------------------------------------------------------------
    Please mark answered if your question is completely answered; vote it as helpful to help others find a solution quicker.
    This posting is provided "AS IS"; with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/b/suhde
    • Marked as answer by KJian_ Monday, January 3, 2011 8:33 AM
    Friday, December 24, 2010 10:23 AM
  • So, as others explained, there are many factors contributing to the Query subtree cost, and not only time.
    Abhay Chaudhary OCP 9i, MCTS/MCITP (SQL Server 2005, 2008, 2005 BI) ms-abhay.blogspot.com/
    Friday, December 24, 2010 1:38 PM
  • Hugo

    >>The only difference between estimated and actual execution plan is
    >>that the actual execution plan adds actual row counts and actual
    >>execution counts to the estimated row counts and execution counts of
    >>each operator. Everything else is unchanged.

    Only if you have not updated more than 20% of the table. :-)))


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Sunday, December 26, 2010 7:43 AM
    Answerer
  • When you consider a query execution, there are 4 parameters that are important - CPU, Memory, IO and Time. Of these, CPU is the costliest, followed by Memory and then Disk. Time is the one factor that SQL Server has no control over


    I don't think that that is correct at all.

    Ultimately, speed is constrained when any of the above resources (except, of course, Time) is exhausted. And add to that Blocking (= waiting for locks to be released).

    Having said that, I/O is typically several orders of magnitude slower than retrieving the same amount of data from memory. That makes I/O by far the most expensive resource. When it comes to CPU versus Memory, you see that all kinds of caches are present on CPUs. Modern CPUs have more than one level of local memory cache. That indicates that Memory (RAM) is slower than the cache on the CPU. So arguably that makes CPU the cheapest resource.

    It is also not true that SQL Server has no control over time. If SQL Server is not restricted by the Operating System (which might be the case if the server is not dedicated to SQL Server, and the OS is switching tasks and redistributing Memory), then SQL Server has complete control over time. On the micro-level, a query execution might be temporarily halted, for example because of blocking, but it is SQL Server which is doing that. The whole purpose of the query optimizer is to come up with a query plan that will most likely perform best - best as in Lowest Elapsed Time. So the optimizer will estimate the cost of something like a Hash Join, compare that to the estimated cost of maybe a Merge Join, and determine which would be finished earliest.


    Gert-Jan
    Monday, December 27, 2010 6:49 PM
  • Having said that, I/O is typically several orders of magnitude slower than retrieving the same amount of data from memory. That makes I/O by far the most expensive resource. When it comes to CPU versus Memory, you see that all kinds of caches are present on CPUs. Modern CPUs have more than one level of local memory cache. That indicates that Memory (RAM) is slower than the cache on the CPU. So arguably that makes CPU the cheapest resource.

    This is completely wrong when you look at how the SQL Server Optimizer works. What you are saying is probably true from human perception, and not from an Optimizer Point of view.

    Have you ever thought about why the Optimizer chooses a Nested Loop over a Hash Match when the number of rows coming from one of the child branches is small? If IO were costlier than CPU, SQL Server would always do Hash Matches instead of Nested Loops.

    I am not saying that IO does not contribute to the cost; I just want to say that CPU takes precedence over IO.

    A good test of this would be to look at the execution plan. There are 3 columns that I want you to concentrate on - SubtreeCost, Estimated CPU and Estimated IO. If you look at the actual SubtreeCost of each of the Plan Nodes (not the cumulative figure as displayed in the execution plan), you will find that the SubtreeCost is directly proportional to the Estimated CPU, and not the Estimated IO. This means that if the Estimated CPU is high, the SubtreeCost for that node will definitely be high; however, same is not always true for Estimated IO.
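    One way to inspect those figures directly (a sketch; TableA/TableB are placeholder names) is to request the estimated plan as XML and read the EstimateCPU, EstimateIO and EstimatedTotalSubtreeCost attributes of each RelOp node:

    ```sql
    -- Sketch: emit the estimated plan as XML; each RelOp node in the
    -- output carries EstimateCPU, EstimateIO and
    -- EstimatedTotalSubtreeCost attributes.
    -- SET SHOWPLAN_XML must be alone in its batch, hence the GOs.
    SET SHOWPLAN_XML ON;
    GO
    SELECT A.Col1, B.Col2
    FROM TableA AS A
    JOIN TableB AS B ON A.Colx = B.Colx;
    GO
    SET SHOWPLAN_XML OFF;
    GO
    ```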

    It is also not true that SQL Server has no control over time. If SQL Server is not restricted by the Operating System (which might be the case if the server is not dedicated to SQL Server, and the OS is switching tasks and redistributing Memory), then SQL Server has complete control over time. On the micro-level, a query execution might be temporarily halted, for example because of blocking, but it is SQL Server which is doing that. The whole purpose of the query optimizer is to come up with a query plan that will most likely perform best - best as in Lowest Elapsed Time. So the optimizer will estimate the cost of something like a Hash Join, compare that to the estimated cost of maybe a Merge Join, and determine which would be finished earliest.

    This is also completely wrong. Blocking happens not because the SQL Server wants to keep resources locked, but because SQL Server is forced to lock resources to provide Transactional Consistency. The ACID properties of a Transaction cannot be preserved unless SQL Server holds locks. Additionally, if you experience heavy blocking in your environment, it is because your design is not optimal or your code is not optimal. SQL Server is nothing more than another piece of code behaving as you have designed it to.

    Again, the optimizer is not supposed to come up with a plan that will complete in the least time; it is designed to come up with a plan that will have the lowest TotalSubtreeCost. And as I have already explained earlier, the TotalSubtreeCost is dependent more on the CPU than on the IO.

    It is because of this that there is a concept of Query Hints, that we can use to obtain more efficient plans that take less time. If you have used Query Hints in your environment, you will certainly have noticed that the Native Plan (the one chosen by the SQL Server optimizer without any Query Hints) always has a TotalSubTreeCost less than the Plan generated with Query Hints and that is why SQL Server did not choose that plan in the first case.

    Hence, to conclude, the SQL Server Optimizer does not optimize queries to give the Least Time; rather, it optimizes for the Least Cost, and the Cost of a query is independent of the Elapsed Time for the query.

     


    Suhas De
    --------------------------------------------------------------------------------
    Please mark answered if your question is completely answered; vote it as helpful to help others find a solution quicker.
    This posting is provided "AS IS"; with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/b/suhde
    Tuesday, December 28, 2010 5:17 AM
  • This could be an interesting discussion. One where I will learn something too.

    I have to start with this key statement

    the optimizer is not supposed to come up with a plan that will complete in the least time; it is designed to come up with a plan that will have the lowest TotalSubtreeCost.

    I cannot disagree more!

    Can you provide a Microsoft reference that states the purpose of the optimizer? Because in my opinion the sole purpose of the optimizer is to return the result in the least time. For all intents and purposes, this is the only measure of performance.

    Does the optimizer make its final decision based on the TotalSubTreeCost? I don't know. Maybe it does. But it is definitely not trying to minimize cost!

    If it tried to minimize cost - as in "how much resources will this query plan use" - it would never choose a parallel query plan, since these will always consume more resources than the serial query plan. So if it is TotalSubTreeCost that drives the decision, then there is much more to it than just CPU cost. If it doesn't incorporate the cost of all relevant resources, then the approach would be seriously flawed.

    Customers want the correct answer as fast as possible. No one cares whether the system has to work hard or is idle most of the time. No one cares whether the machine uses more energy or depreciates faster. All customers care about is that the results come back immediately (or at least as fast as possible).

    Have you ever thought about why the Optimizer chooses a Nested Loop over a Hash Match when the number of rows coming from one of the child branches is small?

    Yes I have. The optimizer simply estimates the potential I/O cost of the Loop Join query plan (most likely a lot of Index Seeks) and the potential I/O cost of the Hash Match (most likely one Index Scan) and picks whichever is lower.

    Index seeks typically require several logical reads per row. Index scans require a logical read for each page in the index. In the worst case situation, all logical reads result in physical reads. Because of that, the estimated I/O cost of a Loop Join quickly exceeds that of a Hash Match.

    If you have a hot cache, it is not uncommon that a LOOP hint improves query performance dramatically, simply because (in borderline cases) the optimizer's cost estimates are wrong in the situation when all data is cached.

    With a hot cache, the borderline case should typically involve a much higher number of rows before the optimizer should switch from Loop Join to Hash Match. If CPU were more important than I/O, then that is what you would see. Instead, you see the switch with a very low number.
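    The borderline-case test I mean can be sketched like this (TableA/TableB are placeholders; compare the elapsed and CPU times that STATISTICS TIME reports for each variant):

    ```sql
    -- Sketch: compare the optimizer's native join choice against forced
    -- loop and hash joins on the same query (placeholder tables).
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    SELECT A.Col1, B.Col2                    -- native plan
    FROM TableA AS A JOIN TableB AS B ON A.Colx = B.Colx;

    SELECT A.Col1, B.Col2                    -- forced nested loops
    FROM TableA AS A JOIN TableB AS B ON A.Colx = B.Colx
    OPTION (LOOP JOIN);

    SELECT A.Col1, B.Col2                    -- forced hash match
    FROM TableA AS A JOIN TableB AS B ON A.Colx = B.Colx
    OPTION (HASH JOIN);

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    ```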


    Gert-Jan
    Tuesday, December 28, 2010 8:32 PM
  • Hi Gert

    Both of us here are trying to approach the same problem but from different angles. :-)

    While I cannot completely disagree with what you say, I cannot completely agree either. Anyway, here is the official statement from Microsoft:

    Ref: http://technet.microsoft.com/en-us/library/cc917545.aspx
    The process of choosing one execution plan out of several possible plans is called optimization. The query optimizer is one of the most important components of an SQL database system. While some overhead is used by the optimizer to analyze the query and choose a plan, this overhead is saved back several-fold when the optimizer picks an efficient execution plan. For example, two construction companies can be given identical blueprints for a house. If one company spends a few days at the start to plan how they will build the house, and the other company just starts building without planning, the company that takes the time to plan their project will most likely finish first.

    The SQL Server query optimizer is a cost-based optimizer. Each possible execution plan has an associated cost in terms of the amount of computing resources used. The optimizer must analyze the possible plans and choose the one with the lowest estimated cost. Some complex SELECT statements have thousands of possible execution plans. In these cases, the optimizer does not analyze all possible combinations. It instead uses sophisticated algorithms to quickly find an execution plan that has a cost reasonably close to the theoretical minimum.

    The SQL Server query optimizer does not strictly choose the execution plan with the lowest resource cost; it chooses the plan that most quickly returns results to the user with a reasonable cost in resources. For example, processing a query in parallel typically uses more resources than processing it serially, but completes the query faster. The SQL Server optimizer will use a parallel execution plan to return results if the load on the server will not be adversely affected.

    The optimizer relies heavily on distribution statistics when estimating the resource costs of different methods of extracting information from a table or index. Distribution statistics are kept for columns and indexes. They indicate the selectivity of the values in a particular index or column. For example, in a table representing cars, many cars have the same manufacturer, but each car has a unique vehicle identification number. An index on the vehicle identification number is more selective than an index on manufacturer. If the index statistics are not current, the optimizer may not make the best choice for the current state of the table. For more information about keeping index statistics current, see "Statistical Information" in Microsoft SQL Server Database Developer's Companion.

    The optimizer is important because it lets the database server adjust dynamically to changing conditions in the database without needing input from a programmer or database administrator. This frees programmers to focus on describing the final result of the query. They can trust the optimizer to always build an efficient execution plan for the state of the database each time the statement is run.

     


    Suhas De
    --------------------------------------------------------------------------------
    Please mark answered if your question is completely answered; vote it as helpful to help others find a solution quicker.
    This posting is provided "AS IS"; with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/b/suhde
    Wednesday, December 29, 2010 4:29 AM
  • 1) Can you provide a Microsoft reference that states the purpose of the optimizer? Because in my opinion the sole purpose of the optimizer is to return the result in the least time. For all intents and purposes, this is the only measure of performance.

    2) Does the optimizer make its final decision based on the TotalSubTreeCost? I don't know. Maybe it does. But it is definitely not trying to minimize cost!

    3) If it tried to minimize cost - as in "how much resources will this query plan use" - it would never choose a parallel query plan, since these will always consume more resources than the serial query plan. So if it is TotalSubTreeCost that drives the decision, then there is much more to it than just CPU cost. If it doesn't incorporate the cost of all relevant resources, then the approach would be seriously flawed

    4) Statement from Suhas: Have you ever thought about why the Optimizer chooses a Nested Loop over a Hash Match when the number of rows coming from one of the child branches is small?

     

    1) It's true that the SQL Server Query Optimizer is COST based and not time based. During compilation it generates many plans but chooses the one that consumes the least cost (assume you need to go to the airport: you will consider not only the time to get there but also the amount of petrol, the wear and tear on the vehicle, the traffic, how smooth the roads are, etc. Just a hypothetical example, but the algorithms for optimizers are written keeping several such hypothetical situations in mind). However, the optimizer is also smart enough to decide to consume more resources when it assumes that the least-cost plan would be TOO LATE while CPU cycles are being wasted with no processing going on on the cores. For this, it checks whether any processor is available and free, not doing any processing (not going too deep into thread priority, etc.). It only goes for parallelism when it sees that the processors are not being utilised to the optimum (which is also a waste of resources). I remember I read all this when I was in MS... Suhas too would have done a lot of testing and scanning through internal documents which are confidential and cannot be shared.

    2) Yes, it does, and it chooses the plan with the least subtree cost. This is derived from the statistics of the columns. I remember that there was a trace flag that can show you all the plans generated for the query... but I do not remember the number now and I might be wrong :).

    3) I hope this has been answered above.

    4) Suhas, shouldn't it be vice versa, i.e. Hash Match over Nested Loops? Please correct me if I am wrong :) ...

    Some Good References :

    http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx

    On parallel execution I would recommend reading the section "Plan Evaluation" in the white paper "Understanding and Controlling Parallel Query Processing in SQL Server": http://msdn.microsoft.com/en-us/library/gg415714.aspx

    In the end, the optimizer not only tries to save cost but also makes sure that the query returns its results quickly.

    Regards


    Abhay Chaudhary OCP 9i, MCTS/MCITP (SQL Server 2005, 2008, 2005 BI) ms-abhay.blogspot.com/
    Wednesday, December 29, 2010 5:00 AM
  • 4) Suhas, shouldn't it be vice versa, i.e. Hash Match over Nested Loops? Please correct me if I am wrong :) ...

    Hi Abhay, you are mistaken only on this point. Let's say we have the following query (let's take a very simple query as an example):

    select A.Col1, B.Col2 from TableA A join TableB B on A.Colx = B.Colx where A.Coly = <some value> and B.Coly = <some value>

    Here, obviously, the optimizer would check (from the statistics) how many rows will be returned by the condition A.Coly = <some value>, and how many rows will be returned by the condition B.Coly = <some value>.

    Now, let's say the condition A.Coly = <some value> or the condition B.Coly = <some value> returns very few rows. Then it does not make sense to do a Hash Match, as a Hash Match is a very CPU-intensive operation, and this will drive the cost of the query very high. You can test this very easily - create 2 tables, insert about 1000 rows in each table, run the query, and check the duration and the cost. Now modify the query to include OPTION (HASH JOIN), then run the query and check the duration and the cost again.

    However, if both of the conditions return a huge number of rows, then a Nested Loop will need too many iterations; thus both the cost of the query and the time taken will be very high, making this an inefficient plan. In this case, a Hash Match or a Merge Join will generally be used. However, I do not know under what conditions a Hash Match is given preference over a Merge Join or vice versa.
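    A sketch of that test (table and column names are just the placeholders from the example above):

    ```sql
    -- Sketch of the described test: two small tables, ~1000 rows each,
    -- then the same join with and without a forced hash join.
    CREATE TABLE TableA (Colx int, Coly int);
    CREATE TABLE TableB (Colx int, Coly int);

    WITH n AS (
        SELECT TOP (1000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
        FROM sys.all_objects
    )
    INSERT INTO TableA (Colx, Coly) SELECT i, i % 10 FROM n;

    WITH n AS (
        SELECT TOP (1000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
        FROM sys.all_objects
    )
    INSERT INTO TableB (Colx, Coly) SELECT i, i % 10 FROM n;

    SET STATISTICS TIME ON;

    SELECT A.Colx, B.Colx                     -- optimizer's choice
    FROM TableA A JOIN TableB B ON A.Colx = B.Colx
    WHERE A.Coly = 1 AND B.Coly = 1;

    SELECT A.Colx, B.Colx                     -- forced hash match
    FROM TableA A JOIN TableB B ON A.Colx = B.Colx
    WHERE A.Coly = 1 AND B.Coly = 1
    OPTION (HASH JOIN);

    SET STATISTICS TIME OFF;
    ```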

    Please let me know your thoughts on this.


    Suhas De
    --------------------------------------------------------------------------------
    Please mark answered if your question is completely answered; vote it as helpful to help others find a solution quicker.
    This posting is provided "AS IS"; with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/b/suhde
    Wednesday, December 29, 2010 5:21 AM
  • I got it :)... however, I have seen many times that when one of the columns in the join returns far fewer rows than the other column, which returns huge numbers of rows, a hash match occurs, provided those columns are not sorted; else it goes for either a hash match or a merge join...

    But I got the whole gist of what you are trying to explain... it was a nice discussion; even though I was aware of some of the facts, I got to know a bit more.


    Abhay Chaudhary OCP 9i, MCTS/MCITP (SQL Server 2005, 2008, 2005 BI) ms-abhay.blogspot.com/
    Wednesday, December 29, 2010 9:56 AM
  • Abhay, there are too many other factors that go into determining which is the best join algorithm for joining two data sets. Unfortunately, things do not always happen the way we assume they should or hope they would. :-)

    Anyways, it was really nice discussing this over this thread. I am happy that I could help. :-)

     


    Suhas De
    --------------------------------------------------------------------------------
    Please mark answered if your question is completely answered; vote it as helpful to help others find a solution quicker.
    This posting is provided "AS IS"; with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/b/suhde
    Wednesday, December 29, 2010 11:13 AM
  • Suhas,

    Thanks for the response (Abhey as well). Some quite interesting information in this thread. I'll have a look at the links later.

    I suspected this to be the case. The purpose of the optimizer is to maximize performance. The strategy it uses to achieve this is to minimize cost, under the assumption that if the cost estimates are done in a smart way, the lowest cost will also lead to the best performance. Add to that some extra tricks such as parallelism and a very smart storage engine, and you get a pretty good result.

    You'll have to forgive me that it is still my opinion that the cost indicator represents time to produce the result, and not (necessarily) an indicator of the amount of resources used. Because otherwise the strategy would not achieve its goal.


    Gert-Jan
    Wednesday, December 29, 2010 7:02 PM
  • Suhas,

    Thanks for the response (Abhey as well). Some quite interesting information in this thread. I'll have a look at the links later.

    I suspected this to be the case. The purpose of the optimizer is to maximize performance. The strategy it uses to achieve this is to minimize cost, under the assumption that if the cost estimates are done in a smart way, the lowest cost will also lead to the best performance. Add to that some extra tricks such as parallelism and a very smart storage engine, and you get a pretty good result.

    You'll have to forgive me that it is still my opinion that the cost indicator represents time to produce the result, and not (necessarily) an indicator of the amount of resources used. Because otherwise the strategy would not achieve its goal.


    Gert-Jan

     

    If you want to get right down to it, the cost indicator/figures REALLY DO represent time... at least originally.

    Those mysterious cost numbers you see were (originally) the number of seconds that the operator took to run on "Nick's Machine"... a specific PC that the optimization team had back in the 7.0 days.

    You can read about it here:

     

     


    --Brad (My Blog)
    Wednesday, December 29, 2010 7:35 PM
  • Nice articles, Brad. But after doing the tests, I disagree with you. Maybe it was true in the past, but not in the present versions. It would help if you could give me some live examples (queries) or any blog you or someone else has written in the recent past.


    Abhay Chaudhary OCP 9i, MCTS/MCITP (SQL Server 2005, 2008, 2005 BI) ms-abhay.blogspot.com/
    Thursday, December 30, 2010 4:48 AM
  • I suspected this to be the case. The purpose of the optimizer is to maximize performance. The strategy it uses to achieve this is to minimize cost, under the assumption that if the cost estimates are done in a smart way, the lowest cost will also lead to the best performance. Add to that some extra tricks such as parallelism and a very smart storage engine, and you get a pretty good result.

    Absolutely. As I had already mentioned, Cost is a factor of the amount of resources being used. Hence, the logic behind building the Cost Based Optimizer is: the less the Cost, the less we are dependent on the physical resources, the less we have to wait for the availability of those resources, and hence the less the execution time will be. Essentially, by minimizing the cost, we are trying to minimize the time.

    You'll have to forgive me that it is still my opinion that the cost indicator represents time to produce the result, and not (necessarily) an indicator of the amount of resources used. Because otherwise the strategy would not achieve its goal.

    We can continue our discussion/debate on this, but what I have experienced and learnt so far is that the Optimizer does not focus on time. As I had already mentioned, there are numerous factors that affect time, and many of these are outside the SQL Server process itself. This means that SQL Server has no control over many of these factors, an example being the time required to complete an IO.

    Hence, SQL Server tries to find a plan by which it has the least dependency on these external factors (IO in our example), so that the query can be processed the fastest. And it uses the Cost to determine how to minimize that dependency on the external resources.

    Also, since we have already gone so deep into this topic, I need to mention the presence of a Scheduler in SQL Server. In SQL Server 7 and 2000 we had the User Mode Scheduler, which has now been transformed into the SQLOS. Its primary task is to schedule the SQL Server threads internally before sending them to the OS scheduler. In any machine, CPU cycles are always limited, and there is no point in sending threads that are Sleeping or Suspended to the OS Scheduler - that would waste CPU cycles. Hence, the SQL Server Scheduler exposes to the OS scheduler only those threads that need active CPU cycles.
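    Those schedulers are visible through a DMV, by the way (a sketch; SQL Server 2005+):

    ```sql
    -- Sketch: one row per SQLOS scheduler; runnable_tasks_count is the
    -- number of workers waiting for CPU time on that scheduler.
    SELECT scheduler_id,
           status,
           current_tasks_count,
           runnable_tasks_count
    FROM sys.dm_os_schedulers
    WHERE status = 'VISIBLE ONLINE';
    ```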


    Suhas De
    --------------------------------------------------------------------------------
    Please mark answered if your question is completely answered; vote it as helpful to help others find a solution quicker.
    This posting is provided "AS IS"; with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/b/suhde
    • Marked as answer by Kalman Toth Thursday, February 3, 2011 5:08 PM
    Thursday, December 30, 2010 6:55 AM