not apples to apples RRS feed

  • Question

  • 4 core 8 gb RAM sql 2008 STD box vs 4 hexacore 64 gb box sql 2008 r2 STD. 

    Migrated database (backup and restore )from stand alone 4 core 8 gb RAM sql 2008 box to 4 hexacore 64 gb box sql 2008 r2 cluster.  Statistcs were updated. Simple queries taking about twice as long.  Statistics were updated with full scan after restore.  Plan cache flushed on both servers prior to testing.  What other things should we be looking at?  Have loocked at waits.  HAve set max degree of parallelism to 8 and cost threshold of parallelism to 20.


    Wednesday, February 23, 2011 1:03 AM


All replies

  • Before jumping to SQL checkup you need to make sure that we are not hitting any issue related to operating system itself. There are bunch of fixes for Windows Server 2008 R2 which would fix performance degradation issues.
    http://support.microsoft.com/kb/976700 (An application stops responding, experiences low performance, or experiences high privileged CPU usage if many large I/O operations are performed in Windows 7 or Windows Server 2008 R2)

    http://support.microsoft.com/kb/979149 (A computer that is running Windows 7 or Windows Server 2008 R2 becomes unresponsive when you run a large application)

    http://support.microsoft.com/kb/2155311 (Poor performance occurs on a computer that has NUMA-based processors and that is running Windows Server 2008 R2 or Windows 7 if a thread requests lots of memory that is within the first 4 GB of memory)
    Additionally, check if http://support.microsoft.com/default.aspx?scid=kb;en-US;2207548  (Degraded overall performance on Windows Server 2008 R2) is applicable

    Please go ahead and apply ALL fixed listed above and change the power options as listed in 2207548

    Next, What is the max wait on new server?  Do you still have old server? If yes, pick up any query which you find running slow and compare statistics IO and query plan on both server.

    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    This posting is provided "AS IS" with no warranties, and confers no rights.
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Wednesday, February 23, 2011 1:49 AM
  • Check your Power Management Settings in the Bios and in Windows.  Make sure that it is set to High Performance and not Balanced in both places. 


    This is the most likely thing to cause the problem you describe.  

    On a side note, for a hexacore server, the correct MAXDOP setting is 6 which equates to the number of physical processor cores in a single NUMA Node (http://support.microsoft.com/kb/2023536). 

    Jonathan Kehayias | Senior Database Administrator and Consultant
    Feel free to contact me through My Blog or Twitter
    Please click the Mark as Answer button if a post solves your problem!
    Wednesday, February 23, 2011 2:11 AM
  • Thanks you so much for the information.  We have all ready picked a query and compared execution plans.  They are virtually identical.  But can hardly wait to apply the other fixes.
    Wednesday, February 23, 2011 3:22 AM
  • thank you so much for the quick reply.  I just checked the power plan is balanced we will be changing that in the AM.  I also set the maxdop to 6.



    Wednesday, February 23, 2011 3:23 AM
  • On a side note, for a hexacore server, the correct MAXDOP setting is 6 which equates to the number of physical processor cores in a single NUMA Node (http://support.microsoft.com/kb/2023536 ). 


    I saw you mention this the other day on another thread, and it caught my eye.


    This will increase the likelyhood of all parallel threads of a query to be located within a NUMA Node and avoid costly remote node data look ups.

    I wondered just what they mean by "likelyhood".  If SQL Server was really NUMA aware and counting, wouldn't it be a guarantee?  And if it's not fully aware and counting, then - well, many questions.  But one would be, if it's even semi-random or semi-sequential how a thread attaches to a processor, wouldn't smaller numbers be even more "likely" to stay within NUMA boundaries?


    This whole thread is very informative, the number of serious gotchas for a new Windows server and a new SQL Server install seems to be at a high point compared to the last several years - or is that an accurate statement?




    Wednesday, February 23, 2011 5:44 AM
  • Funny how things come full circle isn't it Josh?  I only say that because I recall a previous thread where you bashed Microsoft for the defaults of SQL Server at the time and I mentioned how NUMA would become mainstream and people would bash Microsoft because they didn't account for NUMA being mainstream in the defaults.  In case you forget the thread I am mentioning is http://social.msdn.microsoft.com/Forums/en/sqldatabaseengine/thread/cf55ac60-f768-499f-a939-d16b0d47b0ff.  

    SQL Server is fully NUMA aware, but that doesn't mean that things can't become imbalanced over time, and what should be a best practice fails under specific workloads.  However, the scheduling mechanisms in SQL Server are quite robust and should be able to adequately cope with the best practice configuration that limits SQL Server to the processing capabilities of a single NUMA node.  The worst thing that you could encounter would be scheduling across NUMA nodes and with the newer Nehalem chips being capable of of attaining 2× QPI 6.4 GT/s (http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)) for the internode transfers depending on the specific chipset selected you have nothing to worry about with memory locality as far as performance is concerned except under the most demanding of workloads,and even then the fastest processors may have trouble keeping up with the QPI.

    The days of recommending 'max degree of parallelism' = 2 or 4, or less than the number of physical cores that exist in a single NUMA node are well past gone; they died somewhere around the time that official support for SQL Server 2000 expired.  If you are going to keep up with the hardware changes that have occurred within the last year, you have to change how you think about SQL Server, and the impact that newer architectures have on the performance of SQL Server overall.  I have a quad octacore server that performs best with 'max degrees of parallelism' set at 16, which equates two of the four NUMA nodes available on the server, but you have to test any configuration that will exceed the best practice implementation before actually implementing the configuration in a production environment.  During testing we noted that exceeding the 2 NUMA node configuration degraded performance in a manner that was unacceptable.  It is up to each end user to test each specific workload to determine what works for their specific environment.  Without a doubt what was recommended one or even two years ago doesn't apply to newer hardware configurations.

    Jonathan Kehayias | Senior Database Administrator and Consultant
    Feel free to contact me through My Blog or Twitter
    Please click the Mark as Answer button if a post solves your problem!
    Wednesday, February 23, 2011 6:38 AM
  • I wish I could state this better than Jonathan already has, but I don't see how that would be possible.

    When SQL Server assigns worker threads to a CPU, it assigns the process to the next CPU thread in a round-robin fashion. Then it assigns the parallel threads to the successive CPU threads. At some point, SQL may have to make a decision as to whether to span NUMA nodes or cycle back around to the start of the same NUMA node. Because of the way memory is partitioned to each NUMA node, SQL will often cycle back around to the start of the NUMA node to keep the data in memory all localized on the same node. Under certain circumstances, such as if the current NUMA node is very busy and the next is not, it will opt to span the NUMA nodes. In short, SQL does whichever it perceives will yield the best results.

    Like Jonathan said, the new Nehalem chips don't have the same issues with memory localization is the older NUMA chips. How aware SQL is of this, I do not know. I would not expect SQL Server 2005 to be aware of these improvements and maybe not even SQL 2008. I know that SQL 2008 R2 does have improved memory handling, though I'm not sure if it has special handling for Nehalem chips or not.

    It's about to get a lot more confusing as they rewrote a lot of memory handling in SQL Server Denali. Definitely, I would expect it to make much better decisions about handling of NUMA nodes.

    My blog: SQL Soldier
    Twitter: @SQLSoldier
    Microsoft Certified Master: SQL Server 2008
    My book: Pro SQL Server 2008 Mirroring
    Wednesday, February 23, 2011 7:04 AM
  • Robert, so SQL Server is NUMA-processor-aware enough to come back around on the same chip as it assigns threads, that was my main question, thanks for answering. 

    But you raise another, which is whether SQL Server would be NUMA-memory-aware about putting buffers for those threads in local memory ... which then raises the issue of to what degree it manages those buffers as a separate pool, given no user-specified instructions.  But as Jonathan says the cross-memory access is better now, which has to help, but it does leave open the question of what the optimal behaviors are, and we still have to learn what the SQL engine has now been built to do!  If you do or don't explicitly define workload groups, will SQL Server end up with two (or more) copies of the same table data in RAM at the same time?  There's always one more question.

    Certainly this NUMA logic has to help as the number of cores on a chip, and total cores on a server, goes up and up!  Y'know, back when we were doing kilobytes, we used to worry a lot about memory interleaving.  Everything old is new again, just six orders of magnitude larger!  Geez, try reading that last sentence again, six orders of magnitude ...

    Denali is looking like a real winner, in feature terms.  But I still have my little question, that somehow has tickled Jonathan into seeing as bashing, so - will Denali out of the box now default to maxdop=6 on a server with four six-core processors, or to 24?  And for extra fun, didn't I read about some AMD chip with - was it 24 cores on one chip?  I wonder what its NUMA capabilities are.

    Jonathan, on your quad octacore server where you found best performance at maxdop=16, is that supporting a mixed workload, or more batch/ETL/BI, or more OLTP?  For a pure OLTP workload, I think you might want to reconsider whether those smaller maxdop numbers might not still work very well.  But perhaps the small maxdop numbers would be best if they are a factor of the number of cores, 2 or 4 for an octocore rather than 3 or 5 or 7?  Back to your case, also, you found that maxdop=16 worked well and that performance dropped off at - at what, at 24?  I would ask *why* it dropped off at 24, because of some additional cross-memory issues, or because of the kind of processor resource starvation and idling and scheduling issues I was worried about a few months ago, as maxdop exceeds half the number of total cores.

    It's great we have all of these resources, I hope it both helps me squeeze more work out of the boxes - and more work out of employers who need all of these issues handled!  But there sure are a lot of questions that can come up in practice, then you have to see if anyone at Microsoft has publically documented what the SQL Server behaviors are, and then make sure you are looking at the right case, and if it still doesn't work right, start worrying about bugs in this more complex software - or even the hardware.  Just noodling about this and letting my fingers keep typing here ... the dmv's, with some work, will allow you to snapshot spids and threads and - even processor assignments? But maybe in Denali SQL Server might do some logging of the efficiency, so if someone specifies a bad maxdop like 7 it would be more easily detected?  Or maybe we leave that to third-party products.

    Fun and games!

    I just finished a gig where they were just bringing up new servers with 32 and 64 cores a couple of weeks after I left (of course was supposed to be weeks before I left, but what else is new!), and I doubt that anyone there is paying the least attention to maxdop.



    Wednesday, February 23, 2011 4:07 PM
  • sorry back on topic

    DL585 G6 These three settings Power Regulator for Proliant, Ultra Low Power State & Memory Channel Mode appear to have a big impact on sql server per

    • Marked as answer by Joe_Hell Monday, March 14, 2011 9:24 PM
    Thursday, February 24, 2011 6:00 PM
  • Also of a QUAD core box maxdop to 4?
    Thursday, February 24, 2011 7:37 PM