SQL Server failed with error code 0xc0000000 to spawn a thread

  • Question

  • I am looking at the SQL error log, and once in a while it gives the error "SQL Server failed with error code 0xc0000000 to spawn a thread to process a new login
    or connection. Check the SQL Server error log and the Windows event logs for information about possible related problems [CLIENT: 10.33.X.X]"

    The client IP here is the server where SQL Server resides. In the Source column of Log File Viewer, I see "Logon".

    When I looked in the Windows event logs, I didn't see much information either.

    I am not sure why SQL Server is writing this error once in a while (multiple times per day).  This database server has 1041 databases (ranging from a few MB up to 4 GB), and the SQL Server version is SQL 2008 R1 SP1.

    Monday, January 31, 2011 7:39 PM

All replies

  • You are experiencing worker thread starvation in the instance.  Look at the THREADPOOL wait type in sys.dm_os_wait_stats; it should have a high number of waiting tasks and a long wait duration.  What is the server hardware specification?  The number of CPUs specifically, and is it 32-bit or 64-bit?

    select cpu_count, hyperthread_ratio, max_workers_count, scheduler_count
    from sys.dm_os_sys_info
    
    select *
    from sys.dm_os_wait_stats
    where wait_type = 'THREADPOOL'
    

    You can also look at the connectivity ring buffer and see it in the LoginTaskEnqueuedInMilliseconds output.

    SELECT 
    	record.value('(Record/@id)[1]', 'int') as id,
    	record.value('(Record/@type)[1]', 'varchar(50)') as type,
    	record.value('(Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(50)') as RecordType,
    	record.value('(Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(50)') as RecordSource,
    	record.value('(Record/ConnectivityTraceRecord/Spid)[1]', 'int') as Spid,
    	record.value('(Record/ConnectivityTraceRecord/SniConnectionId)[1]', 'uniqueidentifier') as SniConnectionId,
    	record.value('(Record/ConnectivityTraceRecord/SniProvider)[1]', 'int') as SniProvider,
    	record.value('(Record/ConnectivityTraceRecord/OSError)[1]', 'int') as OSError,
    	record.value('(Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') as SniConsumerError,
    	record.value('(Record/ConnectivityTraceRecord/State)[1]', 'int') as State,
    	record.value('(Record/ConnectivityTraceRecord/RemoteHost)[1]', 'varchar(50)') as RemoteHost,
    	record.value('(Record/ConnectivityTraceRecord/RemotePort)[1]', 'varchar(50)') as RemotePort,
    	record.value('(Record/ConnectivityTraceRecord/LocalHost)[1]', 'varchar(50)') as LocalHost,
    	record.value('(Record/ConnectivityTraceRecord/LocalPort)[1]', 'varchar(50)') as LocalPort,
    	record.value('(Record/ConnectivityTraceRecord/RecordTime)[1]', 'datetime') as RecordTime,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'bigint') as TotalLoginTimeInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'bigint') as LoginTaskEnqueuedInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'bigint') as NetworkWritesInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'bigint') as NetworkReadsInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'bigint') as SslProcessingInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'bigint') as SspiProcessingInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'bigint') as LoginTriggerAndResourceGovernorProcessingInMilliseconds,
    	record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') as TdsInputBufferError,
    	record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') as TdsOutputBufferError,
    	record.value('(Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') as TdsInputBufferBytes,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') as PhysicalConnectionIsKilled,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') as DisconnectDueToReadError,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') as NetworkErrorFoundInInputStream,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') as ErrorFoundBeforeLogin,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') as SessionIsKilled,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') as NormalDisconnect,
    	record.value('(Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalLogout)[1]', 'int') as NormalLogout
    FROM
    (	SELECT CAST(record as xml) as record
    	FROM sys.dm_os_ring_buffers
    	WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') as tab
    
    
    

    Unfortunately, your only option is going to be to scale out and split the load across multiple SQL Servers.  You can try to scale up to a much larger server with many more CPUs, but the increase in workers for additional CPUs is not going to be as significant as you'd think, and just increasing the number of worker threads is not a solution to the problem either, as discussed by Robert Davis, a Microsoft Certified Master for SQL Server, in his blog post:

    http://www.sqlservercentral.com/blogs/robert_davis/archive/2010/05/07/The-Toilet-Analogy-_2620_-or-Why-I-Never-Recommend-Increasing-Worker-Threads.aspx

    Some other items you could look at are the number of tasks and workers currently in the system:

    select COUNT(*)
    from sys.dm_os_workers
    
    select COUNT(*)
    from sys.dm_os_tasks

    This, along with the other information, will tell you what your problems are.


    Jonathan Kehayias | Senior Database Administrator and Consultant
    Feel free to contact me through My Blog or Twitter
    Please click the Mark as Answer button if a post solves your problem!

    Tuesday, February 1, 2011 4:23 AM
    Moderator
  • Check the error logs for messages starting with the text "A significant part of sql server process memory has been paged out...".  If you find that, the 'failed to spawn a thread' message is misleading you: your problem is a consequence of Windows paging SQL Server out to disk, and has nothing to do with the number of worker threads available.

    If this is 64-bit SQL Server, I almost always recommend enabling Lock Pages in Memory and setting a sensible value for the maximum size of the buffer pool (using the max server memory option).  If you are on 2008 Standard Edition, you will need to be on Cumulative Update 2 for SP1 and use start-up Trace Flag 845.

    Using locked pages will prevent Windows paging SQL Server out to disk, but you should be careful to determine why SQL Server was being paged out to begin with.  Often it is because the default max server memory setting is being used, and there are other processes on the server that require memory.
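    As a sketch (the 1536 MB cap below is a made-up example value, not a recommendation; size it for your own server and workload):

    ```sql
    -- Hypothetical example: cap the buffer pool via max server memory,
    -- leaving headroom for the OS and any other processes on the box.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 1536;
    RECONFIGURE;
    ```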

    For more details see:

    http://support.microsoft.com/kb/918483

    http://blogs.msdn.com/b/psssql/archive/2009/09/11/fun-with-locked-pages-awe-task-manager-and-the-working-set.aspx

    http://blogs.msdn.com/b/psssql/archive/2009/05/19/an-update-for-standard-sku-support-for-locked-pages.aspx

    Tuesday, February 1, 2011 7:15 AM
  • Thanks Jonathan,

    This server is a 64-bit Windows 2008 box with SQL Server 2008 R1 SP1 installed. It has 2GB of memory and 1 CPU. This server has 1040 databases.

    select cpu_count, hyperthread_ratio, max_workers_count, scheduler_count
    from sys.dm_os_sys_info

    cpu_count  hyperthread_ratio  max_workers_count  scheduler_count
    ---------  -----------------  -----------------  ---------------
    1          1                  512                1

    select *
    from sys.dm_os_wait_stats
    where wait_type = 'THREADPOOL'

    wait_type   waiting_tasks_count  wait_time_ms  max_wait_time_ms  signal_wait_time_ms
    ----------  -------------------  ------------  ----------------  -------------------
    THREADPOOL  2949698              13591716459   23851             830

    select COUNT(*)
    from sys.dm_os_workers
    518
    select COUNT(*)
    from sys.dm_os_tasks
    28

    For the LoginTaskEnqueuedInMilliseconds output, most of the rows returned are NULL (there are 1024 rows returned by the query), but some rows show values such as 10943, 14466, 18860, 57, 588, etc.

    Tuesday, February 1, 2011 3:35 PM
  • To be honest, if it's a one-off error, it's sometimes really hard to diagnose. If it's repeating, you should raise the concern.

    You have probably only seen this once? If so, I'd say you can safely ignore it.


    Sevengiants.com
    Tuesday, February 1, 2011 4:38 PM
  • I got the email notification that you replied, but I can't see the reply on the forums currently, so I am just going to reply to my last post with regard to the email notification I received.

    With 1 CPU, 2GB RAM, and 1040 databases, I am surprised that the server is even online and that you aren't having significantly more problems than this.  I was expecting that your server might be underpowered hardware-wise, but not to this extent.  The only way you are going to resolve this issue is to scale up and/or scale out your server, and I am not talking about just two or four CPUs: you should be thinking dual quad/hexa/octa-core to have enough worker threads to handle the load, and/or multiple SQL Servers to split out the databases and balance the load.


    Jonathan Kehayias | Senior Database Administrator and Consultant
    Tuesday, February 1, 2011 6:30 PM
    Moderator
  • To be honest, if it's a one-off error, it's sometimes really hard to diagnose. If it's repeating, you should raise the concern.

    You have probably only seen this once? If so, I'd say you can safely ignore it.


    Sevengiants.com


    It is very obvious from the outputs provided that this is a repeating problem, and it is also going to be significantly impacting performance in general.  The system had 2949698 waits on THREADPOOL resources.  That means that there was work to be done, but no workers available to be assigned the tasks.  You cannot safely ignore anything in the information provided in response to my initial questions.  Actually, you could, but your system is not going to perform very well, and the problems will just keep growing until it stops working altogether.


    Jonathan Kehayias | Senior Database Administrator and Consultant
    Tuesday, February 1, 2011 7:17 PM
    Moderator
  • It's not a one-off. I am seeing this multiple times (15 times during 24 hours).
    Tuesday, February 1, 2011 8:12 PM
  • It's not a one-off. I am seeing this multiple times (15 times during 24 hours).

    It is very hard to make sense of the information you have provided so far, possibly because you haven't given very much background information.  On the face of it, there are two main options:

    1.  The server has a very large number of databases, but they are infrequently used, and the server generally only handles a very light load (you might have configured each of the databases with AUTO_CLOSE to release resources when a database is not in use, for example).  Perhaps the SQL Server is also running in a virtual machine...you did not say.

    Anyway, this is about the only scenario I can think of where I might configure SQL Server with only one CPU and 2GB RAM, when it has so many databases to look after.
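    One quick way to test part of this theory is to list the databases with AUTO_CLOSE enabled (a sketch; run it on the instance in question):

    ```sql
    -- Databases with AUTO_CLOSE on; a large count would support the
    -- "many rarely-used databases" scenario described above.
    select name
    from sys.databases
    where is_auto_close_on = 1;
    ```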

    The THREADPOOL waits might then be a result of some infrequent activity by some application or utility that tries to perform a concurrent activity in each of the databases - SQL Server does not have enough threads to run even one task in each database at the same time - and the lack of available worker threads might cause logins (associated with the normal low-level activity) to fail.

    If the SQL Server generally runs quite happily (i.e. not thrashing the disks and flat-lining at 100% CPU) you might only need to look at whatever specific activity is causing the sudden spikes in concurrent workload, and take whatever action is necessary to spread the number of concurrently-executing batches over time a bit more.

    Running such a lowly-resourced SQL Server with over a thousand databases is very unusual, so I am very much hoping there are special considerations that mean this makes sense for you.

    2.  The second option is that the SQL Server has just been hopelessly under-resourced for a really demanding workload.

    ...

    Which of these two scenarios most closely matches your environment?

    Paul

    Wednesday, February 2, 2011 6:36 AM
  • Open your SSMS Activity Monitor and give us the Resource Waits tab details.

    Let's see what the highest waits are.

    Sorry, I didn't notice that you have 1041 databases on the same server; how many concurrent users do you have?

    Also, you might want to have a cyclic job - kill idle connections - to maintain a good resource pool, given the very limited physical resources you have - 1 CPU and 2GB RAM.

    BTW, what's your physical server spec? Are you using affinity and AWE?


    Sevengiants.com
    Thursday, February 3, 2011 3:53 PM
  • Open your SSMS Activity Monitor and give us the Resource Waits tab details.

    Let's see what the highest waits are.

    Sorry, I didn't notice that you have 1041 databases on the same server; how many concurrent users do you have?

    Also, you might want to have a cyclic job - kill idle connections - to maintain a good resource pool, given the very limited physical resources you have - 1 CPU and 2GB RAM.

    BTW, what's your physical server spec? Are you using affinity and AWE?


    Sevengiants.com

    I don't know how to respond to this except to ask why.  Why would you recommend that someone open Activity Monitor and add load onto a server that is already worker starved and having issues, when the T-SQL queries I provided give more than enough information about the problem?  Why would you ask about affinity and AWE on a single-CPU system with 2GB RAM, where neither has any application at all?  Why would you recommend a job to kill connections, which is not only bad, but likely to cause other problems and flood the error logs with killed-connection messages?
    Jonathan Kehayias | Senior Database Administrator and Consultant
    Thursday, February 3, 2011 4:05 PM
    Moderator
  • Er...

    1. Using Activity Monitor is the easier way for a non-DBA user to get information out, and something he can understand as well, in my opinion. According to him, the system is not very loaded (15 occurrences during 24 hours); Activity Monitor will give a much better full picture in terms of CPU, waits, requests/sec, etc., which will be greatly helpful for us in understanding his system.

    Next: yes, agreed, there are quite a lot of THREADPOOL waits, but we don't know how long the system has been up and running. For a month? Or longer?

    2. I think I missed the part where he mentioned a 64-bit system.

    3. Why do you think killing idle connections is bad? I can't agree with that. Of course, he has to think about it and test before making any change in his production. If he can manage it well, would it be a possible way to reduce unnecessary resource usage?


    Sevengiants.com
    Thursday, February 3, 2011 4:34 PM
  • In addition to what Jonathan and Paul have already stated, looking at total waits is not a good methodology for resolving a specific issue. You need to look at the waits being actively experienced while the problem is occurring or at the very least, you need to calculate deltas for the waits and look at how they are changing.
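    A minimal sketch of the delta approach (the one-minute WAITFOR interval is an arbitrary choice; snapshot around the time the problem occurs):

    ```sql
    -- Snapshot the THREADPOOL waits, wait, then diff the counters.
    select wait_type, waiting_tasks_count, wait_time_ms
    into #baseline
    from sys.dm_os_wait_stats
    where wait_type = 'THREADPOOL';

    waitfor delay '00:01:00';

    select w.wait_type,
           w.waiting_tasks_count - b.waiting_tasks_count as new_waiting_tasks,
           w.wait_time_ms - b.wait_time_ms as new_wait_time_ms
    from sys.dm_os_wait_stats w
    join #baseline b on b.wait_type = w.wait_type;
    ```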

    In my opinion, I would never advocate using a GUI to troubleshoot a system that is under stress, because it adds stress to the system and has an effect on what you are trying to measure. It's like using a heated thermometer to check the temperature of the water.



    My blog: SQL Soldier
    Twitter: @SQLSoldier
    Microsoft Certified Master: SQL Server 2008
    My book: Pro SQL Server 2008 Mirroring
    Thursday, February 3, 2011 4:48 PM
  • Next: yes, agreed, there are quite a lot of THREADPOOL waits, but we don't know how long the system has been up and running. For a month? Or longer?

    It really doesn't matter how long the system has been up, based on the error message that started this post, the high number of THREADPOOL waits, the number of workers that exist in SQL, and the connectivity ring buffer output showing multiple connections that had to wait 10+ seconds to be bound to a worker thread to process the login attempt. The bigger picture shows that the hardware can't handle the demand, and the solution is to go to a bigger server with more CPUs, or multiple servers with more CPUs, to spread the load out.

    Why do you think killing idle connections is bad? I can't agree with that. Of course, he has to think about it and test before making any change in his production. If he can manage it well, would it be a possible way to reduce unnecessary resource usage?

    Why do you think killing idle connections is going to help here? A connection doesn't use a worker thread unless it is executing a task. Killing connections from SQL is hardly the correct solution to any scenario where connections exist and are idle. The correct solution is to reduce the connection pool size in the application or, if you don't want it to pool connections at all, to turn pooling off.  I have a 32-bit SQL Server 2000 cluster with 4 CPUs and 255 worker threads that currently has 2882 connections to it, and it is working just fine.  The problem is not the number of connections to the server; it's the number of concurrently executing requests and the threads they are consuming.
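    To see where the connections are coming from without a GUI, a sketch like this (using sys.dm_exec_sessions) shows which applications and hosts hold the most sessions:

    ```sql
    -- Count user sessions per host and application; the heaviest
    -- connection poolers show up at the top.
    select host_name, program_name, count(*) as session_count
    from sys.dm_exec_sessions
    where is_user_process = 1
    group by host_name, program_name
    order by session_count desc;
    ```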


    Jonathan Kehayias | Senior Database Administrator and Consultant
    Thursday, February 3, 2011 5:04 PM
    Moderator
  • Er...

    1. Using Activity Monitor is the easier way for a non-DBA user to get information out, and something he can understand as well, in my opinion. According to him, the system is not very loaded (15 occurrences during 24 hours); Activity Monitor will give a much better full picture in terms of CPU, waits, requests/sec, etc., which will be greatly helpful for us in understanding his system.

    Next: yes, agreed, there are quite a lot of THREADPOOL waits, but we don't know how long the system has been up and running. For a month? Or longer?

    2. I think I missed the part where he mentioned a 64-bit system.

    3. Why do you think killing idle connections is bad? I can't agree with that. Of course, he has to think about it and test before making any change in his production. If he can manage it well, would it be a possible way to reduce unnecessary resource usage?


    Sevengiants.com

    I appreciate the fact that you were intending to be helpful here, but suggesting affinity (for a single-logical-processor SQL Server), AWE (for a 64-bit server), Activity Monitor, and KILL...well, let's just say it helps to read the question carefully :)  By my reckoning, you missed the number of CPUs, the architecture, and the number of databases ;c)

    As Jonathan says, idle connections don't use workers (though they do consume other resources), and KILL is never the right way to fix this problem.  How do you know the connection isn't just about to send a batch when you KILL it?  What happens when the session hangs in an odd state after the KILL, requiring a reboot of the server?  And so on, and so on.

    I am not having a pop at you - we've all misread questions in our time - I'm just explaining why you got the responses you did.

    On a separate note, I'm guessing that the answer to my previous question is 'option 2'.  Ah well.

    Thursday, February 3, 2011 5:15 PM
  • Guys,

    Everyone has his own opinion; everyone comes at this from a different angle. I appreciate you pointing out the information I missed earlier in this thread.

    I also fully understand the value of being problem-specific; however, sometimes looking at the bigger picture can give you a much better idea.

    1. Why look at CPU and Activity Monitor?

    - Assuming we only have 1 core, and assuming the user is not using lightweight pooling (fiber mode), one worker thread will be tied to one Windows thread. Now, can CPU utilization tell us something? If it is always at 100% or a very high level, wouldn't you think there could be a very bad query taking all the CPU all the time? Why would I relate it? Because we've found the max is 512 and we are already at 518. Have you looked at the task queue? Could it be that one long-running task took all the CPU and everything else is queued, and due to a massive task queue, all worker threads had been allocated?

    - From Activity Monitor, we can see all the resource waits. If we see what the other waits are - if, say, page latches are very slow - each task would take longer to complete, and the queue is going to be longer?

    2. Why kill idle connections?

    http://msdn.microsoft.com/en-us/library/ms189267.aspx

    In theory, the system should free the worker thread when the task finishes; however, in some extreme cases, might idle connections still hold some worker threads?

    Also, please note that every connection needs its own memory. Even though the overhead is very small, if there are too many connections it will still mean a smaller data cache and potentially more IOs?

    http://msdn.microsoft.com/en-us/library/aa337559%28SQL.90%29.aspx

    3. Why affinity?

    Yes, from the DMV we can see only one CPU, but nowadays people usually have dual-core or quad-core CPUs. How can you be so sure there is only one core/CPU in the whole system?

    Just a few thoughts to share; I am not saying I am definitely right, just a few ideas.

    Thanks


    Sevengiants.com
    Friday, February 4, 2011 11:28 AM
  • Now, can CPU utilization tell us something? If it is always at 100% or a very high level, wouldn't you think there could be a very bad query taking all the CPU all the time? Why would I relate it? Because we've found the max is 512 and we are already at 518.

    Even if there were a poorly performing query utilizing 100% CPU, it can only execute on the scheduler for the duration of its quantum before having to yield the scheduler to allow other tasks to execute. If this didn't happen, you would get non-yielding scheduler warnings and a memory dump exception. The additional threads above the 512 max worker thread value are the threads used for the system processes listed in the SQL Server Batch or Task Scheduling BOL topic you referenced (http://msdn.microsoft.com/en-us/library/ms189267.aspx). The fact that the number of workers is so high could be attributed to a couple of things, but the same BOL topic points out:

    The pool of worker threads only has to be large enough to service the number of user connections that are executing batches at the same time in that instance.

    so the high number of workers basically says that the system is under high concurrent request load. Yes, it is possible that having only 1 scheduler to do all the work is driving the concurrent request count up, but that goes back to my recommendation to scale the hardware up, or scale the databases out; that is the only way to fix this problem.
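    For reference, the default max worker threads calculation behind the 512 value can be sketched as follows (base values per the SQL Server 2008 Books Online table; @cpu_count = 8 is just an example):

    ```sql
    -- Default max worker threads on 64-bit SQL Server 2008:
    --   512 when there are 4 or fewer logical CPUs,
    --   512 + ((cpu_count - 4) * 16) otherwise.
    declare @cpu_count int = 8;
    select case when @cpu_count <= 4 then 512
                else 512 + (@cpu_count - 4) * 16
           end as default_max_workers_64bit;  -- 576 for 8 CPUs
    ```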

    Have you looked at the task queue? Could it be that one long-running task took all the CPU and everything else is queued, and due to a massive task queue, all worker threads had been allocated?

    There were only 28 tasks on the server at the time the original poster ran the queries I provided, but it could and certainly would be higher when the problem occurs. Each task has a quantum for execution on the scheduler before it moves to the waiter list, allowing other tasks to execute for their quantum. The fact that there is only one scheduler means that more tasks would be following the execution model, moving from running -> yield -> wait for quantum renewal -> runnable -> running, because that one scheduler has to handle all of the activity, and the physical CPU may also have to context switch to handle other operations of the OS and server.

    From Activity Monitor, we can see all the resource waits.

    You can see this from sys.dm_os_wait_stats, and you can see the active waiters in sys.dm_os_waiting_tasks, without adding the additional load that Activity Monitor adds to the system. If you haven't done so already, run a trace against a test system and fire up Activity Monitor to see what all it's doing. Specifically, look at the temporary stored procedure that gets created in tempdb to collect the wait statistics and waiting task information from SQL Server. You can get more information with less impact by hitting the DMVs directly.
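    A lightweight sketch of pulling the active waiters directly while the problem is occurring:

    ```sql
    -- Active waiting tasks, heaviest waits first; session_id is NULL
    -- for tasks (such as pending logins) not yet bound to a session.
    select session_id, wait_type, wait_duration_ms, blocking_session_id
    from sys.dm_os_waiting_tasks
    order by wait_duration_ms desc;
    ```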

    2. Why kill idle connections?

    http://msdn.microsoft.com/en-us/library/ms189267.aspx

    In theory, the system should free the worker thread when the task finishes; however, in some extreme cases, might idle connections still hold some worker threads?

    The BOL topic you referenced above already pointed out that a worker is bound to a connection when it is executing a batch, and not bound otherwise.  A connection that is not executing doesn't hold on to worker threads; if it did, that would be a bug in SQL Server and lead to system instability.

    Also, please note that every connection needs its own memory. Even though the overhead is very small, if there are too many connections it will still mean a smaller data cache and potentially more IOs?

    http://msdn.microsoft.com/en-us/library/aa337559%28SQL.90%29.aspx

    Have you actually worked out the math for what the memory footprint of a connection would be? Let's assume that the default network packet size of 8K is being used by all of the connections to a SQL Server, and there are 1000 connections. That is 1000*((3*8000)+94), or 24094000 bytes of memory. That is only 23MB of memory for 1000 connections, which is not going to make a bit of difference in cache or IO.
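    The arithmetic checks out directly (the per-connection figure of 3 * packet size + 94 bytes is taken from the post above):

    ```sql
    -- 1000 connections * ((3 * 8000) + 94) bytes each, converted to MB.
    select 1000 * ((3 * 8000) + 94)             as total_bytes,   -- 24094000
           1000 * ((3 * 8000) + 94) / 1048576.0 as total_mb;      -- ~22.98
    ```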

    3. why affinity?

    Yes, from the DMV we can see only one CPU, but nowadays people usually have dual-core or quad-core CPUs. How can you be so sure there is only one core/CPU in the whole system?

    In my first set of queries I grabbed two pieces of information that validate this is a single-core, single-socket CPU setup. From sys.dm_os_sys_info I pulled back cpu_count, which is the number of logical processors seen by Windows and SQL Server, and hyperthread_ratio, which is the number of logical processors in a single physical socket. Both returned 1, so there is no way it has multiple cores available to it.


    Jonathan Kehayias | Senior Database Administrator and Consultant
    Friday, February 4, 2011 3:18 PM
    Moderator
  • 1. Why look at CPU and Activity Monitor?

    - Assuming we only have 1 core, and assuming the user is not using lightweight pooling (fiber mode), one worker thread will be tied to one Windows thread. Now, can CPU utilization tell us something? If it is always at 100% or a very high level, wouldn't you think there could be a very bad query taking all the CPU all the time? Why would I relate it? Because we've found the max is 512 and we are already at 518. Have you looked at the task queue? Could it be that one long-running task took all the CPU and everything else is queued, and due to a massive task queue, all worker threads had been allocated?

    - From Activity Monitor, we can see all the resource waits. If we see what the other waits are - if, say, page latches are very slow - each task would take longer to complete, and the queue is going to be longer?

    There's nothing wrong with looking at things like CPU utilization (as I suggested earlier - perhaps you missed that?) or waiting tasks...the point everyone was making is that Activity Monitor is the wrong tool for the job.  Using facilities like sys.dm_os_waiting_tasks will have much less impact on the system than running AM.

    2. Why kill idle connections?

    http://msdn.microsoft.com/en-us/library/ms189267.aspx

    In theory, the system should free the worker thread when the task finishes; however, in some extreme cases, might idle connections still hold some worker threads?

    Also, please note that every connection needs its own memory. Even though the overhead is very small, if there are too many connections it will still mean a smaller data cache and potentially more IOs?

    http://msdn.microsoft.com/en-us/library/aa337559%28SQL.90%29.aspx

    The first reference you cite does not seem relevant to your point at all.  A worker thread is either assigned to a task, or it is not.  Once that task finishes, the worker will pick up another task, or wait for one.  Idle connections are by definition idle (no tasks!) so cannot be tying up a worker.

    As far as your point about memory is concerned, again that point has already been made (perhaps you missed it).  The correct solution, however, is to address the cause of the problem, not the effect: KILLing connections can destabilize SQL Server and is not recommended.  Fix the cause.

    3. Why affinity?

    Yes, from the DMV we can see only one CPU, but nowadays people usually have dual-core or quad-core CPUs. How can you be so sure there is only one core/CPU in the whole system?

    Because Sujn posted output from sys.dm_os_sys_info showing a single scheduler.  Did you miss that?

    Until we hear back from Sujn, there does not seem to be much value in pursuing the debate further.

    Paul

    Friday, February 4, 2011 3:36 PM
  • Jonathan,

    Of course the best solution is to scale up; no one doubts that.

    People have different ways of looking at a problem; ultimately, the goal is to solve it and pin down the root cause.

    After all these long paragraphs, I believe Sujn should already have something to take away and look at on his end. Let's wait and see what comes out of our discussions.


    Sevengiants.com
    Friday, February 4, 2011 3:45 PM
  • Hi Jonathan,

    I have the same issue, even though we just moved to a Windows 2008 server with 32GB memory and 8 CPUs.

    Thursday, March 1, 2012 10:51 AM