Monday, October 22, 2012 3:51 PM
We have a notification query in production that references a table with a column of type text; this was a mistake and the column should be xml.
I have tried to change the data type to xml, but every time I do, I get CXPACKET waits and the spid blocks itself on parallel operations. SQL Server reports that statistics are missing; I have tried updating statistics and manually creating them, with no success. I am also unable to reproduce the issue in our staging environment. The only difference between the two environments is the amount of data in that table: staging has maybe around 1,000 rows, production around 500,000.
Has anyone run into this issue and found a resolution?
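For reference, the change being attempted looks roughly like the following (the table and column names here are placeholders, not the real ones):

```sql
-- Convert the column from the deprecated text type to xml.
-- dbo.NotificationData and Payload are placeholder names.
ALTER TABLE dbo.NotificationData
    ALTER COLUMN Payload xml;
```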
Wednesday, October 24, 2012 7:12 AM (Moderator)
Hi ma.malatek, thank you for your question.
I am trying to involve someone more familiar with this topic for a further look at this issue. Some delay might be expected while the question is transferred. Your patience is greatly appreciated.
Thank you for your understanding and support.
Please remember to mark the replies as answers if they help and unmark them if they provide no help.
Wednesday, October 24, 2012 9:21 PM
Thank you very much. I look forward to your response.
Monday, October 29, 2012 4:32 PM
I have not run into this exact issue, but some of my prior troubleshooting experience with Service Broker may apply here.
Since query notification uses Service Broker, its processes are probably hitting the table a lot more in your production environment than in staging. I would recommend putting your database in single-user mode and then trying to change the column type. This has worked for me in cases where altering Service Broker-related properties caused the operation to hang. Hope this helps.
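The single-user-mode approach can be sketched as follows; the database, table, and column names (NotifyDB, dbo.NotificationData, Payload) are placeholders. ROLLBACK IMMEDIATE disconnects all other sessions so nothing else, including Service Broker activation, can touch the table during the change:

```sql
-- Take the database to single-user mode, kicking out other sessions.
USE master;
ALTER DATABASE NotifyDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Perform the column type change with no competing activity.
USE NotifyDB;
ALTER TABLE dbo.NotificationData ALTER COLUMN Payload xml;

-- Return the database to normal multi-user operation.
USE master;
ALTER DATABASE NotifyDB SET MULTI_USER;
```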
Sanil Mhatre | Database Developer | MCTS | If you find my reply useful in any way, please vote it as helpful. If it has helped answer your question, please mark it as Answer. http://sqlwithsanil.com
Tuesday, October 30, 2012 7:07 PM (Moderator)
CXPACKET is what one or more of a spid's worker threads are expected to wait upon when they have asynchronously completed their work. The spid's worker threads in the CXPACKET wait state are waiting for others of that spid's worker threads to asynchronously complete their work. The waiting worker threads should be observed to be "blocked by" other worker threads (for the same spid) that are still running (not blocked).

Using a sports analogy: when one person on a sports team is waiting for the other team members to complete their work, the waiting person is essentially in a CXPACKET wait. The person waiting for (or being "blocked by") other team members isn't necessarily doing a "bad thing". It can be important to identify the spid's worker threads that are not blocked. Given that an ALTER TABLE is being run, I would expect the spid's runnable worker threads to themselves be more than occasionally caught in PAGEIOLATCH (I/O) waits. The sports analogy is not perfect, because there are certain sports (such as synchronized swimming) where the goal is to harness parallelism in a synchronous manner (i.e., SQL Server's behavior would be considered a "bad thing"). Luckily, SQL Server doesn't follow synchronized-swimming rules :).

In essence, the CXPACKET wait time is telling you a statement is expensive, and that SQL Server is trying to reduce the statement's run time by harnessing parallelism. The spid's worker threads waiting on CXPACKET may have a wait_resource that sheds additional light (such as waiting for the root page of an index). Given what the ALTER TABLE is doing, any identified wait_resources may be inevitable.
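One way to observe which of a spid's worker threads are waiting on CXPACKET, and which sibling threads they are waiting for, is to query sys.dm_os_waiting_tasks (the session_id value 53 below is a placeholder for the spid running the statement):

```sql
-- Show each waiting worker thread for a given spid, what it waits on,
-- and which sibling thread (same session, different exec_context_id)
-- it is blocked by. 53 is a placeholder session_id.
SELECT session_id,
       exec_context_id,
       wait_type,
       wait_duration_ms,
       blocking_session_id,
       blocking_exec_context_id,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE session_id = 53
ORDER BY exec_context_id;
```

Threads whose wait_type is something other than CXPACKET (e.g., PAGEIOLATCH_SH) are the ones actually doing work; the CXPACKET rows are waiting on them.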
If you are seeing hundreds of worker threads waiting on CXPACKET for the ALTER TABLE, and those hundreds of worker threads are consuming a significant portion of your system's total worker threads, you may want to be ALTER TABLE's "referee" by saying "too many players on the field", which technically is handled by limiting MAXDOP. However, if you go to the extreme of MAXDOP 1, suddenly your "team" has become a single-person sport, and the time it takes ALTER TABLE to progress can become longer than the time it would take the former team to carry forward (albeit with fewer players on the field).

SQL Server does consider how many worker threads are available when the parallel plan is run, and SQL Server's choice to use parallelism is based upon the statement's (in this situation, ALTER TABLE's) underlying statistics. When considering or comparing parallelism between systems, it is not meaningful to compare two systems with a different number of CPUs, nor two systems with different concurrent client activity (concurrent client activity impacts the availability of worker threads), nor two systems with different underlying statistics (i.e., row counts). A 500:1 difference in row counts may not scale upward in a linear manner: larger row counts may need transaction log file growths while the smaller table avoids those growths. I am unsure what statistics-update syntax was chosen, but it should not be necessary to update statistics with fullscan for an ALTER TABLE statement. Even without updating statistics, SQL Server knows ALTER TABLE will be touching every row and every leaf-level page, which (given a half million records) is destined to have a significant cost (and thus go parallel, whether or not statistics are fully sampled or up to date).
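To my knowledge, ALTER TABLE ... ALTER COLUMN does not accept a per-statement OPTION (MAXDOP n) hint, so capping parallelism for it would have to be done at the instance level. A sketch (the value 4 is an arbitrary example, not a recommendation):

```sql
-- Cap instance-wide parallelism; affects all queries, so revert afterward.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

Remember to restore the original 'max degree of parallelism' value once the ALTER TABLE completes.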
It would be best to consider the sp_spaceused of the table, and a trace of what SQL Server is doing, while keeping in mind that ALTER TABLE must be executed in an atomic, consistent, isolated, and durable (i.e., logged) manner. Further details and potential optimizations are offered within http://blogs.msdn.com/b/sqlcat/archive/2006/03/01/541550.aspx and http://blogs.msdn.com/b/sqlcat/archive/2006/03/31/566046.aspx (those blogs' basic principles remain unchanged in later versions of SQL Server). All this said, I am not sure I have a complete understanding of the issue at hand, for which polls of sys.dm_exec_requests and sys.dm_os_tasks, and/or a trace, should assist (see the two blog articles and BOL for further details). Given the text column, I suspect query notification was not functional. If it is left enabled, I think it would be best to avoid query notification while the ALTER TABLE is running.
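A minimal way to capture the table size and poll activity while the statement runs (the table name is a placeholder):

```sql
-- Report the table's row count, reserved space, and index space.
EXEC sp_spaceused N'dbo.NotificationData';

-- Poll what each active user request is currently waiting on.
SELECT r.session_id, r.command, r.status,
       r.wait_type, r.wait_time, r.wait_resource
FROM sys.dm_exec_requests r
WHERE r.session_id > 50;  -- crude filter to skip most system sessions
```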
Tuesday, October 30, 2012 9:57 PM
I guess I should have clarified: the issue is not running the ALTER statement; that works just fine. It is when I enable the queue and a notification happens that I start seeing the CXPACKET waits. So I disable the queue before making the change, alter the table, and then enable the queue. A notification happens and the system crawls because of hundreds of CXPACKET waits.
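For reference, the disable/enable sequence described above looks like the following (the queue name is a placeholder):

```sql
-- Stop the queue so no activation or notification processing occurs.
ALTER QUEUE dbo.QueryNotificationQueue WITH STATUS = OFF;

-- ... run the ALTER TABLE here ...

-- Re-enable the queue afterward.
ALTER QUEUE dbo.QueryNotificationQueue WITH STATUS = ON;
```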
Wednesday, November 21, 2012 2:53 PM (Moderator)
You can determine what requests are running in parallel by using the following query:
select r.session_id,
       max(isnull(t.exec_context_id, 0)) as number_of_workers,
       r.sql_handle,
       r.plan_handle
from sys.dm_exec_requests r
join sys.dm_os_tasks t on r.session_id = t.session_id
join sys.dm_exec_sessions s on r.session_id = s.session_id
where s.is_user_process = 0x1
group by r.session_id, r.sql_handle, r.plan_handle
having max(isnull(t.exec_context_id, 0)) > 0
With the above information, you can retrieve the text of the query by using sys.dm_exec_sql_text, and you can retrieve the plan by using sys.dm_exec_query_plan.
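For example, the handles returned above can be passed to those functions directly (a sketch):

```sql
-- Resolve statement text and query plan for each running request.
SELECT r.session_id,
       st.text AS statement_text,
       qp.query_plan
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp;
```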
Once a spid's number of worker threads, statement, and plan are known, it should be possible to determine why parallelism (as expressed by the CXPACKET wait) was found useful.
If there is a need to consider why CXPACKET waits are accumulating, it will be revealed by the exec_context_ids whose wait_types are not CXPACKET.