The deadlock between RangeS-S and RangeI-N
-
Monday, November 24, 2008 4:01 AM
Hi,
I have moitored a deadlock issue,
But I can't understand why the below occur the deadlock?
One mode is: RangeS-S, the other is: RangeI-N.
So could you help me explain this scenario why occur the deadlock?
Thanks very much!
koukai
---------------------------------------------------------------------------------------------------------------------------------------------------------------
<deadlock-list>
<deadlock victim="process90aa78">
<process-list>
<process id="process90aa78" taskpriority="0" logused="0" waitresource="OBJECT: 9:1567500813:0 " waittime="46" ownerId="485513" transactionname="user_transaction" lasttranstarted="2008-11-14T19:31:16.347" XDES="0xd0d2040" lockMode="IX" schedulerid="1" kpid="4784" status="suspended" spid="63" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2008-11-14T19:31:16.347" lastbatchcompleted="2008-11-14T19:31:16.347" clientapp=".Net SqlClient Data Provider" hostname="NCS-LIFENG" hostpid="2776" loginname="test" isolationlevel="serializable (4)" xactid="485513" currentdb="9" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame procname="xxxxx.dbo.P_IC_AUDIT_LOG_I" line="40" stmtstart="1822" stmtend="2330" sqlhandle="0x03000900b0d65c591167e5001c9a00000100000000000000">
INSERT INTO T_IC_AUDIT_LOG(
APP_ID,
TRANSACTION_ID ,
[USER_NAME] ,
LOG_FUNCTION ,
LOG_TIME ,
APP_VERSION
)
VALUES(
@v_app_id,
@p_transaction_id,
@p_user_name ,
@p_log_function ,
@p_log_time ,
@p_app_version
) </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 9 Object Id = 1499256496] </inputbuf>
</process>
<process id="process998f28" taskpriority="0" logused="936" waitresource="KEY: 9:72057594137280512 (5702e611629e)" waittime="62" ownerId="485490" transactionname="user_transaction" lasttranstarted="2008-11-14T19:31:16.347" XDES="0xcc9e5e0" lockMode="RangeI-N" schedulerid="2" kpid="2684" status="suspended" spid="60" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2008-11-14T19:31:16.347" lastbatchcompleted="2008-11-14T19:31:16.347" clientapp=".Net SqlClient Data Provider" hostname="NCS-LIFENG" hostpid="2776" loginname="test" isolationlevel="serializable (4)" xactid="485490" currentdb="9" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame procname="xxxxx.dbo.P_IC_AUDIT_LOG_I" line="40" stmtstart="1822" stmtend="2330" sqlhandle="0x03000900b0d65c591167e5001c9a00000100000000000000">
INSERT INTO T_IC_AUDIT_LOG(
APP_ID,
TRANSACTION_ID ,
[USER_NAME] ,
LOG_FUNCTION ,
LOG_TIME ,
APP_VERSION
)
VALUES(
@v_app_id,
@p_transaction_id,
@p_user_name ,
@p_log_function ,
@p_log_time ,
@p_app_version
) </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 9 Object Id = 1499256496] </inputbuf>
</process>
</process-list>
<resource-list>
<objectlock lockPartition="0" objid="1567500813" subresource="FULL" dbid="9" objectname="xxxx.dbo.T_IC_AUDIT_LOG" id="lockcbc5340" mode="SIX" associatedObjectId="1567500813">
<owner-list>
<owner id="process998f28" mode="SIX"/>
</owner-list>
<waiter-list>
<waiter id="process90aa78" mode="IX" requestType="convert"/>
</waiter-list>
</objectlock>
<keylock hobtid="72057594137280512" dbid="9" objectname="xxxxx.dbo.T_IC_AUDIT_LOG" indexname="C_IC_AUDIT_LOG_PK" id="lockcbc7400" mode="RangeS-S" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process90aa78" mode="RangeS-S"/>
</owner-list>
<waiter-list>
<waiter id="process998f28" mode="RangeI-N" requestType="wait"/>
</waiter-list>
</keylock>
</resource-list>
</deadlock>
</deadlock-list>
All Replies
-
Monday, November 24, 2008 7:47 AM
Anyone help?
Thanks & Regards
Koukai
-
Monday, November 24, 2008 8:48 AM
Hi,
One thing that pops out is that both spids/processes are running under the highest possible transaction isolation level; serializable - is that really necessary (READ COMMITTED is the default)? (this is why you are getting these range locks - you can read more about them in Books Online "Key Range Locking").
The transaction isolation level is set either on the component in the application layer or in T-SQL code.
/Elisabeth
-
Monday, November 24, 2008 9:00 AM
Hi Elisabeth,
Thanks for your kind reply.
BTW, I have something to ask your help more.
I know the 2 transaction are running under the serizable mode.( we have using the ado.net code, its default is seriable which is different from SQL Server default).
I have viewed the key range MSDN.
It shows if the 2 lock is RangeS-S and RangeI-N, SQL Server will think them conflicts with each other.
So
1) Why they raise the RangeS-S and RangeI-N lock?
2) Why conflication will lead the deadlock? Confliction == Deadlock??
Thanks in advance.
Regards
Koukai
-
Monday, November 24, 2008 10:40 AM
Hi,
The Range locks stems from the fact that you are in the Serializable isolation level. Is this stored procedure doing something more than just the INSERT? I am guessing it is because it doesn't make sense to wrap a single INSERT statement in a transaction. Also it looks like they have more than 1 transaction open each.
To simplify; there are two reasons you can get into a deadlock;
1. Two different queries are accesing two resources in "opposite direction" i.e.
T1
Begin Tx Begin Tx
Update tblA Insert tblB
Insert tblB Update tblA
This is probably not the case for you since the spids appear to be running the same stored procedure.
2. The execution plan of the statement - i.e. the access path SQL Server choose to find the qualifying rows and the indexes being updated, can also lead to deadlocks because of "opposite directions" being used.
Bart Duncan has written a step-by-step on how to resolve deadlocks (there is a Part 1, 2 and 3). I suggest that before you spend time on dissecting the execution plan, do review the transaction model you have chosen to ensure that the business requirements really calls for this isolation level. I think most people would agree that SERIALIZABLE (although default for COM+/.net transactions) is overkill.
I would even go further and suggest you handle the transaction in your stored procedure down in the database layer - in that way it will not span the network roundtrips to your business components.
http://blogs.msdn.com/bartd/archive/2006/09/09/Deadlock-Troubleshooting_2C00_-Part-1.aspx'
HTH
/Elisabeth
-
Monday, November 24, 2008 3:48 PMModerator
Koukai,
Make sure that your indexes have ALLOW_ROW_LOCKS = ON and ALLOW_PAGE_LOCKS = ON. You are pulling a FULL resource lock on your table which is an expensive escalation of locking, and generally I only see this when one of the above options is turned OFF.
<objectlock lockPartition="0" objid="1567500813" subresource="FULL" dbid="9" objectname="xxxx.dbo.T_IC_AUDIT_LOG" id="lockcbc5340" mode="SIX" associatedObjectId="1567500813">
The other thing to do would be to set your transaction isolation level inside your stored procedure or code call explicitly to get out of Serializable. You can do this by adding a SET TRANSACTION ISOLATION LEVEL command inline with your code. If you do this in a stored procedure, the isolation level is good for the duration of the stored procedures execution and then it reverts back to the default isolation level when the stored procedure terminates.
-
Tuesday, November 25, 2008 1:08 AM
Hi Elisabetch,
You are correct.
In my scenario, I have used the serilizable level of isolation. It is very strict as the MSDN says.
I have posted the 2 Procedure I have used, could you give me some explainion about my above question.
Why RangeS-S and RangeI-N occur deadlock?
-------------------------------------------------------P_IC_AUDIT_LOG_I------------------------------------
USE [XXXXXXX]
GO
/****** Object: StoredProcedure [dbo].[P_IC_AUDIT_LOG_I] Script Date: 11/17/2008 20:15:22 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GOCREATE PROCEDURE [dbo].[P_IC_AUDIT_LOG_I]
@p_app_name varchar(256),
@p_transaction_id nvarchar(50),
@p_user_name nvarchar(256),
@p_log_function nvarchar(256),
@p_log_time datetime,
@p_app_version nvarchar(20)
AS
/*
Module : Audit Trail
Author : xxxxxx
Date : 25 Apr 2006
Desc : Insert new audit log master record
Change Revision
-----------------------------------------------------
Date Author Remarks*/
SET NOCOUNT ON
SET XACT_ABORT ONBEGIN
DECLARE @v_app_id VARCHAR(50)
DECLARE @v_error INTEXEC @v_error = P_IC_APP_GET_ID @p_app_name, @v_app_id OUTPUT
IF @v_error <> 0 RETURN @v_error
DECLARE @v_count INT
SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id;IF (@v_count >0 )
BEGIN
RETURN;
END
INSERT INTO T_IC_AUDIT_LOG(
APP_ID,
TRANSACTION_ID ,
[USER_NAME] ,
LOG_FUNCTION ,
LOG_TIME ,
APP_VERSION
)
VALUES(
@v_app_id,
@p_transaction_id,
@p_user_name ,
@p_log_function ,
@p_log_time ,
@p_app_version
)END
SET NOCOUNT OFF
SET XACT_ABORT OFF
--------------------------P_IC_APP_GET_ID-------------------USE [xxxxxxx]
GO
/****** Object: StoredProcedure [dbo].[P_IC_APP_GET_ID] Script Date: 11/17/2008 20:15:58 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GOCREATE PROCEDURE [dbo].[P_IC_APP_GET_ID]
@p_app_name NVARCHAR(256),
@p_app_id VARCHAR(50) OUTPUT
AS/*
Module : Security
Author : xxxxxDate : 30 Oct 2007
Desc : Get application id by application name. Throw exception if can not found.
Output : application id. If there is exceptions, the error number will return.
Change Revision
-----------------------------------------------------
Date Author Remarks*/
BEGIN
SET NOCOUNT ON
SET XACT_ABORT ONSELECT @p_app_id = APP_ID
FROM T_IC_APP
WHERE LOWERED_APP_NAME = LOWER(@p_app_name)
IF @@ROWCOUNT = 0
BEGIN
RAISERROR ('ApplicationNameNotExists',16,1)
RETURN @@ERROR
ENDSET NOCOUNT OFF
SET XACT_ABORT OFFEND
Regards
Koukai
-
Tuesday, November 25, 2008 1:14 AM
Hi Jonathan,
I have posted my used SQL above.
We have used the distributed architecture based on the MS Entlib 3.0. Any procedure and function can't include the transaction statement, we have control the transaction via the System.Transaction in the Application Layer.
One of the deadlock reason due to the isolation level is serializable, and the other may be due to the SQL.
I need your help to explain 2 points to my issue.
1) Why RangeS-S and RangeI-N confliction will lead to the deadlock?
2) Is my Procedure a poor performance or lead to the deadlock?
Thanks very much for your kind help.
Regards
Hong Hai
-
Tuesday, November 25, 2008 3:02 AMModerator
Koukai Chou wrote: One of the deadlock reason due to the isolation level is serializable, and the other may be due to the SQL.
I need your help to explain 2 points to my issue.
1) Why RangeS-S and RangeI-N confliction will lead to the deadlock?
2) Is my Procedure a poor performance or lead to the deadlock?
Hong,
They are both due to the isolation level.
http://msdn.microsoft.com/en-us/library/ms191272.aspx
You have one spid that takes a RangeS-S lock while the other takes the RangeI-N lock. Then the initial thread trys to get a RangeI-N lock, but is blocked and waits on the second spid, which then completes the task on the RangeI-N lock it has but needs a RangeS-S lock in the same transaction, so now you have your deadlock. This is generally an impossible scenario to reproduce testing, since it is a millisecond timing of the appropriate locks crossing. I wrote about a similar case in
Understanding the Anatomy of a Deadlock.
If you are controlling all isolation levels from the app, and you can't change from serializable, then your only recourse is to recode the app to catch the 1205 error that is thrown by SQL Server in the SqlException and resubmit the transaction. This will generally allow the transaction to succeed the second time.
Sharepoint does this for any deadlock it encounters, and it deadlocks around 20 times while doing index crawls in my environment due to a missing index, but support has decided not to fix it because there is no way to determine the impact of the index (Confused by that one? Good cause I am too.)
The point I am trying to make is that if you handle the deadlock in the application properly, it will not be a problem in most cases.
-
Tuesday, November 25, 2008 12:45 PM
Jonathan Kehayias wrote: Koukai Chou wrote: One of the deadlock reason due to the isolation level is serializable, and the other may be due to the SQL.
I need your help to explain 2 points to my issue.
1) Why RangeS-S and RangeI-N confliction will lead to the deadlock?
2) Is my Procedure a poor performance or lead to the deadlock?
Hong,
They are both due to the isolation level.
http://msdn.microsoft.com/en-us/library/ms191272.aspx
You have one spid that takes a RangeS-S lock while the other takes the RangeI-N lock. Then the initial thread trys to get a RangeI-N lock, but is blocked and waits on the second spid, which then completes the task on the RangeI-N lock it has but needs a RangeS-S lock in the same transaction, so now you have your deadlock. This is generally an impossible scenario to reproduce testing, since it is a millisecond timing of the appropriate locks crossing. I wrote about a similar case in
Understanding the Anatomy of a Deadlock.
If you are controlling all isolation levels from the app, and you can't change from serializable, then your only recourse is to recode the app to catch the 1205 error that is thrown by SQL Server in the SqlException and resubmit the transaction. This will generally allow the transaction to succeed the second time.
Sharepoint does this for any deadlock it encounters, and it deadlocks around 20 times while doing index crawls in my environment due to a missing index, but support has decided not to fix it because there is no way to determine the impact of the index (Confused by that one? Good cause I am too.)
The point I am trying to make is that if you handle the deadlock in the application properly, it will not be a problem in most cases.
Hi Jonathan,
I have read your mentioned articles already, inlcude the range locks matrixs in the MSDN.
You are correct.
For my project, I have resolution to resolve the issue.
1) Explictly set the isolation level to 'ReadCommitted'
2) Move the select count(*) from our procedure because it is invalid due to current app code
The table is the audit table, maybe I will remove the non-clustered index if not use query in the table.
Maybe you couldgive me some advice about this plucky idea(remove the clustered idx).
Finally I also have new deeply question to ask for your kind help.
From my pasted deadlock graph xml file, I have read carefully again based on your analysis in another article.
--------------------Fromh ttp://code.msdn.microsoft.com/SQLExamples/Wiki/View.aspx?title=DeadlockAnatomy&referringTitle=Home--------------------------
Keeping in mind that this is an overly simplistic example, what will happen to cause the deadlock is that two SPIDS
will start within microseconds of each other, one issuing the SELECT, and the other attempting to INSERT a new Row.
The SELECT immediately takes a shared lock on the NonClustered Index and the INSERT immediately takes a
Exclusive Lock on the Clustered Index. For the SELECT to complete, it will need to take a shared lock on the clustered
index to get the additional columns in the SELECT list, but it will be blocked. At the same time, to complete the INSERT
operation will require an Exclusive lock on the nonclustered indexes on the table, which is incompatible with the existing
shared lock held by the SELECT operation, so both processes will be blocked waiting on the other to release its lock.
This is where the deadlock occurs.---------------------
My table create statement is:
---------------------------------------------------------------------------------------------------------
USE [xxxxxxxx]
GO
/****** Object: Table [dbo].[T_IC_AUDIT_LOG] Script Date: 11/25/2008 20:32:44 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[T_IC_AUDIT_LOG](
[APP_ID] [varchar](50) NOT NULL,
[TRANSACTION_ID] [varchar](50) NOT NULL,
[USER_NAME] [nvarchar](256) NOT NULL,
[LOG_FUNCTION] [nvarchar](256) NULL,
[LOG_TIME] [datetime] NOT NULL,
[APP_VERSION] [nvarchar](20) NULL,
CONSTRAINT [C_IC_AUDIT_LOG_PK] PRIMARY KEY NONCLUSTERED
(
[TRANSACTION_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]GO
SET ANSI_PADDING OFF---------------------------------------------------------------------------------------------------------
In my graph, process has 2(process90aa78 & process998f28). and lock resource has 2 also,
One is the OBJECT lock(all), the other is the keylock. From the SQL Server lock matrix,
The lock mode of SIX conflicts with the lock mode of IX;
The Range lock mode of RangeS-S conflicts with the the range lock mode of RangeI-N.
1) So why SQL Server allow the latter wait(I mean SQL Engine allow IX wait for SIX, because they conflicts with each other)? Maybe you could explain about the event sequence about my sample, It is very helpful for me. Because in my table, don't have the clustered index. ( From the SIX to IX, RangeI-N to RangeS-S cross with the process and resource).
Like your article explaination.
Really, I can only describe the below few comments for the lock list and cant' contiue to explain the left event.
Maybe via your explaination will enhance my understanding on deadlock list(althogh this type occur little in the read environment).
-----------------------------------My explaination(only a few)----------------------------------
Due to isolation level is serializable
When execute the first statement(query), the first one will lock all the table(include the index), using the SIX, Really it doesn't execute the insert statement, the first one only want to prepare for the insert to use the exclusive lock in the table.
?? next..
----------------------------------------------------------------------------------------------------------------------
Thanks for your kind help!
Hong Hai
-
Tuesday, November 25, 2008 1:56 PM
Hi,
Don't forget there is also SELECT involved in the stored procedure that acquires and keeps the Shared locks until the end of the transaction (because this is what you are asking SQL Server to do when you set the isolation level to serializable). And one waiter is not waiting to acquire a lock; it is waiting for a conversion:
<waiter id="process90aa78" mode="IX" requestType="convert"/>
In 2005 locks are granted in a "relaxed FIFO" fashion, and e.g. state CONVERT has priority over nonconversion requests.
The only way for you to completely dissect this is to run the statement and capture the Lock Acquired and Lock Released events with Profiler.
/Elisabeth
-
Tuesday, November 25, 2008 2:02 PMModerator
Elisabeth Redei wrote: The only way for you to completely dissect this is to run the statement and capture the Lock Acquired and Lock Released events with Profiler.
Don't do that on a production SQL Server. It will bring even a mildly busy server to its knees. I am a graphical type person when it comes to these kinds of problems, so I am working on a visio diagram that I will put up on my blog in a few minutes that will make this make more sense. The issue is the select before insert as Elisabeth says above, and it is very similar to the other situation, only slightly different because you have a heap allocation. If you were to convert the nonclustered index on trasaction_id to a clustered index, which I don't know why you don't have it that way to begin with, the problems you are having should go away, and the diagrams will help explain this. I'll post the link in a little bit.
-
Wednesday, November 26, 2008 1:26 AM
Hi Jonathan,
Thanks!
When I set the isolation level from Serializable to Readcommitted, the new deadlock occurred, I have captured the deadlock event graph. This time may be due to this sql statement:
"SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id; "
Below is the new deadlock event data.
Wait for your some comments.
Thanks!
Hong Hai
_________________________________________________________________________________________
<deadlock-list>
<deadlock victim="process998b68">
<process-list>
<process id="process90ac58" taskpriority="0" logused="492"
waitresource="PAGE: 9:1:31968" waittime="453" ownerId="1603549"
transactionname="user_transaction" lasttranstarted="2008-11-17T13:56:33.253"
XDES="0x15a7f750" lockMode="S" schedulerid="1" kpid="676" status="suspended"
spid="58" sbid="0" ecid="0" priority="0" transcount="1"
lastbatchstarted="2008-11-17T13:56:33.270" lastbatchcompleted="2008-11-17T13:56:33.270"
clientapp=".Net SqlClient Data Provider" hostname="NCS-XIONGWEI" hostpid="3680" loginname="test"
isolationlevel="read committed (2)" xactid="1603549" currentdb="9" lockTimeout="4294967295"
clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame procname="xxxxx.P_IC_AUDIT_LOG_I"
line="32" stmtstart="1542"
stmtend="1722"
sqlhandle="0x03000900b0d65c591167e5001c9a00000100000000000000">
SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id; </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 9 Object Id = 1499256496] </inputbuf>
</process>
<process id="process998b68" taskpriority="0" logused="492" waitresource="PAGE: 9:1:8584"
waittime="453" ownerId="1603526" transactionname="user_transaction"
lasttranstarted="2008-11-17T13:56:33.253" XDES="0x16258040" lockMode="S"
schedulerid="2" kpid="7864" status="suspended" spid="57" sbid="0" ecid="0"
priority="0" transcount="1" lastbatchstarted="2008-11-17T13:56:33.270"
lastbatchcompleted="2008-11-17T13:56:33.270" clientapp=".Net SqlClient Data Provider"
hostname="NCS-XIONGWEI" hostpid="3680" loginname="test" isolationlevel="read committed (2)"
xactid="1603526" currentdb="9" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame procname="xxxx.P_IC_AUDIT_LOG_I" line="32" stmtstart="1542"
stmtend="1722" sqlhandle="0x03000900b0d65c591167e5001c9a00000100000000000000">
SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id; </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 9 Object Id = 1499256496] </inputbuf>
</process>
</process-list>
<resource-list>
<pagelock fileid="1" pageid="31968" dbid="9" objectname="xxx.T_IC_AUDIT_LOG" id="lockcb4c8c0" mode="SIX" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process998b68" mode="SIX"/>
</owner-list>
<waiter-list>
<waiter id="process90ac58" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
<pagelock fileid="1" pageid="8584" dbid="9" objectname="xxxx.T_IC_AUDIT_LOG" id="lockd2d5e80" mode="IX" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process90ac58" mode="IX"/>
</owner-list>
<waiter-list>
<waiter id="process998b68" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
</resource-list>
</deadlock>
</deadlock-list>
_________________________________________________________________________________________ -
Wednesday, November 26, 2008 1:38 AM
........... The issue is the select before insert as Elisabeth says above, and it is very similar to the other situation, only slightly different because you have a heap allocation.
If you were to convert the nonclustered index on trasaction_id to a clustered index, which I don't know why you don't have it that way to begin with, the problems you are having should go away, and the diagrams will help explain this. I'll post the link in a little bit.
Hi,
Append something,
1) only slightly different because you have a heap allocation.
Q: Could you tell me why different is the heap allocation?
a little complicated2)If you were to convert the nonclustered index on trasaction_id to a clustered index, which I don't know why you don't have it that way to begin with
Q: Do you mean we should drop the non-clustered index and create a clustered index instead?
And why occur this request type of Converting from non-clustered to clustered?
Thanks!!
-
Wednesday, November 26, 2008 1:51 AMModerator
Koukai Chou wrote: 1) only slightly different because you have a heap allocation.
Q: Could you tell me why different is the heap allocation? a little complicated
I actually was completing my write up on this when you posted:
Anatomy of a Deadlock - Part Deux
That should show why you have the problem, but if you need me to elaborate a bit more on it I can.
2)If you were to convert the nonclustered index on trasaction_id to a clustered index, which I don't know why you don't have it that way to begin with
Q: Do you mean we should drop the non-clustered index and create a clustered index instead?
And why occur this request type of Converting from non-clustered to clustered?
Have a look at the above post that shows the cross locking that is occuring between the heap allocation and the non-clustered index.
Warning - the following is all my opinion, and some may want to argue the points I am going to make, and they certainly are allowed their own opinions, but this is mine.
Heaps should not be allowed in SQL Server except through the use of an advanced option. There are numerous problems involved with heaps. The data is not stored in any logical order being the first one, so the database engine has to do table scans and RID lookups to find rows of data which is very inefficient. Every table should have a clustered index on it in most OLTP database systems. Rare cases exist where this rule does not apply like sliding window partitioning, and bulk loading stage tables that get truncated at the end of the loading process, and perhaps one or two other rare situations. Otherwise, every table should have a clustered index on it. Heaps have other problems like forwarded records, that cause excess unused space allocations for the table, where they would not exist with a clustered index.
Since your use of this table is not one of the specialized situations above, you should create a clustered index on it. It will also solve the deadlocking problem that you are having.
-
Thursday, December 04, 2008 12:58 PM
Koukai Chou wrote:
Hi Jonathan,Thanks!
When I set the isolation level from Serializable to Readcommitted, the new deadlock occurred, I have captured the deadlock event graph. This time may be due to this sql statement:
"SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id; "
Below is the new deadlock event data.
Wait for your some comments.
Thanks!
Hong Hai
_________________________________________________________________________________________
<deadlock-list>
<deadlock victim="process998b68">
<process-list>
<process id="process90ac58" taskpriority="0" logused="492"
waitresource="PAGE: 9:1:31968" waittime="453" ownerId="1603549"
transactionname="user_transaction" lasttranstarted="2008-11-17T13:56:33.253"
XDES="0x15a7f750" lockMode="S" schedulerid="1" kpid="676" status="suspended"
spid="58" sbid="0" ecid="0" priority="0" transcount="1"
lastbatchstarted="2008-11-17T13:56:33.270" lastbatchcompleted="2008-11-17T13:56:33.270"
clientapp=".Net SqlClient Data Provider" hostname="NCS-XIONGWEI" hostpid="3680" loginname="test"
isolationlevel="read committed (2)" xactid="1603549" currentdb="9" lockTimeout="4294967295"
clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame procname="xxxx.P_IC_AUDIT_LOG_I"
line="32" stmtstart="1542"
stmtend="1722"
sqlhandle="0x03000900b0d65c591167e5001c9a00000100000000000000">
SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id; </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 9 Object Id = 1499256496] </inputbuf>
</process>
<process id="process998b68" taskpriority="0" logused="492" waitresource="PAGE: 9:1:8584"
waittime="453" ownerId="1603526" transactionname="user_transaction"
lasttranstarted="2008-11-17T13:56:33.253" XDES="0x16258040" lockMode="S"
schedulerid="2" kpid="7864" status="suspended" spid="57" sbid="0" ecid="0"
priority="0" transcount="1" lastbatchstarted="2008-11-17T13:56:33.270"
lastbatchcompleted="2008-11-17T13:56:33.270" clientapp=".Net SqlClient Data Provider"
hostname="NCS-XIONGWEI" hostpid="3680" loginname="test" isolationlevel="read committed (2)"
xactid="1603526" currentdb="9" lockTimeout="4294967295" clientoption1="673316896" clientoption2="128056">
<executionStack>
<frame procname="xxxx.P_IC_AUDIT_LOG_I" line="32" stmtstart="1542"
stmtend="1722" sqlhandle="0x03000900b0d65c591167e5001c9a00000100000000000000">
SELECT @v_count = COUNT(*) FROM T_IC_AUDIT_LOG WHERE TRANSACTION_ID=@p_transaction_id; </frame>
</executionStack>
<inputbuf>
Proc [Database Id = 9 Object Id = 1499256496] </inputbuf>
</process>
</process-list>
<resource-list>
<pagelock fileid="1" pageid="31968" dbid="9" objectname="xxxx.T_IC_AUDIT_LOG" id="lockcb4c8c0" mode="SIX" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process998b68" mode="SIX"/>
</owner-list>
<waiter-list>
<waiter id="process90ac58" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
<pagelock fileid="1" pageid="8584" dbid="9" objectname="xxxx.T_IC_AUDIT_LOG" id="lockd2d5e80" mode="IX" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process90ac58" mode="IX"/>
</owner-list>
<waiter-list>
<waiter id="process998b68" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
</resource-list>
</deadlock>
</deadlock-list>
_________________________________________________________________________________________Hi Jonathan,
Disturb you again....

When I try to describle why occur deadlock when I changed the isolation level from "Serializable" to "ReadCommitted"(please note: the index is also a non-clustered index), I can't continue.
Because the deadlock chain show it is Page Block( the table, in the heap).
From the resource list,
<resource-list>
<pagelock fileid="1" pageid="31968" dbid="9" objectname="xxxxx.T_IC_AUDIT_LOG" id="lockcb4c8c0" mode="SIX" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process998b68" mode="SIX"/>
</owner-list>
<waiter-list>
<waiter id="process90ac58" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
<pagelock fileid="1" pageid="8584" dbid="9" objectname="xxxx.dbo.T_IC_AUDIT_LOG" id="lockd2d5e80" mode="IX" associatedObjectId="72057594137280512">
<owner-list>
<owner id="process90ac58" mode="IX"/>
</owner-list>
<waiter-list>
<waiter id="process998b68" mode="S" requestType="wait"/>
</waiter-list>
</pagelock>
</resource-list>
The non-clurstered index has not been involved, am I right? Very confuzed.Could you kindly explain for the type deadlock? Very thanks!
Regards
Hong Hai -
Thursday, December 04, 2008 1:42 PMModeratorFirst can you explain why this is a heap, and why you don't convert the non-clustered index over to a clustered index on transactionid?
-
Thursday, December 04, 2008 11:25 PM
1) Heap: Table without clustered index
2) I only want to some underlying mechanism why occur deadlock to apply in the future project to avoid this type in other table, Only I know the mechanism, I could apply it.
-
Thursday, December 04, 2008 11:49 PMModerator
I am trying to understand why you have this as a heap, and why you don't have or want to create a clustered index on the table. To actually figure out why you get the deadlock, would require looking at information for the heap like whether you have forwarded records or not, and if so, how many? What is your reason for not having a clustered index? That is what I am asking? The problem initially wasn't that you were using Serializable Isolation as much as your table is a heap, as proven by deadlocking when you change the isolation level. Why don't you have a clustered index on this table?
As a heap, your data has not logical order on disk. It is stored in whatever order it was entered. A clustered index would store the data in a sorted order, and if created on the transaction_id column would solve this entire problem. The lack of a clustered index on the table is the root of your entire deadlock problem at this point, as I pointed out in my blog post and earlier in this thread.
EDIT:
What is the output of the following query when run inside your database?
Code SnippetSELECT
index_type_desc, page_count, avg_page_space_used_in_percent,avg_record_size_in_bytes
,forwarded_record_countFROM
sys.dm_db_index_physical_stats(db_id(),72057594137280512, null, null, 'detailed'); -
Friday, December 05, 2008 1:11 AM
When I run this statement, it will throw,
------------------------------------------------------------------------------------------
Arithmetic overflow error converting expression to data type int.
Please advice.
Thanks
-
Friday, December 05, 2008 1:20 AMModerator
Ok use the following then:
Code SnippetSELECT
index_type_desc, page_count, avg_page_space_used_in_percent, avg_record_size_in_bytes,forwarded_record_countFROM
sys.dm_db_index_physical_stats(db_id(),object_id('T_IC_AUDIT_LOG'), null, null, 'detailed');Apparently the associatedObjectId="72057594137280512" is actually the hobtid in the case I saw it. The table object_id is actually what needs to be in that field and the above code should get it. If not try substituting 1567500813 for the 72057594137280512 in the original query.
If you would please, stop ignoring the questions regarding why you won't create a clustered index on this table? You are asking for help, but you aren't answering the one question that is the key to understand this problem. We are hitting the limits of what is going to be able to be done on a forum to solve your problem. You aren't far from a recommendation to contact the PSS team and open a support case with Microsoft, or hiring a consultant to come on site to look at your problem. There is just to much information required to dig much further, than can be posted through a forum.
-
Friday, December 05, 2008 1:22 AM
Update
........have forwarded records or not, and if so, how many?
--- --- I am not sure how to reply u, maybe I am not sure about your question.
What is your reason for not having a clustered index?
-------We have used a architecture, include the front application and backend datababase, it includes some common table,
such as auditlog, codetable,authentication, in the original design, the index is non-clustered. I don't know why they set it as this. Sorry
-
Friday, December 05, 2008 1:36 AM
Hi,
I have tried the SQL statement and got the same answer.
SELECT index_type_desc, page_count, avg_page_space_used_in_percent,
avg_record_size_in_bytes,forwarded_record_count
FROM sys.dm_db_index_physical_stats(db_id(),1567500813, null, null, 'detailed');
HEAP 244 91.0920681986657 122.869 0
NONCLUSTERED INDEX 128 70.9152829256239 49 NULL
NONCLUSTERED INDEX 1 76.9705955028416 46.687 NULL
SELECT index_type_desc, page_count, avg_page_space_used_in_percent, avg_record_size_in_bytes,forwarded_record_count
FROM sys.dm_db_index_physical_stats(db_id(),object_id('T_IC_AUDIT_LOG'), null, null, 'detailed');
HEAP 244 91.0920681986657 122.869 0
NONCLUSTERED INDEX 128 70.9152829256239 49 NULL
NONCLUSTERED INDEX 1 76.9705955028416 46.687 NULL
BTW, in our product, I have changed the non-clustered index to clustered index.
Here, I re-post my question to get your some help about the SQL deadlocking knowledge.
I will follow your advice and ask some DBA to help us analysis.
Thanks all the same.
Regards
Hong Hai
-
Friday, December 05, 2008 2:36 AMModerator
If you've made the change to a clustered index, you got rid of the information that would have explained the deadlock, unless you are still getting the deadlock, in which case, what is the deadlock graph of the new deadlock? I would be very surprised to see the deadlocks occuring with a clustered index on transaction_id.
So here is my theory of what was happening, and I can't prove it since you changed to a clustered index.
When a table is a heap, there is no logical order to how data is stored. This means that each row is inserted right behind the next in the order recieved in the data pages. Because of this, if you update a record and it requires more space than it was initially allocated, and there isn't available space in the page it sits on, the entire row is moved to the end of the table, and a pointer called a forwarded record is left in its place. When you do a scan on the data, the database engine reads the pages in allocation order, but everytime it finds a forwarded pointer, it has to lock the page, and go to the pointers location, and then share lock that page to read the data. Depending on how much change you generate on a given row, there could be dozens to hundreds of forwarded records, so you end up locking across the pages in the table to get the information. It would be possible for two select queries to cross lock each other on forwarded record lookups in theory, though I haven't ever seen it because I don't have any heaps in my databases.
When you create the clustered index, the forwarded record problem goes away completely, because the data is stored in a sorted order based on the clustered index key. If a row is updated and requires more space and none exists on the page that the row sits on, you get what is called a page split, where 1/2 the data on the page is moved to a new page and the update then succeeds. You can also get page splits when you insert a row that falls within the range of a page that is full. This is how the data is maintained in physical storage, and it is far better than a heap for almost all tables in a online transactional processing database.
If your other tables are heaps, run the query I provided above and change the table name out to see if there are forwarded records. Unless you have a really good reason for not having a clustered index, create one on a column that makes the most sense for there to be a clustered index on. Generally speaking this a column that would be used most often for searching, or a column that uniquely identifies the row and increments in a manner so as to not cause page splitting, (ie, a identity column that is the primary key).
Speaking of primary key's, the fact that you have no clustered index also points to a lack of a primary key on the table in most cases. I say this because by default SQL Server creates a clustered index on the primary key of a table. I am not trying to be difficult on this post, I really want to help you solve this problem as deadlocks are one of my favorite problems to look at and solve. I like complex puzzles and deadlocks can really be the most complex things around at times. If you would like to see a picture that shows what I am describing above for forwarded records, let me know, and I will try my best to create one, but that is a really complex thing to try and show. I wish I knew how to do digital animation and make movies to show this stuff.
If you still have a deadlocking problem after creating the clustered index, please let us know and post the graph in this thread. I would be interested to see what it looks like.
For a good post on Forwarded Records and the problems they can create see:
Kalen Delaney : Geek City: What's Worse Than a Table Scan?
- Proposed As Answer by Deepak RangarajanModerator Tuesday, January 27, 2009 9:03 AM
-
Friday, December 05, 2008 5:08 AM
Thanks.
In our production, I have monitored it about 3days and no deadlock occurs after I have made 2 changes
1) Isolation level: change Serializable to ReadCommitted
2) Index: change non-clustered into clustered index
For your latest reply, It is useful to me.
-
Friday, December 05, 2008 5:29 AMModeratorThat is really good to hear, and what I would have expected after creating the clustered index. I guess somewhere within this thread we crossed communication, because until one of your last posts tonight, I was under the impression that you still hadn't created the clustered index. If you happen to encounter another deadlock on this table, please let us know. I don't expect that you will.

